Slide1
Forensics and CS
Philip Chan
Slide2
CSI: Crime Scene Investigation
www.cbs.com/shows/csi/
high tech forensics tools
DNA profiling
Use as evidence in court cases
Slide3
DNA
Deoxyribonucleic Acid
Each person's DNA is unique (except for identical twins)
DNA samples can be collected at crime scenes
About 0.1% of human DNA varies from person to person
Slide4
Forensics Analysis
Focus on loci (locations) of the DNA
Values at those loci (the DNA profile) are recorded for comparing DNA samples.
Two DNA profiles from the same person have matching values at all loci.
Is identification more accurate with more loci or with fewer?
Tradeoffs?
FBI uses 13 core loci
http://www.cstl.nist.gov/biotech/strbase/fbicore.htm
Slide8
We do not want to wrongly accuse someone
How can we find out how likely it is that another person has the same DNA profile?
How many people are in the world?
How low does the probability need to be for a DNA profile to be unique in the world?
Low probability doesn’t mean impossible
Just very unlikely
Slide12
Review of basic probability
Joint probability of two independent events
P(A,B) = P(A) * P(B)
Independent events: knowing the outcome of one event provides no information about the other
P(Die1=1, Die2=1)
= P(Die1=1) * P(Die2=1)
= 1/6 * 1/6 = 1/36
Slide14
Enumerating the events
      1    2    3    4    5    6
1    1,1  1,2  …
2
3
4
5
6
36 events, each equally likely, so each has probability 1/36
Slide15
Joint probability
P(Die1=even, Die2=6)
= 1/2 * 1/6 = 1/12
P(Die1=1, Die2=5, Die3=4)
= (1/6)^3
= 1/216
Slide18
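The dice probabilities above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original slides; the class and method names (`DiceJoint`, `evenThenSix`, `oneFiveFour`) are made up for illustration.

```java
// Illustrative sketch: count favorable outcomes among all equally likely dice rolls.
final class DiceJoint {
    // Fraction of two-dice outcomes where die 1 is even and die 2 shows 6.
    static double evenThenSix() {
        int favorable = 0;
        int total = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                total++;
                if (d1 % 2 == 0 && d2 == 6) {
                    favorable++;
                }
            }
        }
        return (double) favorable / total; // 3 favorable out of 36
    }

    // Fraction of three-dice outcomes equal to exactly (1, 5, 4).
    static double oneFiveFour() {
        int favorable = 0;
        int total = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                for (int d3 = 1; d3 <= 6; d3++) {
                    total++;
                    if (d1 == 1 && d2 == 5 && d3 == 4) {
                        favorable++;
                    }
                }
            }
        }
        return (double) favorable / total; // 1 favorable out of 216
    }

    public static void main(String[] args) {
        System.out.println(evenThenSix());
        System.out.println(oneFiveFour());
    }
}
```

Counting gives the same answers as multiplying the individual probabilities, which is what independence predicts.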
DNA profile probability
How to estimate?
Assuming loci are independent
P(Locus1=value1, Locus2=value2, ...)
= P(Locus1=value1) * P(Locus2=value2) * ...
How to estimate P(Locus1=value1)?
Take a random sample of size N from the population and
find out how many people out of N have value1 at Locus1; that count divided by N is the estimate
Slide22
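The estimate described above (count how many of the N sampled people have a given value at a locus, then multiply the per-locus estimates) might look like the sketch below. It assumes, purely for illustration, that each locus value is stored as a single `int` and that the sample is a 2D array indexed as `sample[person][locus]`; the slides do not specify a representation, and the class name `ProfileProbability` is hypothetical.

```java
// Illustrative sketch, assuming one int per locus and independent loci.
final class ProfileProbability {
    // Estimate P(locus = value) as (people in sample with that value) / N.
    static double estimateLocus(int[][] sample, int locus, int value) {
        int count = 0;
        for (int[] person : sample) {
            if (person[locus] == value) {
                count++;
            }
        }
        return (double) count / sample.length;
    }

    // Multiply the per-locus estimates, as the independence assumption allows.
    static double estimateProfile(int[][] sample, int[] profile) {
        double probability = 1.0;
        for (int locus = 0; locus < profile.length; locus++) {
            probability *= estimateLocus(sample, locus, profile[locus]);
        }
        return probability;
    }

    public static void main(String[] args) {
        int[][] sample = { {1, 7}, {1, 8}, {2, 7}, {1, 7} }; // made-up data: 4 people, 2 loci
        System.out.println(estimateProfile(sample, new int[] {1, 7}));
    }
}
```

With only two loci the product stays large; each additional independent locus multiplies in another fraction, which is why profiles with more loci become rarer.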
Database of DNA profiles
Id      Locus1  Locus2  Locus3  …  Locus13
A5212
A6921
…
Slide23
Problem Formulation
Given
A sample profile (e.g. collected from the crime scene)
A database of known profiles
Find
The probability of the sample profile if it matches a known profile in the database
Slide24
Breaking Down the Problem
Find
The probability of the sample profile if it matches a known profile in the database
What are the subproblems?
Subproblem 1
Find whether the sample profile matches
1a: check entries in the database
1b: check loci in each entry
Subproblem 2
Calculate the probability of the profile
Slide27
Simpler Problem for 1a (very common)
Given
an array of integers (e.g. student IDs)
an integer (e.g. an ID)
Find
whether the integer is in the array
(e.g. whether you can enter your dorm)
int[] directory; // student IDs
int id; // to be found
Slide28
Linear Search
Slide29
Linear/Sequential Search
Check one by one
Stop if you find it
Stop if you run out of items to check
Not found
Slide30
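The linear/sequential search steps can be sketched as follows, reusing the deck's `directory` and `id` names. The class name `LinearSearch` and the choice to return an index (with -1 meaning "not found") are assumptions made for this example.

```java
// Illustrative sketch of linear (sequential) search.
final class LinearSearch {
    // Returns the index of id in directory, or -1 if it is not there.
    static int indexOf(int[] directory, int id) {
        for (int i = 0; i < directory.length; i++) { // check one by one
            if (directory[i] == id) {
                return i;                            // stop if you find it
            }
        }
        return -1;                                   // ran out of items: not found
    }

    public static void main(String[] args) {
        int[] directory = {4021, 1187, 3350};        // made-up student IDs
        System.out.println(indexOf(directory, 3350));
        System.out.println(indexOf(directory, 9999));
    }
}
```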
Number of Checks (speed of algorithm)
Consider N items in the array
Best-case scenario
When does it occur? How many checks?
First item; 1 check
Worst-case scenario
When does it occur? How many checks?
Last item or not there; N checks
Average-case scenario
Average of all cases (item present, each position equally likely)
(1 + 2 + … + N) / N = [N(N+1)/2] / N = (N+1)/2
Slide33
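The average-case formula for linear search can be checked by counting. The sketch below assumes the items are distinct and that every item is searched for equally often; `AverageChecks` and its method names are hypothetical.

```java
// Illustrative sketch: average number of checks made by linear search.
final class AverageChecks {
    // Number of items examined before linear search stops at target.
    static int checksToFind(int[] items, int target) {
        int checks = 0;
        for (int item : items) {
            checks++;
            if (item == target) {
                break;
            }
        }
        return checks;
    }

    // Average over searching for each item in the array exactly once.
    static double averageChecks(int[] items) {
        long total = 0;
        for (int target : items) {
            total += checksToFind(items, target);
        }
        return (double) total / items.length;
    }

    public static void main(String[] args) {
        int[] items = {10, 20, 30, 40, 50};
        System.out.println(averageChecks(items)); // compare with (N + 1) / 2 for N = 5
    }
}
```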
Matching DNA profiles
Each profile has 13 loci
Do we always need to check all 13 loci to decide whether a match occurs?
Slide34
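One way to answer the question about checking all 13 loci: a single mismatching locus already rules out a match, so the comparison can stop early. The sketch below assumes each profile is an `int[]` with one value per locus and that both profiles have the same length; neither detail comes from the slides, and `ProfileMatch` is a hypothetical name.

```java
// Illustrative sketch: compare two profiles locus by locus, stopping at the first mismatch.
final class ProfileMatch {
    static boolean matches(int[] sample, int[] known) {
        for (int locus = 0; locus < sample.length; locus++) {
            if (sample[locus] != known[locus]) {
                return false; // one mismatch is enough; skip the remaining loci
            }
        }
        return true;          // all loci had matching values
    }

    public static void main(String[] args) {
        int[] sample = {12, 9, 15};
        System.out.println(matches(sample, new int[] {12, 9, 15}));
        System.out.println(matches(sample, new int[] {11, 9, 15}));
    }
}
```

All loci are checked only when the profiles match or differ at the very last locus.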
Can we do better? Faster algorithm?
What if the array is sorted (the items are in order)?
E.g. a phone book
Slide35
Binary Search
Slide36
Binary Search
1. Check the item at the midpoint
2. If found, done
3. Otherwise, eliminate half and repeat 1 and 2
Slide37
Breaking down the problem
While there are more items and the item is not found at the midpoint
What are the two subproblems?
Eliminate half of the items
Determine the midpoint
Slide39
Number of checks (Speed of algorithm)
Best-case scenario
When does it occur? How many checks?
In the middle; 1 check
Worst-case scenario
When does it occur? How many checks?
Keep dividing into two halves until a half has only one item
? checks
Slide43
Number of checks (Speed of algorithm)
T(1) = 1
T(N) = T(N/2) + 1
= [ T(N/4) + 1 ] + 1
= [ [ T(N/8) + 1 ] + 1 ] + 1
= … any pattern?
= T(N/2^k) + k
N/2^k gets smaller and eventually becomes 1
Solve for k
Slide52
Number of Checks (Speed of Algorithm)
N/2^k = 1
N = 2^k
k = log_2 N
T(N) = T(N/2^k) + k
= T(1) + log_2 N
= 1 + log_2 N
Slide56
N (Linear search) vs log_2 N + 1 (Binary search)
N (linear search)    log_2 N + 1 (binary search)
100                  7.6
1,000                11.0
10,000               14.3
100,000              17.6
1,000,000            20.9
10,000,000           24.3
100,000,000          27.6
Slide57
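The binary-search column in the comparison above can be recomputed from the formula 1 + log base 2 of N. This is a minimal sketch; `SearchChecks` and `binaryChecks` are hypothetical names, and the base-2 logarithm is obtained by dividing natural logarithms.

```java
// Illustrative sketch: worst-case check counts for linear vs binary search.
final class SearchChecks {
    // 1 + log2(N), the binary-search estimate derived from T(N) = T(N/2) + 1.
    static double binaryChecks(long n) {
        return Math.log(n) / Math.log(2) + 1;
    }

    public static void main(String[] args) {
        for (long n = 100; n <= 100_000_000L; n *= 10) {
            // Linear search needs up to n checks; binary search needs about binaryChecks(n).
            System.out.printf("%d  %.1f%n", n, binaryChecks(n));
        }
    }
}
```

Multiplying N by 10 adds only about 3.3 checks for binary search, while the linear-search count grows tenfold.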
Before using Binary Search
The array needs to be sorted (in order)
Slide58
Sorting
Slide59
Sorting (arranging the items in a desired order)
How is the phone book arranged?
Why?
Why not arranged by numbers?
Order
Alphabetical
Low to high numbers
DNA profile with 13 loci?
Slide61
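One possible answer to the question of how to order DNA profiles (the slides leave it open) is to compare them locus by locus, the way words are compared letter by letter. The sketch below assumes each profile is an `int[]` of equal length with one value per locus; `ProfileOrder` is a hypothetical name.

```java
// Illustrative sketch: order profiles by the first locus at which they differ.
final class ProfileOrder {
    // Negative if a comes before b, positive if after, 0 if all loci are equal.
    static int compare(int[] a, int[] b) {
        for (int locus = 0; locus < a.length; locus++) {
            if (a[locus] != b[locus]) {
                return Integer.compare(a[locus], b[locus]);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compare(new int[] {12, 9, 15}, new int[] {12, 10, 3}));
    }
}
```

Any consistent ordering would do; what binary search needs is that the database is sorted by the same rule used when searching.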
Sorting
Imagine you have a thousand numbers in an array
How would you systematically sort them?
Slide62
Selection Sort (ascending)
Find/select the smallest item
Swap the smallest item with the first item
Find/select the second smallest item
Swap the second smallest item with the second item
…
Slide64
Example
6 7 2 5 1
1 7 2 5 6
1 2 7 5 6
1 2 5 7 6
1 2 5 6 7
Slide73
Breaking down the problem
Get all the items in ascending order
Get one item at the wanted position/index
What are the two subproblems?
Find the smallest item
Swap the smallest item with the item at the wanted position
Slide76
Algorithm Summary (Selection Sort)
For each “desired” position
  Find the smallest item between the “desired” position and the end
  Swap the smallest item with the item at the “desired” position
Slide77
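The selection sort summary can be sketched in Java as below. The class name `SelectionSort` and the variable names are assumptions for this example; the array in `main` is the one used in the deck's example.

```java
import java.util.Arrays;

// Illustrative sketch of selection sort (ascending), sorting the array in place.
final class SelectionSort {
    static void sort(int[] items) {
        // For each "desired" position (the last one is settled automatically).
        for (int desired = 0; desired < items.length - 1; desired++) {
            // Find the smallest item between the desired position and the end.
            int smallest = desired;
            for (int i = desired + 1; i < items.length; i++) {
                if (items[i] < items[smallest]) {
                    smallest = i;
                }
            }
            // Swap the smallest item with the item at the desired position.
            int temp = items[desired];
            items[desired] = items[smallest];
            items[smallest] = temp;
        }
    }

    public static void main(String[] args) {
        int[] items = {6, 7, 2, 5, 1};
        sort(items);
        System.out.println(Arrays.toString(items));
    }
}
```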
Number of comparisons (Speed of Algorithm)
Consider counting
Number of comparisons between array items
Best-case scenario (least # of comparisons)
When does it occur? How many comparisons?
Worst-case scenario (most # of comparisons)
When does it occur? How many comparisons?
Same number of comparisons
For all cases (i.e. best case = worst case)
Slide81
Number of comparisons (Speed of Algorithm)
To find the smallest item
How many comparisons?
N-1
To find the second smallest item
How many comparisons?
N-2
…
Total # of comparisons
(N-1) + (N-2) + … + 1
= N(N-1)/2 = (N^2 - N)/2
Slide86
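The comparison count for selection sort can be confirmed by instrumenting the sort with a counter. This is a sketch with hypothetical names (`SelectionSortCount`, `countComparisons`); it sorts a copy so the caller's array is left untouched.

```java
// Illustrative sketch: count comparisons between array items during selection sort.
final class SelectionSortCount {
    static long countComparisons(int[] input) {
        int[] items = input.clone();
        long comparisons = 0;
        for (int desired = 0; desired < items.length - 1; desired++) {
            int smallest = desired;
            for (int i = desired + 1; i < items.length; i++) {
                comparisons++; // one comparison between two array items
                if (items[i] < items[smallest]) {
                    smallest = i;
                }
            }
            int temp = items[desired];
            items[desired] = items[smallest];
            items[smallest] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Unsorted and already-sorted inputs of the same length give the same count.
        System.out.println(countComparisons(new int[] {6, 7, 2, 5, 1}));
        System.out.println(countComparisons(new int[] {1, 2, 5, 6, 7}));
    }
}
```

The inner loop always runs to the end of the array regardless of the data, which is why the best and worst cases coincide.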
Selection Sort
Not the fastest sorting algorithm
Faster algorithms are covered in more advanced courses.
Slide87
Revisiting Binary Search
Slide88
Binary Search
While there are more items and the item is not found at the midpoint
Eliminate half of the items
Determine the midpoint
Slide89
Eliminate half of the array
How to specify the focus region?
Hint: index/position
Start and end
Slide91
How to determine if the region has items (is not empty) with start and end?
Start <= end
Slide93
How do we adjust start and end?
What are the two different cases?
Item is before the middle item
Start: no change
End: position before the midpoint
Item is after the middle item
Start: position after the midpoint
End: no change
Slide100
How to determine the midpoint with start and end?
(start + end) / 2
Integer division will eliminate the fractional part
Slide102
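The midpoint computation with integer division can be tried out directly. The sketch below shows the deck's formula and, as an aside not covered in the slides, an equivalent form that avoids `int` overflow when start and end are both very large; `Midpoint`, `mid`, and `midSafe` are hypothetical names.

```java
// Illustrative sketch: computing the midpoint of the region [start, end].
final class Midpoint {
    // The formula from the slides; int division drops the fractional part.
    static int mid(int start, int end) {
        return (start + end) / 2;
    }

    // Same result for valid indexes, but start + end is never formed,
    // so it cannot overflow for very large arrays.
    static int midSafe(int start, int end) {
        return start + (end - start) / 2;
    }

    public static void main(String[] args) {
        System.out.println(mid(0, 9));     // 9 / 2 = 4.5, truncated
        System.out.println(midSafe(0, 9));
    }
}
```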
Algorithm Summary
Initialize start, end, and midpoint (I)
While region has items and item is not at the midpoint (C)
  Eliminate half of the items by adjusting start or end (U)
  Update the midpoint (U)
If region has items
  Position is midpoint
else
  Position is -1
Slide103
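The binary search summary translates almost line for line into Java. This is a minimal sketch assuming an `int[]` already sorted in ascending order; the class name `BinarySearch` and the parameter names are made up for illustration.

```java
// Illustrative sketch of binary search over an array sorted in ascending order.
final class BinarySearch {
    // Returns the position of item, or -1 if it is not in the array.
    static int indexOf(int[] sorted, int item) {
        // Initialize start, end, and midpoint.
        int start = 0;
        int end = sorted.length - 1;
        int mid = (start + end) / 2;

        // While the region has items and the item is not at the midpoint.
        while (start <= end && sorted[mid] != item) {
            if (item < sorted[mid]) {
                end = mid - 1;   // item is before the middle item
            } else {
                start = mid + 1; // item is after the middle item
            }
            mid = (start + end) / 2; // update the midpoint
        }

        // If the region still has items, the loop stopped because the item was found.
        return (start <= end) ? mid : -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 5, 6, 7};
        System.out.println(indexOf(sorted, 6));
        System.out.println(indexOf(sorted, 3));
    }
}
```

The `start <= end` test comes first in the loop condition so that `sorted[mid]` is never read when the region is empty.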
Overall Summary
Slide104
Overall Summary
DNA samples from crime scene
Identify people using known DNA profiles
If there is a match
estimate probability of DNA profile
Matching a sample to known DNA profiles
Linear/sequential search [N checks]
Binary search [log_2 N + 1 checks]
Faster but needs sorted data/profiles
Selection Sort [(N^2 - N)/2 comparisons]