Slide1
Base Rate Fallacy
Prof. Ravi Sandhu
Executive Director and Endowed Chair
Lecture 13
ravi.utsa@gmail.com
www.profsandhu.com
© Ravi Sandhu
World-Leading Research with Real-World Impact!
CS 5323
Slide2
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
Slide3
Base-Rate Fallacy
S: Patient is sick (has the disease) / System is under attack
R: Test result is positive / Alarm is raised

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
Slide4
Malware Detection Techniques
Nwokedi Idika and Aditya Mathur, A Survey of Malware Detection Techniques, Purdue University, Feb 2007.

I know what is bad and can detect it
    False positives: none
    False negatives: ever increasing
I know what is good and can detect when you go beyond specification
    False positives: incomplete specification
    False negatives: incorrect specification
I will learn what is good and bad
    False positives: incorrect learning
    False negatives: incorrect learning
Slide5
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
Slide6
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
          P(R|S) = 0.99                P(R|¬S) = 0.01
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
          P(¬R|S) = 0.01               P(¬R|¬S) = 0.99

These probabilities can be estimated empirically.
Slide7
Estimating P(R|S) etc.

                        2000 sick      1000 not sick
Test R is positive      1980           10
Test R is negative      20             990

Estimates:
P(R|S) = 1980/2000 = 0.99
P(¬R|S) = 20/2000 = 0.01
P(R|¬S) = 10/1000 = 0.01
P(¬R|¬S) = 990/1000 = 0.99

Here P(R|S) and P(¬R|¬S) are coincidentally equal.
Slide8
Estimating P(R|S) etc.

                        2000 sick      1000 not sick
Test R is positive      1980           30
Test R is negative      20             970

Estimates:
P(R|S) = 1980/2000 = 0.99
P(¬R|S) = 20/2000 = 0.01
P(R|¬S) = 30/1000 = 0.03
P(¬R|¬S) = 970/1000 = 0.97

In general, P(R|S) and P(¬R|¬S) will not be equal.
Slide9
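The estimates on the two preceding estimation slides are simple ratios of counts within each group. A minimal Python sketch of that step follows; the function name `estimate_rates` and the dictionary keys are illustrative choices, not part of the lecture.

```python
def estimate_rates(sick_pos, sick_neg, healthy_pos, healthy_neg):
    """Estimate the four conditional probabilities from test counts.

    Each rate is conditioned on the true state (sick or not sick),
    so each count is divided by the size of its own group.
    """
    sick_total = sick_pos + sick_neg
    healthy_total = healthy_pos + healthy_neg
    return {
        "true_positive_rate": sick_pos / sick_total,         # P(R|S)
        "false_negative_rate": sick_neg / sick_total,        # P(not R|S)
        "false_positive_rate": healthy_pos / healthy_total,  # P(R|not S)
        "true_negative_rate": healthy_neg / healthy_total,   # P(not R|not S)
    }


# Counts from the first estimation slide (2000 sick, 1000 not sick).
first = estimate_rates(1980, 20, 10, 990)
# Counts from the second estimation slide.
second = estimate_rates(1980, 20, 30, 970)

print(first)
print(second)
```

Each pair of rates that shares a condition sums to 1, which is the "columns must total 1" property of the table.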
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
          P(R|S) = 0.99                P(R|¬S) = 0.03
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
          P(¬R|S) = 0.01               P(¬R|¬S) = 0.97

These probabilities can be estimated empirically.
Columns must total 1.
Rows must total between 0 and 2.
Slide10
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
          P(R|S) = 0.99                P(R|¬S) = 0.01
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
          P(¬R|S) = 0.01               P(¬R|¬S) = 0.99

These probabilities can be estimated empirically.
We will continue with these numbers.
Slide11
Real Interest
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
          P(S|R) = ??                  P(¬S|R) = ??
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
          P(S|¬R) = ??                 P(¬S|¬R) = ??

These probabilities can be computed by Bayes’ theorem if we know P(S).
Columns must total between 0 and 2.
Rows must total 1.
Slide12
Bayes’ Theorem

P(S|R) = (P(S)×P(R|S)) / (P(S)×P(R|S) + P(¬S)×P(R|¬S))
P(¬S|R) = 1 - P(S|R)
P(S|¬R) = (P(S)×P(¬R|S)) / (P(S)×P(¬R|S) + P(¬S)×P(¬R|¬S))
P(¬S|¬R) = 1 - P(S|¬R)
Slide13
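The formulas above follow from the definition of conditional probability together with the law of total probability. A short derivation for P(S|R), using only the events S and R already defined (the case for P(S|¬R) is the same with R replaced by ¬R):

```latex
\begin{align*}
P(S \mid R) &= \frac{P(S \wedge R)}{P(R)} \\
P(S \wedge R) &= P(S)\,P(R \mid S) \\
P(R) &= P(R \wedge S) + P(R \wedge \neg S)
      = P(S)\,P(R \mid S) + P(\neg S)\,P(R \mid \neg S) \\
P(S \mid R) &= \frac{P(S)\,P(R \mid S)}
                    {P(S)\,P(R \mid S) + P(\neg S)\,P(R \mid \neg S)}
\end{align*}
```

The denominator is where the base rate P(S) enters: when P(S) is small, the term P(¬S)×P(R|¬S) can dominate even if P(R|¬S) is small.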
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
          P(R|S) = 0.99                P(R|¬S) = 0.01
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
          P(¬R|S) = 0.01               P(¬R|¬S) = 0.99

These probabilities can be estimated empirically.
We will continue with these numbers.
Slide14
Real Interest
S: Patient is sick (has the disease)
R: Test result is positive

Assume P(S) = 0.0001 (1 in 10,000 has the disease).

          S                            ¬S
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
          P(S|R) = 0.009804            P(¬S|R) = 0.990196
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
          P(S|¬R) = 0.000001           P(¬S|¬R) = 0.999999

These probabilities can be computed by Bayes’ theorem if we know P(S).
Columns must total between 0 and 2.
Rows must total 1.
Slide15
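The posteriors on the preceding slide can be reproduced directly from Bayes' theorem with P(S) = 0.0001, P(R|S) = 0.99 and P(R|¬S) = 0.01. The sketch below is one way to do it; the function name `posteriors` and the result keys are illustrative, not from the lecture.

```python
def posteriors(p_s, p_r_given_s, p_r_given_not_s):
    """Apply Bayes' theorem to get P(S|R) and P(S|not R) and their complements."""
    p_not_s = 1.0 - p_s
    p_not_r_given_s = 1.0 - p_r_given_s
    p_not_r_given_not_s = 1.0 - p_r_given_not_s

    # Denominators come from the law of total probability.
    p_r = p_s * p_r_given_s + p_not_s * p_r_given_not_s
    p_not_r = p_s * p_not_r_given_s + p_not_s * p_not_r_given_not_s

    p_s_given_r = p_s * p_r_given_s / p_r
    p_s_given_not_r = p_s * p_not_r_given_s / p_not_r
    return {
        "P(S|R)": p_s_given_r,
        "P(not S|R)": 1.0 - p_s_given_r,
        "P(S|not R)": p_s_given_not_r,
        "P(not S|not R)": 1.0 - p_s_given_not_r,
    }


result = posteriors(p_s=0.0001, p_r_given_s=0.99, p_r_given_not_s=0.01)
for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

Even with a test that is 99% accurate in both groups, a positive result leaves the probability of disease below 1%, because the prior P(S) is so small.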
False Alarms Predominate!
Assume P(S) = 0.0001 (1 in 10,000 has the disease).

P(S|R)      requires P(R|¬S)
0.01        0.01
0.09        0.001
0.5         0.0001
0.9         0.00001
0.99        0.000001
Slide16
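The table above can be checked by solving Bayes' theorem for P(R|¬S) given a target P(S|R). The sketch below assumes P(R|S) = 0.99 as on the earlier slides (this slide does not restate it), and the slide's values appear to be rounded to one significant figure. The function name `required_false_positive_rate` is an illustrative choice.

```python
def required_false_positive_rate(target_posterior, prior, true_positive_rate=0.99):
    """Return the P(R|not S) at which P(S|R) equals target_posterior.

    Rearranged from
        P(S|R) = P(S)P(R|S) / (P(S)P(R|S) + P(not S)P(R|not S)).
    true_positive_rate = P(R|S) is assumed to be 0.99 by default.
    """
    hit = prior * true_positive_rate  # P(S) * P(R|S)
    return hit * (1.0 - target_posterior) / (target_posterior * (1.0 - prior))


PRIOR = 0.0001  # 1 in 10,000 has the disease
TARGETS = [0.01, 0.09, 0.5, 0.9, 0.99]

required = {t: required_false_positive_rate(t, PRIOR) for t in TARGETS}
for target, fpr in required.items():
    print(f"P(S|R) = {target:<5} requires P(R|not S) ~ {fpr:.2e}")
```

To trust a positive result even half the time at this base rate, the false positive rate has to be roughly as small as the base rate itself.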
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

Total population = 1,000,000
1 in 10,000 has the disease.
R is 99% accurate for both the sick and non-sick populations.

          S (100)                      ¬S (999,900)
R         True positive (R ∧ S)        False positive (R ∧ ¬S)
¬R        False negative (¬R ∧ S)      True negative (¬R ∧ ¬S)
Slide17
Base-Rate Fallacy
S: Patient is sick (has the disease)
R: Test result is positive

Total population = 1,000,000
1 in 10,000 has the disease.
R is 99% accurate for both the sick and non-sick populations.

          S (100)                      ¬S (999,900)
R         True positive: 99            False positive: 9,999
¬R        False negative: 1            True negative: 989,901
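The four counts on this final slide can be reproduced with integer arithmetic, and they give the same P(S|R) as Bayes' theorem. The variable names below are illustrative.

```python
POPULATION = 1_000_000
ONE_IN = 10_000          # 1 in 10,000 has the disease
ACCURACY_PERCENT = 99    # R is 99% accurate for both groups

sick = POPULATION // ONE_IN                 # 100
not_sick = POPULATION - sick                # 999,900

# Integer arithmetic keeps the counts exact.
true_pos = sick * ACCURACY_PERCENT // 100       # sick and test positive
false_neg = sick - true_pos                     # sick but test negative
true_neg = not_sick * ACCURACY_PERCENT // 100   # not sick and test negative
false_pos = not_sick - true_neg                 # not sick but test positive

# Fraction of all positive results that are real cases: P(S|R).
p_sick_given_positive = true_pos / (true_pos + false_pos)

print(true_pos, false_neg, false_pos, true_neg)
print(f"P(S|R) = {p_sick_given_positive:.6f}")
```

Of the 10,098 positive results, only 99 belong to sick patients, so false alarms outnumber true alarms by about 101 to 1.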