Slide1
A P-value Ain’t What You Think It Is
Al M Best, PhD
Professor, Periodontics, School of Dentistry
Professor, Biostatistics, School of Medicine
Slide2
Outline
Idea for the editorial
A history of significance testing
A guide to misinterpretation
Using a dental example
My practice as a collaborator
Best AM, Greenberg BL, Glick M. From tea tasting to t test: A P value ain’t what you think it is. Journal of the American Dental Association. 2016 Jul;147(7):527-9. PMID: 27350642.
Slide3
7-Mar-2017 retractionwatch.com blog
http://retractionwatch.com/2016/03/07/were-using-a-common-statistical-test-all-wrong-statisticians-want-to-fix-that/
Slide4
TAS
http://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108
Slide5
Metrics
amstat.tandfonline.com/doi/citedby/10.1080/00031305.2016.1154108
Slide6
Supplemental Material
Greenland, S, Senn, SJ, Rothman, KJ, Carlin, JB, Poole, C, Goodman, SN and Altman, DG: “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations”
Altman, Naomi: Ideas from multiple testing of high dimensional data provide insights about reproducibility and false discovery rates of hypothesis supported by p-values
Benjamin, Daniel J, and Berger, James O: A simple alternative to p-values
Benjamini, Yoav: It’s not the p-values’ fault
Berry, Donald A: P-values are not what they’re cracked up to be
Carlin, John B: Comment: Is reform possible without a paradigm shift?
Cobb, George: ASA statement on p-values: Two consequences we can hope for
Gelman, Andrew: The problems with p-values are not just with p-values
Goodman, Steven N: The next questions: Who, what, when, where, and why?
Greenland, Sander: The ASA guidelines and null bias in current teaching and practice
Ioannidis, John PA: Fit-for-purpose inferential methods: abandoning/changing P-values versus abandoning/changing research
Johnson, Valen E: Comments on the “ASA Statement on Statistical Significance and P-values” and marginally significant p-values
Lavine, Michael, and Horowitz, Joseph: Comment
Lew, Michael J: Three inferential questions, two types of P-value
Little, Roderick J: Discussion
Mayo, Deborah G: Don’t throw out the error control baby with the bad statistics bathwater
Millar, Michele: ASA statement on p-values: some implications for education
Rothman, Kenneth J: Disengaging from statistical significance
Senn, Stephen: Are P-Values the Problem?
Stangl, Dalene: Comment
Stark, PB: The value of p-values
Ziliak, Stephen T: The significance of the ASA statement on statistical significance and p-values
Slide7
Supplemental Material
Greenland, S, Senn, SJ, Rothman, KJ, Carlin, JB, Poole, C, Goodman, SN and Altman, DG: “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations” Eur J Epidemiol. 2016 Apr;31(4):337-50.
Slide8
The Lady Tasting Tea
Classical example
Salsburg D. The Lady Tasting Tea. New York, NY: WH Freeman and Co; 2001.
Fisher RA. Statistical Methods and Scientific Inference. 3rd ed. New York, NY: Hafner Press; 1973.
Slide9
Coke vs Pepsi
Say I poured, hidden from you, two soft-drink cups. One with Coke and one with Pepsi. Then I ask you: “Which is Coke? And which is Pepsi?”
What are the possible outcomes?
From: Maita Levine and Raymond H. Rolwing (1993). Teaching Statistics, 15, 4-5.
Slide10
Likelihood of outcomes
Look at the exact distribution of the number correct. Calculate the probability of each result.
Would this experiment be convincing?
Slide11
Coke vs Pepsi: 4 cups
Assuming an equal number of Cokes and Pepsis, the next larger experiment would be 4 cups.
What are the possible outcomes?
Slide12
Likelihood of Outcomes
With each outcome equally likely, we calculate the p-values for all the possibilities:
Would this experiment be convincing?
So if someone got all 4 right, we would be able to conclude that this person could “… tell the difference between Coke and Pepsi, p-value = .1667.” Would this be convincing?
Slide13
Fisher’s tea lady used 8 cups
All the possible outcomes
Slide14
Likelihood of Outcomes
We calculate the p-values
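The exact distribution behind these p-values can be sketched in Python. This is a minimal sketch, assuming the taster knows the cups are split evenly between the two drinks and simply guesses which half is Coke, so that the number of correct Coke calls follows a hypergeometric distribution. The function names `exact_distribution` and `p_value` are illustrative and do not come from the slides.

```python
from fractions import Fraction
from math import comb


def exact_distribution(cups_per_drink):
    """Null distribution of the number of Coke cups correctly identified.

    Assumes 2 * cups_per_drink cups, half Coke and half Pepsi, and a taster
    who picks exactly cups_per_drink cups as "Coke" purely by guessing.
    """
    n = cups_per_drink
    total = comb(2 * n, n)  # number of equally likely ways to pick the "Coke" cups
    return {k: Fraction(comb(n, k) * comb(n, n - k), total) for k in range(n + 1)}


def p_value(cups_per_drink, correct):
    """One-sided p-value: probability of at least `correct` right under guessing."""
    dist = exact_distribution(cups_per_drink)
    return sum(prob for k, prob in dist.items() if k >= correct)


# Two-, four-, and eight-cup experiments: probability of getting every cup right
for n in (1, 2, 4):
    p_all = p_value(n, n)
    print(f"{2 * n} cups: P(all correct | guessing) = {p_all} = {float(p_all):.4f}")
```

Run as written, the loop reports 1/2 for 2 cups, 1/6 (.1667) for 4 cups, and 1/70 (.0143) for 8 cups, which matches the p-values quoted on these slides.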
If someone got all 8 right, we could conclude that this person could “… tell the difference between Coke and Pepsi, p-value = .0143.” Would this be convincing?
Slide15
Inference?
“Statistical analysis of medical studies is based on the key idea that we make observations on a sample of subjects and then draw inferences about the population of all such subjects from which the sample is drawn.”
Altman D, Machin D, Bryant T, Gardner M (Eds.) (2013) Statistics with confidence: confidence intervals and statistical guidelines. John Wiley & Sons. ISBN 0-7279-1375-1. Page 3.
Gardner MJ, Altman DG. (1988) Estimating with confidence. Br Med J. 30;296(6631):1210-1. PMID: 3133015; PubMed Central PMCID: PMC2545695.
Slide16
Jerzy Neyman & Egon Pearson
Viewed Fisher’s work as mathematically fuzzy and heuristic
Instead of focusing on what a scientist thinks about the evidence, an experiment should tell the scientist what to do.
Out of this came Ha, type-I and type-II error rates, power
Slide17
Greenland’s “Guide to Misinterpretations”
Lapidus et al. “Effect of premedication to provide analgesia as a supplement to inferior alveolar nerve block in patients with irreversible pulpitis.” JADA 2016 147(6):427-37.
CONCLUSIONS: There is moderate evidence to support the use of oral NSAIDs, in particular ibuprofen, 1 hour before the administration of IANB local anesthetic to provide additional analgesia to the patient.
Greenland et al. “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations” Eur J Epidemiol. 2016 Apr;31(4):337-50.
Slide18
Severely infected irreversible pulpitis
Slide19
Tom Hanks (2000) A FedEx executive must transform himself physically and emotionally to survive a crash landing on a deserted island
Slide20
Slide21
Ibuprofen versus placebo, frequency of participants in each group having “little or no pain during endodontic treatment.”
“The probability of … is .020.”
Slide22
Benzodiazepine versus placebo, frequency of participants in each group having “little or no pain during endodontic treatment.”
“The probability of … is .954.”
Slide23
True or False?
The p-value is the probability that the null hypothesis is true.
For example, the test of the ibuprofen null hypothesis gave P = 0.02, so the null hypothesis has only a 2% chance of being true.
Greenland et al. “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations” Eur J Epidemiol. 2016 Apr;31(4):337-50.
Slide24
The p-value is the probability that the null hypothesis is true.
No!
The p-value simply indicates the degree to which the data conform to the pattern predicted by the null hypothesis and all the other assumptions used in the test (the underlying statistical model).
Slide25
Backwards
The absurdity of the common backwards interpretation might be appreciated by pondering how the p-value, which is a probability deduced from a set of assumptions, can possibly refer to the probability of those assumptions.
Slide26
True or False?
The p-value is the probability that chance alone produced the observed association.
For example, the p-value for the ibuprofen null hypothesis is 0.02. And so there is a 2% probability that chance alone produced the association.
Slide27
The p-value for the null hypothesis is the probability that chance alone produced the observed association.
No! To say this is to assert that every assumption used to compute the p-value is correct, including the null hypothesis.
Slide28
Greenland et al.’s Guide
14 misinterpretations of a single study’s p-value(s)
4 misinterpretations of p-values across studies or in subgroups
5 misinterpretations of confidence intervals
2 misinterpretations of power
Slide29
p < .05 means … ?
Ho is false, should be rejected
Ha is true
Scientifically important effect detected
Substantially important relationship demonstrated
Chance of false positive finding is 5%
p < .05 does NOT mean
Slide30
p > .05 means … ?
Ho is true, should be accepted
Ha is false
Evidence in favor of Ho
There is no effect
The effect size is small
p > .05 does NOT mean
Slide31
Greenland et al.’s Conclusions included:
The probability, likelihood, certainty, etc. for a hypothesis cannot be derived from statistical methods alone.
Significance tests and confidence intervals do not by themselves provide a logically sound basis for concluding an effect is present or absent with a given probability.
Slide32
Not even scientists can easily explain p-values
You can get it right, or you can make it intuitive, but it’s all but impossible to do both.
Slide33
ASA: Conclusion
Good statistical practice, as an essential component of good scientific practice, emphasizes: principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean.
No single index should substitute for scientific reasoning.
Slide34
ASA: Conclusion
Good statistical practice, as an essential component of good scientific practice, emphasizes: principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean.
No single index should substitute for scientific reasoning.
Slide35
Study Design and Conduct
PICO-T
Bias, Confounding, Contamination
And, eventually, chance
Slide36
Publication Bias
1. Bias of rhetoric
2. All’s well literature bias
3. Reference bias
4. Positive results bias
5. Hot stuff bias
6. Pre-publication bias
7. Post-publication bias
8. Sponsorship bias
9. Meta-analysis bias
Selection Bias (susceptibility bias)
1. Popularity bias
2. Centripetal bias
3. Referral filter bias
4. Diagnostic access bias
5. Diagnostic suspicion bias
6. Unmasking bias
7. Mimicry bias
8. Previous opinion bias
9. Wrong sample size bias
10. Admission rate bias (Berkson)
11. Prevalence-incidence bias (Neyman)
12. Diagnostic vogue bias
13. Diagnostic purity bias
14. Procedure selection bias
15. Missing clinical data bias
16. Non-contemporaneous control bias
17. Starting time bias
18. Unacceptable disease bias
19. Migrator bias
20. Membership bias
21. Nonrespondent bias
22. Volunteer bias
23. Allocation bias
24. Vulnerability bias
25. Authorization bias
Exposure Bias (performance bias)
1. Contamination bias
2. Withdrawal bias
3. Compliance bias
4. Therapeutic personality bias
5. Bogus control bias
6. Misclassification bias
7. Proficiency bias
Detection Bias (measurement bias)
1. Insensitive measure bias
2. Underlying cause bias (rumination bias)
3. End-digit preference bias
4. Apprehension bias
5. Unacceptability bias
6. Obsequiousness bias
7. Expectation bias
8. Substitution game bias
9. Family information bias
10. Exposure suspicion bias
11. Recall bias
12. Attention bias
13. Instrument bias
14. Surveillance bias
15. Comorbidity bias
16. Nonspecification bias
17. Verification bias (work-up bias)
Analysis Bias (Transfer Bias)
1. Post-hoc significance bias
2. Data dredging bias
3. Scale degradation bias
4. Tidying-up bias (deliberate elimination bias)
5. Repeated peeks bias
Interpretation Bias
1. Mistaken identity bias
2. Cognitive dissonance bias
3. Magnitude bias
4. Significance bias
5. Correlation bias
6. Under-exhaustion bias
The Dunning-Kruger effect
Hartman JM, Forsen JW Jr, Wallace MS, Neely JG. “Tutorials in clinical research: part IV: recognizing and controlling bias.” Laryngoscope. 2002 Jan;112(1):23-31.
Expanded from: Sackett DL. “Bias in analytic research.” J Chronic Dis. 1979;32(1-2):51-63.
Slide37
Cognitive Bias Codex
Slide38
ASA: Conclusion
Good statistical practice, as an essential component of good scientific practice, emphasizes: principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean.
No single index should substitute for scientific reasoning.
Slide39
Context
David Moore:
“Data are numbers, but they are not ‘just numbers.’ They are numbers with a context.”
Moore and Notz 2006, Statistics: Concepts and Controversies, NY: Freeman, p xxi
Slide40
Context
Tonight we’re going to let the statistics speak for themselves
Ed Koren, © The New Yorker, 9 December 1974
Slide41
ASA: Conclusion
Good statistical practice, as an essential component of good scientific practice, emphasizes: principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean.
No single index should substitute for scientific reasoning.
Slide42
Words Matter
CONSORT 2010
How to Report Statistics in Medicine
AMA Manual of Style
Moore and Notz 2006, Statistics: Concepts and Controversies, NY: Freeman, p xxi
Slide43
CONsolidated Standards of Reporting Trials
The CONSORT Statement comprises a 25-item checklist and a flow diagram. The checklist items focus on reporting how the trial was designed, analysed, and interpreted; the flow diagram displays the progress of all participants through the trial. The CONSORT “Explanation and Elaboration” document explains and illustrates the principles underlying the CONSORT Statement.
www.consort-statement.org
Slide44
Specialized CONSORT
Harms (safety)
Non-inferiority
Cluster randomized trials
Herbal, Acupuncture
Non-pharmacologic agents
Pragmatic trials
Patient-reported outcomes
N-of-1 trials
Orthodontic trials
Pilot and feasibility trials
Slide45
Enhancing the QUAlity and Transparency of health Research
STROBE – Observational studies
PRISMA – Systematic reviews
CARE – Case reports
SRQR – Qualitative research
STARD – Diagnostic/prognostic studies
SQUIRE – Quality improvement studies
… a total of 358 reporting guidelines
http://www.equator-network.org/
Slide46
Dedication
Lang: To anyone who has encountered the frustration of what I call “Statistical Buddhism”
To those who know, no explanation is necessary.
To those who do not know, no explanation is possible.
Slide47
Everitt BS. The Cambridge Dictionary of Statistics in the Medical Sciences. Cambridge, England: Cambridge University Press; 1995.
Glossary
P value: probability of obtaining the observed data (or data that are more extreme) if the null hypothesis were exactly true.
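In symbols, this glossary definition can be sketched as follows. The notation is an assumption introduced here, not part of the glossary: T is a test statistic for which larger values count as "more extreme", t_obs is its observed value, and H_0 is the null hypothesis. The second line applies it to the 8-cup taste test, where only one of the equally likely ways of choosing 4 cups out of 8 gets every cup right.

```latex
p = \Pr\left(T \ge t_{\mathrm{obs}} \mid H_0\right)

p_{\text{8 cups, all correct}} = \frac{1}{\binom{8}{4}} = \frac{1}{70} \approx .0143
```

The probability is conditional on the null hypothesis and is a statement about the data, which is why it cannot be read as the probability of the null hypothesis itself.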
www.amamanualofstyle.com
Slide48
Al’s Conclusion
Good statistical practice is an essential component of good scientific practice.
Data are information in context. Insist on a full and complete description of the context of a study.
A p-value is calculated from a set of numbers encased in certain assumptions. Viewed alone, the p-value may be meaningless.
No single index can substitute for scientific reasoning.
Slide49
Thank you
Slide50
ASA: Six Principles
P-values can indicate how incompatible the data are with a specified statistical model.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Proper inference requires full reporting and transparency.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
Slide51
George Cobb, Looking Ahead: Five Imperatives
George Cobb (2015) Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up, The American Statistician, 69:4, 266-282, DOI: 10.1080/00031305.2015.1093029
Flatten prerequisites
Calc I → Calc II → Calc III → Probability → Math Stat → Biostatistics
Strip away technical formalism and formulas
Embrace computation
Exploit context
Interpretation, motivation, direction
Teach through research