The year 2013 marks the 250th anniversary of Bayes rule, one of the two fundamental inferential principles of mathematical statistics. The rule has been influential over the entire period, and controversial over most of it. Its reliance on prior beliefs …
"BULLETIN New Series OF THE AMERICAN MATH..."
BRADLEY EFRON

Figure 1. The greater world of mathematics and science.

Near the end of the talk I will give some hints of an emerging Bayesian-frequentist alliance, designed to deal with the enormous and complicated data sets modern scientific technology is producing. First though, I begin with a thumbnail sketch of Bayesian history and practice.

2. The Physicist's Twins

A physicist friend of mine and her husband found out, thanks to the miracle of sonograms, that she was going to have twin boys. One afternoon at the student union she suddenly asked me, "What are the chances my twins will be Identical rather than Fraternal?" As an experienced statistical consultant I stalled for time, and asked if the doctor had told her anything else. "Yes, he said that one-third of twin births are Identical and two-thirds are Fraternal."

Bayes would have died in vain if I didn't use his rule to answer the question. We need to combine two pieces of partially contradictory evidence. Past experience favors Fraternal according to the doctor, the prior odds ratio being

$$\frac{\Pr\{\text{identical}\}}{\Pr\{\text{fraternal}\}} = \frac{1/3}{2/3} = \frac{1}{2} \quad \text{(prior experience)}.$$

Current evidence observed from the sonogram, however, favors Identical: identical twins are always the same sex while fraternals are equally likely to be the same or different sexes. In statistics terminology, the "likelihood ratio" of the current evidence is two-to-one in favor of Identical,

$$\frac{\Pr\{\text{same sex} \mid \text{identical}\}}{\Pr\{\text{same sex} \mid \text{fraternal}\}} = \frac{1}{1/2} = 2 \quad \text{(current evidence)}.$$

(The gender, "boys" in this case, doesn't affect the calculation.)

A 250-YEAR ARGUMENT

This didn't end the desire to use Bayes rule in situations without genuine prior experience. Harold Jeffreys, arguably the world's leading geophysicist at the time, devised an improved principle that was invariant under transformations. (More on this a little later.) For the twins problem, his rule would take the prior density for $p$, the population proportion of Identical twins, not to be flat but rather U-shaped, going up sharply near zero and one,

$$\pi(p) = c\,p^{-1/2}(1-p)^{-1/2}.$$

"Objective Bayes" is the contemporary name for Bayesian analysis carried out in the Laplace-Jeffreys manner.
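The odds calculation for the twins can be sketched in a few lines. This is an illustrative sketch rather than anything from the talk itself: the function names `posterior_probability` and `jeffreys_density` are hypothetical, the posterior odds are taken to be the prior odds times the likelihood ratio (the odds form of Bayes rule), and the constant $c = 1/\pi$ is assumed because it is the value that makes the U-shaped density integrate to one.

```python
from fractions import Fraction
import math


def posterior_probability(prior_odds, likelihood_ratio):
    """Odds form of Bayes rule: posterior odds = prior odds * likelihood ratio."""
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability: odds / (1 + odds).
    return posterior_odds / (1 + posterior_odds)


def jeffreys_density(p):
    """U-shaped prior c * p^(-1/2) * (1 - p)^(-1/2), assuming c = 1/pi."""
    return 1.0 / (math.pi * math.sqrt(p * (1.0 - p)))


# Doctor's prior: one-third Identical, two-thirds Fraternal.
prior_odds = Fraction(1, 3) / Fraction(2, 3)
# Same-sex twins: certain if Identical, probability 1/2 if Fraternal.
likelihood_ratio = Fraction(1) / Fraction(1, 2)

prob_identical = posterior_probability(prior_odds, likelihood_ratio)
print(prob_identical)

# The Jeffreys density rises sharply toward the ends of (0, 1).
print(jeffreys_density(0.01), jeffreys_density(0.5))
```

With the doctor's numbers the two pieces of evidence cancel exactly: prior odds of one-half times a likelihood ratio of two give even posterior odds.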
Figure 3. Three possible prior densities for $p$, the population proportion Identical, and their predictions for the physicist's twins.

Figure 3 graphs three different prior distributions for $p$: the doctor's delta function at $p = 1/3$, Laplace's flat prior, and Jeffreys' U-shaped prior density. Of course different prior distributions produce different results. My answer to the physicist, that she had a 50% chance of Identical twins, changes to 58.6% with Jeffreys prior, and to a whopping 61.4% with a flat Laplace prior.

As I said earlier, there has recently been a strong Bayesian revival in scientific applications. I edit an applied statistics journal. Perhaps one quarter of the papers employ Bayes theorem, and most of these do not begin with genuine prior information. Jeffreys priors, and their many modern variations, are the rule rather than the exception. They represent an aggressive approach to mathematical modeling and statistical inference.

A large majority of working statisticians do not fully accept Jeffreys Bayes procedures. This brings us to a more defensive approach to inference, frequentism, the currently dominant statistical philosophy.

4. Frequentism

Frequentism begins with three of the same four ingredients as Bayes theory: an unknown parameter, or state of nature, $\theta$; some observed data $x$; and a probability model $f_\theta(x)$. What is missing is $\pi(\theta)$, the prior beliefs. In place of $\pi(\theta)$, attention focuses on some statistical procedure that the statistician intends to use on the problem at hand. Here I will call it $t(x)$, perhaps an estimate or a confidence interval, or a test statistic or a prediction rule.

Inference is based on the behavior of $t(x)$ in repeated long-term use. For example, a prediction rule might be shown to be correct at least 90% of the time, no matter what the true $\theta$ happens to be. In this framework, optimality, finding the procedure $t(x)$ that has the best long-term behavior, becomes the central mathematical task. One might, for instance, look for the prediction rule with the smallest possible error rates. (Bayes theory has no need for optimality calculations since, within its own framework, the rule automatically provides ideal answers.) Optimality theory is where mathematics has played its greatest role in statistics.

The frequentist bandwagon really got rolling in the early 1900s. Ronald Fisher developed the maximum likelihood theory of optimal estimation, showing the best possible behavior for an estimate; and Jerzy Neyman did the same for confidence intervals and tests. Fisher's and Neyman's procedures were an almost perfect fit to the scientific needs and the computational limits of twentieth century science, casting Bayesianism into a shadow existence.

Figure 4. Scores of 22 students on two tests, "mechanics" and "vectors"; sample correlation coefficient is 0.498??

Figure 4 shows a small data set of the type familiar to Fisher and Neyman. Twenty-two students have each taken two tests, called "mechanics" and "vectors." Each of the points represents the two scores for one student, ranging from the winners at the upper right to the losers at lower left. We calculate the sample correlation coefficient between the two tests, which turns out to equal 0.498, and wonder how accurate this is.

What I previously called the data $x$ is now all 22 points, while the statistic, or "method" $t(x)$ is the sample correlation coefficient. If the points had fallen exactly along a straight line with positive slope the sample correlation coefficient would be 1.00, in which case the mechanics score would be a perfect predictor for the vectors score, and vice versa (and they wouldn't have had to give two tests). The actual observed value, 0.498, suggests a moderate but not overwhelming predictive relationship. Twenty-two points is not a lot, and we might worry that the correlation would be much different if we tested more students.

A little bit of notation: $n = 22$ is the number of students, $y_i$ is the data for the $i$th student, that is, his or her two scores, and the full data set $y$ is the collection of all 22 $y_i$'s. The parameter of interest, the unknown state of nature $\theta$, is the true correlation: the correlation we would see if we had a much larger sample than 22, maybe even all possible students.

Now I've called the sample correlation coefficient 0.498 "$\hat\theta$." This is Fisher's notation, indicating that the statistic $\hat\theta$ is striving to estimate the true correlation $\theta$. The "??" after 0.498 says that we'd like some idea of how well $\hat\theta$ is likely to perform.

R. A. Fisher's first paper in 1915 derived the probability distribution for the correlation problem: what I previously called $f_\theta(x)$, now $f_\theta(\hat\theta)$ with $\hat\theta$ taking the place of $x$. (It is a rather complicated hypergeometric function.) Much more importantly, between 1920 and 1935 he developed the theory of maximum likelihood estimation, and the optimality of the MLE. Speaking loosely, maximum likelihood is the best possible frequentist estimation method, in the sense that it minimizes the expected squared difference between $\hat\theta$ and the unknown $\theta$, no matter what $\theta$ may be.

Fisher's 1915 calculations were carried out in the context of a bivariate normal distribution, that is, for a two-dimensional version of a bell-shaped curve, which I'll discuss a little later.

Despite pursuing quite similar scientific goals, the two founders of mathematical statistics, Fisher and Neyman, became bitter rivals during the 1930s, with not a good word to say for each other. Nevertheless, Neyman essentially completed Fisher's program by developing optimal frequentist methods for testing and confidence intervals.

Neyman's 90% confidence interval for the student correlation coefficient is perhaps shockingly wide. It says that $\theta$ lies in

$$[0.164, 0.717],$$

with a 5% chance of missing on either end. Again speaking roughly, Neyman's interval is as short as possible in the absence of prior information concerning $\theta$. The point estimate, $\hat\theta = 0.498$, looked precise, but the interval estimate shows how little information there actually was in our sample of 22 students.

Figure 5 is a picture of the Neyman construction, as physicists like to call it. The black curve is Fisher's probability distribution for $\hat\theta$ if the parameter $\theta$ equaled 0.164, the lower endpoint of the 90% confidence interval. Here 0.164 was chosen to put exactly 5% of the probability above the observed value $\hat\theta = 0.498$. Similarly …

Figure 6. Jeffreys Bayes posterior density $\pi(\theta \mid x)$ for the 22 students; 90% credible limits = [0.164, 0.718]; Neyman limits [0.164, 0.717].

If we always had such nice agreement, peace would break out in the land of statistics. There is something special, however, about the correlation problem, which I'll get to soon.

Table 1. More Students

n      $\hat\theta$
22     .498
44     .663
66     .621
88     .553
∞      [.415, .662]
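As a rough check on the intervals quoted for the student correlation, one can use the Fisher z-transformation, a common large-sample shortcut. This is a swapped-in approximation and not Neyman's exact construction, so its endpoints should only land near the quoted limits; the function name `fisher_z_interval` is hypothetical, and the sketch assumes bivariate normal data.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, coverage=0.90):
    """Approximate confidence interval for a correlation via Fisher's z.

    Assumes bivariate normal data, so that atanh(r) is roughly normal
    with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    # Two-sided interval: (1 - coverage) / 2 probability in each tail.
    q = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Sample correlations for n = 22 and n = 88, as reported in Table 1.
lo22, hi22 = fisher_z_interval(0.498, 22)
lo88, hi88 = fisher_z_interval(0.553, 88)
print(f"n=22: [{lo22:.3f}, {hi22:.3f}]")
print(f"n=88: [{lo88:.3f}, {hi88:.3f}]")
```

The approximate limits need not match the exact ones digit for digit, but they show the same qualitative point: with only 22 students the interval is very wide, and it narrows considerably with 88.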
I actually selected our 22 students randomly from a bigger data set of 88. Table 1 shows the sample correlation coefficient $\hat\theta$ as the sample size increased: at $n = 44$ the estimate jumped up from 0.498 to 0.663, coming down a bit to 0.621 at $n = 66$ and ending at 0.553 for all 88 students. The infinity row represents the unknown future, framed by the 90% Neyman interval based on all 88 students,

$$\theta \in [0.415, 0.662],$$

now a good deal more precise than the interval [0.164, 0.717] based on only the original 22.

Statisticians usually do not have the luxury of peering into the future. Frequentism and Bayesianism are competing philosophies for extrapolating from what we can see to what the future might hold. That is what the 250-year argument is really about.

5. Nuisance Parameters

Figure 7. Galton's 1886 distribution of child's height vs parents'; ellipses are contours of best-fit bivariate normal density; red dot at bivariate average (68.3, 68.1).

Figure 7 represents the very first bivariate normal distribution, dating from 1886. It is due to Francis Galton, eccentric Victorian genius, early developer of fingerprint analysis and scientific weather prediction, and best-selling author of adventure travel books. Each of the 928 points shows an adult child's height along the y-axis and the parent's average height along the x-axis. The big red dot is at the two grand averages, about 68 inches each way in 1886. Somehow Galton realized that the points were distributed according to a two-dimensional correlated version of the bell-shaped curve. He was no mathematician but he had friends who were, and they developed the formula for the bivariate normal density, which I'll discuss next. The ellipses show curves of equal density from the formula.

Galton was some kind of statistical savant. Besides the bivariate normal distribution, he used this picture to develop correlation (called by him originally "co-relation") and regression theory (called by him "regression to the mean": extremely tall parents have less extremely tall children, and conversely for shortness).

Galton's formula for the probability density function of a bivariate normal random vector $y = (y_1, y_2)'$ is

$$f_{\mu,\Sigma}(y) = \frac{1}{2\pi}\,|\Sigma|^{-1/2}\, e^{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)}.$$

Here $\mu = (\mu_1, \mu_2)'$ is the two-dimensional mean vector while $\Sigma$ is the 2-by-2 positive definite covariance matrix. (It describes the variabilities of $y_1$ and $y_2$ as well as their correlation.) Standard notation for the distribution is

$$y \sim N_2(\mu, \Sigma),$$

read "$y$ is bivariate normal with mean $\mu$ and covariance $\Sigma$."

A perspective picture of the density function would show an esthetically pleasing bell-shaped mountain. In Figure 7 I chose $\mu$ to match the red dot at the center, and $\Sigma$ to give the best-matching ellipses to the point cloud; in other words, I used the maximum likelihood estimates of $\mu$ and $\Sigma$. The main thing to note here is that a bivariate normal distribution has five free parameters, two for the mean vector $\mu$ and three for the symmetric matrix $\Sigma$, and that all five will be unknown in typical applications.

For reasons having to do with relationships among the five parameters, the correlation problem turns out to be misleadingly easy. Here is a more difficult, and more typical, problem: suppose we are interested in the eigenratio, the ratio of the largest eigenvalue of the matrix $\Sigma$ to the sum of the two eigenvalues,

$$\theta = \frac{\lambda_1}{\lambda_1 + \lambda_2} \quad (\lambda_1 \geq \lambda_2 \text{ eigenvalues of } \Sigma).$$

The MLE of $\Sigma$, $\hat\Sigma$, obtained from the 22 data points gives the maximum likelihood estimate

$$\hat\theta = 0.793\,??$$

where the question marks indicate that we want some assessment of how accurate $\hat\theta$ is for estimating the true value $\theta$.

What is not true for the eigenratio is that the distribution of the quantity $\hat\theta$ we're interested in depends only on $\theta$. This was true for the correlation, and effectively reduced all the calculations to one dimension. No matter how we try to reparameterize the five-parameter bivariate normal distribution, there will still be four nuisance parameters involved, in addition to the eigenratio, and they don't conveniently go away. Somehow they have to be taken into account before one can answer the ?? question.

Bayesian inference has a simple way of dealing with nuisance parameters: they are integrated out of the five-dimensional posterior distribution. However "simple" isn't necessarily "correct," and this can be a major point of practical disagreement between frequentist and Bayesian statisticians.

The heavy curve in Figure 8 is the Bayes posterior density for the eigenratio, starting from Jeffreys' five-dimensional uninformative prior and integrating out the four nuisance parameters. Dashed lines indicate the 90% Bayes posterior limits for the true eigenratio given the data for the 22 students. The red triangles are frequentist 90% limits, obtained from a bootstrap calculation I will describe next. There is notable disagreement: the frequentist limits are shifted sharply downwards. Jeffreys' prior, in fact, does not give frequentistically accurate confidence limits in this case, or in a majority of problems afflicted with nuisance parameters. Other, better, uninformative priors have been put forward, but for the kind of massive data analysis problems I'll discuss last, most Bayesians do not feel compelled to guarantee good frequentist performance.

6. The Bootstrap and Gibbs Sampling

I began this talk from a point in 1763, and so far have barely progressed past 1950. Since that time modern scientific technology has changed the scope of the problems statisticians deal with, and how they solve them. As I'll show last, data …

Figure 9. 10,000 bootstrap eigenratio values from the student score data (bivariate normal model); dashed line shows confidence weights.

… 10,000 $\hat\theta$'s, so that smaller values count more. (The dashed curve is the reweighting function.) The bootstrap confidence limits are the 5th and 95th percentiles of the reweighted $\hat\theta$'s.

The Bayesian world has also been automated. "Gibbs sampling" is a Markov Chain random walk procedure, named after Gibbs distribution in statistical physics. Given the prior and the data, Markov chain Monte Carlo (MCMC) produces samples from an otherwise mathematically intractable posterior distribution $\pi(\theta \mid x)$. (The history of the idea has something to do with Los Alamos and the A-bomb.) MCMC theory is perfectly general, but in practice it favors the use of convenient uninformative priors of the Jeffreys style, which has a lot to do with their dominance in current Bayesian applications.

7. Empirical Bayes

I wanted to end with a big-data example, more typical of what statisticians are seeing these days. The data is from a prostate cancer study involving 102 men, 52 with prostate cancer and 50 healthy controls. Each man was measured on a panel of 6033 genes (using microarrays, the archetype of modern scientific high-throughput devices). For each gene, a statistic $x_i$ was calculated, the difference in means between the cancer patients and the healthy controls, which, suitably normalized, should be distributed according to a bell-shaped curve. For gene $i$, the curve would be centered at $\delta_i$, the "true effect size,"

$$x_i \sim N(\delta_i, 1).$$

We can't directly observe $\delta_i$, only its estimate $x_i$.

Presumably, if gene $i$ doesn't have anything to do with prostate cancer, then $\delta_i$ will be near zero. Of course, the investigators were hoping to spot genes with big effect sizes, either positive or negative, as a clue to the genetic basis of prostate cancer.

Figure 10. Prostate cancer study: difference estimates $x_i$ comparing cancer patients with healthy controls, 6033 genes. Dashes indicate the 10 largest estimates.

The histogram in Figure 10 shows the 6033 effect size estimates $x_i$. The light dashed curve indicates what we would see if none of the genes had anything to do with prostate cancer, that is, if all the effect sizes were zero. Fortunately for the investigators, that doesn't seem to be the case. A better fit to the histogram, called $\hat m(x)$, shows the heavier tails of the histogram, presumably reflecting genes with substantially big effect sizes.

Looking just at the right side, I've marked with little red dashes the 10 largest $x_i$'s. These seem way too big to have zero effect size. In particular, the largest one of all, from gene 610, has $x_i = 5.29$, almost impossibly big if $\delta_{610}$ really equalled zero.

But we have to be careful here. With 6033 genes to consider at once, the largest observed values will almost certainly overstate their corresponding effect sizes. (Another example of Galton's regression to the mean effect.) Gene 610 has won a bigness contest with 6033 competitors. It's won for two reasons: it has a genuinely large effect size, and it's been lucky (the random noise in $x_i$ has been positive rather than negative), or else it probably would not have won! The question is how to compensate for the competition effects and get honest estimates for the contest winners.

There's an intriguing Bayesian analysis for this situation. Considering any one gene, suppose its effect size $\delta$ has some prior density $\pi(\delta)$. We don't get to see $\delta$, but rather $x$, which is $\delta$ plus some normal noise. If we know $\pi(\delta)$ we can use Bayes theorem to optimally estimate $\delta$. By definition, the marginal density of $x$ is its density taking account of the prior randomness in $\delta$ and the normal noise,

$$m(x) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\delta)^2}\, \pi(\delta)\, d\delta.$$

Tweedie's formula is a neat expression for the Bayes posterior expectation of $\delta$ having observed $x$,

$$E\{\delta \mid x\} = x + \frac{d}{dx} \log m(x).$$

The trouble with applying Tweedie's formula to the prostate study is that without prior experience we don't know $\pi(\delta)$ or, therefore, $m(x)$. This is the kind of situation where frequentists rebel against using Bayesian methods.

There is, however, a nice compromise method available, that goes by the name "Empirical Bayes." If we draw a smooth curve through the green histogram, like the heavy curve in Figure 10, we get a reasonable estimate $\hat m(x)$ of the marginal density $m(x)$. We can plug this into Tweedie's formula to estimate the Bayes posterior expectation of any one $\delta_i$ given its $x_i$,

$$\hat E\{\delta_i \mid x_i\} = x_i + \left.\frac{d}{dx} \log \hat m(x)\right|_{x_i}.$$

At this point we've obtained a frequentist estimate of our Bayes expectation, without making any prior assumptions at all!

Figure 11. Empirical Bayes estimates of $E\{\delta \mid x\}$, the expected true difference $\delta_i$ given the observed difference $x_i$.

Figure 11 graphs the empirical Bayes estimation curve for the prostate study data. For gene 610 at the extreme right, its observed value $x = 5.29$ is reduced to an estimated effect size of 4.07 (a quantitative assessment of the regression to the mean effect). In a similar way, all of the $x_i$'s are shrunk back toward zero, and it can be shown that doing so nicely compensates for the competition effects I was worried about earlier.

The curve has an interesting shape, with a flat spot between $-2$ and 2. This means that most of the genes, 93% of them, have effect size estimates near zero, suggesting, sensibly, that most of the genes aren't involved in prostate cancer development.

Empirical Bayes is that Bayes-frequentist collaboration I referred to at the beginning of this talk, a hopeful sign for future statistical developments.

8. A Score Sheet

Table 2. Score Sheet

Bayes                          Frequentist
(1) Belief (prior)             (1) Behavior (method)
(2) Principled                 (2) Opportunistic
(3) One distribution           (3) Many distributions (bootstrap?)
(4) Dynamic                    (4) Static
(5) Individual (subjective)    (5) Community (objective)
(6) Aggressive                 (6) Defensive
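Returning briefly to Tweedie's formula from Section 7, it can be checked numerically in a case where the answer is known in closed form. The sketch below assumes a normal prior, $\delta \sim N(0, A)$, chosen purely for illustration and not the prior behind the prostate study; under that assumption $m(x)$ is the $N(0, A+1)$ density and the posterior mean is $Ax/(A+1)$. The names `tweedie_estimate` and `make_normal_log_marginal` are hypothetical, and the derivative is approximated by a central difference.

```python
import math


def tweedie_estimate(x, log_marginal, h=1e-5):
    """E{delta | x} = x + d/dx log m(x), using a central-difference derivative."""
    slope = (log_marginal(x + h) - log_marginal(x - h)) / (2.0 * h)
    return x + slope


def make_normal_log_marginal(prior_var):
    """log m(x) when delta ~ N(0, prior_var) and x | delta ~ N(delta, 1)."""
    total_var = prior_var + 1.0  # prior variance plus unit noise variance

    def log_marginal(x):
        return -0.5 * math.log(2.0 * math.pi * total_var) - x * x / (2.0 * total_var)

    return log_marginal


A = 3.0          # illustrative prior variance (an assumption, not from the study)
x_obs = 5.29     # observed value, as quoted for gene 610
estimate = tweedie_estimate(x_obs, make_normal_log_marginal(A))
exact = A * x_obs / (A + 1.0)  # closed-form posterior mean under the normal prior
print(estimate, exact)
```

The two printed values should agree closely, and both are pulled back toward zero from the observed 5.29. They are not expected to reproduce the 4.07 quoted for gene 610, because the prior here is an arbitrary stand-in; in the empirical Bayes setting `log_marginal` would instead come from a smooth fit to the histogram of the $x_i$.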
Table 2 is a score sheet for the frequentist/Bayesian argument, which you can use to decide which philosophical party you would join if you were an applied statistician:

(1) First and foremost, Bayesian practice is bound to prior beliefs, while frequentism focuses on behavior. The Bayesian requirement for a prior distribution, what I called $\pi(\theta)$, is a deal-breaker for frequentists, especially in the absence of genuine prior experience. On the other hand, frequentist analysis begins with the choice of a specific method, which strikes Bayesians as artificial and incoherent. Even optimal frequentist methods may be disparaged since the optimality refers to averages over hypothetical future data sets, different from the observed data $x$. This leads to a second major distinction:
(2) Bayesianism is a neat and fully principled philosophy, while frequentism is a grab-bag of opportunistic, individually optimal, methods. Philosophers of science usually come down strongly on the Bayesian side.
(3) Only one probability distribution is in play for Bayesians, the posterior distribution I called $\pi(\theta \mid x)$. Frequentists must struggle to balance behavior over a family of possible distributions, as illustrated with Neyman's construction for confidence intervals. Bayes procedures often have an alluringly simple justification, perhaps dangerously alluring according to frequentists. (Bootstrap methods are an attempt to reduce frequentism to a one-distribution theory. There are deeper Bayes/bootstrap connections than I have discussed here.)
(4) The simplicity of the Bayesian approach is especially appealing in dynamic contexts, where data arrives sequentially, and where updating one's beliefs …
7. Robert E. Kass and Larry Wasserman, The selection of prior distributions by formal rules, J. Amer. Statist. Assoc. 91 (1996), no. 435, 1343–1370, thorough discussion of Jeffreys priors in their original and modern forms.
8. Erich L. Lehmann and Joseph P. Romano, Testing Statistical Hypotheses, 3rd ed., Springer Texts in Statistics, Springer, New York, 2005, Section 3.5 discusses Neyman's construction. MR2135927 (2006m:62005)
9. Kantilal Varichand Mardia, John T. Kent, and John M. Bibby, Multivariate Analysis, Academic Press, London, 1979, Table 1.2.1 gives the student score data. MR560319 (81h:62003)

Stanford University
Current address: Department of Statistics, 390 Serra Mall, Stanford, CA 94305-4065
E-mail address: brad@stat.stanford.edu