BULLETIN (New Series) OF THE AMERICAN MATHEMATICAL SOCIETY


The year 2013 marks the 250th anniversary of Bayes rule, one of the two fundamental inferential principles of mathematical statistics. The rule has been influential over the entire period, and controversial over most of it. Its reliance on prior beliefs h…


A 250-YEAR ARGUMENT

BRADLEY EFRON

Figure 1. The greater world of mathematics and science.

Near the end of the talk I will give some hints of an emerging Bayesian-frequentist alliance, designed to deal with the enormous and complicated data sets modern scientific technology is producing. First though, I begin with a thumbnail sketch of Bayesian history and practice.

2. The Physicist's Twins

A physicist friend of mine and her husband found out, thanks to the miracle of sonograms, that she was going to have twin boys. One afternoon at the student union she suddenly asked me, "What are the chances my twins will be Identical rather than Fraternal?" As an experienced statistical consultant I stalled for time, and asked if the doctor had told her anything else. "Yes, he said that one-third of twin births are Identical and two-thirds are Fraternal."

Bayes would have died in vain if I didn't use his rule to answer the question. We need to combine two pieces of partially contradictory evidence. Past experience favors Fraternal according to the doctor, the prior odds ratio being

$$\frac{\Pr\{\text{identical}\}}{\Pr\{\text{fraternal}\}} = \frac{1/3}{2/3} = \frac{1}{2} \qquad \text{(prior experience)}.$$

Current evidence observed from the sonogram, however, favors Identical: identical twins are always the same sex while fraternals are equally likely to be the same or different sexes. In statistics terminology, the "likelihood ratio" of the current evidence is two-to-one in favor of Identical,

$$\frac{\Pr\{\text{same sex} \mid \text{identical}\}}{\Pr\{\text{same sex} \mid \text{fraternal}\}} = \frac{1}{1/2} = 2 \qquad \text{(current evidence)}.$$

(The gender, "boys" in this case, doesn't affect the calculation.)

This didn't end the desire to use Bayes rule in situations without genuine prior experience. Harold Jeffreys, arguably the world's leading geophysicist at the time, devised an improved principle that was invariant under transformations. (More on this a little later.) For the twins problem, his rule would take the prior density for $p$, the population proportion of Identical twins, not to be flat but rather U-shaped, going up sharply near zero and one,

$$\pi(p) = c\, p^{-1/2} (1 - p)^{-1/2}.$$

"Objective Bayes" is the contemporary name for Bayesian analysis carried out in the Laplace-Jeffreys manner.
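The odds-form calculation for the twins in Section 2 (posterior odds equal prior odds times likelihood ratio) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `posterior_probability` and its arguments are hypothetical, and the inputs are the doctor's one-third prior and the same-sex likelihoods quoted in the text.

```python
from fractions import Fraction


def posterior_probability(prior_a, like_a, like_b):
    """Pr{A | evidence} for two exclusive hypotheses A and B (Bayes rule, odds form).

    prior_a : prior probability of A (B has prior 1 - prior_a)
    like_a  : Pr{evidence | A}
    like_b  : Pr{evidence | B}
    """
    prior_odds = prior_a / (1 - prior_a)          # (1/3) / (2/3) = 1/2
    likelihood_ratio = like_a / like_b            # 1 / (1/2) = 2
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds to a probability


# A = identical, B = fraternal, evidence = "twins are the same sex"
p_identical = posterior_probability(Fraction(1, 3), Fraction(1), Fraction(1, 2))
print(p_identical)  # 1/2
```

With the doctor's prior, the two pieces of evidence cancel exactly: posterior odds of one, that is, equal chances of Identical and Fraternal.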
Figure 3. Three possible prior densities for $p$, the population proportion Identical, and their predictions for the physicist's twins.

Figure 3 graphs three different prior distributions for $p$: the doctor's delta function at $p = 1/3$, Laplace's flat prior, and Jeffreys' U-shaped prior density. Of course different prior distributions produce different results. My answer to the physicist, that she had a 50% chance of Identical twins, changes to 58.6% with Jeffreys prior, and to a whopping 61.4% with a flat Laplace prior.

As I said earlier, there has recently been a strong Bayesian revival in scientific applications. I edit an applied statistics journal. Perhaps one quarter of the papers employ Bayes theorem, and most of these do not begin with genuine prior information. Jeffreys priors, and their many modern variations, are the rule rather than the exception. They represent an aggressive approach to mathematical modeling and statistical inference.

A large majority of working statisticians do not fully accept Jeffreys Bayes procedures. This brings us to a more defensive approach to inference, frequentism, the currently dominant statistical philosophy.

4. Frequentism

Frequentism begins with three of the same four ingredients as Bayes theory: an unknown parameter, or state of nature, $\theta$; some observed data $x$; and a probability model $f_\theta(x)$. What is missing is $\pi(\theta)$, the prior beliefs. In place of $\pi(\theta)$, attention focuses on some statistical procedure that the statistician intends to use on the problem at hand. Here I will call it $t(x)$, perhaps an estimate or a confidence interval, or a test statistic or a prediction rule.

Inference is based on the behavior of $t(x)$ in repeated long-term use. For example, a prediction rule might be shown to be correct at least 90% of the time, no matter what the true $\theta$ happens to be.

In this framework, optimality, finding the procedure $t(x)$ that has the best long-term behavior, becomes the central mathematical task. One might, for instance, look for the prediction rule with the smallest possible error rates. (Bayes theory has no need for optimality calculations since, within its own framework, the rule automatically provides ideal answers.) Optimality theory is where mathematics has played its greatest role in statistics.

The frequentist bandwagon really got rolling in the early 1900s. Ronald Fisher developed the maximum likelihood theory of optimal estimation, showing the best possible behavior for an estimate; and Jerzy Neyman did the same for confidence intervals and tests. Fisher's and Neyman's procedures were an almost perfect fit to the scientific needs and the computational limits of twentieth century science, casting Bayesianism into a shadow existence.

Figure 4. Scores of 22 students on two tests, "mechanics" and "vectors"; sample correlation coefficient is 0.498 ??

Figure 4 shows a small data set of the type familiar to Fisher and Neyman. Twenty-two students have each taken two tests, called "mechanics" and "vectors." Each of the points represents the two scores for one student, ranging from the winners at the upper right to the losers at lower left. We calculate the sample correlation coefficient between the two tests, which turns out to equal 0.498, and wonder how accurate this is.

What I previously called the data $x$ is now all 22 points, while the statistic, or "method," $t(x)$ is the sample correlation coefficient. If the points had fallen exactly along a straight line with positive slope the sample correlation coefficient would be 1.00, in which case the mechanics score would be a perfect predictor for the vectors score, and vice versa (and they wouldn't have had to give two tests). The actual observed value, 0.498, suggests a moderate but not overwhelming predictive relationship. Twenty-two points is not a lot, and we might worry that the correlation would be much different if we tested more students.

A little bit of notation: $n = 22$ is the number of students, $y_i$ is the data for the $i$th student, that is, his or her two scores, and the full data set $y$ is the collection of all 22 $y_i$'s. The parameter of interest, the unknown state of nature $\theta$, is the true correlation: the correlation we would see if we had a much larger sample than 22, maybe even all possible students.

Now I've called the sample correlation coefficient 0.498 "$\hat\theta$." This is Fisher's notation, indicating that the statistic $\hat\theta$ is striving to estimate the true correlation $\theta$. The "??" after 0.498 says that we'd like some idea of how well $\hat\theta$ is likely to perform.

R. A. Fisher's first paper in 1915 derived the probability distribution for the correlation problem: what I previously called $f_\theta(x)$, now $f_\theta(\hat\theta)$ with $\hat\theta$ taking the place of $x$. (It is a rather complicated hypergeometric function.) Much more importantly, between 1920 and 1935 he developed the theory of maximum likelihood estimation, and the optimality of the MLE. Speaking loosely, maximum likelihood is the best possible frequentist estimation method, in the sense that it minimizes the expected squared difference between $\hat\theta$ and the unknown $\theta$, no matter what $\theta$ may be.

Fisher's 1915 calculations were carried out in the context of a bivariate normal distribution, that is, for a two-dimensional version of a bell-shaped curve, which I'll discuss a little later.

Despite pursuing quite similar scientific goals, the two founders of mathematical statistics, Fisher and Neyman, became bitter rivals during the 1930s, with not a good word to say for each other. Nevertheless, Neyman essentially completed Fisher's program by developing optimal frequentist methods for testing and confidence intervals.

Neyman's 90% confidence interval for the student correlation coefficient is perhaps shockingly wide. It says that $\theta$ lies in

$$[0.164,\ 0.717],$$

with a 5% chance of missing on either end. Again speaking roughly, Neyman's interval is as short as possible in the absence of prior information concerning $\theta$. The point estimate, $\hat\theta = 0.498$, looked precise, but the interval estimate shows how little information there actually was in our sample of 22 students.

Figure 5 is a picture of the Neyman construction, as physicists like to call it. The black curve is Fisher's probability distribution for $\hat\theta$ if the parameter $\theta$ equaled 0.164, the lower endpoint of the 90% confidence interval. Here 0.164 was chosen to put exactly 5% of the probability above the observed value $\hat\theta = 0.498$. Similarly …

Figure 6. Jeffreys Bayes posterior density $\pi(\theta \mid x)$ for the 22 students; 90% credible limits = [0.164, 0.718]; Neyman limits [0.164, 0.717].

If we always had such nice agreement, peace would break out in the land of statistics. There is something special, however, about the correlation problem, which I'll get to soon.

Table 1. More Students

    n      $\hat\theta$
    22     .498
    44     .663
    66     .621
    88     .553
    ∞      [.415, .662]

I actually selected our 22 students randomly from a bigger data set of 88. Table 1 shows the sample correlation coefficient $\hat\theta$ as the sample size increased: at $n = 44$ the estimate jumped up from 0.498 to 0.663, coming down a bit to 0.621 at $n = 66$ and ending at 0.553 for all 88 students. The infinity row represents the unknown future, framed by the 90% Neyman interval based on all 88 students,

$$\theta \in [0.415,\ 0.662],$$

now a good deal more precise than the interval [0.164, 0.717] based on only the original 22.

Statisticians usually do not have the luxury of peering into the future. Frequentism and Bayesianism are competing philosophies for extrapolating from what we can see to what the future might hold. That is what the 250-year argument is really about.

5. Nuisance Parameters

Figure 7. Galton's 1886 distribution of child's height vs parents'; ellipses are contours of best-fit bivariate normal density; red dot at bivariate average (68.3, 68.1).

Figure 7 represents the very first bivariate normal distribution, dating from 1886. It is due to Francis Galton, eccentric Victorian genius, early developer of fingerprint analysis and scientific weather prediction, and best-selling author of adventure travel books. Each of the 928 points shows an adult child's height along the y-axis and the parents' average height along the x-axis. The big red dot is at the two grand averages, about 68 inches each way in 1886. Somehow Galton realized that the points were distributed according to a two-dimensional correlated version of the bell-shaped curve. He was no mathematician but he had friends who were, and they developed the formula for the bivariate normal density, which I'll discuss next. The ellipses show curves of equal density from the formula.

Galton was some kind of statistical savant. Besides the bivariate normal distribution, he used this picture to develop correlation (called by him originally "co-relation") and regression theory (called by him "regression to the mean": extremely tall parents have less extremely tall children, and conversely for shortness).

Galton's formula for the probability density function of a bivariate normal random vector $y = (y_1, y_2)'$ is

$$f_{\mu,\Sigma}(y) = \frac{1}{2\pi}\, |\Sigma|^{-1/2}\, e^{-\frac{1}{2}(y - \mu)' \Sigma^{-1} (y - \mu)}.$$

Here $\mu = (\mu_1, \mu_2)'$ is the two-dimensional mean vector while $\Sigma$ is the 2-by-2 positive definite covariance matrix. (It describes the variabilities of $y_1$ and $y_2$ as well as their correlation.) Standard notation for the distribution is

$$y \sim N_2(\mu, \Sigma),$$

read "$y$ is bivariate normal with mean $\mu$ and covariance $\Sigma$."

A perspective picture of the density function would show an esthetically pleasing bell-shaped mountain.

In Figure 7 I chose $\mu$ to match the red dot at the center, and $\Sigma$ to give the best-matching ellipses to the point cloud; in other words I used the maximum likelihood estimates of $\mu$ and $\Sigma$. The main thing to note here is that a bivariate normal distribution has five free parameters, two for the mean vector $\mu$ and three for the symmetric matrix $\Sigma$, and that all five will be unknown in typical applications.

For reasons having to do with relationships among the five parameters, the correlation problem turns out to be misleadingly easy. Here is a more difficult, and more typical, problem: suppose we are interested in the eigenratio, the ratio of the largest eigenvalue of the matrix $\Sigma$ to the sum of the two eigenvalues,

$$\theta = \frac{\lambda_1}{\lambda_1 + \lambda_2} \qquad (\lambda_1 \ge \lambda_2 \text{ eigenvalues of } \Sigma).$$

The MLE of $\Sigma$, $\hat\Sigma$, obtained from the 22 data points gives the maximum likelihood estimate

$$\hat\theta = 0.793\ ??$$

where the question marks indicate that we want some assessment of how accurate $\hat\theta$ is for estimating the true value $\theta$.

What is not true for the eigenratio is that the distribution of the quantity $\hat\theta$ we're interested in depends only on $\theta$. This was true for the correlation, and effectively reduced all the calculations to one dimension. No matter how we try to reparameterize the five-parameter bivariate normal distribution, there will still be four nuisance parameters involved, in addition to the eigenratio, and they don't conveniently go away. Somehow they have to be taken into account before one can answer the ?? question.

Bayesian inference has a simple way of dealing with nuisance parameters: they are integrated out of the five-dimensional posterior distribution. However "simple" isn't necessarily "correct," and this can be a major point of practical disagreement between frequentist and Bayesian statisticians.

The heavy curve in Figure 8 is the Bayes posterior density for the eigenratio, starting from Jeffreys' five-dimensional uninformative prior and integrating out the four nuisance parameters. Dashed lines indicate the 90% Bayes posterior limits for the true eigenratio given the data for the 22 students. The red triangles are frequentist 90% limits, obtained from a bootstrap calculation I will describe next. There is notable disagreement: the frequentist limits are shifted sharply downwards. Jeffreys' prior, in fact, does not give frequentistically accurate confidence limits in this case, or in a majority of problems afflicted with nuisance parameters. Other, better, uninformative priors have been put forward, but for the kind of massive data analysis problems I'll discuss last, most Bayesians do not feel compelled to guarantee good frequentist performance.

6. The Bootstrap and Gibbs Sampling

I began this talk from a point in 1763, and so far have barely progressed past 1950. Since that time modern scientific technology has changed the scope of the problems statisticians deal with, and how they solve them. As I'll show last, data …

Figure 9. 10,000 bootstrap eigenratio values from the student score data (bivariate normal model); dashed line shows confidence weights.

… 10,000 $\hat\theta$s, so that smaller values count more. (The dashed curve is the reweighting function.) The bootstrap confidence limits are the 5th and 95th percentiles of the reweighted $\hat\theta$s.

The Bayesian world has also been automated. "Gibbs sampling" is a Markov chain random walk procedure, named after the Gibbs distribution in statistical physics. Given the prior and the data, Markov chain Monte Carlo (MCMC) produces samples from an otherwise mathematically intractable posterior distribution $\pi(\theta \mid x)$. (The history of the idea has something to do with Los Alamos and the A-bomb.) MCMC theory is perfectly general, but in practice it favors the use of convenient uninformative priors of the Jeffreys style, which has a lot to do with their dominance in current Bayesian applications.

7. Empirical Bayes

I wanted to end with a big-data example, more typical of what statisticians are seeing these days. The data is from a prostate cancer study involving 102 men, 52 with prostate cancer and 50 healthy controls. Each man was measured on a panel of 6033 genes (using microarrays, the archetype of modern scientific high-throughput devices). For each gene, a statistic $x_i$ was calculated, the difference in means between the cancer patients and the healthy controls, which, suitably normalized, should be distributed according to a bell-shaped curve. For gene $i$, the curve would be centered at $\delta_i$, the "true effect size,"

$$x_i \sim N(\delta_i, 1).$$

We can't directly observe $\delta_i$, only its estimate $x_i$.

Presumably, if gene $i$ doesn't have anything to do with prostate cancer, then $\delta_i$ will be near zero. Of course, the investigators were hoping to spot genes with big effect sizes, either positive or negative, as a clue to the genetic basis of prostate cancer.
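To make the sampling model $x_i \sim N(\delta_i, 1)$ concrete, here is a small simulation sketch using only the Python standard library. The effect sizes are invented for illustration (5% of genes with $\delta_i = 2.5$, the rest exactly zero); they are assumptions of this sketch and not estimates from the prostate data, and the function name `simulate_winner` is hypothetical.

```python
import random


def simulate_winner(n_genes=6033, frac_nonnull=0.05, effect=2.5, seed=1):
    """Draw x_i = delta_i + N(0, 1) noise for each gene and return
    (largest observed x_i, true delta_i of that same gene).

    The two-point prior on delta_i (0 or `effect`) is a made-up
    assumption, chosen only to illustrate the model.
    """
    rng = random.Random(seed)
    best_x, best_delta = float("-inf"), 0.0
    for _ in range(n_genes):
        delta = effect if rng.random() < frac_nonnull else 0.0
        x = delta + rng.gauss(0.0, 1.0)  # observed estimate of delta
        if x > best_x:
            best_x, best_delta = x, delta
    return best_x, best_delta


x_max, delta_of_winner = simulate_winner()
print(f"largest x_i = {x_max:.2f}, its true effect size = {delta_of_winner:.2f}")
```

In runs of this kind the largest observed $x_i$ sits well above the true effect size of the gene that produced it, because the maximum over thousands of genes selects for favorable noise as well as for a large $\delta_i$.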
Figure 10. Prostate cancer study: difference estimates $x_i$ comparing cancer patients with healthy controls, 6033 genes. Dashes indicate the 10 largest estimates.

The histogram in Figure 10 shows the 6033 effect size estimates $x_i$. The light dashed curve indicates what we would see if none of the genes had anything to do with prostate cancer, that is, if all the effect sizes were zero. Fortunately for the investigators, that doesn't seem to be the case. A better fit to the histogram, called $\hat m(x)$, shows the heavier tails of the histogram, presumably reflecting genes with substantially big effect sizes.

Looking just at the right side, I've marked with little red dashes the 10 largest $x_i$'s. These seem way too big to have zero effect size. In particular, the largest one of all, from gene 610, has $x_i = 5.29$, almost impossibly big if $\delta_{610}$ really equalled zero.

But we have to be careful here. With 6033 genes to consider at once, the largest observed values will almost certainly overstate their corresponding effect sizes. (Another example of Galton's regression to the mean effect.) Gene 610 has won a bigness contest with 6033 competitors. It's won for two reasons: it has a genuinely large effect size, and it's been lucky (the random noise in $x_i$ has been positive rather than negative), or else it probably would not have won! The question is how to compensate for the competition effects and get honest estimates for the contest winners.

There's an intriguing Bayesian analysis for this situation. Considering any one gene, suppose its effect size $\delta$ has some prior density $\pi(\delta)$. We don't get to see $\delta$, but rather $x$, which is $\delta$ plus some normal noise. If we know $\pi(\delta)$ we can use Bayes theorem to optimally estimate $\delta$. By definition, the marginal density of $x$ is its density taking account of the prior randomness in $\delta$ and the normal noise,

$$m(x) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \delta)^2}\, \pi(\delta)\, d\delta.$$

Tweedie's formula is a neat expression for the Bayes posterior expectation of $\delta$ having observed $x$,

$$E\{\delta \mid x\} = x + \frac{d}{dx} \log m(x).$$

The trouble with applying Tweedie's formula to the prostate study is that without prior experience we don't know $\pi(\delta)$ or, therefore, $m(x)$. This is the kind of situation where frequentists rebel against using Bayesian methods.

There is, however, a nice compromise method available, that goes by the name "Empirical Bayes." If we draw a smooth curve through the green histogram, like the heavy curve in Figure 10, we get a reasonable estimate $\hat m(x)$ of the marginal density $m(x)$. We can plug this into Tweedie's formula to estimate the Bayes posterior expectation of any one $\delta_i$ given its $x_i$,

$$\hat E\{\delta_i \mid x_i\} = x_i + \frac{d}{dx} \log \hat m(x) \Big|_{x_i}.$$

At this point we've obtained a frequentist estimate of our Bayes expectation, without making any prior assumptions at all!

Figure 11. Empirical Bayes estimates of $E\{\delta \mid x\}$, the expected true difference $\delta_i$ given the observed difference $x_i$.

Figure 11 graphs the empirical Bayes estimation curve for the prostate study data. For gene 610 at the extreme right, its observed value $x = 5.29$ is reduced to an estimated effect size of 4.07 (a quantitative assessment of the regression to the mean effect). In a similar way, all of the $x_i$'s are shrunk back toward zero, and it can be shown that doing so nicely compensates for the competition effects I was worried about earlier.

The curve has an interesting shape, with a flat spot between $-2$ and 2. This means that most of the genes, 93% of them, have effect size estimates near zero, suggesting, sensibly, that most of the genes aren't involved in prostate cancer development.

Empirical Bayes is that Bayes-frequentist collaboration I referred to at the beginning of this talk, a hopeful sign for future statistical developments.

8. A Score Sheet

Table 2. Score Sheet

    Bayes                           Frequentist
    (1) Belief (prior)              (1) Behavior (method)
    (2) Principled                  (2) Opportunistic
    (3) One distribution            (3) Many distributions (bootstrap?)
    (4) Dynamic                     (4) Static
    (5) Individual (subjective)     (5) Community (objective)
    (6) Aggressive                  (6) Defensive
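Returning briefly to the empirical Bayes calculation of Section 7, the plug-in form of Tweedie's formula can be sketched in a few lines of Python. Several things here are assumptions of the sketch rather than anything taken from the prostate study: the data are simulated from an invented prior (90% of effects exactly zero, 10% drawn from $N(3, 1)$), the marginal density is estimated with a Gaussian kernel density estimate instead of a smooth curve fitted to the histogram, and the bandwidth 0.5 and the names `tweedie_estimate` and `simulate` are arbitrary. Kernel smoothing widens $\hat m(x)$ a little, so the shrinkage it produces is slightly too weak.

```python
import math
import random


def tweedie_estimate(x, data, bandwidth):
    """Plug-in Tweedie estimate  x + d/dx log m_hat(x).

    m_hat is a Gaussian kernel density estimate built from `data`
    (an assumption of this sketch, not the smoother used for Figure 10).
    Normalizing constants cancel in the ratio m_hat'(x) / m_hat(x).
    """
    num = 0.0  # proportional to m_hat'(x)
    den = 0.0  # proportional to m_hat(x)
    for xj in data:
        u = (x - xj) / bandwidth
        w = math.exp(-0.5 * u * u)
        num += w * (xj - x) / bandwidth ** 2
        den += w
    return x + num / den


def simulate(n=6033, frac_nonnull=0.1, seed=0):
    """Simulated x_i = delta_i + N(0, 1) under a made-up prior on delta_i."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        delta = rng.gauss(3.0, 1.0) if rng.random() < frac_nonnull else 0.0
        xs.append(delta + rng.gauss(0.0, 1.0))
    return xs


xs = simulate()
for x in (0.0, 2.0, 4.0):
    print(x, round(tweedie_estimate(x, xs, bandwidth=0.5), 2))
```

Under these assumptions an observation near zero is left near zero, while a large observation such as $x = 4$ is pulled back toward the bulk of the data, the same qualitative behavior as the curve in Figure 11. The function will fail with a division by zero if `x` lies so far from every data point that all kernel weights underflow.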
Table 2 is a score sheet for the frequentist/Bayesian argument, that you can use to decide which philosophical party you would join if you were an applied statistician:

(1) First and foremost, Bayesian practice is bound to prior beliefs, while frequentism focuses on behavior. The Bayesian requirement for a prior distribution, what I called $\pi(\theta)$, is a deal-breaker for frequentists, especially in the absence of genuine prior experience. On the other hand, frequentist analysis begins with the choice of a specific method, which strikes Bayesians as artificial and incoherent. Even optimal frequentist methods may be disparaged since the optimality refers to averages over hypothetical future data sets, different than the observed data $x$. This leads to a second major distinction:

(2) Bayesianism is a neat and fully principled philosophy, while frequentism is a grab-bag of opportunistic, individually optimal, methods. Philosophers of science usually come down strongly on the Bayesian side.

(3) Only one probability distribution is in play for Bayesians, the posterior distribution I called $\pi(\theta \mid x)$. Frequentists must struggle to balance behavior over a family of possible distributions, as illustrated with Neyman's construction for confidence intervals. Bayes procedures often have an alluringly simple justification, perhaps dangerously alluring according to frequentists. (Bootstrap methods are an attempt to reduce frequentism to a one-distribution theory. There are deeper Bayes/bootstrap connections than I have discussed here.)

(4) The simplicity of the Bayesian approach is especially appealing in dynamic contexts, where data arrives sequentially, and where updating one's beliefs …

7. Robert E. Kass and Larry Wasserman, The selection of prior distributions by formal rules, J. Amer. Statist. Assoc. 91 (1996), no. 435, 1343–1370. Thorough discussion of Jeffreys priors in their original and modern forms.

8. Erich L. Lehmann and Joseph P. Romano, Testing Statistical Hypotheses, 3rd ed., Springer Texts in Statistics, Springer, New York, 2005. Section 3.5 discusses Neyman's construction. MR2135927 (2006m:62005)

9. Kantilal Varichand Mardia, John T. Kent, and John M. Bibby, Multivariate Analysis, Academic Press, London, 1979. Table 1.2.1 gives the student score data. MR560319 (81h:62003)

Stanford University
Current address: Department of Statistics, 390 Serra Mall, Stanford, CA 94305-4065
E-mail address: brad@stat.stanford.edu