TheCounterintuitiveNoninformativePriorfortheBernoulliFamilyMuZhuUniv - PDF document

singh . @singh

342 views
Uploaded On 2021-09-25

TheCounterintuitiveNoninformativePriorfortheBernoulliFamilyMuZhuUniv - PPT Presentation

httpwwwamstatorgpublicationsjsev12n2zhupdfCopyrightc2004byMuZhuandArthurYLuallrightsreservedThistextmaybefreelysharedamongindvidualsbutitmaynotberepublishedinanymediumwithoutexpresswrittenconsen ID: 885450

rey x0000 beta informative x0000 rey informative beta pbayes newyork 1996 xnjp informativeprior reys whenc isbeta seee d2logf forarbitrarilysmall

Link:

Copy

Embed:

<iframe width="560" height="315" src="https://www.docslides.com/embed/885450" frameborder="0" allowfullscreen></iframe>

Download Presentation from below link

Download Pdf The PPT/PDF document "TheCounterintuitiveNoninformativePriorfo..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.

Presentation Transcript

1 TheCounter-intuitiveNon-informativePrior
TheCounter-intuitiveNon-informativePriorfortheBernoulliFamilyMuZhuUniversityofWaterlooArthurY.LuRenaissanceTechnologiesCorp.JournalofStatisticsEducationVolume12,Number2(2004), http://www.amstat.org/publications/jse/v12n2/zhu.pdf Copyrightc 2004byMuZhuandArthurY.Lu,allrightsreserved.Thistextmaybefreelysharedamongindviduals,butitmaynotberepublishedinanymediumwithoutexpresswrittenconsentfromtheauthorandadvancenoticationoftheeditor. KeyWords:BetaDistribution;Conjugatepriors;Maximumlikelihoodestimation;Poste-riormean.AbstractInBayesianstatistics,thechoiceofthepriordistributionisoftencontroversial.Dierentrulesforselectingpriorshavebeensuggestedintheliterature,which,sometimes,producepriorsthataredicultforthestudentstounderstandintuitively.Inthisarticle,weuseasimpleheuristictoillustratetothestudentstherathercounter-intuitivefactthat atpriorsarenotnecessarilynon-informative;andnon-informativepriorsarenotnecessarily at.1IntroductionInBayesiananalysis,selectingpriorsusingJereys'rule(e.g.,Tanner1996;Gelman,etal.1995)canyieldsomerathercounter-intuitiveresultsthatarehardforstudentstograsp.Considerthefollowingsimplescenario:LetX1;X2;:::;Xnbei.i.d.observationsfromtheBernoulli(p)distribution.Toestimatep,aBayesiananalystwouldputapriordistributiononpandusetheposteriordistributionofptodrawvariousconclusions,e.g.,estimatingpwiththeposteriormean.Whenthereisnostrongprioropiniononwhatpis,itisdesirabletopickapriorthatisnon-informative. 1 Inthissimpl

2 ecase,itismostintuitivetousetheuniformdi
ecase,itismostintuitivetousetheuniformdistributionon[0,1]asanon-informativeprior;itisnon-informativebecauseitsaysthatallpossiblevaluesofpareequallylikelyapriori.However,anon-informativepriorconstructedusingJerey'sruleisoftheform(seee.g.,Gelman1995)(p)/1 p p(1p): (1) Jerey'sruleismotivatedbyaninvarianceargument:Supposeonepicksp(p)asthepriorforpaccordingtoacertainrule.Inorderforp(p)tobenon-informative,itisarguedthattheparameterizationmustnotin uencethechoiceofp(p),i.e.,ifonere-parameterizestheproblemintermsof=h(p),thentherulemustpick()=p(h1())dp dasthepriorfor.Givenp,letf(xjp)bethelikelihoodfunction.Jerey'sruleistopickp(p)/(I(p))1 2asaprior,whereI(p)=Ed2logf(xjp) dp2istheFisherInformation.Toseethatthisisinvariantwithrespecttoparameterization,supposewere-parameterizeintermsof=h(p),thenI()=Ed2logf(xj) d2=E d2logf(xjp) dp2dp d2!=I(p)dp d2:ApplyingJerey'srule,onewouldpickaprioronas()/(I())1 2=(I(p))1 2dp d=p(p)dp d;whichsatisiestheinvarianceargument.Jerey'spriorforthissimpleproblemcanbequitecounter-intuitive.Undertheprior(1),itappearsthatsomevaluesofparemorelikelythanothers(seee.g.,Figure1).Therefore 2 intuitively,itappearsthatthispriorisactuallyquiteinformative.Thisisaverydicultpointtoexplaintothestudents.Inthisarticle,weconstructasimple(albeitnaive)argumentandillustratetothestudentswhytheuniformpriorisnotnecessarilythemostnon-i

3 nformative.WewillnotrelyontheFisherinfor
nformative.WewillnotrelyontheFisherinformationorJereys'invarianceargument.Instead,werelyonaverysimpleandnaiveheuristictojudgethenon-informativenessofaprior:Sincethemaximumlikelihoodestimator(MLE)isnotaectedbyanyprioropinion,wesimplyask:isthereapriorwhichwouldproduceaBayesianestimate(e.g.,posteriormean)thatcoincideswiththeMLE?Ifso,thatpriorcouldberegardedasnon-informativesincetheprioropinionexertsnoin uenceonthenalestimatewhatsoever.Usingthisnaiveheuristic,wecanseethattheuniformpriorisactuallymoreinformativethanJereys'prior(1);whereastheleastinformativeprioris,surprisinglyenough,anextremely\opinionated"distributionapproachingtwopointmassesat0and1!Weemphasizeherethatitisnotourintentiontoimplythatournaiveheuristicisthebestorevenanappropriatepointofviewforjudgingthenon-informativenessofdierentpriorsinBayesiananalysis.Weonlyprovidethisargumentasaninterestingdemonstrationthatcanbeusedintheclassroom.2TheMaximumLikelihoodEstimator(MLE)Withoutconsideringanyprioropinion,atypicalapproachforestimatingpisthemethodofmaximumlikelihood.LetXibeaBernoullirandomvariablewithP(Xi=1)=p;P(Xi=0)=1p;thelog-likelihoodfunctionfortheBernoullidistributionisl(p)=nXi=1logpxi(1p)1xi=nXi=1xilogp+(1xi)log(1p):Weshallwritex=Pni=1xithroughoutthearticle.Tomaximizethislog-likelihood,therstorderconditionisx pnx 1p=0;whichgives^pmle=x n:3TheBayesianEstimatorTheBayesianapproachtoestimationstartswithapriordistributionontheparameterofinterest.Often,weha

4 venopriorknowledgeonp.Tore ectsuchlackof
venopriorknowledgeonp.Tore ectsuchlackofknowledge,themostintuitivechoiceistoputauniformprioronp,i.e.,0(p)=1forp2[0;1].Thissaysthat,apriori,pcouldbeanythingbetween0and1withequalchance.Then,theposterior 3 distributionofpisgivenby(Bayes'Theorem):1(pjx1;:::;xn)=f(x1;:::;xnjp)0(p) R10f(x1;:::;xnjp)0(p)dp:Weshallndthisposteriordistributionmoregenerallybelowusingtheideaoftheconjugateprior.3.1TheBetaConjugatePriorConsidertheBeta(;)distributionasthepriorforp,i.e.,0(p)=(+) ()()p1(1p)1:TheuniformdistributionisaspecialcaseoftheBetadistribution,with==1.ThereasonwhyonewouldconsiderusingtheBetadistributionasthepriorisbecausetheBetadistributionandtheBernoullidistributionformaconjugatepair,sothattheposteriordis-tributionisstillaBeta(e.g.,DeGroot1970).Thisgivesussomeanalyticconvenience.Toseethis,notethat1(p)/f(x1;x2;:::;xnjp)0(p) (2) =px(1p)nxp1(1p)1 (3) =p+x1(1p)+nx1 (4) isBeta(+x;+nx).ThefollowingpropertiesoftheBetadistributionareuseful:IfpBeta(;),thenE(p)= +; (5) andVar(p)= (+)2(++1): (6) WecannowwritedownageneralformulaforobtainingtheBayesianposteriormean,^pbayes:if0(p)isBeta(;),then^pbayes=+x ++n: (7) Therefore,undertheuniformprior(i.e.,==1),theposteriormeanis^pbayes=1+x 2+n: 4 Remark1.Forthesimplepurposeofdemonstration,weusetheposteriormeanastheBayesianestimate.However,weemphasizethatgenerallyonecouldusetheposteriormed

5 ianormodeasaBayesianpointestimateaswell.
ianormodeasaBayesianpointestimateaswell.3.2TheEectsofDierentPriorsHowdotheparametersand(i.e.,usingdierentpriorsintheBetafamily)aecttheoutcome?Forthisdiscussion,wefocusonaparticularsub-familyofBetadistributionswith==c,i.e.,0(p)isBeta(c,c).Again,theuniformdistributionisamemberofthissub-family,withc=1.Furthermore,iftheprioronpBeta(c;c),thenE(p)=c c+c=1 2andVar(p)=c2 4c2(2c+1)=1 4(2c+1): (8) Itisclearfrom(7)thatthepriorparametercin uencestheposteriormeanasifanextra2cobservations,equallysplitbetweenzerosandones,wereaddedtothesample.Therefore,thelargercis,themorein uencethepriorwillhaveontheposteriormean.Theuniformprior(c=1)addstwoobservations;Jerey'sprior,whichaccordingtoequation(1)correspondstoc=1 2,addsoneextraobservation.ItisinthissensethatJerey'spriorisactuallylessin uentialthantheuniformprior.Sincethepriorvarianceisclearlyadecreasingfunctioninc(8),thisalsosaysthatthelargerthepriorvariance,thelessin uentialtheprioris,whichmakesintuitivesense:alargepriorvariancewouldnormallyindicatearelativelyweakprioropinion.Inviewofthis,twoextremecasesbecomequiteinteresting:c!1andc!0.Case1:c!1.Itiseasytoseefrom(7)thatasc!1,wehave^pbayes=1 2,thesameasthepriormeanregardlessofwhattheobservedoutcomesare.Inotherwords,ourprioropinionofpissostrongthatitcannotbechangedbytheobservedoutcomes.From(8),weseethatthepriorvarianceapproaches0asc!1.Thisis,again,consistentwithourintuition:thesmallpriorvariancemeansthatone'spriorbeliefisheavilyconce

6 ntratedonthepointp=1 2,soheavythattheobs
ntratedonthepointp=1 2,soheavythattheobservedoutcomescouldnotalterthisbeliefinanyway.Case2:c!0.Followingthesamelogic,itisclearfrom(7)thattheleastin uentialpriorinoursub-familywouldhavebeentheonewithc=0.Usingsuchaprior,theposteriormeanwouldhavebeenthesameastheMLE;i.e.,itwouldhavebeenentirelydeterminedbytheobservedoutcomes.ButtheBeta(0,0)distributionisnotdened.Therefore,weconsiderthedistributionBeta(;)forarbitrarilysmall�0.Tounderstandthebehaviorofthisdistribution,wecanexaminethelimitingdistributionasc!0:B0;0=limc!0Beta(c;c): Theorem1 ThelimitingdistributionB0;0consistsoftwoequalpointmassesat0and1. 5 From(8),itcanbeseenthatthevarianceofB0;0is1 4;theabovetheorem(seeAppendixforaproof)isduetothefollowingfact:forasymmetricdistributionwithacompactsupportontheunitintervaltohavevariance1 4,itmustconsistofjusttwoequalpointmassesat0and1.Theorem1saysthatthepriordistributionBeta(;)witharbitrarilysmall�0approachestwopointmassesat0and1.Suchapriorbelief,ofcourse,seemsextremelystrong,sinceitsayspisessentiallyeither0or1.Intuitively,onewouldconsidersuchastrongpriorbelieftobeextremelyunreasonable,butthisisthepriorthatwouldyieldaposteriormeanascloseaspossibletotheMLE.Inthissense,thepriorBeta(;)forarbitrarilysmall�0,whichwouldotherwiseappearstrong,couldactuallyberegardedastheleastin uentialpriorinthisfamily.Remark2.Theorem1statesthatthelimitingdistributionB0;0istheBernoulli(1 2)distri-bution,which,strictlyspeaking,isnot

7 amemberoftheBetafamily.Moreover,notethat
amemberoftheBetafamily.Moreover,notethatifB0;0isactuallyusedasaprior,thentheposteriordistributionisnotdenedunlessalltheobservationsX1;X2;:::;Xnareidentical.ThereforeB0;0isinitselfquiteanin uentialprior,butBeta(;)isnot,althoughforarbitrarilysmall,itencodesessentiallythesameprioropinionasB0;0,whosepredictivedistributionputshalfprobabilityonallonesandhalfonallzeros.4TheNon-informativePriorThelessonfromthisdiscussionisextremelyinteresting;ittellsusthat atpriors(suchastheuniformprior)arenotalwaysthesamethingasnon-informativepriors.Aseeminglyinformativepriorcanactuallybequiteweakinthesensethatitdoesnotin uencetheposterioropinionverymuch.ItisclearinourexamplethattheMLEistheresultofusingaweakprior,whereasthemostintuitivenon-informativeprior(theuniformprior)isnotasweakornon-informativeasonewouldhavethought.We'vealsoseenthattheleastin uentialprior,Beta(;)forarbitrarilysmall�0,isalsotheonewiththelargestvarianceinthesub-family,whereasthemost\stubborn"prior(whenc!1)isalsotheonewiththesmallestvariance(8).Generally,alargervariancewouldalsoimplya atterdistribution.ButsincethefamilyofBetadistributionshascompactsupportontheunitinterval,thevarianceismaximizedbyaratherextremepriorinsteadoftheusual atprior,Beta(1,1).Remark3.Anothercommonpriorthatisusedintheliteratureforthisproblem(seee.g.,Zellner1996,p.40)isanimproperprioroftheform0(p)/1 p(1p); (9) alsocalledtheHaldaneprior.ItisimproperbecauseR100(p)dp=1andhenceitisnotaproperdistrib

8 utionfunction.Zellner(1996)notesthatthis
utionfunction.Zellner(1996)notesthatthisimproperpriorcorrespondsto 6 Figure1:TheBetafamilyofdistributionson[0;1].When==c,theBetadistributionissymmetric.Whenc=1,itistheuniformdistribution.Whenc�1,ithasamaximumat1 2.Whenc1,ithasaminimumat1 2andtwomaximaat0and1. 7 puttinga atprioron=logp 1p;thelog-odds.Inotherwords,onthelog-oddsscale,thisimproperprioristheusual at,non-informativeprior.Otherthantheimproperness,however,wecanseethat(9)iscloselyrelatedtothelimitingdistributionB0;0:theyaretwodierentwaysofexpressingtheoth-erwiseundenedBeta(0,0)distribution.Remark4.Onecanusethissimpleheuristicinothersituationsaswelltoevaluatethenon-informativenessofdierentpriors.Forexample,considerX1;X2;:::;XnN(;2),where2isknown.Let()N(0;20)betheprioron.Thenitcanbeshown(e.g.,Tanner1996,p.17)thattheposteriormeanis01=20 1=20+n=2+xn=2 1=20+n=2:ItisthenclearthattheposteriormeanagreeswiththeMLExifandonlyif0!1,i.e.,ifweputaprioronthatisessentially at.Inthiscase,usingoursimpleheuristic,wearriveattheintuitiveconclusionthatthemostnon-informativepriorisindeed at.Remark5.InRemark1,weemphasizedthatthechoicetofocusontheposteriormeanastheBayesianestimateisbasedonconvenience.TheposteriorBetadistributionisgenerallynotsymmetric,sotheposteriormean,medianandmodedonotcoincide.Thismeanssomeofthephenomenaillustratedabovewillnotholdifadierentposteriorpointestimateisused.Wedonotworryaboutthispointsomuchheresincethematerialpresentedhere

9 isonlymeantasaclassroomdemonstrationtohe
isonlymeantasaclassroomdemonstrationtohelpthestudentsappreciatewhynon-informativepriorsarenotalways at.However,theposteriormean,medianandmodedoagreeforthenormalfamily,butofcourse,therethepriorregardedasnon-informativebythecurrentheuristicargumentisessentially at(seeRemark4).5ConclusionThattheposteriormeancoincideswiththeMLEisnotnecessarilytherightcriteriontojudgewhetherapriorisnon-informative,butitprovidesanextremelysimpleandeectivedemonstrationofwhysometimes atpriorsarenotnecessarilynon-informative,whilenon-informativepriorsarenotalways at,withouthavingtoresorttodeepstatisticaltheory.Thissimpleargument,inourexperiences,hasprovenrelativelyeasyforthestudentstoappreciate.Theproofsbelowintheappendixcanalsobeeasilyunderstoodwithbasicstatisticsknowledgeatthelevelof,say,Rice1995.Sinceitisclearfromthisdiscussionthatdierentpriorsmayleadtodierentposteriorestimates,itisoftendesirabletoreportseveralposteriorestimatesusingdierentpriors,especiallywhenthechoiceofthepriorisfairlyarbitrary,i.e.,whenthereisnostrongprioropinion. 8 AcknowledgmentTherstauthorwouldliketothankHughChipmanforaninterestingdiscussiononthissubjectaswellasthetheNaturalSciencesandEngineeringResearchCouncilofCanadaforprovidingpartialresearchsupport.TheauthorsarealsogratefultotheEditor,anAssociateEditorandtworefereesfortheirsuggestions,correctionsandencouragement.AppendixThisappendixcontainsarelativelysimpleproofofTheorem1.LetXbearandomvariablewithB0;0asitsdistribution;l

10 etf(x)beitsprobabilityfunction.Clearlyf(
etf(x)beitsprobabilityfunction.Clearlyf(x)issymmetricabout1 2.From(8),weknowthatVar(X)=1 4.Nowsupposef(x)isnotjusttwopointmassesat0and1.Thenthereexist01 2and�0suchthatg()Z1f(x)dx�:Becausef(x)issymmetricabout1 2,itfollowsthatVar(X)=Z10x1 22f(x)dx=2Z0x1 22f(x)dx+Z1x1 22f(x)dx201 22Z0f(x)dx+1 22Z1f(x)dx=1 21g() 2+2+1 4g()=1 4(1)g()1 4(1):Since01 2=)(1)�0and�0,thisisacontradiction.Thereforef(x)mustbejusttwopointmassesat0and1.Thesymmetryabout1 2immediatelyimpliesthatthetwopointmassesareequal. 9 ReferencesDeGroot,M.H.(1970),OptimalStatisticalDecisions,NewYork:McGrawHill.Gelman,A.B.,Carlin,J.S.,Stern,H.S.,andRubin,D.B.(1995),BayesianDataAnalysis,London;NewYork:ChapmanandHall.Rice,J.A.(1995),MathematicalStatisticsandDataAnalysis2nded.,Belmont,CA:DuxburyPress.Tanner,A.T.(1996),ToolsforStatisticalInference:MethodsfortheExplorationofPosteriorDistributionsandLikelihoodFunctions,NewYork:Springer-Verlag.Zellner,A.(1996),AnIntroductiontoBayesianInferenceinEconometrics,NewYork;Chich-ester:JohnWiley. MuZhuDepartmentofStatisticsandActuarialScienceUniversityofWaterlooWaterloo,ONN2L3G1Canada m3zhu@uwaterloo.ca ArthurY.LuRenaissanceTechnologiesCorp.600Route25AEastSetauket,NY11733 alu@rentec.com Volume12(2004) j Archive j Index j DataArchive j InfromationService j EditorialBoard j GuidelinesforAuthors j GuidelinesforDataContributors j HomePage j ContactJSE j ASAPublicat

TheCounterintuitiveNoninformativePriorfortheBernoulliFamilyMuZhuUniv - PDF document

TheCounterintuitiveNoninformativePriorfortheBernoulliFamilyMuZhuUniv - PPT Presentation

Share:

Link:

Embed:

Related Contents