
The Counter-intuitive Non-informative Prior for the Bernoulli Family

Mu Zhu, University of Waterloo
Arthur Y. Lu, Renaissance Technologies Corp.

Journal of Statistics Education Volume 12, Number 2 (2004), http://www.amstat.org/publications/jse/v12n2/zhu.pdf

Copyright (c) 2004 by Mu Zhu and Arthur Y. Lu, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Beta distribution; Conjugate priors; Maximum likelihood estimation; Posterior mean.

Abstract

In Bayesian statistics, the choice of the prior distribution is often controversial. Different rules for selecting priors have been suggested in the literature, which, sometimes, produce priors that are difficult for the students to understand intuitively. In this article, we use a simple heuristic to illustrate to the students the rather counter-intuitive fact that flat priors are not necessarily non-informative; and non-informative priors are not necessarily flat.

1 Introduction

In Bayesian analysis, selecting priors using Jeffreys' rule (e.g., Tanner 1996; Gelman et al. 1995) can yield some rather counter-intuitive results that are hard for students to grasp. Consider the following simple scenario: Let $X_1, X_2, \ldots, X_n$ be i.i.d. observations from the Bernoulli($p$) distribution. To estimate $p$, a Bayesian analyst would put a prior distribution on $p$ and use the posterior distribution of $p$ to draw various conclusions, e.g., estimating $p$ with the posterior mean. When there is no strong prior opinion on what $p$ is, it is desirable to pick a prior that is non-informative. In this simple case, it is most intuitive to use the uniform distribution on [0, 1] as a non-informative prior; it is non-informative because it says that all possible values of $p$ are equally likely a priori. However, a non-informative prior constructed using Jeffreys' rule is of the form (see e.g., Gelman et al. 1995)

$\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}.$   (1)

Jeffreys' rule is motivated by an invariance argument: Suppose one picks $\pi_p(p)$ as the prior for $p$ according to a certain rule. In order for $\pi_p(p)$ to be non-informative, it is argued that the parameterization must not influence the choice of $\pi_p(p)$; i.e., if one re-parameterizes the problem in terms of $\phi = h(p)$, then the rule must pick

$\pi_\phi(\phi) = \pi_p(h^{-1}(\phi)) \left| \frac{dp}{d\phi} \right|$

as the prior for $\phi$. Given $p$, let $f(x|p)$ be the likelihood function. Jeffreys' rule is to pick $\pi_p(p) \propto (I(p))^{1/2}$ as a prior, where

$I(p) = -E\left[ \frac{d^2 \log f(x|p)}{dp^2} \right]$

is the Fisher information. To see that this is invariant with respect to parameterization, suppose we re-parameterize in terms of $\phi = h(p)$; then

$I(\phi) = -E\left[ \frac{d^2 \log f(x|\phi)}{d\phi^2} \right] = -E\left[ \frac{d^2 \log f(x|p)}{dp^2} \right] \left( \frac{dp}{d\phi} \right)^2 = I(p) \left( \frac{dp}{d\phi} \right)^2.$

Applying Jeffreys' rule, one would pick a prior on $\phi$ as

$\pi(\phi) \propto (I(\phi))^{1/2} = (I(p))^{1/2} \left| \frac{dp}{d\phi} \right| = \pi_p(p) \left| \frac{dp}{d\phi} \right|,$

which satisfies the invariance argument.

Jeffreys' prior for this simple problem can be quite counter-intuitive. Under the prior (1), it appears that some values of $p$ are more likely than others (see e.g., Figure 1). Therefore, intuitively, it appears that this prior is actually quite informative. This is a very difficult point to explain to the students. In this article, we construct a simple (albeit naive) argument and illustrate to the students why the uniform prior is not necessarily the most non-informative.
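The invariance calculation above rests on the Bernoulli Fisher information being $I(p) = 1/(p(1-p))$, whose square root gives prior (1). A minimal numerical sketch (not from the paper; plain Python) checks this identity directly from the definition:

```python
import math

def fisher_information(p):
    """I(p) = -E[d^2 log f(X|p)/dp^2] for X ~ Bernoulli(p).

    Since log f(x|p) = x log p + (1-x) log(1-p), the second derivative
    is -x/p^2 - (1-x)/(1-p)^2; take its expectation over x in {0, 1},
    weighted by P(X=1) = p and P(X=0) = 1-p.
    """
    def second_deriv(x):
        return -x / p**2 - (1 - x) / (1 - p)**2
    return -(p * second_deriv(1) + (1 - p) * second_deriv(0))

for p in (0.1, 0.25, 0.5, 0.9):
    # Closed form: I(p) = 1/(p(1-p)); Jeffreys' prior is its square root.
    assert math.isclose(fisher_information(p), 1 / (p * (1 - p)))
```

The assertions pass for every test point, confirming that Jeffreys' prior for the Bernoulli family is proportional to $p^{-1/2}(1-p)^{-1/2}$.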

We will not rely on the Fisher information or Jeffreys' invariance argument. Instead, we rely on a very simple and naive heuristic to judge the non-informativeness of a prior: Since the maximum likelihood estimator (MLE) is not affected by any prior opinion, we simply ask: is there a prior which would produce a Bayesian estimate (e.g., the posterior mean) that coincides with the MLE? If so, that prior could be regarded as non-informative, since the prior opinion exerts no influence on the final estimate whatsoever. Using this naive heuristic, we can see that the uniform prior is actually more informative than Jeffreys' prior (1); whereas the least informative prior is, surprisingly enough, an extremely "opinionated" distribution approaching two point masses at 0 and 1!

We emphasize here that it is not our intention to imply that our naive heuristic is the best or even an appropriate point of view for judging the non-informativeness of different priors in Bayesian analysis. We only provide this argument as an interesting demonstration that can be used in the classroom.

2 The Maximum Likelihood Estimator (MLE)

Without considering any prior opinion, a typical approach for estimating $p$ is the method of maximum likelihood. Let $X_i$ be a Bernoulli random variable with

$P(X_i = 1) = p, \quad P(X_i = 0) = 1 - p;$

the log-likelihood function for the Bernoulli distribution is

$l(p) = \sum_{i=1}^n \log p^{x_i} (1-p)^{1-x_i} = \sum_{i=1}^n \left[ x_i \log p + (1 - x_i) \log(1 - p) \right].$

We shall write $x = \sum_{i=1}^n x_i$ throughout the article. To maximize this log-likelihood, the first-order condition is

$\frac{x}{p} - \frac{n - x}{1 - p} = 0,$

which gives $\hat{p}_{mle} = x/n$.

3 The Bayesian Estimator

The Bayesian approach to estimation starts with a prior distribution on the parameter of interest. Often, we have no prior knowledge on $p$.
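The MLE derivation from Section 2 can be sketched numerically; the sample below is a toy assumption for illustration, and the check confirms that $\hat{p}_{mle} = x/n$ does maximize the log-likelihood:

```python
import math

def log_likelihood(p, xs):
    """l(p) = sum_i [x_i log p + (1 - x_i) log(1 - p)]."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

def bernoulli_mle(xs):
    """The first-order condition x/p - (n-x)/(1-p) = 0 gives p_hat = x/n."""
    return sum(xs) / len(xs)

xs = [1, 0, 1, 1, 0, 1, 0, 1]          # toy data: x = 5, n = 8
p_hat = bernoulli_mle(xs)              # 5/8
# p_hat achieves a log-likelihood at least as high as nearby values of p:
assert all(log_likelihood(p_hat, xs) >= log_likelihood(p, xs)
           for p in (0.1, 0.3, 0.5, 0.7, 0.9))
```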

To reflect such lack of knowledge, the most intuitive choice is to put a uniform prior on $p$, i.e., $\pi_0(p) = 1$ for $p \in [0, 1]$. This says that, a priori, $p$ could be anything between 0 and 1 with equal chance. Then, the posterior distribution of $p$ is given by Bayes' theorem:

$\pi_1(p|x_1,\ldots,x_n) = \frac{f(x_1,\ldots,x_n|p)\,\pi_0(p)}{\int_0^1 f(x_1,\ldots,x_n|p)\,\pi_0(p)\,dp}.$

We shall find this posterior distribution more generally below using the idea of the conjugate prior.

3.1 The Beta Conjugate Prior

Consider the Beta($\alpha$, $\beta$) distribution as the prior for $p$, i.e.,

$\pi_0(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1} (1-p)^{\beta-1}.$

The uniform distribution is a special case of the Beta distribution, with $\alpha = \beta = 1$. The reason why one would consider using the Beta distribution as the prior is because the Beta distribution and the Bernoulli distribution form a conjugate pair, so that the posterior distribution is still a Beta (e.g., DeGroot 1970). This gives us some analytic convenience. To see this, note that

$\pi_1(p) \propto f(x_1, x_2, \ldots, x_n|p)\,\pi_0(p)$   (2)
$= p^x (1-p)^{n-x}\, p^{\alpha-1} (1-p)^{\beta-1}$   (3)
$= p^{\alpha+x-1} (1-p)^{\beta+n-x-1},$   (4)

which is Beta($\alpha + x$, $\beta + n - x$). The following properties of the Beta distribution are useful: if $p \sim$ Beta($\alpha$, $\beta$), then

$E(p) = \frac{\alpha}{\alpha+\beta},$   (5)

and

$Var(p) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$   (6)

We can now write down a general formula for obtaining the Bayesian posterior mean, $\hat{p}_{bayes}$: if $\pi_0(p)$ is Beta($\alpha$, $\beta$), then

$\hat{p}_{bayes} = \frac{\alpha + x}{\alpha + \beta + n}.$   (7)

Therefore, under the uniform prior (i.e., $\alpha = \beta = 1$), the posterior mean is

$\hat{p}_{bayes} = \frac{1 + x}{2 + n}.$

Remark 1. For the simple purpose of demonstration, we use the posterior mean as the Bayesian estimate. However, we emphasize that generally one could use the posterior median or mode as a Bayesian point estimate as well.
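Equation (7) and its uniform-prior special case can be sketched with exact rational arithmetic; the sample (7 successes in 10 trials) is a hypothetical choice for illustration:

```python
from fractions import Fraction

def posterior_mean(alpha, beta, x, n):
    """Equation (7): posterior mean under a Beta(alpha, beta) prior."""
    return Fraction(alpha + x) / Fraction(alpha + beta + n)

x, n = 7, 10
# Uniform prior (alpha = beta = 1) gives (1 + x)/(2 + n), per the text:
assert posterior_mean(1, 1, x, n) == Fraction(1 + x, 2 + n)
# Jeffreys' prior corresponds to alpha = beta = 1/2 (see Section 3.2):
print(posterior_mean(Fraction(1, 2), Fraction(1, 2), x, n))
```

Using `Fraction` keeps the pseudo-observation bookkeeping exact, so the shrinkage of the estimate away from the MLE $x/n = 7/10$ is visible without floating-point noise.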

3.2 The Effects of Different Priors

How do the parameters $\alpha$ and $\beta$ (i.e., using different priors in the Beta family) affect the outcome? For this discussion, we focus on a particular sub-family of Beta distributions with $\alpha = \beta = c$, i.e., $\pi_0(p)$ is Beta($c$, $c$). Again, the uniform distribution is a member of this sub-family, with $c = 1$. Furthermore, if the prior on $p$ is Beta($c$, $c$), then

$E(p) = \frac{c}{c+c} = \frac{1}{2} \quad \text{and} \quad Var(p) = \frac{c^2}{4c^2(2c+1)} = \frac{1}{4(2c+1)}.$   (8)

It is clear from (7) that the prior parameter $c$ influences the posterior mean as if an extra $2c$ observations, equally split between zeros and ones, were added to the sample. Therefore, the larger $c$ is, the more influence the prior will have on the posterior mean. The uniform prior ($c = 1$) adds two observations; Jeffreys' prior, which according to equation (1) corresponds to $c = 1/2$, adds one extra observation. It is in this sense that Jeffreys' prior is actually less influential than the uniform prior.

Since the prior variance is clearly a decreasing function in $c$ (8), this also says that the larger the prior variance, the less influential the prior is, which makes intuitive sense: a large prior variance would normally indicate a relatively weak prior opinion. In view of this, two extreme cases become quite interesting: $c \to \infty$ and $c \to 0$.

Case 1: $c \to \infty$. It is easy to see from (7) that as $c \to \infty$, we have $\hat{p}_{bayes} = 1/2$, the same as the prior mean regardless of what the observed outcomes are. In other words, our prior opinion of $p$ is so strong that it cannot be changed by the observed outcomes. From (8), we see that the prior variance approaches 0 as $c \to \infty$. This is, again, consistent with our intuition: the small prior variance means that one's prior belief is heavily concentrated on the point $p = 1/2$, so heavy that the observed outcomes could not alter this belief in any way.
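The "$2c$ pseudo-observations" reading of equation (7), and the Case 1 limit, can be illustrated with a short sketch; the sample of 9 successes in 10 trials is an assumption chosen to make the pull toward $1/2$ visible:

```python
def posterior_mean_c(c, x, n):
    """Equation (7) with alpha = beta = c: (c + x) / (2c + n)."""
    return (c + x) / (2 * c + n)

x, n = 9, 10                  # MLE = 0.9
for c in (0.5, 1, 10, 1000):  # growing c drags the estimate toward 1/2
    print(c, posterior_mean_c(c, x, n))

# Case 1: as c -> infinity the posterior mean approaches the prior
# mean 1/2, regardless of the data.
assert abs(posterior_mean_c(10**9, x, n) - 0.5) < 1e-8
```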

Case 2: $c \to 0$. Following the same logic, it is clear from (7) that the least influential prior in our sub-family would have been the one with $c = 0$. Using such a prior, the posterior mean would have been the same as the MLE; i.e., it would have been entirely determined by the observed outcomes. But the Beta(0, 0) distribution is not defined. Therefore, we consider the distribution Beta($\epsilon$, $\epsilon$) for arbitrarily small $\epsilon > 0$. To understand the behavior of this distribution, we can examine the limiting distribution as $c \to 0$:

$B_{0,0} = \lim_{c \to 0} \text{Beta}(c, c).$

Theorem 1. The limiting distribution $B_{0,0}$ consists of two equal point masses at 0 and 1.

From (8), it can be seen that the variance of $B_{0,0}$ is $1/4$; the above theorem (see the Appendix for a proof) is due to the following fact: for a symmetric distribution with a compact support on the unit interval to have variance $1/4$, it must consist of just two equal point masses at 0 and 1.

Theorem 1 says that the prior distribution Beta($\epsilon$, $\epsilon$) with arbitrarily small $\epsilon > 0$ approaches two point masses at 0 and 1. Such a prior belief, of course, seems extremely strong, since it says $p$ is essentially either 0 or 1. Intuitively, one would consider such a strong prior belief to be extremely unreasonable, but this is the prior that would yield a posterior mean as close as possible to the MLE. In this sense, the prior Beta($\epsilon$, $\epsilon$) for arbitrarily small $\epsilon > 0$, which would otherwise appear strong, could actually be regarded as the least influential prior in this family.

Remark 2. Theorem 1 states that the limiting distribution $B_{0,0}$ is the Bernoulli($1/2$) distribution, which, strictly speaking, is not a member of the Beta family.
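Case 2 can be sketched the same way (the sample and the value of $\epsilon$ are assumptions for illustration): with a tiny $\epsilon$ the posterior mean is essentially the MLE, even though draws from Beta($\epsilon$, $\epsilon$) pile up near 0 and 1, as Theorem 1 suggests:

```python
import random

def posterior_mean_c(c, x, n):
    """Equation (7) with alpha = beta = c."""
    return (c + x) / (2 * c + n)

x, n, eps = 9, 10, 1e-8
# The Beta(eps, eps) posterior mean is as close to the MLE as we like:
assert abs(posterior_mean_c(eps, x, n) - x / n) < 1e-8

# Yet draws from this prior concentrate near the endpoints, echoing
# Theorem 1 (seeded so the simulation is reproducible):
random.seed(0)
draws = [random.betavariate(0.01, 0.01) for _ in range(1000)]
near_endpoints = sum(d < 0.01 or d > 0.99 for d in draws)
assert near_endpoints > 900
```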

Moreover, note that if $B_{0,0}$ is actually used as a prior, then the posterior distribution is not defined unless all the observations $X_1, X_2, \ldots, X_n$ are identical. Therefore $B_{0,0}$ is in itself quite an influential prior, but Beta($\epsilon$, $\epsilon$) is not, although for arbitrarily small $\epsilon$, it encodes essentially the same prior opinion as $B_{0,0}$, whose predictive distribution puts half probability on all ones and half on all zeros.

4 The Non-informative Prior

The lesson from this discussion is extremely interesting; it tells us that flat priors (such as the uniform prior) are not always the same thing as non-informative priors. A seemingly informative prior can actually be quite weak in the sense that it does not influence the posterior opinion very much. It is clear in our example that the MLE is the result of using a weak prior, whereas the most intuitive non-informative prior (the uniform prior) is not as weak or non-informative as one would have thought. We've also seen that the least influential prior, Beta($\epsilon$, $\epsilon$) for arbitrarily small $\epsilon > 0$, is also the one with the largest variance in the sub-family, whereas the most "stubborn" prior (when $c \to \infty$) is also the one with the smallest variance (8). Generally, a larger variance would also imply a flatter distribution. But since the family of Beta distributions has compact support on the unit interval, the variance is maximized by a rather extreme prior instead of the usual flat prior, Beta(1, 1).

Remark 3. Another common prior that is used in the literature for this problem (see e.g., Zellner 1996, p. 40) is an improper prior of the form

$\pi_0(p) \propto \frac{1}{p(1-p)},$   (9)

also called the Haldane prior. It is improper because $\int_0^1 \pi_0(p)\,dp = \infty$ and hence it is not a proper distribution function.

Zellner (1996) notes that this improper prior corresponds to putting a flat prior on

$\phi = \log \frac{p}{1-p},$

the log-odds. In other words, on the log-odds scale, this improper prior is the usual flat, non-informative prior. Other than the improperness, however, we can see that (9) is closely related to the limiting distribution $B_{0,0}$: they are two different ways of expressing the otherwise undefined Beta(0, 0) distribution.

[Figure 1: The Beta family of distributions on [0, 1]. When $\alpha = \beta = c$, the Beta distribution is symmetric. When $c = 1$, it is the uniform distribution. When $c > 1$, it has a maximum at $1/2$. When $c < 1$, it has a minimum at $1/2$ and two maxima at 0 and 1.]

Remark 4. One can use this simple heuristic in other situations as well to evaluate the non-informativeness of different priors. For example, consider $X_1, X_2, \ldots, X_n \sim N(\theta, \sigma^2)$, where $\sigma^2$ is known. Let $\pi(\theta) \sim N(\theta_0, \sigma_0^2)$ be the prior on $\theta$. Then it can be shown (e.g., Tanner 1996, p. 17) that the posterior mean is

$\theta_0 \frac{1/\sigma_0^2}{1/\sigma_0^2 + n/\sigma^2} + \bar{x} \frac{n/\sigma^2}{1/\sigma_0^2 + n/\sigma^2}.$

It is then clear that the posterior mean agrees with the MLE $\bar{x}$ if and only if $\sigma_0 \to \infty$, i.e., if we put a prior on $\theta$ that is essentially flat. In this case, using our simple heuristic, we arrive at the intuitive conclusion that the most non-informative prior is indeed flat.

Remark 5. In Remark 1, we emphasized that the choice to focus on the posterior mean as the Bayesian estimate is based on convenience. The posterior Beta distribution is generally not symmetric, so the posterior mean, median and mode do not coincide. This means some of the phenomena illustrated above will not hold if a different posterior point estimate is used. We do not worry about this point so much here since the material presented here is only meant as a classroom demonstration to help the students appreciate why non-informative priors are not always flat.
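Remark 4's precision-weighted posterior mean can be sketched as follows; the sample summary values ($\bar{x}$, $n$, $\sigma^2$) are assumptions for illustration:

```python
def normal_posterior_mean(theta0, tau0_sq, xbar, n, sigma_sq):
    """Posterior mean for N(theta, sigma^2) data under a N(theta0, tau0^2)
    prior on theta: a precision-weighted average of theta0 and xbar."""
    w_prior = 1 / tau0_sq       # prior precision 1/sigma0^2
    w_data = n / sigma_sq       # data precision n/sigma^2
    return (theta0 * w_prior + xbar * w_data) / (w_prior + w_data)

xbar, n, sigma_sq = 2.5, 20, 1.0
for tau0_sq in (0.1, 1.0, 100.0):      # widening the prior...
    print(tau0_sq, normal_posterior_mean(0.0, tau0_sq, xbar, n, sigma_sq))

# ...and only an essentially flat prior (sigma0 -> infinity) recovers
# the MLE, the sample mean xbar:
assert abs(normal_posterior_mean(0.0, 1e12, xbar, n, sigma_sq) - xbar) < 1e-9
```

In contrast to the Bernoulli case, here the least influential prior really is the flat one, which is the point of Remark 4.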

However, the posterior mean, median and mode do agree for the normal family, but of course, there the prior regarded as non-informative by the current heuristic argument is essentially flat (see Remark 4).

5 Conclusion

That the posterior mean coincides with the MLE is not necessarily the right criterion to judge whether a prior is non-informative, but it provides an extremely simple and effective demonstration of why sometimes flat priors are not necessarily non-informative, while non-informative priors are not always flat, without having to resort to deep statistical theory. This simple argument, in our experience, has proven relatively easy for the students to appreciate. The proofs below in the appendix can also be easily understood with basic statistics knowledge at the level of, say, Rice (1995). Since it is clear from this discussion that different priors may lead to different posterior estimates, it is often desirable to report several posterior estimates using different priors, especially when the choice of the prior is fairly arbitrary, i.e., when there is no strong prior opinion.

Acknowledgment

The first author would like to thank Hugh Chipman for an interesting discussion on this subject, as well as the Natural Sciences and Engineering Research Council of Canada for providing partial research support. The authors are also grateful to the Editor, an Associate Editor and two referees for their suggestions, corrections and encouragement.

Appendix

This appendix contains a relatively simple proof of Theorem 1. Let $X$ be a random variable with $B_{0,0}$ as its distribution; let $f(x)$ be its probability function.

Clearly $f(x)$ is symmetric about $1/2$. From (8), we know that $Var(X) = 1/4$. Now suppose $f(x)$ is not just two point masses at 0 and 1. Then there exist $0 < \delta < 1/2$ and $\epsilon > 0$ such that

$g(\delta) \equiv \int_\delta^{1-\delta} f(x)\,dx > \epsilon.$

Because $f(x)$ is symmetric about $1/2$, it follows that

$Var(X) = \int_0^1 \left( x - \frac{1}{2} \right)^2 f(x)\,dx = 2\int_0^\delta \left( x - \frac{1}{2} \right)^2 f(x)\,dx + \int_\delta^{1-\delta} \left( x - \frac{1}{2} \right)^2 f(x)\,dx$

$\le 2\left( \frac{1}{2} \right)^2 \int_0^\delta f(x)\,dx + \left( \frac{1}{2} - \delta \right)^2 \int_\delta^{1-\delta} f(x)\,dx = \frac{1}{4}\bigl(1 - g(\delta)\bigr) + \left( \frac{1}{2} - \delta \right)^2 g(\delta)$

$= \frac{1}{4} - \delta(1-\delta)\,g(\delta) \le \frac{1}{4} - \delta(1-\delta)\,\epsilon < \frac{1}{4}.$

Since $0 < \delta < 1/2$ implies $\delta(1-\delta) > 0$, and $\epsilon > 0$, this is a contradiction. Therefore $f(x)$ must be just two point masses at 0 and 1. The symmetry about $1/2$ immediately implies that the two point masses are equal.

References

DeGroot, M. H. (1970), Optimal Statistical Decisions, New York: McGraw-Hill.

Gelman, A. B., Carlin, J. S., Stern, H. S., and Rubin, D. B. (1995), Bayesian Data Analysis, London; New York: Chapman and Hall.

Rice, J. A. (1995), Mathematical Statistics and Data Analysis, 2nd ed., Belmont, CA: Duxbury Press.

Tanner, M. A. (1996), Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, New York: Springer-Verlag.

Zellner, A. (1996), An Introduction to Bayesian Inference in Econometrics, New York; Chichester: John Wiley.

Mu Zhu
Department of Statistics and Actuarial Science
University of Waterloo
Waterloo, ON N2L 3G1
Canada
m3zhu@uwaterloo.ca

Arthur Y. Lu
Renaissance Technologies Corp.
600 Route 25A
East Setauket, NY 11733
alu@rentec.com
