Point Estimation: properties of estimators

- finite-sample properties (CB 7.3)
- large-sample properties (CB 10.1)

1 FINITE-SAMPLE PROPERTIES

How an estimator performs for a finite number of observations $n$.

Estimator: $W$. Parameter: $\theta$.

Criteria for evaluating estimators:

- Bias: does $E_\theta W = \theta$?
- Variance of $W$ (you would like an estimator with a smaller variance).

Example: $X_1,\dots,X_n$ i.i.d. $(\mu,\sigma^2)$. Unknown parameters are $\mu$ and $\sigma^2$. Consider:

- $\hat\mu_n \equiv \frac{1}{n}\sum_i X_i$, estimator of $\mu$;
- $\hat\sigma^2_n \equiv \frac{1}{n}\sum_i (X_i-\bar X_n)^2$, estimator of $\sigma^2$.

Bias: $E\hat\mu_n = \frac{1}{n}\,n\mu = \mu$, so unbiased. $V\hat\mu_n = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$.

\[
E\hat\sigma^2_n = E\Big(\frac1n\sum_i (X_i-\bar X_n)^2\Big)
= \frac1n\sum_i\big(EX_i^2 - 2EX_i\bar X_n + E\bar X_n^2\big)
= \frac1n\Big[n(\mu^2+\sigma^2) - 2n\Big(\mu^2+\frac{\sigma^2}{n}\Big) + n\Big(\mu^2+\frac{\sigma^2}{n}\Big)\Big]
= \frac{n-1}{n}\sigma^2.
\]

Hence it is biased. To fix this bias, consider the estimator $s^2_n \equiv \frac{1}{n-1}\sum_i(X_i-\bar X_n)^2$; then $Es^2_n = \sigma^2$ (unbiased).

The mean-squared error (MSE) of $W$ is $E_\theta(W-\theta)^2$, a common criterion for comparing estimators. Decompose:
\[
MSE(W) = V_\theta W + (E_\theta W - \theta)^2 = \text{Variance} + (\text{Bias})^2.
\]
Hence, for an unbiased estimator, $MSE(W) = V_\theta W$.

Example: $X_1,\dots,X_n \sim U[0,\theta]$, with $f(x) = 1/\theta$, $x\in[0,\theta]$.

Consider the estimator $\hat\theta_n \equiv 2\bar X_n$. $E\hat\theta_n = 2\frac1n E\sum_i X_i = 2\frac1n\frac{\theta}{2}n = \theta$, so unbiased. $MSE(\hat\theta_n) = V\hat\theta_n = \frac{4}{n^2}\sum_i VX_i = \frac{\theta^2}{3n}$ (since $VX_i = \theta^2/12$).

Consider the estimator $\tilde\theta_n \equiv \max(X_1,\dots,X_n)$. In order to derive moments, start by deriving the CDF: for $0\le z\le\theta$,
\[
P(\tilde\theta_n\le z) = P(X_1\le z, X_2\le z, \dots, X_n\le z) = \prod_{i=1}^n P(X_i\le z) = \Big(\frac{z}{\theta}\Big)^n,
\]
with $P(\tilde\theta_n\le z)=0$ for $z<0$ and $=1$ for $z>\theta$. Therefore $f_{\tilde\theta_n}(z) = \frac{n z^{n-1}}{\theta^n}$ for $0\le z\le\theta$.
\[
E\tilde\theta_n = \int_0^\theta z\,\frac{nz^{n-1}}{\theta^n}\,dz = \frac{n}{\theta^n}\int_0^\theta z^n\,dz = \frac{n}{n+1}\theta.
\]
Bias$(\tilde\theta_n) = -\theta/(n+1)$. Also $E\tilde\theta_n^2 = \frac{n}{\theta^n}\int_0^\theta z^{n+1}dz = \frac{n}{n+2}\theta^2$. Hence
\[
V\tilde\theta_n = \frac{n}{n+2}\theta^2 - \Big(\frac{n}{n+1}\theta\Big)^2 = \frac{n\,\theta^2}{(n+2)(n+1)^2},
\qquad
MSE(\tilde\theta_n) = \frac{2\theta^2}{(n+2)(n+1)}.
\]

Continue the previous example. Redefine $\tilde\theta_n \equiv \frac{n+1}{n}\max(X_1,\dots,X_n)$. Now both estimators $\hat\theta_n$ and $\tilde\theta_n$ are unbiased. Which is better?
\[
V\hat\theta_n = \frac{\theta^2}{3n} = O(1/n),\qquad
V\tilde\theta_n = \Big(\frac{n+1}{n}\Big)^2 V(\max(X_1,\dots,X_n)) = \frac{\theta^2}{n(n+2)} = O(1/n^2).
\]
Hence, for $n$ large enough, $\tilde\theta_n$ has a smaller variance, and in this sense it is better.

Best unbiased estimator: if you choose the best (in terms of MSE) estimator, and restrict yourself to unbiased estimators, then the best estimator is the one with the lowest variance. A best unbiased estimator is also called the uniform minimum variance unbiased estimator (UMVUE). Formally: an estimator $W$ is a UMVUE of $\theta$ if it satisfies:
(i) $E_\theta W = \theta$, for all $\theta$ (unbiasedness);
(ii) $V_\theta W \le V_\theta\tilde W$, for all $\theta$ and all other unbiased estimators $\tilde W$.

The uniform condition is crucial, because it is always possible to find estimators which have zero variance for a specific value of $\theta$. It is difficult in general to verify that an estimator $W$ is UMVUE, since you have to verify condition (ii) of the definition, that $V_\theta W$ is smaller than the variance of every other unbiased estimator. Luckily, we have an important result for the lowest attainable variance of an estimator.

Theorem 7.3.9 (Cramér-Rao Inequality): Let $X_1,\dots,X_n$ be a sample with joint pdf $f(\vec X|\theta)$, and let $W(\vec X)$ be any estimator satisfying
(i) $\frac{d}{d\theta}E_\theta W(\vec X) = \int \frac{\partial}{\partial\theta}\big[W(\vec x)f(\vec x|\theta)\big]\,d\vec x$;
(ii) $V_\theta W(\vec X) < \infty$.
Then
\[
V_\theta W(\vec X) \ge \frac{\big(\frac{d}{d\theta}E_\theta W(\vec X)\big)^2}{E_\theta\big(\frac{\partial}{\partial\theta}\log f(\vec X|\theta)\big)^2}.
\]

The RHS above is called the Cramér-Rao lower bound (CRLB). Proof: CB, pg. 336. In short, we apply the Cauchy-Schwarz inequality $V(S) \ge \mathrm{cov}(S,T)^2/V(T)$ with $S = W(\vec X)$ and $T = \frac{\partial}{\partial\theta}\log f(\vec X|\theta)$.

The choice of $T$ here may seem a bit arbitrary. To get some intuition, consider Cramér's derivation.¹ Start with the following manipulation of the equality $E_\theta W(X) = \int W(X)f(X|\theta)\,dX$:
\[
\frac{d}{d\theta}E_\theta W(X) = \int W(X)\frac{\partial}{\partial\theta}f(X|\theta)\,dX
= \int \big(W(X)-E_\theta W(X)\big)\frac{\partial}{\partial\theta}f(X|\theta)\,dX
\]
(note $\int E_\theta W(X)\frac{\partial}{\partial\theta}f(X|\theta)\,dX = 0$)
\[
= \int \big(W(X)-E_\theta W(X)\big)\frac{\partial}{\partial\theta}\log f(X|\theta)\,f(X|\theta)\,dX.
\]
Applying the Cauchy-Schwarz inequality, we have
\[
\Big[\frac{d}{d\theta}E_\theta W(X)\Big]^2 \le \mathrm{Var}_\theta W(X)\cdot E_\theta\Big[\frac{\partial}{\partial\theta}\log f(X|\theta)\Big]^2
\quad\text{or}\quad
\mathrm{Var}_\theta W(X) \ge \frac{\big[\frac{d}{d\theta}E_\theta W(X)\big]^2}{E_\theta\big[\frac{\partial}{\partial\theta}\log f(X|\theta)\big]^2}.
\]

The LHS of condition (i) above is $\frac{d}{d\theta}\int W(\vec x)f(\vec x|\theta)\,d\vec x$, so by Leibniz's rule, this condition rules out cases where the support of $X$ depends on $\theta$.

The crucial step in the derivation of the CR bound is the interchange of differentiation and integration, which implies
\[
E_\theta\frac{\partial}{\partial\theta}\log f(\vec X|\theta) = \int \frac{1}{f(\vec x|\theta)}\frac{\partial f(\vec x|\theta)}{\partial\theta}f(\vec x|\theta)\,d\vec x = \frac{\partial}{\partial\theta}\int f(\vec x|\theta)\,d\vec x = \frac{\partial}{\partial\theta}1 = 0. \tag{1}
\]

¹ Cramér, Mathematical Methods of Statistics, p. 475ff.

(skip) The above derivation is noteworthy, because $\frac{\partial}{\partial\theta}\log f(\vec X|\theta) = 0$ is the FOC of the maximum likelihood estimation problem. In the i.i.d. case, this becomes the sample average
\[
\frac1n\sum_i\frac{\partial}{\partial\theta}\log f(x_i|\theta) = 0.
\]
And by the LLN:
\[
\frac1n\sum_i\frac{\partial}{\partial\theta}\log f(x_i|\theta) \xrightarrow{p} E_{\theta_0}\frac{\partial}{\partial\theta}\log f(x_i|\theta),
\]
where $\theta_0$ is the true value of $\theta$. This shows that maximum likelihood estimation of $\theta$ is equivalent to estimation based on the moment condition
\[
E_{\theta_0}\frac{\partial}{\partial\theta}\log f(x_i|\theta) = 0,
\]
which holds only at the true value $\theta = \theta_0$. (Thus MLE is consistent for the true value $\theta_0$, as we'll see later.) (However, note that Eq. (1) holds at all values of $\theta$, not just $\theta_0$.)

[Think about] What if the model is misspecified, in the sense that the true density of $\vec X$ is $g(\vec x)$, and for all $\theta\in\Theta$, $f(\vec x|\theta)\ne g(\vec x)$ (that is, there is no value of the parameter such that the postulated model $f$ coincides with the true model $g$)? Does Eq. (1) still hold? What is MLE looking for?

In the i.i.d. case, the CR lower bound can be simplified.

Corollary 7.3.10: if $X_1,\dots,X_n$ are i.i.d. with density $f(X|\theta)$, then
\[
V_\theta W(\vec X) \ge \frac{\big(\frac{d}{d\theta}E_\theta W(\vec X)\big)^2}{n\,E_\theta\big(\frac{\partial}{\partial\theta}\log f(X|\theta)\big)^2}. \tag{2}
\]

Up to this point, the Cramér-Rao results are not that operational for us to find a best estimator, because the estimator $W(\vec X)$ appears on both sides of the inequality. However, for an unbiased estimator, $E_\theta W(\vec X) = \theta$, so that $\frac{d}{d\theta}E_\theta W(\vec X) = 1$.

Example: $X_1,\dots,X_n$ i.i.d. $N(\mu,\sigma^2)$. What is the CRLB for an unbiased estimator of $\mu$? Unbiased $\Rightarrow$ numerator $= 1$.
\[
\log f(x|\mu) = -\log(\sqrt{2\pi}\,\sigma) - \frac12\Big(\frac{x-\mu}{\sigma}\Big)^2,\qquad
\frac{\partial}{\partial\mu}\log f(x|\mu) = \frac{x-\mu}{\sigma}\cdot\frac1\sigma = \frac{x-\mu}{\sigma^2},
\]
\[
E\Big(\frac{\partial}{\partial\mu}\log f(X|\mu)\Big)^2 = \frac{E(X-\mu)^2}{\sigma^4} = \frac{1}{\sigma^4}VX = \frac{1}{\sigma^2}.
\]
Hence the CRLB $= \frac{1}{n\cdot(1/\sigma^2)} = \frac{\sigma^2}{n}$. This is the variance of the sample mean, so that the sample mean is a UMVUE for $\mu$.

Sometimes we can simplify the denominator of the CRLB further.

Lemma 7.3.11 (Information inequality): if $f(X|\theta)$ satisfies
\[
(*)\qquad \frac{d}{d\theta}E_\theta\frac{\partial}{\partial\theta}\log f(X|\theta) = \int\frac{\partial}{\partial\theta}\Big[\frac{\partial}{\partial\theta}\log f(X|\theta)\,f(X|\theta)\Big]dx,
\]
then
\[
E_\theta\Big(\frac{\partial}{\partial\theta}\log f(X|\theta)\Big)^2 = -E_\theta\frac{\partial^2}{\partial\theta^2}\log f(X|\theta).
\]
Proof: LHS of (*): using Eq. (1) above, we get that the LHS of (*) $= 0$. RHS of (*):
\[
\int\frac{\partial}{\partial\theta}\Big[\frac{\partial\log f}{\partial\theta}f\Big]dx
= \int\frac{\partial^2\log f}{\partial\theta^2}f\,dx + \int\frac1f\Big(\frac{\partial f}{\partial\theta}\Big)^2 dx
= E_\theta\frac{\partial^2\log f}{\partial\theta^2} + E_\theta\Big(\frac{\partial\log f}{\partial\theta}\Big)^2.
\]
Putting the LHS and RHS together yields the desired result.

The LHS of condition (*) is just $\frac{d}{d\theta}\int\frac{\partial}{\partial\theta}\log f(x|\theta)\,f(x|\theta)\,dx$. As before, the crucial step is the interchange of differentiation and integration. (skip) Also, the information inequality depends crucially on the equality $E_\theta\frac{\partial}{\partial\theta}\log f(X|\theta) = 0$, which depends on the correct specification of the model.

Example: for the previous example, consider the CRLB for an unbiased estimator of $\sigma^2$. We can use the information inequality, because condition (*) is satisfied for the normal distribution. Hence:
\[
\frac{\partial}{\partial\sigma^2}\log f(x|\theta) = -\frac{1}{2\sigma^2} + \frac{(x-\mu)^2}{2\sigma^4},\qquad
\frac{\partial^2}{\partial(\sigma^2)^2}\log f(x|\theta) = \frac{1}{2\sigma^4} - \frac{(x-\mu)^2}{\sigma^6},
\]
\[
E\frac{\partial^2}{\partial(\sigma^2)^2}\log f(x|\theta) = \frac{1}{2\sigma^4} - \frac{1}{\sigma^4} = -\frac{1}{2\sigma^4}.
\]
Hence the CRLB is $\frac{2\sigma^4}{n}$.

Example: $X_1,\dots,X_n \sim U[0,\theta]$. Check the conditions for the CRLB for an unbiased estimator $W(\vec X)$ of $\theta$. Here $\frac{d}{d\theta}E_\theta W(\vec X) = 1$ (because it is unbiased), but since $f(\vec x|\theta) = \theta^{-n}$ on $[0,\theta]^n$, interchanging differentiation and integration gives
\[
\int W(\vec x)\frac{\partial}{\partial\theta}f(\vec x|\theta)\,d\vec x = -\frac{n}{\theta}\int W(\vec x)f(\vec x|\theta)\,d\vec x = -\frac{n}{\theta}E_\theta W(\vec X) = -n \ne \frac{d}{d\theta}E_\theta W(\vec X) = 1.
\]
Hence condition (i) of the theorem is not satisfied (the support of $X$ depends on $\theta$, so the boundary terms in Leibniz's rule cannot be ignored).

Loss function optimality

Let $\vec X \sim f(\vec X|\theta)$. Consider a loss function $L(\theta, W(\vec X))$, taking values in $[0,+\infty)$, which penalizes you when your estimator $W(\vec X)$ is far from the true parameter $\theta$. Note that $L(\theta, W(\vec X))$ is a random variable, since $\vec X$ (and hence $W(\vec X)$) is random. Consider estimators which minimize expected loss: that is,
\[
\min_{W(\cdot)} E_\theta L(\theta, W(\vec X)) \equiv \min_{W(\cdot)} R(\theta, W(\cdot)),
\]
where $R(\theta, W(\cdot))$ is the risk function. (Note: the risk function is not a random variable, because $\vec X$ has been integrated out.)

Loss function optimality is a more general criterion than minimum MSE. In fact, because $MSE(W(\vec X)) = E_\theta(W(\vec X)-\theta)^2$, the MSE is actually the risk function associated with the quadratic loss function $L(\theta, W(\vec X)) = (W(\vec X)-\theta)^2$. Other examples of loss functions:

- Absolute error loss: $|W(\vec X)-\theta|$
- Relative quadratic error loss: $\frac{(W(\vec X)-\theta)^2}{|\theta|+1}$

The exercise of minimizing risk takes a given value of $\theta$ as given, so that the minimized risk of an estimator depends on whichever value of $\theta$ you are considering. You might be interested in an estimator which does well regardless of which value of $\theta$ you are considering (analogous to the focus on uniform minimum variance). For this different problem, you want to consider a notion of risk which does not depend on $\theta$. Two possible criteria are:

- Average risk: $\min_{W(\cdot)}\int R(\theta, W(\cdot))\,h(\theta)\,d\theta$, where $h(\theta)$ is some weighting function across $\theta$. (In a Bayesian interpretation, $h(\theta)$ is a prior density over $\theta$.)
- Minmax criterion: $\min_{W(\cdot)}\max_\theta R(\theta, W(\cdot))$. Here you choose the estimator $W(\cdot)$ to minimize the maximum risk $\max_\theta R(\theta, W(\cdot))$, where $\theta$ is set at the worst value. So the minmax optimizer is the best that can be achieved in a worst-case scenario.
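The finite-sample comparison above lends itself to a quick numerical check. The sketch below (standard-library Python only; the function name `simulate` and all parameter values are illustrative choices, not from the notes) estimates the mean and variance of the two unbiased estimators $2\bar X_n$ and $\frac{n+1}{n}\max_i X_i$ for $U[0,\theta]$ data; the simulated variances should be close to $\theta^2/(3n)$ and $\theta^2/(n(n+2))$ respectively.

```python
import random


def simulate(theta=2.0, n=50, reps=20000, seed=0):
    """Monte Carlo comparison of two unbiased estimators of theta for
    X_i ~ U[0, theta]: theta_hat = 2*Xbar and theta_tilde = (n+1)/n * max(X_i).
    Returns (mean_hat, var_hat, mean_tilde, var_tilde)."""
    rng = random.Random(seed)
    hats, tildes = [], []
    for _ in range(reps):
        x = [rng.uniform(0.0, theta) for _ in range(n)]
        hats.append(2.0 * sum(x) / n)                 # method-of-moments estimator
        tildes.append((n + 1) / n * max(x))           # bias-corrected maximum
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((u - m) ** 2 for u in v) / len(v)
    return mean(hats), var(hats), mean(tildes), var(tildes)
```

With $\theta = 2$ and $n = 50$, the theoretical variances are $\theta^2/(3n) = 4/150 \approx 0.027$ and $\theta^2/(n(n+2)) = 4/2600 \approx 0.0015$, so the max-based estimator should show a much smaller simulated variance, matching the $O(1/n)$ versus $O(1/n^2)$ comparison above.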
2 LARGE-SAMPLE PROPERTIES OF ESTIMATORS

Large-sample properties: exploit the LLN and CLT.

Consider data $\{X_1, X_2, \dots\}$ from which we construct a sequence of estimators $\{W_n\} \equiv \{W(\vec X_1), W(\vec X_2), \dots\}$. $W_n$ is a random sequence.

Define: we say that $W_n$ is consistent for a parameter $\theta$ iff the random sequence $W_n$ converges (in some stochastic sense) to $\theta$. Strong consistency obtains when $W_n \xrightarrow{a.s.} \theta$. Weak consistency obtains when $W_n \xrightarrow{p} \theta$. For estimators like sample means, consistency (either weak or strong) follows easily using a LLN.

Define: an M-estimator is an estimator of $\theta$ which is a maximizer of an objective function $Q_n(\theta)$. Examples:

- MLE: $Q_n(\theta) = \frac1n\sum_{i=1}^n\log f(x_i|\theta)$
- Least squares: $Q_n(\theta) = -\sum_{i=1}^n[y_i - g(x_i;\theta)]^2$. OLS is the special case where $g(x_i;\theta) = \alpha + X_i'\beta$.
- GMM: $Q_n(\theta) = -G_n(\theta)'W_n(\theta)G_n(\theta)$, where
  \[
  G_n(\theta) = \Big[\frac1n\sum_{i=1}^n m_1(x_i;\theta),\ \frac1n\sum_{i=1}^n m_2(x_i;\theta),\ \dots,\ \frac1n\sum_{i=1}^n m_M(x_i;\theta)\Big]',
  \]
  an $M\times 1$ vector of sample moment conditions, and $W_n$ is an $M\times M$ weighting matrix. (The minus signs make least squares and GMM maximization problems.)

Notation: for each $\theta\in\Theta$, let $f(x_1,\dots,x_n,\dots;\theta)$ denote the joint density of the data for the given value of $\theta$. For $\theta_0\in\Theta$, we denote the limit objective function $Q_0(\theta) = \mathrm{plim}_{n\to\infty;\,f_{\theta_0}} Q_n(\theta)$ (at each $\theta$).

Consistency of M-estimators

Make the following assumptions:

1. For each $\theta_0\in\Theta$, the limiting objective function $Q_0(\theta)$ is uniquely maximized at $\theta_0$ (identification).
2. The parameter space $\Theta$ is a compact subset of $\mathbb{R}^K$.
3. $Q_0(\theta)$ is continuous in $\theta$.
4. $Q_n(\theta)$ converges uniformly in probability to $Q_0(\theta)$; that is, $\sup_{\theta\in\Theta}|Q_n(\theta)-Q_0(\theta)| \xrightarrow{p} 0$.

Theorem (Consistency of M-estimator): Under assumptions 1, 2, 3, 4, $\hat\theta_n \xrightarrow{p} \theta_0$.

Proof: We need to show: for any arbitrarily small neighborhood $N$ containing $\theta_0$, $P(\hat\theta_n\in N)\to 1$. For $n$ large enough, the uniform convergence condition implies that, for all $\epsilon,\delta>0$,
\[
P\Big(\sup_{\theta\in\Theta}|Q_n(\theta)-Q_0(\theta)| < \epsilon/2\Big) > 1-\delta.
\]
The event $\sup_{\theta\in\Theta}|Q_n(\theta)-Q_0(\theta)| < \epsilon/2$ implies $Q_n(\hat\theta_n)-Q_0(\hat\theta_n) < \epsilon/2$, i.e.
\[
Q_0(\hat\theta_n) > Q_n(\hat\theta_n) - \epsilon/2. \tag{3}
\]
Similarly,
\[
Q_n(\theta_0)-Q_0(\theta_0) > -\epsilon/2 \ \Rightarrow\ Q_n(\theta_0) > Q_0(\theta_0) - \epsilon/2. \tag{4}
\]
Since $\hat\theta_n = \arg\max_\theta Q_n(\theta)$, so that $Q_n(\hat\theta_n) \ge Q_n(\theta_0)$, Eq. (3) implies
\[
Q_0(\hat\theta_n) > Q_n(\theta_0) - \epsilon/2. \tag{5}
\]
Hence, combining Eqs. (4) and (5), we have
\[
Q_0(\hat\theta_n) > Q_0(\theta_0) - \epsilon. \tag{6}
\]
So we have shown that $\sup_{\theta\in\Theta}|Q_n(\theta)-Q_0(\theta)| < \epsilon/2 \Longrightarrow Q_0(\hat\theta_n) > Q_0(\theta_0)-\epsilon$, so
\[
P\big(Q_0(\hat\theta_n) > Q_0(\theta_0)-\epsilon\big) \ge P\Big(\sup_{\theta\in\Theta}|Q_n(\theta)-Q_0(\theta)| < \epsilon/2\Big) \to 1.
\]
Now define $N$ as any open neighborhood in $\mathbb{R}^K$ which contains $\theta_0$, and let $N^c$ be the complement of $N$ in $\mathbb{R}^K$. Then $\Theta\cap N^c$ is compact, so that $\max_{\theta\in\Theta\cap N^c}Q_0(\theta)$ exists. Set $\epsilon = Q_0(\theta_0) - \max_{\theta\in\Theta\cap N^c}Q_0(\theta)$. Then
\[
Q_0(\hat\theta_n) > Q_0(\theta_0)-\epsilon \ \Rightarrow\ Q_0(\hat\theta_n) > \max_{\theta\in\Theta\cap N^c}Q_0(\theta) \ \Rightarrow\ \hat\theta_n\in N,
\]
so
\[
P(\hat\theta_n\in N) \ge P\big(Q_0(\hat\theta_n) > Q_0(\theta_0)-\epsilon\big) \to 1.
\]
Since the argument above holds for any arbitrarily small neighborhood of $\theta_0$, we are done.

In general, the limit objective function $Q_0(\theta) = \mathrm{plim}_{n\to\infty}Q_n(\theta)$ may not be that straightforward to determine. But in many cases, $Q_n(\theta)$ is a sample average of some sort: $Q_n(\theta) = \frac1n\sum_i q(x_i|\theta)$ (e.g. least squares, MLE). Then by a law of large numbers, we conclude that (for all $\theta$)
\[
Q_0(\theta) = \mathrm{plim}\ \frac1n\sum_i q(x_i|\theta) = E_{x_i} q(x_i|\theta),
\]
where $E_{x_i}$ denotes expectation with respect to the true (but unobserved) distribution of $x_i$.

(skip) Most of the time, $\theta_0$ can be interpreted as a true value. But if the model is misspecified, then this interpretation doesn't hold (indeed, under misspecification it is not even clear what the true value is). So a more cautious way to interpret the consistency result is that
\[
\hat\theta_n \xrightarrow{p} \arg\max_\theta Q_0(\theta),
\]
which holds (given the conditions) no matter whether the model is correctly specified.

** Let's unpack the uniform convergence condition. Sufficient conditions for it are:

1. Pointwise convergence: for each $\theta\in\Theta$, $Q_n(\theta)-Q_0(\theta) = o_p(1)$.
2. $Q_n(\theta)$ is stochastically equicontinuous: for every $\epsilon>0$, $\eta>0$ there exist a sequence of random variables $\Delta_n(\epsilon,\eta)$ and $\bar n(\epsilon,\eta)$ such that for all $n\ge\bar n$, $P(|\Delta_n|>\epsilon)<\eta$, and for each $\theta$ there is an open set $N$ containing $\theta$ with
\[
\sup_{\tilde\theta\in N}|Q_n(\tilde\theta)-Q_n(\theta)| \le \Delta_n,\quad \forall n\ge\bar n.
\]
Note that both $\Delta_n$ and $\bar n$ do not depend on $\theta$: it is a uniform result. This is an "in probability" version of the deterministic notion of uniform equicontinuity: we say a sequence of deterministic functions $R_n(\theta)$ is uniformly equicontinuous if, for every $\epsilon>0$, there exist $\delta(\epsilon)$ and $\bar n(\epsilon)$ such that for all $\theta$,
\[
\sup_{\tilde\theta:\,\|\tilde\theta-\theta\|<\delta}|R_n(\tilde\theta)-R_n(\theta)| < \epsilon,\quad \forall n\ge\bar n.
\]
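The consistency theorem can be illustrated numerically. The sketch below is a hypothetical example, not from the notes: it takes $Q_n(\theta) = \frac1n\sum_i\log f(x_i|\theta)$ with $f(x|\theta) = \theta e^{-\theta x}$ (the Exponential($\theta$) density) and maximizes over a finite grid, the grid standing in for the compact parameter space $\Theta$. As $n$ grows, the maximizer should settle near the true $\theta_0$.

```python
import math
import random


def mle_grid(xs, grid):
    """M-estimator: maximize Q_n(theta) = (1/n) sum_i log f(x_i|theta)
    over a finite grid, with f(x|theta) = theta * exp(-theta * x)."""
    def Q_n(theta):
        return sum(math.log(theta) - theta * x for x in xs) / len(xs)
    return max(grid, key=Q_n)


rng = random.Random(1)
theta0 = 1.5                               # true parameter value
grid = [0.05 * k for k in range(1, 101)]   # Theta = {0.05, 0.10, ..., 5.00}
estimates = {}
for n in (50, 5000):
    xs = [rng.expovariate(theta0) for _ in range(n)]
    estimates[n] = mle_grid(xs, grid)      # argmax over the grid
```

Grid search is used purely to mirror the compactness assumption; in practice one would solve the FOC of this problem, which gives $\hat\theta_n = 1/\bar X_n$.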
To understand this more intuitively, consider what we need for consistency. By continuity of $Q_0$, we know that $Q_0(\theta)$ is close to $Q_0(\theta_0)$ for $\theta\in N(\theta_0)$. By pointwise convergence, we have $Q_n(\theta)$ converging to $Q_0(\theta)$ for all $\theta$. However, what we need is that even if $Q_n(\theta)$ is not optimized at $\theta_0$, the optimizer $\hat\theta_n = \arg\max_\theta Q_n(\theta)$ should not be far from $\theta_0$. Pointwise convergence does not guarantee this.

For the last part, we need $Q_n(\theta)$ to be equally close to $Q_0(\theta)$ for all $\theta$, because then the optimizers of $Q_n$ and $Q_0$ cannot be too far apart. However, pointwise convergence is not enough to ensure this "equal closeness": at any given $n$, $Q_n(\theta_0)$ being close to $Q_0(\theta_0)$ does not imply the same at other points. Uniform convergence ensures, roughly speaking, that at any given $n$, $Q_n$ and $Q_0$ are equally close at all points $\theta$. This was exploited in the proof.

Asymptotic normality for M-estimators

Define the score vector
\[
\nabla_\theta Q_n(\tilde\theta) = \Big(\frac{\partial Q_n(\theta)}{\partial\theta_1}\Big|_{\theta=\tilde\theta},\ \dots,\ \frac{\partial Q_n(\theta)}{\partial\theta_K}\Big|_{\theta=\tilde\theta}\Big)'.
\]
Similarly, define the $K\times K$ Hessian matrix
\[
\big[\nabla_{\theta\theta}Q_n(\tilde\theta)\big]_{i,j} = \frac{\partial^2 Q_n(\theta)}{\partial\theta_i\,\partial\theta_j}\Big|_{\theta=\tilde\theta},\qquad 1\le i,j\le K.
\]
Note that the Hessian is symmetric.

Make the following assumptions:

1. $\hat\theta_n = \arg\max_\theta Q_n(\theta) \xrightarrow{p} \theta_0$
2. $\theta_0\in\mathrm{interior}(\Theta)$
3. $Q_n(\theta)$ is twice continuously differentiable in a neighborhood $N$ of $\theta_0$
4. $\sqrt n\,\nabla_\theta Q_n(\theta_0) \xrightarrow{d} N(0,\Sigma)$
5. Uniform convergence of the Hessian: there exists a matrix $H(\theta)$ which is continuous at $\theta_0$ and $\sup_{\theta\in N}\|\nabla_{\theta\theta}Q_n(\theta) - H(\theta)\| \xrightarrow{p} 0$
6. $H(\theta_0)$ is nonsingular.

Theorem (Asymptotic normality for M-estimator): Under assumptions 1-6,
\[
\sqrt n(\hat\theta_n-\theta_0) \xrightarrow{d} N\big(0,\ H_0^{-1}\Sigma H_0^{-1}\big),\qquad H_0\equiv H(\theta_0).
\]
Proof (sketch): By assumptions 1, 2, 3, $\nabla_\theta Q_n(\hat\theta_n) = 0$ (this is the FOC of the maximization problem). Then using the mean value theorem (with $\bar\theta_n$ denoting the mean value):
\[
0 = \nabla_\theta Q_n(\hat\theta_n) = \nabla_\theta Q_n(\theta_0) + \nabla_{\theta\theta}Q_n(\bar\theta_n)(\hat\theta_n-\theta_0)
\]
\[
\Rightarrow\ -\underbrace{\nabla_{\theta\theta}Q_n(\bar\theta_n)}_{\xrightarrow{p}\,H_0\ \text{(using A5)}}\,\sqrt n(\hat\theta_n-\theta_0) = \underbrace{\sqrt n\,\nabla_\theta Q_n(\theta_0)}_{\xrightarrow{d}\,N(0,\Sigma)\ \text{(using A4)}},
\]
so
\[
\sqrt n(\hat\theta_n-\theta_0) \xrightarrow{d} -H(\theta_0)^{-1}N(0,\Sigma) = N\big(0,\ H_0^{-1}\Sigma H_0^{-1}\big).
\]
Note: A5 is a uniform convergence assumption on the sample Hessian. Given the previous discussion, it ensures that the sample Hessian $\nabla_{\theta\theta}Q_n(\theta)$ evaluated at $\bar\theta_n$ (which is close to $\theta_0$) does not vary far from the limit Hessian $H(\theta)$ at $\theta_0$; this is implied by a type of continuity of the sample Hessian close to $\theta_0$.

2.1 Maximum likelihood estimation

The consistency of MLE can follow by application of the theorem above for consistency of M-estimators. Essentially, as we noted above, what the consistency theorem showed was that, for any M-estimator sequence $\hat\theta_n$:
\[
\mathrm{plim}_{n\to\infty}\hat\theta_n = \arg\max_\theta Q_0(\theta).
\]
For MLE, there is a distinct and earlier argument due to Wald (1949), who shows that, in the i.i.d. case, the limiting likelihood function (corresponding to $Q_0(\theta)$) is indeed globally maximized at $\theta_0$, the true value. Thus, we can directly confirm the identification assumption of the M-estimator consistency theorem. This argument is of interest by itself.

Argument (summary of Amemiya, pp. 141-142): Define $\hat\theta_n^{MLE} \equiv \arg\max_\theta\frac1n\sum_i\log f(x_i|\theta)$. Let $\theta_0$ denote the true value.

By the LLN: $\frac1n\sum_i\log f(x_i|\theta) \xrightarrow{p} E_{\theta_0}\log f(x_i|\theta)$, for all $\theta$ (not necessarily the true $\theta_0$).

By Jensen's inequality:
\[
E_{\theta_0}\log\frac{f(x|\theta)}{f(x|\theta_0)} \le \log E_{\theta_0}\frac{f(x|\theta)}{f(x|\theta_0)}.
\]
But $E_{\theta_0}\frac{f(x|\theta)}{f(x|\theta_0)} = \int\frac{f(x|\theta)}{f(x|\theta_0)}f(x|\theta_0)\,dx = 1$, since $f(x|\theta)$ is a density function, for all $\theta$.² Hence:
\[
E_{\theta_0}\log\frac{f(x|\theta)}{f(x|\theta_0)} \le 0,\ \forall\theta
\ \Longrightarrow\ E_{\theta_0}\log f(x|\theta) \le E_{\theta_0}\log f(x|\theta_0),\ \forall\theta
\ \Longrightarrow\ E_{\theta_0}\log f(x|\theta)\ \text{is maximized at the true}\ \theta_0.
\]
This is the identification assumption from the M-estimator consistency theorem.

(skip) Analogously, we also know that, for $\epsilon>0$,
\[
\epsilon_1 = -E_{\theta_0}\log\frac{f(x;\theta_0-\epsilon)}{f(x;\theta_0)} > 0,\qquad
\epsilon_2 = -E_{\theta_0}\log\frac{f(x;\theta_0+\epsilon)}{f(x;\theta_0)} > 0.
\]
By the SLLN, we know that
\[
\frac1n\sum_i\log\frac{f(x_i;\theta_0-\epsilon)}{f(x_i;\theta_0)} = \frac1n\big[\log L_n(\vec x;\theta_0-\epsilon) - \log L_n(\vec x;\theta_0)\big] \xrightarrow{a.s.} -\epsilon_1 < 0,
\]
so that, with probability 1, $\log L_n(\vec x;\theta_0-\epsilon) < \log L_n(\vec x;\theta_0)$ for $n$ large enough. Similarly, for $n$ large enough, $\log L_n(\vec x;\theta_0+\epsilon) < \log L_n(\vec x;\theta_0)$ with probability 1. Hence, for large $n$,
\[
\hat\theta_n \equiv \arg\max_\theta\log L_n(\vec x;\theta) \in (\theta_0-\epsilon,\ \theta_0+\epsilon).
\]
That is, the MLE $\hat\theta_n$ is strongly consistent for $\theta_0$. Note that this argument requires weaker assumptions than the M-estimator consistency theorem above.
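Wald's identification step can likewise be checked numerically. The sketch below is again a hypothetical Exponential($\theta$) example, not from the notes: it approximates the limit objective $E_{\theta_0}\log f(X|\theta)$ by a large-sample average (justified by the LLN) and verifies that its maximizer over a grid sits at $\theta_0$, as the Jensen argument predicts.

```python
import math
import random

# Approximate Q_0(theta) = E_{theta0} log f(X|theta) for
# f(x|theta) = theta * exp(-theta * x), with theta0 = 2.0,
# by a large-sample average of log f(x_i|theta).
rng = random.Random(42)
theta0 = 2.0
xs = [rng.expovariate(theta0) for _ in range(50_000)]
xbar = sum(xs) / len(xs)


def Q0_hat(theta):
    # (1/n) sum_i [log(theta) - theta * x_i] = log(theta) - theta * xbar
    return math.log(theta) - theta * xbar


grid = [0.1 * k for k in range(5, 51)]   # theta in {0.5, 0.6, ..., 5.0}
best = max(grid, key=Q0_hat)             # should land near theta0 = 2.0
```

Analytically, $E_{\theta_0}\log f(X|\theta) = \log\theta - \theta/\theta_0$, whose derivative $1/\theta - 1/\theta_0$ vanishes exactly at $\theta = \theta_0$, in line with the Jensen argument above.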
² In this step, note the importance of assumption A3 in CB, pg. 516. If $x$ has support depending on $\theta$, then $f(x|\theta)$ will not integrate to 1 for all $\theta$.

Now we introduce another idea, efficiency, which is a large-sample analogue of the minimum variance concept.

For the sequence of estimators $W_n$, suppose that $k(n)(W_n-\theta) \xrightarrow{d} N(0,\sigma^2)$, where $k(n)$ is a polynomial in $n$. Then $\sigma^2$ is denoted the asymptotic variance of $W_n$. In usual cases, $k(n) = \sqrt n$. For example, by the CLT, we know that $\sqrt n(\bar X_n-\mu) \xrightarrow{d} N(0,\sigma^2)$; hence, $\sigma^2$ is the asymptotic variance of the sample mean $\bar X_n$.

Definition 10.1.11: An estimator sequence $W_n$ is asymptotically efficient for $\theta$ if $\sqrt n(W_n-\theta) \xrightarrow{d} N(0,v(\theta))$, where the asymptotic variance is
\[
v(\theta) = \frac{1}{E_{\theta_0}\big(\frac{\partial}{\partial\theta}\log f(X|\theta)\big)^2}.
\]
By comparison with Eq. (2), note that this asymptotic variance is equivalent to the CRLB for one observation ($n=1$). A full discussion and justification of efficiency is deep and beyond this course. But recall that $N(0,\,1/I(\theta))$, where $I(\theta) \equiv E_{\theta_0}\big(\frac{\partial}{\partial\theta}\log f(X|\theta)\big)^2$ denotes the Fisher information, is the distribution of the sample mean estimator for the mean parameter of a normal distribution using only one observation. So essentially, asymptotically efficient estimators are asymptotically equivalent to such an estimation problem.

By the asymptotic normality result for M-estimators, we know what the asymptotic distribution for the MLE should be. However, it turns out that, given the information inequality, the MLE's asymptotic distribution can be further simplified.

Theorem 10.1.12: Asymptotic efficiency of MLE.

Proof (following Amemiya, pp. 143-144): $\hat\theta_n^{MLE}$ satisfies the FOC of the MLE problem:
\[
0 = \frac{\partial\log L(\theta|\vec X_n)}{\partial\theta}\Big|_{\theta=\hat\theta_n^{MLE}}.
\]
Using the mean value theorem (with $\bar\theta_n$ denoting the mean value):
\[
0 = \frac{\partial\log L(\theta|\vec X_n)}{\partial\theta}\Big|_{\theta=\theta_0} + \frac{\partial^2\log L(\theta|\vec X_n)}{\partial\theta^2}\Big|_{\theta=\bar\theta_n}\big(\hat\theta_n^{MLE}-\theta_0\big)
\]
\[
\Longrightarrow\ \sqrt n\big(\hat\theta_n-\theta_0\big)
= -\frac{\sqrt n\,\frac{\partial\log L(\theta|\vec X_n)}{\partial\theta}\big|_{\theta_0}}{\frac{\partial^2\log L(\theta|\vec X_n)}{\partial\theta^2}\big|_{\bar\theta_n}}
= -\frac{\sqrt n\,\frac1n\sum_i\frac{\partial\log f(x_i|\theta)}{\partial\theta}\big|_{\theta_0}}{\frac1n\sum_i\frac{\partial^2\log f(x_i|\theta)}{\partial\theta^2}\big|_{\bar\theta_n}}. \tag{**}
\]
Note that, by the LLN,
\[
\frac1n\sum_i\frac{\partial\log f(x_i|\theta)}{\partial\theta}\Big|_{\theta_0} \xrightarrow{p} E_{\theta_0}\frac{\partial\log f(X|\theta)}{\partial\theta}\Big|_{\theta_0} = \int\frac{\partial f(x|\theta)}{\partial\theta}\Big|_{\theta_0}dx.
\]
Using the same argument as in the information inequality result above, the last term is
\[
\int\frac{\partial f}{\partial\theta}dx = \frac{\partial}{\partial\theta}\int f\,dx = 0.
\]
Hence, the CLT can be applied to the numerator of (**):
\[
\text{numerator of (**)} \xrightarrow{d} N\bigg(0,\ E_{\theta_0}\Big(\frac{\partial\log f(x_i|\theta)}{\partial\theta}\Big|_{\theta_0}\Big)^2\bigg).
\]
By the LLN, and uniform convergence of the Hessian term:
\[
\text{denominator of (**)} \xrightarrow{p} E_{\theta_0}\frac{\partial^2\log f(X|\theta)}{\partial\theta^2}\Big|_{\theta_0}.
\]
Hence, by the Slutsky theorem:
\[
\sqrt n\big(\hat\theta_n-\theta_0\big) \xrightarrow{d} N\Bigg(0,\ \frac{E_{\theta_0}\big(\frac{\partial\log f(x_i|\theta)}{\partial\theta}\big|_{\theta_0}\big)^2}{\Big[E_{\theta_0}\frac{\partial^2\log f(X|\theta)}{\partial\theta^2}\big|_{\theta_0}\Big]^2}\Bigg).
\]
By the information inequality:
\[
E_{\theta_0}\Big(\frac{\partial\log f(x_i|\theta)}{\partial\theta}\Big|_{\theta_0}\Big)^2 = -E_{\theta_0}\frac{\partial^2\log f(X|\theta)}{\partial\theta^2}\Big|_{\theta_0},
\]
so that
\[
\sqrt n\big(\hat\theta_n-\theta_0\big) \xrightarrow{d} N\Bigg(0,\ \frac{1}{E_{\theta_0}\big(\frac{\partial\log f(x_i|\theta)}{\partial\theta}\big|_{\theta_0}\big)^2}\Bigg),
\]
so that the asymptotic variance is the CRLB. Hence, the asymptotic approximation for the finite-sample distribution is
\[
\hat\theta_n^{MLE} \overset{a}{\sim} N\Bigg(\theta_0,\ \frac1n\cdot\frac{1}{E_{\theta_0}\big(\frac{\partial\log f(x_i|\theta)}{\partial\theta}\big|_{\theta_0}\big)^2}\Bigg).
\]
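As a closing check on the asymptotic efficiency result, here is a Monte Carlo sketch for a hypothetical Exponential($\theta$) model (not an example from the notes). There, $\log f(x|\theta) = \log\theta - \theta x$, so $I(\theta_0) = E_{\theta_0}(1/\theta_0 - X)^2 = \mathrm{Var}(X) = 1/\theta_0^2$, and the MLE is $\hat\theta_n = 1/\bar X_n$; the theorem then predicts $\sqrt n(\hat\theta_n - \theta_0) \approx N(0, \theta_0^2)$, i.e. asymptotic variance equal to the CRLB $\theta_0^2 = 4$ when $\theta_0 = 2$.

```python
import math
import random

# Monte Carlo check of MLE asymptotic efficiency for Exponential(theta):
# sqrt(n) * (theta_hat - theta0) should be approximately N(0, theta0^2),
# since the Fisher information is I(theta0) = 1/theta0^2.
rng = random.Random(7)
theta0, n, reps = 2.0, 400, 5000
zs = []
for _ in range(reps):
    xbar = sum(rng.expovariate(theta0) for _ in range(n)) / n
    theta_hat = 1.0 / xbar                      # MLE of the exponential rate
    zs.append(math.sqrt(n) * (theta_hat - theta0))
mean_z = sum(zs) / reps
var_z = sum((z - mean_z) ** 2 for z in zs) / reps   # compare with theta0^2 = 4
```

The simulated variance of $\sqrt n(\hat\theta_n-\theta_0)$ should be close to $\theta_0^2 = 4$ (slightly above it in finite samples, since $1/\bar X_n$ is biased upward for finite $n$), illustrating that the MLE attains the CRLB asymptotically.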