CDF/ANAL/PUBLIC/5776
Version 2.10
August 20, 2002

Everything you always wanted to know about pulls

Luc Demortier (1) and Louis Lyons (2)
(1) The Rockefeller University, (2) University of Oxford

Abstract

This note explains various ways to define a "pull" or "stretch". It discusses applications of this concept in problems of parameter estimation (constrained and unconstrained fits) and hypothesis testing. Monte Carlo methods are described to characterize pull distributions in situations involving small samples.

1 Introduction

If a random variable $x$ is generated repeatedly with a Gaussian distribution of mean $\mu$ and width $\sigma$, then it is almost a tautology that the pull

$$ g = \frac{x - \mu}{\sigma} \tag{1} $$

will be distributed as a standard Gaussian with mean zero and unit width. Thanks to the central limit theorem, this simple property can be applied in a wide range of situations from hypothesis testing to parameter estimation, where pulls provide evidence for various forms of bias and allow the verification of error coverage.

Section 2 introduces three definitions of pull in the context of parameter estimation and describes a couple of simple applications. These applications boil down to the comparison of a pull distribution with the expectation of a standard Gaussian. In contrast, in hypothesis testing a single pull is used as a test statistic to decide on the consistency of two measurements. This is described in section 3. Section 4 considers non-asymptotic situations and
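how to define pulls in the presence of asymmetric errors.

The statement that pull distributions are expected to be standard Gaussian implies a properly constructed ensemble of real or simulated measurements on which pulls are defined. The question of how to construct simulated ensembles is studied in section 5, where we also examine the effect of sample size on pull distributions. Finally, we give some general recommendations on the use of pulls in section 6.

2 Pulls in parameter estimation

Two of the most popular methods of parameter estimation are least-squares and maximum-likelihood. In the former, one minimises a weighted sum of squares

$$ S = \sum_i \left( \frac{y_i^{\rm exp} - y_i^{\rm pred}(\tau)}{\sigma_i} \right)^2 \tag{2} $$

where $y_i^{\rm exp} \pm \sigma_i$ are experimental measurements, and $y_i^{\rm pred}$ are the predicted values, which depend on one or more parameters $\tau$. Then $\tau_m$, the best value of the parameter$^1$, is determined by minimising $S$ with respect to $\tau$, and its error $\sigma_m$ is given for example by $1/\sqrt{\frac{1}{2}\,\frac{d^2 S}{d\tau^2}}$. Alternatively $\tau$ could be determined by maximising the likelihood

$$ L = \prod_i f\left( y_i^{\rm exp};\, y_i^{\rm pred}(\tau),\, \sigma_i \right) \tag{3} $$

where $f$ is the probability density for observing $y_i^{\rm exp}$ when the predicted value is $y_i^{\rm pred}(\tau)$.

Footnote 1: Although $\tau$ is determined by a fit to the data, we denote its fitted value by $\tau_m$ (m for 'measured'), to distinguish it from $\tau_f$ (f for 'fitted') when we include some constraint in the fit (see for example equation 4). This is consistent with the way we refer to the measured momentum of a track as derived from a fit to the hits along its path, as opposed to the fitted momentum, from a kinematic fit incorporating energy and momentum conservation.

It is also possible to perform a constrained fit, when other information on the parameter(s) is available. Thus if $\tau$ has previously been measured as $\tau_c \pm \sigma_c$, equations (2) and (3) would be modified to

$$ S = \left( \frac{\tau - \tau_c}{\sigma_c} \right)^2 + \sum_i \left( \frac{y_i^{\rm exp} - y_i^{\rm pred}(\tau)}{\sigma_i} \right)^2 \tag{4} $$

and

$$ L = \frac{e^{-\frac{1}{2}\left(\frac{\tau - \tau_c}{\sigma_c}\right)^2}}{\sqrt{2\pi}\,\sigma_c} \prod_i f\left( y_i^{\rm exp};\, y_i^{\rm pred}(\tau),\, \sigma_i \right) \tag{5} $$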
where the Gaussian factor gives the probability density for observing $\tau_c$ if the true value is $\tau$. It is assumed that the previous and the current measurements are uncorrelated. For large samples (or for a linear model with Gaussian uncertainties), the second factor in equation (5) is Gaussian in $\tau$, and $\tau_f \pm \sigma_f$, the fit result that incorporates the constraint $\tau_c \pm \sigma_c$, is given by:

$$ \tau_f = \frac{\tau_m/\sigma_m^2 + \tau_c/\sigma_c^2}{1/\sigma_m^2 + 1/\sigma_c^2} \tag{6} $$

$$ \sigma_f = \frac{1}{\sqrt{1/\sigma_m^2 + 1/\sigma_c^2}} \tag{7} $$

2.1 Unconstrained fits

Suppose we obtain a set of measurements of a parameter $\tau$, whose "true" or "generated" value is $\tau_g$. The measurements are statistical fluctuations around $\tau_g$ and could, for example, follow an exponential time distribution

$$ \frac{1}{\tau_g}\, e^{-t/\tau_g}. \tag{8} $$

If a histogram is produced, there would be Poisson fluctuations on the numbers in each bin. A fit to the data would give a value $\tau_m \pm \sigma_m$. Then, for a large number of events in the distribution, we would expect $\tau_m$ to be approximately Gaussian distributed about $\tau_g$, even though the distribution (8) is non-Gaussian. For many repetitions of this procedure, the pull

$$ g = \frac{\tau_m - \tau_g}{\sigma_m} \tag{9} $$

should be a standard Gaussian. This is still true when the fit involves additional parameters, as long as the error $\sigma_m$ has been correctly calculated.

The above definition of pull can be used for checking the properties of a fitting algorithm with large numbers of pseudo-experiments. However, when confronted with real data, the "true" value $\tau_g$ is not known and definition (9) is useless. Fortunately there exists an alternative definition of pull for cases where an external constraint is applied.

2.2 Constrained fits

Consider again the example of section 2.1, this time incorporating an extra 'constraint' $\tau = \tau_c \pm \sigma_c$ from some external measurement. In other words, in the $S$ expression we are trying to minimise, there is an extra term $(\tau - \tau_c)^2/\sigma_c^2$. Let the fitted value of $\tau$, taking into account the external constraint, be
$\tau_f \pm \sigma_f$. Then the pull

$$ g_c = \frac{\tau_f - \tau_c}{\sqrt{\sigma_c^2 - \sigma_f^2}} \tag{10} $$

is usually a standard Gaussian. The denominator of the expression for $g_c$ may at first sight look a bit surprising, but it is simply the error on the numerator, taking into account the correlation between the errors in the fit result $\tau_f$ and the constraint $\tau_c$. Equivalently, one can define a pull according to:

$$ g_m = \frac{\tau_m - \tau_f}{\sqrt{\sigma_m^2 - \sigma_f^2}}, \tag{11} $$

where $\tau_m \pm \sigma_m$ is the fit result without the extra constraint. For large samples, or for a linear model with Gaussian uncertainties, one can use equations (6) and (7) to show that $g_c = g_m$. It should be noted however, that the large-sample limit is not reached at the same rate by $g_c$ and $g_m$ (see section 5.1).

The definition of $g_m$ allows one to examine the behaviour of pulls in two limiting cases:

1. If the constraint is totally irrelevant (e.g. it refers to a previous measurement of a variable that is completely unrelated to the present analysis), the fit will not improve the measurement and so

$$ \tau_f \pm \sigma_f = \tau_m \pm \sigma_m. \tag{12} $$

Then equation (11) reduces to $g_m = 0/0$, which is not wrong.

2. If in contrast the extra constraint is exact, $\tau_f = \tau_c$ and $\sigma_f = \sigma_c = 0$. In this case, $\tau_m$ should have been Gaussian distributed about the constraint with variance $\sigma_m^2$. The pull definition gives:

$$ g_m = \frac{\tau_m - \tau_f}{\sqrt{\sigma_m^2 - 0^2}}, \tag{13} $$

which is thus again a unit Gaussian. An example of this could be the sum of the measured energies of all the final state particles in a reaction, which should equal the (assumed exactly known) initial state energy.

So far we have stated without proof that pull distributions are expected to be standard Gaussian. In order to study this statement more carefully one needs to specify the ensemble on which pulls are defined. We defer a discussion of this topic to section 5.
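The claim in section 2.2 that equations (6) and (7) imply $g_c = g_m$ can be checked numerically. The following is a minimal sketch, not part of the original note: the function names and the input values are hypothetical and chosen only for illustration.

```python
import math

def constrained_combination(tau_m, sigma_m, tau_c, sigma_c):
    """Constrained result tau_f +- sigma_f from equations (6) and (7)."""
    w_m = 1.0 / sigma_m**2
    w_c = 1.0 / sigma_c**2
    tau_f = (tau_m * w_m + tau_c * w_c) / (w_m + w_c)
    sigma_f = 1.0 / math.sqrt(w_m + w_c)
    return tau_f, sigma_f

def pull_gc(tau_f, sigma_f, tau_c, sigma_c):
    """Pull of the constrained fit with respect to the constraint, equation (10)."""
    return (tau_f - tau_c) / math.sqrt(sigma_c**2 - sigma_f**2)

def pull_gm(tau_m, sigma_m, tau_f, sigma_f):
    """Pull of the unconstrained fit with respect to the constrained fit, equation (11)."""
    return (tau_m - tau_f) / math.sqrt(sigma_m**2 - sigma_f**2)

# Illustrative (made-up) inputs: unconstrained fit and external constraint.
tau_m, sigma_m = 5.3, 0.5
tau_c, sigma_c = 5.0, 0.4

tau_f, sigma_f = constrained_combination(tau_m, sigma_m, tau_c, sigma_c)
g_c = pull_gc(tau_f, sigma_f, tau_c, sigma_c)
g_m = pull_gm(tau_m, sigma_m, tau_f, sigma_f)
print(f"tau_f = {tau_f:.4f} +- {sigma_f:.4f}, g_c = {g_c:.4f}, g_m = {g_m:.4f}")
```

In this linear, Gaussian setting both pulls reduce algebraically to $(\tau_m - \tau_c)/\sqrt{\sigma_m^2 + \sigma_c^2}$, which is why they agree exactly here; with a real likelihood fit on a small sample they need not.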
2.3 Examples

In this section we give two examples of the use of pulls in constrained fits. The first example (section 2.3.1) illustrates definition (10) of constrained pulls, i.e. $g_c$, whereas in the second example (section 2.3.2) the nature of the constraint is sometimes such that only definition (11), i.e. $g_m$, can be used.

2.3.1 Lifetime of CP eigenstates of $B_s$

In CDF, the decay channel $B_s \to \psi\phi$ can be analysed in terms of two different lifetimes $\tau_s$ and $\tau_\ell$ of the CP eigenstates of the $B_s$, which manifest themselves in the different spin states of the $\psi$ and $\phi$, which in turn affect the vector meson decay angular distributions [2].

In the fit of experimental data to these two lifetimes (and to other parameters), it is possible to impose a constraint that their suitably weighted average $\bar{\tau}_c$ is given by the measured $B_s$ lifetime of $1.54 \pm 0.07$ ps [3]. If we generate a whole series of simulated experiments with values $\tau_s$ and $\tau_\ell$ (whose weighted average is 1.54 ps) and perform the constrained fit to extract the average lifetime $\bar{\tau}_f \pm \sigma_f$ and the fractional lifetime difference $\Delta\Gamma/\Gamma$, we would then expect $\bar{\tau}_f$ to be distributed such that its pull

$$ g_c = \frac{\bar{\tau}_f - 1.54}{\sqrt{0.07^2 - \sigma_f^2}} \tag{14} $$

is a unit Gaussian.

2.3.2 Kinematic fitting

This is the situation where we minimise

$$ S = \sum_i \left( \frac{x_i^f - x_i^m}{\sigma_i^m} \right)^2 \tag{15} $$

subject to some constraint(s) (such as energy and momentum conservation for a specific assumed reaction) on the fitted kinematic variables $x_i^f$ of an event, whose measured values before this fitting procedure are $x_i^m \pm \sigma_i^m$. Thus $x_i$ could be the 4-momentum components of the tracks at a given vertex in the event. In reality, the four $x_i$ variables of a track are likely to be correlated with each other, which would require expression (15) to be extended to take their correlations into account.

As a result of the fit, we determine the $x_i^f$ and their errors $\sigma_i^f$ (each $\sigma_i^f$ can be calculated as the shift in $x_i^f$ needed to increase $S$ by 1.0 from its minimum value, when $S$ is re-minimised with respect to the other $x_j^f$,
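$j \neq i$). Then we expect the pulls

$$ g_i^m = \frac{x_i^f - x_i^m}{\sqrt{(\sigma_i^m)^2 - (\sigma_i^f)^2}} \tag{16} $$

to be distributed like standard Gaussians. This is just equivalent to equation (11).

3 Pulls in hypothesis testing

The previous section described the use of pulls in parameter estimation, where a pull distribution is obtained and compared to a standard Gaussian. We now turn to a situation where a single pull is calculated and, assuming its parent distribution to be standard Gaussian, an inference is drawn about the validity of a given hypothesis. A slightly more general treatment of the material presented in this section can be found on pages 277-278 of ref. [1].

Suppose we performed a series of measurements of a quantity $\tau$ and wish to test the consistency of the latest measurement, $\tau_\ell \pm \sigma_\ell$, with the average of all measurements, $\tau_a \pm \sigma_a$. We write $\tau_p \pm \sigma_p$ for the average of all measurements prior to the latest one, and regard $\tau_p$ and $\tau_\ell$ as uncorrelated. For the combined result we have:

$$ \tau_a = \frac{\tau_p w_p + \tau_\ell w_\ell}{w_p + w_\ell}, \tag{17} $$

$$ \sigma_a = \frac{1}{\sqrt{w_p + w_\ell}}, \tag{18} $$

where $w_p = 1/\sigma_p^2$ and $w_\ell = 1/\sigma_\ell^2$. The difference between the combined result and the latest one is:

$$ \tau_a - \tau_\ell = \frac{\tau_p w_p - \tau_\ell w_p}{w_p + w_\ell}, \tag{19} $$

and the error $\sigma_{a\ell}$ on $\tau_a - \tau_\ell$ is given by (remember that $\tau_\ell$ and $\tau_p$ are uncorrelated):

$$ \sigma_{a\ell}^2 = \sigma_p^2 \left( \frac{w_p}{w_p + w_\ell} \right)^2 + \sigma_\ell^2 \left( \frac{w_p}{w_p + w_\ell} \right)^2 \tag{20} $$

$$ \phantom{\sigma_{a\ell}^2} = (\sigma_p^2 + \sigma_\ell^2) \left( \frac{1/\sigma_p^2}{1/\sigma_p^2 + 1/\sigma_\ell^2} \right)^2 \tag{21} $$

$$ \phantom{\sigma_{a\ell}^2} = \frac{\sigma_\ell^4}{\sigma_p^2 + \sigma_\ell^2} \tag{22} $$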
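The error propagation of equations (20) to (22) can be cross-checked with a small simulation. This is only a sketch under stated assumptions: the common true value, the two uncertainties, the number of trials and the random seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random
import statistics

def sigma2_al_formula(sigma_p, sigma_l):
    """Variance of (tau_a - tau_l) according to equation (22)."""
    return sigma_l**4 / (sigma_p**2 + sigma_l**2)

def sigma2_al_simulated(tau, sigma_p, sigma_l, n_trials=100_000, seed=1):
    """Sample variance of (tau_a - tau_l) over simulated pairs of measurements."""
    rng = random.Random(seed)
    w_p = 1.0 / sigma_p**2
    w_l = 1.0 / sigma_l**2
    diffs = []
    for _ in range(n_trials):
        tau_p = rng.gauss(tau, sigma_p)   # average of the prior measurements
        tau_l = rng.gauss(tau, sigma_l)   # latest measurement, uncorrelated with tau_p
        tau_a = (tau_p * w_p + tau_l * w_l) / (w_p + w_l)   # equation (17)
        diffs.append(tau_a - tau_l)
    return statistics.pvariance(diffs)

predicted = sigma2_al_formula(0.3, 0.5)
simulated = sigma2_al_simulated(5.0, 0.3, 0.5)
print(f"equation (22): {predicted:.5f}, simulation: {simulated:.5f}")
```

The two numbers should agree up to the statistical precision of the simulation, which for the default number of trials is at the sub-percent level.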
Rewriting equation (18) in terms of $\sigma_p$ and $\sigma_\ell$ yields:

$$ \sigma_a^2 = \frac{\sigma_p^2\, \sigma_\ell^2}{\sigma_p^2 + \sigma_\ell^2}. \tag{23} $$

Comparing equations (22) and (23), one infers that:

$$ \sigma_{a\ell}^2 = \sigma_\ell^2 - \sigma_a^2. \tag{24} $$

The pull of the latest measurement from the average value is therefore given by

$$ g_\ell = \frac{\tau_\ell - \tau_a}{\sqrt{\sigma_\ell^2 - \sigma_a^2}}. \tag{25} $$

If the latest measurement is consistent with the average, $g_\ell$ should be distributed as a Gaussian with mean 0 and width 1, and can therefore be used as a test statistic. It is identical to definition (11). Needless to say, the equivalent definition

$$ g_p = \frac{\tau_a - \tau_p}{\sqrt{\sigma_p^2 - \sigma_a^2}} \tag{26} $$

gives identical numerical values.

4 Non-asymptotic and pathological cases

In most cases we expect the pull distribution to tend to a standard Gaussian only asymptotically. For small numbers of events, the likelihood function is usually skewed, resulting in asymmetric error intervals and pull distributions that are significantly non-Gaussian unless special care is taken in defining the pulls. We discuss the definition of pulls from asymmetric errors in section 4.1. Later, in section 5.3, we will return to this definition with an example that demonstrates the corresponding improvement in Gaussian shape of the pull distribution.

It is also possible to encounter ill-defined problems, where pull distributions will never look Gaussian, regardless of the size of the data sample. We present an example of such a pathology in section 4.2.

4.1 Asymmetric errors

Sometimes a fit returns asymmetric errors for a parameter. This happens for example with the MINOS algorithm in the MINUIT package [4]. In this case
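the pull $g$ should be defined as follows:

$$ g = \begin{cases} \dfrac{(\text{true value}) - (\text{fit result})}{(\text{positive MINOS error})} & \text{if (fit result)} \le \text{(true value)}, \\[2ex] \dfrac{(\text{fit result}) - (\text{true value})}{(\text{negative MINOS error})} & \text{otherwise.} \end{cases} \tag{27} $$

This definition guarantees that the percentage of pulls between $-1$ and $+1$ equals the coverage of the error interval returned by MINOS, which should be 68.27% if 1-$\sigma$ intervals are requested. This can be seen as follows. Suppose $\tau_g$ is the true value of the parameter we are trying to determine, and $\tau_f$ is the fit result, with $\sigma_f^+$ and $\sigma_f^-$ the absolute values of the positive and negative errors calculated by MINOS. By definition of these MINOS errors, we have:

$$ \alpha = \Pr\left( \tau_f - \sigma_f^- < \tau_g < \tau_f + \sigma_f^+ \right), \tag{28} $$

where $\alpha$ is (close to) 68.27%. This can be rewritten as:

$$ \alpha = \Pr\left( -\sigma_f^- < \tau_g - \tau_f < +\sigma_f^+ \right). \tag{29} $$

Next, we split the probability on the right-hand side into two non-overlapping cases, $\tau_g - \tau_f < 0$ and $\tau_g - \tau_f \ge 0$:

$$ \alpha = \Pr\left( -\sigma_f^- < \tau_g - \tau_f < 0 \right) + \Pr\left( 0 \le \tau_g - \tau_f < +\sigma_f^+ \right) \tag{30} $$

Finally, dividing by $\sigma_f^-$ inside the first probability term and by $\sigma_f^+$ inside the second one, we obtain:

$$ \alpha = \Pr\left( -1 < \frac{\tau_g - \tau_f}{\sigma_f^-} < 0 \right) + \Pr\left( 0 \le \frac{\tau_g - \tau_f}{\sigma_f^+} < +1 \right). \tag{31} $$

The interpretation of this equation is straightforward: when $\tau_g < \tau_f$, divide the difference by $\sigma_f^-$, otherwise divide it by $\sigma_f^+$, and this guarantees that a fraction $\alpha$ of the time the result will be between $-1$ and $+1$.

There is of course no guarantee that the pull distribution will be Gaussian. However, if it is, and its width is 1, then the coverage will be correct. It is therefore always useful to plot the pull distribution according to the above definition since it provides a good visual indicator of the accuracy of the error estimates.

Example: exponential distribution

To illustrate some features of asymmetric likelihood functions, we investigate a likelihood fit to a small number $N$ of time values from an exponential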
distribution, equation (8), with lifetime parameter $\tau_g = 1$. The likelihood estimate of $\tau_g$ is simply $\bar{t}$, the mean of the $N$ time values. The pull is defined as

$$ g = \frac{\bar{t} - \tau_g}{\sigma}. \tag{32} $$

Four different pulls result from four different definitions of the error $\sigma$:

• $g^{(1)}$ uses $\sigma = \tau_g/\sqrt{N}$, which is the approximate value of the expected error.

• $g^{(2)}$ uses $\sigma = \bar{t}/\sqrt{N}$, which is the parabolic error returned by the likelihood fit.

• $g^{(3)}$ and $g^{(4)}$ make use of the asymmetric errors on the likelihood fit, defined by the changes in $\bar{t}$ required for the logarithm of the likelihood to decrease by 0.5 from its maximum value. Then $g^{(3)}$ uses the upper error $\sigma_u$ if $\bar{t} \le \tau_g$, and the lower error $\sigma_l$ otherwise. Note that, because of the asymmetry in the likelihood, $\sigma_u$ will tend to be larger than $\sigma_l$.

• $g^{(4)}$ tries out the errors the other way around, i.e. $\sigma_u$ if $\bar{t} > \tau_g$, and $\sigma_l$ otherwise.

For samples of size $N = 4$ and $N = 30$, Table 1 shows the pull means and standard deviations$^2$.

Footnote 2: There is no need to perform Monte Carlo calculations, as the sum $x$ of $N$ independent random variables, each exponentially distributed with lifetime parameter $\tau_g$, is known to have a gamma distribution $\frac{1}{\tau_g^N\, \Gamma(N)}\, e^{-x/\tau_g}\, x^{N-1}$. Therefore the distribution of $\bar{t}$ is $\frac{N^N}{\tau_g^N\, \Gamma(N)}\, e^{-N\bar{t}/\tau_g}\, \bar{t}^{\,N-1}$.

                        N = 4             N = 30
    Pull definition   Mean    Width     Mean    Width
    g(1)               0.00    1.00      0.00    1.00
    g(2)              -0.67    1.88     -0.19    1.07
    g(3)              -0.31    1.43     -0.09    1.03
    g(4)              -1.06    2.44     -0.29    1.12

Table 1: Means and widths of pull distributions for samples of size 4 and 30, for four definitions of pulls (see text).

The result for $g^{(1)}$ is obvious as the estimate $\bar{t}$ has mean value $\tau_g$ and variance $\tau_g^2/N$. Hence the mean pull is zero and its variance is unity for any value of $N$. However, for small $N$ the distribution of the pull is non-Gaussian. This is clear for the extreme case of $N = 1$, when the pull distribution is
$e^{-g-1}$ for pull values above $-1$, and zero otherwise. It becomes approximately Gaussian for large $N$, because of the Central Limit Theorem.

It is then clear that $g^{(2)}$ will be biased negatively. This is because a negative pull, corresponding to a low value of $\bar{t}$, will result in a small estimate of the error used in the denominator of the pull definition. Hence, as compared with $g^{(1)}$, the scale is expanded for negative pulls and contracted for positive ones.

The pulls $g^{(3)}$ and $g^{(4)}$ both use errors which vary with $\bar{t}$, and hence share the tendency of $g^{(2)}$ to have a negative bias. Since $g^{(4)}$ uses a smaller error for calculating negative pulls and a larger error for positive pulls, the extent of the bias is increased. For $g^{(3)}$, the opposite is the case. This tends to confirm the 'obvious' fact that when the data has asymmetric errors, it is appropriate to use the upper error when the data is below the expectation.

Also as expected, the deviations from $0.0 \pm 1.0$ become smaller for larger $N$.

4.2 Searching for a non-existent resonance

An interesting example [5] is provided by a smooth mass distribution being fitted by a background shape and a resonance peak of arbitrary position and arbitrary amplitude $A \pm \sigma_A$, which can be positive or negative. Since the mass distribution contains no resonance, the pull is simply $A/\sigma_A$. Because of fluctuations however, this turns out to have a bimodal distribution, with peaks more or less symmetrically situated above and below zero. It has a minimum at the origin (where a standard Gaussian pull distribution has its maximum). This arises because the fit of a resonance peak with arbitrary position will pick out the mass region which most deviates from the smooth shape. In order for a fit to return $A = 0$, we thus require there to be no significant deviations across the whole mass distribution; this is very unlikely.

As the number of events in the distribution increases, fluctuations become relatively smaller, and the positions of the bimodal peaks move in towards zero pull. However, the minimum at zero is maintained.

5 Pseudo-experiment ensembles for testing pulls

When generating pseudo-experiments to test the properties of a fitting algorithm that includes constraints, it is necessary to understand which parameters to fluctuate, and how to fluctuate them. For example, an event rate which is subjected to a Gaussian constraint is sometimes fluctuated according to a Poisson distribution whose mean is itself fluctuated around the Gaussian constraint. This method is wrong, as can easily be seen by considering that the probability for a given event rate to occur in the pseudo-experiment ensemble is different from that predicted by the likelihood model. The correct method is to fluctuate the event rate according to a Poisson distribution with fixed mean, and separately to fluctuate the constraining value according to its Gaussian distribution$^3$. Once the question of how to run pseudo-experiments is properly resolved, one can check whether the data sample size is large enough for the pull distribution to be standard Gaussian.

Footnote 3: We can see that this procedure is reasonable for the example of section 3. To test that procedure by Monte Carlo, we would vary both $\tau_a$ and $\tau_\ell$ in Gaussian fashion according to their errors. This corresponds in this case to fluctuating the constraint and the Poisson data sample.

In this section we start by examining the effect of sample size on the shape of pull distributions (subsection 5.1). We then calculate the expected widths of pull distributions for a very general pseudo-experiment ensemble that includes the "correct" and "wrong" ensembles described above as special cases (subsection 5.2). This provides a demonstration of the importance of using the proper ensemble to study pulls. In the last subsection we argue that the use of MINOS errors in MINUIT fits yields better-behaved pulls than parabolic errors.

To fix ideas, we will be working with the example first introduced in section 2.1, namely the measurement of a time constant with the following likelihood:

$$ L(\tau) = \frac{e^{-\frac{1}{2}\left(\frac{\tau - \tau_c}{\sigma_c}\right)^2}}{\sqrt{2\pi}\,\sigma_c} \prod_{i=1}^{N} \left( \frac{1}{\tau}\, e^{-t_i/\tau} \right) = \frac{e^{-\frac{1}{2}\left(\frac{\tau - \tau_c}{\sigma_c}\right)^2}}{\sqrt{2\pi}\,\sigma_c}\; \frac{e^{-N\bar{t}/\tau}}{\tau^N} \tag{33} $$

In the absence of the constraint ($\sigma_c \to \infty$), the maximum likelihood estimate $\tau_m$ of $\tau$, and its uncertainty $\sigma_m$, are given by:

$$ \tau_m = \bar{t} \equiv \frac{1}{N} \sum_{i=1}^{N} t_i, \tag{34} $$

$$ \sigma_m = \sigma_{\bar{t}} = \frac{\bar{t}}{\sqrt{N}}. \tag{35} $$

When the constraint is enforced as in section 2.2, the fitted value $\tau_f$ is no longer simply equal to $\bar{t}$, although it remains a unique function of $\bar{t}$ and the constraining value $\tau_c$.

5.1 Effect of sample size on pull distributions

We ran sets of pseudo-experiments to study the distributions of the various types of pull defined in this note, and their dependence on the number of measurements $N$. Each pseudo-experiment was generated as follows:
1. Generate $N$ random $t_i$ values according to an exponential distribution with fixed time constant $\tau_g$;

2. Generate a constraint $\tau_c$ according to a Gaussian with mean $\bar{\tau}_c$ and width $\sigma_c$;

3. Fit the $t_i$ to an exponential distribution whose time constant is the fit parameter and is constrained to $\tau_c \pm \sigma_c$.

Unless one is interested in studying the bias introduced by constraining to the wrong time constant, one will usually set $\bar{\tau}_c \equiv \tau_g$.

We generated three sets of pseudo-experiments with $\tau_g = \bar{\tau}_c = 5$ and with $N = 10$, 100 and 1000 respectively. In each case we set the uncertainty $\sigma_c$ on the constraint to be equal to the expected uncertainty on the corresponding unconstrained result, i.e. $\tau_g/\sqrt{N}$.

The results for $N = 100$ are shown in Figures 1 and 2. Figures 1(a), (b) and (c) show the distributions of the generated constraint $\tau_c$, the fit result without constraint $\tau_m$, and the fit result with constraint $\tau_f$. Because of the large number of measurements per pseudo-experiment, the distribution of $\tau_m$ is reasonably Gaussian. So is the distribution of $\tau_f$ which, as expected, is narrower than both the distributions of $\tau_c$ and $\tau_m$. Plots 1(d), (e) and (f) show distributions of the pulls defined by equations (27), (10) and (11), respectively. The $g$ and $g_c$ pull distributions are Gaussian, but $g_m$ is clearly not.

In order to understand this, we plot distributions of the numerators and denominators of the pulls in Figure 2. The numerators all appear to be Gaussian, including the numerator of $g_m$. In fact, judging by the $\chi^2$/ndf values, the numerator of $g_m$ is even more Gaussian-like than the $\tau_m$ distribution, indicating that some cancellation of non-Gaussian effects takes place in the difference $\tau_m - \tau_f$. As expected, the means of the denominator distributions agree with the RMS widths of the corresponding numerator distributions. If one were to divide the pull numerators by these RMS widths, the resulting pull distributions would be perfectly normal (i.e. Gaussian with mean 0 and width 1). When dividing by the proper denominators however, fluctuations in the latter distort the pull distributions. A measure of the magnitude of these fluctuations is provided by the RMS/mean ratios of the denominator distributions. These are equal to 4%, 5% and 21% for $g$, $g_c$ and $g_m$, respectively. The large fluctuations in the denominator of $g_m$ are clearly responsible for the non-Gaussian tail in the corresponding pull distribution.

Figures 3 and 4 show the same plots as Figures 1 and 2 for a set of pseudo-experiments with $N = 10$, i.e. in a regime where the asymptotic limit is no longer a good approximation, as can be seen in the distribution of $\tau_m$ (Figure 3(b)). Not only $g_m$, but now also the $g_c$ pull distribution is beginning to develop a strong non-Gaussian tail.

Finally, Figures 5 and 6 show what happens when $N$ is increased to 1000. Now even the $g_m$ pull is beginning to look quite Gaussian.

We conclude from these studies that different definitions of pulls have different rates of convergence towards the asymptotic limit. Among the three definitions we have considered, $g$ converges the fastest, and $g_m$ the slowest.

5.2 Effect of pseudo-experiment ensembles on pull distributions

To study the behaviour of pulls in various ensembles of pseudo-experiments, we start from a very general ensemble, in which each pseudo-experiment is defined as follows:

1. Generate a random time constant $\tau_\delta$ according to a Gaussian with mean $\tau_g$ and width $\sigma_{\tau_\delta}$;

2. Generate $N$ random $t_i$ values according to an exponential distribution with time constant $\tau_\delta$;

3. Generate a constraint $\tau_c$ according to a Gaussian with mean $\tau_g$ and width $\sigma_{\tau_c}$;

4. Fit the $t_i$ to an exponential distribution whose time constant is the fit parameter and is constrained to $\tau_c \pm \sigma_c$.

This general ensemble depends on five parameters: $N$, $\tau_g$, $\sigma_{\tau_\delta}$, $\sigma_{\tau_c}$, and $\sigma_c$, and requires the generation of $N + 2$ independent random numbers per pseudo-experiment: $\tau_\delta$, $\tau_c$ and $t_1 \ldots t_N$. What we called "correct method" in the introduction to section 5 corresponds to $\sigma_{\tau_\delta} = 0$ and $\sigma_{\tau_c} = \sigma_c$, whereas what we called "wrong method" corresponds to $\sigma_{\tau_c} = 0$ and $\sigma_{\tau_\delta} = \sigma_c$.

In the following subsections we calculate analytically the widths of the $g$ and $g_c$ pull distributions in the asymptotic limit, and illustrate the results with Monte Carlo calculations.

5.2.1 Standard deviation of $g$ pulls

The $g$ pull is defined by:

$$ g = \frac{\tau_f - \tau_g}{\sigma_f}. \tag{36} $$

In the asymptotic limit, the fit result $\tau_f \pm \sigma_f$ is given by equations (6) and (7), where $\sigma_m = \tau_\delta/\sqrt{N}$. Since $\sigma_m$ depends on the random variable $\tau_\delta$ it is itself a random variable, with standard deviation $\sigma_{\tau_\delta}/\sqrt{N}$. For large $N$ we can neglect the fluctuations of $\sigma_m$ compared to those of $\tau_\delta$, and hence
to those of the numerator of (36). Accordingly we will write $\sigma_m \cong \tau_g/\sqrt{N}$. Thus we have:

$$ \tau_f = \frac{N\bar{t}/\tau_g^2 + \tau_c/\sigma_c^2}{N/\tau_g^2 + 1/\sigma_c^2} \tag{37} $$

$$ \sigma_f = \frac{1}{\sqrt{N/\tau_g^2 + 1/\sigma_c^2}} \tag{38} $$

We will use these equations to calculate the standard deviation $\sigma_g = \sigma_{\tau_f}/\sigma_f$ of the $g$ pulls, where $\sigma_{\tau_f}$ is the standard deviation of $\tau_f$. Note that in principle $\sigma_{\tau_f}$ could be different from $\sigma_f$, because the former depends on how pseudo-experiments are fluctuated, whereas the latter is the result of a fit, and the fitter knows nothing about where the data came from. We have in fact:

$$ \sigma_{\tau_f}^2 \equiv E\left[ (\tau_f - \tau_g)^2 \right] \tag{39} $$

$$ \phantom{\sigma_{\tau_f}^2} = E\left[ \left( \frac{N(\bar{t} - \tau_g)/\tau_g^2 + (\tau_c - \tau_g)/\sigma_c^2}{N/\tau_g^2 + 1/\sigma_c^2} \right)^2 \right] \tag{40} $$

$$ \phantom{\sigma_{\tau_f}^2} = \frac{\dfrac{N^2}{\tau_g^4}\, E\left[ (\bar{t} - \tau_g)^2 \right] + \dfrac{1}{\sigma_c^4}\, E\left[ (\tau_c - \tau_g)^2 \right] + \dfrac{2N}{(\tau_g \sigma_c)^2}\, E\left[ (\bar{t} - \tau_g)(\tau_c - \tau_g) \right]}{\left( N/\tau_g^2 + 1/\sigma_c^2 \right)^2} \tag{41} $$

The expectation values depend on the pseudo-experiment ensemble; in this case they are:

$$ E\left[ (\bar{t} - \tau_g)^2 \right] = \sigma_{\tau_\delta}^2 + \frac{\tau_g^2}{N} \tag{42} $$

$$ E\left[ (\tau_c - \tau_g)^2 \right] = \sigma_{\tau_c}^2 \tag{43} $$

$$ E\left[ (\bar{t} - \tau_g)(\tau_c - \tau_g) \right] = 0 \tag{44} $$

Plugging these expectations back into the expression for $\sigma_{\tau_f}^2$ and taking the square root yields:

$$ \sigma_{\tau_f} = \frac{\sqrt{\dfrac{N}{\tau_g^2} \left( 1 + \dfrac{N}{\tau_g^2}\, \sigma_{\tau_\delta}^2 \right) + \left( \dfrac{\sigma_{\tau_c}}{\sigma_c^2} \right)^2}}{\dfrac{N}{\tau_g^2} + \dfrac{1}{\sigma_c^2}} \tag{45} $$

Dividing by $\sigma_f$, we obtain finally:

$$ \sigma_g = \sqrt{ \frac{\dfrac{N}{\tau_g^2} \left( 1 + \dfrac{N}{\tau_g^2}\, \sigma_{\tau_\delta}^2 \right) + \left( \dfrac{\sigma_{\tau_c}}{\sigma_c^2} \right)^2}{\dfrac{N}{\tau_g^2} + \dfrac{1}{\sigma_c^2}} } \tag{46} $$

We consider two special cases:

1. $\sigma_{\tau_\delta} = 0$ and $\sigma_{\tau_c} = \sigma_c$. This corresponds to the correct way of running pseudo-experiments. In this case, equation (46) gives $\sigma_g = 1$. The distribution of the $g$-pull will be standard Gaussian.

2. $\sigma_{\tau_c} = 0$ and $\sigma_{\tau_\delta} = \sigma_c$. This corresponds to the wrong way of running pseudo-experiments. Equation (46) reduces to $\sigma_g = \sigma_c \sqrt{N}/\tau_g$. The $g$-pull distribution will not be standard Gaussian, except when $\sigma_c = \tau_g/\sqrt{N}$, i.e. when the uncertainty on the constraint matches the expected uncertainty on the unconstrained result.

5.2.2 Standard deviation of $g_c$ pulls

The $g_c$ pull is defined in equation (10). To calculate $\sigma_{g_c}$ we will again use the approximation $\sigma_m \cong \tau_g/\sqrt{N}$. The standard deviation of the numerator of the $g_c$ pull, $(\tau_f - \tau_c)$, can be calculated in the same way as $\sigma_{\tau_f}$ in the previous section. We find:

$$ \sigma_{(\tau_f - \tau_c)} = \frac{\dfrac{N}{\tau_g^2} \sqrt{\dfrac{\tau_g^2}{N} + \sigma_{\tau_\delta}^2 + \sigma_{\tau_c}^2}}{\dfrac{N}{\tau_g^2} + \dfrac{1}{\sigma_c^2}}. \tag{47} $$

On the other hand, the denominator of the $g_c$ pull can be rewritten as:

$$ \sqrt{\sigma_c^2 - \sigma_f^2} = \frac{\dfrac{\sqrt{N}}{\tau_g}\, \sigma_c}{\sqrt{\dfrac{N}{\tau_g^2} + \dfrac{1}{\sigma_c^2}}}, \tag{48} $$

so that:

$$ \sigma_{g_c} = \sqrt{ \frac{\dfrac{\tau_g^2}{N} + \sigma_{\tau_\delta}^2 + \sigma_{\tau_c}^2}{\dfrac{\tau_g^2}{N} + \sigma_c^2} }. \tag{49} $$

It is easy to see that $\sigma_{g_c} = 1$ in either of the two special cases considered earlier, namely $\sigma_{\tau_\delta} = 0$ and $\sigma_{\tau_c} = \sigma_c$, or $\sigma_{\tau_c} = 0$ and $\sigma_{\tau_\delta} = \sigma_c$. In other words, the $g_c$ pull has a standard Gaussian distribution for both the "correct" and "wrong" ways of running pseudo-experiments. The same conclusion applies to the $g_m$ pull since $g_m$ and $g_c$ are asymptotically equal (section 2.2).

5.2.3 Comparison with Monte Carlo calculations

We illustrate the above results in Figure 7, where we plot the $g$, $g_c$ and $g_m$ pull distributions for "correct" and "wrong" ensembles of pseudo-experiments with $N = 1000$, $\tau_g = 5$ and $\sigma_c = 0.03162$. As expected, all distributions are standard Gaussian except that of the $g$ pull for the wrong ensemble. Plugging $N = 1000$, $\tau_g = 5$, $\sigma_{\tau_\delta} = \sigma_c = 0.03162$ and $\sigma_{\tau_c} = 0$ in equations (45) and (38) yields $\sigma_{\tau_f} = 0.0062$ and $\sigma_f = 0.031$, so that $\sigma_{\tau_f}/\sigma_f = 0.2$, in agreement with the width of the distribution in plot (d).

5.3 Pull distributions for MINOS errors

Figure 8 shows distributions of the MINOS error, the parabolic error, and various pulls for an ensemble of "correct" pseudo-experiments with $N = 10$, $\tau_g = 5$ and $\sigma_c = 0.1581$. For this example the magnitudes of the positive and negative MINOS errors differ by about 15% on average. Judging by the $\chi^2$/ndf values, the distribution of the MINOS pull from definition (27) is clearly more Gaussian-like than the $g$ pull using the parabolic error. However, if the MINOS error assignment in equation (27) is reversed, the resulting pull distribution displays a strong non-Gaussian tail. That the assignment of equation (27) is indeed correct can be seen more directly by plotting a combined histogram of the positive and negative errors (plots (c) and (d)).

We conclude that in non-asymptotic situations pulls calculated from MINOS errors are "better behaved" than pulls calculated from parabolic errors, and that equation (27) uses the correct assignment of MINOS errors.

6 General recommendations for the use of pulls in parameter estimation problems

Whenever one is doing a fit, pull distributions should be plotted to check that the fit is giving sensible results. In situations that involve many separate fits (e.g. track fitting for a whole series of events), each fit provides its own pull(s), and the distribution can easily be obtained. If, however, the experiment involves the estimation of just one set of parameters, the pull distribution can be looked at only for a simulated set of repetitions of the experiment. Such pseudo-experiments should always be designed so that the probability of a given pseudo-data sample in the pseudo-experiment ensemble is equal to the probability predicted by the likelihood (or chisquare) model for this sample.

In the majority of cases, one expects the pull distribution to be a standard Gaussian. One thus needs to confirm that it is centered at zero, has unit width, and has no long tails. If this is not the case, one may need to look at the measurement setup, the experimenter's assumptions, etc. We give two simple examples:

1. Suppose we measure the three angles of a triangle as $\theta_i^m$. Improved values $\theta_i^f$ can be obtained by imposing the condition that the angles add up to $180^\circ$. The pull would be sensitive to effects such as the errors being incorrectly assigned, the triangle not being closed, the geometry not being flat (e.g. the triangle is drawn on a sphere), etc.

2. In the kinematic fitting example of Section 2.3.2, pulls can be examined to look for effects such as biased momentum measurements, misalignment of the detector, oddities of the kinematic fitting procedure, contamination from other reactions, etc.

It may happen that the pull distribution is approximately Gaussian, but its width is not 1. Assuming that this is understood to be an effect of the non-asymptotic nature of the problem and not a programming error (this can always be tested by running pseudo-experiments closer to the asymptotic limit!), one may want to correct the quoted uncertainties by multiplying them by the width of the pull distribution.

In other cases the non-asymptotic nature of the problem manifests itself by the appearance of tails in the pull distribution. One must then be careful with the interpretation of the uncertainties. If the percentage of pulls between $-1$ and $+1$ is 68.27%, then "1-$\sigma$" errors have the usual meaning. However, since the pull distribution is not Gaussian, "2-$\sigma$" errors no longer have a coverage of 95.45%, etc.

Finally, as illustrated in section 5, one should keep in mind that different pull definitions have different rates of convergence towards the asymptotic limit. Thus it may be that the choice of pull definition itself is the cause of non-Gaussian distortions in the pull distribution.

References

[1] W. T. Eadie, D. Drijard, F. James, M. Roos, and B. Sadoulet, "Statistical Methods in Experimental Physics", North Holland (1971).

[2] F. Azfar, private communication; see also F. Azfar, L. Lyons, M. Martin, C. Paus, and J. Tseng, "Prospects for measuring $(\Delta\Gamma/\Gamma)_{B_s^0}$ using $B_s^0 \to J/\psi\,\phi$, with $J/\psi \to \mu^+\mu^-$, $\phi \to K^+K^-$, in Run-II, an update", CDF/ANAL/BOTTOM/CDFR/5351 (25 June 2000).

[3] D. E. Groom et al., "Review of Particle Physics", Eur. Phys. J. C 15, 1 (2000).

[4] F. James and M. Roos, "MINUIT, Function Minimization and Error Analysis," CERN D506 (Long Writeup). Available from the CERN Program Library Office, CERN-IT Division, CERN, CH-1211, Geneva 21, Switzerland.

[5] T. Dorigo and M. Schmitt, "On the significance of the Dimuon Mass Bump and the Greedy Bump Bias", CDF/DOC/TOP/CDFR/5239 (26 February 2000).

Figure 1: Results of a pseudo-experiment run with $\tau_g = \bar{\tau}_c = 5$, $\sigma_c = 0.5$ and $N = 100$ (see text). Plots (a), (b) and (c) show distributions of the constraint $\tau_c$, the unconstrained fit result $\tau_m$, and the constrained fit result $\tau_f$, respectively. Plots (d), (e) and (f) show pull distributions according to definitions (27), (10) and (11), respectively.

Figure 2: Distributions of the numerators and denominators of the pulls $g$, $g_c$ and $g_m$ shown in Figure 1.

Figure 3: Results of a pseudo-experiment run with $\tau_g = \bar{\tau}_c = 5$, $\sigma_c = 1.581$ and $N = 10$ (see text). Plots (a), (b) and (c) show distributions of the constraint $\tau_c$, the unconstrained fit result $\tau_m$, and the constrained fit result $\tau_f$, respectively. Plots (d), (e) and (f) show pull distributions according to definitions (27), (10) and (11), respectively.

Figure 4: Distributions of the numerators and denominators of the pulls $g$, $g_c$ and $g_m$ shown in Figure 3.

Figure 5: Results of a pseudo-experiment run with $\tau_g = \bar{\tau}_c = 5$, $\sigma_c = 0.1581$ and $N = 1000$ (see text). Plots (a), (b) and (c) show distributions of the constraint $\tau_c$, the unconstrained fit result $\tau_m$, and the constrained fit result $\tau_f$, respectively. Plots (d), (e) and (f) show pull distributions according to definitions (27), (10) and (11), respectively.

Figure 6: Distributions of the numerators and denominators of the pulls $g$, $g_c$ and $g_m$ shown in Figure 5.

Figure 7: Pull distributions for pseudo-experiments with $N = 1000$, $\tau_g = 5$ and $\sigma_c = 0.03162$. Plots (a), (b) and (c) show the result of using the correct ensemble of pseudo-experiments ($\sigma_{\tau_\delta} = 0$, $\sigma_{\tau_c} = \sigma_c$), whereas plots (d), (e) and (f) show the result of using a wrong ensemble ($\sigma_{\tau_c} = 0$, $\sigma_{\tau_\delta} = \sigma_c$). See text for further details.
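The comparison between the correct and wrong ensembles summarised in the Figure 7 caption can be imitated with a short simulation. This is only a sketch under assumptions that are not in the original note: the constrained fit is replaced by the asymptotic weighted average of equations (6) and (7) with $\sigma_m = \bar{t}/\sqrt{N}$ from equation (35), the mean $\bar{t}$ is drawn directly from the gamma distribution mentioned in footnote 2, and the function name, number of trials and seed are arbitrary.

```python
import math
import random
import statistics

def g_pull_width(n, tau_g, sigma_c, sigma_tau_delta, sigma_tau_c,
                 n_trials=20_000, seed=7):
    """Standard deviation of the g pull, equation (36), over a pseudo-experiment ensemble."""
    rng = random.Random(seed)
    pulls = []
    for _ in range(n_trials):
        tau_delta = rng.gauss(tau_g, sigma_tau_delta)   # step 1: fluctuated time constant
        t_bar = rng.gammavariate(n, tau_delta / n)      # step 2: mean of n exponential times
        tau_c = rng.gauss(tau_g, sigma_tau_c)           # step 3: fluctuated constraint
        sigma_m = t_bar / math.sqrt(n)                  # equation (35)
        w_m = 1.0 / sigma_m**2
        w_c = 1.0 / sigma_c**2
        tau_f = (t_bar * w_m + tau_c * w_c) / (w_m + w_c)   # equation (6), stands in for the fit
        sigma_f = 1.0 / math.sqrt(w_m + w_c)                # equation (7)
        pulls.append((tau_f - tau_g) / sigma_f)
    return statistics.pstdev(pulls)

# "Correct" ensemble: data generated with fixed tau_g, constraint fluctuated with width sigma_c.
width_correct = g_pull_width(1000, 5.0, 0.03162, sigma_tau_delta=0.0, sigma_tau_c=0.03162)
# "Wrong" ensemble: time constant fluctuated with width sigma_c, constraint held fixed.
width_wrong = g_pull_width(1000, 5.0, 0.03162, sigma_tau_delta=0.03162, sigma_tau_c=0.0)
print(f"g-pull width, correct ensemble: {width_correct:.3f}")
print(f"g-pull width, wrong ensemble:   {width_wrong:.3f}")
```

Under these assumptions the two widths should come out close to the values 1 and 0.2 predicted by equation (46) for the parameters of Figure 7, up to statistical fluctuations and the approximation made for the fit.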
Figure 8: Result of a pseudo-experiment run with $N = 10$, $\tau_g = 5$ and $\sigma_c = 0.1581$. Each pseudo-experiment was generated according to the algorithm described in section 5.1. Plot (c) is a histogram of the positive MINOS error for pseudo-experiments where the fit result $\tau_f$ is smaller than the "true" value $\tau_g$, and of minus the negative MINOS error for the remaining pseudo-experiments. Plot (d) shows the opposite MINOS error assignment. Similarly, plot (g) shows the $g$ pull according to equation (27) and plot (h) the $g$ pull with the opposite MINOS error assignment.
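The error assignment of equation (27), referred to in the caption of Figure 8, can be written as a small helper. This is a sketch only: the function name is hypothetical, and it assumes both errors are passed as positive magnitudes, as in the derivation of section 4.1. With that convention, dividing by a signed negative error in the second branch of equation (27) is the same as dividing (true value) minus (fit result) by the magnitude of the negative error.

```python
def asymmetric_pull(fit_result, true_value, err_plus, err_minus):
    """Pull of equation (27) for a fit with asymmetric errors.

    err_plus and err_minus are the magnitudes (both positive) of the
    positive and negative errors on the fit result.
    """
    if err_plus <= 0 or err_minus <= 0:
        raise ValueError("err_plus and err_minus must be positive magnitudes")
    if fit_result <= true_value:
        # The true value lies above the fit result: use the positive error.
        return (true_value - fit_result) / err_plus
    # The true value lies below the fit result: use the negative error.
    return (true_value - fit_result) / err_minus

# Made-up numbers: a fit result of 4.0 (+2.0, -0.5) and of 5.5 (+2.0, -0.5), true value 5.0.
pull_low = asymmetric_pull(4.0, 5.0, err_plus=2.0, err_minus=0.5)
pull_high = asymmetric_pull(5.5, 5.0, err_plus=2.0, err_minus=0.5)
print(pull_low, pull_high)
```

A pull computed this way lies between $-1$ and $+1$ exactly when the true value falls inside the quoted interval, which is the coverage property derived in equations (28) to (31).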