Clarify: Software for Interpreting and Presenting Statistical Results

Michael Tomz
Jason Wittenberg
Gary King [1]

June 1, 2001

[1] Tomz: Department of Political Science, Stanford University, Encina Hall, Stanford, CA 94305-6044, email tomz@stanford.edu; Wittenberg: Department of Political Science, University of Wisconsin, Madison, 1050 Bascom Mall, 221 North Hall, Madison, WI, 53706, email witty@polisci.wisc.edu; King: Center for Basic Research in the Social Sciences, 34 Kirkland Street, Harvard University, Cambridge MA 02138, email King@Harvard.Edu, website http://GKing.Harvard.Edu. Clarify is copyrighted, but you may copy and distribute this program provided that no charge is made and the copy is identical to the original. To request an exception, please contact Michael Tomz.

Contents

1 Introduction
2 Software requirements
3 How to Install Clarify For the First Time
    3.1 Installing on Computers that are Connected to the Internet
    3.2 Installing on Computers that are Not Connected to the Internet
4 How to update Clarify
    4.1 Updating from Version 1.2x or earlier
    4.2 Updating from Version 2.0 or later
5 What Clarify Does
6 What's new in Clarify 2.0?
7 A Simple Example
8 The Main Commands
    8.1 estsimp
    8.2 setx
    8.3 simqi
9 Frequently Asked Questions
10 Formulae - A Peek Under the Hood
    10.1 Algorithms for estsimp
    10.2 Algorithms for setx
    10.3 Algorithms for simqi
11 References
12 Acknowledgements

1 Introduction

Clarify is a program that uses Monte Carlo simulation to convert the raw output of statistical procedures into results that are of direct interest to researchers, without changing statistical assumptions or requiring new statistical models. The program, designed for use with the Stata statistics package, offers a convenient way to implement the techniques described in:

    Gary King, Michael Tomz, and Jason Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation." American Journal of Political Science 44, no. 2 (April 2000): 347-61.

We recommend that you read this paper before using the software.

Clarify 2.0 simulates quantities of interest for the most commonly used statistical models, including linear regression, binary logit, binary probit, ordered logit, ordered probit, multinomial logit, Poisson regression, negative binomial regression, Weibull regression, seemingly unrelated regression equations, and the additive logistic normal model for compositional data.

2 Software requirements

Clarify works in conjunction with Stata Statistical Software, produced by Stata Corporation. Clarify will run on any platform (Windows, Unix, or Macintosh) where Stata is already installed. You must have Stata 6.0 or later to run Clarify. To obtain a copy of Stata or learn more about the software, visit http://www.stata.com or send email to stata@stata.com.

3 How to Install Clarify For the First Time

If you do not have a previous version of Clarify on your personal computer or your network, there are two ways to install the software.

3.1 Installing on Computers that are Connected to the Internet

To install Clarify 2.0 for use with a personal copy of Stata, launch Stata and then type:

    net from http://gking.harvard.edu/clarify/
    net install clarify

To install Clarify for use with a networked copy of Stata, launch Stata and then type:

    net from http://gking.harvard.edu/clarify/
    net set ado SITE
    net install clarify

In either case, the following files will be installed on your computer: estsimp.ado, estsimp.hlp, setx.ado, setx.hlp, simqi.ado, simqi.hlp, sumqi.ado, sumqi.hlp, tlogit.ado, tlogit.hlp. Note that these files will be installed onto your ado path, the path where Stata searches for the files it needs. If you ever want to remove these files, simply type

    ado uninstall clarify

3.2 Installing on Computers that are Not Connected to the Internet

Download clarify.zip from http://gking.harvard.edu and copy that file to a floppy disk. Then insert the floppy into the disk drive of the machine that is not connected to the internet. Copy clarify.zip into a temporary directory or folder on the hard disk, and use a utility such as pkunzip (available for the PC at http://www.pkware.com) or StuffIt (available for the Macintosh at http://www.aladdinsys.com/expander/index.html) to extract the files into your temporary directory. Finally, launch Stata and type the following commands from within Stata:

    net from <temporary path designator>
    net install clarify

Here, replace <temporary path designator> with the path to the temporary directory or folder where you extracted the contents of clarify.zip. Thus, for example, if the archive were copied and opened into the c:\temp folder on a Windows machine, the appropriate installation commands would be:

    net from c:\temp
    net install clarify

On a Macintosh, if you copied clarify.zip into a temporary folder called TEMPFOLDER, you could install the program by typing:

    net from :TEMPFOLDER
    net install clarify

4 How to update Clarify

There are two ways to update Clarify. The choice depends on which version you currently have on your computer or network.

4.1 Updating from Version 1.2x or earlier

If you are using a version of Clarify released before June 2001, manually delete the following files from your Stata ado path, working directory, or network: estsimp.ado, estsimp.hlp, setx.ado, setx.hlp, simqi.ado, simqi.hlp, sumqi.ado, sumqi.hlp. Once you have done this, you may install Clarify according to the directions in Section 3.

4.2 Updating from Version 2.0 or later

Once you've installed Version 2.0 on your computer, it should be easy to update the program as new releases become available. If you have a personal copy of Stata, launch Stata and then type:

    net from http://gking.harvard.edu/clarify/
    net install clarify, replace

To update the copy on a network, launch Stata and then type:

    net from http://gking.harvard.edu/clarify/
    net set ado SITE
    net install clarify, replace

5 What Clarify Does

Clarify uses stochastic simulation techniques to help researchers interpret and present their statistical results. It uses whatever statistical model you have chosen and as such changes no statistical assumptions. As a first step, the program draws simulations of the main and ancillary parameters ($\tilde{\gamma}$) from their asymptotic sampling distribution, in most cases a multivariate normal with mean equal to the vector of parameter estimates ($\hat{\gamma}$) and variance equal to the variance-covariance matrix of estimates $\hat{V}(\hat{\gamma})$. [1] Thus,

    $\tilde{\gamma} \sim N(\hat{\gamma}, \hat{V}(\hat{\gamma}))$

[1] There are two exceptions: the variance parameters for sureg, drawn from an inverse Wishart distribution; and the variance for regress, drawn from an inverse $\chi^2$ distribution. For details, see Section 10.

By default the program draws M = 1000 sets of simulated parameters, which should be sufficient for most applications.

Next, Clarify converts the simulated parameters into substantively interesting quantities, such as predicted values, expected values, or first differences. To achieve this objective, the user need only choose real or hypothetical values for the explanatory variables (the X's) and indicate which quantities should be calculated, conditional on those X's. The program allows researchers to calculate virtually any quantity that would shed light on a particular problem, and provides a number of Stata procedures to do this easily.

Clarify 2.0 simulates quantities of interest for the most commonly used statistical models, including linear regression, binary logit, binary probit, ordered logit, ordered probit, multinomial logit, Poisson regression, negative binomial regression, Weibull regression, seemingly unrelated regression equations, and compositional data.

6 What's new in Clarify 2.0?

Clarify 2.0 includes a number of enhancements over previous versions, including:

- Support for more models, including Weibull regression, seemingly unrelated regression equations, and the additive logistic normal model for compositional data.
- The ability to apply standard transformations, such as natural logs and exponents, to dependent variables, estimate a model, and then reverse those transformations when interpreting the results.
- Clarify is Amelia-compatible: if you use multiple imputation to correct for problems with missing values, Clarify will analyze all the multiply imputed datasets, appropriately combine the results, and compute your quantity of interest automatically. (For information on the software program Amelia, see http://GKing.Harvard.Edu.)
- The option to generate antithetical simulations, which guarantees that the mean of the simulated parameters equals the vector of point estimates, $\hat{\gamma}$, and reduces Monte Carlo variance.
- More powerful commands for setting the values of the explanatory variables (the X's), using either single or multiply-imputed datasets to compute descriptive statistics.
- The ability to re-display the previous point estimates by entering the estsimp command without any arguments.
- The option in the estsimp command to drop previously simulated parameters.

For automatic notifications (via email) of updates to Clarify, see the web page http://GKing.Harvard.Edu/netmind.shtml.

7 A Simple Example

Clarify is based on three simple commands:

    estsimp  (estimates the model and simulates its parameters)
    setx     (sets values of X's before simulating quantities of interest)
    simqi    (simulates quantities of interest)

In general, the commands should be run in that order, although it is often useful to call setx and simqi many times after running estsimp.

Clarify also contains two minor commands: sumqi and tlogit. sumqi assists in summarizing quantities of interest that have been saved to the dataset. tlogit applies the logistic transformation to one or more variables. For instructions on using these commands, consult the on-line help file.

All Clarify commands can be run interactively at the Stata command line or in batch mode by using a Stata do-file. Each command contains many options, which we describe in Section 8 of this manual. Here, we offer a simple example designed to show how Clarify can be used. With a dataset loaded into memory, type the following 3 commands:

    estsimp logit Y X1 X2    /* Estimate a logit and simulate its parameters */
    setx mean                /* Set X's to their means. */
    simqi                    /* Report Pr(Y=1) conditional on the X's */

Each of these commands performs a particular operation, which we summarize here. A more detailed discussion appears in Section 8.

estsimp logit Y X1 X2
    Clarify works by capturing and interpreting the statistical results that Stata produces when estimating a particular model. To use Clarify, insert the word estsimp at the beginning of an estimation command that you would normally run in Stata. In this example, the built-in Stata command is logit, the binary dependent variable is Y, and the explanatory variables are X1, X2, and a constant. Unless the user specifies otherwise, estsimp will save the simulated parameters as new variables bearing the names b1, b2, and b3, which will hold simulations of the coefficients on X1, X2, as well as the constant term. If any of the variables b1 through b3 already exist in the dataset, Clarify will ask the user to delete those variables or choose different names for the simulations.

setx mean
    The setx command allows the user to choose a real or hypothetical value for each explanatory variable before computing quantities of interest. In this example, the command setx mean sets each X equal to its average value.

simqi
    The simqi command computes and reports quantities of interest and associated measures of uncertainty. Used without specifying any options (many options are possible; see Section 8.3), simqi will compute intelligent default quantities that are appropriate to the model being estimated. In the case of logit, simqi will report the probability that Y = 1.

8 The Main Commands

This section provides more detailed information about each of the main commands in Clarify: estsimp, setx, and simqi. Much of the information also appears in on-line help files, which can be viewed by searching the Stata help menu or entering help commandname at the Stata prompt.

8.1 estsimp

Format:

    estsimp modelname depvar [indepvars] [weight] [if exp] [in range] [, sims(m) genname(newvar) antisim mi(file1 file2 ... filek) iout dropsims]

Description:

estsimp estimates a variety of statistical models and generates M simulations of each parameter. Currently supported models include regress, logit, probit, ologit, oprobit, mlogit, poisson, nbreg, weibull, sureg, and the additive logistic normal model for compositional data. The simulations are stored in new variables bearing the names newvar1, newvar2, ..., newvark, where k is the number of parameters. Each variable has M observations corresponding to the M simulations. estsimp labels the simulated variables and lists their names on the screen, so you can verify what was simulated.

The estsimp command accepts nearly all options that are typically available for the supported models. It also accepts several special options that are described below.

Options:

sims(M) specifies the number of simulations, M, which must be a positive integer. The default is 1000 simulations. If you choose a large number of simulations, you may need to allocate more memory to Stata. See [R] memory in the Stata reference manual for more details about memory allocation.

genname(newvar) specifies a stub-name for the newly generated variables. If no stub is given, Stata will generate the variables b1, b2, ..., bk; otherwise it will generate newvar1, newvar2, ..., newvark, provided that the variables do not exist in memory already.

antisim instructs estsimp to use antithetical simulations, in which numbers are drawn in pairs from the uniform [0,1] distribution, with the second draw being the complement of the first. The antithetical draws are then used to obtain simulations from a multivariate normal distribution. This procedure ensures that the mean of the simulations for a particular parameter is equal to the point estimate of that parameter.

mi(filelist) allows estsimp to analyze multiply-imputed datasets: files in which missing values have been multiply imputed, such as those created by Amelia. Enter the name of each imputed dataset you want to use, such as mi(file1 file2 file3). Alternatively, you can enter a common stub name for all imputed datasets, such as mi(file). In this case, estsimp assumes that you want to use all files in the working directory that are part of the uninterrupted sequence file1, file2, file3, ... estsimp will estimate the parameters for each dataset and use the estimates to generate simulations, which will reflect not only estimation uncertainty but also the uncertainty arising from the imputation process. Note: if the data in memory have been changed, you cannot specify the mi() option until you clear the memory or save the altered dataset.

iout instructs estsimp to print intermediate output (a table of parameter estimates) for each imputed dataset that it analyzes. By default, estsimp suppresses the intermediate output and displays only the final estimates produced by combining the results from each imputed dataset.

dropsims drops the simulated parameters from the previous call to estsimp.

Examples:

To estimate a linear regression of y on x1, x2, x3, and a constant term; simulate 1000 sets of parameter estimates; and then save the simulations as b1, b2, ..., bk, type:

    . estsimp regress y x1 x2 x3

In this example, Stata will create five new variables. The variables b1, b2 and b3 will contain simulated coefficients for x1, x2 and x3; b4 will hold simulations of the constant term; and b5 will contain simulated values for sigma squared, the mean squared error of the regression.

To simulate 500 sets of parameters from a logit regression and save the results as variables beginning with the letter "s", type:

    . estsimp logit y x1 x2 x3, sims(500) genname(s)

Since the logit model contains no ancillary parameters, this command will generate four new variables: s1, s2, s3, and s4. Variables s1-s3 are simulated coefficients for x1, x2 and x3, and the final variable, s4, is the simulated constant term.

To simulate 1000 sets of parameters from an ordered probit regression in which the dependent variable can assume three values (low, medium, and high), type:

    . estsimp oprobit y x1 x2 x3

The ordered probit model does not contain a constant term, but it does have ancillary parameters called cut-points. Thus, the estsimp command listed above will generate five new variables. The variables b1, b2 and b3 will hold simulated coefficients for x1, x2 and x3. Variables b4 and b5 will contain simulations for the two cut-points (cut1 and cut2).

To obtain antithetical variates, simply use the antisim option, as in

    . estsimp oprobit y x1 x2 x3, antisim

Suppose that we have three imputed datasets, called imp1.dta, imp2.dta, and imp3.dta. We could analyze all three datasets and combine the results by issuing the following command:

    . estsimp oprobit y x1 x2 x3, mi(imp1 imp2 imp3)

The resulting simulations of the main and ancillary parameters would reflect both estimation uncertainty and the variability associated with the multiple imputations. To view the intermediate output from each ordered probit estimation, add the iout option to the previous command, as in

    . estsimp oprobit y x1 x2 x3, mi(imp1 imp2 imp3) iout

8.2 setx

Format:

    setx function [weight] [if exp] [in range] [, noinher nocwdel]

    setx varname1 function1 varname2 function2 ... [weight] [if exp] [in range] [, noinher nocwdel]

    setx (varname1 varname2) function1 (varname3 varname4) function2 ... [weight] [if exp] [in range] [, noinher nocwdel]

where

    function = mean | median | min | max | p# | math | # | `macro' | varname[#]

Description:

After simulating parameters from the last estimation (see Section 8.1), use setx to set values for the explanatory variables (the X's), change values that have already been set, or list the values that have been chosen. The main value types are

    mean       arithmetic mean
    median     median
    min        minimum
    max        maximum
    p#         #th percentile
    math       a mathematical expression, such as 5*5 or sqrt(23)
    #          a numeric value, such as 5
    `macro'    the contents of a local macro
    [#]        the value in the #th observation of the dataset

If you used multiply imputed datasets at the estimation stage, setx will use those same imputed datasets to calculate values for the explanatory variables. For instance, setx x1 mean would calculate the mean of x1 across all the imputed datasets.

When using setx or any other Stata command to calculate summary statistics such as means, medians, minimums, maximums, and percentiles, it is important to define the sample. At the estimation stage, Stata automatically disregards observations that do not satisfy the "if", "in", and "weight" conditions specified by the user. It also ignores observations with missing values on one or more variables. Before setting a particular variable equal to its mean or any other summary statistic, users must decide whether to calculate the statistic based only on observations that were used during the estimation stage, or to include other observations in the calculation.

By default, setx inherits the if-in-weight conditions from estsimp and disregards (casewise-deletes) any observation with missing values on the dependent or explanatory variables. You can specify different if-in-weight conditions by including them in the setx command line, and you can disregard all inherited conditions by using the noinher and nocwdel options described below.

The setx command is also used by another statistical package called relogit, which is also available from http://gking.harvard.edu. If you are running relogit with the wc() or pc() options, indicating that the data were selected on the dependent variable, setx will correct the selection bias when calculating summary statistics. For this reason, means and percentiles produced by setx may differ from means and percentiles of the (biased) sample. When the proportion of 1's in the population is known only to fall within a range, such as pc(.2 .3), setx will calculate bounds on the values of the explanatory variables. The result will be two X-vectors, the first assuming that the true proportion of 1's is at its lower bound, and the second conditional on the true proportion being at its upper bound. The program will pass these vectors to relogitq and use them to calculate bounds on quantities of interest. To set each explanatory variable at a single value that falls midway between its upper and lower bounds, use the nobound option that is described below.

setx relies upon three globals: the matrix mrtxc and the macros mrtvt and mrtseto. If you change the values of these globals, the program may not work properly.

setx accepts aweights and fweights. It also accepts the special options listed below.

Notes:

(1) setx will not accept spaces in mathematical expressions unless you enclose the expression in parentheses. setx x4 ln(20) is a valid command, but setx x4 ln( 20) is a syntax error. Similarly, setx x4 5*5 and setx x4 (5 * 5) are valid, but setx x4 5 * 5 is not.

(2) You may use square brackets ([]) when referring to observation numbers, e.g. setx x[15], but do not use square brackets in mathematical expressions, or you may get unexpected results. We recommend that users check the values they have set with setx by entering setx without any arguments.

Options:

noinher causes setx to ignore all if-in-weight conditions that are inherited from estsimp. The user can specify new if-in-weight conditions by typing them as part of the setx command.

nocwdel forces setx to calculate summary statistics based on all valid observations for a given variable, even if the observations contain missing values for the other variables. If nocwdel is not specified, setx will casewise-delete observations with missing values.

nobound is available only after relogit, and only when the true proportion of 1's is assumed to fall within a specified range. Suppose the user typed pc(.2 .4) with relogit and then entered setx x1 mean. By default, setx would set the variable x1 equal to two values: the mean of x1, assuming that the true proportion of ones is only 0.2, and the mean of x1, allowing that the true proportion is as high as 0.4. Both values for x1 will be passed to relogitq and used to calculate bounds on quantities of interest. The nobound option overrides this procedure by setting each x to a single value: the midpoint of its upper and lower bound. Thus, the command setx x1 mean, nobound would set x1 equal to the following expression:

    $[\,(\mathrm{mean}(x1) \mid \tau = 0.2) + (\mathrm{mean}(x1) \mid \tau = 0.4)\,] / 2$

where $\tau$ represents the presumed proportion of 1's in the population.

keepmrt is a programmer's option that instructs setx to return the matrix r(mrtxc) without changing the globals mrtxc, mrtvt, and mrtseto. If you don't understand what this means, you should not use this option.

Examples:

To list values that have already been set:

    . setx

To set each explanatory variable at its mean:

    . setx mean

To set each explanatory variable at its median, based on a sample in which x3 > 12, all inherited conditions are ignored, and casewise deletion is suppressed:

    . setx median if x3>12, noinh nocw

To set each explanatory variable to the value contained in the 15th observation of the dataset:

    . setx [15]

The command can also set each variable separately. For instance, to set x1 at its mean, x2 at its median, x3 at its minimum, x4 at its maximum, x5 at its 25th percentile, x6 at ln(20), x7 at 2.5, and x8 equal to a local macro called myval, type:

    . setx x1 mean x2 median x3 min x4 max x5 p25 x6 ln(20) x7 2.5 x8 `myval'

To change the value of x3 from its previously chosen value to 5*5:

    . setx x3 5*5

To set all variables except x10 at their means, and fix x10 at its 25th percentile, call setx twice: once to set all variables at their means, and a second time to change the value of x10 to its 25th percentile.

    . setx mean
    . setx x10 p25

setx can also set values for groups of variables. To set x1 and x2 to their means, x3 to its median, and x4 and x5 to their 25th percentiles, type:

    . setx (x1 x2) mean x3 median (x4 x5) p25
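The sample rules that Section 8.2 describes for setx (casewise deletion by default, and the nocwdel alternative) can be illustrated outside Stata. The Python sketch below is not Clarify's code and does not reproduce its algorithms. The toy data, the function name set_x, the constant MODEL_VARS, and the cwdel flag are all assumptions made for illustration; None stands in for a Stata missing value, and if-in-weight conditions are left out.

```python
from statistics import mean, median

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"y": 1, "x1": 2.0,  "x2": 10.0},
    {"y": 0, "x1": 5.0,  "x2": None},
    {"y": 1, "x1": None, "x2": 60.0},
    {"y": 0, "x1": 6.0,  "x2": 50.0},
]

# Variables that entered the (hypothetical) estimation.
MODEL_VARS = ["y", "x1", "x2"]


def set_x(rows, var, stat, model_vars, cwdel=True):
    """Summarize `var` with `stat` on the chosen sample.

    cwdel=True  mimics the default: drop any row that is missing a
                value on the dependent or any explanatory variable.
    cwdel=False mimics nocwdel: keep every row in which `var` itself
                is observed, whatever the other variables hold.
    """
    if cwdel:
        sample = [r for r in rows
                  if all(r[v] is not None for v in model_vars)]
    else:
        sample = [r for r in rows if r[var] is not None]
    return stat([r[var] for r in sample])


# Default behaviour: only the two complete rows are used.
print(set_x(rows, "x1", mean, MODEL_VARS))
# nocwdel analogue: all three observed values of x1 are used.
print(set_x(rows, "x1", mean, MODEL_VARS, cwdel=False))
```

With this toy data the two calls give different means for x1, which is the practical point of the passage: the value at which a variable is "set" depends on which observations are treated as part of the sample.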
8.3 simqi

Format:

    simqi [, pv genpv(newvar) ev genev(newvar) pr prval(value1 value2 ...) genpr(newvar1 newvar2 ...) fd(existing option) changex(var1 val1 val2 [& var2 val1 val2]) msims(#) tfunc(function) level(#) listx]

Description:

After simulating parameters from the last estimation (see Section 8.1) and setting values for the explanatory variables (see Section 8.2), use simqi to simulate various quantities of interest, including predicted values, expected values, and first differences.

Predicted values contain two forms of uncertainty: "fundamental" uncertainty arising from sheer randomness in the world, and "estimation" uncertainty caused by not having an infinite number of observations. More technically, predicted values are random draws of the dependent variable from the stochastic component of the statistical model, given a random draw from the posterior distribution of the unknown parameters.

If there were no estimation uncertainty, the expected value would be a single number representing the mean of the distribution of predicted values. But estimates are never certain, so the expected value must be a distribution rather than a point. To obtain this distribution, we average away the fundamental variability, leaving only estimation uncertainty. For this reason, expected values have a smaller variance than predicted values, even though the point estimate should be roughly the same in both cases.

simqi calculates two kinds of expected values: the expected value of Y, and the probability that Y takes on a particular value. For models in which these two quantities are equal, simqi avoids redundancy by reporting only the probabilities.

Note: simulated expected values are equivalent to simulated probabilities for all the discrete choice models that simqi supports (logit, probit, ologit, oprobit, mlogit). In these models, the expected value of Y is a vector, with each element indicating the probability that Y = j. Consider an ordered probit with outcomes 1, 2, 3. The expected value is [Pr(Y=1), Pr(Y=2), Pr(Y=3)], the mean of a multinomial distribution that generates the dependent variable.

A first difference is the difference between two expected values. To simulate first differences use the fd "wrapper", which is described below.

simqi can generate predicted values, expected values and first differences for all the models that it supports. By default, however, it will only report the quantities of interest that appear in the table below. To view other quantities of interest or save the simulated quantities as new variables that can be analyzed and graphed, use one of simqi's options.

    Statistical Model    Quantities displayed by default
    regress              E(Y)
    logit                Pr(Y=1)
    probit               Pr(Y=1)
    ologit               Pr(Y=j) for all j
    oprobit              Pr(Y=j) for all j
    mlogit               Pr(Y=j) for all j
    poisson              E(Y)
    nbreg                E(Y)
    sureg                E(Yj) for all equations j
    weibull              E(Y)

Options:

pv displays a summary of the predicted values that simqi generated via simulation.

genpv(newvar) saves the predicted values as a new variable in the current dataset. Each "observation" of newvar represents one simulated predicted value.

pr displays a summary of the probabilities that simqi generated via simulation.

prval(value1 value2 ...) instructs simqi to evaluate the probability that the dependent variable takes on each of the listed values.

genpr(newvar1 newvar2 ...) saves the simulated probabilities as new variables in the current dataset. Each new "observation" represents one simulated probability. If both the prval() option and the genpr() option are used, simqi will save Pr(Y=value1) in newvar1, Pr(Y=value2) in newvar2, etc. If the prval() option is not specified, genpr() will save the probabilities in the order that they appear on the screen.

ev displays a summary of expected values that simqi generated via simulation. This option is not available for discrete choice models, where it is redundant with pr.

genev(newvar) saves the expected values in a new variable called newvar. Each observation of newvar represents one simulated expected value. This option is not available for discrete choice models, where it is redundant with genpr().

fd(existing option) is a wrapper that makes it easy to simulate first differences. Simply wrap the fd() wrapper around an existing option and specify the changex() option.

changex(var1 val1 val2) specifies how the explanatory variables (the x's) should change when evaluating a first difference. changex uses the same basic syntax as setx, except that each explanatory variable has two values: a starting value and an ending value. For instance, fd(ev) changex(x1 .2 .8) instructs simqi to simulate a change in the expected value of Y caused by increasing x1 from its starting value, 0.2, to its ending value, 0.8.

level(#) specifies the confidence level, in percent, for confidence intervals. The default is level(95) or the value set by set level. For more information on the set level command, see the on-line help for level.

msims(#) sets the number of simulations to be used when calculating expected values. The number must be a positive integer. By default, the value of msims is set at 1000. simqi disregards the msims option whenever the expected value is parametrically defined.

listx instructs simqi to list the x-values that were used to produce the quantities of interest. These values were set using the setx command.

tfunc(function) allows the user to specify a transformation function for transforming the dependent variable. This option is only available for regress and sureg. The currently supported functions are

    Function    Transformation (for all variables j)
    squared     $y_j \to y_j \cdot y_j$
    sqrt        $y_j \to \sqrt{y_j}$
    exp         $y_j \to e^{y_j}$
    ln          $y_j \to \ln(y_j)$
    logiti      $y_j \to e^{y_j} / (1 + \sum_j e^{y_j})$

Basic Examples:

To display the default quantities of interest for the last estimated model, type:

    . simqi

For a summary of the simulated expected values, type:

    . simqi, ev

For a summary of the simulated probabilities, Pr(Y=j), for all j categories of the dependent variable, type:

    . simqi, pr

To display only a summary of Pr(Y=1), the probability that the dependent variable takes on a value of 1, type:

    . simqi, prval(1)

To generate first differences, use the fd() wrapper and the changex() option. For instance, the following commands will simulate the change in the expected value of Y caused by increasing x4 from 3 to 7, while holding other explanatory variables at their means.

    . setx mean
    . simqi, fd(ev) changex(x4 3 7)

To simulate the change in the simulated probabilities, Pr(Y=j), for all j categories of the dependent variable, given an increase in x4 from its minimum to its mean, type:

    . setx mean
    . simqi, fd(pr) changex(x4 min mean)

If you are only interested in the change in Pr(Y=1) caused by raising x4 from its 20th to its 80th percentile when other variables are held at their means, type:

    . setx mean
    . simqi, fd(prval(1)) changex(x4 p20 p80)

More Intricate Examples:

To display not only the simulated expected values but also the x-values used to produce them, we would type:

    . simqi, ev listx

simqi displays 95% confidence intervals by default, but we could modify the previous example to give a 90% confidence interval for the expected value:

    . simqi, ev listx level(90)

To save the simulated expected values in a new variable called predval, type:

    . simqi, genev(predval)

To simulate Pr(Y=0), Pr(Y=3), and Pr(Y=4), and then save the simulated probabilities as variables called simpr0, simpr3 and simpr4, type:

    . simqi, prval(0 3 4) genpr(simpr0 simpr3 simpr4)

The changex option can be arbitrarily complicated. Suppose that we want to simulate the change in Pr(Y=1) caused by simultaneously increasing x1 from .2 to .8 and x2 from ln(7) to ln(10). The following lines will produce the quantities we seek:

    . setx mean
    . simqi, fd(prval(1)) changex(x1 .2 .8 x2 ln(7) ln(10))

We could augment the previous example by requesting a second first difference, caused by increasing x3 from its median to its 90th percentile. Simply separate the two changex requests with an ampersand.

    . setx mean
    . simqi, fd(prval(1)) changex(x1 .2 .8 x2 ln(7) ln(10) & x3 median p90)

Likewise, the fd() option can be as intricate as we would like. For instance, suppose that we have run a poisson regression. We want to see what happens to Pr(Y=2), Pr(Y=3), and the expected count when we increase x1 from its minimum to its maximum. To obtain our quantities of interest, we would type:

    . setx mean
    . simqi, fd(prval(2 3)) fd(ev) changex(x1 min max)

simqi allows us to save any simulated variable for subsequent analysis. To find the mean, standard deviation, and a confidence interval around any quantity of interest that has been saved in memory, use the sumqi command. To graph the simulations, use graph or kdensity.

The tfunc() option reverses common transformations that users have applied to the dependent variable. Suppose that you have taken the log of the dependent variable before running estsimp regress. The command simqi would provide quantities of interest on the logged scale. If you wanted to reverse the transformation, thereby recovering the original scale, you could type

    . simqi, tfunc(exp)

9 Frequently Asked Questions

Why does Clarify give slightly different results each time?

Clarify uses random simulation to create quantities of interest and associated measures of uncertainty. Slight discrepancies are a result of taking a finite number of simulations and using a different random number seed. If you require more precision, increase the number of simulations drawn (see Section 8.1). If exactly the same numerical results are required, set the random number seed with the Stata command set seed before beginning the analysis.

Is it ok if some of my explanatory variables are statistically insignificant?

Yes. Clarify computes quantities of interest based on all estimated coefficients, regardless of their level of statistical significance. This is not problematic because the true quantities of interest are usually the predicted values, expected values, and first differences, not the coefficients themselves. It is usually better to focus on the confidence intervals Clarify reports for each quantity it computes than on the standard errors of coefficients.

How do I know how large to make M?

In our experience M = 1000 is sufficient for most analyses. However, one check on the adequacy of M is to verify that the means of the simulated parameters are equal to the estimated parameters within the desired degree of precision. If they are not, increase M until you achieve the desired precision. Be aware that in larger models increasing M may add to the computer time and memory required for simulation.

How can I set the X's equal to the actual values in my dataset?

You can use the setx command to set the x's at any value you'd like, including the actual values that appear in the dataset. For instance, setx [93] will set all the x's equal to the values that appear in the 93rd observation of your dataset. The sequence of commands

    setx mean           /* sets all x's to their means */
    setx x1 x1[7]       /* resets x1 to value in 7th observation */

will set all the x's equal to their means, and then set x1 equal to the value of x1 that appears in the 7th observation of the dataset. If you wanted to get results for each x1 in your dataset, you could write a little loop, such as:

    setx mean               /* set all x's to their mean levels */
    local i 1               /* create a counter that runs from 1 to */
    while `i' <= _N {       /* _N, where _N is the # of observations */
        setx x1 x1[`i']     /* set x1 to the value in the ith obs */
        simqi               /* simulate quantity of interest */
        local i = `i' + 1   /* repeat for other observations */
    }

How can I set the X's when I have interaction terms?

setx will work with interacted variables. Suppose the independent variables in your dataset are X1 and X2 and you have created an interaction term X1X2 = X1 * X2. First you would run estsimp with the interacted variable, e.g.:

    estsimp regress y x1 x2 x1x2

If you wish to set X1X2 to its mean, then you can use setx in the normal way:

    setx x1x2 mean

However, if you want to set X1X2 to the product of the means of X1 and X2 (rather than the mean of the product), then you have two choices. First, you could set the values by hand, e.g.,

    setx x1x2 10*12

where 10 is the mean of x1 and 12 is the mean of x2. However, an even better method is the following sequence of commands:

    summarize x1, meanonly      /* Compute the mean of x1 */
    local mx1 = `r(mean)'       /* Save the mean in a local macro */
    summarize x2, meanonly      /* Compute the mean of x2 */
    local mx2 = `r(mean)'       /* Save the mean in a local macro */
    setx x1x2 `mx1'*`mx2'       /* Set x1x2 to mean(x1)*mean(x2) */

How can I use Clarify to analyze compositional data?

The procedure involves four basic steps:

1. Run tlogit to transform the vote shares (or other compositional data) into log ratios
2. Run estsimp sureg to estimate a seemingly unrelated regression and simulate the parameters
3. Run setx to choose real or hypothetical values for the explanatory variables (X's)
4. Run simqi with the tfunc() option to simulate the distribution of votes, conditional on the simulated parameters and chosen X's.

Suppose that we are studying a political system with 500 electoral districts. Each observation or row in the dataset pertains to one of those districts. In this example, we have three political parties that each garner a percentage of the vote. Their vote shares, collected in variables v1, v2, and v3, sum to 100 percent.

First, we select party 3 as our reference party and transform the vote shares of the other two parties into log ratios with respect to party 3. Thus, $y_1 = \ln(v_1/v_3)$ and $y_2 = \ln(v_2/v_3)$. The appropriate syntax in Clarify is

    tlogit v1 y1 v2 y2, base(v3) percent

which will create two new variables, y1 and y2, which are the log ratios for v1 and v2 with respect to the base variable v3.

Second, use the estsimp command to run a seemingly unrelated regression model with the log ratios y1 and y2 as our dependent variables. The syntax is

    estsimp sureg (y1 x1 x2) (y2 x3 x4)

Each equation is enclosed in parentheses. Thus, the
first equation states that the log ratio y1 is a linear function of the explanatory variables x1 and x2. The program will automatically add a constant term, as well, unless the user asks that it be suppressed. Likewise, the second equation states that y2 is a linear function of x3, x4, and a constant. The estsimp command will estimate the model and simulate the parameters. By default, estsimp will draw 1000 values for each parameter. In this example, the program would draw 1000 sets of betas (each set has six elements: three betas for equation 1 and three for equation 2); the program would also generate 1000 simulations of $\Sigma$, a 2x2 matrix that governs the relationship between the errors of the two equations. Clarify will store these simulations in memory for subsequent use.

Third, use the setx command to choose some hypothetical or real values for our explanatory variables. For instance, type

    setx (x1 x2) mean x3 15 x4 p20

to set variables x1 and x2 at their respective means, x3 equal to the number 15, and x4 equal to its twentieth percentile.

Finally, use the simqi command to simulate quantities of interest, such as the predicted distribution of votes. The command is

    simqi, pv tfunc(logiti)

where tfunc(logiti) tells the program to apply the inverse logistic function to transform the log ratios into shares of the total vote.

How did you generate the graphs in your paper?

Clarify does not automatically produce graphs. In order to produce a graph, such as Figure 1 from King, Tomz, and Wittenberg (2000), you will need to use Stata's graphics commands. The sequence of commands used to generate Figure 1 is:
    generate plo = .
    generate phi = .
    generate ageaxis = _n + 17 in 1/78
    setx educate 12 white 1 income mean
    local a = 18
    while `a' <= 95 {
        setx age `a' agesqrd (`a'^2)/100
        simqi, prval(1) genpr(pi)
        _pctile pi, p(2.5,97.5)
        replace plo = r(r1) if ageaxis==`a'
        replace phi = r(r2) if ageaxis==`a'
        drop pi
        local a = `a' + 1
    }
    sort ageaxis
    graph plo phi ageaxis, s(ii) c(||)

How can I create a log of all the commands and output in a Clarify session?

See [R] log in the Stata reference manual or consult the on-line help for the log command.

How do I cite this program?

If you use this software, please cite

Michael Tomz, Jason Wittenberg, and Gary King (2001). CLARIFY: Software for Interpreting and Presenting Statistical Results. Version 2.0. Cambridge, MA: Harvard University, June 1. http://gking.harvard.edu

and

Gary King, Michael Tomz, and Jason Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation." American Journal of Political Science 44, no. 2 (April 2000): 347-61.

Can I share Clarify with others?

Clarify is (C) Copyright, 1999-2001, Michael Tomz, Jason Wittenberg and Gary King, All Rights Reserved. You may copy and distribute this program provided the copy is identical to the original and you do not charge for it. To request an exception, please contact Michael Tomz, tomz@stanford.edu. We recommend that you distribute the current version of this program, which is available from http://GKing.Harvard.Edu.

What if I find a bug?

First, get the most recent version (from http://gking.harvard.edu) and try to replicate the problem. If the problem persists, copy down exactly what you see on the screen when the program crashes, and email it, along with the command you used to generate the error, to tomz@stanford.edu, witty@polisci.wisc.edu, or king@harvard.edu. You may also send comments to Michael Tomz, Department of Political Science, Encina Hall, Stanford University, Stanford, CA 94305-6044.

10 Formulae - A Peek Under the Hood

This section is intended for advanced users who want details about the algorithms that Clarify uses to simulate parameters, set values for the explanatory variables, and compute quantities of interest. We welcome any suggestions for improvement.

10.1 Algorithms for estsimp

Recall that the estsimp command performs two functions: it estimates the main and ancillary parameters ($\gamma$) of the statistical model, and it draws simulations of those parameters from their asymptotic sampling distribution. Typically, the sampling distribution is multivariate normal with mean equal to the point estimates of the parameters ($\hat\gamma$) and variance equal to the variance-covariance matrix of estimates $\hat V(\hat\gamma)$.

The current version of Clarify contains two exceptions to this rule. In the case of linear regression, the effect coefficients ($\beta$s) are drawn from a multivariate normal, but simulations of the homoskedastic variance $\sigma^2$ are obtained in a separate step from a scaled inverse $\chi^2$ distribution with $\nu = n - k$ degrees of freedom, where $n$ is the number of observations in the dataset and $k$ is the number of explanatory variables, including the constant term (Gelman, et al., 1995, p. 237). The two-step procedure is legitimate because the effect coefficients and the variance parameter are orthogonal in a linear regression; the procedure is desirable because $\sigma^2$ is strictly positive, and therefore more appropriately drawn from its exact posterior than from a normal distribution. To obtain simulations of $\sigma^2$, the program draws $c$ from a $\chi^2$ with $\nu$ degrees of freedom, and then calculates $\tilde\sigma^2 = \nu\hat\sigma^2 / c$. The resulting draws have an expected value of $\frac{\nu}{\nu-2}\hat\sigma^2$, which approaches $\hat\sigma^2$ as $\nu \to \infty$.

Likewise, the effect coefficients ($\beta$s) of a seemingly unrelated regression are drawn from the multivariate normal, but simulations of the variance matrix $\Sigma$ are obtained in a separate step. Here, the appropriate posterior distribution is the inverse Wishart (Gelman, et al., 1995, p. 481) with $\nu$ degrees of freedom and dimension $p$, where $p$ is the number of equations in the seemingly unrelated regression model. In cases where the number of explanatory variables varies from one equation to the next, Clarify calculates $n - k$ for each equation and sets $\nu$ equal to the mean of those values. To obtain simulations of $\Sigma$, the program draws from a Wishart with scale factor $(\nu\hat\Sigma)^{-1}$ and inverts the draws. The algorithm for drawing from the Wishart relies on Bartlett's decomposition, which is concisely summarized in Johnson (1987, p. 204) and Ripley (1987, pp. 99-100). estsimp produces draws that have an expected value of $\frac{\nu}{\nu-p-1}\hat\Sigma$, which approaches $\hat\Sigma$ as $\nu$ goes to infinity. In small samples this procedure is conservative, since $\frac{\nu}{\nu-p-1} > 1$, implying that $E(\tilde\Sigma) > \hat\Sigma$.

For all models, simulations of the main and ancillary parameters are random. This means that, in any given run of estsimp, the average value of $\tilde\gamma$ may be slightly smaller or larger than the point estimate $\hat\gamma$, though the approximation becomes more precise with a higher number of simulations. Users can force the mean of the simulated parameters to equal the vector of point estimates by requesting antithetical simulations (Stern 1997, pp. 2028-29). The antisim option instructs the program to draw random numbers in pairs from the uniform[0,1] distribution, with the second draw being the complement of the first. For instance, if the first draw is 0.3 then the complementary draw is 0.7. The draws are, therefore, exactly balanced around the mean of the uniform distribution. These antithetical simulations are then used to obtain antithetical or balanced draws from the multivariate normal.

When users are analyzing a single dataset, Clarify estimates a single vector $\hat\gamma$ with variance $\hat V(\hat\gamma)$ and draws all $M$ simulations based on those estimates. The table that appears on the screen gives the exact point estimates and standard errors, instead of reporting the means and standard deviations of the simulations.

The procedure is somewhat more complicated when the researcher employs the mi option to analyze several imputed datasets. In this case, estsimp repeats the following algorithm $I$ times, where $I$ is the number of completed datasets: estimate the parameters and their variance-covariance matrix conditional on the information in dataset $i$ ($i = 1, 2, \ldots, I$), and then draw $M/I$ sets of parameters from their sampling distribution. By repeating this algorithm $I$ times, the program generates $M$ sets of simulated parameters. The output table gives the analytical point estimate, standard error, and t-statistic for each parameter, instead of reporting the means and standard deviations of the simulations. Specifically, the multiple-imputation point estimate for parameter $q$ is $\bar q = \frac{1}{I}\sum_{i=1}^{I}\hat q_i$, and the variance associated with $\bar q$ is a weighted combination of the within-imputation and between-imputation variances: $V(\bar q) = \bar w + (1 + I^{-1})\,b$, where $\bar w = \frac{1}{I}\sum_{i=1}^{I} V(\hat q_i)$ and $b = \frac{1}{I-1}\sum_{i=1}^{I}(\hat q_i - \bar q)^2$. The ratio of $\bar q$ (the parameter estimate) to $V(\bar q)^{1/2}$ (its standard error) forms a t-statistic with degrees of freedom $\nu = (I-1)\left[1 + \frac{\bar w}{(1 + I^{-1})\,b}\right]^2$. For more information about these procedures, see King, et al. (2001) and Schafer (1997, pp. 109-110).

10.2 Algorithms for setx

setx allows the user to choose real or hypothetical values for the explanatory variables (the Xs). The program employs standard formulae for the mean, the minimum, the maximum, percentiles,
and other descriptive statistics. If the user is analyzing several imputed datasets, setx will calculate the average statistic across the datasets. For instance, the command setx x1 mean will calculate the mean of x1 in each dataset, and then set x1 equal to the average of those means. Similarly, the command setx x1 x1[3] will obtain the value of x1 in the third row of each imputed dataset, and then set x1 equal to the average of those values.

10.3 Algorithms for simqi

simqi simulates quantities of interest based on the parameters that were generated by estsimp and the x-values that were chosen with setx. The program obtains simulations of the dependent variable and uses them to calculate expected values, probabilities, first differences, and other quantities of interest. This procedure works in all cases but involves some approximation error, which users can make arbitrarily small by choosing a sufficient number of simulations. In many cases, though, shortcuts exist that can curtail both computation time and approximation error. simqi employs such shortcuts whenever possible. Here, we sketch the algorithms for each model that Clarify supports.

regress: The exact algorithm in simqi depends on whether the user has transformed the dependent variable (e.g., taken the log of y) prior to estimation. If no such transformation has occurred, the program generates one predicted value according to the formula $\tilde y = X_c\tilde\beta + \tilde\epsilon$, where $\tilde\beta$ is a vector of simulated effect coefficients and $\tilde\epsilon$ is one draw from $N(0, \tilde\sigma^2)$. Likewise, the program simulates one expected value as $\tilde E(y) = X_c\tilde\beta$. The algorithm becomes a bit more complicated if the user transformed the dependent variable prior to estimation, and would like to reverse the transformation when interpreting the results. Let $f$ represent a function, as identified by the tfunc() option, that reverses the transformation. If $f$ has been specified, the program simulates one predicted value according to the formula $f(X_c\tilde\beta + \tilde\epsilon)$. For an expected value, the program draws $m$ values of $\tilde\epsilon_d$ ($d = 1, 2, \ldots, m$) from $N(0, \tilde\sigma^2)$ and then computes $(1/m)\sum_{d=1}^{m} f(X_c\tilde\beta + \tilde\epsilon_d)$, which is the average of $m$ predicted values.

logit: The formula for $\tilde\pi$, the simulated probability that the dependent variable y takes on a value of 1, is $1/(1 + e^{-X_c\tilde\beta})$. To obtain one simulation of y, the program draws a number from the Bernoulli distribution with parameter $\tilde\pi$.

probit: The formula for $\tilde\pi$, the simulated probability that the dependent variable y takes on a value of 1, is $\Phi(X_c\tilde\beta)$, where $\Phi$ is the c.d.f. of the standard normal distribution. To obtain one simulation of y, the program draws a number from the Bernoulli distribution with parameter $\tilde\pi$.

ologit: The exact formula depends on the number of categories in the dependent variable. Suppose there are three categories. Let $\tilde\beta$ represent one simulated vector of effect coefficients and let $\tilde\tau_{lo}$ and $\tilde\tau_{hi}$ stand for draws of the cutpoints. To obtain one simulation of the probabilities for each category ($y = 0$, $y = 1$, $y = 2$), the program calculates:

$$\tilde\pi_0 \equiv \widetilde{\Pr}(y=0) = \frac{1}{1 + e^{(X_c\tilde\beta - \tilde\tau_{lo})}},$$
$$\tilde\pi_1 \equiv \widetilde{\Pr}(y=1) = \frac{1}{1 + e^{(X_c\tilde\beta - \tilde\tau_{hi})}} - \frac{1}{1 + e^{(X_c\tilde\beta - \tilde\tau_{lo})}},$$
$$\tilde\pi_2 \equiv \widetilde{\Pr}(y=2) = 1 - \frac{1}{1 + e^{(X_c\tilde\beta - \tilde\tau_{hi})}}.$$

With these results, the program can draw a predicted value, $\tilde y$, from a multinomial distribution with parameters $\tilde\pi_0$, $\tilde\pi_1$, $\tilde\pi_2$, and $n = 1$.

oprobit: The exact formula depends on the number of categories in the dependent variable. Suppose there are three categories. Let $\tilde\beta$ represent one simulated vector of effect coefficients and let $\tilde\tau_{lo}$ and $\tilde\tau_{hi}$ stand for draws of the cutpoints. To obtain one simulation of the probabilities for each category ($y = 0$, $y = 1$, $y = 2$), the program calculates $\tilde\pi_0 \equiv \widetilde{\Pr}(y=0) = \Phi(\tilde\tau_{lo} - X_c\tilde\beta)$, $\tilde\pi_1 \equiv \widetilde{\Pr}(y=1) = \Phi(\tilde\tau_{hi} - X_c\tilde\beta) - \Phi(\tilde\tau_{lo} - X_c\tilde\beta)$, and $\tilde\pi_2 \equiv \widetilde{\Pr}(y=2) = \Phi(X_c\tilde\beta - \tilde\tau_{hi})$. With these results, the program can draw a predicted value, $\tilde y$, from a multinomial distribution with parameters $\tilde\pi_0$, $\tilde\pi_1$, $\tilde\pi_2$, and $n = 1$.

mlogit: The probability equation for the $K$ nominal outcomes of the multinomial logit is $\tilde\pi_j \equiv \widetilde{\Pr}(y=j) = e^{X_c\tilde\beta_j} / \sum_{k=1}^{K} e^{X_c\tilde\beta_k}$, where one of the $K$ outcomes is the base category, such that the effect coefficients $\tilde\beta$ for that category are set to zero. With these results, the program can draw a predicted value, $\tilde y$, from a multinomial distribution with parameters equal to the $\tilde\pi$s and $n = 1$.

poisson: The formula for the expected value $\tilde\mu$ is $e^{X_c\tilde\beta}$, and the probability that the dependent variable takes on the integer value $j$ is $\widetilde{\Pr}(y=j) = e^{-\tilde\mu}\tilde\mu^{j} / j!$. To obtain one predicted value, the program draws $\tilde y$ from a Poisson distribution with parameter $\tilde\mu$. The Poisson simulator is adapted from Press, et al. (1992), pp. 293-95.

nbreg: The formula for the expected value $\tilde\mu$ is $e^{X_c\tilde\beta}$, just as in the Poisson regression model (Long 1997, pp. 230-33). The probability that the dependent variable takes on the integer value $j$ can be simulated as

$$\widetilde{\Pr}(y=j) = \frac{\Gamma(j + \tilde\alpha^{-1})}{j!\,\Gamma(\tilde\alpha^{-1})} \left(\frac{\tilde\alpha^{-1}}{\tilde\alpha^{-1} + \tilde\mu}\right)^{\tilde\alpha^{-1}} \left(\frac{\tilde\mu}{\tilde\alpha^{-1} + \tilde\mu}\right)^{j}.$$

simqi obtains $\tilde\alpha$, the "overdispersion" parameter, by drawing simulations of $\ln(\alpha)$ and the other parameters from the multivariate normal distribution and then calculating $e^{\ln(\alpha)}$. To obtain a predicted value $\tilde y$, the program draws one number from a Poisson distribution with mean $e^{X_c\tilde\beta + \tilde\epsilon}$, where $e^{\tilde\epsilon}$ is simulated from a gamma distribution with shape parameter $\tilde\alpha^{-1}$ and scale parameter $\tilde\alpha$. When $\tilde\alpha^{-1} \le 1$, the gamma simulator is based on the algorithm developed by Ahrens and Dieter, as described in Ripley (1987, p. 88). For other values of $\tilde\alpha^{-1}$, the gamma simulator is based on the procedure by Best, as described in Devroye (1986, p. 410).

sureg: As with regress, the algorithm for interpreting the results of a sureg depends on whether the user transformed the dependent variable. If the user estimated the model without transforming the dependent variable, the program generates one predicted value for equation $k$ according to the formula $\tilde y_k = X_c\tilde\beta_k + \tilde\epsilon_k$, where $\tilde\beta_k$ is a vector of simulated effect coefficients for equation $k$ and $\tilde\epsilon_k$ is a simulated disturbance term for that equation. Disturbances for all equations are drawn simultaneously from a multivariate normal distribution with mean 0 and variance matrix $\tilde\Sigma$, as obtained from the inverse Wishart. Likewise, the program simulates one expected value for equation $k$ as $\tilde E(y_k) = X_c\tilde\beta_k$. If the user has transformed the dependent variable, let $f$ represent the function that reverses the transformation. The program simulates one predicted value for equation $k$ according to the formula $f(X_c\tilde\beta_k + \tilde\epsilon_k)$. For an expected value, the program draws $m$ sets of disturbance terms from $N(0, \tilde\Sigma)$ and indexes them as $\tilde\epsilon_{k,d}$, where $k$ marks the equation and $d = 1, 2, \ldots, m$. Then, for each equation $k$ the program computes $(1/m)\sum_{d=1}^{m} f(X_c\tilde\beta_k + \tilde\epsilon_{k,d})$, which is the average of $m$ predicted values.

weibull: The algorithm depends on which metric, proportional hazard (PH) or accelerated failure-time (AFT), was used at the estsimp stage. The expected value is defined as $\tilde\lambda^{-1/\tilde p}\,\Gamma(1 + 1/\tilde p)$. In the AFT metric, $\tilde\lambda = e^{-X_c\tilde\beta\tilde p}$; in the PH metric, $\tilde\lambda = e^{X_c\tilde\beta}$. The program obtains simulations of the ancillary shape parameter $p$ by drawing $\ln(p)$ and the other parameters from a multivariate normal distribution and then calculating $e^{\ln(p)}$. To obtain a predicted value, the program draws one number from the Weibull distribution with parameters $\tilde\lambda$ and $\tilde p$.

11 References

Devroye, Luc (1986). Non-Uniform Random Variate Generation. New York: Springer-Verlag.

Johnson, Mark E. (1987). Multivariate Statistical Simulation. New York: John Wiley & Sons.

King, Gary, Michael Tomz, and Jason Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation." American Journal of Political Science 44, no. 2 (April 2000): 347-61.

King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve (2001). "Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation." American Political Science Review 95, no. 1 (March 2001): 49-69.

Long, J. Scott (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: SAGE Publications.

Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. New York: Cambridge University Press.

Ripley, Brian D. (1987). Stochastic Simulation. New York: John Wiley & Sons.

Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. New York: Chapman & Hall.

Stern, Steven (1997). "Simulation-Based Estimation." Journal of Economic Literature 35, no. 4 (December): 2006-39.

12 Acknowledgements

We gratefully acknowledge comments and suggestions by Nick Cox, William Gould, Andrew D. Martin, and Ken Scheve. We also wish to thank the many users of Clarify who have provided numerous suggestions for making the software more flexible and user-friendly. Parts of this program were inspired by J. Scott Long, "CATDEV: Stata Modules for Interpretation of Categorical Dependent Variables" (Indiana University, April 16, 1998). The multiple imputation procedure in estsimp extends upon the miest command written by Ken Scheve (Harvard University, February 6, 1999).
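Worked example for Section 10.1 (multiple imputation). The combining rules that estsimp applies under the mi option may be easier to follow with numbers. The sketch below uses made-up inputs, not output from any real dataset: I = 5 imputed datasets, five hypothetical estimates of a single parameter q, and an assumed within-imputation variance of 0.04 in every dataset. The degrees-of-freedom line places the within-imputation variance in the numerator of the ratio, which is the standard form of the rule and our reading of the formula in Section 10.1.

```latex
% Hypothetical inputs, chosen only for illustration: I = 5 imputed datasets
\begin{align*}
\hat q_i &= (1.0,\ 1.2,\ 0.9,\ 1.1,\ 0.8), \qquad V(\hat q_i) = 0.04 \ \text{for every } i \\
% Point estimate: average of the five estimates
\bar q &= \tfrac{1}{5}\,(1.0 + 1.2 + 0.9 + 1.1 + 0.8) = 1.0 \\
% Within-imputation variance: average of the five variances
\bar w &= \tfrac{1}{5}\,(5 \times 0.04) = 0.04 \\
% Between-imputation variance: squared deviations from \bar q, divided by I - 1
b &= \tfrac{1}{4}\,(0 + 0.04 + 0.01 + 0.01 + 0.04) = 0.025 \\
% Total variance
V(\bar q) &= \bar w + (1 + I^{-1})\,b = 0.04 + 1.2 \times 0.025 = 0.07 \\
% Standard error and t-statistic
V(\bar q)^{1/2} &= \sqrt{0.07} \approx 0.265, \qquad t = \bar q \,/\, V(\bar q)^{1/2} \approx 3.78 \\
% Degrees of freedom for the t-statistic
\nu &= (I-1)\left[1 + \frac{\bar w}{(1 + I^{-1})\,b}\right]^{2}
     = 4\left[1 + \frac{0.04}{0.03}\right]^{2} \approx 21.8
\end{align*}
```

In this illustration the between-imputation term adds 0.03 to a within-imputation variance of 0.04, so ignoring the variation across imputed datasets would understate the standard error (0.20 instead of roughly 0.265).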