The Stata Journal (2007) 7, Number 2, pp. 268–271

Stata tip 45: Getting those data into shape

Christopher F. Baum
Department of Economics
Boston College
Chestnut Hill, MA 02467
baum@bc.edu

Nicholas J. Cox
Department of Geography
Durham University
Durham City, UK
n.j.cox@durham.ac.uk

© 2007 StataCorp LP    dm0031

Are your data in shape? That is, are they in the structure that you need to conduct the analysis you have in mind? Data sources often provide the data in a structure that is suitable for presentation but clumsy for statistical analysis. One of the key data management tools that Stata provides is reshape; see [D] reshape. If you need to modify the structure of your data, you should be familiar with reshape and its two functions: reshape wide and reshape long. In this tip, we discuss how two applications of reshape may be the solution to some knotty data management problems.

As a first example, consider this question posted on Statalist by an individual who has a dataset in the wide form:

    country    tradeflow   Yr1990   Yr1991
    Armenia    imports        105      120
    Armenia    exports         90      100
    Bolivia    imports        200      230
    Bolivia    exports         80      115
    Colombia   imports        100      105
    Colombia   exports         70       71

He would like to reshape the data into long form:

    country    year   imports   exports
    Armenia    1990       105        90
    Armenia    1991       120       100
    Bolivia    1990       200        80
    Bolivia    1991       230       115
    Colombia   1990       100        70
    Colombia   1991       105        71
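For readers who want to reproduce this example at the keyboard, the wide-form data shown above can be typed in with input. This is only a minimal sketch: the values come from the table, but the storage types (str8 and str7) are our assumption, chosen to be just wide enough for the strings shown.

```stata
* Enter the six wide-form observations from the Statalist question.
* str8 and str7 are assumed widths that fit "Colombia" and "imports".
clear
input str8 country str7 tradeflow Yr1990 Yr1991
"Armenia"  "imports" 105 120
"Armenia"  "exports"  90 100
"Bolivia"  "imports" 200 230
"Bolivia"  "exports"  80 115
"Colombia" "imports" 100 105
"Colombia" "exports"  70  71
end

* Display the data grouped by country to compare with the table
list, sepby(country)
```

With the data in memory, the reshape commands discussed next can be run exactly as printed.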
We must exchange the roles of years and trade flows in the original data to arrive at the desired structure, suitable for analysis as xt data. This exchange can be handled by two successive applications of reshape:

    . reshape long Yr, i(country tradeflow)
    (note: j = 1990 1991)

    Data                               wide   ->   long
    Number of obs.                        6   ->     12
    Number of variables                   4   ->      4
    j variable (2 values)                     ->   _j
    xij variables:
                              Yr1990 Yr1991   ->   Yr

This transformation swings the data into long form, with each observation identified by country, tradeflow, and the new variable _j, which takes on the values of year. We now perform reshape wide to make imports and exports into separate variables:

    . rename _j year
    . reshape wide Yr, i(country year) j(tradeflow) string
    (note: j = exports imports)

    Data                               long   ->   wide
    Number of obs.                       12   ->      6
    Number of variables                   4   ->      4
    j variable (2 values)         tradeflow   ->   (dropped)
    xij variables:
                                         Yr   ->   Yrexports Yrimports

In this second transformation, back to wide form, the i() option contains country and year, as those are the desired identifiers of each observation of the target dataset. We specify tradeflow as the j() variable for reshape and add the string option to indicate that it is a string variable. The data now have the desired structure, with the two new variables named Yrexports and Yrimports; rename can shorten these to exports and imports if desired. Although we have illustrated this double-reshape transformation with only a few countries, years, and variables, the technique generalizes to any number of each.

As a second example of successive applications of reshape, consider the World Bank's World Development Indicators (WDI) dataset.[1] Their extract program generates a comma-separated value (CSV) database extract, readable by Excel or Stata, but the structure of those data hinders analysis as panel data. For a recent year, the header line of the CSV file is

    "Series code","Country Code","Country Name","1960","1961","1962","1963",
    "1964","1965","1966","1967","1968","1969","1970","1971","1972","1973",
    "1974","1975","1976","1977","1978","1979","1980","1981","1982","1983",
    "1984","1985","1986","1987","1988","1989","1990","1991","1992","1993",
    "1994","1995","1996","1997","1998","1999","2000","2001","2002","2003",
    "2004"

    1. See http://econ.worldbank.org.
That is, each row of the CSV file contains a variable and country combination, with the columns representing the elements of the time series.[2] Our target dataset structure is that appropriate for panel-data modeling, with the variables as columns and rows labeled by country and year. Two applications of reshape will again be needed to reach the target format. We first insheet (see [D] insheet) the data and transform the triliteral country code into a numeric code with the country codes as labels:

    . insheet using wdiex.raw, comma names
    . encode countrycode, generate(cc)
    . drop countrycode

We then must address the fact that the time-series variables are named v4-v48, because the header line provided invalid Stata variable names (numeric values) for those columns. We use rename (see [D] rename) to change v4 to d1960, v5 to d1961, and so on:

    forvalues i = 4/48 {
        rename v`i' d`=1956+`i''
    }

We now are ready to carry out the first reshape. We want to identify the rows of the reshaped dataset by both country code (cc) and seriescode, the variable name. The reshape long will transform a fragment of the WDI dataset containing two series and four countries:

    . reshape long d, i(cc seriescode) j(year)
    (note: j = 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972
    1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987
    1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002
    2003 2004)

    Data                               wide   ->   long
    Number of obs.                        7   ->    315
    Number of variables                  48   ->      5
    j variable (45 values)                    ->   year
    xij variables:
                     d1960 d1961 ... d2004    ->   d

    2. A variation occasionally encountered will resemble this structure, but with periods in reverse chronological order. The solution here can be used to deal with that problem as well.
    . list in 1/15

          cc    seriesc~e   year   countryname          d
      1.  AFG   adjnetsav   1960   Afghanistan          .
      2.  AFG   adjnetsav   1961   Afghanistan          .
      3.  AFG   adjnetsav   1962   Afghanistan          .
      4.  AFG   adjnetsav   1963   Afghanistan          .
      5.  AFG   adjnetsav   1964   Afghanistan          .
      6.  AFG   adjnetsav   1965   Afghanistan          .
      7.  AFG   adjnetsav   1966   Afghanistan          .
      8.  AFG   adjnetsav   1967   Afghanistan          .
      9.  AFG   adjnetsav   1968   Afghanistan          .
     10.  AFG   adjnetsav   1969   Afghanistan          .
     11.  AFG   adjnetsav   1970   Afghanistan   -2.97129
     12.  AFG   adjnetsav   1971   Afghanistan   -5.54518
     13.  AFG   adjnetsav   1972   Afghanistan   -2.40726
     14.  AFG   adjnetsav   1973   Afghanistan   -.188281
     15.  AFG   adjnetsav   1974   Afghanistan    1.39753

The rows of the data are now labeled by year, but one problem remains: all variables for a given country are stacked vertically. To unstack the variables and put them in shape for xtreg (see [XT] xtreg), we must carry out a second reshape that spreads the variables across the columns, specifying cc and year as the i variables and seriescode as the j variable. Since that variable has string content, we use the string option.

    . reshape wide d, i(cc year) j(seriescode) string
    (note: j = adjnetsav adjsavC02)

    Data                               long   ->   wide
    Number of obs.                      315   ->    180
    Number of variables                   5   ->      5
    j variable (2 values)        seriescode   ->   (dropped)
    xij variables:
                                          d   ->   dadjnetsav dadjsavC02

    . order cc countryname
    . tsset cc year
           panel variable:  cc (strongly balanced)
            time variable:  year, 1960 to 2004

After this transformation, the data are now in shape for xt modeling, tabulation, or graphics.

As illustrated here, the reshape command can transform even the most inconvenient data structure into the structure needed for your research. It may take more than one application of reshape to get there from here, but it can do the job.
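The WDI workflow can be tried without the original extract by replaying it on a small stand-in. The sketch below is not the World Bank data: every number is invented, the second country (ALB, Albania) is our choice, and only three year columns (v4 to v6, standing for 1960 to 1962) are used. The final rename, isid, and assert lines are our additions for tidying and checking, not part of the tip.

```stata
* Made-up miniature of a WDI-style extract: 2 series x 2 countries x 3 years.
* All values below are invented for illustration only.
clear
input str9 seriescode str3 countrycode str11 countryname v4 v5 v6
"adjnetsav" "AFG" "Afghanistan" 1.0 1.5 2.0
"adjsavC02" "AFG" "Afghanistan" 0.1 0.2 0.3
"adjnetsav" "ALB" "Albania"     3.0 3.5 4.0
"adjsavC02" "ALB" "Albania"     0.4 0.5 0.6
end

* Numeric country identifier carrying the three-letter codes as value labels
encode countrycode, generate(cc)
drop countrycode

* v4, v5, v6 become d1960, d1961, d1962 (1956 + 4 = 1960)
forvalues i = 4/6 {
    rename v`i' d`=1956+`i''
}

* First reshape: years move from columns to rows
reshape long d, i(cc seriescode) j(year)

* Second reshape: series move from rows to columns
reshape wide d, i(cc year) j(seriescode) string

* Optional tidy-up (our addition): drop the d stub from the new names
rename dadjnetsav adjnetsav
rename dadjsavC02 adjsavC02

* Checks (our addition): one row per country-year, 2 countries x 3 years
isid cc year
assert _N == 6

order cc countryname
tsset cc year
list, sepby(cc)
```

If isid or assert fails on real data, the usual cause is a repeated country and series combination in the extract, which reshape itself would also report as an error.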