Multicollinearity, identification, and estimable functions

Simen Gaure

Abstract. Since there is quite a lot of confusion here and there about what happens when factors are collinear, here is a walkthrough of the identification problems which may arise in models with many dummies, and how lfe handles them. (Or, at the very least, attempts to handle them.)

1. Context

The lfe package is used for ordinary least squares estimation, i.e. models which conceptually may be estimated by lm as

    lm(y ~ x1 + x2 + ... + xm + f1 + f2 + ... + fn)

where f1, f2, ..., fn are factors. The standard method is to introduce a dummy variable for each level of each factor. This is too much, as it introduces multicollinearities in the system. Conceptually, the system may still be solved, but there are many different solutions. In all of them, the differences between the coefficients within each factor will be the same. The ambiguity is typically resolved by removing a single dummy variable for each factor; this is termed a reference. This is like forcing the coefficient for this dummy variable to zero, and the other levels are then seen as relative to this zero. Other ways to solve the problem are to force the sum of the coefficients to be zero, or to enforce some other constraint, typically via the contrasts argument to lm. The default in lm is to have a reference level in each factor, and a common intercept term.

In lfe the same estimation can be performed by

    felm(y ~ x1 + x2 + ... + xm | f1 + f2 + ... + fn)

Since felm conceptually does exactly the same as lm, the contrasts approach may work there too. Or rather, it is actually not necessary that felm handles it at all; it is only necessary if one needs to fetch the coefficients for the factor levels with getfe.

lfe is intended for very large datasets, with factors with many levels. Then the approach with a single constraint for each factor may sometimes not be sufficient. The standard example in the econometrics literature (see e.g. [2]) is the case with two factors, one for individuals, and one for the firms these individuals work for, changing jobs now and then. What happens in practice is that the labour market may be disconnected, so that one set of individuals move between one set of firms, and another (disjoint) set of individuals move between some other
firms. This happens for no obvious reason, and is data dependent, not intrinsic to the model. There may be several such components. I.e. there are more multicollinearities in the system than the obvious ones. In such a case, there is no way to compare coefficients from different connected components; a single individual reference is not sufficient. The problem may be phrased in graph theoretic terms (see e.g. [1, 3, 4]), and it can be shown that it is sufficient with one reference level in each of the connected components. This is what lfe does: in the case with two factors it identifies these components, and forces one level to zero in one of the factors within each component.

In the examples below, rather small randomly generated datasets are used. lfe is hardly the best tool for problems of this size; they are solely used to illustrate some concepts. I can assure the reader that no CPUs, sleeping patterns, romantic relationships, trees or cats, nor animals in general, were harmed during data collection and analysis.

2. Identification with two factors

In the case with two factors, identification is well-known. getfe will partition the dataset into connected components, and introduce a reference level in each component:

    library(lfe)
    ## Loading required package: Matrix
    set.seed(42)
    x1 <- rnorm(20)
    f1 <- sample(8, length(x1), replace=TRUE)/10
    f2 <- sample(8, length(x1), replace=TRUE)/10
    e1 <- sin(f1) + 0.02*f2^2 + rnorm(length(x1))
    y <- 2.5*x1 + (e1 - mean(e1))
    summary(est <- felm(y ~ x1 | f1 + f2))
    ##
    ## Call:
    ##    felm(formula = y ~ x1 | f1 + f2)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -0.7331 -0.1751  0.0000  0.1139  0.7331
    ##
    ## Coefficients:
    ##    Estimate Std. Error t value Pr(>|t|)
    ## x1   1.9609     0.2854   6.872 0.000998 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.8097 on 5 degrees of freedom
    ## Multiple R-squared(full model): 0.9849   Adjusted R-squared: 0.9425
    ## Multiple R-squared(proj model): 0.9042   Adjusted R-squared: 0.6361
    ## F-statistic(full model): 23.23 on 14 and 5 DF, p-value: 0.001318
    ## F-statistic(proj model): 47.22 on 1 and 5 DF, p-value: 0.0009982

We examine the estimable function produced by efactory.
    ef <- efactory(est)
    is.estimable(ef, est$fe)
    ## [1] TRUE
    getfe(est)
    ##             effect obs comp fe idx
    ## f1.0.1  0.84230453   1    2 f1 0.1
    ## f1.0.2  0.42366575   4    1 f1 0.2
    ## f1.0.3  0.60409852   2    2 f1 0.3
    ## f1.0.4  0.90166835   4    1 f1 0.4
    ## f1.0.5  0.67425996   2    2 f1 0.5
    ## f1.0.6  1.08737618   2    1 f1 0.6
    ## f1.0.7 -1.18563165   2    1 f1 0.7
    ## f1.0.8  0.38769504   3    1 f1 0.8
    ## f2.0.1 -2.17762453   2    1 f2 0.1
    ## f2.0.2  0.00000000   6    1 f2 0.2
    ## f2.0.3  0.44013166   1    1 f2 0.3
    ## f2.0.4 -0.93754073   1    1 f2 0.4
    ## f2.0.5  0.00000000   3    2 f2 0.5
    ## f2.0.6 -0.59598343   4    1 f2 0.6
    ## f2.0.7 -0.16807961   2    2 f2 0.7
    ## f2.0.8 -0.02478903   1    1 f2 0.8

As we can see from the comp entry, there are two components, the second one with f1=0.1, f1=0.3, f1=0.5, and f2=0.5 and f2=0.7. A reference is introduced in each of the components, i.e. f2.0.2=0 and f2.0.5=0. If we look at the dataset, the component structure becomes clearer:

    data.frame(f1, f2, comp=est$cfactor)
    ##     f1  f2 comp
    ## 1  0.3 0.5    2
    ## 2  0.7 0.2    1
    ## 3  0.6 0.2    1
    ## 4  0.2 0.6    1
    ## 5  0.8 0.6    1
    ## 6  0.4 0.2    1
    ## 7  0.4 0.4    1
    ## 8  0.6 0.3    1
    ## 9  0.2 0.6    1
    ## 10 0.5 0.5    2
    ## 11 0.4 0.2    1
    ## 12 0.5 0.7    2
    ## 13 0.4 0.6    1
    ## 14 0.2 0.2    1
    ## 15 0.8 0.2    1
    ## 16 0.2 0.8    1
    ## 17 0.3 0.5    2
    ## 18 0.8 0.1    1
    ## 19 0.7 0.1    1
    ## 20 0.1 0.7    2

Observations 1, 10, 12, 17, and 20 belong to component 2; no other observation has f1 %in% c(0.1, 0.3, 0.5) or f2 %in% c(0.5, 0.7), thus it is clear that coefficients for these cannot be compared to other coefficients. lm is silent about this component structure, hence its coefficients are hard to interpret. Predictive properties and residuals are nevertheless the same, as the lm fit of the same model shows.
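The component structure listed above can also be recovered without lfe. The following is a minimal sketch in base R, and not the algorithm getfe uses internally: it treats the levels of the two factors as vertices of a bipartite graph, joins two levels whenever they occur in the same observation, and propagates the smallest observation index until nothing changes. The helper name `two_factor_components` and the vectors `f1_listed` and `f2_listed` are introduced here for illustration only; their values are copied from the data.frame listing above. Labels are assigned in order of first appearance, so they need not match the numbering in est$cfactor.

```r
# Hypothetical helper (not part of lfe): connected components of the
# bipartite graph whose vertices are the levels of two factors.
two_factor_components <- function(a, b) {
  a <- as.integer(factor(a))
  b <- as.integer(factor(b))
  comp <- seq_along(a)                 # one label per observation to begin with
  repeat {
    old <- comp
    comp <- ave(comp, a, FUN = min)    # share the smallest label within each level of a
    comp <- ave(comp, b, FUN = min)    # ... and within each level of b
    if (all(comp == old)) break        # stop when no label changed
  }
  as.integer(factor(comp))             # relabel as 1, 2, ...
}

# f1 and f2 for the 20 observations, copied from the listing above
f1_listed <- c(0.3, 0.7, 0.6, 0.2, 0.8, 0.4, 0.4, 0.6, 0.2, 0.5,
               0.4, 0.5, 0.4, 0.2, 0.8, 0.2, 0.3, 0.8, 0.7, 0.1)
f2_listed <- c(0.5, 0.2, 0.2, 0.6, 0.6, 0.2, 0.4, 0.3, 0.6, 0.5,
               0.2, 0.7, 0.6, 0.2, 0.2, 0.8, 0.5, 0.1, 0.1, 0.7)

comp_listed <- two_factor_components(f1_listed, f2_listed)

# Observations that share a component with observation 1
which(comp_listed == comp_listed[1])
```

For the listed data this picks out observations 1, 10, 12, 17 and 20, the set the text identifies as component 2; all remaining observations fall into a single second component.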
    f1 <- factor(f1); f2 <- factor(f2)
    summary(lm(y ~ x1 + f1 + f2))
    ##
    ## Call:
    ## lm(formula = y ~ x1 + f1 + f2)
    ##
    ## Residuals:
    ##          1          2          3          4          5          6          7
    ##  2.095e-01  7.331e-01  8.327e-17 -4.393e-01 -6.495e-01 -5.678e-01  1.457e-16
    ##          8          9         10         11         12         13         14
    ##  6.939e-18  6.029e-01  3.469e-17  8.197e-02  7.633e-17  4.859e-01 -1.636e-01
    ##         15         16         17         18         19         20
    ## -8.366e-02 -9.021e-17 -2.095e-01  7.331e-01 -7.331e-01 -4.857e-16
    ##
    ## Coefficients: (1 not defined because of singularities)
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   0.6742     0.8930   0.755 0.484267
    ## x1            1.9609     0.2854   6.872 0.000998 ***
    ## f10.2        -2.4282     1.4826  -1.638 0.162390
    ## f10.3        -0.2382     1.5798  -0.151 0.886042
    ## f10.4        -1.9502     1.6137  -1.208 0.280892
    ## f10.5        -0.1680     1.1778  -0.143 0.892114
    ## f10.6        -1.7645     1.6753  -1.053 0.340448
    ## f10.7        -4.0375     1.5625  -2.584 0.049189 *
    ## f10.8        -2.4642     1.5037  -1.639 0.162193
    ## f20.2         2.1776     1.0249   2.125 0.086978 .
    ## f20.3         2.6178     1.4837   1.764 0.137940
    ## f20.4         1.2401     1.6508   0.751 0.486366
    ## f20.5         0.1681     1.3269   0.127 0.904134
    ## f20.6         1.5816     1.1080   1.428 0.212781
    ## f20.7             NA         NA      NA       NA
    ## f20.8         2.1528     1.3808   1.559 0.179716
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.8097 on 5 degrees of freedom
    ## Multiple R-squared:  0.9849, Adjusted R-squared:  0.9425
    ## F-statistic: 23.23 on 14 and 5 DF,  p-value: 0.001318

3. Identification with three or more factors

In the case with three or more factors, there is no general intuitive theory (yet) for handling identification problems. lfe resorts to the simple-minded approach that non-obvious multicollinearities arise among the first two factors, and assumes it is sufficient with a single reference level for each of the remaining factors, i.e. that they in principle could be specified as ordinary dummies. In other words, the order of the factors in the model specification is important. A typical example would be 3 factors: individuals, firms and education:

    est <- felm(logwage ~ x1 + x2 | id + firm + edu)
    getfe(est)

This will result in the same number of references as if using the model

    logwage ~ x1 + x2 + edu | id + firm

though it may run faster (or slower). Alternatively, one could specify the model as

    logwage ~ x1 + x2 | firm + edu + id

This would not account for a partitioning of the labour market along individual/firm, but along
firm/education, using a single reference level for the individuals. In this example, there is some reason to suspect that it is not sufficient, depending on how edu is specified. There exists no general scheme that sets up suitable reference groups when there are more than two factors. It may happen that the default is sufficient. The function getfe will check whether this is so, and it will yield a warning about 'non-estimable function' if not. With some luck it may be possible to rearrange the order of the factors to avoid this situation.

There is nothing special with lfe in this respect. You will meet the same problem with lm: it will remove a reference level (or dummy-variable) in each factor, but the system will still contain multicollinearities. You may remove reference levels until all the multicollinearities are gone, but there is no obvious way to interpret the resulting coefficients.

To illustrate, the classical example is when you include a factor for age (in years), a factor for observation year, and a factor for year of birth. You pick a reference individual, e.g. age=50, year=2013 and birth=1963, but this is not sufficient to remove all the multicollinearities. If you analyze this problem (see e.g. [6]) you will find that the coefficients are only identified up to linear trends. You may force the linear trend between birth=1963 and birth=1990 to zero, by removing the reference level birth=1990, and the system will be free of multicollinearities. In this case the birth coefficients have the interpretation as being deviations from a linear trend between 1963 and 1990, though you do not know which linear trend. The age and year coefficients are also relative to this same unknown trend.

In the above case, the multicollinearity is obviously built into the model, and it is possible to remove it and find some intuitive interpretation of the coefficients. In the general case, when either lm or getfe reports a handful of non-obvious spurious multicollinearities between factors with many levels, you probably will not be able to find any reasonable way to interpret coefficients. Of course, certain linear combinations of coefficients will be unique, i.e. estimable, and these may be found by e.g. the procedures in [5, 8], but the general picture is muddy. lfe does not provide a solution to this problem; however, getfe will still provide a vector of coefficients which results from
finding a non-unique solution to a certain set of equations. To get any sense from this, an estimable function must be applied. The simplest one is to pick a reference for each factor, subtract this coefficient from each of the other coefficients in the same factor, and add it to a common intercept. However, in the case where this does not result in an estimable function, you are out of luck.

If you for some reason believe that you know of an estimable function, you may provide this to getfe via the ef-argument. There is an example in the getfe documentation. You may also test it for estimability with the function is.estimable; this is a probabilistic test which almost never fails (see [4, Remark 6.2]).

4. Specifying an estimable function

A model of the type

    y ~ x1 + x2 + f1 + f2 + f3

may be written in matrix notation as

    (1)    $y = X\beta + D\gamma + \epsilon,$

where $X$ is a matrix with columns x1 and x2 and $D$ is a matrix of dummies constructed from the levels of the factors f1, f2, f3. Formally, an estimable function in our context is a matrix operator whose row space is contained in the row space of $D$. That is, an estimable function may be written as a matrix, like the contrasts argument to lm. However, the lfe package uses an R-function instead. That is, felm is called first; it uses the Frisch-Waugh-Lovell theorem to project out the $D\gamma$ term from (1) (see [4, Remark 3.2]):

    est <- felm(y ~ x1 + x2 | f1 + f2 + f3)

This yields the parameters for x1 and x2, i.e. $\hat\beta$. To find $\hat\gamma$, the parameters for the levels of f1, f2, f3, getfe solves a certain linear system (see [4, eq. (14)]):

    (2)    $D\gamma = \rho$

where the vector $\rho$ can be computed when we have $\hat\beta$. This does not identify $\gamma$ uniquely; we have to apply an estimable function to $\gamma$. The estimable function $F$ is characterized by the property that $F\gamma_1 = F\gamma_2$ whenever $\gamma_1$ and $\gamma_2$ are solutions to equation (2). Rather than coding $F$ as a matrix, lfe codes it as a function. It is of course possible to let the function apply a matrix, so this is not a material distinction.

So, let's look at an example of how an estimable function may be made:

    library(lfe)
    x1 <- rnorm(100)
    f1 <- sample(7, 100, replace=TRUE)
    f2 <- sample(8, 100, replace=TRUE)/8
    f3 <- sample(10, 100, replace=TRUE)/10
    e1 <- sin(f1) + 0.02*f2^2 + 0.17*f3^3 + rnorm(100)
    y <- 2.5*x1 + (e1 - mean(e1))
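As an aside before the example continues, the defining property of an estimable function can be checked by hand on a toy design. The sketch below uses only base R; the names `g1`, `g2`, `D_toy`, `ef_ref` and `ef_raw` are made up for this illustration and are not part of lfe, and the check is not claimed to be how is.estimable is implemented. With two factors forming a single connected component, any two solutions of equation (2) differ by a multiple of the vector that is +1 on the levels of the first factor and -1 on the levels of the second, so a function is estimable exactly when adding such a multiple leaves its value unchanged.

```r
# Toy design: two factors with 2 and 3 levels, forming one connected component
g1 <- factor(c(1, 1, 2, 2, 1))
g2 <- factor(c(1, 2, 2, 3, 3))
D_toy <- cbind(model.matrix(~ g1 - 1), model.matrix(~ g2 - 1))

# v spans the null space of D_toy: +1 on the g1 levels, -1 on the g2 levels
v <- c(1, 1, -1, -1, -1)
max(abs(D_toy %*% v))            # zero, so gamma and gamma + t * v solve the same system

# Reference-based function: the first level of g2 is the reference,
# and its coefficient is moved into the g1 coefficients
ef_ref <- function(gamma) {
  ref <- gamma[[3]]
  c(gamma[1:2] + ref, gamma[3:5] - ref)
}

# The identity is not estimable: it returns the arbitrary solution unchanged
ef_raw <- function(gamma) gamma

gamma_a <- c(0.5, -1, 2, 0, 1)   # an arbitrary coefficient vector
gamma_b <- gamma_a + 3 * v       # another solution of the same system

isTRUE(all.equal(ef_ref(gamma_a), ef_ref(gamma_b)))   # invariant: estimable
isTRUE(all.equal(ef_raw(gamma_a), ef_raw(gamma_b)))   # not invariant
```

The toy objects use their own names so that they do not overwrite x1, f1, f2, f3 and y from the example, which continues with the felm call.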
    summary(est <- felm(y ~ x1 | f1 + f2 + f3))
    ##
    ## Call:
    ##    felm(formula = y ~ x1 | f1 + f2 + f3)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -2.18822 -0.55222  0.09278  0.62858  2.31181
    ##
    ## Coefficients:
    ##    Estimate Std. Error t value Pr(>|t|)
    ## x1   2.5538     0.1026   24.88   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.9963 on 76 degrees of freedom
    ## Multiple R-squared(full model): 0.9086   Adjusted R-squared: 0.8809
    ## Multiple R-squared(proj model): 0.8907   Adjusted R-squared: 0.8576
    ## F-statistic(full model): 32.84 on 23 and 76 DF, p-value: < 2.2e-16
    ## F-statistic(proj model): 619.2 on 1 and 76 DF, p-value: < 2.2e-16
    ## *** Standard errors may be too high due to more than 2 groups and exactDOF=FALSE

In this case, with 3 factors we cannot be certain that it is sufficient with a single reference in two of the factors, but we try it as an exercise. (lfe does not include an intercept, it is subsumed in one of the factors, so it should tentatively be sufficient with a reference for the two others.)

The input to our estimable function is a solution $\gamma$ of equation (2). The argument addnames is a logical, set to TRUE when the function should add names to the resulting vector. The coefficients are ordered the same way as the levels in the factors. We should pick a single reference in factors f2, f3, subtract these, and add the sum to the first factor:

    ef <- function(gamma, addnames) {
      ref2 <- gamma[[8]]
      ref3 <- gamma[[16]]
      gamma[1:7] <- gamma[1:7] + ref2 + ref3
      gamma[8:15] <- gamma[8:15] - ref2
      gamma[16:25] <- gamma[16:25] - ref3
      if(addnames) {
        names(gamma) <- c(paste('f1', 1:7, sep='.'),
                          paste('f2', 1:8, sep='.'),
                          paste('f3', 1:10, sep='.'))
      }
      gamma
    }
    is.estimable(ef, fe=est$fe)
    ## [1] TRUE
    getfe(est, ef=ef)
    ##             effect
    ## f1.1  -0.013634682
    ## f1.2   0.727611420
    ## f1.3  -0.521386749
    ## f1.4  -0.646496809
    ## f1.5  -1.568204155
    ## f1.6  -0.151511048
    ## f1.7   0.286980841
    ## f2.1   0.000000000
    ## f2.2  -0.289569658
    ## f2.3   0.168627982
    ## f2.4  -0.658310494
    ## f2.5   0.253613291
    ## f2.6   0.427094488
    ## f2.7  -0.249330433
    ## f2.8  -0.772323808
    ## f3.1   0.000000000
    ## f3.2  -0.004888500
    ## f3.3  -0.205494033
    ## f3.4   0.449689498
    ## f3.5   0.729926376
    ## f3.6   0.697845803
    ## f3.7   0.569065140
    ## f3.8   0.583417051
    ## f3.9   0.113820998
    ## f3.10  0.005328265

We may compare this to the default estimable function, which picks a reference in each connected component as defined by the two first factors.

    getfe(est)
    ##               effect obs comp fe   idx
    ## f1.1     -0.74124610  16    1 f1     1
    ## f1.2      0.00000000  19    1 f1     2
    ## f1.3     -1.24899817  15    1 f1     3
    ## f1.4     -1.37410823  12    1 f1     4
    ## f1.5     -2.29581558  10    1 f1     5
    ## f1.6     -0.87912247  16    1 f1     6
    ## f1.7     -0.44063058  12    1 f1     7
    ## f2.0.125  1.29667656  11    1 f2 0.125
    ## f2.0.25   1.00710691  15    1 f2  0.25
    ## f2.0.375  1.46530454  14    1 f2 0.375
    ## f2.0.5    0.63836607  11    1 f2   0.5
    ## f2.0.625  1.55028985  12    1 f2 0.625
    ## f2.0.75   1.72377105  12    1 f2  0.75
    ## f2.0.875  1.04734613  14    1 f2 0.875
    ## f2.1      0.52435275  11    1 f2     1
    ## f3.0.1   -0.56906514   8    2 f3   0.1
    ## f3.0.2   -0.57395364  11    2 f3   0.2
    ## f3.0.3   -0.77455917  10    2 f3   0.3
    ## f3.0.4   -0.11937564   7    2 f3   0.4
    ## f3.0.5    0.16086124  11    2 f3   0.5
    ## f3.0.6    0.12878066   7    2 f3   0.6
    ## f3.0.7    0.00000000  14    2 f3   0.7
    ## f3.0.8    0.01435191  13    2 f3   0.8
    ## f3.0.9   -0.45524415   5    2 f3   0.9
    ## f3.1     -0.56373688  14    2 f3     1

We see that the default has some more information. It uses the level names, and some more information, added like this:

    efactory(est)
    ## function (v, addnames)
    ## {
    ##     esum <- sum(v[extrarefs])
    ##     df <- v[refsubs]
    ##     sub <- ifelse(is.na(df), 0, df)
    ##     df <- v[refsuba]
    ##     add <- ifelse(is.na(df), 0, df + esum)
    ##     v <- v - sub + add
    ##     if (addnames) {
    ##         names(v) <- nm
    ##         attr(v, "extra") <- list(obs = obs, comp = comp, fe = fef,
    ##             idx = idx)
    ##     }
    ##     v
    ## }
    ## <bytecode: 0x556a0d0de320>
    ## <environment: 0x556a0cd58200>

I.e. when asked to provide level names, it is also possible to add additional information as a list (or data.frame) as an attribute 'extra'. The vectors extrarefs, refsubs, refsuba etc. are precomputed by efactory for speed efficiency.

Here is the above example, but we create an intercept instead, and don't report the zero-coefficients, so that it closely resembles the output from lm

    f1 <- factor(f1); f2 <- factor(f2); f3 <- factor(f3)
    ef <- function(gamma, addnames) {
      ref1 <- gamma[[1]]
      ref2 <- gamma[[8]]
      ref3 <- gamma[[16]]
      # put the intercept in the first coordinate
      gamma[[1]] <- ref1 + ref2 + ref3
      gamma[2:7] <- gamma[2:7] - ref1
      gamma[8:14] <- gamma[9:15] - ref2
      gamma[15:23] <- gamma[17:25] - ref3
      length(gamma) <- 23
      if(addnames) {
        names(gamma) <- c('(Intercept)',
                          paste('f1', levels(f1)[2:7], sep=''),
                          paste('f2', levels(f2)[2:8], sep=''),
                          paste('f3', levels(f3)[2:10], sep=''))
      }
      gamma
    }
    getfe(est, ef=ef, bN=1000, se=TRUE)
    ##                   effect        se
    ## (Intercept) -0.013634682 0.5114435
    ## f12          0.741246102 0.3160189
    ## f13         -0.507752065 0.3373377
    ## f14         -0.632862126 0.3572938
    ## f15         -1.554569472 0.3811126
    ## f16         -0.137876368 0.3402928
    ## f17          0.300615524 0.3485067
    ## f20.25      -0.289569657 0.4047196
    ## f20.375      0.168627982 0.3873691
    ## f20.5       -0.658310496 0.4563872
    ## f20.625      0.253613289 0.4170575
    ## f20.75       0.427094489 0.4318387
    ## f20.875     -0.249330434 0.4006819
    ## f21         -0.772323808 0.4004357
    ## f30.2       -0.004888501 0.4376401
    ## f30.3       -0.205494035 0.4464411
    ## f30.4        0.449689499 0.4995956
    ## f30.5        0.729926375 0.4339349
    ## f30.6        0.697845804 0.4643069
    ## f30.7        0.569065141 0.4218616
    ## f30.8        0.583417050 0.4345362
    ## f30.9        0.113820992 0.5315648
    ## f31          0.005328265 0.4159621
    # compare with lm
    summary(lm(y ~ x1 + f1 + f2 + f3))
    ##
    ## Call:
    ## lm(formula = y ~ x1 + f1 + f2 + f3)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -2.18822 -0.55222  0.09278  0.62858  2.31181
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -0.013635   0.579452  -0.024 0.981289
    ## x1           2.553781   0.102627  24.884   <2e-16 ***
    ## f12          0.741246   0.385522   1.923 0.058264 .
    ## f13         -0.507752   0.393773  -1.289 0.201151
    ## f14         -0.632862   0.420832  -1.504 0.136768
    ## f15         -1.554569   0.444675  -3.496 0.000792 ***
    ## f16         -0.137876   0.387709  -0.356 0.723111
    ## f17          0.300616   0.417071   0.721 0.473257
    ## f20.25      -0.289570   0.455406  -0.636 0.526785
    ## f20.375      0.168628   0.457298   0.369 0.713340
    ## f20.5       -0.658310   0.516023  -1.276 0.205934
    ## f20.625      0.253613   0.475630   0.533 0.595440
    ## f20.75       0.427094   0.503638   0.848 0.399090
    ## f20.875     -0.249330   0.458179  -0.544 0.587913
    ## f21         -0.772324   0.460361  -1.678 0.097524 .
    ## f30.2       -0.004889   0.491509  -0.010 0.992091
    ## f30.3       -0.205494   0.510660  -0.402 0.688513
    ## f30.4        0.449689   0.567468   0.792 0.430565
    ## f30.5        0.729926   0.504571   1.447 0.152113
    ## f30.6        0.697846   0.546266   1.277 0.205320
    ## f30.7        0.569065   0.466883   1.219 0.226667
    ## f30.8        0.583417   0.473972   1.231 0.222152
    ## f30.9        0.113821   0.584693   0.195 0.846172
    ## f31          0.005328   0.467270   0.011 0.990932
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.9963 on 76 degrees of freedom
    ## Multiple R-squared:  0.9086, Adjusted R-squared:  0.8809
    ## F-statistic: 32.84 on 23 and 76 DF,  p-value: < 2.2e-16

5. Non-estimability

We consider another example. To ensure spurious relations there are almost as many factor levels as there are observations, and it will be hard to find enough estimable functions to interpret all the coefficients. The coefficient for x1 is still estimated, but with a large standard error. Note that this is an illustration of non-obvious non-estimability which may occur in much larger datasets; the author does not endorse using this kind of model for the kind of data you find below.
    set.seed(55)
    x1 <- rnorm(25)
    f1 <- sample(9, length(x1), replace=TRUE)
    f2 <- sample(8, length(x1), replace=TRUE)
    f3 <- sample(8, length(x1), replace=TRUE)
    e1 <- sin(f1) + 0.02*f2^2 + 0.17*f3^3 + rnorm(length(x1))
    y <- 2.5*x1 + (e1 - mean(e1))
    summary(est <- felm(y ~ x1 | f1 + f2 + f3))
    ##
    ## Call:
    ##    felm(formula = y ~ x1 | f1 + f2 + f3)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -0.43725 -0.09946  0.00000  0.05047  0.38973
    ##
    ## Coefficients:
    ##    Estimate Std. Error t value Pr(>|t|)
    ## x1   0.9735     1.3111   0.743    0.593
    ##
    ## Residual standard error: 1.146 on 1 degrees of freedom
    ## Multiple R-squared(full model): 0.9999   Adjusted R-squared: 0.9977
    ## Multiple R-squared(proj model): 0.3554   Adjusted R-squared: -14.47
    ## F-statistic(full model): 447.3 on 23 and 1 DF, p-value: 0.03731
    ## F-statistic(proj model): 0.5514 on 1 and 1 DF, p-value: 0.5934
    ## *** Standard errors may be too high due to more than 2 groups and exactDOF=FALSE

The default estimable function fails, and the coefficients from getfe are not useable. getfe yields a warning in this case.

    ef <- efactory(est)
    is.estimable(ef, est$fe)
    ## Warning in is.estimable(ef, est$fe): non-estimable function, largest error 2e-04 in coordinate 4 ("f1.4")
    ## [1] FALSE

Indeed, the rank-deficiency is larger than expected. There are more spurious relations between the factors than what can be accounted for by looking at components in the two first factors. In this low-dimensional example we may find the matrix $D$ of equation (2), and its (column) rank deficiency is larger than 2.

    f1 <- factor(f1); f2 <- factor(f2); f3 <- factor(f3)
    D <- makeDmatrix(list(f1, f2, f3))
    dim(D)
    ## [1] 25 25
    ncol(D) - as.integer(rankMatrix(D))
    ## [1] 3

Alternatively we can use an internal function in lfe for finding the rank deficiency directly.

    lfe:::rankDefic(list(f1, f2, f3))
    ## [1] 3

This rank-deficiency also has an impact on the standard errors computed by felm. If the rank-deficiency is small relative to the degrees of freedom the standard errors are scaled slightly upwards if we ignore the rank deficiency, but if it is large, the impact on the standard errors can be substantial. The above mentioned rank-computation procedure can be activated by specifying exactDOF=TRUE in the call to felm, but it may be time-consuming if the factors have many levels. Computing the rank does not in itself help us find estimable functions for getfe.
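The rank computation itself does not require lfe. As a rough sketch in base R, on a hypothetical toy dataset (the names `toy`, `D_toy3` and `rank_D` are made up here and nothing below reproduces lfe's internals), one can build the full dummy matrix with one column per factor level and compare its number of columns with its numerical rank from qr. The toy data consist of one block in which levels 1 and 2 of all three factors are fully crossed, plus a single observation using level 3 of every factor, so the two blocks share no level at all.

```r
# Hypothetical toy data: block A crosses levels 1 and 2 of every factor,
# block B is a single observation using level 3 of every factor
toy <- rbind(expand.grid(h1 = 1:2, h2 = 1:2, h3 = 1:2),
             data.frame(h1 = 3, h2 = 3, h3 = 3))
toy[] <- lapply(toy, factor)

# Full dummy matrix: one column per level of each factor, no reference removed
D_toy3 <- do.call(cbind, lapply(toy, function(f) model.matrix(~ f - 1)))

rank_D <- qr(D_toy3)$rank
dim(D_toy3)                # 9 observations, 9 dummy columns
ncol(D_toy3) - rank_D      # column rank deficiency of the toy design
```

Here the deficiency is 4: each of the two disjoint blocks contributes 2. Counting the two connected components of h1 and h2 and adding a single reference for h3, which is how the default described above would proceed as I read it, gives only 3 references, so this toy design shows the same kind of shortfall as the example.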
    summary(est <- felm(y ~ x1 | f1 + f2 + f3, exactDOF=TRUE))
    ##
    ## Call:
    ##    felm(formula = y ~ x1 | f1 + f2 + f3, exactDOF = TRUE)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -0.43725 -0.09946  0.00000  0.05047  0.38973
    ##
    ## Coefficients:
    ##    Estimate Std. Error t value Pr(>|t|)
    ## x1   0.9735     0.9271    1.05    0.404
    ##
    ## Residual standard error: 0.8105 on 2 degrees of freedom
    ## Multiple R-squared(full model): 0.9999   Adjusted R-squared: 0.9988
    ## Multiple R-squared(proj model): 0.3554   Adjusted R-squared: -6.735
    ## F-statistic(full model): 935.2 on 22 and 2 DF, p-value: 0.001069
    ## F-statistic(proj model): 1.103 on 1 and 2 DF, p-value: 0.4038

We can get an idea what happens if we keep the dummies for f3. In this case, with 2 factors, lfe will partition the dataset into connected components and account for all the multicollinearities among the factors f1 and f2 just as above, but this is not sufficient. The interpretation of the resulting coefficients is not straightforward.

    summary(est <- felm(y ~ x1 + f3 | f1 + f2, exactDOF=TRUE))
    ## Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either rank-deficient or indefinite
    ## Warning in chol.default(mat, pivot = TRUE, tol = tol): the matrix is either rank-deficient or indefinite
    ##
    ## Call:
    ##    felm(formula = y ~ x1 + f3 | f1 + f2, exactDOF = TRUE)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -0.43725 -0.09946  0.00000  0.05047  0.38973
    ##
    ## Coefficients:
    ##     Estimate Std. Error t value Pr(>|t|)
    ## x1    0.9735     0.9271   1.050 0.403842
    ## f32   0.4317     1.0394   0.415 0.718239
    ## f33   5.1696     1.1358   4.552 0.045034 *
    ## f34   9.8295     2.1756   4.518 0.045659 *
    ## f35  19.0791     1.3493  14.140 0.004964 **
    ## f36  34.7134     2.3414  14.826 0.004519 **
    ## f37  55.0727     1.4197  38.791 0.000664 ***
    ## f38       NA         NA      NA       NA
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.8105 on 2 degrees of freedom
    ## Multiple R-squared(full model): 0.9999   Adjusted R-squared: 0.9988
    ## Multiple R-squared(proj model): 0.9994   Adjusted R-squared: 0.9929
    ## F-statistic(full model): 935.2 on 22 and 2 DF, p-value: 0.001069
    ## F-statistic(proj model): 425 on 8 and 2 DF, p-value: 0.002349
    getfe(est)
    ##           effect obs comp fe idx
    ## f1.1 -24.1184347   5    1 f1   1
    ## f1.2 -25.6317181   4    1 f1   2
    ## f1.3 -24.5567621   3    1 f1   3
    ## f1.4  55.0365226   1    1 f1   4
    ## f1.5 -27.5445226   2    1 f1   5
    ## f1.6 -22.7734034   2    1 f1   6
    ## f1.7 -24.3570518   2    1 f1   7
    ## f1.8 -24.6884849   3    1 f1   8
    ## f1.9 -26.3355630   3    1 f1   9
    ## f2.1  -0.2701644   2    1 f2   1
    ## f2.2  -0.5039046   3    1 f2   2
    ## f2.3   3.6656524   1    1 f2   3
    ## f2.4  -4.0858605   1    1 f2   4
    ## f2.5  -1.3292328   4    1 f2   5
    ## f2.6   0.0000000   6    1 f2   6
    ## f2.7   1.0658610   6    1 f2   7
    ## f2.8   3.3786353   2    1 f2   8

In this particular example, we may use a different order of the factors, and we see that by partitioning the dataset on the factors f1, f3 instead of f1, f2, there are 2 connected components (the factor f2 gets its own comp-code, but this is not a graph theoretic component number, it merely indicates that there is a separate reference among these).

    summary(est <- felm(y ~ x1 | f1 + f3 + f2, exactDOF=TRUE))
    ##
    ## Call:
    ##    felm(formula = y ~ x1 | f1 + f3 + f2, exactDOF = TRUE)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -0.43725 -0.09946  0.00000  0.05047  0.38973
    ##
    ## Coefficients:
    ##    Estimate Std. Error t value Pr(>|t|)
    ## x1   0.9735     0.9271    1.05    0.404
    ##
    ## Residual standard error: 0.8105 on 2 degrees of freedom
    ## Multiple R-squared(full model): 0.9999   Adjusted R-squared: 0.9988
    ## Multiple R-squared(proj model): 0.3554   Adjusted R-squared: -6.735
    ## F-statistic(full model): 935.2 on 22 and 2 DF, p-value: 0.001069
    ## F-statistic(proj model): 1.103 on 1 and 2 DF, p-value: 0.4038
    is.estimable(efactory(est), est$fe)
    ## [1] TRUE
    getfe(est)
    ##           effect obs comp fe idx
    ## f1.1   0.0000000   5    1 f1   1
    ## f1.2  -1.5132833   4    1 f1   2
    ## f1.3  -0.4383273   3    1 f1   3
    ## f1.4   0.0000000   1    2 f1   4
    ## f1.5  -3.4260877   2    1 f1   5
    ## f1.6   1.3450313   2    1 f1   6
    ## f1.7  -0.2386171   2    1 f1   7
    ## f1.8  -0.5700503   3    1 f1   8
    ## f1.9  -2.2171283   3    1 f1   9
    ## f3.1 -24.1184347   3    1 f3   1
    ## f3.2 -23.6867840   2    1 f3   2
    ## f3.3 -18.9488113   4    1 f3   3
    ## f3.4 -14.2889719   1    1 f3   4
    ## f3.5  -5.0393265   5    1 f3   5
    ## f3.6  10.5949811   4    1 f3   6
    ## f3.7  30.9542674   5    1 f3   7
    ## f3.8  55.0365226   1    2 f3   8
    ## f2.1  -0.2701643   2    3 f2   1
    ## f2.2  -0.5039045   3    3 f2   2
    ## f2.3   3.6656524   1    3 f2   3
    ## f2.4  -4.0858606   1    3 f2   4
    ## f2.5  -1.3292328   4    3 f2   5
    ## f2.6   0.0000000   6    3 f2   6
    ## f2.7   1.0658611   6    3 f2   7
    ## f2.8   3.3786355   2    3 f2   8

Below is the same estimation in lm. We see that the coefficient for x1 is identical to the one from felm, but there is no obvious relation between e.g. the coefficients for f1; the difference f14-f15 is not the same for lm and felm. Since these are in different components, they are not comparable. But of course, if we compare in the same component, e.g. f16-f17 or take a combination which actually occurs in the dataset, it is unique (estimable):

    data.frame(f1,f2,f3)[1,]
    ##   f1 f2 f3
    ## 1  2  6  3

I.e. if we add the coefficients f1.2+f2.6+f3.3 and include the intercept for lm, we will get the same number for both lm and felm. That is, for predicting the actual dataset, estimability plays no role, we obtain the same residuals anyway. It is only for predicting outside of the dataset estimability is important.
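The sum mentioned above can be checked with plain arithmetic. The snippet below does nothing more than add printed numbers: the felm figures are copied from the getfe(est) output above, and the lm figures from the lm summary that follows. The variable names `felm_sum` and `lm_sum` are introduced here for illustration.

```r
# Coefficients copied from the getfe(est) output above (factor order f1 + f3 + f2):
# f1.2 + f2.6 + f3.3
felm_sum <- -1.5132833 + 0.0000000 + -18.9488113

# Coefficients copied from the lm summary of the same model:
# (Intercept) + f12 + f26 + f33
lm_sum <- -24.3886 + -1.5133 + 0.2702 + 5.1696

c(felm = felm_sum, lm = lm_sum)

# Agreement up to the rounding of the printed lm coefficients
abs(felm_sum - lm_sum) < 1e-3
```

Both sums come out at about -20.462, although the individual coefficients differ widely between the two parametrizations.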
    summary(est <- lm(y ~ x1 + f1 + f2 + f3))
    ##
    ## Call:
    ## lm(formula = y ~ x1 + f1 + f2 + f3)
    ##
    ## Residuals:
    ##          1          2          3          4          5          6          7
    ##  3.883e-01 -2.873e-01 -4.899e-02  1.485e-01  3.378e-01  1.388e-17 -5.047e-02
    ##          8          9         10         11         12         13         14
    ##  5.047e-02 -4.372e-01  3.883e-01 -3.407e-01 -5.047e-02 -9.714e-17  3.393e-01
    ##         15         16         17         18         19         20         21
    ##  4.163e-17 -4.163e-17  4.899e-02 -1.485e-01 -4.163e-17  3.897e-01 -4.899e-02
    ##         22         23         24         25
    ## -2.398e-01 -3.393e-01 -4.163e-17 -9.946e-02
    ##
    ## Coefficients: (1 not defined because of singularities)
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -24.3886     1.1202 -21.772 0.002103 **
    ## x1            0.9735     0.9271   1.050 0.403842
    ## f12          -1.5133     1.2712  -1.190 0.356003
    ## f13          -0.4383     1.0229  -0.429 0.710016
    ## f14          79.1550     2.2569  35.073 0.000812 ***
    ## f15          -3.4261     1.2614  -2.716 0.113027
    ## f16           1.3450     2.8879   0.466 0.687194
    ## f17          -0.2386     0.9916  -0.241 0.832255
    ## f18          -0.5701     2.0710  -0.275 0.808947
    ## f19          -2.2171     1.1201  -1.979 0.186330
    ## f22          -0.2337     2.2869  -0.102 0.927917
    ## f23           3.9358     2.7071   1.454 0.283177
    ## f24          -3.8157     3.1342  -1.217 0.347584
    ## f25          -1.0591     1.2320  -0.860 0.480585
    ## f26           0.2702     0.9701   0.278 0.806791
    ## f27           1.3360     1.1119   1.202 0.352520
    ## f28           3.6488     1.4917   2.446 0.134276
    ## f32           0.4317     1.0394   0.415 0.718239
    ## f33           5.1696     1.1358   4.552 0.045034 *
    ## f34           9.8295     2.1756   4.518 0.045659 *
    ## f35          19.0791     1.3493  14.140 0.004964 **
    ## f36          34.7134     2.3414  14.826 0.004519 **
    ## f37          55.0727     1.4197  38.791 0.000664 ***
    ## f38               NA         NA      NA       NA
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.8105 on 2 degrees of freedom
    ## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9988
    ## F-statistic: 935.2 on 22 and 2 DF,  p-value: 0.001069

6. Weeks-Williams partitions

There is a partial solution to the non-estimability problem in [8]. Their idea is to partition the dataset into components in which all differences between factor levels are estimable. The components are connected components of a subgraph of an e-dimensional grid graph where e is the number of factors. That is, a graph is constructed with the observations as vertices; two observations are adjacent (in a graph theoretic sense) if they differ in at most one of the factors. The dataset is then partitioned into (graph theoretic) connected components. It's a finer partitioning than the above, and consequently introduces more reference levels than is necessary for identification. I.e. it does not find all estimable functions, but in some cases (e.g. in [7]) the largest component will be sufficiently large for proper analysis. It is of course always a question whether such an endogenous selection of observations will yield a dataset which results in unbiased coefficients.

This partitioning can be done by the compfactor function with argument WW=TRUE:

    fe <- list(f1, f2, f3)
    wwcomp <- compfactor(fe, WW=TRUE)

It has more levels than the rank deficiency

    lfe:::rankDefic(fe)
    ## [1] 3
    nlevels(wwcomp)
    ## [1] 17

and each of its components is contained in a component of the previously considered kind, no matter which two factors we consider. For the case of two factors, the concepts coincide.
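The graph construction described above can be sketched in a few lines of base R. This is an illustration only: the helper `ww_components` and the five-row dataset `toy_ww` are hypothetical, the pairwise loop is quadratic in the number of observations, and it is not how compfactor(fe, WW=TRUE) computes the partition. Two observations are joined when they differ in at most one factor, and joined observations end up with the same label.

```r
# Hypothetical helper (not part of lfe): Weeks-Williams style components.
# 'factors' is a list of equal-length vectors, one per factor.
ww_components <- function(factors) {
  # n x e matrix of integer level codes
  m <- do.call(cbind, lapply(factors, function(f) as.integer(factor(f))))
  n <- nrow(m)
  comp <- seq_len(n)                     # each observation starts in its own component
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # adjacent if the two observations differ in at most one factor
      if (sum(m[i, ] != m[j, ]) <= 1 && comp[i] != comp[j]) {
        comp[comp == comp[j]] <- comp[i] # merge the two components
      }
    }
  }
  as.integer(factor(comp))               # relabel as 1, 2, ...
}

# Five made-up observations on three factors
toy_ww <- data.frame(a = c(1, 1, 1, 2, 2),
                     b = c(1, 1, 2, 3, 3),
                     c = c(1, 2, 2, 3, 4))

ww_toy <- ww_components(as.list(toy_ww))
ww_toy
```

In the toy data, observations 1 to 3 can be reached from each other by changing one factor at a time, and so can observations 4 and 5, but no observation in the first group differs from one in the second group in fewer than three factors. The sketch therefore returns two components.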
    nlevels(interaction(compfactor(fe), wwcomp))
    ## [1] 17
    # pick the largest component:
    wwdata <- data.frame(y, x1, f1, f2, f3)[wwcomp==1, ]
    print(wwdata)
    ##           y           x1 f1 f2 f3
    ## 2  28.45513 -1.812376850  2  7  7
    ## 3  30.61452  0.151582984  3  6  7
    ## 5  32.35977  0.001908206  1  7  7
    ## 11 31.19345 -0.048910950  3  7  7
    ## 14 34.32095 -0.360763148  1  8  7
    ## 20 12.57960  0.993657777  3  7  6

That is, we can start in one of the observations and travel through all of them by changing just one of f1, f2, f3 at a time. Though, in this particular example, there are more parameters than there are observations, so an estimation would not be feasible.

efactory cannot easily be modified to produce an estimable function corresponding to WW components. The reason is that efactory, and the logic in getfe, work on partitions of factor levels, not on partitions of the dataset; the two coincide in the two-factor case.

WW partitions have the property that if you pick any two of the factors and partition a WW-component into the previously mentioned non-WW partitions, there will be only one component, hence you may use any of the estimable functions from efactory on each partition. That is, a way to use WW partitions with lfe is to do the whole analysis on the largest WW-component. felm may still be used on the whole dataset, and it may yield different results than what you get by analysing the largest WW-component.

Here is a larger example:

    set.seed(135)
    x <- rnorm(10000)
    f1 <- sample(1000, length(x), replace=TRUE)
    f2 <- (f1 + sample(18, length(x), replace=TRUE)) %% 500
    f3 <- (f2 + sample(9, length(x), replace=TRUE)) %% 500
    y <- x + 1e-4*f1 + sin(f2^2) + cos(f3)^3 + 0.5*rnorm(length(x))
    dataset <- data.frame(y, x, f1, f2, f3)
    summary(est <- felm(y ~ x | f1 + f2 + f3, data=dataset, exactDOF=TRUE))
    ##
    ## Call:
    ##    felm(formula = y ~ x | f1 + f2 + f3, data = dataset, exactDOF = TRUE)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -1.63055 -0.29857 -0.00236  0.30599  1.79423
    ##
    ## Coefficients:
    ##   Estimate Std. Error t value Pr(>|t|)
    ## x 0.998552   0.005548     180   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.4957 on 8001 degrees of freedom
    ## Multiple R-squared(full model): 0.9058   Adjusted R-squared: 0.8822
    ## Multiple R-squared(proj model): 0.8019   Adjusted R-squared: 0.7524
    ## F-statistic(full model): 38.49 on 1998 and 8001 DF, p-value: < 2.2e-16
    ## F-statistic(proj model): 3.239e+04 on 1 and 8001 DF, p-value: < 2.2e-16

We count the number of connected components in f1, f2, and see that this is sufficient to ensure estimability

    nlevels(est$cfactor)
    ## [1] 1
    is.estimable(efactory(est), est$fe)
    ## [1] TRUE
    nrow(alpha <- getfe(est))
    ## [1] 2000

It has rank deficiency one less than the number of factors:

    lfe:::rankDefic(est$fe)
    ## [1] 2

Then we analyse the largest WW-component

    wwcomp <- compfactor(est$fe, WW=TRUE)
    nlevels(wwcomp)
    ## [1] 933
    wwset <- wwcomp == 1
    sum(wwset)
    ## [1] 3129
    summary(wwest <- felm(y ~ x | f1 + f2 + f3, data=dataset, subset=wwset, exactDOF=TRUE))
    ##
    ## Call:
    ##    felm(formula = y ~ x | f1 + f2 + f3, data = dataset, exactDOF = TRUE, subset = wwset)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -1.3765 -0.2784  0.0000  0.2791  1.5951
    ##
    ## Coefficients:
    ##   Estimate Std. Error t value Pr(>|t|)
    ## x 0.994390   0.009889   100.6   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.4858 on 2314 degrees of freedom
    ## Multiple R-squared(full model): 0.9182   Adjusted R-squared: 0.8894
    ## Multiple R-squared(proj model): 0.8138   Adjusted R-squared: 0.7483
    ## F-statistic(full model): 31.91 on 814 and 2314 DF, p-value: < 2.2e-16
    ## F-statistic(proj model): 1.011e+04 on 1 and 2314 DF, p-value: < 2.2e-16

We see that we get essentially the same coefficient for x in this case (0.994 versus 0.999 on the whole dataset, well within one standard error). This is not surprising, there is no obvious reason to believe that our selection of observations is skewed in this randomly created dataset. This one has the same rank deficiency:

    lfe:::rankDefic(wwest$fe)
    ## [1] 2

but a smaller number of identifiable coefficients.
    nrow(wwalpha <- getfe(wwest))
    ## [1] 816

We may compare effects which are common to the two methods:

    head(wwalpha)
    ##          effect obs comp fe idx
    ## f1.35 1.9324241   1    1 f1  35
    ## f1.38 0.8049655   3    1 f1  38
    ## f1.40 0.2392413   3    1 f1  40
    ## f1.41 1.0896624   2    1 f1  41
    ## f1.42 0.6428771   4    1 f1  42
    ## f1.43 1.4268411   4    1 f1  43
    alpha[c(35,38,40:43),]
    ##          effect obs comp fe idx
    ## f1.35 0.9581561  10    1 f1  35
    ## f1.38 0.6367390   9    1 f1  38
    ## f1.40 0.8802633  12    1 f1  40
    ## f1.41 0.8586244  13    1 f1  41
    ## f1.42 0.8983646  13    1 f1  42
    ## f1.43 1.2634717  12    1 f1  43

but there is no obvious relation between e.g. the differences f1.35-f1.38; they are very different in the two estimations (about 1.13 in the WW-component versus about 0.32 on the whole dataset). The coefficients are from different datasets, and the standard errors are large (about 0.7) with this few observations for each factor level.

The number of identified coefficients for each factor varies (these figures contain the two references):

    table(wwalpha[,'fe'])
    ##
    ##  f1  f2  f3
    ## 417 198 201

References

[1] J. M. Abowd, R. H. Creecy, and F. Kramarz, Computing Person and Firm Effects Using Linked Longitudinal Employer-Employee Data, Tech. Report TP-2002-06, U.S. Census Bureau, 2002.
[2] J. M. Abowd, F. Kramarz, and D. N. Margolis, High Wage Workers and High Wage Firms, Econometrica 67 (1999), no. 2, 251-333.
[3] J. A. Eccleston and A. Hedayat, On the Theory of Connected Designs: Characterization and Optimality, Ann. Statist. 2 (1974), 1238-1255.
[4] S. Gaure, OLS with Multiple High Dimensional Category Variables, Computational Statistics and Data Analysis 66 (2013), 8-18.
[5] J. D. Godolphin and E. J. Godolphin, On the connectivity of row-column designs, Util. Math. 60 (2001), 51-65.
[6] L. L. Kupper, J. M. Janis, I. A. Salama, C. N. Yoshizawa, and B. G. Greenberg, Age-Period-Cohort Analysis: An Illustration of the Problems in Assessing Interaction in One Observation Per Cell Data, Commun. Statist.-Theor. Meth. 12 (1983), no. 23, 2779-2807.
[7] S. M. Torres, P. Portugal, J. T. Addison, and P. Guimarães, The Sources of Wage Variation: A Three-Way High-Dimensional Fixed Effects Regression Model, IZA Discussion Paper 7276, Institute for the Study of Labor (IZA), March 2013.
[8] D. L. Weeks and D. R. Williams, A Note on the Determination of Connectedness in an N-Way Cross Classification, Technometrics 6 (1964), no. 3, 319-324.

Ragnar Frisch Centre for Economic Research, Oslo, Norway