REFERENCES

Amemiya, T., "Multivariate Regression and Simultaneous Equation Models When the Dependent Variables are Truncated Normal," Econometrica 42, No. 6, November 1974: 999-1011.
Amemiya, T., "Regression Analysis When the Dependent Variable is Truncated Normal," Econometrica 41, No. 6: 997-1017.
Cragg, J., "Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods," Econometrica 39, No. 5, September 1971: 829-844.
Cramer, H., Mathematical Methods in Statistics. Princeton, 1946.
Feller, W., An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley, New York, 1971.
Goldberger, A., "Linear Regression in Truncated Samples," Social Systems Research Institute, University of Wisconsin-Madison, May 23, 1975.
Gronau, Reuben, "The Effect of Children on the Housewife's Value of Time," Journal of Political Economy, March/April 1973.
Gronau, Reuben, "Wage Comparisons - A Selectivity Bias," Journal of Political Economy, November/December 1974.
Haberman, Shelby, The Analysis of Frequency Data. University of Chicago Press, 1974.
Heckman, James J., "Shadow Prices, Market Wages and Labor Supply," Econometrica, July 1974.
Heckman, James J., "Sample Selection Bias as a Specification Error," Unpublished Manuscript, Rand Corporation, April 1976.
Johnson, N. and S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions. Wiley, New York, 1972.
Lewis, H. Gregg, "Comments on Selectivity Biases in Wage Comparisons," Journal of Political Economy, November/December 1974.
Lewis, H. Gregg, Unionism and Relative Wages. University of Chicago Press, 1963.
Nelson, Forrest, "Censored Regression Models with Unobserved Stochastic Censoring Thresholds," National Bureau of Economic Research Working Paper No. 63, 1975.
Rothenberg, T. and C. Leenders, "Efficient Estimation of Simultaneous Equation Systems," Econometrica 32, No. 1, January 1964: 57-76.
Shea, et al., Dual Careers: A Longitudinal Study of Labor Market Experience.
Columbus, Ohio: Center for Human Resource Research, Ohio State University, May 1970.
Tobin, J., "Estimation of Relationships for Limited Dependent Variables," Econometrica 26, 1958: 24-36.

APPENDIX A

[...]

17 Estimates of the covariance structure are obtained from the inter-equation residual correlation between the residuals from equation (16) and the wage function (15b). Note that the estimate of [...] taken from the regression coefficient of equation (16) is 53.01. An example of the potential cost saving may be useful. It cost $700 to produce estimates of the likelihood function reported in Table 3 and $15 to produce the initial consistent estimates and the GLS estimates.

TABLE 3
MAXIMUM LIKELIHOOD ESTIMATES AND INITIAL CONSISTENT ESTIMATES OF THE HECKMAN (1974) MODEL
Annual Hours ("t" statistics in parentheses)

Columns: Estimates in Original Paper; Initial Consistent Estimates; First Step Iterate; Likelihood Optimum.
[The column alignment of the table body was lost in extraction. Entries follow in source order; unreadable digits and exponents are marked "?".]

Log Likelihood: -5,778
Log Likelihood under null hypothesis of no selection bias: -5,783

Natural Logarithm Market Wage Equation $Y_{1i}$ (coefficients of $\beta_1$)
Intercept: -0.412, -0.982, -0.435, -0.593; (5.28), (8.93), (8.70)
Education: 0.0679, 0.076?, 0.0686, 0.0688; (13.58), (10.15), (17.20)
Experience: 0.0200, 0.048, 0.0205, 0.025; (10.00), (12.00), (1.14)

Natural Logarithm of Reservation Wage $Y_{3i}$ (coefficients of $\beta_3$)
Intercept: -0.1191 (1.77)
Effect of hours on reservation wage ($\gamma$): 0.152 × 10^? (796)
Husband's wage: 0.00946 (2.49)
Wife's education: 0.0574 (10.44)
Assets: 0.185 × 10^? (3.14)
Nbr. children < 6: 0.114 (6.48)
Std. Deviation in Mkt. Wage Equation: 0.329 (32.90)
Std. Deviation in Reservation Wage Eq.: 0.363 (24.20)
Interequation Correl.: 0.725 (11.69)

Remaining entries, in source order: 0.051 (7.29); 0.0534 (7.63); 0.135 × 10^-? (2.45); (52.63); 0.452 (37.36); -0.623 (32.28); 0.63 × 10^-? (12.60); 0.532 (28.00); -0.103; -0.0964 (2.10); 0.9 × 10^?; 0.145 × 10^? (1383.18); 0.00418; 0.0238 (4.76); 0.061; 0.0548 (13.70); 0.1702 × 10^?; 0.285 × 10^? (0.41); 0.115; 0.116 (7.25); 0.320; 0.253 (23.00); 0.351; 0.259 (26.16); 0.654?; 0.353? × 10^?; 0.317 (14.22); (4.23); -6,414; -6,102.

16 Either weighted or unweighted estimators may be used, and as discussed in Section II, a more efficient estimator exploits the information that the regression coefficient is the square root of the population variance.

[...]

(16) [equation not recoverable from the source], $i = 1, \ldots, I_1$.

The labor supply equation is just identified because the only variable that appears in the wage equation and not in the reservation wage equation is labor market experience. Hence the choice between estimating equations [...] and (15e') is immaterial.

15 The estimate [...] is obtained by dividing the coefficient for experience in the female wage equation (0.0203 in the GLS estimates) into the coefficient for experience in the hours of work equation (-79). The resulting estimate [...]

[Table: LEAST SQUARES SYSTEM FOR ANNUAL HOURS WORKED ("t" statistics in parentheses). The table body is not recoverable from the source; surviving entries: 8.39; 1.08 × 10^?; (2.6).]

III. EMPIRICAL PERFORMANCE OF THE ESTIMATOR

12 A derivation of the sample likelihood function for this model is provided in Heckman (1974).

For specificity, let $Y_{1i}$ be the market wage that woman $i$ could earn were she to work. $Y_{2i}$ is the difference between market wages $Y_{1i}$ and reservation wages $Y_{3i}$. Hours of work are proportional to the difference between market wages and reservation wages when this difference is positive, with the factor of proportionality denoted by $1/\gamma$. This proportionality factor must be positive if the model is to accord with economic theory.

In this model, the parameters of the functions determining $Y_{1i}$ and $Y_{3i}$ are of direct interest.
The equation for reservation wages may be written as

(15a) $Y_{3i} = X_{3i}\beta_3 + U_{3i}$,

(15b) $Y_{1i} = X_{1i}\beta_1 + U_{1i}$,

hence

(15c) $Y_{2i} = X_{1i}\beta_1 - X_{3i}\beta_3 + U_{1i} - U_{3i} = X_{2i}\beta_2 + U_{2i}$,

where $X_{2i} = (X_{1i}, X_{3i})$ and $U_{2i} = U_{1i} - U_{3i}$. The hours of work equation is given by

(15d) $h_i = \frac{1}{\gamma} Y_{2i}$,

or, in reduced form,

(15e) $h_i = \frac{1}{\gamma}(X_{1i}\beta_1 - X_{3i}\beta_3) + \frac{1}{\gamma}(U_{1i} - U_{3i})$.

$U_{3i}$ and $U_{1i}$ are assumed to be joint normal variates with zero mean, and the covariance structure is unconstrained. Note that $E(U_{2i}^2) = \sigma_{11} - 2\sigma_{13} + \sigma_{33}$, where $E(U_{ji}U_{j'i}) = \sigma_{jj'}$, $j, j' = 1, 2, 3$. In this notation,

(15b') $E(Y_{1i} \mid h_i > 0) = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i$,

(15e') $E(h_i \mid h_i > 0) = \frac{1}{\gamma}(X_{1i}\beta_1 - X_{3i}\beta_3) + \frac{(\sigma_{22})^{1/2}}{\gamma}\lambda_i$.

[...]

None of these two step estimators of $(\sigma_{22})^{1/2}$ attains the Cramer-Rao lower bound, so that use of the Rothenberg one step estimator is recommended when possible. An advantage of the multiplicity of estimators for $\beta_2$ and $(\sigma_{22})^{1/2}$ is that they allow a check on the appropriateness of the model. For example, if the probability of the event $Y_{2i} \ge 0$ is not as closely linked to the equation for $Y_{2i}$ as Tobin assumed, the $\beta_2$ estimated from (13b) will not be proportional to the $\beta_2/(\sigma_{22})^{1/2}$ estimated from probit analysis.

Finally, note that unconstrained estimates of equation (13b) are likely to be imprecise because $\lambda_i$ and its estimate are nonlinear functions of the $X_2$ regressors that appear in that equation. Since [...] and $\lambda_i$ are positively correlated (often strongly so), multicollinearity may be a problem, and for that reason constrained estimators can produce more reasonable results.

The procedure for more general models is similar to that outlined for Tobin's model. In our second example, suppose that we observe $Y_{1i}$ only when $Y_{2i} \ge 0$, that we do not observe actual values of $Y_{2i}$, but we know whether or not $Y_{2i} \ge 0$ for all observations from a random sample. This is the model of Gronau and Lewis.

As before, we may estimate $\phi_i$ and $\lambda_i$ from probit analysis. The estimated $\lambda_i$ is then used as a regressor in equation (13a). Regression estimates of the parameters are consistent estimators.
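The two-step procedure described for the Gronau-Lewis case (a probit for sample inclusion, then least squares with the estimated $\lambda_i$ added as a regressor) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the computation used in the paper: the data are simulated, every parameter value is invented for illustration, $\sigma_{22}$ is normalized to one, and the helper names `solve`, `ols` and `probit` are hypothetical. Only the Python standard library is used.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is f, cdf is F


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, steps=12):
    """Fisher scoring for the probit model P(d = 1) = F(w'g)."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(steps):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for w, di in zip(rows, d):
            a = sum(wi * gi for wi, gi in zip(w, g))
            p = min(max(STD.cdf(a), 1e-10), 1.0 - 1e-10)
            dens = STD.pdf(a)
            s = (di - p) * dens / (p * (1.0 - p))
            h = dens * dens / (p * (1.0 - p))
            for i in range(k):
                score[i] += w[i] * s
                for j in range(k):
                    info[i][j] += w[i] * w[j] * h
        g = [gi + si for gi, si in zip(g, solve(info, score))]
    return g


# Simulated data (all values are assumptions made for this illustration).
random.seed(0)
data = []
for _ in range(10000):
    x, z = random.gauss(0, 1), random.gauss(0, 1)
    u2 = random.gauss(0, 1)                         # sigma_22 = 1
    u1 = 0.7 * u2 + 0.51 ** 0.5 * random.gauss(0, 1)  # sigma_11 = 1, sigma_12 = 0.7
    y1 = 1.0 + 0.5 * x + u1                         # outcome equation
    y2 = 0.3 + x + z + u2                           # selection index
    data.append((x, z, y1, 1 if y2 >= 0 else 0))

# Step 1: probit on the full sample gives the estimated index, hence phi and lambda.
g = probit([[1.0, x, z] for x, z, _, _ in data], [row[3] for row in data])

# Step 2: least squares on the selected sample, with and without lambda.
naive_rows, two_step_rows, y_sel = [], [], []
for x, z, y1, di in data:
    if di:
        phi = -(g[0] + g[1] * x + g[2] * z)
        lam = STD.pdf(phi) / (1.0 - STD.cdf(phi))
        naive_rows.append([1.0, x])
        two_step_rows.append([1.0, x, lam])
        y_sel.append(y1)

naive = ols(naive_rows, y_sel)
two_step = ols(two_step_rows, y_sel)
print("slope without lambda:", round(naive[1], 3))
print("slope with lambda:   ", round(two_step[1], 3))
print("coefficient on lambda:", round(two_step[2], 3))
```

In this design the coefficient on $\lambda_i$ estimates $\sigma_{12}/(\sigma_{22})^{1/2}$, which is 0.7 by construction. The regression that omits $\lambda_i$ should understate the slope on `x`, because $\lambda_i$ falls as `x` raises the probability of inclusion.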
To estimate the approximate generalized least squares version of (13a), we may use the residuals from this regression to estimate the weights given in equation (14c).

An alternative procedure uses the information from (14c) in conjunction with (13a) to simultaneously estimate $\beta_1$, $\rho$, and $\sigma_{11}$. From the definition of $\rho$ given below equation (14c), note that equation (13a) may be written as

(13a') $Y_{1i} = X_{1i}\beta_1 + \rho(\sigma_{11})^{1/2}\lambda_i + V_{1i}$.

The weighted estimator that utilizes the information that the coefficient of $\lambda_i$ is a parameter of the population variance chooses $\beta_1$, $\rho$, and $\sigma_{11}$ to minimize

$\sum_i \frac{(Y_{1i} - X_{1i}\beta_1 - \rho(\sigma_{11})^{1/2}\lambda_i)^2}{1 + \rho^2(\phi_i\lambda_i - \lambda_i^2)}$

with the $\lambda_i$ estimated from probit analysis used in place of the true $\lambda_i$. As before, we cannot be sure that in small samples this estimator exists, although in large samples it must exist. Asymptotically, this procedure yields estimators that are consistent but are inefficient compared to maximum likelihood estimators.

As a final example that is the topic of the empirical work reported below, consider the model of equations (10a)-(10c) with normality assumed for $U_{1i}$ and $U_{2i}$, and censored sampling assumed. $Y_{2i}$ is observed up to an unknown factor of proportionality when $Y_{2i} \ge 0$, and $Y_{1i}$ is observed only when $Y_{2i} \ge 0$. This example combines aspects of the two previous examples.

[...] positive. This gives an estimate of $X_{2i}\beta_2/(\sigma_{22})^{1/2}$ and hence of $\lambda_i$ for each observation. This estimate [...] (13b) [...] squares estimates of the coefficients in (13b) may be obtained. Note that a weighted [...] (13b) can be estimated to [...] the heteroscedasticity that arises from [...]. For at least two reasons this procedure does not utilize all of the available information. The first reason is that the procedure ignores the information that the probit function estimates $\beta_2$ up to a factor of proportionality. One could utilize this information to write (13b) as

(13b') $Y_{2i} = (\sigma_{22})^{1/2}(-\phi_i + \lambda_i) + V_{2i}$,

and estimate $(\sigma_{22})^{1/2}$ [...] as a regressor.9 [...] and unweighted versions of [...] weights (14a). Still, estimates are not fully efficient. [...] $(\sigma_{22})^{1/2}$.
One can use this [...] squared to equal the residual variance. Thus one can solve the [...] for $(\sigma_{22})^{1/2}$, [...] $(1 + \phi_i\lambda_i - \lambda_i^2)\,\sigma_{22}$ [...] of $\lambda_i$ and $\phi_i$ are [...] place of [...] is the estimate [...] $(\sigma_{22})^{1/2}$ and is positive [...] possesses [...] more efficient than the previous estimator.10

9 Note that the estimated $\lambda_i - \phi_i$ and the actual $Y_{2i}$ are positive numbers. Hence the least squares estimate of $(\sigma_{22})^{1/2}$ is positive.

10 The equation for $(\sigma_{22})^{1/2}$ is given by [expression not recoverable from the source] for $d \ge 0$. When the last condition does not hold, it is straightforward to develop the appropriate expression for $(\sigma_{22})^{1/2}$. In either case consistency is readily verified.

Nothing guarantees that $d$ is [...]. For example, if all observations have a probability of sample inclusion that exceeds 15 percent, $d$ [...] and no real root need exist in a small sample, although in a large sample one must exist. It is interesting to note that nonexistence is most likely in samples with observations for which the probability of sample inclusion is high, i.e., precisely in those circumstances when least squares is an appropriate estimator, the range of variation in $\lambda_i$ is small, and we would place little weight on the regression estimate [...]

[...] and, as $\phi_i \to \infty$, the probability of sample inclusion goes to zero, and

$\lim E(V_{2i}^2) = \lim E(V_{1i}V_{2i}) = 0$, $\lim E(V_{1i}^2) = \sigma_{11}(1 - \rho^2)$.

The weighting implicit in GLS underscores the crucial nature of the assumption that all observations are drawn from the same population distribution.

As a practical matter, we do not know $\phi_i$ and $\lambda_i$, and hence we cannot estimate equations (13a) and (b) unless there is prior information on $\lambda_i$. In the case of a censored sample, it is possible to compute the probability that an observation has data missing on $Y_{1i}$, and hence it is possible to use probit analysis to estimate $\phi_i$ and $\lambda_i$. Thus, denoting $d_i$ as a random variable with the value of one when $Y_{1i}$ is observed, the sample likelihood for the probit analysis is

$\mathcal{L} = \prod_i [F(\phi_i)]^{1 - d_i}\,[1 - F(\phi_i)]^{d_i}$.

Subject to the standard identification conditions in probit analysis,
it is possible to maximize $\mathcal{L}$ to obtain consistent estimates of $\beta_2/(\sigma_{22})^{1/2}$ and hence $\lambda_i$. These estimates of $\lambda_i$ may be used in place of the true $\lambda_i$ as regressors in equations (13a) and (b). When regression estimates of the coefficients in equations (13a) and (13b) are possible, they yield consistent estimates of the true parameters, since $\lambda_i$ estimated from probit analysis is a consistent estimator of the true $\lambda_i$ and Slutsky's theorem applies. More efficient estimates may be obtained from the approximate GLS estimates, which converge in distribution to the true GLS estimates by the Cramer convergence theorem (Cramer, 1946). Other estimates may be obtained from the covariance structure. Each set of estimates may be used as initial consistent estimates for estimation of the likelihood function. As Rothenberg and Leenders (1964) have shown, one Newton step toward optimizing the likelihood function produces estimates that are asymptotically efficient in the sense that they attain the Cramer-Rao lower bound.

Consideration of three special cases will help to focus ideas. First consider Tobin's model, which is presented in equation (13b) in the notation of this section. In Tobin's original model, we observe $Y_{2i}$ only if it is positive, but for all observations in a random sample we know whether or not $Y_{2i}$ is positive. In the two stage procedure proposed here, first estimate the probit model determining [...]

The likelihood function is straightforward. Using the notation in the text for the case of $Y_{1i}$ observed when $Y_{2i} \ge 0$, $Y_{1i}$ not observed otherwise, and $Y_{2i}$ not continuously measured, the likelihood becomes [expression not recoverable from the source]. The likelihood function for the other cases is straightforward.

[...] Thus, the estimated residual variance for [...] (14a) converges to [...] $(1 + \phi_i\lambda_i - \lambda_i^2)$. Of course, the summation term can be estimated so that the standard OLS variance formulae may be modified. But the GLS sampling variances for the parameters are lower and hence preferable.

(14a) $E(V_{2i}^2) = \sigma_{22}(1 + \phi_i\lambda_i - \lambda_i^2)$,

(14b) $E(V_{1i}V_{2i}) = \sigma_{12}(1 + \phi_i\lambda_i - \lambda_i^2)$,

(14c) $E(V_{1i}^2) = \sigma_{11}[(1 - \rho^2) + \rho^2(1 + \phi_i\lambda_i - \lambda_i^2)]$,

where $\rho^2 = \sigma_{12}^2/(\sigma_{11}\sigma_{22})$.

[...] where

$\phi_i = -\frac{X_{2i}\beta_2}{(\sigma_{22})^{1/2}}$, $\lambda_i = \frac{f(\phi_i)}{1 - F(\phi_i)}$,

and $f$ and $F$ respectively are the density and distribution function of the standard normal distribution. The Tobin model is a special case with $h(U_{1i}, U_{2i})$ a singular density since $U_{1i} \equiv U_{2i}$.

"$\lambda_i$" is the inverse of Mills' ratio and is known as the hazard rate in reliability theory. There are several interesting properties of $\lambda_i$:

Its denominator is the probability that observation $i$ has data for $Y_1$.

The lower the probability that an observation has data on $Y_1$, the greater the value of $\lambda$ for that observation. More precisely, using a result due to Feller (1968) and cited in Haberman's proof of the concavity of the probit likelihood function (Haberman, 1974, p. 309), it is straightforward to show that

$\frac{\partial\lambda_i}{\partial\phi_i} > 0$, $\lim_{\phi_i \to \infty} \lambda_i = \infty$, $\lim_{\phi_i \to -\infty} \lambda_i = 0$.

In samples in which the selectivity problem is unimportant (i.e., the selection rule ensures that all potential population observations are sampled), $\lambda_i$ becomes negligibly small, so that least squares estimates of the coefficients have [...] properties.

Using these results, we may write

(12a) $E(Y_{1i} \mid X_{1i}, Y_{2i} \ge 0) = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i$,

(12b) $E(Y_{2i} \mid X_{2i}, Y_{2i} \ge 0) = X_{2i}\beta_2 + \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i$.

If we knew $\lambda_i$, or could estimate it, least squares could be applied to estimate the parameters in equation (12a). If we could measure $Y_{2i}$ when $Y_{2i} \ge 0$, as in Tobin's model, knowledge of [...] and $\lambda_i$ would permit direct estimation of $\beta_2$ and $(\sigma_{22})^{1/2}$ by least squares without having to resort to optimizing likelihood functions.

We may add disturbances to equations (12a) and (12b) to reach the model

(13a) $Y_{1i} = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i + V_{1i}$,

(13b) $Y_{2i} = X_{2i}\beta_2 + \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i + V_{2i}$,

where [...]

6 Note that [...] normalized density of [...]

[...] led to the following model:

(10a) $E(Y_{1i} \mid X_{1i}, Y_{2i} \ge 0) = X_{1i}\beta_1 + E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2)$,

(10b) $d_i = 1$ iff $Y_{2i} \ge 0$, $d_i = 0$ otherwise,

(10c) $E(h_i \mid X_{2i}, Y_{2i} \ge 0) = \frac{1}{\gamma}X_{2i}\beta_2 + \frac{1}{\gamma}E(U_{2i} \mid U_{2i} \ge -X_{2i}\beta_2)$.

Equations (10a) and (b) are as before. Equation (10c) exploits the information that we observe $Y_{2i}$ up to a positive factor of proportionality if $Y_{2i}$ is positive.

These examples are not intended as a complete literature survey.
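The properties of $\lambda_i$ listed in the text, and the bounds on the variance factor $1 + \phi_i\lambda_i - \lambda_i^2$ that appears in the disturbance variances, are easy to check numerically. The sketch below is illustrative only: the function names `inverse_mills` and `variance_factor` are hypothetical and the evaluation points are arbitrary. It uses the standard normal density and distribution function from Python's standard library.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is f, cdf is F


def inverse_mills(phi: float) -> float:
    """lambda(phi) = f(phi) / (1 - F(phi))."""
    return STD.pdf(phi) / (1.0 - STD.cdf(phi))


def variance_factor(phi: float) -> float:
    """1 + phi * lambda - lambda**2, the factor that scales the variances."""
    lam = inverse_mills(phi)
    return 1.0 + phi * lam - lam * lam


# Arbitrary evaluation points, chosen only for illustration.
for phi in (-5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0):
    prob_observed = 1.0 - STD.cdf(phi)  # the denominator of lambda
    print(f"phi={phi:+.1f}  P(observed)={prob_observed:.6f}  "
          f"lambda={inverse_mills(phi):.6f}  factor={variance_factor(phi):.6f}")
```

At $\phi_i = 0$ the ratio equals $f(0)/0.5 = \sqrt{2/\pi} \approx 0.798$. Over the grid, $\lambda_i$ rises as the probability of inclusion falls, and the variance factor stays between zero and one. The direct formula loses precision for very large positive $\phi_i$, where $1 - F(\phi_i)$ underflows, so this sketch is only meant for moderate arguments.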
Yet they illustrate that the basic statistical models for limited dependent variables, censoring and truncation may be summarized in a simple general model for missing data.

Regression estimates of (1a) fit on a selected sample omit the final term on the right hand side of equation (9). Thus the bias that arises from using least squares to fit models for limited dependent variables or models with censoring or truncation arises solely because the conditional mean of $U_{1i}$ is not included as a regressor. The bias that arises from truncation or selection may be interpreted as arising from an ordinary specification error with the conditional mean deleted as an explanatory variable. In general, one cannot sign the direction of bias that arises from omitting this conditional mean.4

A crucial distinction is the one between a truncated sample and a censored sample. In a truncated sample one cannot use the available data to estimate the [...] In the next section, I examine a technique that enables one to use this estimated probability to estimate the missing conditional mean for each observation. The [...] regression analysis so that estimators with desirable large sample properties may be derived from computationally simple methods.

[...]

Suppose that the joint density of $U_{1i}$ and $U_{2i}$ is bivariate normal. Using well known results in the literature (see, e.g., Johnson and Kotz (1972), pp. 112-113),

$E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2) = \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i$,

$E(U_{2i} \mid U_{2i} \ge -X_{2i}\beta_2) = \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i$

[...]

[...] observe [...] and observed purchases are zero. Thus, in Tobin's model the sample selection rule is given by (6), and we may write [...]

As noted by Cragg (1971) and Nelson (1975), the rule generating the observed data need not be as closely related to the model of equation (1a) as Tobin assumes. Consider the following decision rule: we obtain data on $Y_{1i}$ if another random variable crosses a threshold, i.e., if

(7) $Y_{2i} \ge 0$,

while if the opposite inequality holds we do not obtain data on $Y_{1i}$. The choice of zero as a threshold is an inessential normalization.
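The conditional-mean result for bivariate normal disturbances, $E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2) = \sigma_{12}\lambda_i/(\sigma_{22})^{1/2}$, can be checked by simulation. The sketch below assumes an illustrative covariance structure and a single common value of the index $X_{2i}\beta_2$; the constants and the function names `predicted_mean` and `simulated_mean` are hypothetical, chosen only for this example.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf is f, cdf is F

# Assumed covariance structure of (U1, U2); the values are illustrative only.
SIGMA_11, SIGMA_22, SIGMA_12 = 1.0, 4.0, 1.2
INDEX = 0.5  # assumed value of X2 * beta2, the same for every observation


def predicted_mean() -> float:
    """sigma_12 / sigma_22**0.5 * lambda, with lambda = f(phi) / (1 - F(phi))."""
    phi = -INDEX / SIGMA_22 ** 0.5
    lam = STD.pdf(phi) / (1.0 - STD.cdf(phi))
    return SIGMA_12 / SIGMA_22 ** 0.5 * lam


def simulated_mean(n: int = 200_000, seed: int = 1) -> float:
    """Average U1 over the draws that satisfy the selection rule U2 >= -INDEX."""
    rng = random.Random(seed)
    rho = SIGMA_12 / (SIGMA_11 * SIGMA_22) ** 0.5
    total, kept = 0.0, 0
    for _ in range(n):
        z2, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u2 = SIGMA_22 ** 0.5 * z2
        # Build U1 so that Var(U1) = sigma_11 and Cov(U1, U2) = sigma_12.
        u1 = SIGMA_11 ** 0.5 * (rho * z2 + (1.0 - rho * rho) ** 0.5 * z1)
        if u2 >= -INDEX:
            total += u1
            kept += 1
    return total / kept


print("formula:   ", round(predicted_mean(), 4))
print("simulation:", round(simulated_mean(), 4))
```

With these assumed values the disturbance in the selected sample has a clearly nonzero mean, which is the term a least squares regression on the selected sample leaves out. Setting `SIGMA_12` to zero makes both numbers approach zero, which matches the statement that selection is random when $U_{1i}$ is independent of $U_{2i}$.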
Also, note that we could define a dummy random variable $d_i$ with the properties

$d_i = 1$ iff $Y_{2i} \ge 0$, $d_i = 0$ otherwise,

and proceed to analyze the joint distribution of $Y_{1i}$ and $d_i$, dispensing with $Y_{2i}$ altogether. The advantage in using selection rule representation (7) is that it permits a unified summary of the existing literature.

Using this representation, we may write equation (5) as

(9) $E(Y_{1i} \mid X_{1i}, Y_{2i} \ge 0) = X_{1i}\beta_1 + E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2)$.

If $U_{1i}$ is independent of $U_{2i}$, the conditional mean of $U_{1i}$ is zero, and the sample selection process into the incomplete sample is random. In the general case, the conditional mean of the disturbance in the incomplete sample is a function of $X_{2i}$. Moreover, the effect of such sample selection is that $X_2$ variables that do not belong in the population regression function appear to be statistically significant in equations fit on selected samples.3

A good example of this phenomenon arises in the Gronau (1974)-Lewis (1974) wage selectivity bias problem. In their analyses $Y_{1i}$ is the wage rate, which is only observed for working women, and $Y_{2i}$ is an index of labor force attachment (which in the absence of fixed costs of work may be interpreted as the difference between market wages and reservation wages). If the presence of children affects the work decision but does not affect market wages, regression evidence from [...] not necessarily evidence that there is market discrimination against such women [...] lower wages. Moreover, regression evidence that such extraneous variables "explain" wage rates may be interpreted as evidence that selection bias is present.

For a final example, I draw on my own work (Heckman, 1974). Letting $Y_{1i}$ be the wage rate for woman $i$, and $Y_{2i}$ be the difference between market wages and reservation wages, a woman works if $Y_{2i} \ge 0$. Using results from the theory of labor supply, one can show that under certain simplifying assumptions working hours, $h_i$, are proportional to $Y_{2i}$.
If this proportionality factor is $1/\gamma$ ($> 0$), we are [...]

3 If the only regressor in $X_2$ is "1", so that the probability of sample inclusion is the same for all observations, only the intercept is biased.

[...]

(1a) $Y_{1i} = X_{1i}\beta_1 + U_{1i}$,

(1b) $Y_{2i} = X_{2i}\beta_2 + U_{2i}$,

$E(U_{ji}) = 0$ [...]

2 Tobin actually assumes a separate known [...]. See Tobin (1958).

Annals of Economic and Social Measurement, 5/4, 1976

This paper presents a unified treatment of statistical models for truncation, sample selection and limited dependent variables. A simple estimator is proposed that permits estimation of those models by least squares and probit analysis. In an empirical example, it is shown that the estimator yields estimates close to the maximum likelihood estimates.

This paper presents a unified summary of statistical models for sample selection, truncation and limited dependent variables. The bias that arises from using least squares when such models apply is characterized as a simple specification error or omitted variable problem. A computationally simple estimator applicable to such models is proposed that amounts to estimating the omitted variable and using least squares including the estimated omitted variable as a regressor.

The estimator discussed in this paper is not new. A grouped data version of it appears in papers on sample selection bias by Gronau (1974) and Lewis (1974).1 This paper extends the analysis in those papers by developing the statistical properties of the estimator and demonstrating that the method is applicable to a wider class of models, and a more varied class of empirical settings, than the original papers consider.

The paper is in three parts. First, I discuss the common structure of models of sample selection, truncation and limited dependent variables. Then I discuss and propose an estimator for these models. Finally, I apply the estimator to reestimate a model of female labor supply and wages.
In this example, I demonstrate that the consistent estimator discussed here closely approximates estimates obtained from optimizing a computationally more complicated likelihood function.

I. A TWO EQUATION MODEL

To simplify the exposition, I consider a two equation model. Few new points arise in the multivariate case, and the multivariate extension is straightforward.

* This research was funded by a HEW grant to the Rand Corporation and a Department of Labor ASPER grant to the National Bureau of Economic Research. Neither organization is responsible for the contents of this paper. An earlier version of this paper appeared as "Shadow Prices, Market Wages and Labor Supply Revisited: Some Computational and Conceptual Simplifications and Revised Estimates," June 1975. I have received useful comments from T. Amemiya, Gary Chamberlin, John Cogan, Zvi Griliches, Reuben Gronau, Ed Leamer, Lung Fei Lee, H. Gregg Lewis, Mark Killingsworth, T. Macurdy, Bill Rogers, and T. Paul Schultz at various stages of this research. None are responsible for any errors that remain in this paper. Ralph Shnelvar performed the calculations reported below.

1 The Lewis paper is an extended comment on Gronau's paper. Thus credit for developing the method belongs to Gronau, although Lewis' paper considerably extends and clarifies Gronau's analysis.
Wiley, New York, I 971.Goldberger, A., "Linear Regression in Truncated Samples," Social Systems Research Institute,University of Wisconsin-Madison, May 23, 1975.(lronau, Reuben, "The Effect of Children on the Housewife's Value of Time," Journal of PoliticalEconomy, March/April, 1973.-,"WageComparisonsASelectivityBias,'JouriialofPoliticalEconomyNovernber/Deccniber, 1974.1-laberman, Shelby, The Analysis of Frequency Data. University of Chicago Press, 1974.Heckman, James J., "Shadow Prices, Market Wages and Labor Supply," Econonietrica, July 1974."Sample Selection Bias as a Specification Error," Unpublished Manuscript, Rand Corpora-tion, April 1976.Johnson, N. and S. Kotz, Distributions in Statistics: ('onfinuous Multit'ariate Distributions. Wiley, NewYork, 1972.Lewis, H. Gregg, "Comments on Selectivity Biases in Wage Comparisons," Journal of PoliticalEconomy, November/December 1974.Unionism and Relative Wages. University of Chicago Press, 1963.Nelson, Forrest, "Censored Regression Models with Unobserved Stochastic Censoring Thresholds,"National Bureau of Economic Research Working Paper No. 63, 1975.Rothcnbcrg, T. and C. Leenders," Efficient Estimation of Simultaneous Equation Systems,"Econometrica 32, No. 1, January 1964: 57-76.Shea, etal., Dual Careers: A l.ongitudina! Study of Labor Market Experience. Columbus, Ohio: Centerfor Human Resource Research, Ohio State University, May 1970.'Fobin, J., "Estimation of Relationships for Limited Dependent Variables," Econwnetrica 26, 1958:24-36.492SL 5,083APPENDIX AI (u22)".7Estimates of the covari;:nce structureare obtained from the inter-equation residual correlationbetween the residuals froni equation(16) and the wage function (I5b). 
Note that the estimate oftaken from the regression coefficient ofequation (16) is 53.Ol.An example ot the potential incost saving may be useful,It cost $700 to produceestimates of the likelihood functionreported in Table 3 and $15 to produce the initial consistentestimates and the GLS estimates.49() OnTABLE 3MAXIMUM LIKELII-ioot) ESTIMATES ANt) INITIALCONSISTENT ESTIMATES OFTIlEHECKMAN (1974) MODEIAnnual Hours("t" statistics in parentheses)Log Likelihood-5,778Log Likelihood under-5,783null hypothesis of noselection biasInitialLikelihoodEstimates inConsistentFirst StepOptimumOriginal PaperEstimatesIterateNatural LogarithmMarket Wage Equation Yie(Coefficients of i3)Intercept-0.412-0.982-0.435-0.593(5.28)(8.93)(8.70)Education0.06790.076!0.0686(1.0688(13.58)(10.15)(17.20)Experience0.02000.0480.02050.025(10.001(12.00)(1.14)Natural Logarithm ofReservation Wage Y3(Coefficients of f33)Intercept-0.1191(1.77)Effect of hours on0.152x ioreservation wage(796)(1)Husband's wage0.00946(2.49)Wife's education0.0574(10.44)Assets0.185x 10(3.14)Nbr. children 60.114(6.48)Std. Deviation in0.329Mkt. Wage Equation(32.90)Std. Deviation in0.363Reservation Wage Eq.(24.20)(133Interequation Correl.0.725(11.69)0.051(7.29)0.0534(7.63)0.135x io-(2.45)(52.63)0.452(37.36)16 Eitherweighted or unweighted estimators may be used, and as discussed in Section II, a moreeffictent estimator exploits the information that the regression coefficient is thesquare root of thepopulation variance.-0.623(32.28)0.63x io-(12.60)0.532(28.00)-0.103-0.0964(2.10)0.9xl00.l4SxI()(1383.18)0.004180.0238(4.76)0.0610.0548(13.70)0.1702x 1060.285x 10(0.41)0.1150.116(7.25)0.3200.253(23.00)0.3510.259(26.16)0.65410.3.53x i00.3 17(14.22)(4.23)-6,414-6,102 R2Rhood functionA,(o2)2/y(16)hI=2(_1A1)+X,i=1,...,I,,f32X2,/l/2j22 ,y)The labor supply equationis just identified because the onlyvariable that appears in the wagenotin the reservation wageequation is labor market experience. 
Hence thebetween estimating equationsand (15e') is immaterial.15The estimateis obtained by dividing thecoefficient for experience iii the femalewageequation (0.0203 in the GLSestimates)into the coefficient for experiencein the hours of workequation (-79). The resultingestimate ILEAST SQUARES SYSTEM FOR ANNUAL HOURS WORKED("t" STA11STICS IN PARNTIIESES)�8.39. i'1,08x 10(2.6) J-fiX2i/lo2) Y11h1PIlL EMPIRICAL PERFORMANCE OF THE ESTIMATORI 2 A derivationof the sample likelihood function for this model is provided in Heckman (1974). Ilor specifierty, Jet Y, be the market wage that woman icould earnwere sheto work. Y2, is the difference between market vagesh and reservation WagesY. I louis of work arc piooi tional to the diiieiciice betweenniarkc wagesandreservation wages when this difference is positive with the factorofproportionality denoted by I /y. This proportionality factor lutist be positiveif the modelis toaccord with economic theory.In this model, the parameters of the functions determiningY and Y1are ofdirect interest. The equation for reservationwagesmay be written as(ISa)(1 5b)hence(!5c)where= (X1X31)andU, =- U.The hours of work equation is given byyor, in reduced form.=--(XIpI-x11$3)+!(u11_U11)VVU31 and U11 are assumedto he joint normal variates withzero mean and thecovariance structure is unconstrainedNote that(I Sb')(15e')= X J3 +YIi = XllPi + U1,Y21 = X1J3 - X3j31 + UU3 = X2113, + LI2,E(U1) =(o, 2O1+o33)whereE(Li1tJ31)=,31=1,2,3.In this notation,(T12(Tii).X1j3 +(1I2(u-22)'(122�E(h1h1 0)=_!(X,1,31Xfl3)+A1V(o22)484 None of these two step estimators of (0.22)1/2 attains the Cramer-Rao lowerbound so that use of the Rothcnherg one step estimator is recommended whenpossible. An advantage of the multiplicity of eStiiflitors for 13 and ((r,,)2 is thatthey allow a check on the appropriateness of the model. 
For example, if the�probability ot the event Y U is not itS closely linked to the equation for Y21 as'Fobin assumed, theestimated from (l3h) will not be proportional to the131/(0.22)1/2 estimated from probit analysis.-Finally, note that unconstrained estimates of equation (13b) are likely to heimprecise because A and its estimate are nonlinear functions of theX2regressorsthat appear in that equation. Sinceand A are positively correlated (oftenstrongly so) multicollinearity may be a problem and for that reason constrainedestimators can produce niore reasonable results.The procedure for more general models is similar to that outlined for Tohin's�model. In our second example, suppose that we observe Y, only when Y, 0, that�we do not observe actual values of Y7, but we know whether or not Y2 0 for allobservations from a random sample. This is the model of Gronau and Lewis.As before, we may estimate çb and A from probit analysis. The estimated A isthen used as a regressor in equation (I 3a). Regression estimates of the parametersare consistent estimators. To estimate the approximate generalized least squaresversion of (l3a), we may use the residuals from this regression to estimate theweights given in equation (14c).'An alternative procedure uses the information from (1 4c) in conjunction with(13a) to simultaneously estimate 13, p, and o. From the definition of p givenbelow equation (14c) note that equation (13a) may be written as(1 3a')=-t--p(a1+ V11.The weighted estimator that utilizes the information that the coefficient of A, is aparameter of the population variance, chooses 13, p. and 0.ii tominimize-1_(Y - X1J31 P(0.i 1)'"2A1)222I-I-p (A1A1)with the A1 estimated from probit analysis used in place of the true A1. Asbefre,we cannot be sure that in small samplesthis estimator exists although in largesamples it must exist. 
Asymptotically, this procedure yields estimatorsthat areconsistent but are inefficient compared to maximum likelihood estimators.As a final example that is the topic of the empirical workreported below,consider the model of equations (lOa)(lOe) with normalityassumed for U1, and(J2, arid censored sampling assumed. Y2 isobserved up to an unknown factor of��proportionality when Y2O, and Y1 is observed only when Y2O.This 3xamplecombines aspects of the two previous examples.483S positive. This gives anof (fX,/((r2,) 1/2)and hencefor each observation. This estimate(I 3b)squares estimates of the coefficients iii (I 3b) mayitbiajijedNote that a weighted(I 3h) can he estimated tothe hetero_scedasticity that arises fromFor at least twothis procedure doesuliliie all of theavailableinformation. The first reason is that the proCedure ignores the information thattheprobit function estimatesup to a factor of proportionality. One could Utilizethis information to write (1 3h) as(13b')(r22)If2(_+A1)-i- "2i(o.2)u/2,aas a regressor.9and unweighted versionsofweights(14a).Still,estimates are not fully efficient.theTheand thean(ff22)2. One can use thissquaredto equal the residual variance.Thus one can solve thefor (T22)2,'iiI(1+41A1A)-(T22of A and 4, areplace ofleftofis theestimate(a22)1"2and ispositive(a22)"2possessesmore efficient than the previous estimator.'"°Note that the estimated A1 -4land the actual Y, are positive numbers. Hence the least squaresestimate of (o.22)I2 is positive.l'he equation for (if)'12 is given by(if22)='/2il+MAl+A,A)1,l+,A1-A)11112V1__i(A,-,)2ft(!+,A1-A)1(A-12for!-=dO.1,l4-,A--A1When the last condition does not hold,itis straightforward to develop the appropriate expression fr(if22)1t2In either case consistency is readily verified.Nothing guarantees that d isForexample, if all observations havea probability of sample inclusion that exceeds 15 percent, dandnorealroot need exist in a small sample althoughina large sample, one must exist. 
It is interesting to note that nonexistence is most likely in samples with observations for which the probability of sample inclusion is high, i.e., precisely in those circumstances when least squares is an appropriate estimator, the range of variation in $\lambda_i$ is small, and we would place little weight on the regression estimate [...].

[...] as $\phi_i \to \infty$, the probability of sample inclusion goes to zero, and

$\lim E(V_{2i}^2) = 0$, $\lim E(V_{1i}V_{2i}) = 0$, $\lim E(V_{1i}^2) = \sigma_{11}(1 - \rho^2)$.

The weighting implicit in GLS underscores the crucial nature of the assumption that all observations are drawn from the same population distribution.

As a practical matter, we do not know $\phi_i$ and $\lambda_i$, and hence we cannot estimate equations (13a) and (13b) unless there is prior information on $\lambda_i$. In the case of a censored sample, it is possible to compute the probability that an observation has data missing on $Y_1$, and hence it is possible to use probit analysis to estimate $\phi_i$ and $\lambda_i$. Thus, denoting $d_i$ as a random variable with the value of one when $Y_{1i}$ is observed, the sample likelihood for the probit analysis is

$L = \prod_{i=1}^{I} [F(\phi_i)]^{1-d_i} [1 - F(\phi_i)]^{d_i}$.

Subject to the standard identification conditions in probit analysis, it is possible to maximize $L$ to obtain consistent estimates of $\beta_2/(\sigma_{22})^{1/2}$, and hence of $\lambda_i$. These estimates of $\lambda_i$ may be used in place of the true $\lambda_i$ as regressors in equations (13a) and (13b).

When regression estimates of the coefficients in equations (13a) and (13b) are possible, they yield consistent estimates of the true parameters, since $\lambda_i$ estimated from probit analysis is a consistent estimator of the true $\lambda_i$ and Slutsky's theorem applies. More efficient estimates may be obtained from the approximate GLS estimates, which converge in distribution to the true GLS estimates by the Cramer convergence theorem (Cramer, 1946). Other estimates may be obtained from the covariance structure. Each set of estimates may be used as initial consistent estimates for estimation of the likelihood function. 
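The two stage procedure described above can be sketched in code. What follows is only an illustrative sketch under stated assumptions, not the computations reported in this paper: the data are simulated, the parameter values (intercepts, slopes, the correlation `rho`) are hypothetical, and the helper names (`probit`, `ols`, `two_step`, `demo`) are invented for the illustration. The first step fits the probit by Fisher scoring; the second step regresses $Y_{1i}$ on $X_{1i}$ and the estimated $\lambda_i$ using only observations with $d_i = 1$. Standard library Python is used so that the block is self-contained.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least squares coefficients from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, steps=25):
    """Probit coefficients g in P(d = 1) = F(x'g), by Fisher scoring."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(steps):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, di in zip(rows, d):
            z = sum(a * b for a, b in zip(r, g))
            p = min(max(norm_cdf(z), 1e-10), 1.0 - 1e-10)
            f = norm_pdf(z)
            for i in range(k):
                score[i] += f * (di - p) / (p * (1.0 - p)) * r[i]
                for j in range(k):
                    info[i][j] += f * f / (p * (1.0 - p)) * r[i] * r[j]
        g = [a + b for a, b in zip(g, solve(info, score))]
    return g


def two_step(x1, x2, y1, d):
    """Step 1: probit for d. Step 2: OLS of y1 on (1, x1, lambda_hat)."""
    g = probit([[1.0, v] for v in x2], d)
    rows, y = [], []
    for a, b, yi, di in zip(x1, x2, y1, d):
        if di == 1:
            phi = -(g[0] + g[1] * b)              # phi_i = -X2i*beta2/sigma22^(1/2)
            lam = norm_pdf(phi) / norm_cdf(-phi)  # f(phi) / (1 - F(phi))
            rows.append([1.0, a, lam])
            y.append(yi)
    return g, ols(rows, y)


def demo(n=6000, rho=0.7, seed=0):
    """Simulated censored sample; all parameter values are hypothetical."""
    rng = random.Random(seed)
    x1, x2, y1, d = [], [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        u2 = rng.gauss(0, 1)
        u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0, 1)
        x1.append(a)
        x2.append(b)
        d.append(1 if 0.5 + 1.0 * b + u2 >= 0 else 0)  # d_i = 1 iff Y2i >= 0
        y1.append(1.0 + 2.0 * a + u1)                  # used only when d_i = 1
    return two_step(x1, x2, y1, d)


if __name__ == "__main__":
    g_hat, b_hat = demo()
    print("probit coefficients:", g_hat)
    print("second-step coefficients (const, x1, lambda):", b_hat)
```

In this simulated design the variances are normalized to one, so the coefficient on the estimated $\lambda_i$ should settle near the assumed value of `rho` in large samples. The sketch computes only the unweighted second-step estimates; the GLS weighting and the Newton step toward the likelihood optimum are not implemented here.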
As Rothenberg and Leenders (1964) have shown, one Newton step toward optimizing the likelihood function produces estimates that are asymptotically efficient in the sense that they attain the Cramer-Rao lower bound.

Consideration of three special cases will help to focus ideas. First consider Tobin's model, which is presented in equation (13b) in the notation of this section. In Tobin's original model, we observe $Y_{2i}$ only if it is positive, but for all observations in a random sample we know whether or not $Y_{2i}$ is positive. In the two stage procedure proposed here, first estimate the probit model determining whether or not $Y_{2i}$ is positive.

The likelihood function is straightforward. Using the notation in the text for the case of $Y_{1i}$ observed when $Y_{2i} \ge 0$, $Y_{1i}$ not observed otherwise, and $Y_{2i}$ not continuously measured, the likelihood becomes

$L = \prod_{i=1}^{I} \left[\int_{-X_{2i}\beta_2}^{\infty} h(Y_{1i} - X_{1i}\beta_1, U_{2i})\, dU_{2i}\right]^{d_i} \left[\int_{-\infty}^{-X_{2i}\beta_2} h_2(U_{2i})\, dU_{2i}\right]^{1-d_i}$.

The likelihood function for the other cases is straightforward.

[...] Thus, the estimated residual variance for [...] (14a) converges to [...] $(1 + \phi_i\lambda_i - \lambda_i^2)$. Of course, the summation term can be estimated so that the standard OLS variance formulae may be modified. But the GLS sampling variances for the parameters are lower and hence preferable.

(14a) $E(V_{2i}^2) = \sigma_{22}(1 + \phi_i\lambda_i - \lambda_i^2)$

(14b) $E(V_{1i}V_{2i}) = \sigma_{12}(1 + \phi_i\lambda_i - \lambda_i^2)$

(14c) $E(V_{1i}^2) = \sigma_{11}\left((1 - \rho^2) + \rho^2(1 + \phi_i\lambda_i - \lambda_i^2)\right)$

where

$\rho^2 = \frac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}}$, $\phi_i = -\frac{X_{2i}\beta_2}{(\sigma_{22})^{1/2}}$, $\lambda_i = \frac{f(\phi_i)}{1 - F(\phi_i)}$,

and $f$ and $F$ respectively are the density and distribution function of the standard normal distribution. The Tobin model is a special case with $h(U_{1i}, U_{2i})$ a singular density since $U_{1i} \equiv U_{2i}$.

"$\lambda_i$" is the inverse of Mill's ratio and is known as the hazard rate in reliability theory. 
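As a small numerical aid, the quantities just defined can be computed directly. The sketch below assumes the sign convention $\phi_i = -X_{2i}\beta_2/(\sigma_{22})^{1/2}$ and $\lambda_i = f(\phi_i)/(1 - F(\phi_i))$; the function names are hypothetical and chosen only for this illustration.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """1 - F(z) for the standard normal, computed stably with erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def inverse_mills(phi):
    """lambda = f(phi) / (1 - F(phi))."""
    return norm_pdf(phi) / upper_tail(phi)


def variance_factor(phi):
    """The common factor 1 + phi*lambda - lambda^2 in the second moments."""
    lam = inverse_mills(phi)
    return 1.0 + phi * lam - lam * lam


def disturbance_moments(phi, s11, s12, s22):
    """Second moments of (V1, V2) at a given phi, for assumed covariances."""
    rho2 = s12 * s12 / (s11 * s22)
    k = variance_factor(phi)
    return {
        "E_V1sq": s11 * ((1.0 - rho2) + rho2 * k),
        "E_V1V2": s12 * k,
        "E_V2sq": s22 * k,
    }


if __name__ == "__main__":
    for phi in (-3.0, 0.0, 3.0):
        print(phi, inverse_mills(phi), variance_factor(phi))
```

Evaluating `inverse_mills` over a grid of `phi` values shows that it rises with `phi`, while `variance_factor` falls toward zero, so the disturbances in a selected sample are heteroscedastic whenever `phi` varies across observations.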
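There are several interesting properties of $\lambda_i$:

Its denominator is the probability that observation $i$ has data for $Y_1$.

The lower the probability that an observation has data on $Y_1$, the greater the value of $\lambda$ for that observation.

More precisely, using a result due to Feller (1968) and cited in Haberman's proof of the concavity of the probit likelihood function (Haberman, 1974, p. 309), it is straightforward to show that

$\frac{\partial \lambda_i}{\partial \phi_i} > 0$, $\lim_{\phi_i \to \infty} \lambda_i = \infty$, $\lim_{\phi_i \to -\infty} \lambda_i = 0$.

In samples in which the selectivity problem is unimportant (i.e., the selection rule ensures that all potential population observations are sampled), $\lambda_i$ becomes negligibly small, so that least squares estimates of the [...] have [...] properties.

Using these results, we may write

(12a) $E(Y_{1i} \mid X_{1i}, Y_{2i} \ge 0) = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i$

(12b) $E(Y_{2i} \mid X_{2i}, Y_{2i} \ge 0) = X_{2i}\beta_2 + \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i$.

If we knew $\lambda_i$, or could estimate it, least squares could be applied to estimate the parameters in equation (12a). If we could measure $Y_{2i}$ when $Y_{2i} \ge 0$, as in Tobin's model, knowledge of $Y_{2i}$ and $\lambda_i$ would permit direct estimation of $\beta_2$ and $(\sigma_{22})^{1/2}$ by least squares without having to resort to optimizing likelihood functions.

We may add disturbances to equations (12a) and (12b) to reach the model

(13a) $Y_{1i} = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i + V_{1i}$

(13b) $Y_{2i} = X_{2i}\beta_2 + \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i + V_{2i}$

where [...]

6 Note that [...] normalized density of [...].

[...] led to the following model:

(10a) $E(Y_{1i} \mid X_{1i}, Y_{2i} \ge 0) = X_{1i}\beta_1 + E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2)$

(10b) $d_i = 1$ iff $Y_{2i} \ge 0$, $d_i = 0$ otherwise

(10c) $E(h_i \mid X_{2i}, Y_{2i} \ge 0) = \frac{1}{\gamma} E(Y_{2i} \mid X_{2i}, Y_{2i} \ge 0)$.

Equations (10a) and (10b) are as before. Equation (10c) exploits the information that we observe $Y_{2i}$ up to a positive factor of proportionality if $Y_{2i}$ is positive.

These examples are not intended as a complete literature survey. Yet they illustrate that the basic statistical models for limited dependent variables, censoring and truncation may be summarized in a simple general model for missing data.

Regression estimates of (1a) fit on a selected sample omit the final term on the right hand side of equation (9). 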
Thus the bias that arises from using least squares to fit models for limited dependent variables or models with censoring or truncation arises solely because the conditional mean of $U_{1i}$ is not included as a regressor. The bias that arises from truncation or selection may be interpreted as arising from an ordinary specification error with the conditional mean deleted as an explanatory variable. In general, one cannot sign the direction of bias that arises from omitting this conditional mean.

A crucial distinction is the one between a truncated sample and a censored sample. In a truncated sample one cannot use the available data to estimate the [...]. In the next section, I examine a technique that enables one to use this estimated probability to estimate the missing conditional mean for each observation. [...] regression analysis, so that estimators with desirable large sample properties may be derived from computationally simple methods.

Suppose that the joint density of $U_{1i}$ and $U_{2i}$ is bivariate normal. Using well known results in the literature (see, e.g., Johnson and Kotz (1972), pp. 112-113),

$E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2) = \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i$, $E(U_{2i} \mid U_{2i} \ge -X_{2i}\beta_2) = \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i$.

[...] and observed purchases are zero. Thus, in Tobin's model the sample selection rule is given by (6), and we may write [...].

As noted by Cragg (1971) and Nelson (1975), the rule generating the observed data need not be as closely related to the model of equation (1a) as Tobin assumes. Consider the following decision rule: we obtain data on $Y_{1i}$ if another random variable crosses a threshold, i.e., if

(7) $Y_{2i} \ge 0$

while if the opposite inequality holds we do not obtain data on $Y_{1i}$. The choice of zero as a threshold is an inessential normalization. Also, note that we could define a dummy random variable $d_i$ with the properties

$d_i = 1$ iff $Y_{2i} \ge 0$, $d_i = 0$ otherwise

and proceed to analyze the joint distribution of $Y_{1i}$ and $d_i$, dispensing with $Y_{2i}$ altogether. 
The advantage in using selection rule representation (7) is that it permits a unified summary of the existing literature.

Using this representation, we may write equation (5) as

(9) $E(Y_{1i} \mid X_{1i}, Y_{2i} \ge 0) = X_{1i}\beta_1 + E(U_{1i} \mid U_{2i} \ge -X_{2i}\beta_2)$.

If $U_{1i}$ is independent of $U_{2i}$, the conditional mean of $U_{1i}$ is zero, and the sample selection process into the incomplete sample is random. In the general case, the conditional mean of the disturbance in the incomplete sample is a function of $X_{2i}$. Moreover, the effect of such sample selection is that $X_2$ variables that do not belong in the population regression function appear to be statistically significant in equations fit on selected samples.3

A good example of this phenomenon arises in the Gronau (1974)-Lewis (1974) wage selectivity bias problem. In their analyses $Y_{1i}$ is the wage rate, which is only observed for working women, and $Y_{2i}$ is an index of labor force attachment (which in the absence of fixed costs of work may be interpreted as the difference between market wages and reservation wages). If the presence of children affects the work decision but does not affect market wages, regression evidence from [...] is not necessarily evidence that there is market discrimination against such women [...] lower wages. Moreover, regression evidence that such extraneous variables "explain" wage rates may be interpreted as evidence that selection bias is present.

For a final example, I draw on my own work (Heckman, 1974). Letting $Y_{1i}$ be the wage rate for woman $i$, and $Y_{2i}$ be the difference between market wages and reservation wages, a woman works if $Y_{2i} \ge 0$. Using results from the theory of labor supply, one can show that under certain simplifying assumptions working hours, $h_i$, are proportional to $Y_{2i}$. If this proportionality factor is $1/\gamma$ ($> 0$), we are [...]

3 If the [...] regressor in $X_2$ is "1", so that the probability of sample inclusion is the same for all observations, only the intercept is biased.

[...]

(1a) $Y_{1i} = X_{1i}\beta_1 + U_{1i}$

(1b) $Y_{2i} = X_{2i}\beta_2 + U_{2i}$

$E(U_{ji}) = 0$, [...]

2 Tobin actually assumes a separate known [...]. See Tobin (1958). 
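The point made above, that a variable belonging only to the selection rule can appear to "explain" the outcome in a selected sample, can be illustrated with a small simulation. This is a hedged sketch with hypothetical parameter values, not a reproduction of the Gronau or Lewis data: the wage equation contains only an intercept, a child indicator shifts only the propensity to work, and the two disturbances are assumed to have correlation `rho`.

```python
import math
import random


def slope(x, y):
    """OLS slope of y on x in a regression that includes an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, rho=0.7, seed=1):
    """Return the child coefficient in the full and the selected sample."""
    rng = random.Random(seed)
    kids_all, wage_all, kids_sel, wage_sel = [], [], [], []
    for _ in range(n):
        kids = 1.0 if rng.random() < 0.5 else 0.0
        u2 = rng.gauss(0, 1)
        u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0, 1)
        wage = 1.0 + u1               # children do not enter the wage equation
        y2 = 0.5 - 1.0 * kids + u2    # children lower the propensity to work
        kids_all.append(kids)
        wage_all.append(wage)
        if y2 >= 0:                   # the wage is observed only for workers
            kids_sel.append(kids)
            wage_sel.append(wage)
    return slope(kids_all, wage_all), slope(kids_sel, wage_sel)


if __name__ == "__main__":
    full, selected = simulate()
    print("child coefficient, full sample:    ", full)
    print("child coefficient, selected sample:", selected)
```

Under these assumed values the full-sample coefficient is close to zero, while the selected-sample coefficient is clearly positive: in large samples it approaches `rho` times the difference in $\lambda$ between the two groups, about 0.44 here. The sign and size of the spurious coefficient depend on the assumed correlation and selection rule, which is consistent with the remark that the direction of bias cannot be signed in general.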
Annals of Economic and Social Measurement, 5/4, 1976

This paper presents a unified treatment of statistical models for truncation, sample selection and limited dependent variables. A simple estimator is proposed that permits estimation of those models by least squares and probit analysis. In an empirical example, it is shown that the estimator yields estimates close to the maximum likelihood estimates.

This paper presents a unified summary of statistical models for sample selection, truncation and limited dependent variables. The bias that arises from using least squares when such models apply is characterized as a simple specification error or omitted variable problem. A computationally simple estimator applicable to such models is proposed that amounts to estimating the omitted variable and using least squares including the estimated omitted variable as a regressor.

The estimator discussed in this paper is not new. A grouped data version of it appears in papers on sample selection bias by Gronau (1974) and Lewis (1974).1 This paper extends the analysis in those papers by developing the statistical properties of the estimator and demonstrating that the method is applicable to a wider class of models, and a more varied class of empirical settings, than the original papers consider.

The paper is in three parts. First, I discuss the common structure of models of sample selection, truncation and limited dependent variables. Then I discuss and propose an estimator for these models. Finally, I apply the estimator to reestimate a model of female labor supply and wages. In this example, I demonstrate that the consistent estimator discussed here closely approximates estimates obtained from optimizing a computationally more complicated likelihood function.

I. A TWO EQUATION MODEL

To simplify the exposition, I consider a two equation model. 
Few new points arise in the multivariate case, and the multivariate extension is straightforward.

* This research was funded by a HEW grant to the Rand Corporation and a Department of Labor ASPER grant to the National Bureau of Economic Research. Neither organization is responsible for the contents of this paper. An earlier version of this paper appeared as "Shadow Prices, Market Wages and Labor Supply Revisited: Some Computational and Conceptual Simplifications and Revised Estimates," June 1975. I have received useful comments from T. Amemiya, Gary Chamberlain, John Cogan, Zvi Griliches, Reuben Gronau, Ed Leamer, Lung Fei Lee, H. Gregg Lewis, Mark Killingsworth, T. MaCurdy, Bill Rogers, and T. Paul Schultz at various stages of this research. None are responsible for any errors that remain in this paper. Ralph Shnelvar performed the calculations reported below.

1 The Lewis paper is an extended comment on Gronau's paper. Thus credit for developing the method belongs to Gronau, although Lewis' paper considerably extends and clarifies Gronau's analysis.