STATA APPLICATIONS

tatyana-admore
Uploaded On 2017-07-12


Presentation Transcript

Slide1

STATA APPLICATIONS

Slide2

Task 1 (last year): Computer assignment

The data set busind.dta contains information on Gross National Income (GNI) per capita and the number of days to open a business and to enforce a contract in a sample of 135 countries. It was extracted from the “Doing Business” dataset, a dataset collected by the World Bank based on expert opinions in each country. The variable gnipc measures GNI per capita in thousands of dollars. The variable daysopen measures the average number of days needed to open a business in that country, and daysenforce measures the average number of days needed to enforce a given type of contract.

(i) Find the average GNI per capita, the average number of days to open a business, and the average number of days to enforce a contract.

Slide3

Answer to question (i)

Stata commands:

use busind, clear
su daysenforce daysopen gnipc

(ii) In how many countries does it take on average less than 5 days to open a business? What is the maximum number of days to open a business in the dataset? In which countries does it take more than 200 days to open a business?

Slide4

Answer to Question (ii)

Stata commands:

su daysopen if daysopen<5
list country if daysopen>200

Slide5
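For question (ii), the three sub-questions can also be answered one command at a time. The following is a minimal sketch, assuming busind.dta is in the working directory and that the country identifier is called country, as in the list command on the slide.

```stata
use busind, clear

* Number of countries needing fewer than 5 days (strict inequality)
count if daysopen < 5

* Overall maximum: summarize stores it in r(max)
summarize daysopen
display r(max)

* Countries above 200 days. Stata treats missing values as larger than
* any number, so they are excluded explicitly.
list country daysopen if daysopen > 200 & !missing(daysopen)
```

Using count rather than reading the number of observations off a conditional summarize keeps the answer to "how many countries" separate from the answer to "what is the maximum".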

Question (iii)

Estimate the following simple regression model:

gnipc = b0 + b1*daysopen + u

Give a careful interpretation of the estimates b1 and b0. Are the signs what you expected them to be?

Slide6

Answer to Question (iii)

Stata command:

reg gnipc daysopen

Slide7

Question (iv)

Question: What kind of factors are contained in u? Are these likely to be correlated with the number of days to open a business?

Answer: The error term u contains every factor that explains GNI per capita apart from the number of days to open a business. There are many such factors, for example economic institutions, education, savings, consumption, and R&D. Some of them, such as the quality of economic institutions, are likely to be correlated with the number of days to open a business.

Slide8

Question (v)

Question: According to this model, what is the predicted income for a country where it takes 5 days to open a business? And the predicted income for a country where it takes 200 days to open a business? Show how you can calculate the answers by hand (once you have obtained the estimation results). Do the obtained levels of income seem reasonable? Explain.

Slide9

Answer to Question (v)

You can compute predicted values for the dependent variable in two ways. The first is to “display” the fitted value at daysopen=5 and at daysopen=200, using the stored coefficients.

Stata commands:

display _b[daysopen]*5+_b[_cons]
10.894018
display _b[daysopen]*200+_b[_cons]
-7.5347099

Slide10

Answer to question (v)

The second way is to generate the fitted value of the dependent variable:

reg gnipc daysopen
predict gnipc_hat

A problem arises with this second method: there is no observation with daysopen=200, so it is impossible to read off the value of gnipc_hat for daysopen=200.

Slide11
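For question (v), one way around the missing daysopen=200 observation is to evaluate the fitted line at chosen values instead of at observed ones. The sketch below uses Stata's margins command for this; it is an alternative to the two methods on the slides, not part of the original answer, and it assumes busind.dta is in the working directory.

```stata
use busind, clear
regress gnipc daysopen

* Evaluate the fitted line at chosen values of daysopen,
* whether or not they occur in the data
margins, at(daysopen=(5 200))

* Fitted values from predict exist only for observed values of daysopen
predict gnipc_hat
list country daysopen gnipc_hat if daysopen == 5
```

The margins output should reproduce the two numbers obtained with display, and it also reports a standard error for each prediction.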

To illustrate our fitted values, we can draw the OLS regression line:

scatter gnipc daysopen || lfit gnipc daysopen

Slide12

Question (vi)

Estimate the following simple regression model and give a careful interpretation of b1:

gnipc = b0 + b1*daysenforce + u

Slide13

Answer to Question (vi)

Stata command:

reg gnipc daysenforce

Slide14

Question (viii)

Slide15

Question (vii)

Comparing the estimates of the models in (iii) and (vi), which one explains more of the variation in income per capita across countries? Can you infer whether the duration to open a business or the duration for enforcing contracts is more strongly correlated with income per capita?

Answer: The share of the variation in GNI per capita (y) explained by an independent variable (x) is given by the R². The greater the R², the more of the variation in y is explained by x. The R² of the regression of GNI per capita on the number of days to open a business is about 13%, and the R² of the regression of GNI per capita on the number of days to enforce a contract is about 21%. The duration for enforcing a contract therefore explains more of the variation in GNI per capita. Because the R² of a simple regression equals the squared correlation between y and x, this also means that the duration for enforcing a contract is more strongly correlated with income per capita than the number of days to open a business. Here, the correlation between gnipc and daysenforce is -0.46, and the correlation between gnipc and daysopen is -0.36.

Slide16
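The link between R² and the correlation in question (vii) can be checked directly: in a simple regression, R² equals the squared correlation, which matches the reported figures (0.36² ≈ 0.13 and 0.46² ≈ 0.21). A minimal sketch, assuming busind.dta is in the working directory:

```stata
use busind, clear

* Simple regression: R-squared is stored in e(r2)
regress gnipc daysopen
display e(r2)

* Correlation between the same two variables: stored in r(rho)
correlate gnipc daysopen
display r(rho)^2

* Same comparison for the contract-enforcement variable
regress gnipc daysenforce
display e(r2)
correlate gnipc daysenforce
display r(rho)^2
```

Within each pair, the two displayed numbers should coincide, since both commands use the observations that are non-missing on the two variables involved.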

Question (viii)

Estimate the following simple regression model and give a careful interpretation of b1:

ln(gnipc) = b0 + b1*daysopen + u

Slide17

Answer to Question (viii)

Stata commands:

gen lngnipc=ln(gnipc)
reg lngnipc daysopen

Slide18
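In the log-level model of question (viii), b1 is a semi-elasticity: one more day to open a business is associated with a change in GNI per capita of roughly 100*b1 percent. The sketch below, assuming busind.dta is in the working directory, displays both the usual approximation and the exact percentage change implied by the model.

```stata
use busind, clear
generate lngnipc = ln(gnipc)
regress lngnipc daysopen

* Approximate percentage change in GNI per capita for one more day
display 100*_b[daysopen]

* Exact percentage change implied by the log-level model
display 100*(exp(_b[daysopen]) - 1)
```

The two numbers are close when the coefficient is small in absolute value, which is why the approximation is normally used for interpretation.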

Do these results allow you to draw conclusions regarding the desirability of policies aimed at reducing the number of days for opening a business in certain developing countries?

The dataset contains 135 countries, and hence does not contain information about all the countries in the world. Do you think one should account for that when interpreting the regression results? Why?

Slide19

Task 2 (last year): Computer exercise

The dataset nepalind.dta contains data on 706 children aged 15 in Nepal. The data come from the 2003 Nepal Living Standard Survey (NLSS), a Living Standard Measurement Survey (LSMS). We want to analyze these data to understand the number of years of education. Illiteracy and low levels of education are a major concern in Nepal, so it would be good to know which factors could explain the education of the present generation, in order to know what type of policies to implement. The dataset has some information on household characteristics and on characteristics of the child and of the household head.

The NLSS is an LSMS-type survey. LSMS surveys are country-wide representative surveys that statistical offices in developing countries conduct with the support of the World Bank to determine poverty levels, determinants of poverty, etc. See www.worldbank.org/lsms for more info.

Slide20

Question 1

Write a paragraph describing the dataset using the standard descriptive statistics (also called summary statistics, or “D-stats”). Add a table with the d-stats.

Slide21
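A table of descriptive statistics for Question 1 can be produced in one pass with tabstat. The sketch below is only an illustration: it assumes nepalind.dta is in the working directory and uses the variable names that appear in the regression command for Question 6 (educ, head_age, head_educ, nractad, nrold, r2_supown, distschool, male). Any other variable in the dataset would have to be added by its actual name.

```stata
use nepalind, clear

* Overview of variable names, types and labels
describe

* Mean, standard deviation and number of non-missing observations,
* one row per variable
tabstat educ head_age head_educ nractad nrold r2_supown distschool male, ///
    statistics(mean sd n) columns(statistics)
```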
Slide22

Question (1)

Child characteristics

Male (%)                           52
Health status (%)
  Good                             69.5
  Fair                             30
  Poor                             0.5
Years of education                 5.5 (3.6)

Slide23

Question (1)

Household characteristics

Number of household members        6.8 (2.73)
  under 18 years old               3.5 (1.74)
  between 18 and 59                3.0 (1.49)
  60 or older                      0.3 (0.63)
Age of the head                    46 (10.5)
Education of the head              2.8 (4.1)
Land owned (in ha)                 0.74 (1.05)
Value of jewelry (in rupees)       13985 (26726)
Distance to school (in hours)      0.29 (0.31)
Number of observations             706

Standard deviations in parentheses

Slide24

Question 2:

Show the distribution of the different values of years of education in the dataset. Drop the observations that have values higher than 10. Explain why that might be a smart thing to do before doing any regression analysis.

. hist educ, discrete
(start=0, width=1)

Slide25
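The drop step in Question 2 is not shown on the slide. A minimal sketch, assuming nepalind.dta is in the working directory, could look as follows. Whether observations with missing education should be kept at this stage is an assumption of the sketch, not something the slides specify.

```stata
use nepalind, clear

* Frequency table and histogram of years of education
tabulate educ, missing
histogram educ, discrete

* Drop observations (not variables) with more than 10 years of education.
* Missing values count as larger than any number in Stata, so without
* the !missing() guard they would be dropped as well.
drop if educ > 10 & !missing(educ)
```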

Question (3):

Specify a model that explains the number of years of education as a function of the father’s age, the number of active adults (between 18 and 59 years old), the number of elderly (60 or older), and all other variables you think are interesting and appropriate.

Make sure to include only variables that are exogenous, and discuss why the variables you include can be considered exogenous. Estimate the model and give a careful interpretation of each of the coefficients (sign, size, and significance!). Do you find any of your results counterintuitive?

Slide26

Tips to answer question (3)

Each variable that you add to the model must be related to educ in some way, and should not violate the zero conditional mean (ZCM) assumption => it must be exogenous => ask yourself:

Is x caused by y, i.e. is there a possibility of reverse causality?
Does a third factor determine both x and y? In this case correlation is not causation, and x is not exogenous.
Are u and x related for some other reason?

Candidate variables: Gender? Head’s age? Number of active adults? Number of elderly? Head’s education? Land owned? Distance to school? Value of jewelry? Number of children? Health?

Slide27

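A reasonable model to estimate:

educ = β0 + β1*head_age + β2*head_educ + β3*nractad + β4*nrold + β5*r2_supown + β6*distschool + β7*male + u

Expected signs of coefficients? Argue.

Slide28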

Question 4:

What is the minimum significance level at which one can reject the hypothesis that the age of the household head does not affect education levels?

The p-value gives the smallest significance level at which a hypothesis H0 can be rejected. In other words, a low p-value indicates that the observed data would be unlikely if H0 were true. The minimum significance level at which one can reject the hypothesis that the age of the household head does not affect education levels is given by the p-value of the test of H0: β1=0. One can read directly from the Stata output that this minimum significance level is 1.4%.

Slide29
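The p-value discussed in Question 4 can also be recomputed from the stored results rather than read off the output table. The sketch below assumes nepalind.dta is in the working directory and uses the regression specification shown for Question 6; the scalar name t_age is made up for the illustration.

```stata
use nepalind, clear
regress educ head_age head_educ nractad nrold r2_supown distschool male

* t statistic for H0: the coefficient on head_age equals zero
scalar t_age = _b[head_age]/_se[head_age]

* Two-sided p-value, using the residual degrees of freedom e(df_r)
display 2*ttail(e(df_r), abs(t_age))
```

The displayed value should match the P>|t| column for head_age in the regression table.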

Question (5)

Do your results allow you to conclude that the effect of the number of active adults in the household is different from the effect of the number of elderly? State the null hypothesis and the alternative hypothesis you are testing, and the significance level you are considering. Does your answer differ depending on which significance level you consider?

Slide30

Answer to Question (5)

We need to test the null hypothesis H0: β3=β4 against H1: β3≠β4. You just need the command "test".

Slide31
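For Question (5), the test command is run right after the regression. A minimal sketch, assuming nepalind.dta is in the working directory and that nractad and nrold are the numbers of active adults and elderly, as the regression command for Question 6 suggests:

```stata
use nepalind, clear
regress educ head_age head_educ nractad nrold r2_supown distschool male

* F test of H0: coefficient on nractad equals coefficient on nrold
test nractad = nrold

* Same hypothesis seen as a t test on the difference of the coefficients
lincom nractad - nrold
```

Both commands report the same p-value for this single restriction; lincom additionally shows the estimated difference and its confidence interval, which helps when comparing conclusions at different significance levels.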

Question (6)

Test whether the characteristics of the household head are jointly significant. Show how to do this in Stata, and calculate the test by hand in 2 different ways. What can you conclude about the role of household head characteristics in the education of the children?

Slide32

Answer to Question (6)

Slide33

Question 6: compute F-test

Run the unrestricted and restricted models, and compute either the SSR form or the R² form of the F-statistic.

reg educ head_age head_educ nractad nrold r2_supown distschool male
scalar r2_ur=e(r2)
scalar df=e(df_r)
reg educ nractad nrold r2_supown distschool male
scalar r2_r=e(r2)

Slide34
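The commands for Question 6 stop after storing the two R² values. With q = 2 restrictions, the R² form is F = ((R²ur - R²r)/q) / ((1 - R²ur)/df) and the SSR form is F = ((SSRr - SSRur)/q) / (SSRur/df), where df is the residual degrees of freedom of the unrestricted model. The sketch below finishes the calculation under the assumption that nepalind.dta is in the working directory. The scalar names and the insample marker are made up for the illustration, and restricting the second regression to the first regression's sample is a precaution added here, not something shown on the slide.

```stata
use nepalind, clear

* Unrestricted model: store R-squared, SSR and residual degrees of freedom
regress educ head_age head_educ nractad nrold r2_supown distschool male
scalar r2_ur  = e(r2)
scalar ssr_ur = e(rss)
scalar df     = e(df_r)
generate byte insample = e(sample)

* Restricted model (head characteristics excluded), same estimation sample
regress educ nractad nrold r2_supown distschool male if insample
scalar r2_r  = e(r2)
scalar ssr_r = e(rss)

* R-squared form and SSR form of the F statistic (q = 2)
scalar F_r2  = ((r2_ur - r2_r)/2) / ((1 - r2_ur)/df)
scalar F_ssr = ((ssr_r - ssr_ur)/2) / (ssr_ur/df)
display F_r2 "  " F_ssr

* p-value from the F(2, df) distribution
display Ftail(2, df, F_r2)

* Built-in joint test for comparison, run after the unrestricted model
quietly regress educ head_age head_educ nractad nrold r2_supown distschool male
test head_age head_educ
```

The two hand-computed forms should agree with each other and with the F statistic reported by test.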

Question (9)

Child characteristics              Non missing    Missing

Male (%)                           53             48
Health status (%)
  Good                             69.5           74
  Fair                             30             26
  Poor                             0.5            .
Years of education                 5.3            6.8

Slide35

Question (9)

Household characteristics          Non missing    Missing

Number of household members        6.9            6.1
  under 18 years old               3.5            2.8
  between 18 and 59                3.0            2.9
  60 or older                      0.3            0.4
Age of the head                    46.5           48.2
Education of the head              2.6            5.5
Land owned (in ha)                 0.77           0.29
Value of jewelry (in rupees)       12212          35488
Distance to school (in hours)      0.29           .
Number of observations             600            46