Slide1
STT 200 – Lecture 5, section 23,24
Recitation 11 (3/26/2013)
TA: Zhen Zhang
zhangz19@stt.msu.edu
Office hour: (C500 WH) 3-4 PM Tuesday (office tel.: 432-3342)
Help-room: (A102 WH) 9:00AM-1:00PM, Monday
Class meets on Tuesday:
12:40 – 1:30PM A224 WH, Section 23
1:50 – 2:40PM A234 WH, Section 24
Slide2
Main Goals
Understand the sampling distribution of the sample proportion $\hat{p}$.
The normal model $N\left(p, \sqrt{p(1-p)/n}\right)$, where $p$ is the population proportion and $n$ is the sample size.
Slide3
Data
Here are data from a population of 400 people, indicating whether they do ("Yes") or don't ("No") have wireless internet service at home. Please copy the following chunk and paste it into R.
haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes", "No","Yes","No","No","Yes","No","No","No","No","No","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No","No","Yes","Yes","No","Yes","Yes","Yes","Yes","No","No","No","No","No","No","Yes","Yes","No","Yes","No","No","Yes","No","No","No","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No","Yes","No","No","Yes","No","No","No","No","No","Yes","Yes","No","Yes","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","Yes","No","No","No","No","Yes","No","Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","Yes","No","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","No","Yes","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","No","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","No","No","Yes","No","Yes","Yes","Yes","No","Yes","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","Yes","No","No","No","Yes","Yes","Yes","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","Yes","No","Yes","Yes","No","Yes","No","No","Yes","Yes","No","Yes","No","Yes","No","Yes","No","Yes","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","Yes","No","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","No","No","Yes","No","Yes","Yes","Yes","No","No","Yes","No","Yes","No","Yes","Yes","Yes","No","No","No","Yes","No","No","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","No","No","No","Yes","No","No","No","No","No","Yes","No","Yes","Yes","No","Yes","Yes","Yes","Yes","No","No","
No","Yes","No","No","Yes","Yes","No","Yes","Yes","Yes","No","Yes","No","No","No","No","Yes","No","No","No","Yes","Yes","Yes","Yes","Yes","No","Yes","Yes","No","Yes","Yes","No","Yes","No","Yes","No","No","No","Yes","No","No","Yes","No")
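If the chunk above is pasted with a line break inside one of the quoted values, R keeps the newline inside that string (for example "\nNo" instead of "No"). A minimal sketch of a sanity check follows; it uses a short stand-in vector `haswi_demo`, which is hypothetical and not the real data.

```r
# Stand-in for a pasted vector in which one value picked up a line break.
# haswi_demo is a hypothetical example, not the real 400-person data.
haswi_demo <- c("Yes", "No", "\nNo", "Yes", "Yes")

# trimws() strips leading/trailing spaces, tabs and newlines from each entry.
haswi_clean <- trimws(haswi_demo)

# After cleaning, every entry should be exactly "Yes" or "No".
stopifnot(all(haswi_clean %in% c("Yes", "No")))

# Population size and population proportion of "Yes".
N_demo <- length(haswi_clean)
p_demo <- mean(haswi_clean == "Yes")
print(c(N = N_demo, p = p_demo))
```

Running the same steps on the real `haswi` lets you confirm that it has the 400 entries stated above and see the population proportion of "Yes".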
Slide4
Data
Here is a table of integers between 1 and 400 chosen at random. R chunk:
rd <- c(92,149,41,310,307,130,296,130,77,399,212,301,25,177,313,147,298,160,354,20,199
, 191,104,164,216,399,25,99,28,91,211,357,350,301,39,372,61,67,304,333,174,321,191,157,316,172,5,277,78,396,208,126,162,311,17,287,138,160,124,266,177,209,361,41,398,9,79,299,257,315,40,278,2,225,206,383,254,74,335,159,37,360,9,393,143,246,305,152,90,312,208,172,117,277,93,399,226,8,231,386,136,75,38,56,37,267,381,63,52,231,287,94,50,77,179,337,387,318,112,219,17,356,77,183,259,258,141,198,30,36,61,306,65,330,161,348,19,20,61,275,365,241,115,4,338,205,108,241,190,374,323,243,146,318,217,375,267,44,373,185,341,283,200,178,266,390,232,263,386,36,270,50,315,83,90,281,260,41,305,136,116,185,25,338,4,367,296,183,103,290,208,170,143,158,198,132,155,144,26,104,281,150,240,68,67,339,389,345,141,268,349,99,147,65,170,375,317,251,185,278,80,250,4,378,175,130,359,319,400,59,166,147,130,107,123,304,234,41,20,165,96,115,272,149,142,75,262,235,106,107,354,362,2,81,89,309,371,10,282,202,203,156,386,130,252,26,387,143,237,183,328,306,27,187,310,321,183,109,198,200,281,70,394,378,203,42,34,318,156,255,354,53,196,20,382,97,292,188,179,69,151,14,348,311,389,298,399,104,300,243,163,316,328,65,167,200,301,305,27,176,69,301,188,192,242,350,92,86,42,373,195,118,64,289,329,131,156,252,169,299,191,302,19,83,220,326,229,285,267,351,333,101,128,146,307,304,245,264,149,163,353,276,296,243,8,127,31,210,263,33,384,176,125,275,76,45,60,59,143,324,281,376,298,54,62,170,295,293,27,183,126,375,21,294,242,364,145,138,52,267,26,308,391,352,78,98,211,174,277,176,74,295,64,315,171,135,159,111,79,348,88,23,348,111,188,16,152,212,104,349,14,272,209,73,238,146,50,113,103,204,389,158,260,344,207,329,184,250,38,231,292,300,34,170,343,233,275,14,15,244,104,96,234,297,113,270,369,202,37,310,294,64,183,253,299,287,225,166,260,125,198,2,180,219,117,358,191,301,310,254,230,296,2,134,67,186,265,161,130,257,166,339,33,332,137,61,340,16,212,209,42,315,8,269,68,389,316,355,62,51,64,388,260,319,244,116,265,169,153,147,170,59,329,261,384,272,367,177,217,278,266,307,182,225,80,264,342,280,350,366,280,156,323,208,110,37,
266,260,59,33,314,80,185,185,87,228,246,61,369,60,119,179,326,223,128,62,98,130,283,328,225,398,3,138,140,84,381,234,131,364,294,59,343,126,93,14,204,50,35,161,15,142,275,72,254,194,309,115,344,378,267,23,111,168,334,92,213,1,181,246,336,52,82,4,115,286,3,87,121,84,281,181,58,372,232,30,279,258,154,37,6,113,125,317,123,198,25,388,268,106)
Slide5
Problems
Use the following procedure to choose 25 people at random from the population on the first page. For each person, record whether he or she does or does not have wireless internet service.
(a) Choose a starting point in the table of integers between 1 and 400 by closing your eyes and pointing at the table.
(b) Starting there, use the next 25 numbers to choose your sample of size 25. (There's a chance that you'll pick the same person twice, but we'll not worry about that.)
Record the 25 yeses and nos here:
What proportion of the 25 people in your sample said yes? This is $\hat{p}$, the sample proportion.
The population proportion who have wireless internet service is $p = 0.5575$. How far is your estimate $\hat{p}$ from the true value $p$?
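The parenthetical in step (b) concerns picking the same person twice. As a rough check, and assuming (an idealization, not something stated on the slide) that the 25 table entries behave like independent uniform draws from 1 to 400, the chance of at least one repeat is a birthday-problem calculation:

```r
# Assumption: the 25 numbers are independent, uniform draws from 1..N.
N <- 400   # population size
n <- 25    # sample size

# P(no repeat) = (N/N) * ((N-1)/N) * ... * ((N-n+1)/N)
p_no_repeat <- prod((N - 0:(n - 1)) / N)
p_repeat <- 1 - p_no_repeat
print(round(p_repeat, 3))

# A given block of the table can also be checked directly with
# anyDuplicated(), which returns 0 when there are no repeated values.
block_demo <- c(92, 149, 41, 310, 307, 130, 296, 130)  # first 8 entries of rd
print(anyDuplicated(block_demo) > 0)
```

Under that assumption the probability comes out a little over one half, so a repeat in your sample is not unusual. As the slide says, it can be ignored for this exercise.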
Slide6
Simulation
Suppose many students each draw a sample of size 25 and obtain a sample proportion $\hat{p}$. We plot the histogram of the $\hat{p}$'s obtained by all students, and superimpose the density of $N\left(p, \sqrt{p(1-p)/n}\right)$ on it.
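A quick way to preview this picture without the random-number table is to simulate the students directly. The sketch below assumes each student's count of "Yes" answers is binomial with n = 25 and p = 0.5575 (that is, sampling with replacement). The seed and the number of students are arbitrary choices for illustration.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
n <- 25                # sample size per student
p <- 0.5575            # population proportion
nstudents <- 10000     # hypothetical number of students

# Each student's sample proportion: (number of "Yes" in n draws) / n
phats_sim <- rbinom(nstudents, size = n, prob = p) / n

# Theoretical standard deviation of p-hat
sd_theory <- sqrt(p * (1 - p) / n)

# Compare simulation with theory
print(c(mean_sim = mean(phats_sim), mean_theory = p))
print(c(sd_sim = sd(phats_sim), sd_theory = sd_theory))

# Histogram with the normal density superimposed
hist(phats_sim, freq = FALSE, main = "", xlab = expression(hat(p)))
curve(dnorm(x, mean = p, sd = sd_theory), add = TRUE)
```

The simulated mean and standard deviation should land close to the theoretical values, and the histogram should roughly follow the normal curve.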
Slide7
Problems
Comment: $\hat{p}$ is (before the data are collected) a random variable, and we'll use what we know about its distribution to quantify how confident we are in its estimate of $p$.
Now we'll investigate more generally. First, using the facts that the mean of $\hat{p}$ is $p$ and that the standard deviation of $\hat{p}$ is $\sqrt{p(1-p)/n}$, compute the mean and standard deviation of $\hat{p}$ in our case, where the population proportion is $p = 0.5575$ and the sample size is $n = 25$.
Mean of $\hat{p}$: $p = 0.5575$.
Standard deviation of $\hat{p}$: $\sqrt{0.5575 \times 0.4425 / 25} \approx 0.0993$.
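Slide8
Problems
Next, use the fact that a normal model is a good model for the distribution of $\hat{p}$ to compute the probability that $\hat{p}$ is within $0.1$ of the actual value of $p$. Here $p \pm 0.1$ runs from $0.4575$ to $0.6575$, thus the probability is $P(0.4575 < \hat{p} < 0.6575)$:
The z-score for $0.6575$ under the normal model above is $z = (0.6575 - 0.5575)/0.0993 \approx 1.007$, and the area below it is about $0.843$.
Similarly, the z-score for $0.4575$ under the normal model above is $z = (0.4575 - 0.5575)/0.0993 \approx -1.007$, and the area below it is about $0.157$. So the area in between is about $0.843 - 0.157 = 0.686$.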
Or: normcdf(0.4575, 0.6575, 0.5575, 0.0993) or normcdf(-1.007, 1.007) in a calculator, or pnorm(1.007) - pnorm(-1.007) in R.
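As a rough check on how good the normal model is here, the normal answer can be compared with an exact count-based calculation. The sketch assumes the number of "Yes" answers in the sample is binomial with n = 25 and p = 0.5575, which is an idealization of the table-based procedure.

```r
n <- 25
p <- 0.5575
sdphat <- sqrt(p * (1 - p) / n)

# Normal-model probability that p-hat is within 0.1 of p
prob_normal <- pnorm(p + 0.1, p, sdphat) - pnorm(p - 0.1, p, sdphat)

# Exact version: find the counts k for which p-hat = k/n is within 0.1 of p,
# then add up their binomial probabilities
k <- 0:n
k_within <- k[abs(k / n - p) <= 0.1]
prob_exact <- sum(dbinom(k_within, size = n, prob = p))

print(k_within)
print(c(normal = prob_normal, exact = prob_exact))
```

If the two probabilities are close, the normal model is doing its job for this sample size.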
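Slide9
Problems
Repeat the two questions above, but this time with $n = 100$. The mean will again be $p = 0.5575$, but the standard deviation of $\hat{p}$ is $\sqrt{0.5575 \times 0.4425 / 100} \approx 0.0497$, smaller! And the probability that $\hat{p}$ is within $0.1$ of $p$ is now about $0.956$.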
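The comparison between the two sample sizes follows from how the standard deviation of $\hat{p}$ depends on $n$. A short worked version, using only the standard deviation formula for $\hat{p}$ and the values $p = 0.5575$, $n = 25$ and $n = 100$:

```latex
\[
SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
\quad\Longrightarrow\quad
\frac{SD_{n=100}}{SD_{n=25}} = \sqrt{\frac{25}{100}} = \frac{1}{2}.
\]
\[
SD_{n=100} = \sqrt{\frac{0.5575 \times 0.4425}{100}} \approx 0.0497,
\qquad
z = \frac{0.1}{0.0497} \approx 2.01 .
\]
```

Quadrupling the sample size halves the standard deviation, so the same margin of 0.1 corresponds to a z-score about twice as large.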
Slide10
Appendix
R code for the problems.
# prob 4:
n <- 25; p <- 0.5575
( sdphat <- sqrt(p*(1-p)/n) )
# prob 5:
( pnorm(p+0.1, p, sdphat) - pnorm(p-0.1, p, sdphat) )
# prob 6:
n2 <- 100
( sdphat2 <- sqrt(p*(1-p)/n2) )
( pnorm(p+0.1, p, sdphat2) - pnorm(p-0.1, p, sdphat2) )
# comparison of n=25 and n=100
vec <- seq(0.01, 0.99, length.out=1000)
par(yaxt='n', mar=c(4,.3,.3,.3))
plot(dnorm(vec, p, sdphat2)~vec, type='n', ylab=' ', xlab=expression(hat(p)))
grid(col='gray80')
lines(dnorm(vec, p, sdphat)~vec, lty=1, lwd=2)
lines(dnorm(vec, p, sdphat2)~vec, lty=2, lwd=2)
abline(v=p, col='red', lty=2)
text(x=p, y=0, labels=paste("p =",round(p,4)), col='red')
legend('topleft', legend=c(paste('N(',round(p,4),', ',round(sdphat,4),'), n=25',sep=''), paste('N(',round(p,4),', ',round(sdphat2,4),'), n=100',sep='')), bg='gray90', inset=.02, lty=c(1,2), lwd=c(2,2))
Slide11
Appendix (cont’d)
R code for the simulations
(N <- length(haswi))
(L <- length(rd))
# prob 1:
set.seed(20); n <- 25
( mystart <- sample(1:L, size=1) )
( myindex <- rd[mystart+c(1:n)] )
( mysample <- haswi[myindex] )
# prob 2:
( myphat <- sum(mysample=="Yes")/n )
# prob 3:
p <- 0.5575
( p - myphat )
# The above is for one student. For many students, we have phats:
set.seed(241); phats <- numeric(nstudents <- 10000)
for (t in 1:nstudents){
  mystarts <- sample(1:L, size=1)
  myindexs <- rd[mystarts+c(1:n)]
  mysamples <- haswi[myindexs]
  phats[t] <- sum(mysamples=="Yes")/n
}
# drop students whose 25 numbers ran past the end of the table (NA entries)
phats <- na.omit(phats)
# prob 4:
( sdphat <- sqrt(p*(1-p)/n) )
hist(phats, xlab=expression(hat(p)), freq=FALSE, main='')
vec <- seq(min(phats), max(phats), length.out=1000); lines(dnorm(vec, p, sdphat)~vec)
Slide12
Thank you.