A few slides about sampling and hypothesis testing.


Uploaded by alexa-scheidler on 2015-11-14


Slide1

A few slides about sampling and hypothesis testing.

©2013 Michael J. Rosenfeld

Draft date: 1/14/2013

Slide2

The sample frame, or sample universe, is the population that our sample is drawn from. In the case of the March 2000 CPS (Current Population Survey), the sample universe includes all people residing in the US in March 2000 who were not living in institutional settings. This sample frame has N members.

[Figure: Our sample, drawn from The Sample Universe]

In theory, our sample is drawn from the sample universe. The simplest way to think about this is that we start with the sample universe and randomly select n cases from it for our sample. Generally we expect n, the size of our sample, to be much smaller than N, the size of the sample universe, i.e., n ≪ N. In the case of the CPS, our sample is about 1/2000 as large as the sample universe. If our sample is randomly selected from the sample universe, then it is a representative sample (representative of the sample universe), and this means we can use our sample to test hypotheses about the sample universe.
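To make "randomly select n cases from the sample universe" concrete, here is a minimal Python sketch of simple random sampling without replacement. It is an illustration under assumptions, not part of the original slides: the universe is synthetic (200,000 made-up lognormal "incomes"), and the helper name `draw_simple_random_sample` is hypothetical; it simply wraps the standard library's `random.Random.sample`. The sample size is chosen so that n/N is 1/2000, the ratio mentioned above for the CPS.

```python
import random
import statistics

def draw_simple_random_sample(universe, n, seed=None):
    """Draw n cases uniformly at random, without replacement, from the universe."""
    rng = random.Random(seed)
    return rng.sample(universe, n)

# Hypothetical sample universe: N = 200,000 synthetic "incomes" (illustrative only).
universe_rng = random.Random(0)
N = 200_000
universe = [universe_rng.lognormvariate(10, 0.8) for _ in range(N)]

n = 100  # n/N = 1/2000, roughly the CPS ratio mentioned above
sample = draw_simple_random_sample(universe, n, seed=1)

print(f"sampling fraction n/N = {n / N:.5f}")
print(f"universe mean = {statistics.fmean(universe):,.0f}")
print(f"sample mean   = {statistics.fmean(sample):,.0f}")
```

Each seed yields a different random sample, and so a slightly different sample mean.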

1. The sample universe, and our sample

[Figure label: Random sampling]

Slide3

[Figure: The Sample Universe (size N), containing our random, representative sample (size n)]

2. Hypothesis testing

[Figure label: Hypothesis 1]

Data from our sample allow us to either reject or fail to reject Hypothesis 1 about the sample universe.

Note: We make hypotheses about the sample universe, and we test those hypotheses with data from our sample. There is a lot we don’t know about the sample universe, since the data we have on hand (our sample) is only a small part of the much larger sample universe. We don’t make hypotheses about our sample, because we already know all there is to know about it.

Slide4

3. Our sample is one of many potential samples.

[Figure: The Sample Universe (size N), containing our random, representative sample (size n) and other potential representative samples drawn from the same sample universe]

Let’s say that Hypothesis 1 is that X = 0. Think of X as some value in the sample universe that we cannot measure directly. In our data, x = b, and b ≠ 0. One way to think about hypothesis testing is to ask this question: if Hypothesis 1 were true, meaning X = 0, how likely would we be to find a value of x as large as b in our sample? In other words, if Hypothesis 1 were true, what percentage of the other random samples would yield a value of x as large as b? If the answer is less than 5%, then we generally reject the hypothesis: evidence from our sample leads us to believe that the true (unmeasured) value of X in the sample universe is not equal to zero.
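The question "what percentage of the other random samples would yield a value of x as large as b?" can be answered directly by simulation. The sketch below is a hedged illustration rather than the slides' own procedure: it builds a hypothetical sample universe in which Hypothesis 1 is true (here X is taken to be the universe mean, forced to exactly 0), draws many random samples of size n, and counts how often the sample mean x lands at least as far from zero as b. The values of n, b, the number of replications, and the helper name `simulated_p_value` are all illustrative assumptions; counting |x| ≥ |b| corresponds to a two-sided test.

```python
import random
import statistics

def simulated_p_value(universe, n, b, reps=2000, seed=0):
    """Share of random samples of size n whose mean is at least as far from 0 as b.

    Assumes Hypothesis 1 (X = 0) holds in `universe`, i.e. its mean is 0.
    Two-sided: counts samples with |x| >= |b|.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        x = statistics.fmean(rng.sample(universe, n))
        if abs(x) >= abs(b):
            extreme += 1
    return extreme / reps

# Hypothetical universe where Hypothesis 1 is true: values centered exactly at 0.
gen = random.Random(42)
raw = [gen.gauss(0, 1) for _ in range(100_000)]
center = statistics.fmean(raw)
universe = [v - center for v in raw]  # force the universe mean X to be 0

n = 100
b = 0.25  # hypothetical value of x observed in "our" sample
p = simulated_p_value(universe, n, b)
print(f"share of samples with |x| >= {b}: {p:.3f}")
print("reject Hypothesis 1" if p < 0.05 else "do not reject Hypothesis 1")
```

With these made-up numbers the standard error of x is about 1/√100 = 0.1, so b = 0.25 sits roughly 2.5 standard errors from zero, and only a small share of samples should be that extreme.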

[Figure label: Hypothesis 1, X=0]

In our data, x = b, and we decide that a value of x as large as b would be unlikely if Hypothesis 1 were true, so we reject Hypothesis 1.

Slide5

4. How sample size matters.

[Figure: The Sample Universe (size N), containing our random, representative sample (size n)]

[Figure label: Hypothesis 1, X=0]

In our data, x = b, and we decide that a value of x as large as b would be unlikely if Hypothesis 1 were true, so we reject Hypothesis 1.

Interestingly, and surprisingly to most students, the sample size we really care about is n. As long as N is much larger than n (that is, as long as n ≪ N), it doesn’t matter how big N is.

The size of our sample, n, is what determines the standard errors of our means and the power of our tests.

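Remember:

$$\mathrm{Var}(\mathrm{avg}(X)) = \frac{\mathrm{Var}(X)}{n}$$

And

$$\mathrm{SE}(\mathrm{avg}(X)) = \sqrt{\frac{\mathrm{Var}(X)}{n}} = \frac{\mathrm{SD}(X)}{\sqrt{n}}$$

And remember that it is the small n, the n of our random sample, that we are talking about here.

Slide6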

5. Sampling fraction.

[Figure: Random sample: our random, representative sample (size n), drawn from The Sample Universe (size N)]

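The ratio n/N is called the sampling fraction. Remember that

$$\mathrm{Var}(\mathrm{avg}(X)) = \frac{\mathrm{Var}(X)}{n}$$

Well, there is actually what is called a finite sample correction, so that the true value of Var(avg(X)) is:

$$\mathrm{Var}(\mathrm{avg}(X)) = \frac{\mathrm{Var}(X)}{n}\left[\frac{N-n}{N-1}\right]$$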

The part in square brackets is the finite sample correction (also called the finite population correction). When the sampling fraction is small, the finite sample correction is basically 1, and you can ignore it. When the sampling fraction is large (let’s say you have half of the sample universe in your sample), the correction shrinks the variance of your averages. This makes sense: the variance is a measure of uncertainty, and if you have a substantial proportion of all the possible data in hand, you have a lot less uncertainty about the sample universe. And when the sampling fraction is 1, that is, when you have the entire sample universe in hand, the finite sample correction is zero, and so is the variance of the average, which makes sense because there is no uncertainty left. When we are looking at a dataset of, let’s say, how the 100 US senators voted on some bill, we can fit models to that data, but we cannot test statistical hypotheses about the models, because there is no larger sample universe that the 100 senators are drawn from. The 100 senators are the entire sample universe of US senators at any one time.

Slide7

6. What about convenience samples?

[Figure: The Sample Universe (size N), containing our sample (size n)]

Let’s say you want to study the attitudes of college students. So you create a survey, and you field this survey to your friends. Your friends are what is called a convenience sample: they are a subset of the sample universe of college students, but they are not a random subset. You cannot test hypotheses about the sample universe with a convenience sample (at least not the way you would with a random sample). Convenience samples are easy to get (they are convenient), but they are not as useful as random, representative samples.

Any time you are doing research, or reading someone else’s research, you should know:

What is the sample universe?

Is the sampling fraction substantially less than 1? If the sampling fraction is close to 1, then we may have good data, but we do not need statistical tools to analyze the data.

If the sampling fraction is small, is our sample a random and representative sample? If so, then the standard statistical tools can be used. If our sample is not a representative sample, then hypothesis testing is not appropriate.

There are times when a random, representative sample is simply not obtainable, and a convenience sample is all you can get. Then one needs to think about creative ways to use the convenience sample. There are things you can do with a convenience sample. You can randomly subset the convenience sample and do experiments. You can take your convenience sample and match it to another sample that is similar in most respects but different in a key respect. You can start with your convenience sample and ask its members to introduce you to other people in the sample universe whom they know; this is known as snowball sampling or respondent-driven sampling. Eventually, if you follow the network chain out far enough (and under certain assumptions about how people recruit one another), you can end up with a representative sample of the sample universe.
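A toy simulation shows why a convenience sample cannot stand in for a random one. Everything here is a hypothetical assumption, not data from any real survey: a universe of 50,000 students split into two made-up groups, "A" (30%) and "B", whose average attitude scores differ, and a "friends" sample that, like the survey of your friends above, comes entirely from one group. The sketch compares that convenience sample with a simple random sample of the same size.

```python
import random
import statistics

gen = random.Random(3)

# Hypothetical universe of 50,000 students in two groups with different average attitudes.
universe = []
for _ in range(50_000):
    group = "A" if gen.random() < 0.3 else "B"        # about 30% of students are in group A
    score = gen.gauss(70 if group == "A" else 50, 10)  # group A scores higher on average
    universe.append((group, score))

n = 500
true_mean = statistics.fmean(score for _, score in universe)

# Convenience sample: the first n students we can reach, all from our own group (A).
friends = [s for g, s in universe if g == "A"][:n]

# Simple random sample of the same size from the whole universe.
random_sample = [s for _, s in random.Random(4).sample(universe, n)]

print(f"universe mean      = {true_mean:.1f}")
print(f"convenience mean   = {statistics.fmean(friends):.1f}")
print(f"random-sample mean = {statistics.fmean(random_sample):.1f}")
```

The random sample's mean lands near the universe mean, while the convenience sample's mean is off by roughly the gap between the groups; making the convenience sample larger would not remove that bias.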