Chapter 13: Effect Sizes and Power


freya
Uploaded On 2023-11-22

Presentation Transcript

1. Chapter 13: Effect Sizes and Power
‘Statistically significant’ does not mean ‘important’.
IQs of UW undergraduates: Suppose we measured the IQs of 10,000 UW undergraduates and found a mean IQ of 100.3. Suppose we then conducted a one-tailed z-test, using α = .05, to determine whether this mean is greater than that of the US population, which has a mean of 100 and a standard deviation of 15. The observed statistic is z = 2, so we could reject H0 with α = .05. But is a difference of 0.3 IQ points important?
[Figure: distribution of z-scores under H0, from -3 to 3, with the rejection region (area = α = .05) shaded and the observed z = 2 marked.]
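The arithmetic behind this example can be checked with a short sketch. It uses only Python's standard library; the function name `one_tailed_z_test` is a hypothetical helper introduced here for illustration, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, mu_hyp, sigma, n, alpha=0.05):
    """Upper-tailed one-sample z-test. Returns (z, p_value, reject)."""
    standard_error = sigma / sqrt(n)            # sigma of the sample mean
    z = (sample_mean - mu_hyp) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)         # area above z under H0
    return z, p_value, p_value < alpha


z, p_value, reject = one_tailed_z_test(100.3, 100, 15, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
# z = 2.00, p = 0.0228, reject H0: True
```

With n = 10,000 the standard error is only 0.15 IQ points, which is why a 0.3-point difference is enough to reject H0.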

2. If you want to read a lot about statistically significant effects that have small effect sizes…

3. Some journals require authors to report the ‘effect size’ along with the outcomes of statistical tests, so that the reader can judge whether the effect is ‘big’ enough to be important. Remember, to calculate t, we divide by the standard error of the mean:
$t = \frac{\bar{X} - \mu_{hyp}}{s_{\bar{X}}}$, where $s_{\bar{X}} = \frac{s_X}{\sqrt{n}}$
But the standard error of the mean shrinks with increasing n. We need a measure of the size of the difference between our observation and the null hypothesis that doesn’t depend on experimental parameters like n.

4. For t-tests we’ll use Cohen’s d as a measure of effect size:
$d = \frac{\bar{X} - \mu_{hyp}}{s_X}$
where $\mu_{hyp}$ is the mean for the null hypothesis. This is just like calculating t for a t-test, except that we divide by the standard deviation $s_X$ instead of the standard error of the mean.
Effect size: the difference between our observation and the null hypothesis in terms of standard deviations. Formally, effect size is “an estimate of the degree to which the treatment effect is present in the population, expressed as a number free of the original measurement unit”.

5. Back to our made-up IQ example, where we had a mean of 100.3 and a standard deviation of 15. The effect size is:
$d = \frac{100.3 - 100}{15} = 0.02$
The study found that UW IQs are only 0.02 standard deviations above 100. This is a small effect size, even though it is statistically significant.
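To see why effect size is the more stable quantity, the sketch below recomputes both the test statistic and Cohen's d for the IQ example at several sample sizes. The sample sizes other than 10,000 are made up for illustration, and `cohens_d` and `z_statistic` are hypothetical helper names.

```python
from math import sqrt


def cohens_d(sample_mean, mu_hyp, sd):
    """Difference from the null mean in units of standard deviations."""
    return (sample_mean - mu_hyp) / sd


def z_statistic(sample_mean, mu_hyp, sd, n):
    """Difference from the null mean in units of standard errors."""
    return (sample_mean - mu_hyp) / (sd / sqrt(n))


for n in (100, 10_000, 1_000_000):   # only n = 10,000 comes from the example
    d = cohens_d(100.3, 100, 15)
    z = z_statistic(100.3, 100, 15, n)
    print(f"n = {n:>9,}: d = {d:.2f}, z = {z:.1f}")
```

The test statistic grows with the square root of n, while d stays at 0.02 regardless of how many students are measured.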

6. Reporting effect size has the advantage that, since it doesn’t depend on n, the value is more easily compared across studies. A conventional interpretation of effect size (in absolute value) is:
0.8 is large
0.5 is medium
0.2 is small

7. There are two unavoidable types of errors in hypothesis testing: Type I and Type II errors.
A Type I error is when we reject H0 when it is actually true. Pr(Type I error) = α.
A Type II error is when we fail to reject H0 even though it is false. Pr(Type II error) = β.
More commonly, we talk about the probability of correctly rejecting H0. This probability is called power: Power = Pr(correct rejection of H0) = 1-β.
The decision based on your sample and the true state of the world combine into four outcomes:
H0 is true and we reject H0: Type I error (α)
H0 is true and we fail to reject H0: correctly fail to reject H0 (1-α)
H0 is false and we reject H0: correctly reject H0 (1-β = power)
H0 is false and we fail to reject H0: Type II error (β)

8. Decision based on your sample versus the true state of the world:

                        H0 is true                           H0 is false
   Fail to reject H0    Correctly fail to reject H0 (1-α)    Type II error (β)
   Reject H0            Type I error (α)                     Correctly reject H0 (1-β = power)
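The four outcomes can be illustrated with a small Monte Carlo sketch, assuming a one-tailed z-test in which the observed statistic is normally distributed with standard deviation 1. The true mean of 1 used for the "H0 is false" case, the number of trials, and the name `rejection_rate` are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean_z, alpha=0.05, trials=20_000, seed=0):
    """Fraction of simulated experiments whose z statistic lands in the
    upper-tail rejection region, when z is drawn from N(true_mean_z, 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    rejections = sum(rng.gauss(true_mean_z, 1.0) > z_crit for _ in range(trials))
    return rejections / trials


type_i_rate = rejection_rate(0.0)    # H0 true: every rejection is a Type I error
power = rejection_rate(1.0)          # H0 false: every rejection is correct
type_ii_rate = 1.0 - power           # H0 false: every failure to reject is a Type II error

print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, "
      f"Type II rate ~ {type_ii_rate:.3f}")
```

The simulated Type I rate should land near α, while the power depends on how far the true mean sits from the null value.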

9. Type I errors (α)
Alpha (α) is therefore the probability that a Type I error will occur. A Type I error occurs when our statistic (z or t) falls within the region of rejection even though the null hypothesis is true. For example, for a one-tailed z-test using α = .05, the distribution of z-scores and the rejection region look like this:
[Figure: distribution of z-scores from -4 to 4 with the upper-tail rejection region shaded; Pr(Type I error) = α.]

10. Type II errors
A Type II error happens when the null hypothesis is false but you fail to reject it anyway. To calculate the probability of a Type II error, we need to know the true distribution of the population. This is weird, because the true distribution of the population is the thing we’re trying to figure out in the first place.

11. Type II errors: beta (β) and power (1-β)
Type II errors happen only if the null hypothesis is false. For example, suppose we’re conducting a one-tailed z-test with α = .05, and the true population has a mean z-score of 1 (μtrue = 1). We still use the same critical value that we did under the null hypothesis, but now the distribution of z-values is centered on z = 1. The blue shaded region is the probability of correctly rejecting the null hypothesis. Type II errors happen when z falls outside the rejection region, so the probability of making a Type II error is 1 minus the blue shaded area.
[Figure: distributions of z-scores centered on μhyp = 0 and μtrue = 1, with zcrit = 1.645. Red shaded area: α = Pr(Type I error). Blue shaded area: 1-β = power.]

12. Type II errors: beta (β)
Calculating power, the probability of correctly rejecting H0:
1) Find the rejection region under the null hypothesis. With α = .05, zcrit = 1.645 (Table A, column C), so the rejection region is z > 1.645.
2) Shift the rejection region down by μtrue - μhyp = 1. Measured from the center of the true distribution, the cutoff is at 1.645 - 1 = 0.645, so the new rejection region is z > 0.645.
3) Find the area in the new rejection region. The power is the area for z above 0.645, which is .2611 (Table A, column C, using the entry for z = 0.64).
[Figure: distributions of z-scores centered on μhyp = 0 and μtrue = 1, with zcrit = 1.645. Red shaded area: α. Blue shaded area: 1-β = power.]
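The three steps can be sketched in a few lines, replacing the table lookups with the standard library's normal distribution. The function name `z_test_power` is a hypothetical helper. Because no rounding to two decimals is involved, the result is 0.2595 rather than the table-based .2611; this matches the power shown for α = .05 and μtrue = 1.0 in the figure on slide 17.

```python
from statistics import NormalDist


def z_test_power(mu_true_z, alpha=0.05):
    """Power of an upper-tailed z-test when the true mean, expressed as a
    z-score, is mu_true_z (the null mean is 0 in these units)."""
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1.0 - alpha)    # step 1: rejection region z > z_crit
    shifted_cutoff = z_crit - mu_true_z              # step 2: shift by mu_true - mu_hyp
    return 1.0 - standard_normal.cdf(shifted_cutoff) # step 3: area above the new cutoff


power = z_test_power(1.0)
print(f"power = {power:.4f}")   # power = 0.2595
```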

13. power = 1-β
Power is the probability of correctly rejecting the null hypothesis, which is the area in the rejection region under the true distribution. Power in this example is Pr(z > 0.645) = 1-β = .2611. More power is good: power is the probability of correctly finding an effect in your experiment. A ‘desirable’ level of power is .8.
[Figure: distributions of z-scores centered on μhyp = 0 and μtrue = 1, with zcrit = 1.645. Red shaded area: α. Blue shaded area: 1-β = power.]

14. Example: IQs are normally distributed with a mean of 100 and a standard deviation of 15. Suppose you sampled 100 students, calculated the sample mean, and are about to test for a significant increase in IQ using a one-tailed z-test with α = .05. What is the power of this test under the assumption that the true population mean for the group that we’re sampling is 103?
Answer: First, we’ll convert everything to z-scores. This makes μhyp = 0 (always), and
$\mu_{true} = \frac{103 - 100}{15/\sqrt{100}} = \frac{3}{1.5} = 2$

15. To calculate power:
1) Find the critical value of z under the null hypothesis. With α = .05, zcrit = 1.64 (Table A, column C), so the rejection region is z > 1.64.
2) Shift the rejection region over by μtrue - μhyp = 2 - 0 = 2. The new rejection region is z > 1.64 - 2, which is z > -.36.
3) Find 1-β, the area in the new rejection region: Pr(z > -.36) = .6406.
A power of .6406 means that there is a 64.06% chance of correctly rejecting the null hypothesis (or not making a Type II error).
[Figure: distribution of z-scores from -4 to 4 with the shifted rejection region shaded; power = 1-β = .6406.]
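As a cross-check on the table-based answer, here is a minimal sketch of the same calculation starting from raw IQ units. The function name `power_one_tailed_z` is hypothetical. It uses the unrounded critical value (about 1.645 instead of 1.64), so it returns roughly 0.639 rather than .6406.

```python
from math import sqrt
from statistics import NormalDist


def power_one_tailed_z(mu_true, mu_hyp, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)
    shift = (mu_true - mu_hyp) / standard_error     # true mean as a z-score
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - shift)


power = power_one_tailed_z(mu_true=103, mu_hyp=100, sigma=15, n=100)
print(f"power = {power:.3f}")
```

The small gap between the two answers comes only from rounding the z values to two decimals before using the table.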

16. Things that affect power: variability of the measure
Power increases as the standard error of the mean decreases. Ways to decrease the standard error of the mean:
Increase the sample size (increase n)
Make more accurate measurements (decrease σ)
[Figure: distributions of z-scores for α = .05 and μtrue = 1.0.]
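Both routes to a smaller standard error can be tried numerically. The sketch below reuses the IQ setting (null mean 100, true mean 103, one-tailed α = .05); the alternative sample sizes and the smaller σ of 10 are made-up values for illustration, and `power_one_tailed_z` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def power_one_tailed_z(mu_true, mu_hyp, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test with known sigma."""
    shift = (mu_true - mu_hyp) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - shift)


# Larger n shrinks the standard error and raises power.
for n in (25, 100, 400):
    p = power_one_tailed_z(103, 100, sigma=15, n=n)
    print(f"sigma = 15, n = {n:>3}: standard error = {15 / sqrt(n):.2f}, power = {p:.4f}")

# Smaller sigma (hypothetical, more precise measurement) has the same effect.
p_precise = power_one_tailed_z(103, 100, sigma=10, n=100)
print(f"sigma = 10, n = 100: standard error = {10 / sqrt(100):.2f}, power = {p_precise:.4f}")
```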

17. Things that affect power: level of significance (α)
Power decreases as alpha (α) decreases. This is a classic tradeoff: the less willing we are to make a Type I error, the more likely we are to make a Type II error.
[Figure: four panels of z-score distributions with μtrue = 1.0. α = .05: power = 0.2595. α = .025: power = 0.1685. α = .01: power = 0.0924. α = .001: power = 0.0183.]
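The four power values in the figure can be reproduced with a short sketch, assuming a one-tailed z-test with μtrue = 1.0 in z-score units. The helper name `z_test_power` is hypothetical.

```python
from statistics import NormalDist


def z_test_power(mu_true_z, alpha):
    """Power of an upper-tailed z-test for a true mean given as a z-score."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - mu_true_z)


powers_by_alpha = {alpha: z_test_power(1.0, alpha) for alpha in (0.05, 0.025, 0.01, 0.001)}
for alpha, power in powers_by_alpha.items():
    print(f"alpha = {alpha:<5}: power = {power:.4f}")
```

A stricter α pushes the critical value further into the tail, so less of the true distribution falls in the rejection region.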

18. Things that affect power: difference between μtrue and μhyp
Power increases with effect size: as the difference between the means for the true population and the null hypothesis increases. We don’t have control over this: μtrue is the one thing we don’t know (but want to estimate).
[Figure: four panels of z-score distributions with α = .05. μtrue = 0.25: power = 0.0815. μtrue = 0.5: power = 0.1261. μtrue = 1.0: power = 0.2595. μtrue = 4.0: power = 0.9907.]
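The same kind of sketch shows the dependence on the true mean, again assuming a one-tailed z-test with α = .05 and μtrue expressed as a z-score. The helper name `z_test_power` is hypothetical.

```python
from statistics import NormalDist


def z_test_power(mu_true_z, alpha=0.05):
    """Power of an upper-tailed z-test for a true mean given as a z-score."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - mu_true_z)


powers_by_mean = {mu: z_test_power(mu) for mu in (0.25, 0.5, 1.0, 4.0)}
for mu, power in powers_by_mean.items():
    print(f"mu_true = {mu:<4}: power = {power:.4f}")
```

When μtrue is 0 the function returns α itself, which is a useful sanity check: with no true effect, every rejection is a Type I error.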

19. Power curve: shows how power increases with effect size.
[Figure: power (0 to 1) plotted against effect size d (0 to 1) for a sample size of 50, two-tailed α = .05.]

24. Example: Suppose we’re conducting a two-tailed t-test for one mean with α = .05 and a sample size of n = 50. How large an effect size do we need to obtain a power value of 0.8?
Answer: Looking at the appropriate family of power curves, the curve with n = 50 passes through a power value of 0.8 when the effect size is 0.4.
Example: Suppose we’re conducting a one-tailed t-test for one mean with α = .01 and we have an effect size of 0.6. How large a sample size do we need to get a power value of 0.8?
Answer: Looking at the appropriate family of power curves at an effect size of 0.6, the curve with n = 30 passes through a power value of 0.8.
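Without the power curves at hand, both questions can be roughed out with a normal approximation. This is a swapped-in technique, not the method on the slides: it treats the t statistic as if it were a z statistic and, for two-tailed tests, ignores the negligible chance of rejecting in the wrong tail. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_effect_size(n, power=0.8, alpha=0.05, tails=2):
    """Approximate Cohen's d needed to reach the target power (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / tails)
    z_power = NormalDist().inv_cdf(power)
    return (z_crit + z_power) / sqrt(n)


def required_sample_size(d, power=0.8, alpha=0.05, tails=2):
    """Approximate n needed to reach the target power (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / tails)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_crit + z_power) / d) ** 2)


d_needed = required_effect_size(n=50, power=0.8, alpha=0.05, tails=2)
n_needed = required_sample_size(d=0.6, power=0.8, alpha=0.01, tails=1)
print(f"effect size needed with n = 50: about {d_needed:.2f}")
print(f"sample size needed for d = 0.6: about {n_needed}")
```

The first result is close to the 0.4 read from the curve. The second comes out at 28, a little below the n = 30 curve, which is expected: the approximation ignores the heavier tails of the t distribution, and the curves are presumably drawn only for a fixed set of sample sizes.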

25. Example: You decide to sample the test scores of 63 dazzling cats from a population and obtain a mean test score of 25.6 and a standard deviation of 2.77. Using an alpha value of α = 0.01, is this observed mean significantly different from an expected test score of 25? What is the effect size? What is the power?

26. Example: You decide to sample the test scores of 63 dazzling cats from a population and obtain a mean test score of 25.6 and a standard deviation of 2.77. Using an alpha value of α = 0.01, is this observed mean significantly different from an expected test score of 25? What is the effect size? What is the power?
Answer: (Two-tailed t-test for one mean) We fail to reject H0 (t(62) = 1.72, tcrit = ±2.6575). The mean test score of dazzling cats is not significantly different from 25.
Effect size: 0.2166
Power = 0.1759
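The t statistic and effect size in this answer can be verified with a short sketch. The critical value 2.6575 is taken from the answer above rather than computed, since Python's standard library has no t distribution. The power value is not recomputed here, as that would need a noncentral t distribution from a third-party library. The function name `one_sample_t_summary` is hypothetical.

```python
from math import sqrt


def one_sample_t_summary(sample_mean, sd, n, mu_hyp, t_crit):
    """Return (t, d, reject) for a two-tailed one-sample t-test,
    given a critical value looked up elsewhere."""
    standard_error = sd / sqrt(n)
    t = (sample_mean - mu_hyp) / standard_error   # difference in standard errors
    d = (sample_mean - mu_hyp) / sd               # Cohen's d: difference in standard deviations
    return t, d, abs(t) > t_crit


# t_crit = 2.6575 is the two-tailed value for df = 62, alpha = .01 quoted in the answer.
t, d, reject = one_sample_t_summary(25.6, 2.77, 63, 25.0, t_crit=2.6575)
print(f"t(62) = {t:.2f}, d = {d:.4f}, reject H0: {reject}")
# t(62) = 1.72, d = 0.2166, reject H0: False
```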
