Slide 1

Power Analysis with G*Power and Optimal Design
Hao Zhou & David Dueber
February 6, 2017
Applied Psychometric Strategies Lab
Applied Quantitative and Psychometric Series
Slide 2

Outline
- Fake real-life research scenario
- Significance testing and statistical errors
- Power and power analysis
- Examples
  - Independent samples t test
  - Chi-square test of independence
  - HLM
Slide 3

Let's design a study…
RQ: Do New Yorkers and Kansans spend the same amount of money per month on movies?
- Sample 50 people from NY and 50 people from KS
- Use an independent samples t test
- H0: no difference
- HA: difference
Slide 4

The problem: non-significant results
NY:
KS:
Fail to reject the null hypothesis! There are two possible explanations:
1. There is no mean difference in movie expenses
2. The sample size was not large enough to detect a true effect (the significance test was insufficiently sensitive)
Slide 5

How does sample size influence significance testing?
Many statistics computed in significance testing are analogous to a signal-to-noise ratio:
- t test: t = (mean difference) / SE
- ANOVA: F = MSB / MSW
- Regression coefficient: t = b / SE
A small standard error yields a large test statistic, and thereby a significant result. Yay!
Standard errors are generally proportional to 1/√N, so doubling a test statistic requires FOUR times as many participants.
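The 1/√N relationship above can be sketched in a few lines of Python; the SD value is borrowed from the movie-expense example later in the deck and is purely illustrative:

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 18.21  # illustrative SD of monthly movie spending ($)
se_100 = standard_error(sigma, 100)
se_400 = standard_error(sigma, 400)

# Quadrupling N halves the SE, which doubles the test statistic
# for a fixed mean difference (signal unchanged, noise halved).
print(f"SE with N=100: {se_100:.3f}")   # → 1.821
print(f"SE with N=400: {se_400:.3f}")   # → 0.911
print(f"ratio: {se_100 / se_400:.1f}")  # → 2.0
```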
Slide 6

How can I determine an appropriate sample size?
Power Analysis
Slide 7

What is the practical utility of performing power analysis?
- Do not spend excess time and money on participants you do not need
- Have a good chance of detecting the effect you are looking for
- Satisfy the requirements of a funding agency
Slide 8

Key Concepts: Significance Testing
In significance testing, a p-value is computed for a test statistic.

Slide 9

What errors can occur in significance testing?
- A Type I error is the rejection of a true null hypothesis (false positive). The alpha value (usually .05) sets the risk of Type I error: alpha (α) is the probability that the null hypothesis will be rejected given that the null hypothesis is true.
- A Type II error is the retention of a false null hypothesis (false negative). The Type II error rate (β) depends on the nominal alpha level (α), the effect size, and the sample size (N).
Slide 10

Types of Statistical Errors

Statistical Decision | Null Hypothesis (H0) is True | Null Hypothesis (H0) is False
Reject H0            | Type I Error (α)             | Correct Decision (1 - β)
Fail to Reject H0    | Correct Decision (1 - α)     | Type II Error (β)
Slide 11

Statistical Errors
Type I Error:
α = P(reject H0 | H0 is true)
Type II Error:
β = P(fail to reject H0 | H1 is true)
Cautions:
- If the null hypothesis is assumed true, then we can choose the theoretical Type I error rate
- If we assume that a specific H1 is true (mean and standard deviation), we can compute the theoretical Type II error rate
- But, we have no idea what the true population mean is!
Slide 12

[Figure: null distribution and alternate distribution, with the critical value 1.96 marked]
Slide 13

What is power?
Power is the probability that the null hypothesis is rejected given that the alternate hypothesis is true:
Power = 1 - β
In English: power is the likelihood that you detect the effect you are looking for!
- Power = 0.80 is a commonly used benchmark
- Yes, ~20% of rigorously designed studies fail to detect a true effect
Slide 14

How alpha, beta, and effect size are related
http://rpsychologist.com/d3/NHST/
Slide 15
Slide 16

The power of a z test: Setting
RQ: Does being involved in sports have an effect on weight for 8-year-old boys?
- In the population of all 8-year-old boys, μ = 56 pounds with σ = 3 pounds
- To compute power, we need to know the true effect of sports involvement
- In the population of 8-year-old boys involved in sports, μ = 55.3 pounds with σ = 3 pounds
- Suppose we randomly sample from boys involved in sports, but the significance test is a comparison to the full population
Slide 17

The power of a z test: Calculation
Suppose we have a random sample of 30 boys involved in sports. What is the probability of rejecting the null hypothesis that the mean weight of boys involved in sports is 56 pounds?
- The null hypothesis is rejected if p < .05, corresponding to |z| > 1.96
- The standard error is 3/√30 = 0.548
- Therefore, we will reject if the sample mean is below 56 - 1.96(0.548) = 54.93 or above 56 + 1.96(0.548) = 57.07
- We are sampling from a population with a mean of 55.3 and a standard deviation of 3 (so the standard error is 0.548); 54.93 and 57.07 then correspond to z < -0.676 or z > 3.23
- P(z < -0.676 or z > 3.23) = .249 (that's the power!)
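The z-test power calculation above can be reproduced with only the Python standard library (the normal CDF comes from math.erf); small differences from the slide's .249 are rounding:

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu0, mu1, sigma, n = 56.0, 55.3, 3.0, 30
se = sigma / math.sqrt(n)        # ≈ 0.548

# Rejection region on the raw-mean scale under H0: |z| > 1.96
lower = mu0 - 1.96 * se          # ≈ 54.93
upper = mu0 + 1.96 * se          # ≈ 57.07

# Probability the sample mean lands in the rejection region
# when the TRUE mean is 55.3 — i.e., the power of the test.
z_lower = (lower - mu1) / se     # ≈ -0.68
z_upper = (upper - mu1) / se     # ≈ 3.24
power = norm_cdf(z_lower) + (1 - norm_cdf(z_upper))
print(f"power ≈ {power:.3f}")    # ≈ 0.248 (the slide's .249, up to rounding)
```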
Slide 18

Steps for conducting a power analysis
- Step 1: Statistical Test
- Step 2: α and Power
- Step 3: Effect Size
- Step 4: Auxiliary Information
Then choose your Software.
Slide 19

Step 2: α and Power
Alpha = .05, easy-peasy! [NO!]
- α = .05 is typical
- Typically, power = .80, but it is your choice
Slide 20

Step 3: Effect Sizes
Effect sizes are standardized versions of the statistics used in significance testing and can be compared across studies.
The effect size we use in a power analysis can be derived from a pilot study, from related literature, from practical significance, or, in desperation, from professional judgment.

Test            | Test Statistic | Effect Size
Mean Difference | t              | Cohen's d
Independence    | χ2             | w
Correlation     | r              | r
Regression      | F              | R2 (or f2)
ANOVA           | F              | f or η2
Slide 21

Effect size cutoffs? NO!
But if you really have no clue, then look up those tables from Cohen and interpret them as follows:

Effect Size | Meaning
Small       | I expect this effect to be hard to detect
Medium      | Neither small nor large
Large       | I expect this effect to be easy to detect
Slide 22

Step 4: Auxiliary Information
Some types of analyses need more information to be able to calculate power:
- Design considerations, such as the allocation ratio for an independent samples t test
- The intraclass correlation coefficient (ICC) for multilevel data
Advanced power analysis can account for much of the messiness in research: the extent of missing data, outliers, invalid responses, low reliability, etc.
Slide 23

Software
- For exact, F, t, z, and χ2 tests, use G*Power
  - Only for single-level data (no nesting or clustering)
  - Assumes perfect measurement, no missing data, etc.
- For multi-level tests, use Optimal Design
  - Assumes perfect measurement, no missing data, etc.
- For complicated designs, and to account for imperfect measurement or missing data, use simulations (e.g., Mplus, SAS, R)
Slide 24

Back to Movie Expenses
RQ: Do New Yorkers and Kansans spend the same amount of money per month on movies?
- Step 1 (Statistical Test): Independent samples t test
- Step 2 (α and Power): α = .05, Power = .80
- Step 3 (Effect Size): ????
- Step 4 (Auxiliary Information): Allocation Ratio = 1
Slide 25

Estimating Effect Size
- Based on data from the Bureau of Labor Statistics (BLS), differences in average salary between the two states indicate that we should expect a difference of $4.21 in monthly movie expenses
- The standard deviation of money spent on movie tickets per month is $18.21
- G*Power can convert this information to an effect size for us
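The conversion G*Power performs here is just Cohen's d = mean difference / SD. A minimal sketch, with the per-group sample size from the normal approximation (which slightly underestimates the exact t-based answer G*Power reports):

```python
import math

# Convert the BLS-based numbers from the slide into Cohen's d.
mean_diff = 4.21   # expected difference in monthly movie spending ($)
sd = 18.21         # SD of monthly movie spending ($)
d = mean_diff / sd
print(f"Cohen's d = {d:.3f}")  # → 0.231

# Normal-approximation sample size per group, two-sided test:
# n ≈ 2 * (z_{1-α/2} + z_{1-β})^2 / d^2
z_alpha = 1.959964  # z for α = .05, two-sided
z_beta = 0.841621   # z for power = .80
n_per_group = 2 * (z_alpha + z_beta) ** 2 / d ** 2
print(f"n per group ≈ {math.ceil(n_per_group)}")  # → 294
```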
Slide 26
Slide 27

Results
- Using equal sample sizes is always vastly more efficient than unequal sample sizes
- This is the sample size we need
Slide 28

Warning: that calculated sample size assumes a lot of things!
All of the assumptions of the independent t test must be met exactly for that sample size to be appropriate:
- Money spent per month must be continuous (✓)
- Money spent must be approximately normally distributed (✓)
- Independence of observations (✓)
- No influential outliers (maybe?)
- Homogeneity of variance (maybe? but t tests with equal sample sizes are largely robust to this violation)
Slide 29
Slide 30

What if it's easier to recruit from New York than Kansas?
- Twice as many New Yorkers as Kansans
- Total of 74 more people!
- The size of the Kansas group cannot decrease *too* much, lest the red distribution become very wide
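The cost of an unequal split can be sketched with the normal-approximation formula, where total N scales with (1 + r)²/r for allocation ratio r = n1/n2 (a sketch, not G*Power output — though it reproduces the slide's "74 more people"):

```python
import math

def total_n(d: float, ratio: float, z_alpha: float = 1.959964,
            z_beta: float = 0.841621) -> int:
    """Approximate total sample size for a two-sample test with
    allocation ratio r = n1/n2, via the normal approximation."""
    n2 = (1 + ratio) * (z_alpha + z_beta) ** 2 / (ratio * d ** 2)
    n1 = ratio * n2
    return math.ceil(n1) + math.ceil(n2)

d = 4.21 / 18.21                  # Cohen's d from the movie-expense example
equal = total_n(d, ratio=1)       # 1:1 split
two_to_one = total_n(d, ratio=2)  # twice as many New Yorkers as Kansans
print(equal, two_to_one, two_to_one - equal)  # → 588 662 74
```

A 2:1 split needs about 12.5% more people in total than a 1:1 split for the same power, which is exactly the 74 extra participants the slide mentions.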
Slide 31

Example 2: chi-square for a contingency table (2x2)
RQ: Does a generic flu vaccine work better than a placebo?
- Step 1 (Statistical Test): Chi-square test
- Step 2 (α and Power): α = .05, Power = .80
- Step 3 (Effect Size): ????
- Step 4 (Auxiliary Information): None
Slide 32

What is the end data going to look like (sample data)?

        | No Flu | Flu | Total
Placebo | 235    | 65  | 300
Vaccine | 285    | 15  | 300
Total   | 520    | 80  | 600
Slide 33

Estimating Effect Size
- Prior research with the name-brand vaccine showed that 14% of people receiving the vaccine and 21% of people not receiving the vaccine contracted the flu
- Estimated effect size of 0.202 from prior research
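G*Power's effect size for the test of independence is Cohen's w = √(χ²/N). A minimal sketch computing it from the hypothetical Slide 32 table (this gives that table's w, not the 0.202 estimated from the prior-research proportions):

```python
import math

# Hypothetical 2x2 table from Slide 32:
# rows = placebo/vaccine, columns = no flu / flu.
observed = [[235, 65],
            [285, 15]]

n = sum(sum(row) for row in observed)              # 600
row_totals = [sum(row) for row in observed]        # [300, 300]
col_totals = [sum(col) for col in zip(*observed)]  # [520, 80]

# Pearson chi-square: sum over cells of (O - E)^2 / E.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_sq += (o - e) ** 2 / e

w = math.sqrt(chi_sq / n)  # Cohen's w, G*Power's effect size for this test
print(f"chi-square = {chi_sq:.2f}, w = {w:.3f}")  # → chi-square = 36.06, w = 0.245
```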
Slide 34

Results
- Total number of people, split equally into the two groups (placebo, vaccine)
- The test for independence is basically a multi-group goodness-of-fit (GOF) test
Slide 35

Example 3: Growth Model
RQ: Does an alternate curriculum affect the growth rate of scores on the mathematics part of the MAP test among elementary students?
- Step 1 (Statistical Test): HLM
- Step 2 (α and Power): α = .05, Power = .80
- Step 3 (Effect Size): .50
- Step 4 (Auxiliary Information): ICC = ??, Cluster Size = 25
Slide 36
Slide 37

Warnings about power analysis
- Power analyses often provide a "best case"
- More sophisticated techniques can account for things like missing data, non-normality, etc.
- You can perform sensitivity analyses based on the range of possible values rather than just a single one
- Power analyses are not generalizable
Slide 38

Software for conducting power analysis
- G*Power: extensive within the observed-variable, single-level framework, when nothing bad ever happens
- Optimal Design: extensive within the observed-variable, multi-level framework (including repeated measures and longitudinal designs), when nothing bad ever happens
- Mplus (and R and, to a lesser extent, SAS and other SEM programs): can conduct power analysis via simulation study VERY flexibly
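The simulation approach those programs take can be sketched in plain Python: generate many datasets under a specific alternative, run the test on each, and count rejections. This sketch uses a two-sample mean comparison with the large-sample critical value 1.96; d and n are illustrative values, not from the deck:

```python
import math
import random

def simulate_power(d: float, n: int, reps: int = 4000,
                   seed: int = 1) -> float:
    """Monte Carlo power for a two-sample test of a mean difference.
    Draws two groups of size n, computes the test statistic with a
    pooled standard error, and counts rejections at |stat| > 1.96."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g2 = [rng.gauss(d, 1.0) for _ in range(n)]
        m1, m2 = sum(g1) / n, sum(g2) / n
        v1 = sum((x - m1) ** 2 for x in g1) / (n - 1)
        v2 = sum((x - m2) ** 2 for x in g2) / (n - 1)
        se = math.sqrt(v1 / n + v2 / n)
        if abs(m2 - m1) / se > 1.96:
            rejections += 1
    return rejections / reps

# With d = 0.5 and n = 64 per group, analytic power is about .80.
print(f"simulated power ≈ {simulate_power(0.5, 64):.2f}")
```

The payoff of simulation is that the data-generating step can be made arbitrarily messy (missing data, non-normality, clustering), which is exactly what the analytic formulas in G*Power cannot absorb.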
Slide 39

Danger, Danger Will Robinson!
When testing multiple effects at once, alpha = .05 is inappropriate. Options include:
- Bonferroni (super conservative!)
- Adjustments based on outcome correlations
- Sequential gatekeeping
- Selective alpha weighting
- Combinatorial outcomes (e.g., first MANOVA, then post-hoc testing)
Slide 40

References and Links
- Power animation: http://rpsychologist.com/d3/NHST/
- Power analysis seminars from UCLA: https://stats.idre.ucla.edu/other/mult-pkg/seminars/intro-power/
- G*Power: http://www.gpower.hhu.de/en.html
- Optimal Design: http://hlmsoft.net/od/
- Check out the User's Guides for many examples