
Slide1

Power Analysis with G*Power and Optimal Design
Hao Zhou & David Dueber
February 6, 2017

Applied Psychometric Strategies Lab

Applied Quantitative and Psychometric Series

Slide2

Outline
- Fake real-life research scenario
- Significance testing and statistical errors
- Power and power analysis
- Examples: independent samples t test, chi-square test of independence, HLM

Slide3

Let's design a study…
RQ: Do New Yorkers and Kansans spend the same amount of money per month on movies?
- Sample 50 people from NY and 50 people from KS
- Use an independent samples t test
- H0: no difference
- HA: difference

Slide4

The problem: non-significant results
NY:
KS:
Fail to reject the null hypothesis! There are two possible explanations:
1. No mean difference in movie expenses
2. The sample size was not large enough to detect a true effect (the significance test was insufficiently sensitive)

Slide5

How does sample size influence significance testing?
Many statistics computed in the process of significance testing are analogous to a signal-to-noise ratio:
- t test: t = (mean difference) / SE
- ANOVA: F = MSB / MSW
- Regression coefficient: t = b / SE
A small standard error yields a large test statistic, and thereby a significant result. Yay!
Standard errors are generally proportional to 1/√N, so doubling a test statistic requires FOUR times as many participants, as the sketch below shows.

Slide6

How can I determine an appropriate sample size?

Power Analysis

Slide7

What is the practical utility of performing power analysis?
- Do not spend excess time and money on participants that you do not need
- Have a good chance of detecting the effect you are looking for
- Satisfy the requirements of a funding agency

Slide8

Key Concepts: Significance Testing
In significance testing, a p-value is computed for a test statistic:
p = P(a test statistic at least this extreme | H0 is true)

Slide9

What errors can occur in significance testing?
A Type I error is the rejection of a true null hypothesis (false positive).
- The alpha value (usually .05) sets the risk of Type I error
- Alpha (α) is the asymptotic probability that the null hypothesis will be rejected if the null hypothesis is true
A Type II error is the retention of a false null hypothesis (false negative).
- The Type II error rate (β) is related to the nominal alpha level (α), effect size, and sample size (N)

Slide10

Types of Statistical Errors

                            Null Hypothesis (H0) is
  Statistical Decision      True                       False
  Reject H0                 Type I Error (α)           Correct Decision (1 − β)
  Fail to Reject H0         Correct Decision (1 − α)   Type II Error (β)

Slide11

Statistical Errors
Type I Error: α = P(reject H0 | H0 true)
Type II Error: β = P(fail to reject H0 | H1 true)
Cautions:
- If the null hypothesis is assumed true, then we can choose the theoretical Type I error rate
- If we assume that a specific H1 is true (mean and standard deviation), we can compute the theoretical Type II error rate
- But we have no idea what the true population mean is!

Slide12

[Figure: null distribution and alternate distribution, with the critical value at 1.96]

Slide13

What is power?
Power is the probability that the null hypothesis is rejected given that the alternate hypothesis is true:
Power = P(reject H0 | H1 true) = 1 − β
In English: power is the likelihood that you detect the effect you are looking for!
Power = 0.80 is a commonly used benchmark. Yes, ~20% of rigorously designed studies fail to detect a true effect.

Slide14

How Alpha, Beta, and Effect Size are related
Interactive animation: http://rpsychologist.com/d3/NHST

Slide15


Slide16

The power of a z test: Setting
RQ: Does being involved in sports have an effect on weight for 8-year-old boys?
- In the population of all 8-year-old boys, μ = 56 pounds with σ = 3 pounds
- To compute power, we need to know the true effect of sports involvement
- In the population of 8-year-old boys involved in sports, μ = 55.3 pounds with σ = 3 pounds
- Suppose we randomly sample from boys involved in sports, but the significance test is a comparison to the full population

Slide17

The power of a z test: Calculation
Suppose we have a random sample of 30 boys involved in sports. What is the probability of rejecting the null hypothesis that the mean weight of boys involved in sports is 56 pounds?
- The null hypothesis is rejected if p < .05, corresponding to |z| > 1.96
- The standard error is 3/√30 ≈ 0.548. Therefore, we will reject if x̄ < 54.93 or x̄ > 57.07
- We are sampling from a population with a mean of 55.3 and a standard deviation of 3 (so the standard error is .548); x̄ < 54.93 or x̄ > 57.07 corresponds to z < −0.676 or z > 3.23
- P(z < −0.676 or z > 3.23) = .249 (that's the power!)
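For anyone following along in R, the same calculation line by line (a transcription of the steps above, not part of the original slides; base R functions only):

```r
# Power of the two-sided z test: null mean 56, true mean 55.3, sd 3, n = 30
mu0 <- 56; mu1 <- 55.3; sigma <- 3; n <- 30
se <- sigma / sqrt(n)                              # 0.548
crit <- qnorm(c(.025, .975), mean = mu0, sd = se)  # reject if xbar falls outside 54.93, 57.07
pnorm(crit[1], mean = mu1, sd = se) +
  pnorm(crit[2], mean = mu1, sd = se, lower.tail = FALSE)
# ~0.25, matching the slide's .249 up to rounding
```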

Slide18

Steps for conducting a power analysis
- Step 1: Statistical Test
- Step 2: α and Power
- Step 3: Effect Size
- Step 4: Auxiliary Information
These four inputs then feed into your choice of software (see Slide23).

Slide19

Step 2: Alpha = .05, Easy-Peasy! [NO!]
- α = .05 is typical
- Typically, power = .80, but it is your choice

Slide20

Step 3: Effect Sizes
Effect sizes are standardized versions of the statistics used in significance testing and can be compared across studies.
The effect size we use in a power analysis can be derived from a pilot study, from related literature, from practical significance, or, in desperation, from professional judgment.

  Test              Test Statistic   Effect Size
  Mean Difference   t                Cohen's d
  Independence      χ2               w
  Correlation       r                r
  Regression        F                R2
  ANOVA             F                f or η2
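Since Cohen's d anchors the first row of this table, here is one way to compute it from raw group data in R (a minimal sketch; the helper name cohens_d is mine, not from the slides):

```r
# Cohen's d for two independent groups, using the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}
cohens_d(rnorm(50, mean = 56, sd = 3), rnorm(50, mean = 55.3, sd = 3))  # toy data
```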

Slide21

Effect size cutoffs? NO!
But if you really have no clue, then look up those tables from Cohen and interpret them as follows:

  Effect Size   Meaning
  Small         I expect this effect to be hard to detect
  Medium        Neither small nor large
  Large         I expect this effect to be easy to detect

Slide22

Step 4: Auxiliary Information
Some types of analyses need more information to be able to calculate power:
- Design considerations, such as the allocation ratio for an independent samples t test
- Intraclass correlation coefficient (ICC) for multilevel data
Advanced power analysis can account for much of the messiness in research: extent of missing data, outliers, invalid responses, low reliability, etc.

Slide23

Software
- For exact, F, t, z, and χ2 tests, use G*Power
  - Only for single-level data (no nesting or clustering)
  - Assumes perfect measurement, no missing data, etc.
- For multi-level tests, use Optimal Design
  - Assumes perfect measurement, no missing data, etc.
- For complicated designs, and to account for imperfect measurement or missing data, use simulations (e.g., Mplus, SAS, R)

Slide24

Back to Movie Expenses
RQ: Do New Yorkers and Kansans spend the same amount of money per month on movies?
- Step 1 (Statistical Test): Independent samples t test
- Step 2 (α and Power): α = .05, Power = .80
- Step 3 (Effect Size): ????
- Step 4 (Auxiliary Information): Allocation Ratio = 1

Slide25

Estimating Effect Size
Based on data from the Bureau of Labor Statistics (BLS), differences in average salary between the two states indicate that we should expect a difference of $4.21 in monthly movie expenses.
The standard deviation of money spent on movie tickets per month is $18.21.
G*Power can convert this information to an effect size for us: d = 4.21/18.21 ≈ 0.23 (cross-checked in R below).
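Under these inputs, base R's power.t.test gives essentially the same answer as G*Power; treat this as a cross-check sketch, not the slides' own output:

```r
# Required n per group for d = 4.21/18.21 ~ 0.23, alpha = .05, power = .80,
# two-sided, two-sample t test (power.t.test defaults)
power.t.test(delta = 4.21, sd = 18.21, sig.level = .05, power = .80)
# n ~ 295 per group, i.e. ~590 people in total
```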

Slide26

Slide27

Results
- For a fixed total N, equal sample sizes are more efficient than unequal sample sizes
- This is the sample size we need

Slide28

Warning: That calculated sample size assumes a lot of things!
All of the assumptions of the independent t test must be met exactly for that sample size to be appropriate:
- Money spent per month must be continuous (✓)
- Money spent must be approximately normally distributed (✓)
- Independence of observations (✓)
- No influential outliers (maybe?)
- Homogeneity of variance (maybe? but t tests with equal sample sizes are largely robust to this violation)

Slide29

Slide30

What if it's easier to recruit from New York than Kansas?
- Twice as many New Yorkers as Kansans
- Total of 74 more people! (see the sketch below)
- The size of the Kansas group cannot decrease *too* much, lest the red distribution be very wide
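One way to mirror this in code, using the third-party pwr package (an assumption on my part; the slides use G*Power's allocation-ratio setting for this):

```r
# Find the smallest Kansas group such that a 2:1 NY:KS design still has 80% power
# (requires install.packages("pwr"))
library(pwr)
d  <- 4.21 / 18.21                    # effect size from the BLS-based estimates
n2 <- 2
while (pwr.t2n.test(n1 = 2 * n2, n2 = n2, d = d)$power < 0.80) n2 <- n2 + 1
c(NY = 2 * n2, KS = n2, total = 3 * n2)
# roughly 663 in total, about 74 more than the equal-allocation design
```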

Slide31

Example 2: Chi-square for contingency table (2x2)
RQ: Does a generic flu vaccine work better than a placebo?
- Step 1 (Statistical Test): Chi-square test
- Step 2 (α and Power): α = .05, Power = .80
- Step 3 (Effect Size): ????
- Step 4 (Auxiliary Information): None

Slide32

What is the end data going to look like (sample data)?

            No Flu   Flu   Total
  Placebo   235      65    300
  Vaccine   285      15    300
  Total     520      80    600

Slide33

Estimating Effect Size
Prior research with the name-brand vaccine showed that 14% of people receiving the vaccine and 21% of people not receiving the vaccine contracted the flu.
Estimated effect size of 0.202 from prior research (a code sketch follows below).
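A sketch of the corresponding calculation with the third-party pwr package (assumed installed; the slides themselves use G*Power):

```r
# Total N for effect size w = 0.202, df = 1 (2x2 table), alpha = .05, power = .80
library(pwr)
pwr.chisq.test(w = 0.202, df = 1, sig.level = .05, power = .80)
# total N of roughly 193, split equally between placebo and vaccine groups
```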

Slide34

Results
- Total number of people, split into the two groups (placebo, vaccine) equally
- The test for independence is basically a multi-group goodness-of-fit (GOF) test

Slide35

Example 3: Growth Model
RQ: Does an alternate curriculum affect the growth rate of scores on the mathematics part of the MAP test among elementary students?
- Step 1 (Statistical Test): HLM
- Step 2 (α and Power): α = .05, Power = .80
- Step 3 (Effect Size): .50
- Step 4 (Auxiliary Information): ICC = ??, Cluster Size = 25
A rough design-effect check appears below.
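Optimal Design handles the multilevel machinery properly; as a back-of-envelope check only, the ICC inflates the required flat-data sample size through the design effect. The ICC value below is an assumption, since the slide leaves it as ??:

```r
# Back-of-envelope design-effect adjustment for clustered data
icc  <- 0.10                      # assumed ICC; the slide leaves this as ??
m    <- 25                        # students per cluster, from the slide
deff <- 1 + (m - 1) * icc         # design effect = 3.4
n_flat <- power.t.test(delta = 0.50, sd = 1, power = 0.80)$n  # per group, ignoring clustering
ceiling(n_flat * deff / m)        # ~9 clusters of 25 students per condition
```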

Slide36

Slide37

Warnings about power analysis
- Power analyses often provide a "best case" scenario
- More sophisticated techniques can account for things like missing data, non-normality, etc.
- You can perform sensitivity analyses based on a range of possible values rather than just a single one
- Results are not generalizable beyond the assumed design and inputs

Slide38

Software for conducting power analysis
- G*Power: extensive within the observed-variable, single-level framework, when nothing bad ever happens
- Optimal Design: extensive within the observed-variable, multi-level framework (including repeated measures and longitudinal designs), when nothing bad ever happens
- Mplus (and R and, to a lesser extent, SAS and other SEM programs): can conduct power analysis via simulation study VERY flexibly

Slide39

Danger, Danger Will Robinson!
When testing multiple effects at once, alpha = .05 is inappropriate. Options include (a couple are sketched in R below):
- Bonferroni (super conservative!)
- Adjustments based on outcome correlations
- Sequential gatekeeping
- Selective alpha weighting
- Combinatorial outcomes (e.g., first MANOVA, then post-hoc testing)
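If you do end up adjusting, base R's p.adjust applies the Bonferroni correction and several sequential alternatives (the raw p-values below are made up for illustration):

```r
# Multiple-testing corrections with base R
p <- c(0.010, 0.020, 0.030, 0.040)        # hypothetical raw p-values
p.adjust(p, method = "bonferroni")        # multiply each p by the number of tests
p.adjust(p, method = "holm")              # sequential version, never less powerful
```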

Slide40

References and Links
- Power Animation: http://rpsychologist.com/d3/NHST/
- Power analysis seminars from UCLA: https://stats.idre.ucla.edu/other/mult-pkg/seminars/intro-power/
- G*Power: http://www.gpower.hhu.de/en.html
- Optimal Design: http://hlmsoft.net/od/
- Check out the User's Guides for many examples