Introduction to hypothesis testing


liane-varnes
Uploaded On 2016-11-06


Presentation Transcript

Slide1

Introduction to hypothesis testing + Probability

Seminar 6

Slide2

A difficult mock question for mid-term

Plot the following graph. People who use Facebook only (but not Twitter) are generally happier than those who use Twitter only (but not Facebook). This tendency is weakest among new subscribers. However, after a certain number of years, there is a declining trend for both groups, with a sharper decline among Twitter users.

Slide3

Today’s Question

We want to know whether salary bonuses increase people’s psychological well-being. The average well-being of Delhi’s residents is 3.00 (SD = 1.00). We randomly sampled a group of 30 employees and gave them a salary bonus. Months later, we measured their well-being. The average well-being in this sample is 3.50.

Slide4

Two possibilities

The sample mean of 3.50 was drawn from your original population (colored in the figure).
The sample mean of 3.50 was drawn from another population (grey in the figure).

Slide5

The real problem: Randomness

Sampling error: Every sample is likely to yield different statistics (e.g., M and SD), even when all samples come from the same population. Using Excel’s =RANDBETWEEN(1,100):

[Table of random integers between 1 and 100 in columns A to J; column summaries below]

       A     B     C     D     E     F     G     H     I     J
M   55.6  34.2  47.2  52.0  64.2  33.1  62.0  59.5  49.2  52.5
SD  30.1  22.2  34.2  36.1  29.2  28.4  33.7  34.3  33.9  24.6

Slide6
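As a rough Python counterpart to the RANDBETWEEN table on Slide 5, the sketch below draws ten independent samples of random integers between 1 and 100 and prints each sample's mean and SD. The sample size of ten per column, the seed, and the helper name `draw_samples` are assumptions made for illustration; they are not taken from the slide.

```python
import random
import statistics

def draw_samples(n_samples=10, n_per_sample=10, seed=6):
    """Draw independent samples of random integers from 1 to 100 (inclusive)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [[rng.randint(1, 100) for _ in range(n_per_sample)]
            for _ in range(n_samples)]

samples = draw_samples()
means = [statistics.mean(s) for s in samples]
sds = [statistics.stdev(s) for s in samples]  # sample SD (n - 1 denominator)

# Every sample comes from the same population (true mean 50.5),
# yet the sample means and SDs differ from column to column.
for label, m, sd in zip("ABCDEFGHIJ", means, sds):
    print(f"{label}: M = {m:.1f}, SD = {sd:.1f}")
```

Changing the seed gives a different table each time, which is the point of the slide: the spread across columns is produced by sampling alone.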

Big Question

Is the .50 difference between the salary bonus group and Delhi residents in general a result of the bonus, or simply an “accident” of sampling error (“randomness”)?

Slide7

Two hypotheses are implied

Null hypothesis
The sample comes from a population in which the mean is 3.00.
The difference we observed is due to sampling error.

Alternative hypothesis
The sample does not come from a population in which the mean is 3.00.
The difference is due to the salary bonus.
(Often called the “research hypothesis.”)

Slide8

Mathematically…

H0: μx̄ = 3.00 (null hypothesis)
H1: μx̄ ≠ 3.00 (alternative hypothesis)

Note that the two hypotheses are mutually exclusive.

Slide9

How can we determine which hypothesis is more likely to be true?

The most popular tools: Null Hypothesis Significance Tests (NHSTs).

Significance tests are quantitative techniques to evaluate the probability of observing the data, assuming that the null hypothesis is true.

This information is used to make a binary (yes/no) decision about whether the null hypothesis is a viable explanation for the study results.

Slide10

The NHST at its core

Two statistical datasets, A and B, are compared.
Each dataset has its own parameters (e.g., M & SD).
The question is: is A = B? (the null hypothesis)
Often, we want to prove A ≠ B (the alternative hypothesis) by disproving A = B.
If A = B, then A − B = 0 (that’s where the ‘null’ comes from).

Slide11

NHST & philosophy

We cannot prove that something is true; we can only prove that something is false.
“All swans are white”
“Innocent until proven guilty”

Inferential statistics are probabilistic.
“My hypothesis is true.”
“How likely is it that my hypothesis is true?”

Slide12

Basic probability

In a bag of 100 balls, 5 are red, 95 are blue.

 Slide13

Two basic rules of probability

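The bag now has: 5 red, 10 green, and 85 blue balls.

What is the probability that I will draw either a red or green ball?
Additive rule: P(red or green) = P(red) + P(green) = .05 + .10 = .15

If I draw two balls (with replacement), what is the probability that I will draw a red and a green ball?
Multiplicative rule: P(red and green) = P(red) × P(green) = .05 × .10 = .005 (red on the first draw, green on the second)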
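The two rules can be checked with exact fractions. This is a minimal sketch: the dictionary `bag` and the helper `p` are names invented here, and reading "a red and a green ball" as "red first, then green" is an assumption (the order-free version is shown as well).

```python
from fractions import Fraction

# Bag contents from the slide: 5 red, 10 green, 85 blue (100 balls in total).
bag = {"red": 5, "green": 10, "blue": 85}
total = sum(bag.values())

def p(colour):
    """Probability of drawing the given colour on a single draw."""
    return Fraction(bag[colour], total)

# Additive rule: red and green are mutually exclusive on one draw.
p_red_or_green = p("red") + p("green")

# Multiplicative rule: draws with replacement are independent.
# Assumption: "red and green" is read as red first, then green.
p_red_then_green = p("red") * p("green")

# If the order does not matter, the reverse order is added (additive rule again).
p_red_and_green_any_order = p_red_then_green + p("green") * p("red")

print(float(p_red_or_green))             # 0.15
print(float(p_red_then_green))           # 0.005
print(float(p_red_and_green_any_order))  # 0.01
```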

 Slide14

Relationship to sampling distributions

Recall: The sampling distribution is the distribution of means for repeated random samples.

Slide15

Relationship to sampling distributions

How extreme is your sample mean of 3.50?

We calculate a z-score: z = (M − μ) / (σ / √N) = (3.50 − 3.00) / (1.00 / √30) = 2.74

Note: This is different from z = (X − μ) / σ.
One is inferential, one is descriptive.

 Slide16

Relationship to sampling distributions

The probability of getting a z-score of ≥2.74 is .003.
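The z-score and its tail probability can be reproduced with the standard library alone. This is a sketch under the assumption that the slide's .003 is the upper-tail (one-tail) area of the standard normal distribution; the function name `z_test_one_sample` is made up for this example.

```python
import math

def z_test_one_sample(sample_mean, pop_mean, pop_sd, n):
    """Return (z, one_tailed_p) for a one-sample z-test of a mean."""
    standard_error = pop_sd / math.sqrt(n)   # SD of the sampling distribution
    z = (sample_mean - pop_mean) / standard_error
    # Upper-tail area of the standard normal: P(Z >= z)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_upper

z, p = z_test_one_sample(sample_mean=3.50, pop_mean=3.00, pop_sd=1.00, n=30)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.74, p = 0.003
```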

 Slide17

How NHSTs work

Is .003 a “small” probability?

Because the distribution of sample means is continuous, we create an arbitrary point along this continuum for denoting what is “small” and what is “large.”

By convention in psychology, if the probability of observing a sample mean at least this extreme (assuming the null hypothesis is true) is less than 5%, researchers reject the null hypothesis.

Slide18

Rules of the NHST Game

When p < .05, a result is said to be “statistically significant.”

In short, when a result is statistically significant (p < .05), we conclude that the difference we observed was unlikely to be due to sampling error alone. We “reject the null hypothesis.”

If the statistic is not statistically significant (p > .05), we conclude that sampling error is a plausible interpretation of the results. We “fail to reject the null hypothesis.”

Slide19

Binary Yes vs. No criteria

NHSTs were developed for the purpose of making yes/no decisions about the null hypothesis.

As a consequence, the null is either rejected or not, based on the p-value.

Strictly speaking, NHSTs do not test the research hypothesis per se; only the null hypothesis is tested.

Slide20

Different significance tests

The previous example was a z-test of a sample mean (≠ z-score of a sample).

Significance tests have been developed for:
difference between two group means: t-test
difference between two or more group means: ANOVA
differences between proportions: chi-square

Slide21

What does statistical significance mean?

The term “significant” does not mean important, substantial, or worthwhile.

Showing that Facebook postings affect your mood with p = .001 and N > 1,000,000 says nothing about how important the effect is.

More about this in Week 14.

Slide22

Inferential Errors and NHST

A yes/no decision about whether the null hypothesis is a viable explanation can lead to mistakes.
What sort of mistakes?

Slide23

Inferential Errors and NHST

                                    Real world (population)
Conclusion of the test (sample)     Null is true                     Null is false
Null is true                        Correct decision                 Type II error (false negative)
Null is false                       Type I error (false positive)    Correct decision

Slide24

NHST thinking applied to the real world

                                    Real world
Conclusion of the test              Null is true (truly not guilty)  Null is false (truly guilty)
Null is true (acquittal)            Correct decision                 Type II error (false negative)
Null is false (conviction)          Type I error (false positive)    Correct decision

Slide25

Or simply…

Slide26

Errors in Inference using NHST

The probability of making a Type I error is determined by the experimenter. It is often called the alpha value and is usually set to 5%.
This determines how conservative we want to be.

The probability of making a Type II error is also influenced by the experimenter’s choices. It is often called the beta value (more in Week 12 on Power & Effect Size).

Slide27
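The meaning of the alpha value on Slide 26 can be illustrated by simulation: when the null hypothesis is true, a test at the 5% level should reject in roughly 5% of studies. The sketch below assumes normally distributed well-being scores with the mean and SD from Today’s Question and a two-tail z-test; the number of simulated studies, the seed, and the function name are arbitrary choices.

```python
import math
import random

def type_i_error_rate(n_studies=20_000, n=30, mu=3.0, sigma=1.0,
                      z_crit=1.96, seed=1):
    """Proportion of studies that reject a TRUE null with a two-tail z-test."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(n_studies):
        # The null is true here: every sample really comes from N(mu, sigma).
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / se
        if abs(z) > z_crit:      # |z| > 1.96 corresponds to p < .05, two-tail
            rejections += 1      # a false positive (Type I error)
    return rejections / n_studies

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")  # should land near .05
```

Raising `z_crit` (a stricter alpha) lowers the false-positive rate, which is what "how conservative we want to be" refers to.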

One-tail or two-tail tests?

Previously (two-tail):
H0: μx̄ = μ
H1: μx̄ ≠ μ

We could also have H1 as (one-tail, directional):
H1: μx̄ < μ
H1: μx̄ > μ

Often in psychology, we use two-tail tests.

Slide28
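To make the one-tail versus two-tail distinction from Slide 27 concrete, the sketch below computes all three p-values for the same observed z statistic. It assumes a z-test (standard normal reference distribution); the helper names `upper_tail` and `p_values` are invented for this example.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_values(z):
    """One-tail and two-tail p-values for an observed z statistic."""
    return {
        "one-tail, H1: mean > mu": upper_tail(z),
        "one-tail, H1: mean < mu": upper_tail(-z),
        "two-tail, H1: mean != mu": 2 * upper_tail(abs(z)),
    }

# z = 2.74 is the value from the salary bonus example.
for label, p in p_values(2.74).items():
    print(f"{label}: p = {p:.4f}")
```

The two-tail p-value is twice the smaller one-tail p-value, and a one-tail test pointed in the wrong direction gives a p-value close to 1 no matter how far the sample mean is from the null value.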

Problem with one-tail tests

Before collecting data
Null: μx̄ = 30
Alternative: μx̄ < 30

After collecting data, you found:
Case 1: μx̄ = 50, p = .0001
Case 2: μx̄ = 26, p = .04

You must reject H0 in Case 1, but you’re forced to conclude that 50 > 30?! (The mean lies in the direction grossly opposite to your alternative hypothesis.)

Slide29

Problem with two-tail tests

Before collecting data
Null: μx̄ = 30
Alternative: μx̄ ≠ 30

After collecting data, you found:
Case 1: μx̄ = 26, p = .04 (reject null)
Case 2: μx̄ = 27, p = .06 (do not reject null)

Two-tail tests can be too conservative.

Slide30

Which should you choose?

The debate can continue forever.
Most psychologists would choose two-tail tests.
Some psychologists choose Bayesian statistics (not in SRM I and II).
What does your theory actually predict?

Slide31

Five steps to NHST

1. State the null and alternative hypotheses
2. Choose the type of statistical test
3. Select the significance level (usually 5%), and the tail of the test
4. Derive the sample statistic (z, t, F, r, B, etc.)
5. Report results

Slide32

State the appropriate H0 and H1 for the following studies

Researchers want to test whether there is a difference in spatial ability between left- and right-handed people.
Researchers want to test whether nurses who work 8-hour shifts deliver higher-quality work than those who work 12-hour shifts.
A psychologist predicted that the number of advertisements shown increases the sales of a product geometrically.

Slide33

Back to “Today’s Question”

“We want to know whether salary bonuses increase people’s psychological well-being. The average well-being of Delhi’s residents is 3.00 (SD = 1.00). We randomly sampled a group of 30 employees and gave them a salary bonus. Months later, we measured their well-being. The average well-being in this sample is 3.50.”

We derived this solution earlier: z = (3.50 − 3.00) / (1.00 / √30) = 2.74, p = .003

 Slide34

The problem

Often the population variance is unknown (Seminar 5).

“The average well-being of Delhi’s residents is 3.00 (SD = 1.00).”

What do we do?

Slide35

One-sample t-test

z = (M − μ) / (σ / √N)   vs.   t = (M − μ) / (s / √N)

t distributions approximate z distributions as N → ∞.

df stands for “degrees of freedom”: the number of scores that are free to vary.
For a one-sample t-test, df = n − 1.

Slide36

An example using one-sample t-test

Question: Do Ashoka students spend ₹200 a day on food on average?

Suppose we sampled daily food expenditure among 100 students, and found M = ₹220; SD = ₹20.

t = (M − μ) / (SD / √N) = (220 − 200) / (20 / √100) = 10.00, df = 100 − 1 = 99
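The t statistic for the food expenditure example can be computed by hand or with a few lines of Python. This is a minimal sketch: the function name `one_sample_t` is made up, and the critical value 1.984 (two-tail, 5% level, df = 99) is an assumption read from a standard t table rather than something given on the slide.

```python
import math

def one_sample_t(sample_mean, hypothesised_mean, sample_sd, n):
    """Return (t, df) for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)  # estimated from the sample SD
    t = (sample_mean - hypothesised_mean) / standard_error
    return t, n - 1

t, df = one_sample_t(sample_mean=220, hypothesised_mean=200, sample_sd=20, n=100)
print(f"t({df}) = {t:.2f}")  # t(99) = 10.00

# Assumption: 1.984 is the two-tail critical t for df = 99 at the 5% level,
# as read from a standard t table; check it against your own table or software.
T_CRIT_DF99 = 1.984
print("reject H0" if abs(t) > T_CRIT_DF99 else "fail to reject H0")  # reject H0
```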

 

To find the p-value:
Check out the t-distribution table (p. 543)
Google “t-test calculator” and enter the t value
Use software, e.g., JASP, SPSS, R

Slide37

t-test family

The previous example was a one-sample t-test.
Very seldom used in psychology
Very useful in quality control, e.g., “Does this batch of batteries meet ISO6001 standards?”

Next week:
Independent samples
Dependent samples

Slide38

An alternative to NHST: Bayesian

Problems with NHST
The significance level is arbitrary
It doesn’t test the research hypothesis directly
Tendency to “accept” or “reject” hypotheses blindly

Bayesian statistics (Google it; not in SRM I or II)
Bayes factors represent the weight of evidence in the data for competing hypotheses
Easily implemented in JASP
Has its own problems too

Slide39

Summary

Appreciate randomness in your data.
NHST results in binary outcomes; sometimes this is useful, other times not.
The z-test is useful for understanding statistical inference, but often useless for answering practical questions, for which t-tests are better suited.
Next week we cover different types of t-tests.

Slide40

Announcement

9 Nov has been declared a university holiday.
Course syllabus has been rearranged.

Deadline for research project has been pushed back.