Introduction to hypothesis testing + Probability
Seminar 6
A difficult mock question for mid-term
Plot the following graph. People who use Facebook only (but not Twitter) are generally happier than those who use Twitter only (but not Facebook). This tendency is weakest among new subscribers. However, after a certain number of years, there is a declining trend for both groups, with a sharper decline among Twitter users.
Today’s Question
We want to know whether salary bonuses increase people’s psychological well-being. The average well-being of Delhi’s residents is 3.00 (SD = 1.00). We randomly sampled a group of 30 employees and gave them a salary bonus. Months later, we measured their well-being. The average well-being in this sample is 3.50.
Two possibilities
[Figure: colored distribution = original population; grey distribution = another population]
- The sample mean of 3.50 was drawn from the original population.
- The sample mean of 3.50 was drawn from another population.
The real problem: Randomness
Sampling error: every sample is likely to have different statistics (e.g., M and SD), even when all samples are drawn from the same population. Ten samples (A–J) generated with Excel’s =RANDBETWEEN(1,100):
| | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| M | 55.6 | 34.2 | 47.2 | 52.0 | 64.2 | 33.1 | 62.0 | 59.5 | 49.2 | 52.5 |
| SD | 30.1 | 22.2 | 34.2 | 36.1 | 29.2 | 28.4 | 33.7 | 34.3 | 33.9 | 24.6 |
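To see sampling error for yourself, here is a minimal Python sketch of the same exercise. It assumes each column holds 10 scores; the original number of scores per column is not recoverable from the slide, so `sample_size=10` is a hypothetical choice. `random.randint` plays the role of `=RANDBETWEEN(1,100)` (both include the endpoints), and `statistics.stdev` gives the sample (N − 1) SD, which may differ from the SD formula used on the slide.

```python
import random
import statistics

def simulate_samples(n_samples=10, sample_size=10, low=1, high=100, seed=None):
    """Draw several samples of random integers (like Excel's RANDBETWEEN)
    and return (mean, SD) for each sample. sample_size=10 is an assumption."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        # randint includes both endpoints, just like RANDBETWEEN(1,100)
        sample = [rng.randint(low, high) for _ in range(sample_size)]
        results.append((statistics.mean(sample), statistics.stdev(sample)))
    return results

for label, (m, sd) in zip("ABCDEFGHIJ", simulate_samples(seed=1)):
    print(f"{label}: M = {m:5.1f}, SD = {sd:5.1f}")
```

Every column comes from the same population of integers 1 to 100, yet the means and SDs differ from column to column; changing the seed changes them again.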
Big Question
Is the .50 difference between the salary bonus group and Delhi residents in general a result of the bonus, or simply an “accident” of sampling error (“randomness”)?
Two hypotheses are implied
Null hypothesis
- The sample comes from a population in which the mean is 3.00.
- The difference we observed is due to sampling error.
Alternative hypothesis (often called the “research hypothesis”)
- The sample does not come from a population in which the mean is 3.00.
- The difference is due to the salary bonus.
Mathematically…
Null hypothesis: $H_0: \mu_{\bar{x}} = 3.00$
Alternative hypothesis: $H_1: \mu_{\bar{x}} \neq 3.00$
Note that the two hypotheses are mutually exclusive.
How can we determine which hypothesis is more likely to be true?
The most popular tools: Null Hypothesis Significance Tests (NHSTs).
Significance tests are quantitative techniques for evaluating the probability of observing data at least as extreme as ours, assuming that the null hypothesis is true.
This information is used to make a binary (yes/no) decision about whether the null hypothesis is a viable explanation for the study results.
The NHST at its core
Two statistical datasets, A and B, are compared. Each dataset has its own statistics (e.g., M and SD).
The question is: is A = B? (the null hypothesis)
Often, we want to prove that A ≠ B (the alternative hypothesis) by disproving A = B.
If A = B, then A − B = 0 (that’s where the “null” comes from).
NHST & philosophy
We cannot prove that something is true; we can only prove that something is false.
“All swans are white”
“Innocent until proven guilty”
Inferential statistics are probabilistic.
Not: “My hypothesis is true.”
But: “How likely is it that my hypothesis is true?”
Basic probability
In a bag of 100 balls, 5 are red and 95 are blue.
Two basic rules of probability
The bag now has 5 red, 10 green, and 85 blue balls.
What is the probability that I will draw either a red or a green ball?
Additive rule (mutually exclusive outcomes): $P(A \text{ or } B) = P(A) + P(B)$
If I draw two balls (with replacement), what is the probability that I will draw a red and a green ball?
Multiplicative rule (independent events): $P(A \text{ and } B) = P(A) \times P(B)$
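Both rules can be checked with exact fractions. This Python sketch uses the bag from the slide (5 red, 10 green, 85 blue) and assumes that “a red and a green ball” means red on the first draw and green on the second; the last calculation shows what happens if either order counts.

```python
from fractions import Fraction

# Bag contents from the slide: 5 red, 10 green, 85 blue (100 balls in total)
bag = {"red": 5, "green": 10, "blue": 85}
total = sum(bag.values())

def p(color):
    """Probability of drawing one ball of the given color."""
    return Fraction(bag[color], total)

# Additive rule (mutually exclusive outcomes): P(red or green) = P(red) + P(green)
p_red_or_green = p("red") + p("green")

# Multiplicative rule (independent draws, i.e., with replacement):
# P(red on draw 1 and green on draw 2) = P(red) * P(green)
p_red_then_green = p("red") * p("green")

# If order does not matter, "red then green" and "green then red" are two
# mutually exclusive sequences, so the additive rule adds them together.
p_red_and_green_any_order = 2 * p_red_then_green

print(float(p_red_or_green))             # 0.15
print(float(p_red_then_green))           # 0.005
print(float(p_red_and_green_any_order))  # 0.01
```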
Relationship to sampling distributions
Recall: the sampling distribution of the mean is the distribution of sample means from repeated random samples of the same size.
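A sampling distribution can be built by brute force. The sketch below assumes a normally distributed population with the values from Today’s Question (μ = 3.00, σ = 1.00) and samples of N = 30; the normal population and the 5,000 repetitions are illustrative assumptions. The SD of the simulated means should come out close to σ/√N ≈ 0.183, the standard error.

```python
import random
import statistics

def sampling_distribution_of_mean(mu=3.00, sigma=1.00, n=30, reps=5000, seed=None):
    """Draw `reps` random samples of size n from a normal population
    (an assumption) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

means = sampling_distribution_of_mean(seed=42)
print(round(statistics.mean(means), 2))   # close to mu = 3.00
print(round(statistics.stdev(means), 3))  # close to sigma / sqrt(n), about 0.183
```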
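How extreme is your sample mean of 3.50?
We calculate a z-score for the sample mean:
$z = \frac{M - \mu}{\sigma_M}$, where $\sigma_M = \frac{\sigma}{\sqrt{N}}$, so $z = \frac{3.50 - 3.00}{1.00/\sqrt{30}} \approx 2.74$
Note: this is different from the z-score of an individual score, $z = \frac{X - M}{SD}$.
One is inferential, one is descriptive.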
The probability of getting a z-score of ≥2.74 is .003.
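The z-score and its tail probability can be reproduced with Python’s standard library (`statistics.NormalDist`, Python 3.8+). This is a sketch of a one-sample z-test that assumes σ = 1.00 is the known population SD. The two-tailed value is included because H1 on the earlier slide (μ ≠ 3.00) is non-directional.

```python
import math
from statistics import NormalDist

def z_test_one_sample(sample_mean, mu, sigma, n):
    """z-test of a sample mean when the population SD (sigma) is known."""
    standard_error = sigma / math.sqrt(n)            # sigma_M = sigma / sqrt(N)
    z = (sample_mean - mu) / standard_error
    p_upper = 1 - NormalDist().cdf(z)                # P(Z >= z), one tail
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_upper, p_two_tailed

z, p_one, p_two = z_test_one_sample(sample_mean=3.50, mu=3.00, sigma=1.00, n=30)
print(f"z = {z:.2f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
# z = 2.74, one-tailed p = 0.003, two-tailed p = 0.006
```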
How NHSTs work
Is .003 a “small” probability?
Because the distribution of sample means is continuous, we set an arbitrary cut-off point along this continuum to separate “small” from “large” probabilities.
By convention in psychology, if the probability of observing a sample mean at least this extreme (assuming the null hypothesis is true) is less than 5%, researchers reject the null hypothesis.
Rules of the NHST Game
When p < .05, a result is said to be “statistically significant.”
In short, when a result is statistically significant (p < .05), we conclude that the difference we observed was unlikely to be due to sampling error alone. We “reject the null hypothesis.”
If the result is not statistically significant (p ≥ .05), we conclude that sampling error is a plausible interpretation of the results. We “fail to reject the null hypothesis.”
Binary Yes vs. No criteria
NHSTs were developed for the purpose of making yes/no decisions about the null hypothesis.
As a consequence, the null is either rejected or not, based on the p-value.
Strictly speaking, NHSTs do not test the research hypothesis per se; only the null hypothesis is tested.
Different significance tests
The previous example was a z-test of a sample mean (not the same as the z-score of an individual score).
Significance tests have been developed for:
- differences between two group means: t-test
- differences among two or more group means: ANOVA
- differences between proportions: chi-square
What does statistical significance mean?
The term “significant” does not mean important, substantial, or worthwhile.
Showing that Facebook posts affect your mood at p = .001 with N > 1,000,000 says nothing about how large or important the effect is.
More about this in Week 14.
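One way to see why p says nothing about importance is to hold a tiny effect fixed and let N grow. The numbers below are hypothetical (a mean shift of 0.005 SD, tested with a two-tailed one-sample z-test), not results from any Facebook study.

```python
import math
from statistics import NormalDist

def z_test_p(mean_difference, sigma, n):
    """Two-tailed p-value of a one-sample z-test for a given mean difference."""
    z = mean_difference / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: mood shifts by 0.005 SD units, a negligible effect
sigma = 1.0
mean_difference = 0.005
for n in (100, 10_000, 1_000_000):
    p = z_test_p(mean_difference, sigma, n)
    print(f"N = {n:>9,}: effect = {mean_difference / sigma} SD, p = {p:.3g}")
```

The p-value falls from roughly .96 at N = 100 to far below .001 at N = 1,000,000, while the effect itself never changes.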
Inferential Errors and NHST
A yes/no decision about whether the null hypothesis is a viable explanation can lead to mistakes. What sort of mistakes?
| Conclusion of the test (sample) | Real world (population): Null is true | Real world (population): Null is false |
|---|---|---|
| Null is true | Correct decision | Type II error (false negative) |
| Null is false | Type I error (false positive) | Correct decision |
NHST thinking applied to the real world
| Conclusion of the test | Real world: Null is true (truly not guilty) | Real world: Null is false (truly guilty) |
|---|---|---|
| Null is true (acquittal) | Correct decision | Type II error (false negative) |
| Null is false (conviction) | Type I error (false positive) | Correct decision |
Or simply…
Errors in Inference using NHST
The probability of making a Type I error is set by the experimenter. It is often called the alpha (α) value and is usually set to 5%. This determines how conservative we want to be.
The probability of making a Type II error is often called the beta (β) value. The experimenter can influence it (e.g., through sample size and the choice of α), but it also depends on the true size of the effect (more in Week 12 on Power & Effect Size).
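The claim that α sets the Type I error rate can be checked by simulation. This sketch assumes a world in which the null hypothesis is true (population mean 3.00, σ = 1.00, N = 30), runs many two-tailed z-tests at α = .05, and counts false positives; the rejection rate should hover around .05.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, mu=3.00, sigma=1.00, n=30, reps=4000, seed=None):
    """Simulate studies in which H0 is TRUE (population mean = mu) and return
    the proportion of two-tailed z-tests that wrongly reject it."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs((sample_mean - mu) / se) > z_crit:
            rejections += 1                        # a false positive (Type I error)
    return rejections / reps

print(type_i_error_rate(seed=7))  # close to 0.05
```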
One-tail or two-tail tests?
Previously: $H_0: \mu_{\bar{x}} = \mu$, $H_1: \mu_{\bar{x}} \neq \mu$
We could also have a directional H1: $H_1: \mu_{\bar{x}} < \mu$ or $H_1: \mu_{\bar{x}} > \mu$
Often in psychology, we use two-tail tests.
[Figure: One-tail (directional) | Two-tail]
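One practical difference between the two versions is the cut-off. Under the standard normal (z) distribution with α = .05, a one-tail test puts all 5% in one tail, whereas a two-tail test puts 2.5% in each. A sketch using `statistics.NormalDist.inv_cdf`:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, SD 1

# One-tail: all of alpha sits in one tail of the z distribution
z_crit_one_tail = std_normal.inv_cdf(1 - alpha)      # about 1.645
# Two-tail: alpha is split equally between the two tails
z_crit_two_tail = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960

print(f"one-tail critical z = {z_crit_one_tail:.3f}")
print(f"two-tail critical z = +/-{z_crit_two_tail:.3f}")
```

For H1: μx̄ < μ, the one-tail cut-off is the same value with a negative sign (about −1.645).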
Problem with one-tail tests
Before collecting data:
Null: $\mu_{\bar{x}} = 30$
Alternative: $\mu_{\bar{x}} < 30$
After collecting data, you found:
Case 1: M = 50, p = .0001
Case 2: M = 26, p = .04
In Case 1, p = .0001 suggests rejecting H0, but the only conclusion your one-tail test allows is $\mu_{\bar{x}} < 30$, which a mean of 50 plainly contradicts. The mean is grossly opposite to your alternative hypothesis, and the test gives you no way to conclude that 50 > 30.
Problem with two-tail tests
Before collecting data:
Null: $\mu_{\bar{x}} = 30$
Alternative: $\mu_{\bar{x}} \neq 30$
After collecting data, you found:
Case 1: M = 26, p = .04 → reject null
Case 2: M = 27, p = .06 → do not reject null
Two-tail tests can be too conservative.
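For a symmetric test statistic (z or t), a two-tailed p can be converted into a one-tailed p, which makes the two problem slides concrete. This is a sketch under two assumptions: the direction was specified before seeing the data, and the p-values on the slides are two-tailed (the slides do not say which). The function name `one_tailed_p` is hypothetical.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction):
    """Convert a two-tailed p-value from a symmetric test (z or t) into a
    one-tailed p-value for a directional H1 chosen BEFORE seeing the data."""
    if in_predicted_direction:
        return two_tailed_p / 2        # only the predicted tail counts
    return 1 - two_tailed_p / 2        # the result lies in the "wrong" tail

# Two-tail slide, Case 2 (M = 27, p = .06), had H1 been mu < 30:
print(round(one_tailed_p(0.06, in_predicted_direction=True), 2))    # 0.03
# One-tail slide, Case 1 (M = 50 when H1 was mu < 30), treating .0001 as two-tailed:
print(round(one_tailed_p(0.0001, in_predicted_direction=False), 5))  # 0.99995
```

With a directional H1, Case 2 of the two-tail slide would become significant, while Case 1 of the one-tail slide could never be.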
Which should you choose?
The debate can continue forever.
Most psychologists would choose two-tail tests.
Some psychologists choose Bayesian statistics (not in SRM I and II).
What does your theory actually predict?
Five steps to NHST
1. State the null and alternative hypotheses.
2. Choose the type of statistical test.
3. Select the significance level (usually 5%) and the tail of the test.
4. Derive the sample statistic (z, t, F, r, B, etc.).
5. Report the results.
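The five steps can be strung together for the one-sample z-test used earlier. This is a minimal sketch, not a general-purpose testing tool: the function name `run_z_test` and its `tail` argument are hypothetical, and it only covers the case where the population SD is known.

```python
import math
from statistics import NormalDist

def run_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Five NHST steps for a one-sample z-test (population SD known).
    tail: "two" (H1: mu != mu0), "greater" (H1: mu > mu0), "less" (H1: mu < mu0)."""
    # Step 1: state the null and alternative hypotheses
    h1_sign = {"two": "!=", "greater": ">", "less": "<"}[tail]
    print(f"H0: mu = {mu0:.2f}   H1: mu {h1_sign} {mu0:.2f}")

    # Step 2: type of test -> one-sample z-test, because sigma is known
    # Step 3: significance level and tail come from the alpha and tail arguments

    # Step 4: derive the sample statistic and its p-value
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf
    if tail == "two":
        p = 2 * (1 - cdf(abs(z)))
    elif tail == "greater":
        p = 1 - cdf(z)
    else:  # tail == "less"
        p = cdf(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"

    # Step 5: report the results
    print(f"z = {z:.2f}, p = {p:.3f} (tail = {tail}, alpha = {alpha}) -> {decision}")
    return z, p, decision

run_z_test(sample_mean=3.50, mu0=3.00, sigma=1.00, n=30)
```

For Today’s Question this reports z = 2.74, p = 0.006 (two-tailed) and rejects H0; running it with `tail="less"` shows how a wrongly chosen direction turns the same data into a non-significant result.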
State the appropriate H0 and H1 for the following studies
1. Researchers want to test whether there is a difference in spatial ability between left- and right-handed people.
2. Researchers want to test whether nurses who work 8-hour shifts deliver higher-quality work than those who work 12-hour shifts.
3. A psychologist predicted that the number of advertisements shown increases the sales of a product geometrically.
Back to “Today’s Question”
“We want to know whether salary bonuses increase people’s psychological well-being. The average well-being of Delhi’s residents is 3.00 (SD = 1.00). We randomly sampled a group of 30 employees and gave them a salary bonus. Months later, we measured their well-being. The average well-being in this sample is 3.50.”
We derived this solution earlier: $z = \frac{3.50 - 3.00}{1.00/\sqrt{30}} \approx 2.74$, and the probability of getting a z-score of ≥2.74 is .003.
The problem
Often the population variance is unknown (Seminar 5).
“The average well-being of Delhi’s residents is 3.00 (SD = 1.00).”
What do we do?
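One-sample t-test
$t = \frac{M - \mu}{s_M}$, where $s_M = \frac{s}{\sqrt{N}}$ and s is the sample SD
vs.
$z = \frac{M - \mu}{\sigma_M}$, where $\sigma_M = \frac{\sigma}{\sqrt{N}}$
t distributions approximate z distributions as N → ∞.
df stands for “degrees of freedom”: the number of scores that are free to vary. For a one-sample t-test, df = N − 1.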
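The idea that only N − 1 scores are “free to vary” can be made concrete: once the sample mean is fixed, the last score is determined by the others. The function name and the five scores below are hypothetical.

```python
def last_score_given_mean(free_scores, mean, n):
    """Once the mean of n scores is fixed, only n - 1 of them can vary freely:
    the final score is forced to whatever makes the total equal n * mean."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores are free to vary")
    return n * mean - sum(free_scores)

# Hypothetical example: 5 scores with a mean of 3.0, so df = 5 - 1 = 4
free = [2, 4, 3, 5]
print(last_score_given_mean(free, mean=3.0, n=5))  # 1.0
```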
An example using a one-sample t-test
Question: Do Ashoka students spend ₹200 a day on food on average?
Suppose we sampled daily food expenditure among 100 students and found M = ₹220, SD = ₹20.
$t = \frac{220 - 200}{20/\sqrt{100}} = \frac{20}{2} = 10$, df = 100 − 1 = 99
To find the p-value:
- Check out the t-distribution table (p. 543).
- Google “t-test calculator” and enter the t value.
- Use software, e.g., JASP, SPSS, R.
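For readers who prefer code to a table, here is a standard-library-only Python sketch of the same one-sample t-test from summary statistics. It computes the two-tailed p-value by numerically integrating the t density with Simpson’s rule; the helper names are hypothetical, and in practice you would use JASP, SPSS, R, or a statistics library instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Integrate the density from 0 to |t| with Simpson's rule (steps must be
    even); the t distribution is symmetric, so p = 1 - 2 * area."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)  # clamp tiny negative values from rounding

def one_sample_t(mean, sd, n, mu0):
    """One-sample t-test from summary statistics: returns (t, df, two-tailed p)."""
    standard_error = sd / math.sqrt(n)  # s_M = s / sqrt(N)
    t = (mean - mu0) / standard_error
    df = n - 1
    return t, df, t_two_tailed_p(t, df)

t, df, p = one_sample_t(mean=220, sd=20, n=100, mu0=200)
print(f"t({df}) = {t:.2f}, p = {p:.3g}")
```

This prints t(99) = 10.00 with a p-value that is effectively zero; the numerical integration cannot resolve probabilities that small, but the result is clearly far below .05.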
t-test family
The previous example was a one-sample t-test.
- Very seldom used in psychology.
- Very useful in quality control, e.g., “Does this batch of batteries meet ISO6001 standards?”
Next week:
- Independent samples t-test
- Dependent samples t-test
An alternative to NHST: Bayesian
Problems with NHST:
- The significance level is arbitrary.
- It doesn’t test the research hypothesis directly.
- There is a tendency to “accept” or “reject” hypotheses blindly.
Bayesian statistics (Google it; not in SRM I or II):
- Bayes factors represent the weight of evidence in the data for competing hypotheses.
- Easily implemented in JASP.
- Has its own problems too.
Summary
- Appreciate randomness in your data.
- NHST results in binary outcomes; sometimes this is useful, other times not.
- The z-test is useful for understanding statistical inference, but it is often useless for answering practical questions, for which t-tests are better suited.
- Next week we cover different types of t-tests.
Announcement
- 9 Nov has been declared a university holiday.
- The course syllabus has been rearranged.
- The deadline for the research project has been pushed back.