Slide1
John Holcomb - Cleveland State University
Beth Chance, Allan Rossman, Emily Tietjen - Cal Poly State University
George Cobb - Mount Holyoke College
http://statweb.calpoly.edu/csi/
Introducing Concepts of Statistical Inference Via Randomization Tests
Slide2
Introduction
At USCOTS 2005, Cobb proposed a new introductory curriculum based on randomization methods.
Cobb (2007) “Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, instead of putting the core logic of inference at the center.”
Why? “Tyranny of the computable.”
How????
Slide3
Curriculum Goal
Use technology simulations to lead students to develop an understanding of the concepts of statistical significance and p-values.
Focus on statistical process.
Repeated exposure throughout course.
Slide4
Learning Process
Research study and data.
Tactile simulation and class discussion of results.
Simulation using a tailored applet.
Empirical p-value and discussion.
Conclusion in context.
Slide5
Classroom Activities (Example 1): Naughty or Nice?
Hamlin, Wynn, and Bloom (Nature, 2007)
Videos available at www.yale.edu/infantlab/socialevaluation/Helper-Hinderer.html
Do the experimental results (14 out of 16) provide convincing evidence that infants have a genuine preference for the helper toy rather than the result occurring by chance alone?
Inference for a binomial proportion.
Slide6
Randomization approach
How likely would such an extreme result be under the null model of no preference?
Begin with each student flipping a coin 16 times.
Combine results from the class in a dotplot.
Move to an applet to simulate tossing 16 coins for 1000 repetitions.
Determine the proportion of repetitions in which 14 or more flips were heads.
Slide7
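The coin-tossing simulation for Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not the applet used in the course; the function names `simulate_null_heads` and `empirical_p_value` and the random seed are assumptions made for the example.

```python
import random

def simulate_null_heads(n_flips=16, reps=1000, seed=1):
    """Simulate `reps` sets of `n_flips` fair-coin tosses (null model of no
    preference) and return the number of heads in each set."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(reps)]

def empirical_p_value(head_counts, observed):
    """Proportion of simulated sets with at least `observed` heads."""
    return sum(count >= observed for count in head_counts) / len(head_counts)

head_counts = simulate_null_heads()
p_hat = empirical_p_value(head_counts, observed=14)
print(f"Empirical p-value for 14 or more heads in 16 tosses: {p_hat:.3f}")
```

With only 1000 repetitions the estimate varies from run to run, which is one reason to pool class results or increase the number of repetitions.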
Example 2: Sleep Deprivation?
Stickgold, James, & Hobson (Nature Neuroscience, 2000)
21 subjects randomly assigned to one of two groups: sleep deprived or unrestricted sleep.
Both groups then allowed as much sleep as they wanted on the following two nights.
All subjects re-tested on the third day.
Slide8
Example 2: Sleep Deprivation?
Randomized experiment.
Compared mean improvement in response time to a visual stimulus on a computer screen.
Unrestricted sleep group: 19.82 ms.
Sleep deprived group: 3.90 ms.
Inference for the difference of two means from independent samples.
Slide9
Randomization approach
How likely would such an extreme result (Diff = 15.92 ms) be under the null model of no treatment effect?
21 improvement scores written on index cards.
Randomly “deal” cards to two groups and record the difference of group means.
Combine results of the class in a dotplot.
Slide10
Randomization approach
Use an applet to simulate the randomization process 1000 times and explore how often a difference in means of 15.92 ms or more extreme occurs by chance alone.
http://www.rossmanchance.com/applets
Slide11
Slide12
Slide13
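The card-dealing procedure for Example 2 can be sketched as a randomization test for a difference in means. The slides do not list the 21 improvement scores, so the two small data lists below are hypothetical values invented purely to make the sketch runnable; the function name `randomization_test` and the seed are also assumptions.

```python
import random
from statistics import mean

def randomization_test(group_a, group_b, reps=1000, seed=0):
    """Re-deal the pooled scores into two groups `reps` times and return
    (observed difference in means, proportion of re-deals at least as large)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # "deal" the cards at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:  # one-sided: as extreme or more extreme
            count += 1
    return observed, count / reps

# Hypothetical improvement scores in ms (NOT the study's data).
unrestricted = [25.0, 14.5, 30.2, 18.0, 22.1]
deprived = [5.0, -3.2, 10.1, 2.4, 7.7]

observed, p_value = randomization_test(unrestricted, deprived)
print(f"observed difference = {observed:.2f} ms, empirical p-value = {p_value:.3f}")
```

Shuffling the pooled scores mirrors the null model of no treatment effect: if the treatment does nothing, each score would have been the same regardless of which group the subject landed in.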
Research Goal
Make evidence-based curriculum decisions.
Implement and collect data from small-scale classroom experiments.
Identify issues that create difficulties for student understanding.
Formulate questions about the most pedagogically effective way to implement this approach.
Slide14
Curriculum Design Issue #1: First Activity
Should the first example that students encounter be one where the result is statistically significant or one where the result is not significant at all?
Slide15
Advantages
Students may find it easier to judge when an observed result is surprising under a null model.
Starting with a non-significant result may reinforce students’ natural inclination to regard a p-value as the probability that the null model is true.
Slide16
Classroom Experiment
Four sections of introductory statistics at Cal Poly.
Half the students were told 9 of the 16 infants chose the helper toy (non-significant result group).
Half the students were told 14 of the 16 infants chose the helper toy (significant result group).
Students were given the activity and told to work in pairs.
Slide17
Classroom Experiment
Two instructors: one randomized across sections and the other randomized by individuals.
Follow-up quiz questions the next class period.
Slide18
Question 1
When I conducted the simulation using 1,000,000 repetitions, I obtained a proportion (empirical p-value) of .402 (.002). Based on this result, which assumes the null model of no genuine preference, the actual result obtained by the researchers, 9 of 16 (14 of 16) choosing the helper, is
a) impossible
b) very surprising
c) somewhat surprising
d) not at all surprising
(Values are shown for the “9 of 16” version, with the “14 of 16” version in parentheses.)
Slide19
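The two empirical p-values quoted in Question 1 can be checked against the exact binomial tail probabilities under the null model (16 independent choices, each with probability 0.5). The helper name `binomial_upper_tail` is an assumption for this sketch.

```python
from math import comb

def binomial_upper_tail(k, n=16, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

for k in (9, 14):
    print(k, round(binomial_upper_tail(k), 3))
# prints:
# 9 0.402
# 14 0.002
```

The exact values, 26333/65536 and 137/65536, agree with the simulated proportions to three decimal places.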
Results
60.6% of students (n = 71) in the “9 of 16” group answered correctly.
77.5% of students (n = 71) in the “14 of 16” group answered correctly.
Two-sided p-value is approximately 0.030.
Interpretation: students find it easier to interpret a surprising outcome than a non-surprising one.
Slide20
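As a rough check on the reported two-sided p-value, the comparison can be rerun as a pooled two-proportion z-test. Two assumptions are made here: the counts 43 and 55 are reconstructed from the reported percentages (43/71 ≈ 60.6%, 55/71 ≈ 77.5%), and the z-test is chosen only for illustration, since the slides do not say which procedure produced 0.030.

```python
from math import erfc, sqrt

def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p2 - p1) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Counts reconstructed from the reported percentages (an assumption).
z, p_value = two_proportion_z_test(43, 71, 55, 71)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

Under these assumptions the result is close to the reported value of approximately 0.030.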
Question 2
Fill in the blanks in the following sentence to interpret this proportion from part (1).
This proportion says that in about __(1)__% of __(2)__, the researchers would get __(3)__ who choose the helper toy, assuming that __(4)__.
Slide21
Results
This proportion says that in about __(1)__% of __(2)__, the researchers would get __(3)__ who choose the helper toy, assuming that __(4)__.
(1) 40.2% vs. 0.2%: The non-significant group performed better, but this may be an artifact of the difficulty of expressing 0.002 as a percentage.
[Chart legend: 9 of 16, 14 of 16]
Slide22
Results
This proportion says that in about __(1)__% of __(2)__, the researchers would get __(3)__ who choose the helper toy, assuming that __(4)__.
(2) About 50% of each group correctly answered “1,000,000 repetitions.”
Slide23
Results
This proportion says that in about __(1)__% of __(2)__, the researchers would get __(3)__ who choose the helper toy, assuming that __(4)__.
(3) The “14 of 16” group performed better (54.2% vs. 38.9%, p-value = .066) in citing the observed result or more extreme.
Slide24
Results
This proportion says that in about __(1)__% of __(2)__, the researchers would get __(3)__ who choose the helper toy, assuming that __(4)__.
(4) About 76% of each group correctly answered “assuming no preference.”
Slide25
Interpretation
Although students in both groups seem to understand the null model equally well, they differed slightly in recognizing what the simulation told them.
Slide26
Question 3
Based on your answer to (1) and (2), which of the following would you consider the most appropriate conclusion from this study? (choose one)
(a) These 16 infants have no genuine preference and therefore there’s no reason to doubt that the researchers’ result is different from .5 just by random chance.
(b) The researchers’ results would be very surprising if there was no genuine preference for the helper and therefore I believe there is a preference.
(c) There is a large (small) chance that there is a genuine preference for the helper.
Slide27
Results
Approximately 77% in each group chose the correct answer: (a) for the “9 of 16” group and (b) for the “14 of 16” group.
Slide28
Curriculum Design Issue #2: Tactile Simulations
We have suggested beginning each simulation with a tactile version before turning to technology, but does the tactile aspect really add value to the students’ learning experience?
Slide29
Should we do tactile first?
Potential advantages:
Students are in a better position to understand what the technology simulation is doing if they have first performed a tactile simulation themselves.
We use applets that mirror the hands-on activity as closely as possible so the technology is not a “black box.”
Potential disadvantage: tactile simulations take valuable class time.
Slide30
Classroom Experiment
Randomly assigned 43 students to two treatment groups.
Class topic was investigating the sampling distribution of a single proportion.
Slide31
Classroom Experiment
Tactile group (20 students): each student determined the sample proportion of orange candies among 25 actual Reese’s Pieces.
Students created a class dotplot of sample proportions.
Then turned to an applet for simulation of many samples of size 25.
Slide32
Classroom Experiment
Non-tactile group (23 students) moved immediately to the applet simulation, without working with Reese’s Pieces or creating a pooled dotplot.
Slide33
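The applet step that both groups performed, drawing many samples of size 25 and recording each sample proportion, can be sketched as follows. The slides do not give the true proportion of orange candies, so `ASSUMED_ORANGE_PROPORTION = 0.5` is a hypothetical value chosen only for illustration, as are the function name and seed.

```python
import random
from statistics import mean, stdev

def simulate_sample_proportions(true_proportion, sample_size=25, reps=1000, seed=2):
    """Return the sample proportion of 'orange' in each of `reps` random samples."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_proportion for _ in range(sample_size)) / sample_size
        for _ in range(reps)
    ]

ASSUMED_ORANGE_PROPORTION = 0.5  # hypothetical; not stated in the slides

proportions = simulate_sample_proportions(ASSUMED_ORANGE_PROPORTION)
print(f"mean of sample proportions: {mean(proportions):.3f}")
print(f"SD of sample proportions:   {stdev(proportions):.3f}")
```

The list `proportions` plays the role of the class dotplot: each entry corresponds to one student's handful of 25 candies.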
Results
Students in both groups were given a quiz the next day.
Five questions that involved a single sample proportion.
An independent, blinded statistics instructor graded the quizzes.
No statistically significant differences were found.
Slide34
Results
An interesting outcome was that both groups seemed to finish the activity in about the same amount of time.
This may suggest that the tactile aspect does not take more time and does not hinder learning.
Slide35
Results
In a follow-up questionnaire given to a different class, approximately 50% of the students indicated the tactile component of the activity was helpful.
Slide36
Curriculum Design Issue #3
Should the first activity that students encounter focus on inference for a single proportion, as in the “Naughty or Nice” example, or on a comparison of two groups, as in the “Sleep Deprivation” example?
Slide37
Curriculum Design Issue #4
In the case of a simulation involving a single proportion, as with the “Naughty or Nice” example, how should the tactile simulation be conducted?
16 students each toss one coin.
Each student tosses 1 coin 16 times.
Each student tosses 16 coins.
Slide38
Curriculum Design Issue #5
In the case of a randomization test for a 2×2 table, what statistic should the students calculate in their simulations?
Difference in conditional proportions of success.
Ratio of conditional proportions (relative risk).
Number of successes in group A.
Slide39
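The three candidate statistics for a 2×2 table can be computed side by side. The counts below are hypothetical and the function name `two_by_two_statistics` is an assumption; the sketch only shows how the statistics relate to one another.

```python
def two_by_two_statistics(successes_a, n_a, successes_b, n_b):
    """Compute three candidate test statistics from a 2x2 table.
    Assumes group B has at least one success, so the ratio is defined."""
    prop_a = successes_a / n_a
    prop_b = successes_b / n_b
    return {
        "difference_in_proportions": prop_a - prop_b,
        "relative_risk": prop_a / prop_b,
        "successes_in_group_a": successes_a,
    }

# Hypothetical table: 12 of 20 successes in group A, 6 of 20 in group B.
summary = two_by_two_statistics(12, 20, 6, 20)
for name, value in summary.items():
    print(f"{name}: {value}")
```

When group sizes and the total number of successes are held fixed, as they are when group labels are re-randomized, the difference and the ratio both increase with the number of successes in group A. A one-sided randomization test therefore gives the same empirical p-value for all three, so the choice is mainly about which statistic students find most interpretable.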
Curriculum Design Issue #6
How much of the work should the technology do automatically (calculating empirical p-values)?
Push a button (e.g., based on two-way table).
Specify observed result to count beyond.
Slide40
Curriculum Design Issue #7
Should the type of randomness used in the simulation always reflect the role of randomness used in the actual data collection process?
Randomizing (assignment) when data arise from experiments.
Bootstrapping and/or sampling from finite populations when data arise from samples.
Always randomizing group membership.
Slide41
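The difference between the two kinds of randomness can be made concrete with a small sketch. Both function names are hypothetical, and the bootstrap shown here is only one simple variant (resampling each group separately with replacement); as written it describes sampling variability rather than simulating a null model.

```python
import random

def rerandomize(group_a, group_b, rng):
    """Mimic random assignment: shuffle all values and deal them back
    into two groups of the original sizes (no value is repeated or lost)."""
    pooled = list(group_a) + list(group_b)
    rng.shuffle(pooled)
    return pooled[:len(group_a)], pooled[len(group_a):]

def bootstrap_resample(group_a, group_b, rng):
    """Mimic random sampling: draw from each group with replacement,
    so values can repeat and some may be left out."""
    return (rng.choices(group_a, k=len(group_a)),
            rng.choices(group_b, k=len(group_b)))

rng = random.Random(0)
print(rerandomize([1, 2, 3], [4, 5], rng))
print(bootstrap_resample([1, 2, 3], [4, 5], rng))
```

Re-randomizing keeps the pooled set of values fixed and changes only who is in which group, while bootstrapping changes which values appear at all.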
Summary
Feasibility of random assignment in classroom experiments.
Focused and relevant research questions.
Direct link between research and classroom practice.
Assessment instruments described in Holcomb, Chance, Rossman, and Cobb (2010, Proceedings).
Slide42
References
Cobb, George W. (2007). The Introductory Statistics Course: A Ptolemaic Curriculum? Technology Innovations in Statistics Education, 1(1). www.escholarship.org/uc/item/6hb3k0nz
Holcomb, J., Chance, B., Rossman, A., & Cobb, G. (2010). Assessing student learning about statistical inference. Proceedings of the 8th International Conference on Teaching Statistics.
Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by preverbal infants. Nature, 450, 557-559.
Stickgold, R., James, L., & Hobson, J. A. (2000). Visual discrimination learning requires post-training sleep. Nature Neuroscience, 3, 1237-1238.
http://statweb.calpoly.edu/csi
Thanks to National Science Foundation DUE/CCLI #0633349