Slide1
Ten Difference Score Myths By Jeffrey R. Edwards
Presented by
Chelsea Hutto Slide2
Difference Scores
Typically used to represent the similarity between two constructs
Widely used in studies of person-job fit, similarity between employee and organizational values, match between employee expectations and experiences, and agreement between performance ratings.
Suffer from many methodological problems Slide3
Polynomial Regression Analysis
These problems can be avoided by using PRA
Uses components of difference scores along with higher order terms to represent relationships of interest in congruence research.
Treats difference scores as statements of hypotheses to be tested empirically.
Also supported by Cafri et al. (2009). Slide4
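To illustrate why PRA treats a difference score as a hypothesis, here is a minimal sketch in Python. It assumes the congruence index is a squared difference, (X - Y)^2; the slides do not say which kind of difference score is meant. Expanding the square shows that the difference score is a quadratic equation in X and Y with fixed constraints on its coefficients, and those constraints are what a polynomial regression can test instead of assuming. All function names and numbers are illustrative assumptions.

```python
# Sketch: a squared difference score is a constrained quadratic equation.
# All names and numbers here are illustrative assumptions.

def squared_difference_model(x, y, b0, b):
    """Outcome predicted from the squared difference score (X - Y)^2."""
    return b0 + b * (x - y) ** 2

def polynomial_model(x, y, b0, b1, b2, b3, b4, b5):
    """Unconstrained quadratic: Z = b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2."""
    return b0 + b1 * x + b2 * y + b3 * x**2 + b4 * x * y + b5 * y**2

def implied_constraints(b):
    """Coefficients that b*(X - Y)^2 imposes on the quadratic equation."""
    return {"b1": 0.0, "b2": 0.0, "b3": b, "b4": -2.0 * b, "b5": b}

b0, b = 1.5, -0.4
coefs = implied_constraints(b)
points = [(-2.0, 1.0), (0.0, 0.0), (1.5, 3.0), (4.0, 4.0)]

# Largest disagreement between the two models over the test points.
max_gap = max(
    abs(squared_difference_model(x, y, b0, b) - polynomial_model(x, y, b0, **coefs))
    for x, y in points
)
print(coefs)
print(max_gap)
```

The two models agree only when b1 = b2 = 0, b3 = b5, and b4 = -2*b3. Estimating the unconstrained equation leaves those conditions open to an empirical test.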
Misconceptions Regarding Problems with DS
Myth 1: The Problem with Difference Scores is Low Reliability
Low internal consistency reliability has been viewed as the only serious problem with DS
Reliability of any measure is ultimately an empirical matter
The question is not only whether DS are reliable in an absolute sense but also whether they are more reliable than the alternatives
Even adequate reliability does not solve the other problems with DS Slide5
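Because the reliability of a difference score is an empirical matter, it can be computed from its components. The sketch below assumes the standard classical-test-theory formula for the reliability of X - Y with uncorrelated measurement errors. The function name and the example values are illustrative assumptions, not figures from the presentation.

```python
def difference_score_reliability(sd_x, sd_y, rel_x, rel_y, r_xy):
    """Reliability of the algebraic difference X - Y (classical test theory).

    sd_x, sd_y   : standard deviations of the two component measures
    rel_x, rel_y : reliabilities of the two component measures
    r_xy         : observed correlation between the components
    """
    covariance = r_xy * sd_x * sd_y
    true_variance = sd_x**2 * rel_x + sd_y**2 * rel_y - 2 * covariance
    observed_variance = sd_x**2 + sd_y**2 - 2 * covariance
    return true_variance / observed_variance

# Hypothetical components, each with reliability .80.
uncorrelated = difference_score_reliability(1.0, 1.0, 0.80, 0.80, r_xy=0.0)
correlated = difference_score_reliability(1.0, 1.0, 0.80, 0.80, r_xy=0.5)
print(uncorrelated, correlated)
```

With these assumed values the reliability falls as the components become more positively correlated, so whether a given DS is reliable depends on the data at hand.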
Myth 2: Difference Scores Provide Conservative Statistical Tests
Statistical tests based on DS are often labeled as conservative
Sometimes seen as appropriate for exploratory research
DS are also likely to invite conclusions that signify false positives, such that stats tests effectively become liberal.
Such conclusions have not been scrutinized using PRA
Conservatism usually corresponds to effect sizes that are biased downward and Type 1 error rates that are minimized at the cost of Type 2 error.
Need a balance between liberal and conservative Slide6
Alternatives to DS that are themselves problematic
Myth 3: Measures that Elicit Direct Comparisons Avoid Problems with Difference Scores
Merely shift the responsibility for creating a DS from the researcher to the respondent, who must calculate the response mentally, which invites error
A direct comparison is double-barreled: it combines two distinct concepts into a single score
Construct validity of direct comparisons is questionable Slide7
Myth 4: Categorized Comparisons Avoid Problems with DS
Creation of subgroups based on the congruence between two component measures to avoid problems with DS
Some researchers even say it could solve reliability issues
Creates illusion
Accentuates the loss of information and reduction in explained variance
Just makes things worse Slide8
Myth 5: Product Terms are Viable Substitutes for DS
Some turn to product terms tested hierarchically in multiple regression analysis as a last resort
Captures the interaction between two variables
Does not represent the effects of congruence for continuous measures Slide9
Myth 6: Hierarchical Analysis Provides Conservative Tests of DS
Some studies statistically control for component measures before estimating the effects of DS.
Characterized as conservative
Components are controlled when testing interactions using product terms
Does not yield conservative tests of DS, instead alters the relationships DS are intended to capture Slide10
Misunderstandings or misguided criticisms of PRA
Myth 7: PRA is an Exploratory, Empirically-Driven Procedure
Claimed that PR capitalizes on sample specific variance to maximize the amount of variance explained
Primary goal of PRA is to test hypotheses derived from theories of congruence
Also provides an explicit test of this hypothesis whereas using an algebraic difference score incorporates this hypothesis as an untested assumption
DS allow congruence hypotheses to evade empirical scrutiny
Lack of evidence necessary to confirm or reject hypotheses. Slide11
Myth 8: Polynomial Regression Suffers from Multicollinearity
Concerns about multicollinearity between lower-order and higher-order terms are unfounded
Myth 9: Higher-Order Terms Do Not Enhance the Understanding of Congruence
Interpretation of higher-order terms can be difficult, but such difficulties arise from attempts to interpret the coefficients on higher-order terms individually.
Can be avoided by using response surfaces as the intermediary between congruence hypotheses and PR coefficients Slide12
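As a sketch of the response surface idea, the quadratic coefficients can be combined into slopes and curvatures along two lines of interest instead of being read one at a time. The code assumes the quadratic equation Z = b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2, the congruence line Y = X, and the incongruence line Y = -X. The coefficient values are made up for illustration.

```python
def surface(x, y, b0, b1, b2, b3, b4, b5):
    """Quadratic response surface Z = b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2."""
    return b0 + b1 * x + b2 * y + b3 * x**2 + b4 * x * y + b5 * y**2

def surface_features(b1, b2, b3, b4, b5):
    """Slope and curvature of the surface along Y = X and along Y = -X."""
    return {
        "congruence_slope": b1 + b2,            # substitute Y = X
        "congruence_curvature": b3 + b4 + b5,
        "incongruence_slope": b1 - b2,          # substitute Y = -X
        "incongruence_curvature": b3 - b4 + b5,
    }

# Hypothetical coefficient estimates (illustrative only).
b = dict(b0=2.0, b1=0.2, b2=0.1, b3=-0.3, b4=0.5, b5=-0.25)
features = surface_features(b["b1"], b["b2"], b["b3"], b["b4"], b["b5"])

# Check: along Y = -X the surface is b0 + slope*X + curvature*X^2.
x = 1.7
along_line = surface(x, -x, **b)
from_features = (
    b["b0"]
    + features["incongruence_slope"] * x
    + features["incongruence_curvature"] * x**2
)
print(features)
print(along_line, from_features)
```

A congruence hypothesis can then be stated in terms of these four quantities, for example a negative curvature along the incongruence line, instead of in terms of individual coefficients.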
Myth 10: PR Eliminates the Concept of Congruence
Comes from the assumption that a DS represents a concept that is distinct from its components. It is argued that DS and their component measures are not conceptually interchangeable.
Because a DS is calculated from its components, it cannot represent a construct that is conceptually or operationally distinct from those components. Slide13
Assumptions
All can be tested empirically – so why argue?
PRA has its limitations
More comprehensive and conclusive than the information obtained from difference scores Slide14
Things I Have Learned (So Far)
by Jacob Cohen Slide15
Some Things You Learn Aren’t So
Proper sample size of 30 cases per group when comparing groups
Anything lower than 30 required specialized handling with “small sample statistics”
Versus critical-ratio approach
Can lead to only a fifty-fifty chance of getting significant results Slide16
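The fifty-fifty figure can be checked with a rough power calculation. This sketch assumes a two-group comparison with 30 cases per group, a two-sided test at alpha = .05, and a medium standardized effect of d = 0.5. The effect size is an assumption, since the slide does not state one. It uses a normal approximation to keep the code within the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power_30 = two_sample_power(d=0.5, n_per_group=30)
print(round(power_30, 2))
```

Under these assumptions the power comes out at roughly one half, which is the sense in which the "30 per group" rule leaves significance to a coin flip.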
Less is More
Should be studying few IVs and even fewer DVs
With many DVs, which results are real and which are due to chance?
As the number of IVs increases, the chances of redundancy among them with regard to criterion relevance also increase
Reporting numerical results
What does r = .12345 really mean?
Extra decimal places serve as a distraction from the meaningful leading digits Slide17
Simple Is Better
Reporting of Data and Representation
Reports usually do not make it possible for most of us, or for the consumers of our products, to actually see and understand the distribution
Need for graphic representation
Computers and Statistical packages
Loss of contact with data
Idea that knowledge of statistics isn’t necessary to use them Slide18
Compositing of Values
Beta weights vs. unit weights
Beta weights generate a higher correlation with the criterion than any other set of weights.
CATCH!
They are only guaranteed to be better than unit weights for the sample on which they were determined.
Circumstances in which beta weights are better are very rare
Unit weights are usually more practical (+1 for positively related predictors, -1 for negatively related predictors, and 0).
Work well outside of multiple regression when we have criterion data
Better on standardized scores for our purposes than those generated by program Slide19
The Fisherian Legacy
Based on principle that science proceeds only through inductive inference, which is achieved by rejecting the null hypothesis, usually at .05 level.
Misinterpretation of Yes/No decision feature
Research is frequently designed to produce decisions, although things are not always so clearly decision oriented
Null Hypothesis – any statement about a state of affairs in a population, usually the value of a parameter, frequently zero. It is called a null hypothesis because the strategy is to nullify it or because it means “nothing doing”. Slide20
The Dreaded .05 Level
Basis for decision – cut off level
Can lead to practices ranging from data fudging to massively altering data to dropping cases where there “must have been errors” Slide21
The Null Hypothesis Tests Us
Results do not tell us the truth of the null hypothesis; for that we must turn to Bayesian statistics, in which probability is not a relative frequency but a degree of belief.
What it does tell us is the probability of the data given the truth of the null
NOT THE SAME THING Slide22
p Value
Because the p value does not tell us the probability that the null is true, it cannot tell us the probability that the research hypothesis is true.
Rejection of null gives us no basis for estimating the probability that a replication of the research will again result in rejecting the null.
True meaning of statistical significance
Effect is not nil, and nothing more
Temptation Slide23
Problems with NH
If the NH is almost always false, what’s the big deal about rejecting it?
Also supported by Trafimow and Rice (2009).
If tests exceeded critical value, you could conclude that null is false, but if you fell short of that value you couldn’t conclude it was true.
Reality: Can’t conclude anything.
If null was false – had to be false to some degree Slide24
Power Analysis
Based on four parameters
Alpha significance criterion
Sample size
Population effect size
Power of the test
Made it possible to “prove” null hypotheses
By showing that it is of no more than negligible or trivial size
Must consider the magnitude of effects Slide25
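Any three of the four power-analysis parameters determine the fourth. As a minimal sketch, the function below solves for sample size per group in a two-group comparison, given alpha, the population effect size d, and the desired power. It relies on a normal approximation, so the results run slightly below exact t-test values. The function name and the effect sizes tried are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate cases per group for a two-sided, two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # significance criterion
    z_power = z.inv_cdf(power)          # desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Smaller population effects demand much larger samples.
for effect in (0.2, 0.5, 0.8):
    print(effect, n_per_group(effect))
```

Planning this way forces an explicit statement of the effect size that would matter, which is the point of considering the magnitude of effects.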
How To Use Statistics
Use of graphic and numerical analyses in ways in which we can understand them.
Plan the research
Must have credible set of specifications or discover research is not possible.
Use of effect size measures which include mean differences, correlations, and squared correlation of all kinds. All of which will lead you to a sample effect size Slide26
How To Use Statistics
After finding the sample effect size, attach a p value or, better, a confidence interval.
Most important rule – judgment of the scientist Slide27
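For a correlation, one common way to attach a confidence interval to the sample effect size is the Fisher z transformation. The sketch below assumes that method. The sample values r = .30 and n = 100 are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def correlation_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation via the Fisher z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = atanh(r)                 # r on the Fisher z scale
    half_width = z_crit / sqrt(n - 3) # standard error is 1 / sqrt(n - 3)
    return tanh(center - half_width), tanh(center + half_width)

low, high = correlation_ci(r=0.30, n=100)
print(round(low, 2), round(high, 2))
```

The interval shows both the range of plausible population values and whether zero is among them, so it carries more information than the p value alone.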
Take Home Message
A single piece of research doesn’t settle an issue once and for all. Only a successful future replication in same and different settings provides an approach to settling the issue.
.05 should not be a cliff, but a reference point along the possibility-probability continuum.
Things take time. Slide28
The Earth Is Round (p < .05)
By Jacob Cohen Slide29
Problems with Null Hypothesis
Does not tell us what we want to know
What we want to know: “Given these data, what is the probability that NH is true?”
What it really says: “Given that NH is true, what is the probability of these (or more extreme) data?” Slide30
The Permanent Illusion
Misapplication of deductive syllogistic reasoning
Invalid Bayesian interpretation
The mistaken belief that the level of significance at which the NH is rejected (.05) is the probability that the NH is correct, or at least that the NH is of low probability Slide31
Why P(D|Ho) ≠ P(Ho|D)
P(D|Ho) = the probability that the data could have arisen if Ho were true; this is what we find when Ho is tested
The real issue = P(Ho|D), the inverse probability
The probability that Ho is true given the data
Reason why we conduct statistical tests – to be able to reject Ho because of its unlikelihood Slide32
Posterior Probability
Available only through Bayes’s theorem
Have to know the probability of the NH before the experiment, the “prior” probability P(Ho)
Problem: We do not normally know this
Can be done through Bayesian statistics by positing a prior probability or a distribution of probabilities.
Extremely unreliable
Use of different prior probabilities
G.K. Huysamen (2005). Slide33
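A small worked example shows how far apart P(D|Ho) and P(Ho|D) can be, and how much the posterior depends on the prior. The sketch applies Bayes's theorem to two hypotheses, Ho and its alternative H1. The probabilities are illustrative assumptions, not values taken from the slides.

```python
def posterior_h0(prior_h0, p_data_given_h0, p_data_given_h1):
    """P(Ho|D) from Bayes's theorem for two exhaustive hypotheses."""
    numerator = prior_h0 * p_data_given_h0
    return numerator / (numerator + (1 - prior_h0) * p_data_given_h1)

# Assumed values: the data are unlikely under Ho (.03) and likely under H1 (.95),
# but Ho is very probable before the data are seen (.98).
post = posterior_h0(prior_h0=0.98, p_data_given_h0=0.03, p_data_given_h1=0.95)
print(round(post, 3))

# The same data with different priors give very different posteriors.
for prior in (0.98, 0.50, 0.10):
    print(prior, round(posterior_h0(prior, 0.03, 0.95), 3))
```

With these assumed numbers, data that would be "significant" at the .05 level still leave Ho more likely than not, because P(D|Ho) = .03 says nothing about how probable Ho was to begin with.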
Illusion of Attaining Improbability
Also known as Bayesian Id’s Wishful Thinking Error
Extremely easy to make
Made by 68 out of 70 academic psychologists studied by Oakes (1986, pp. 79-82).
Problem: Belief that after a successful rejection of Ho it is highly probable that replications will also result in rejection of Ho.
Could not be farther from the truth
Just because Ho is rejected does not mean that the theory is established.
Remember: the purpose of a scientific experiment is not to make decisions but to make adjustments to the degree of belief. Slide34
The Nil Hypothesis
The null in Ho is taken to mean nil or zero
That is, Ho is taken to state that the effect size is 0: that the population mean difference is 0, that the correlation is 0, or that the raters’ reliability is 0 (a Ho that can almost always be rejected, even with a small sample)
Criticism: its use may be valid only for true experiments involving randomization (controlled clinical trials) or when any departure from pure chance is meaningful (laboratory experiments on clairvoyance) Slide35
What To Do
Do not look for an alternative to NHST
Must understand and improve data before we can generalize from our data
Report ES through confidence intervals
Improve our measurement by reducing the unreliable and invalid parts of the variance in our measures.
Use of informed judgment when using theories Slide36
Discussion Questions
Why do you think many researchers still support NHST as it stands?
Has psychology as a field become more focused on getting significant results rather than completing the proper process of an experiment? Do you think it is more prominent in other fields?
How can we as psychologists eliminate confusion and misuse of NHST? Slide37
References
Cafri, G., van den Berg, P., & Brannick, M. T. (2009). What have the difference scores not been telling us? A critique of the use of self-ideal discrepancy in the assessment of body image and evaluation of an alternative data-analytic framework. Assessment, 17(3), 361-376.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997-1003.
Edwards, J. R. (2001). Ten difference score myths. Organizational Research Methods, 4(3), 265-287.
Huysamen, G. K. (2005). Null hypothesis significance testing: Ramifications, ruminations, and recommendations. South African Journal of Psychology, 35(1), 1-20.
Trafimow, D., & Rice, S. (2009). A test of the null hypothesis significance testing procedure correlation argument. The Journal of General Psychology, 136(3), 261-269.