Slide1
Different forms of validity and why they matter
Week 8, Psych 350 – R. Chris Fraley
Slide2
Validity
In our last class, we began to discuss some of the ways in which we can assess the quality of our measurements.
We discussed the concept of reliability (i.e., the degree to which measurements are free of random error).
Slide3
Why reliability alone is not enough
Understanding the degree to which measurements are reliable, however, is not sufficient for evaluating their quality.
Recall that test-retest estimates of reliability tend to range between 0 (low) and 1 (high).
Nice online correlation calculator: https://www.easycalculation.com/statistics/correlation.php
Slide4
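A test-retest estimate is a correlation between two administrations of the same measure. The sketch below computes one by hand. The scores and the helper name `pearson_r` are made up for illustration and are not from the lecture. It also shows that adding the same constant to every retest score leaves the correlation unchanged, which is one way to see why a high reliability estimate says nothing about systematic error.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for five people measured on two occasions.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]

test_retest_r = pearson_r(time_1, time_2)

# Shift every retest score by the same amount (a constant bias of +2).
time_2_biased = [score + 2 for score in time_2]
biased_r = pearson_r(time_1, time_2_biased)

print(round(test_retest_r, 2), round(biased_r, 2))  # the two values match
```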
Validity
In this example, the measurements appear reliable, but there is a problem.
Validity reflects the degree to which measurements are free of both random error, E, and systematic error, S.
O = T + E + S (O = observed score, T = true score)
Systematic errors reflect the influence of any non-random factor beyond what we are attempting to measure.
Slide5
Validity: Does systematic error accumulate?
Question: If we create a composite of multiple observations, how will systematic errors influence our estimates of the “true” score?
Slide6
Validity: Does error accumulate?
Answer: Unlike random errors, systematic errors accumulate.
Systematic errors exert a constant influence on measurements. We will always overestimate (or underestimate) T if systematic error is present.
Slide7
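A minimal simulation of O = T + E + S can make the point concrete. The values below (T = 10, S = +2, normally distributed E with standard deviation 1, 10,000 observations) are assumptions chosen for illustration. Averaging many observations washes out E, but the composite still sits about S points away from T.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

T = 10.0        # true score (assumed)
S = 2.0         # constant systematic error (assumed)
n_obs = 10_000  # number of observations in the composite (assumed)

# Each observation follows O = T + E + S, with E drawn from a
# zero-mean normal distribution (standard deviation 1 is an assumption).
observations = [T + random.gauss(0, 1) + S for _ in range(n_obs)]

composite = sum(observations) / n_obs
print(round(composite, 2))      # close to T + S, not T
print(round(composite - T, 2))  # close to S: the bias that remains
```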
Person    O    T    E    S
A         12   10   0    +2
B         12   10   0    +2
C         12   10   0    +2
D         12   10   0    +2
E         12   10   0    +2
F         12   10   0    +2
G         12   10   0    +2
Average   12   10   0    +2
Note: Each measurement is 2 points higher than the true value of 10.
The systematic errors do not average out.
Slide8
Person    O    T    E    S
A         12   10   0    +2
B         11   10   -1   +2
C         12   10   0    +2
D         13   10   +1   +2
E         10   10   -2   +2
F         12   10   0    +2
G         14   10   +2   +2
Average   12   10   0    +2
Note: Even when random error is present, E averages to 0 but S does not. Thus we have reliable measurements that have validity problems.
Slide9
Validity: Ensuring validity
What can we do to minimize the impact of systematic errors?
One way to do so is to use a variety of indicators—different sources of information.
Different kinds of indicators of a latent variable may not share the same systematic errors.
If true, then S will behave like random error across measurements (but not within measurements).
Slide10
Example
As an example, let’s consider the measurement of self-esteem.
Some methods, such as self-report questionnaires, may lead people to over-estimate their self-esteem. Most people want to think highly of themselves.
Other methods, such as clinical ratings by trained observers, may lead to under-estimates of self-esteem. Clinicians, for example, may be prone to assume that people are not as well-off as they say they are.
Slide11
                    O    T    E    S
Self-reports
  item 1            13   10   +1   +2
  item 2            12   10   0    +2
  item 3            12   10   0    +2
  item 4            11   10   -1   +2
Clinical ratings
  rating 1          10   10   +2   -2
  rating 2          8    10   0    -2
  rating 3          8    10   0    -2
  rating 4          6    10   -2   -2
Average             10   10   0    0
Slide12
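The self-esteem table can be checked with a few lines of arithmetic. The sketch below copies the table's values (the variable names are my own) and shows that each method on its own is biased, while the average across both methods recovers T = 10 because the +2 and -2 systematic errors cancel.

```python
# Rows copied from the self-esteem table: (method, O, T, E, S)
rows = [
    ("self-report", 13, 10, +1, +2),
    ("self-report", 12, 10, 0, +2),
    ("self-report", 12, 10, 0, +2),
    ("self-report", 11, 10, -1, +2),
    ("clinical", 10, 10, +2, -2),
    ("clinical", 8, 10, 0, -2),
    ("clinical", 8, 10, 0, -2),
    ("clinical", 6, 10, -2, -2),
]

def mean(values):
    return sum(values) / len(values)

# Every row should satisfy O = T + E + S.
rows_consistent = all(o == t + e + s for _, o, t, e, s in rows)

self_report_mean = mean([o for m, o, *_ in rows if m == "self-report"])
clinical_mean = mean([o for m, o, *_ in rows if m == "clinical"])
overall_mean = mean([o for _, o, *_ in rows])

print(rows_consistent)    # True
print(self_report_mean)   # 12.0 (overestimates T by 2)
print(clinical_mean)      # 8.0  (underestimates T by 2)
print(overall_mean)       # 10.0 (biases cancel across methods)
```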
Another Example
One problem with the use of self-report questionnaire rating scales is that some people tend to give consistently high (or low) answers, regardless of the question being asked.
This is sometimes referred to as a yea-saying or nay-saying bias (acquiescence).
Slide13
Item                                                        T    S    O
I think I am a worthwhile person.                           4    +1   5
I have high self-esteem.                                    4    +1   5
I am confident in my ability to meet challenges in life.    4    +1   5
My friends and family value me as a person.                 4    +1   5
Average                                                     4    +1   5
1 = strongly disagree and 5 = strongly agree
In this example we have someone with relatively high self-esteem, but this person systematically rates questions 1 point higher than he or she should.
Slide14
Item                                                            T    S    O
I think I am a worthwhile person.                               4    +1   5
I have high self-esteem.                                        4    +1   5
I am NOT confident in my ability to meet challenges in life.    2    +1   3
My friends and family DO NOT value me as a person.              2    +1   3
Average                                                         4    +1   4
If we reverse-key half the items, the bias averages out. Responses to reverse-keyed items are counted in the opposite direction.
Slide15
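Reverse keying can be sketched as follows, assuming the usual scoring rule for a 1-to-5 scale (reversed score = 6 - response). The rule and the names `reverse_score`, `items`, and `BIAS` are my own illustration rather than something stated on the slides. The respondent's true self-esteem is 4, and every raw answer is inflated by 1 point.

```python
SCALE_MIN, SCALE_MAX = 1, 5
BIAS = 1  # yea-saying: every raw answer is 1 point too high

def reverse_score(response):
    """Count a response in the opposite direction on a 1-5 scale."""
    return SCALE_MAX + SCALE_MIN - response

# (true standing on the item as worded, whether the item is reverse keyed)
items = [(4, False), (4, False), (2, True), (2, True)]

# Raw answers include the bias: 5, 5, 3, 3.
raw = [true + BIAS for true, _ in items]

# Reverse-keyed items are flipped before averaging: 5, 5, 3, 3.
scored = [reverse_score(r) if rev else r for r, (_, rev) in zip(raw, items)]

scale_score = sum(scored) / len(scored)
print(raw, scored, scale_score)  # scale score is 4.0, the true level
```

With four positively keyed items the same respondent would average 5, one point above the true level.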
Validity
To the extent that a measure has validity, we say that it measures what it is supposed to measure.
Big question: How do we assess validity?
Slide16
Different ways to think about validity
To the extent that a measure has validity, we can say that it measures what it is supposed to measure.
There are different reasons for measuring psychological variables. The precise way in which we assess validity depends on the reason that we are taking the measurements in the first place.
Slide17
Prediction
As an example, if we want to develop a way to determine who is at risk for developing schizophrenia, our goal is prediction.
Slide18
Predictive Validity
We may begin by obtaining a group of people who have schizophrenia and a group of people who do not.
Then we may try to figure out which kinds of antecedent variables differentiate the two groups.
Slide19
Antecedent variable                                                    Correct classifications
Lost a parent before the age of 10                                     10%
Parent or grandparent had schizophrenia                                50%
Mother was cold and aloof to the person when he or she was a child     15%
Slide20
Predictive Validity
In short, some of these variables appear to be better than others at discriminating schizophrenics from non-schizophrenics.
The degree to which a measure can predict what it is supposed to predict is called its predictive validity.
When we are taking measurements for the purpose of prediction, we can assess validity as the degree to which those predictions are accurate (i.e., useful).
Slide21
Base rate problem
Slide22
                          Measure: Schizophrenic
Reality: Schizophrenic    Yes    No
Yes                       40     10
No                        10     40
80% ([40 + 40] / 100) of people were correctly classified (50% base rate).
Slide23
                          Measure: Schizophrenic
Reality: Schizophrenic    Yes    No
Yes                       40     10
No                        40     10
50% ([40 + 10] / 100) of people were correctly classified (with a 50% base rate). Yuck.
Slide24
                          Measure: Schizophrenic
Reality: Schizophrenic    Yes    No
Yes                       1      0
No                        1      98
99% ([98 + 1] / 100) of people were correctly classified, but note the base rate problem: almost no one has the condition, so calling everyone "No" would be nearly as accurate. Cohen’s kappa is used to account for this problem. Kappa in this example is about 0.66 (66%).
Slide25
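The percent-correct figures and the kappa value can be reproduced with the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the row and column totals. In the sketch below, the cell layout (rows = reality, columns = measure) is my reading of the three 2 x 2 tables, and the function name and labels are hypothetical. Neither statistic changes if rows and columns are swapped.

```python
def accuracy_and_kappa(table):
    """table = [[a, b], [c, d]]: rows = reality (yes, no), columns = measure (yes, no)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    observed = (a + d) / n  # proportion correctly classified
    # Chance agreement from the marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

tables = {
    "50% base rate, useful measure": [[40, 10], [10, 40]],
    "50% base rate, useless measure": [[40, 10], [40, 10]],
    "1% base rate": [[1, 0], [1, 98]],
}

results = {label: accuracy_and_kappa(t) for label, t in tables.items()}
for label, (acc, kappa) in results.items():
    print(f"{label}: accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Accuracy comes out at 0.80, 0.50, and 0.99, with kappa values of 0.60, 0.00, and 0.66. The 99% accuracy in the last table therefore overstates how much the measure adds beyond chance.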
Construct Validity
Sometimes we’re not interested in measuring something just for “technological” purposes, such as prediction.
We may be interested in measuring a construct in order to learn more about it.
Example: We may be interested in measuring self-esteem not because we want to predict something with the measure per se, but because we want to know how self-esteem develops, whether it develops differently for males and females, etc.
Slide26
Construct Validity
Notice that this is quite different from what we were discussing before. In our schizophrenia example, it doesn’t matter whether our measure of schizophrenia really measured schizophrenic tendencies per se.
As long as the measure helps us predict schizophrenia well, we don’t really care what it measures or how that is accomplished.
Slide27
Construct Validity
When we are interested in the theoretical construct per se, however, the issue of exactly what is being measured becomes much more important.
The general strategy for assessing construct validity involves (a) explicating the theoretical relations among relevant variables and (b) examining the degree to which the measure of the construct relates to things that it should and fails to relate to things that it should not.
Slide28
Nomological Network
The nomological network represents the interrelations among variables involving the construct of interest.
[Diagram: self-esteem linked to achieve in school (+), distrust friends (-), and ability to cope (+)]
Slide29
Nomological Network & Validity
The process of assessing construct validity basically involves determining the degree to which our measure of the construct behaves in the way assumed by the theoretical network in which it is embedded.
If, theoretically, people with high self-esteem should be more likely to succeed in school, then our measure of self-esteem should be able to predict people’s grades in school.
Slide30
Construct Validity
Notice here that establishing construct validity involves prediction. The difference between prediction in this context and prediction in the previous context is that we are no longer trying to predict school performance as well as we possibly can.
Our measure of self-esteem should only predict performance to the degree to which we would expect these two variables to be related theoretically.
Slide31
Discriminant Validity
The measure should also fail to be related to variables that, theoretically, are unrelated to self-esteem.
The ability of a measure to fail to predict irrelevant variables is referred to as the measure’s discriminant validity.
[Diagram: self-esteem linked to achieve in school (+), distrust friends (-), ability to cope (+), and like coffee (0)]
Slide32
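One way to picture this check is to correlate a self-esteem measure with each variable in the network and compare the signs with the theory. Everything in the sketch below is simulated: the sample size, the 0.5 effect sizes, and the variable names are assumptions for illustration, not data from the lecture. A measure with construct validity should correlate positively with school achievement, negatively with distrust of friends, and near zero with liking coffee.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

n = 2000  # simulated respondents (assumed)
self_esteem = [random.gauss(0, 1) for _ in range(n)]

# Simulated criteria whose signs follow the network diagram.
school = [0.5 * s + random.gauss(0, 1) for s in self_esteem]     # expected +
distrust = [-0.5 * s + random.gauss(0, 1) for s in self_esteem]  # expected -
coffee = [random.gauss(0, 1) for _ in range(n)]                  # expected 0

r_school = pearson_r(self_esteem, school)
r_distrust = pearson_r(self_esteem, distrust)
r_coffee = pearson_r(self_esteem, coffee)

print(round(r_school, 2), round(r_distrust, 2), round(r_coffee, 2))
```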
Validity: Assessing validity
Finally, it is useful, but not necessary, for a measure to have face validity.
Face validity: The degree to which a measure appears to measure what it is supposed to measure.
A questionnaire item designed to measure self-esteem that reads “I have high self-esteem” has face validity. An item that reads “I like cabbage in my Frosted Flakes” does not.
In the context of prediction, face validity doesn’t matter. In the context of construct validity, it matters more.
Slide33
A Final Note on Construct Validity
The process of establishing construct validity is one of the primary enterprises of psychological research.
When we are measuring the association between two variables to assess a measure’s predictive or discriminant validity, we are evaluating both (a) the quality of the measure and (b) the soundness of the nomological network.
It is not unusual for researchers to refine the nomological network as they learn more about how various measures are inter-related.