Slide1
The Effect of Familiarity with the Response Category Labels on Item Response to Likert Scales
Bert Weijters
Maggie Geuens
Hans Baumgartner
Slide2
Motivating Example
a French researcher wants to replicate an empirical finding that was established in the U.S. using data based on consumer self-reports in France;
in the English questionnaire, a Likert scale with endpoints of ‘strongly disagree’ and ‘strongly agree’ was used;
should the French researcher use ‘fortement d’accord’ or ‘tout à fait d’accord’?
Slide3
Research questions
Do the labels attached to the response scale categories influence response behavior (i.e., how many respondents endorse the extreme scale categories)?
What causes this effect?
How can the effect be mitigated?
What are the implications for multilingual and monolingual surveys?
Slide4
Research questions (cont’d)
various characteristics of rating scales have been studied, but the problem of choosing appropriate labels for the response categories has been largely ignored;
this is surprising because category labels typically apply to many if not all of the items in a questionnaire;
if differences in responding to survey items as a function of the category labels have been acknowledged, the effect has generally been attributed to the perceived intensity of the labels (intensity hypothesis);
in this research we propose the familiarity hypothesis (i.e., scale categories marked by labels that are used more often in day-to-day language are more likely to be endorsed) and contrast it with the intensity hypothesis;
Slide5
Literature review
certain aspects of the labels attached to the response categories on rating scales can have systematic effects on people’s responses to questionnaires:
  the range of response alternatives provided can influence respondents’ answers to questions (Schwarz et al. 1988);
  the response alternatives provided (e.g., feeling really irritated “several times a day” to “less than twice a week” vs. “more than once every 3 months” to “less than once a year”) may affect the interpretation of the question (Schwarz et al. 1988);
  use of different numeric values (-5 to +5 vs. 0 to 10) can change the meaning of endpoint labels such as “not at all successful” (Schwarz et al. 1991);
can endpoint labels in Likert scales that differ in terms of the amplifier used (e.g., “strongly” vs. “completely” (dis)agree) change responses?
Slide6
Inferences based on the range of response alternatives provided
Schwarz et al. (1985): question about hours spent watching TV
Response options (low range)    %       Response options (high range)    %
< ½ hr                          7.4     < 2 ½ hrs                        62.5
½ to 1 hr                       17.7    2 ½ to 3 hrs                     23.4
1 to 1 ½ hrs                    26.5    3 to 3 ½ hrs                     7.8
1 ½ to 2 hrs                    14.7    3 ½ to 4 hrs                     4.7
2 to 2 ½ hrs                    17.7    4 to 4 ½ hrs                     1.6
> 2 ½ hrs                       16.2    > 4 ½ hrs                        0.0
Slide7
The intensity hypothesis
intensity is defined as the degree or extent of the attribute expressed by the label (e.g., degree of agreement or disagreement, extent of liking);
prior research shows that scale anchors in general (e.g., adjectives for evaluating products, such as “good”, “terrific”, or “superior”, as in Wildt and Mazis 1978) and amplifiers used in Likert scales (e.g., “slightly”, “somewhat” or “very much” agree, as in Spector 1976) differ in perceived intensity;
more intense labels represent more extreme positions, which should be endorsed less often (e.g., agree vs. strongly agree; superior vs. very good);
Slide8
The intensity hypothesis (cont’d)
Wyatt and Meyers (1987) found that when the extremes of the response scale were anchored by narrower or less absolute labels (i.e., “agree” and “disagree”), responses were distributed more evenly across all five scale steps, whereas when the response scale was bordered by wider or more absolute labels (i.e., “strongly agree” and “strongly disagree”), responses were concentrated more on the intermediate scale steps;
even more subtle adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior;
Slide9
The intensity hypothesis (cont’d)
there is prior evidence that different intensities are associated with different adverbs (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement;
H_intensity: Endpoint response categories are endorsed less frequently if their labels are more intense.
Slide10
The familiarity hypothesis
according to the open choice model in linguistics, the only constraint on the concatenation of words is that the rules of grammar be respected;
in contrast, the idiom principle states that combinations of words are used in conventional patterns, which leads to the phenomenon of collocation:
  certain word combinations co-occur more often than would be expected based on their individual frequencies (e.g., strong tea vs. powerful tea);
  “of the large repertoire of amplifiers available for expressing a high degree of intensity, speakers rely on a rather limited set of items, and only a few of these are used with great frequency” (Altenberg 1991, p. 133)
Slide11
The familiarity hypothesis (cont’d)
formulaic sequences such as collocations are not only used more frequently by language users and are thus more familiar, but are also processed more quickly (Conklin and Schmitt 2008; Durrant 2008);
based on research on meta-cognitive experiences, this suggests that more familiar, high-frequency labels are more likely to be endorsed (Alter et al. 2007; Fang, Singh, and Ahluwalia 2007; Hawkins and Hoch 1992; Unkelbach 2007; Winkielman et al. 2003):
  repeated exposure to a stimulus has beneficial effects on processing fluency;
  repeated and more fluently processed statements are more likely to be rated as true;
  stimulus repetition and fluent processing increase liking, preference and confidence judgments;
Slide12
The familiarity hypothesis (cont’d)
since collocations have been shown to be processed more quickly, familiar (vs. unfamiliar) labels, because of their greater processing fluency, should be chosen more confidently as the true and preferred response option;
Arce-Ferrer (2006) showed that respondents who were less familiar with the meaning of the intermediate scale categories were more likely to engage in extreme responding and therefore less likely to endorse response options with which they were not familiar;
H_familiarity: Endpoint response categories are endorsed more frequently if their labels are more familiar.
Slide13
Two alternative hypotheses to explain the effect of response category labels
Intensity hypothesis:
  H_intensity: Endpoint response categories are endorsed less frequently if their labels are more intense.
Familiarity hypothesis:
  H_familiarity: Endpoint response categories are endorsed more frequently if their labels are more familiar.
Slide14
Study 1: Scaling intensity and familiarity
Do different methods for scaling the intensity and familiarity of response category labels lead to similar results?
  If the intensity or familiarity of scale labels is to have a reliable effect on responses to questionnaires, consistent differences in the perceived intensity and familiarity of category labels should emerge across respondents.
Can we identify endpoint labels that vary significantly in intensity and familiarity for use in subsequent studies?
  We need two labels for which the intensity and familiarity hypotheses predict opposite response patterns.
Slide15
Study 1 (cont’d)
Label intensity
  Direct ratings of intensity (0 = neutral; 10 = 100% agreement)
  Pairwise comparisons of intensity (“Which expression indicates the stronger sense of agreement?”)
Label familiarity
  Direct ratings of familiarity (0 = we never use this term in day-to-day language; 10 = we use this term very often in day-to-day language)
  Pairwise comparisons of familiarity (“Which expression is more commonly used in day-to-day language?”)
  Lexical decision task (press a button labeled ‘end category label’ or ‘not an end category label’ for six endpoint labels and five non-endpoint labels)
  Word frequency counts in corpora of texts (Google hits, available for specific word combinations in particular countries and languages)
Slide16
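One simple way to turn the pairwise comparison judgments from Study 1 into a score per label is the share of comparisons each label wins. The slides do not say how the comparisons were scaled, so this is only an illustrative sketch; the function name `win_proportions`, the placeholder labels, and the judgments are all made up.

```python
from collections import Counter


def win_proportions(comparisons):
    """Return, for each label, the share of its pairwise comparisons it won.

    `comparisons` is a list of (winner, loser) tuples, one per judgment.
    """
    wins = Counter()
    appearances = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {label: wins[label] / appearances[label] for label in appearances}


# Made-up judgments for three placeholder labels (not data from the study).
judgments = [
    ("label_A", "label_B"),
    ("label_A", "label_C"),
    ("label_B", "label_C"),
    ("label_A", "label_B"),
]
scores = win_proportions(judgments)
```

Scores computed this way for intensity and for familiarity could then be compared with the direct ratings, which is the kind of cross-method consistency check Study 1 describes.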
Study 1: Method
Sample 1: 83 undergraduates; pairwise comparisons of intensity and familiarity of six endpoint labels;
Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and familiarity on 11-point scales;
Sample 3: 125 undergraduates (57% female); lexical decision task;
Slide17
Study 1: Results
Slide18
Study 1: Results (cont’d)
for intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92;
the correlations of the means derived from the four familiarity methods range from .94 to .97;
thus, there is considerable consistency in respondents’ judgments of the perceived intensity and familiarity of different category labels;
‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least familiar labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most familiar labels;
Slide19
Study 2
Direct test of the intensity and familiarity hypotheses:
  The endorsement rate for a high-intensity and high-familiarity label should be relatively low if the intensity hypothesis is true, and it should be relatively high if the familiarity hypothesis is true.
Slide20
Measuring response distributions
A major challenge is to measure differences in response distributions that are not item-specific and are independent of substantive content;
To do this, we need to observe patterns of responses across heterogeneous items (i.e., items that do not share common content but have the same response format):
  Deliberately designed scales consisting of heterogeneous items (Greenleaf 1992)
  Random samples of items from scale inventories (Weijters, Geuens & Schillewaert 2010)
Slide21
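Measuring extreme responding across heterogeneous items comes down to counting, per respondent, how many answers fall in the two endpoint categories. A minimal sketch follows; the function name `count_endpoint_responses` and the example answers are hypothetical and not taken from the studies.

```python
def count_endpoint_responses(responses, n_categories=5):
    """Count how many answers fall in the lowest or highest scale category."""
    endpoints = {1, n_categories}
    return sum(1 for r in responses if r in endpoints)


# Hypothetical answers of one respondent to 16 unrelated items (5-point scale).
answers = [5, 4, 1, 3, 2, 5, 4, 4, 1, 3, 5, 2, 3, 4, 1, 5]
n_extreme = count_endpoint_responses(answers)
```

Because the items share no common content, a high count is more plausibly a response-style effect than a reflection of any single attitude.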
Study 2: Method
online survey with Dutch-speaking panel members of an online market agency (N = 218);
the respondents ranged in age from 20 to 65 years (M = 43.2, SD = 11.7), 47% were female, and 58% had schooling beyond secondary school;
respondents were randomly assigned to questionnaires varying the endpoint labels (5-point scale):
  ‘completely (dis)agree’ (high intensity/familiarity)
  ‘strongly (dis)agree’ (low intensity/familiarity)
questionnaire consisted of 16 heterogeneous items (4 pages with 4 items per page) taken from unrelated scales (e.g., “Air pollution is an important worldwide problem”, “I often give compliments to others”);
pairwise comparisons of the two response category labels in terms of intensity and familiarity;
Slide22
Study 2: Results
The manipulation of intensity/familiarity was successful;
The findings support the familiarity hypothesis:
Label               Intensity    Familiarity    Mean number of extreme responses
Strongly agree      22%          10%            3.1 (.26)
Completely agree    78%          90%            4.4 (.33)
Slide23
Study 3
the results of Study 2 are presumably due to the fact that more familiar labels are more easily processed and that this ease of processing inadvertently influences respondents’ answers to survey questions;
as long as the relevance of meta-cognitive experiences is not called into question, people consider this information as diagnostic and incorporate it into their judgments by relying on naïve theories such as, “If the information comes to my mind easily, it must be true or I must like it”;
however, when the diagnosticity or informational value of meta-cognitive experiences is called into question, people discount this information and either turn to alternative naïve theories such as “The information comes to mind easily because I have often heard it” or use the cognitive content of the stimulus;
Slide24
Study 3 (cont’d)
this suggests that making respondents aware that more familiar response labels may attract more responses, and that this may lead them to select the category label “completely (dis)agree” more readily, should eliminate the previously observed familiarity effect;
Slide25
Study 3: Method
Online survey with 122 respondents of a university panel (67.2% women, average age of 29 years);
2 x 2 between-participant design:
  intensity/familiarity of the endpoint labels manipulated as in the previous study (‘completely agree’ vs. ‘strongly agree’);
  awareness of the label familiarity effect (depending on whether the following instructions were given before or after the collection of the DV):
    In questionnaires, there are several different ways of labeling response categories (e.g., “strongly (dis)agree” or “completely (dis)agree”). Previous research has shown that labels that are used more commonly in day-to-day language are more often selected as a response. This happens irrespective of the true opinion of the respondent on the subject of the question.
DV is the number of endpoint responses to 16 heterogeneous questions;
Slide26
Study 3: Results
the findings support the familiarity hypothesis in the unaware condition, but the effect goes away when respondents are made aware of the label familiarity effect:
Slide27
Implications of the category labeling effect for cross-cultural research
response category labels can affect findings in a single-language context (e.g., meta-analytic comparisons), but they are particularly important in cross-cultural research, where labels have to be translated;
two types of translation:
  literal
  idiomatic
some authors have emphasized the need to choose scale anchors that are equal in intensity (e.g., Harzing 2006), and prior research has demonstrated that supposedly similar terms may differ in intensity across languages (e.g., ‘definitely’ vs. ‘bestimmt’; see Smith et al. 2009);
however, translated adverbial modifiers may also differ in familiarity;
Slide28
Schematic representation of the translation process (based on Bassetti and Cook 2011)
Slide29
Study 4: Method
approx. 200 English- or French-speaking respondents in five regions (nationality/language combinations) of North America and Europe;
five endpoint labels in each language;
16 heterogeneous items from Greenleaf (1992), rated on 5-point scales;
pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and familiarity;
Slide30
Study 4: Method (cont’d)
Language    France    USA    Canada    UK     Total
French      227       0      203       0      430
English     0         185    196       187    568
Total       227       185    399       187    998
English version     French version
Strongly agree      Fortement d'accord
Completely agree    Complètement d'accord
Extremely agree     Extrêmement d'accord
Definitely agree    Définitivement d'accord
Fully agree         Entièrement d'accord
Very much agree     Tout à fait d'accord
Slide31
Study 4: Results
Intensity and familiarity ratings by region
Note: Correlation between the familiarity ratings and the natural logarithm of the number of Google hits was at least .88.
Slide32
Study 4: Results
linear regression of the number of endpoint responses on
  label intensity
  label familiarity
  4 dummy variables representing the five regions
only label familiarity had a significant effect (standardized B = .38, p < .05, R² = .14);
in other words, the number of endpoint responses increases as a function of label familiarity, regardless of country and language;
Slide33
Study 5
demonstration that familiarity is a viable determinant of extreme responding differences between regions in a large-scale international survey;
illustration of how to construct and use relative measures of familiarity and extreme responding based on secondary data only;
Slide34
Study 5: Method
13,520 respondents from 17 European regions;
16 heterogeneous items based on Greenleaf (1992);
use of fully labeled 7-point response scales;
familiarity: relative measure of familiarity as the natural logarithm of the ratio of the number of Google hits for the 1st and 7th category (strongly agree or disagree) to the number of Google hits for the 2nd and 6th category (agree or disagree);
endorsement: relative endorsement of the 1st and 7th vs. the 2nd and 6th response categories (natural logarithm);
Slide35
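The two relative measures in the Study 5 method are both natural logarithms of a ratio, which can be sketched as below. The slides do not say how the counts for the two endpoint labels (or the two adjacent labels) were combined, so this sketch assumes they are summed before the ratio is taken. The function names and all counts are hypothetical.

```python
import math


def relative_familiarity(hits_endpoints, hits_adjacent):
    """ln(hits for the endpoint labels / hits for the adjacent labels)."""
    return math.log(hits_endpoints / hits_adjacent)


def relative_endorsement(n_endpoints, n_adjacent):
    """ln(responses in categories 1 and 7 / responses in categories 2 and 6)."""
    return math.log(n_endpoints / n_adjacent)


# Hypothetical counts for one region (assumed summed over both scale ends).
familiarity = relative_familiarity(hits_endpoints=120_000, hits_adjacent=480_000)
endorsement = relative_endorsement(n_endpoints=900, n_adjacent=2_700)
```

A value of zero means the endpoint and adjacent labels are equally frequent; negative values mean the endpoint label is the rarer one. Working with ratios relative to the adjacent category keeps the measure comparable across languages with very different overall hit counts.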
Region                   N        female    M age    SD age
Belgium, Dutch           644      51%       41.0     11.1
Belgium, French          371      51%       40.5     11.7
UK, English              908      56%       41.8     11.3
Germany, German          993      50%       39.3     11.0
Hungary, Hungarian       1003     51%       38.3     11.8
Slovakia, Slovakian      1063     50%       38.2     12.1
Poland, Polish           802      37%       32.2     11.0
Netherlands, Dutch       1046     50%       40.8     11.4
France, French           1000     51%       39.4     11.9
Spain, Spanish           934      50%       37.8     10.5
Romania, Romanian        970      50%       37.9     11.5
Turkey, Turkish          914      43%       32.5     9.4
Italy, Italian           939      50%       39.0     10.6
Switzerland, French      303      51%       42.5     9.7
Switzerland, German      606      48%       43.5     9.4
Switzerland, Italian     50       56%       32.9     8.7
Sweden, Swedish          974      49%       39.9     11.3
Total                    13520    49%       38.7     11.4
Sample descriptive statistics, Pan-European study (Study 7 and 8)
Slide36
Study 5: Results
Note: Standardized B = .68, p < .05, R² = 46%.
Slide37
Study 5: Results (cont’d)
prior research has generally attributed differences in response distributions in cross-cultural comparisons to nationality and national culture;
our findings demonstrate that different labels may vary in terms of familiarity, which can lead to different response patterns across languages;
in particular, if the endpoint label used in a certain language is more familiar than the one used in another language (relative to the adjacent category label), it is likely that the endpoint will be selected more frequently in the former than in the latter language;
Slide38
Discussion: Summary of findings
response category labels that are more commonly used in day-to-day language (i.e., that are more familiar) lead to higher endorsement of their associated response categories;
respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category, but they are also influenced by the familiarity of the labels;
the category label familiarity effect can be eliminated by making respondents aware of the potentially biasing effect of label familiarity;
the problem may be particularly serious in cross-cultural research when different languages are used;
however, researchers can control for differences in label familiarity across languages based on secondary data;
Slide39
Consequences of the response category label effect
if certain labels attract more responses, this leads to bias;
Baumgartner and Steenkamp (2001) discuss how extreme responding biases scale scores:
  if the modal scale response is above the midpoint, average scores will be inflated;
  if the modal scale response is below the midpoint, average scores will be deflated;
relationships between variables can also be biased;
Slide40
Consequences of the response category label effect (cont’d)
imagine a situation in which the strength of a relationship is compared across two groups and labels that differ in familiarity are used to collect data in the two groups;
the DV, an attitudinal variable (ATT), is measured on an agreement rating scale, and the IV (e.g., AGE in years) is measured on an objective scale and hence not affected by differences in label familiarity;
compared to respondents in the unfamiliar label condition, respondents in the familiar label condition who have a moderately positive or negative true attitude will exhibit a more extreme positive or negative observed attitude because they are more likely to endorse the endpoints;
this can result in a steeper observed slope and thus a stronger relationship between the objective antecedent and the observed attitude in the familiar label condition;
Slide41
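The slope argument can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: a standardized latent attitude that depends linearly on age, cut points at ±0.5 and ±1.5 for the unfamiliar labels, and endpoint cut points moved inward to ±1.0 for the familiar labels, so that the endpoints are chosen more often. All numbers and names are illustrative.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def to_category(latent, outer, inner=0.5):
    """Map a latent attitude to a 1-5 response using symmetric cut points."""
    if latent < -outer:
        return 1
    if latent < -inner:
        return 2
    if latent <= inner:
        return 3
    if latent <= outer:
        return 4
    return 5


rng = random.Random(42)  # fixed seed so the run is reproducible
ages, latent = [], []
for _ in range(20000):
    z_age = rng.gauss(0, 1)
    ages.append(40 + 12 * z_age)  # age in years (assumed mean 40, SD 12)
    latent.append(0.6 * z_age + 0.8 * rng.gauss(0, 1))  # true attitude

# Same true attitudes, two labeling conditions: familiar endpoint labels
# are assumed to capture a wider range of moderately extreme attitudes.
att_unfamiliar = [to_category(z, outer=1.5) for z in latent]
att_familiar = [to_category(z, outer=1.0) for z in latent]

slope_unfamiliar = ols_slope(ages, att_unfamiliar)
slope_familiar = ols_slope(ages, att_familiar)
```

Under these assumptions the slope in the familiar condition comes out steeper than in the unfamiliar one, even though the underlying attitudes are identical. That is the kind of spurious group difference the slide warns about.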
Consequences of the response category label effect (cont’d)
using data from Study 2:
  ATT: “I try to avoid food that is high in cholesterol”
  IV: Age in years
Slide42
Implications for multilingual survey research
translations usually imply a trade-off between the attempt to be literal and the attempt to be idiomatic;
optimizing equivalence: use response category labels that are equally familiar in different languages (rather than literal translations or words with equal intensity);
  e.g., the German and Dutch labels “vollkommen einverstanden” and “volledig eens” are literal translations (similar to “completely agree”), but in German this expression is more familiar, resulting in more endpoint responses than in Dutch (based on Study 5);
back-translation of response category labels may not help because it may result in literal rather than idiomatic translations and the familiarity of the labels in different languages may differ;
Slide43
Identifying appropriate endpoint labels in two languages