Evaluating a Test
Test Usefulness (Bachman and Palmer, 1996)

Anne Mullen
anne.mullen@elul.ulaval.ca
Université Laval
October 2014

Test Validity

The Progressive Matrix of Validity (Messick, 1989) was conceived:
to control the quality of the evaluation
to guarantee that the results of the evaluation are precise
to ensure that the interpretations of the results are fair


1. Qualities of test usefulness: definitions and questions
2. Creating a valid test
3. Discussion and follow-up questions

Six Qualities of Test Usefulness

Reliability
Construct Validity
Authenticity
Interactiveness
Impact
Practicality

Reliability

seeks to ascertain that the results of an evaluation are similar
measures the consistency of results from one evaluation to another
verifies the variation between results in different evaluations
a minimal level of reliability is determined by the context

Is this evaluation reliable?

does the evaluation allow for comparison between test-takers?
does the evaluation allow for comparison with other groups of test-takers in the same session or in different sessions?
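One common way to put a number on the consistency of results from one evaluation to another is to correlate the scores of the same test-takers across two sessions. The sketch below is only an illustration and is not taken from Bachman and Palmer: the function name `pearson_r` and the scores in `session_1` and `session_2` are invented for the example, and the choice of the Pearson correlation as the reliability indicator is an assumption.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of cross-products of deviations from each session's mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations within each session.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores in a session must not all be identical")
    return cross / (spread_x * spread_y) ** 0.5


# Hypothetical scores (out of 10) for the same five test-takers in two sessions.
session_1 = [7, 5, 9, 6, 8]
session_2 = [8, 5, 9, 5, 8]

r = pearson_r(session_1, session_2)
print(f"correlation between sessions: {r:.2f}")
```

A value close to 1 suggests that the two sessions rank the test-takers in much the same way. What counts as an acceptable minimum still depends on the context of the evaluation.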

Construct Validity

the extent to which the results of an evaluation can be interpreted as an indicator of the ability that the evaluation is measuring
is said to exist if the results of the evaluation are valid in a specific context and can be generalized (valid in another similar, but different, context)

Does this evaluation measure the correct construct?

does the evaluation actually evaluate the desired ability?
what other abilities are measured?

Authenticity

the correspondence between the characteristics of the tasks in the target context and those of the tasks in the evaluation
helps in the process of generalizing the results

Is the evaluation authentic?

will the test-takers need to do similar activities in their present or future, academic or work lives?

Interactiveness

the extent and the type of individual characteristics the test-taker uses when completing the tasks of the evaluation
includes:
a) the goal
b) the specific group being evaluated
c) the specific context of the evaluation

Is the evaluation interactive?

does the evaluation reflect the classroom activities?
does the evaluation lead the test-taker to use what has been taught and learned?

Impact

the effects of the evaluation on:
a) society (employers)
b) educational systems (administrators, teachers)
c) other stakeholders (parents and test-takers)
the consequences of the evaluation must be evaluated for each stakeholder

What is the impact of the evaluation?

how are the results of the test used?
is anyone affected negatively by the evaluation?
who benefits from the evaluation?

Practicality

the measurement and evaluation of the resources required:
a) human (test correctors, evaluators of the evaluation)
b) material (space and equipment)
c) time (test creation, correction, analysis)

Is the evaluation practical?

can it be completed in the allotted time?
can it be corrected easily and fairly for all test-takers?
what resources are needed and are they readily available?

Determining Test Usefulness

Three principles to follow:
find a middle ground between the six qualities
have the six qualities combined and balanced
evaluate for the context

Creation of an evaluation

You need to design an evaluation for the following list of words: to devour, to dirty, to imbibe, to purchase, to relish, to swallow, to savour, to scorch, to slip, to taste.


The class is an intermediate 4-skills ESL class with 23 students.
While listening to a text which included these ten words, test-takers were asked to answer comprehension questions.
The ten words were listed and defined because of their presumed level of difficulty.
The teacher also explained the meaning of these words orally and answered any questions.

Is the test useful?

does the evaluation allow for comparison between test-takers and groups over time? (Reliability)
does the evaluation actually evaluate the desired ability? Do other abilities intervene? (Construct validity)

Is the test useful?

does the evaluation reflect the test-taker's present-day or future reality? (Authenticity)
does the evaluation lead the test-taker to use what has been taught and learned? (Interactiveness)

Is the test useful?

what is the effect of the evaluation? (Impact)
is the evaluation easy to administer? (Practicality)

Thank you