
Evaluating a Test: Test Usefulness

(Bachman and Palmer, 1996). Anne Mullen. anne.mullen@elul.ulaval.ca. Université Laval. October 2014. Test Validity. The Progressive Matrix of Validity (Messick, 1989), conceived to control the quality of the evaluation.



Evaluating a Test: Test Usefulness (Bachman and Palmer, 1996)

Anne Mullen, anne.mullen@elul.ulaval.ca, Université Laval, October 2014

Test Validity

The Progressive Matrix of Validity (Messick, 1989) was conceived:
to control the quality of the evaluation
to guarantee that the results of the evaluation are precise
to ensure that the interpretations of the results are fair


1. Qualities of test usefulness: definitions, questions
2. Creating a valid test
3. Discussion and follow-up questions

Six Qualities of Test Usefulness

Reliability
Construct Validity
Authenticity
Interactiveness
Impact
Practicality



Reliability

seeks to ascertain that the results of an evaluation are similar
measures the consistency of results from one evaluation to another
verifies the variation between results in different evaluations
a minimal level of reliability is determined by the context

Is this evaluation reliable?

does the evaluation allow for comparison between test-takers?
does the evaluation allow for comparison with other groups of test-takers, in the same session or in different sessions?
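
As a rough illustration of how closely results agree from one evaluation to another, one common way to put a number on it is a test-retest correlation between two sessions taken by the same test-takers. The sketch below is not taken from Bachman and Palmer; the function name `pearson_r` and the scores are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each session")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (out of 10) for the same five test-takers in two sessions.
session_1 = [6, 8, 5, 9, 7]
session_2 = [7, 8, 5, 9, 6]

r = pearson_r(session_1, session_2)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 suggests that the two sessions rank the test-takers similarly. What counts as a minimal acceptable level still depends on the context of the evaluation.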


Construct Validity

the extent to which the results of an evaluation can be interpreted as an indicator of the ability that the evaluation is measuring
is said to exist if the results of the evaluation are valid in a specific context and can be generalized (valid in another, similar but different, context)

Does this evaluation measure the correct construct?

does the evaluation actually evaluate the desired ability?
what other abilities are measured?



Authenticity

the correspondence between the characteristics of the tasks of the context and those of the evaluation
helps in the process of generalization of results

Is the evaluation authentic?

will the test-takers need to do similar activities in their present or future, academic or work lives?



Interactiveness

the extent and the type of individual characteristics the test-taker uses when completing the tasks of the evaluation
includes a) the goal, b) the specific group being evaluated, c) the specific context of the evaluation

Is the evaluation interactive?

does the evaluation reflect the classroom activities?
does the evaluation lead the test-taker to use what has been taught and learned?



Impact

the effects of the evaluation on a) society (employers), b) educational systems (administrators, teachers) and c) other stakeholders (parents and test-takers)
the consequences of the evaluation must be evaluated for each stakeholder

What is the impact of the evaluation?

how are the results of the test used?
is anyone affected negatively by the evaluation?
who benefits from the evaluation?



Practicality

the measurement and evaluation of the resources:
a) human (test scorers, evaluators of the evaluation)
b) material (space and equipment)
c) time (test creation, scoring, analysis)

Is the evaluation practical?

can it be completed in the allotted time?
can it be scored easily and fairly for all test-takers?
what resources are needed and are they readily available?

Determining Test Usefulness

Three principles to follow:
find a middle ground between the six qualities
have the six qualities combined and balanced
evaluate for the context


Creation of an evaluation

You need to design an evaluation for the following list of words: to devour, to dirty, to imbibe, to purchase, to relish, to swallow, to savour, to scorch, to slip, to taste.


The class is an intermediate four-skills ESL class with 23 students. While listening to a text that included these ten words, test-takers were asked to answer comprehension questions. The ten words were listed and defined because of their presumed level of difficulty. The teacher also explained the meaning of these words orally and answered any questions.

Is the test useful?

does the evaluation allow for comparison between test-takers and groups over time? (Reliability)
does the evaluation actually evaluate the desired ability? Do other abilities intervene? (Construct validity)
does the evaluation reflect the test-taker’s present-day or future reality? (Authenticity)
does the evaluation lead the test-taker to use what has been taught and learned? (Interactiveness)
what is the effect of the evaluation? (Impact)
is the evaluation easy to administer? (Practicality)

Thank you