Item Analysis: A Crash Course
Lou Ann Cooper, PhD
Master Educator Fellowship Program
January 10, 2008
Validity
Validity refers to “the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.”
“Validity is an integrative summary.” (Messick, 1995)
Validation is the process of building an argument supporting the interpretation of test scores. (Kane, 1992)
Reliability
Consistency, reproducibility, generalizability
Very norm-referenced – relative standing in a group
Only scores can be described as reliable, not tests.
Reliability depends on:
Test length – number of items
Sample of test takers – group homogeneity
Score range
Dimensionality – content and skills tested
Planning the Test
Test blueprint / table of specifications
Content, skills, domains
Level of cognition
Relative importance of each element
Linked to learning objectives.
Provides evidence for content validity.
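Not part of the slides, but a blueprint maps naturally onto a small data structure. A minimal Python sketch, with hypothetical content areas and item counts invented for illustration:

```python
# A table of specifications as a simple mapping:
# (content area, cognitive level) -> number of items.
# Content areas and counts are hypothetical, for illustration only.
blueprint = {
    ("Trauma", "recall"): 4,
    ("Trauma", "application"): 6,
    ("Oncology", "recall"): 3,
    ("Oncology", "application"): 7,
}

total_items = sum(blueprint.values())
for (area, level), n in sorted(blueprint.items()):
    print(f"{area:<10} {level:<12} {n:>3} items ({n / total_items:.0%} of test)")
```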
Test Blueprint: Third Year Surgery Clerkship
[Blueprint table of content areas by cognitive level not reproduced.]
Test Statistics
A basic assumption: items measure a single subject area or underlying ability.
General indicator of test quality is a reliability estimate.
The measure most commonly used to estimate reliability in a single administration of a test is Cronbach's alpha, a measure of internal consistency.
Cronbach’s alpha
Coefficient alpha reflects three characteristics of the test:
The interitem correlations -- the greater the relative number of positive relationships, and the stronger those relationships are, the greater the reliability. Item discrimination indices and the test's reliability coefficient are related in this regard.
The length of the test -- a test with more items will have a higher reliability, all other things being equal.
The content of the test -- generally, the more diverse the subject matter tested and the testing techniques used, the lower the reliability.
α = [k / (k − 1)] × (1 − Σ σᵢ² / σₓ²)
where k = the number of items, σᵢ² = the variance of item i, and σₓ² = the total test variance.
Total test variance = the sum of the item variances + twice the unique covariances.
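As an illustration (not in the original slides), a minimal Python sketch of coefficient alpha computed directly from this definition, assuming an examinees-by-items score matrix and NumPy:

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # test length (number of items)
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total test scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 examinees x 4 items scored 0/1
X = [[1, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1],
     [0, 1, 0, 0]]
print(round(cronbach_alpha(X), 3))
```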
Descriptive Statistics
Total test score distribution
Central tendency
Score Range
Variability
Frequency distributions for individual items – allows us to analyze the distractors.
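A minimal Python sketch of these summaries, added for illustration; the responses and answer key below are invented:

```python
import numpy as np
from collections import Counter

# Hypothetical data: each row is one examinee's chosen options on a 3-item MCQ test
responses = np.array([list("ABD"), list("ABE"), list("CBD"), list("ABD")])
key = np.array(list("ABD"))                 # correct option for each item

scored = (responses == key).astype(int)     # 1 = correct, 0 = incorrect
totals = scored.sum(axis=1)                 # total score per examinee

print("mean =", totals.mean(), " median =", np.median(totals))
print("range =", totals.min(), "to", totals.max(), " sd =", totals.std(ddof=1))

# Distractor analysis: how often each option was chosen, per item
for i in range(responses.shape[1]):
    print(f"Item {i + 1}:", Counter(responses[:, i].tolist()))
```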
Human Behavior Exam
Mean = 75.98 (SD = 6.78)
Median = 77
Mode = 72
Item Statistics
Response frequencies/distribution
Mean
Item variance/standard deviation
Item difficulty
Item discrimination
Item Analysis
Examines responses to individual test items from a single administration to assess the quality of the items and the test as a whole.
Did the item function as intended?
Were the test items of appropriate difficulty?
Were the test items free from technical defects?
Testwiseness
Irrelevant difficulty
Was each of the distractors effective?
Item Difficulty
For items with one correct answer worth a single point, difficulty is the proportion of students who answer the item correctly, i.e., the item mean.
When an alternative is worth other than a single point, or when there is more than one correct alternative per question, the item difficulty is the average score on that item divided by the highest number of points for any one alternative.
Ranges from 0 to 1.00 - the higher the value, the easier the question.
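A minimal Python sketch of both definitions, added here for illustration (not in the original slides):

```python
import numpy as np

def item_difficulty(item_scores, max_points=1):
    """Average score on the item divided by the maximum points available.

    For 1-point, single-answer items this reduces to the proportion
    of students answering correctly (the item mean)."""
    return np.mean(item_scores) / max_points

# Dichotomous item: 7 of 10 students correct -> difficulty 0.70
print(item_difficulty([1, 1, 1, 1, 1, 1, 1, 0, 0, 0]))

# Item scored out of 3 points
print(item_difficulty([3, 2, 3, 1, 0, 3], max_points=3))
```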
Item Difficulty
Item difficulty is relevant for determining whether students have learned the concept being tested.
Plays an important role in the ability of an item to discriminate between students who know the tested material and those who do not.
To maximize item discrimination, desirable difficulty levels are slightly higher than midway between chance and perfect scores for the item. For a 5-option item, for example, chance is .20, so the optimal difficulty is roughly .70.
Ideal difficulty levels for MCQ
Lord, F.M. "The Relation of the Reliability of Multiple-Choice Tests to the Distribution of Item Difficulties." Psychometrika 17:181-194, 1952.
Item Difficulty
Assuming a 5-option MCQ, rough guidelines for judging difficulty:
≥ .85 Easy
> .50 and < .85 Moderate
≤ .50 Hard
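These cut points translate directly into a small helper function; a sketch mirroring the guidelines above:

```python
def classify_difficulty(p):
    """Rough difficulty label for a 5-option MCQ item (slide guidelines)."""
    if p >= 0.85:
        return "easy"
    if p > 0.50:
        return "moderate"
    return "hard"

print(classify_difficulty(0.62))   # moderate
```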
Item Discrimination
Ability of an item to differentiate among students on the basis of how well they know the material being tested.
Describes how effectively the test item differentiates between high ability and low ability students.
All things being equal, highly discriminating items increase reliability.
Discrimination Index
D = p_u − p_l
p_u = proportion of students in the upper group who were correct.
p_l = proportion of students in the lower group who were correct.
D ≥ .40: satisfactory item functioning
.30 ≤ D ≤ .39: little or no revision required
.20 ≤ D ≤ .29: marginal - needs revision
D < .20: eliminate or completely revise
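A minimal Python sketch of this index, added for illustration; the upper/lower group size of 27% is a common convention, not something the slide specifies:

```python
import numpy as np

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """D = p_u - p_l, computed from upper and lower score groups.

    `fraction` sets the group size; 27% is a common convention
    (it is not specified on the slide)."""
    item_correct = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)             # examinees, lowest total first
    n = max(1, int(len(order) * fraction))       # size of each group
    p_l = item_correct[order[:n]].mean()         # proportion correct, lower group
    p_u = item_correct[order[-n:]].mean()        # proportion correct, upper group
    return p_u - p_l

# Toy data: 10 examinees; this item tracks total score closely
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
total = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]
print(discrimination_index(item, total))
```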
Point biserial correlation
Correlation between performance on a single item and performance on the total test.
High and positive: best students get the answer correct; poorest students get it wrong.
Low or zero: no relationship between performance on the item and the total test.
High and negative: poorest students get the item correct; best students get it wrong.
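A minimal Python sketch, added for illustration: the point biserial is simply a Pearson correlation with a dichotomous item, so NumPy's corrcoef suffices. Correlating against the rest score (total minus the item) is a common refinement the slide does not mention; it avoids inflating the correlation by the item's own contribution to the total.

```python
import numpy as np

def point_biserial(item_correct, total_scores, use_rest_score=True):
    """Correlation between a 0/1 item and the total test score.

    With use_rest_score=True the item's own points are removed from
    the total first (a common refinement, not from the slide)."""
    item = np.asarray(item_correct, dtype=float)
    total = np.asarray(total_scores, dtype=float)
    criterion = total - item if use_rest_score else total
    return np.corrcoef(item, criterion)[0, 1]

item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
total = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]
print(round(point_biserial(item, total), 2))
```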
Point biserial correlation
r_pbis tends to be lower for tests measuring a wide range of content areas than for more homogeneous tests.
Items with low discrimination indices are often ambiguously worded.
A negative value may indicate that the item was miskeyed.
Tests with high internal consistency consist of items with mostly positive relationships with the total test score.
Item Discrimination
Rough guidelines for r_pbis:
> .30 Good
> .10 and < .30 Fair
< .10 Poor
Item Analysis Matrix
[Worked examples for Items 1–4 not reproduced.]
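The matrix itself is straightforward to generate. A self-contained Python sketch with invented 0/1 data, combining the statistics defined on the preceding slides into one row per item:

```python
import numpy as np

# Toy 0/1 score matrix: rows = examinees, columns = items
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 1]])
totals = X.sum(axis=1)
order = np.argsort(totals)              # examinees, lowest total first
n = max(1, int(len(order) * 0.27))      # 27% groups: a convention, not from the slides

print("item    p       D       r_pbis")
for j in range(X.shape[1]):
    p = X[:, j].mean()                                    # difficulty
    d = X[order[-n:], j].mean() - X[order[:n], j].mean()  # discrimination index
    r = np.corrcoef(X[:, j], totals - X[:, j])[0, 1]      # point biserial (rest score)
    print(f"{j + 1:>4}   {p:.2f}   {d:+.2f}   {r:+.2f}")
```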
A Sample of MS1 Exams
[Slide content not reproduced.]
Cautions
Item analyses reflect internal consistency of items rather than validity.
The discrimination index is not always a measure of item quality.
Extremely difficult or easy items will have low ability to discriminate, but such items are often needed to adequately sample course content and objectives.
An item may show low discrimination if the test measures many different content areas and cognitive skills.
Cautions
Item analysis data are tentative. They are influenced by:
type and number of students being tested
instructional procedures employed
both systematic and random measurement error
If repeated use of items is possible, statistics should be recorded for each administration of each item.
Recommendations
Valuable tool for improving items to be used in future tests – item banking.
Modify or eliminate ambiguous, misleading, or flawed items.
Helps improve instructors’ skills in test construction.
Identifies specific areas of course content which need greater emphasis or clarity.
Research
Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed items on achievement examinations in medical education. Adv Health Sci Educ 10:133-143, 2005.
Jozefowicz RF, et al. The quality of in-house medical school examinations. Acad Med 77(2):156-161, 2002.
Muntinga JH, Schull HA. Effects of automatic item eliminations based on item test analysis. Adv Physiol Educ 31:247-252, 2007.