Martha Thurlow Cara Cahalan Laitusis Leslie Nabors Olah Karen Barton Historical Perspective on GradeBased Assessments Martha L Thurlow National Conference on Student Assessment June 21 2016 ID: 695553
Slide1
Reconsidering Grade Based Assessments
Martha Thurlow
Cara Cahalan Laitusis
Leslie Nabors Olah
Karen Barton
Slide2
Historical Perspective on Grade-Based Assessments
Martha L. Thurlow
National Conference on Student Assessment
June 21, 2016
Slide3
Definition
Out-of-level Testing (OOLT) = administration of a large-scale assessment with items above or below the grade in which a student is enrolled in school.
Concerns focused primarily on testing students below the grade in which they were enrolled.
Slide4
Why Did States Use OOLT?
Before NCLB “outlawed” OOLT, its advantages were perceived to be:
Making the test less onerous for students who were considered to be performing well below grade level
Increasing participation rates in the regular state assessment
2001-02: 12 States
2003-04: 17 States
Slide5
What Were the Unintended Consequences of OOLT?
Students assessed on content for a different grade level from the content of their instruction.
Reduced exposure to grade-level content (but not necessarily increased exposure to lower-level content)
Reduced expectations for students
Slide6
Unintended Consequences – cont.
Students lost opportunity to learn (OTL), which worked against their access to the general curriculum of their peers
Students often omitted from reporting and accountability measures
When students were included in reporting and accountability, results miscommunicated the success of the school and gave parents and students false hope about success and future graduation from high school
Not useful for school improvement planning
Slide7
Intended Consequences of Ban on OOLT Were Realized to Some Extent
Reflected in participation rates in general assessment (in virtually all states)
Reflected in increases in performance (in most states)
Slide8
Participation Data in 1999
From Thompson, S., & Thurlow, M.: 1999 State Special Education Outcomes
Slide9
Participation Data 2001-05
From Altman, J., Thurlow, M., & Quenemoen, R. (2008). NCEO Brief: Trends in the Participation and Performance of Students with Disabilities.
Slide10
Performance Data 2001-05
From Altman, J., Thurlow, M., & Quenemoen, R. (2008). NCEO Brief: Trends in the Participation and Performance of Students with Disabilities.
Slide11
NCLB Flexibility Opened Door to New OOLT
Measuring growth required more reliable measures at the low (and high) performance levels
Increased use of computer-adaptive testing made it possible to provide some lower grade level items without presenting an entire test out of level
Slide12
OSERS Dear Colleague Letter (2015)
To help make certain that children with disabilities are held to high expectations and have meaningful access to State’s academic content standards, we write to clarify that an individualized education program (IEP) for an eligible child with a disability under ….
Slide13
OSERS Dear Colleague Letter (2015)
(IDEA) must be aligned with the State’s academic content standards for the grade in which the child is enrolled….
IEP goals must be aligned with grade-level content standards for all children with disabilities
Slide14
OSERS Dear Colleague Letter (2015)
Under the IDEA, in order to make FAPE available to each eligible child with a disability, the child’s IEP must be designed to enable the child to be involved in and make progress in the general education curriculum…. the general education curriculum is the same curriculum as for nondisabled children
Slide15
Every Student Succeeds Act
Allows for testing with items “above and below the grade level tested”:
State retains the right to develop and administer computer adaptive assessments provided they meet requirements (a) not interpreted to mean that all students be administered the same assessment items, and (b) measure, at a minimum, each student’s academic proficiency based on challenging State academic standards for the student’s grade level and growth toward such standards, and may measure proficiency and growth using items above or below the student’s grade level (pp. 73-74)
Slide16
CAT Language in Draft Regulations
(c)(1) At its discretion, a State may administer the assessments required under this section in the form of computer-adaptive assessments if such assessments meet the requirements of section 1111(b)(2)(J) of the Act and this section. A computer-adaptive assessment:
(i) Must measure a student’s academic proficiency based on the challenging State academic standards for the grade in which the student is enrolled and growth toward those standards; and
Slide17
Regulations Language – cont.
A computer-adaptive assessment:
(ii) May measure a student’s academic proficiency and growth using items above or below the student’s grade level.
Slide18
Regulations Language – cont.
(2) If a State administers a computer-adaptive assessment, the determination under paragraph (b)(3)(i)(B) of this section of a student’s academic proficiency for the grade in which the student is enrolled must be reported on all reports required by §200.8 and section 1111(h) of the Act.
Slide19
Regulations Language – cont.
(d) A State must submit evidence for peer review under section 1111(a)(4) of the Act that its assessments under this section and §§200.3, 200.4, 200.5(b), 200.6(c), 200.6(f)(1) and (3), and 200.6(g) meet all applicable requirements.
Slide20
New Era: Items may be above or below student’s grade level, but… proficiency must be reported for the grade in which the student is enrolled
How should this be done to avoid unintended consequences and realize positive consequences?
Slide21
Reconsidering Grade Based Assessments: Why adaptive is not sufficient and how learning progressions could support new system designs
Cara Laitusis and Leslie Nabors Olah
Slide22
Overview of presentation
Why on-grade-level adaptive is not sufficient to measure what all students DO know
Overview of one option (Learning Progressions)
Current limitations
Recommendations for starting to move us forward
Slide23
Few guardrails
NOT arguing for an alternate assessment model ONLY for students with disabilities. Many lower performing students could benefit from a new model. However, these examples will focus on special education students as one group that is overrepresented in the lower end of the distribution.
NOT arguing for an off-grade assessment but rather a system that integrates information across grade levels to provide more instructionally relevant information to local and state educators.
Slide24
What can/has been tried?
Students excluded from state assessments
Out of Level Testing (OOLT)
Accommodations/Tools/Features
Universal Design
Modified Assessments/Within Grade Level Scaffolds
Adaptive Within Grade Level
Adaptive Out of Level or ‘Off Grade Level’
Adaptive Based on Across Grade Level Learning Progressions
Slide25
Performance Gaps and Solutions
MODIFIED ASSESSMENTS
5% of grade 5 students in California took a modified science assessment last year
GRADE BASED LINEAR
12-20% of students with disabilities responded at chance level on grade based assessment PRIOR to more rigorous college and career ready standards.
GROWTH MODELS
Growth models struggle when measurement precision is poor (e.g., when students perform in the tail of the distribution) and when the student or the testing program changes from year to year (e.g., changes in accommodations used, moving from alternate to general assessment)
GRADE BASED ADAPTIVE
While adaptive assessments may improve measurement of growth, they still struggle to provide instructionally relevant feedback for students in the extreme tails of the distribution
OFF GRADE ADAPTIVE
Off grade level adaptive has the advantage of adapting items to easier content, but performance on some standards is likely to be impacted by instructional recency.
Slide26
Blue sky
An equitable assessment system where all parts of the system operate together
Summative tests that are not burdensome to schools
Summative assessments that point to interim and formative assessments for additional information
Locus of control for interim and formative assessments is school and classroom driven
Assessment results are instructionally relevant for students performing well below grade level by providing information on where students are across a learning progression
What do they know?
What comes next?
Slide27
Learning progressions
Slide28
Learning Progressions
A learning progression describes the stages of student understanding of a concept as the student’s understanding matures from a basic understanding to a sophisticated or robust understanding.
The learning progression captures misconceptions or partial understandings that are not correct, but that can be built on to help students develop a more complete understanding.
Copyright © 2016 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS. 33537
Slide29
Proportional Reasoning Learning Progression
Slide30
Mapping to the CCSS
Grade 6
High School
Slide31
Proportional Reasoning, Levels 2 and 3
Jason has a collection of hockey cards and basketball cards. He has 240 hockey cards and 320 basketball cards.
Part A: What is the ratio of hockey cards to basketball cards in Jason’s collection?
Part B: Jason gave a friend 40 basketball cards in exchange for 40 hockey cards. State whether the ratio of hockey cards to basketball cards in Jason’s collection changed, and explain your thinking.
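The arithmetic behind this sample item can be checked with a short sketch. This is an illustration only, not an ETS scoring key; the helper name `ratio` is hypothetical, and the sketch reads "gave 40 basketball cards in exchange for 40 hockey cards" as Jason losing 40 basketball cards and gaining 40 hockey cards.

```python
from fractions import Fraction

def ratio(hockey, basketball):
    """Return the hockey:basketball ratio in lowest terms as a (h, b) pair."""
    f = Fraction(hockey, basketball)
    return f.numerator, f.denominator

# Part A: 240 hockey cards and 320 basketball cards.
before = ratio(240, 320)

# Part B: Jason gives away 40 basketball cards and receives 40 hockey cards.
after = ratio(240 + 40, 320 - 40)

ratio_changed = before != after
print(before, after, ratio_changed)
```

A student reasoning additively might argue that trading 40 for 40 leaves the ratio unchanged, which is the kind of partial understanding a learning progression is meant to surface.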
Slide32
Proportional Reasoning, Levels 3 and 4
Slide33
Limitations
Less than 100% alignment between the CCSS-M and the learning progressions.
Some domains have more complete alignment than others.
The use of LPs in assessment design may need to vary by content area.
Additional information on student understanding may require additional testing time
Multiple levels of LP may take more testing time
Slide34
Recommendations
Adaptations of the CCSS to allow for a developmental perspective on how students gain knowledge and skills over time.
Greater exploration of how learning progressions can provide a foundation for assessment rubrics.
Empirically test LPs on state assessments through careful placement of field test items
Even if we can’t report on learning progressions today, we can still move this work forward on existing summative assessments
Explore how to link state assessment scores to starting points on interim and formative assessments
Slide35
Next Steps
Integrate items aligned to existing learning progressions into existing assessments
Field test items
Across grade level linking items
Evaluate LPs (both across grade level performance for a single year and across years)
Integrate learning progression theory into system based test design.
Summative items point to interim and formative items
Interim scores point to starting point on summative assessment
Slide36
Questions?
Cara: claitusis@ETS.org, @caralaitusis
Leslie: lnaborsolah@ETS.org
Slide37
Off Grade/Out Of Level Testing
An Assessment System and Learning Progression Option
Karen Barton
CCSSO/LSAC 2016
Slide38
Summative assessments are built to capture a sampling of content, a mere snapshot from which student ability is estimated and inferences and accountability flow.
Slide39
“the situation is like trying to understand an artist’s work by examining only a few, disconnected pieces of it, or by watching only the first act of a three-act play.”
Herman, 2010
Slide40
Sampling:
Standards
Assessable Standards
Slide41
Assessable Standards
Prioritize and determine proportional representations: “domain sampling”
Slide42
Assessable Standards
Items (label repeated across the diagram)
Interpret standards and operationalize as items across various types
Slide43
Target score distribution
Items/Tasks
Slide44
Slide45
Item
Item
Item
Slide46
Item
Item
Item
Test
Scale Score
Slide47
Item Information
Cut 1
Cut 2
Slide48
Precision Decision
Tradeoff among information, precision, and instructional usefulness: measurement error is larger, and information smaller, for students in the tails.
Bielinski, J., Thurlow, M., Minnema, J., & Scott, J. (2000). http://www.cehd.umn.edu/NCEO/onlinepubs/OOLT2.html
Slide49
ALDs
Total Scores
Sub Scores
Items/Tasks
Useful, meaningful, reliable, valid, for all?
Slide50
Is off level just another way of testing a lower level form, vertically scaled?
Slide51
Assessable Standards
Items (label repeated across the diagram)
Assumes comparable instruction: standards and method
Slide52
Assessable Standards
Items (label repeated across the diagram)
May require additional effort of transfer beyond context (instructional vs. tested), especially for students with needs
Slide53
Would the sampling of standards and items be different under an instructionally focused assessment model?
Assessable Standards
Slide54
Where assessment design and intended purpose collide…
Slide55
Assessment Design Interference
Sampling of Standards:
Not all standards and parts of standards can be assessed summatively:
Even in CAT designs, practically, not everything can be tested.
Slide56
Assessment Design Interference
Psychometrically: what is “known” or not
Items are selected to be moderately easy/difficult for the most students, i.e., the middle of the bell curve
Even with extended response items, scoring expectations are often constrained to a limited set of points that is still focused on correct/incorrect.
Did the student get the item correct or not and, collectively across the test, what does that say about the student’s overall ability? Results in a single score
Slide57
Scale Score
Sub score
Achievement Level
Growth
Where are you in the ocean of learning?
How are you doing on this part of the path?
What have you learned along the way?
How much have you grown?
Slide58
Assessment Design Interference
Inferences from limited information
Overall ability:
Scale scores with little to no information on the error.
Few report a band/confidence interval about that score, which is particularly harmful for students at the tails
Sub scores with low reliability and limited usefulness
Achievement levels developed post-testing to represent what we wish for, apart from standards sampling, based on limited items within achievement levels and at cuts/thresholds. Do they really drive instruction?
Slide59
Assessment Design Interference
Growth Models
Infer students who grow more know more
Often misused and misunderstood: scale scores used to set growth targets for individual students based on changes to scale scores (regardless of SEMs); variability of “growth” within achievement level
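One way to see why ignoring SEMs matters is to compare a score change with the error attached to it. The sketch below is an assumption-laden illustration, not a method from the presentation: it treats the two years' measurement errors as independent, uses a conventional 1.96 multiplier, and the function name `gain_exceeds_error` and all score values are hypothetical.

```python
import math

def gain_exceeds_error(score_y1, sem_y1, score_y2, sem_y2, z=1.96):
    """Return True if the year-to-year change is larger than z standard errors.

    Assumes independent errors, so the standard error of the difference is
    sqrt(sem_y1**2 + sem_y2**2).
    """
    gain = score_y2 - score_y1
    se_gain = math.sqrt(sem_y1 ** 2 + sem_y2 ** 2)
    return abs(gain) > z * se_gain

# Hypothetical student near the middle of the scale: small SEMs.
mid_student = gain_exceeds_error(500, 10, 540, 10)

# Hypothetical student in the lower tail: large SEMs.
tail_student = gain_exceeds_error(380, 30, 405, 30)

print(mid_student, tail_student)
```

Under these made-up numbers the 40-point gain in the middle of the scale stands out from noise, while the 25-point gain in the tail does not, so a growth target built on the raw change alone could mislead.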
Slide60
Assessment Design Interference
Growth Models
Does growth for students at tails mean more/something different than those at the mean?
Is the test designed to measure growth at the tails and to do that well?
How do the vertical scale and equating play into the interpretation of growth, particularly at the tails? Briggs and Peck (2015) state: “We take the position that the best way to move the science behind vertical scaling forward is to place a greater emphasis on design issues.”
Slide61
Using Learning Progressions to Design Vertical Scales that Support Coherent Inferences about Student Growth
Use of learning progressions to build vertical scales and show meaningful growth, and to “be a bridge between summative and formative assessments.”
“If the sole purpose is to take a grade-specific inventory of the different knowledge and skills that students are able to demonstrate from the different domains that define math and ELA, then domain sampling is an entirely appropriate method for building a test blueprint. However, if an additional purpose is to support coherent and actionable inferences of growth, this can be accomplished at the same time by adopting a stratified domain sampling approach, where one or more strata might consist of the domain within which a learning progression has been specified.”
Briggs & Peck, 2015
Slide62
Slide63
Learning Progressions Approach
Focus on learning
Information for instruction
Helpful growth measures
Meaningful achievement levels
Bridge to multiple assessment opportunities throughout instruction, across grades
What is our story?
Slide64
GPS approach:
Multiple waypoints and data triangulation.
Never a single waypoint.
A collection, measurement, and analysis of data during learning to understand, improve, and optimize learning.
Slide65
“It’s not the destination, as much as the journey…”
Slide66
What does this have to do with grade based assessments?
Limitations to current designs to support the inferences and (mis)uses
Can we reconsider design in terms of learning inferences?
Will this help increase validity of assessments for all students: lower SEMs, higher information statistically, greater usefulness?
Simply providing a “lower level test” introduces multiple systematic errors
At what point will “off grade” no longer be relevant?
Slide67
Keep Calm
&
Just Keep Swimming
Swim, swim, swim…
Slide68
Learning Progressions Approach
Assessment Systems Design based on a scale of learning: the progression becomes the foundational scale
Sampling of standards less constrained and perhaps more focused on domain-specific vs. cross-domain sampling
Items can be developed to be moderately difficult for the targeted student group
Lower SEMs, increase information, improve usefulness of data for all students
CAT system based on constraints incorporating LP could be very powerful
Making “off level” no longer relevant
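As a rough illustration of what a CAT constrained by a learning progression might look like, the sketch below picks the next item by maximum information, restricted to a set of permitted LP levels. Everything in it is an assumption for illustration: the Rasch model, the `Item` fields, the `select_next` function, and the four-item pool are hypothetical and do not describe any operational system.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: float  # Rasch difficulty on the (hypothetical) LP-based scale
    grade: int         # grade whose standards the item is written to
    lp_level: int      # learning progression level the item targets

def information(theta, item):
    """Rasch item information p * (1 - p) at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - item.difficulty)))
    return p * (1.0 - p)

def select_next(theta, pool, administered, allowed_lp_levels):
    """Pick the unused item with maximum information among allowed LP levels."""
    candidates = [
        it for it in pool
        if it.item_id not in administered and it.lp_level in allowed_lp_levels
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda it: information(theta, it))

# Hypothetical pool spanning several grades and LP levels.
pool = [
    Item("A", -2.0, 4, 1),
    Item("B", -1.0, 5, 2),
    Item("C", 0.0, 6, 3),
    Item("D", 1.0, 6, 4),
]

# A student with a low provisional ability estimate who has already seen item B.
choice = select_next(theta=-1.8, pool=pool, administered={"B"}, allowed_lp_levels={1, 2, 3})
print(choice.item_id)
```

Because the selection is driven by the student's position on the progression rather than by a grade label, the item chosen here happens to be written to an earlier grade, yet nothing in the logic treats it as "off level".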