Monday, January 14, 2019

January Board of Ed: update on automated test scoring

The backup is here, but there is also a PowerPoint
Wulfson: have been studying for some time
interested in the potential of computer-assisted scoring of MCAS essays
which could return results faster and open more opportunities for computer-adaptive testing in the future
"always valued having open response questions on our MCAS"
"five years ago we wouldn't have even been having this discussion" but remarkable advances in artificiel intelligence
one of the areas in which "we are proud to say we haven't been an early adopter" but have taken our time in considering our response
not expecting perfection; human scoring is also fallible
not expecting in depth feedback, as that is the role of the teacher
the question: is it of equal quality to current human scoring?

Stapel: overview of current scoring then overview of automated scoring
1.5M ELA essays scored in spring 2019 in 8 states; about 60% of scorers have teaching experience
scored against two traits: idea development and conventions
"we also look for the unusual responses we might get, so scorers can be trained on those"
have to be trained on each individual item
100% double blind scoring for grade ten; student gets higher of two scores
scorer reliability: exact, adjacent, discrepant
There have now been two pilots, in 2017 and in 2018
in both cases finding parallel results with human scoring
"in particular...tended to show high rates of agreement with scores assined by expert scorers"

guarding against gaming of automated scoring: text that isn't actually an essay; repetition; padding for length; copying the source text
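To make the gaming checks above concrete, here is a rough sketch of what such flags might look like. The thresholds, function name, and routing are entirely my own invention, not the vendor's actual detection logic.

```python
# Sketch only: crude heuristics for the gaming tactics listed above.
# Every threshold here is made up for illustration.
def gaming_flags(response: str, source_text: str) -> list[str]:
    flags = []
    words = response.split()
    if len(words) < 25:
        flags.append("too short to be an essay")
    if len(words) > 1500:
        flags.append("unusually long / padded for length")
    if words and len(set(words)) / len(words) < 0.3:
        flags.append("highly repetitive")
    source_words = set(source_text.lower().split())
    if words and sum(w.lower() in source_words for w in words) / len(words) > 0.8:
        flags.append("mostly copied from source text")
    return flags  # a flagged response would presumably be routed to a human scorer
```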

Doherty: at this point, it appears you're presenting that the validity is equal to that of human scoring
agreement from presenters

Craven: you'd mentioned that there might be savings...
Wulfson: there are so many variables, implementation costs, start-up costs
"more likely to kick out a paper that it isn't sure of, have to pay for human scorers to check those"
"if there is some savings, great, but that's not what's driving this"

Stewart: you'd mentioned advisory groups you're working with?
Stapel: working with contractor, doing internal analysis
technical advisory committee "who are fairly skeptical"

Peyser: but is there time savings?
Wulfson: absolutely; "we get machine scoring back very quickly"; getting scores back to schools is what takes time
and it ties to the Commissioner's vision of adaptive testing
"goal has always been to return MCAS scores to schools within the same year...I think we're still quite a ways away from, your driver's test, did you pass or fail"

Plan is to use automated scoring in grades 3-8 as the second (double blind) score only, on at least one item per grade
all human scoring in grade 10
the higher score is always awarded when the two scores are adjacent
analyzing results over summer; update in the fall
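A minimal sketch of the resolution rule as described above: with double-blind scoring the student receives the higher of the two scores when they are exact or adjacent. Sending a discrepant pair to a human resolution read is my assumption, not something stated in the presentation, and the function name is invented.

```python
# Sketch only: resolving a double-blind score pair under the rules described above.
def resolve_pair(score_1: int, score_2: int):
    gap = abs(score_1 - score_2)
    if gap <= 1:                      # exact or adjacent
        return max(score_1, score_2)  # student gets the higher score
    return None                       # discrepant: assume a human resolution read

assert resolve_pair(3, 3) == 3       # exact
assert resolve_pair(2, 3) == 3       # adjacent -> higher score
assert resolve_pair(1, 4) is None    # discrepant -> human resolution (assumed)
```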
