How does phasing out paper-and-pencil tests affect student performance?

Virtually every state has moved, incrementally or fully, from administering standardized assessments on paper to administering them online, so there’s been plenty of discussion about whether taking a test on a computer or tablet, rather than with paper and pencil, affects student performance. (Recall that the same issue was raised when NAEP scores were released in April, with Louisiana voicing particular concerns.) In other words, is it true that “mode effects,” meaning score differences that stem from the physical medium by which one takes a test, can depress scores in ways that have much to do with the medium and little to do with what students know and can do?

Enter Ben Backes and James Cowan from CALDER, who examine, first, whether students who take Massachusetts’s state test online score systematically lower than they would have on the same test taken on paper; and second, whether there are any differences across subgroups of students.

Recall that Massachusetts in 2015 and 2016 administered the PARCC test both on paper and online. And in 2015 districts could choose between MCAS (the old state test) and the new PARCC test. During this period of flux, state officials agreed to a hold-harmless provision for all schools administering the PARCC in either year, whether on paper or online, meaning that no school’s accountability rating could fall as a result of its PARCC scores. Descriptive findings show that 72 percent of elementary and middle schools switched to PARCC in either 2015 or 2016. Of those, 57 percent administered the test online at least once.

The CALDER study examines schools that administered PARCC in both years, which includes about half of Massachusetts students enrolled in grades 3–8 between 2011 and 2016 and 88 percent of students in schools administering PARCC in 2015 and 2016. Backes and Cowan use two analytic approaches: a linear regression (OLS, or ordinary least squares) and a difference-in-differences design that compares how scores changed over time for the treatment group (those tested online) with how they changed for the control group (those tested on paper). They use scores from the MCAS paper assessments as control variables in both approaches.
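For readers unfamiliar with the difference-in-differences logic, a minimal sketch may help. It is not the authors' model, which is a regression with control variables; it shows only the basic two-group, two-period comparison. The function name and every score below are hypothetical, invented purely for illustration.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical standardized scores (in SD units), NOT data from the study.
online_before = [0.10, 0.05, 0.00]    # schools that later switch to online testing
online_after = [-0.05, -0.10, -0.15]  # the same schools after the switch
paper_before = [0.02, 0.00, -0.02]    # schools that stay on paper
paper_after = [0.07, 0.05, 0.03]      # the same paper schools, later year

estimate = diff_in_diff(online_before, online_after, paper_before, paper_after)
print(f"Illustrative mode-effect estimate: {estimate:.2f} SD")
```

Subtracting the control group's change removes any shift that hit both groups alike, such as a harder test form in the later year, so what remains is attributed to the switch in mode.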

They find “test mode effects” of about 0.10 standard deviations (SD) in math and 0.25 SD in English language arts (ELA). These effects equate to about 5.4 months of learning in math and eleven months of learning in ELA in a nine-month school year—obviously both substantial. They find similar results across both models. They also perform a number of robustness checks to ensure, for instance, that preexisting trends in school outcomes aren’t driving their results, and they determine that is not what’s occurring.
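The conversion from standard deviations to months is not spelled out above. One common approach, assumed here, divides an effect by the typical achievement gain in one school year and multiplies by the nine months in that year. Under that assumption, the quoted figures imply the annual gains computed below. These are back-of-the-envelope inferences, not numbers reported by the authors, and the function names are illustrative.

```python
SCHOOL_YEAR_MONTHS = 9  # length of the school year quoted in the text


def implied_annual_gain_sd(effect_sd, months, year_months=SCHOOL_YEAR_MONTHS):
    """Annual gain (in SD) implied by equating effect_sd with `months` of learning."""
    return effect_sd * year_months / months


def months_of_learning(effect_sd, annual_gain_sd, year_months=SCHOOL_YEAR_MONTHS):
    """Express an effect size as months of learning, given a typical annual gain."""
    return effect_sd / annual_gain_sd * year_months


# Effect sizes and month equivalents as quoted in the text.
math_gain = implied_annual_gain_sd(0.10, 5.4)
ela_gain = implied_annual_gain_sd(0.25, 11)

print(f"Implied annual gain, math: {math_gain:.3f} SD")
print(f"Implied annual gain, ELA:  {ela_gain:.3f} SD")
```

Because the implied annual gains differ by subject, an effect 2.5 times as large in ELA translates to roughly twice as many months, not 2.5 times as many.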

Next, Backes and Cowan look to see if there are mode effects in the next year, to check whether this was a one-time phenomenon. They still find differences that can be attributed to mode, though for second-time test takers the effects are about one-third as large as in the first year in math, and about half as large in ELA. Additional analyses also suggest that student familiarity with the tests is what’s driving the reduced mode effect seen in year two. Finally, they find little variation by subgroup, except that mode effects are more pronounced in ELA for students scoring at the bottom of the distribution.

When scores were first released for all PARCC states, testing officials explained that mode effects were an issue, but that it would be up to individual state and district leaders to determine the scope of the problem, as well as what to do about it. Backes and Cowan take that advice one step further, recommending that states take test mode effects into account when using assessment scores for accountability purposes.

In 2017, Massachusetts started to use statistical adjustments to correct for mode effects—at least during the transition from paper to online. Given officials’ propensity in that state to make wise decisions and witness subsequent success, other states should do their homework and perhaps follow suit.

SOURCE: Ben Backes and James Cowan, “Is the Pen Mightier Than the Keyboard? The Effect of Online Testing on Measured Student Achievement,” CALDER (April 2018).

Amber M. Northern, Ph.D.
Amber M. Northern, Ph.D. is the Senior Vice President for Research at the Thomas B. Fordham Institute.