Why Tom Loveless is wrong about NAEP achievement levels

My friend Tom Loveless is right about most things, and he’s certainly right that scoring “proficient” on NAEP has nothing to do with being “on grade level.” He’s also right that Campbell Brown missed this point.

But Tom, alas, is quite wrong about the value of NAEP’s trio of “achievement levels” (basic, proficient, advanced). And he’s worse than wrong to get into any sort of defense of “grade level,” as if that concept had any valid meaning or true value for education reform.

In his words, Tom’s post sought “to convince readers of two things: One, proficient on NAEP does not mean grade-level performance. It’s significantly above that. Two, using NAEP’s proficient level as a basis for education policy is a bad idea.”

We agree on the first point, not on the second—and not on his implicit argument that there is merit in basing education policy on “grade-level” thinking.

Unless one is talking about academic standards—Common Core or otherwise—or about the cut scores on high-stakes, end-of-year, criterion-referenced exams like PARCC and Smarter Balanced, “grade level” has no meaning at all. It’s a misnomer that we adopted during decades of using norm-referenced tests. These were “normed” such that the average score of, say, fifth graders taking the test was deemed to be “fifth-grade-level” work. But that score was simply the average achieved by kids enrolled in fifth grade. It had nothing to do with whether they had mastered the fifth-grade curriculum, had attained a fifth-grade standard, or were well prepared for academic success in sixth grade. It’s nothing more than the score attained by the average student.

That bit of folly led to decades of profoundly misleading reporting of academic performance, memorably skewered by a West Virginia psychiatrist named John J. Cannell in a 1987 study that swiftly became known as the “Lake Wobegon Report” (after Garrison Keillor’s mythical town where “all the women are strong, all the men are good-looking, and all the children are above average”). District after district was reporting to the public that most of their pupils were scoring “above average” or “above grade level”; on those tests, these meant the exact same thing. (And both are mathematically impossible.)

Standards are different. They’re aspirational statements of what students in a given grade should learn, even though we’re painfully aware that most don’t—at least not when the standards are rigorous and demanding enough that those who master them are truly on track for success in college or the job market.

As for cut scores: If correctly calibrated to signify readiness for academic success in the following grade—such as a 4 on the PARCC exam—they signify that the test taker has done “grade-level” work, properly understood.

But the percentage of American students getting a 4 or higher on PARCC is about the same as the percentage reaching “proficient” on NAEP (in the grades where NAEP is given)—which is to say, not nearly enough. In Maryland, for example, the 2015 PARCC scores generally showed 30–40 percent of students reaching levels 4 or 5. About the same percentages of the state’s fourth and eighth graders tested “at or above proficient” on NAEP that year in both reading and math.

Which brings us back to NAEP. When the achievement levels were established in the early 1990s, while I was on the National Assessment Governing Board (NAGB), state leaders and others were hungry to answer the question “How good is good enough?” on various gauges of student and school performance. The authors of A Nation at Risk had to rely on norm-referenced test results and SAT scores to form their bleak conclusions about the parlous condition of American education. Meeting in Charlottesville six years later, the governors and President Bush 41 set “national goals” for American education by the year 2000. One of those goals ambitiously declared that, by century’s end, “American students will leave grades four, eight, and twelve having demonstrated competency in challenging subject matter including English, mathematics, science, history and geography.”

But who was to say what “competency in challenging subject matter” meant, or how to gauge student progress toward it?

Using new statutory authority conferred in 1988, NAGB resolved to try to answer that question by establishing “achievement levels.” These benchmarks would enable NAEP results to be reported according to how well students (and states, etc.) were actually doing, rather than in relation to “scale scores” that have meaning only to psychometricians.

We agonized over how many levels there would be and what to call them, eventually settling on three and boldly declaring that the middle level—“proficient”—was the desired level of educational performance for young Americans.

Yes, it was aspirational (just like Common Core and scores of 4 or 5 on PARCC!). Yes, we knew that most young Americans weren’t there yet. Yes, the achievement levels were destined to be controversial—Loveless summarizes that history. Seems it’s not possible for “experts” at places like the National Academy of Sciences to countenance anything that is ultimately based on human judgment rather than some sort of experiment or regression.

It’s also a fact that many people thought (and still think) that NAEP’s achievement levels, especially “proficient,” expect too much from American schools and students. But, guess what? Recent painstaking research has shown that proficiency in twelfth-grade reading on NAEP equates pretty closely to college readiness. (The corresponding math score is closer to proficient than to basic.) Tom seems to think that the complaints about NAEP’s difficulty are proven by the fact that even high-performing Asian countries boost only 60–70 percent of their kids to the TIMSS equivalent of “NAEP proficient” in eighth-grade math. To which I reply: For the United States to reach that point would be nothing short of transformational.

If NAEP’s achievement levels are too ambitious for American students, then we have further evidence (as if any were needed) that today’s actual performance by the majority of those students is a long, long way from where it needs to be if we’re at all serious about their readiness for college-level work.

For more than two decades now, NAEP achievement levels have been the closest thing America has had to “national standards.” Yes, they’re ambitious—at least “proficient” and “advanced” are—but so is the goal of getting young Americans prepared for success in college and career.

One presumes that Tom Loveless, former teacher and great guy that he is, shares that aspiration. If so, he should quit knocking the achievement levels. (They already have plenty of wrong-headed critics without him joining the chorus.) And he should explain to Campbell Brown and others that “grade level,” as commonly used, is a hollow and meaningless metric.

Chester E. Finn, Jr.
Chester E. Finn, Jr. is a Distinguished Senior Fellow and President Emeritus of the Thomas B. Fordham Institute.