The testing mess
August 04, 2010
The only thing surprising about last week’s revelation that the fraction of New York City students passing the state’s reading and math tests had dropped by an average of 25 percentage points is that anyone was surprised at all. Student pass rates dropped precipitously all across New York State for one reason, and one reason only: State education commissioner David Steiner and Board of Regents chancellor Merryl Tisch decided to make the tests less predictable this year, and to raise the “cut scores” required for students to reach each of four designated achievement levels—below basic, basic, proficient (i.e., “passing”), and advanced. Student achievement levels had risen spectacularly from 2007 to 2009 because a different group of Albany education authorities decided to lower the bar for proficiency by reducing the cut scores.
Such elasticity in the definition of student achievement is one of the nation’s most serious education problems. The No Child Left Behind (NCLB) Act of 2001 left the door wide open to massive test inflation by stipulating that all American students “will be proficient” by the year 2014—and imposing a series of increasingly onerous sanctions on districts and schools that do not move fast enough toward that goal—yet allowing each state to develop its own tests and set its own standard for “proficiency.” Since men are not angels, it was inevitable that state and local education authorities would lower the proficiency bar to make themselves look good politically and avoid federal sanctions.
The best evidence of test-score inflation is the wide gap between the number of students that states deem proficient on their own tests and the number called proficient by the National Assessment of Educational Progress (NAEP), often referred to as the “nation’s report card.” The NAEP tests are the gold standard in student assessment because educators can’t game them: the federal tests are given to only a sample of students in each state, so teachers can’t “teach to the test” and schools can’t offer students practice tests.
The problem of test inflation has been particularly acute in New York. As shown in two separate state-comptroller reports, one in 1991 and another last year, the state’s education department has historically failed to maintain the integrity of the testing system (by, for example, establishing a standardized scoring system and verifying its use). The situation became far worse in 2002, when NCLB came into effect and mandated reading and math exams for grades three through eight. The state education department should have hired a highly qualified director of assessment, someone committed to creating an honest and transparent testing regime. Instead, the job went to David Abrams, a high-school English teacher who had spent ten years as an administrator in an Albany-area school district. Abrams lacks professional credentials in the field of education testing. One member of the Regents told me that the testing director “has no qualifications for the job, and he’s responsible for many of our blunders on the tests.”
Abrams’s most consequential blunder was ignoring a warning from assessment experts Daniel Koretz and Howard Everson about the integrity of the state tests. In a September 2008 memorandum to Abrams, they cited growing public skepticism about the reported score gains and requested the education department’s “support for a program of validation studies” to measure the extent of “score inflation and the undesirable instructional activities that produce it.” The inflation was produced not only at the state level, through lowered standards, but also locally, through such practices as “teaching to the test” and having teachers grade their own students, and possibly even through cheating.
Abrams shrugged off the experts’ warning, and scores on the 2009 state tests then reached astronomical levels. In many school districts, the share of students scoring above the proficiency bar was nearly 100 percent. It was even possible for test takers to reach the “basic” level by simply guessing on all the multiple-choice questions, while ignoring test items that required longer written answers. Not surprisingly, almost no students in the state scored below the basic level in 2009. I’ve called these results the “Lake Wobegon test scores,” after Garrison Keillor’s tales about a town where “all the children are above average.”
For Mayor Bloomberg and Schools Chancellor Joel Klein, test inflation was the gift from Albany that kept on giving, and they found ways to build even higher monuments to their reputations as school reformers. The city offered school employees a variety of inducements, including cash payments, for pushing test scores up even further. The Bloomberg administration didn’t bother to ask too many questions about how the deed was done. Principals received cash bonuses of up to $25,000, and thousands of teachers were offered smaller bonuses for improved test scores, a powerful incentive to inflate the results by any means necessary. Finally, the city provided an additional tool, the “Predictive Assessment,” to help teachers get the scores up. This is essentially a test-prepping device disguised as a mini-test that students take once a year; it closely reflects the blueprint and structure of the state tests.
To believe that the rising state and city test results had any objective validity was, by 2009, to believe that education nirvana had arrived in the Empire State. The new chancellor of the Board of Regents, Merryl Tisch, made it clear that she didn’t believe it. Tisch suspected that state education officials, including outgoing education commissioner Richard Mills, were deliberately setting the cut scores low, leading to the big boost in test results in 2008 and 2009. She not only brought in the reform-minded David Steiner to succeed Mills, but also leaned on the education department’s lethargic bureaucracy to provide comprehensive student test data to Koretz, one of the country’s leading testing experts. Koretz and his team of Harvard researchers will produce a long-range empirical study that promises to pinpoint the extent and source of the test inflation of the past few years.
Tisch and Steiner deserve the public’s praise for their courage in challenging some of the powerful political interests in education. One of the most important results of their recalculation of cut scores this year was the debunking of Mayor Bloomberg’s claim that his reforms had led to a significant narrowing of the black–white achievement gap. Still, Tisch and Steiner have taken only the first small steps toward creating a fair and transparent assessment system. Such a system should encourage classroom teachers to teach a well-rounded curriculum and then test students on their mastery of its academic content. This is Tisch and Steiner’s long-term challenge. In the meantime, it would certainly help if they were able to hire a testing director with a national reputation and a commitment to assessment reform. It is dismaying to discover that David Abrams, the Albany bureaucrat who was squarely in the middle of the test-inflation scandals of the past few years, is still New York’s state testing director.
This piece originally appeared on National Review Online.
Stern is a senior fellow at the Manhattan Institute and a contributing editor to the Institute’s City Journal.