As the big education bill limps through Congress, much debate centers on how to determine whether states are making real achievement gains, how to track those gains (or losses), and how best to compare states with each other - and with the country. In the New York Times of June 6, columnist Richard Rothstein contends that Congress should forget about state-specific tests and instead rely exclusively on mandatory state participation in the National Assessment of Educational Progress (NAEP), which, he says, is a better test that yields better data from a sample of just 2,500 or so kids per state while dampening "teach to the test" temptations. As you may recall, the original Bush proposal - and the pending Senate bill - use NAEP as an external "audit" of a state's results on its own test; the House bill would let states use NAEP or some other instrument of their own choosing, but again only for audit purposes. All versions assume, indeed require, that each state will also give its own test, at least in reading and math, to every child in grades 3-8.
I'm a long-time NAEP partisan, indeed one of the (many) parents of state-level NAEP, and I strongly favor its use by states for external audit purposes. But we mustn't expect it to bear too much weight or be too precise. The most notable feature of NAEP trend data, after all, is the flat line. Scores just don't vary much from year to year. NAEP is really good at detecting long-term shifts in overall performance, at calibrating the performance of large groups of kids against the closest thing we have to national academic standards, and at facilitating big comparisons such as state-to-state and state-to-nation. But the more detail you want from NAEP, the less robust the data become. Rothstein asserts, for example, that NAEP can tell us how "urban Hispanic students" are doing. That's true at the national level. But the state samples aren't nearly large enough to yield such fine-grained information. For that level of detail we'd need a much bigger NAEP - which brings its own political battles and costs. In the real world, to get detailed data about the progress of various "disaggregated" groups of kids within states, we're going to have to rely mainly on states' own tests. Problem is, we know that many of those tests leave a lot to be desired.
Monday's Wall Street Journal vividly illustrated the danger of expecting too much precision from any system that depends on smallish state-level samples. On the front page, we read the disturbing news that even so familiar a statistical source as monthly unemployment data may not be reliable at the state level. "The problem," explained reporter Clare Ansberry, "is the rates are imperfect because they are based on samples." She showed how Ohio's March and April jobless data, as calculated by the federal Bureau of Labor Statistics, were simply screwy. It seems that a sample of (in this case) 2,000 households isn't robust enough to yield solid results when a lot of other things are going on in the "system." This should serve as a caution to Rothstein and others who expect a sample-based national test (whether NAEP or another) to function as the primary gauge of state academic performance.
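To put rough numbers on that imprecision, here is a back-of-the-envelope sketch of my own - it comes from neither article, and it assumes simple random sampling and invented rates, which is not how the real surveys work. The 95 percent margin of error for an estimated percentage shrinks only with the square root of the sample size, so a statewide sample of a couple thousand can pin down the overall figure to within a point or two, while a subgroup of a few hundred within that sample does much worse.

    import math

    # Approximate 95% margin of error for a proportion p estimated from a
    # simple random sample of n people: 1.96 * sqrt(p * (1 - p) / n).
    def margin_of_error(p, n):
        return 1.96 * math.sqrt(p * (1 - p) / n)

    # Illustrative, invented figures only: a 5% jobless rate from ~2,000
    # households, a 30% proficiency rate from ~2,500 students statewide,
    # and the same rate for a 250-student subgroup within that sample.
    for label, p, n in [("jobless rate, n=2,000", 0.05, 2000),
                        ("statewide NAEP, n=2,500", 0.30, 2500),
                        ("subgroup, n=250", 0.30, 250)]:
        print(f"{label}: {p:.0%} +/- {margin_of_error(p, n):.1%}")

On those made-up numbers, the statewide estimates wobble by a point or two and the subgroup estimate by five or six - which is why modest real changes for a subgroup, or month-to-month shifts in a state's jobless rate, can vanish into sampling noise.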
"Test Here and There, Not Everywhere," by Richard Rothstein, New York Times, June 6, 2001
"States Discover It is Hard Work to Figure Their Jobless Rate," by Clare Ansberry, Wall Street Journal, June 1, 2001