Standards, Testing & Accountability

  • Education reformers are right to prioritize the closing of “achievement gaps”—the disparities in academic outcomes separating comparatively advantaged (and primarily white) students from their low-income and minority peers. But there’s such a thing as prosecuting the achievement gap beyond all proportion, as this Hechinger Report story on Kentucky schools illustrates. While surveying the state’s testing progress since its (propitiously early) adoption of the Common Core, author Luba Ostashevsky focuses heavily on the fact that white third graders have improved their reading proficiency by twice as much as their black classmates (4 percent vs. 2 percent). It’s certainly true that we’d like to see those gains realized equitably, but it’s also worth highlighting—and celebrating—the fact that both groups are doing better than they were previously. Regardless of their background, most elementary schoolers know enough math to understand that achievement isn’t a zero-sum proposition.
  • The political challenges around reform can be enough to make you pine for a benevolent education dictator to establish rigorous academic standards, ample choice in schooling, and unlimited recess for all. But put down that scepter, Jefe Duncan—most of the truly important policy decisions are still made at the state level, and that’s why it’s so
  • ...
  • At the same time we wrapped up our Wonkathon on parental choice under the Every Student Succeeds Act (ESSA), the Washington Post’s Jay Mathews published a column on the new law’s implications for school accountability. With authority ostensibly withdrawn from the Department of Education, he wonders which measures—particularly non-academic ones—state-level officials will use to determine whether schools and districts are doing right by their students. It’s a question that we originally asked in our accountability system design competition this February, yielding novel proposals for student satisfaction questionnaires, school climate surveys, and the tracking of chronic absenteeism, among others. Mathews’s take is no less rewarding.
  • Meanwhile, developments in Denver are also providing a real-time examination of issues we’ve been exploring this month in our national commentary. District officials there have unveiled a new three-phase framework for shuttering underperforming schools, echoing the recent debate between Fordham’s Mike Petrilli and the University of Arkansas’s Jay Greene on the utility—or futility—of relying on test data for closures. (Jay struck a deeply skeptical note on “distant authorities” using such information to overrule parental demand, while Mike was more bullish on what regulators can learn from test scores.)
  • ...

Ohio’s student growth measure—value added—is under the microscope, which provides a good reason to take another look at its important role in school accountability and to see if there are ways it can be improved. On April 19, state Representatives Robert Cupp and Ryan Smith introduced House Bill 524, legislation that calls for a review of Ohio’s value-added measure. In their sponsor testimony, both lawmakers emphasized that their motivation is to gain a strong understanding of the measure before considering any potential revisions.

The House Education Committee has already heard testimony from the Ohio Department of Education and Battelle for Kids; it expects to hear from SAS, the analytics company that generates the value-added results, on May 17. In brief, value added is a statistical method that uses individual students’ test records to isolate a school’s impact on their academic growth over time. Ohio has included value-added ratings on school report cards since 2007–08, though the underlying data were reported in earlier years.
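For intuition about what “isolating a school’s impact on growth” means in practice, here is a deliberately simplified sketch in Python: predict each student’s current score from prior achievement, then average each school’s over- or under-performance. All data and variable names below are invented for illustration; the models SAS actually runs for Ohio are considerably more sophisticated.

```python
# Toy value-added sketch with invented data; not Ohio's actual SAS model.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_students, n_schools = 5000, 20

df = pd.DataFrame({
    "school": rng.integers(0, n_schools, n_students),  # hypothetical school IDs
    "prior": rng.normal(0, 1, n_students),             # last year's standardized score
})
# Simulate current scores: persistence of prior achievement plus a school effect.
school_effect = rng.normal(0, 0.15, n_schools)
df["score"] = (0.7 * df["prior"]
               + school_effect[df["school"]]
               + rng.normal(0, 0.5, n_students))

# Step 1: each student's expected score, given prior achievement.
slope, intercept = np.polyfit(df["prior"], df["score"], 1)
df["expected"] = intercept + slope * df["prior"]

# Step 2: a school's "value added" is how far its students, on average,
# land above or below their expected scores.
value_added = (df["score"] - df["expected"]).groupby(df["school"]).mean()
print(value_added.sort_values(ascending=False).round(3))
```

The real models wrestle with complications this sketch ignores (multiple years of prior scores, measurement error in the tests, students who change schools midyear), which is precisely the kind of detail a legislative review would need to unpack.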

As state lawmakers consider the use of value added, they should bear in mind the advantages of the measure while also considering avenues for improvement. Let’s first review the critical features of the value-added...

This study examines the impact of test-based accountability on teacher attendance and student achievement using data from North Carolina. Under the No Child Left Behind Act (NCLB), schools that failed to make “Adequate Yearly Progress” (AYP) toward universal proficiency in consecutive years faced a series of escalating sanctions. Thus, teachers at schools that failed one year had a strong incentive to boost achievement in the next, while those at other schools faced a weaker incentive.

Using a difference-in-differences approach that compares these groups, the author estimates that failing to make AYP in NCLB’s first year led to a 10 percent decline in teacher absences in the following year (roughly one fewer absence per teacher). He also estimates that each additional teacher absence reduces math achievement by about .002 standard deviations, implying that schools that failed to make AYP saw a boost of roughly .002 standard deviations from improved teacher attendance alone. However, in a separate analysis, he shows that the threat of sanctions led to a .06 standard deviation improvement in math achievement in the following year, suggesting that improved teacher attendance accounted for just 3 percent of all accountability-driven achievement gains.
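That 3 percent figure is simple arithmetic on the paper’s own point estimates. A back-of-the-envelope check, using only the numbers reported above:

```python
# Back-of-the-envelope decomposition using the estimates quoted above.
absences_avoided = 1.0      # ~one fewer absence per teacher after failing AYP
effect_per_absence = 0.002  # SDs of math achievement lost per teacher absence
total_gain = 0.06           # total accountability-driven gain, in SDs

attendance_gain = absences_avoided * effect_per_absence    # = 0.002 SD
print(f"{attendance_gain / total_gain:.0%} of the total")  # -> 3% of the total
```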

In addition to the general decline in teacher absences,...

  • The heroic journalism of the Boston Globe in exposing pedophilia enabled by the Catholic Church was the focus of last year’s Oscar-winning Spotlight. Now the paper has trained its attention on New England preparatory schools, where some allegations of misconduct date back a half-century or more. Its survey of the claims is penetrating and comprehensive: Nearly seventy such schools have faced complaints of sexual harassment or abuse in the last twenty-five years, with accusations lodged by two hundred alleged victims. And we have no reason to believe that the exploitation is limited to private schools; as a 2004 literature synthesis undertaken by the Department of Education makes clear, sexual misconduct plagues schools across the country and in every sector.
  • At one point, forty-four states were affiliated with one of the two next-generation testing consortia (PARCC and Smarter Balanced) that arose with the widespread adoption of the Common Core. This spring, just twenty-one of those states will be administering the tests. Chalkbeat has published a thorough account of the political machinations that overtook the assessments, as well as the efforts of legislators to pull away from them. In dozens of states, what followed was chaos. Curricular experts were
  • ...

The school choice tent is much bigger than it used to be. Politicians and policy wonks across the ideological spectrum have embraced the principle that parents should get to choose their children’s schools and local districts should not have a monopoly on school supply.

But within this big tent, there are big arguments about the best way to promote school quality. Some want all schools to take the same tough tests and all low-performing schools (those that fail to show individual student growth over time) to be shut down (or, in a voucher system, to be kicked out of the program). Others want to let the market work to promote quality and resist policies that amount to second-guessing parents.

In the following debate, Jay Greene of the University of Arkansas’s Department of Education Reform and Mike Petrilli of the Thomas B. Fordham Institute explore areas of agreement and disagreement around this issue of school choice and school quality. In particular, they address the question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with...

Editor's note: This post is the sixth and final entry in an ongoing discussion between Fordham's Michael Petrilli and the University of Arkansas's Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, here, here, and here.

Shoot, Jay, maybe I should have quit while we were ahead—or at least while we were closer to rapprochement.

Let me admit to being perplexed by your latest post, which has an Alice in Wonderland aspect to it—a suggestion that down is up and up is down. “Short-term changes in test scores are not very good predictors of success,” you write. But that’s not at all what the research I’ve pointed to shows.

Start with the David Deming study of Texas’s 1990s-era accountability system. Low-performing Lone Star State schools faced low ratings and responded by doing something to boost the achievement of their low-performing students. That yielded short-term test-score gains, which were related to positive long-term outcomes. This is the sort of thing we’d...

Editor's note: This post is the fifth in an ongoing discussion between Fordham's Michael Petrilli and the University of Arkansas's Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, here, and here.

Mike, you say that we agree on the limitations of using test results for judging school quality, but I’m not sure how true that is. In order not to get too bogged down in the details of that question, I’ll try to keep this reply as brief as possible.

First, the evidence you’re citing actually supports the opposite of what you are arguing. You mention the Project Star study showing that test scores in kindergarten correlated with later life outcomes as proof that test scores are reliable indicators of school or program quality. But you don’t emphasize an important point: whatever benefits students experienced in kindergarten that produced those higher test scores did not go on to produce higher test scores in later grades, even though they did produce better later-life outcomes....

Editor's note: This post is the fourth in an ongoing discussion between Fordham's Michael Petrilli and the University of Arkansas's Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, and here.

I think we’re approaching the outline of a consensus, Jay—at least regarding the most common situations in the charter quality debate. We both agree that closing low-performing schools is something to be done with great care, and with broad deference to parents. Neither of us wants “distant regulators” to pull the trigger based on test scores alone. And we both find it unacceptable that some states still use test score levels as measures of school quality.

I think you’re right that in the vast majority of cases, charter schools that are closed by their authorizers are weak academically and financially. Parents have already started to “vote with their feet,” leaving the schools under-enrolled and financially unsustainable. Closures, then, are akin to euthanasia. That’s certainly been our experience at...

Previous research has found that oversubscribed urban charter schools produce large academic gains for their students. But are these results related to test score inflation (defined by one assessment expert as increases in scores that do not signal a commensurate increase in proficiency in the domain of interest)? In other words, do these schools merely figure out how to prepare their students to do well on the high-stakes exam, or are they contributing to real learning writ large?

To explore this question, a recent study examines state testing data from 2006 to 2011 at nine Boston middle school charters with lottery-based admissions. By exploiting the random nature of the lottery system, prior studies have found that these schools produce substantial learning gains on the Massachusetts Comprehensive Assessment System (MCAS).

To carry out the analysis, author Sarah Cohodes breaks down the learning gains by the various components of the state assessment—akin to how one might disaggregate overall gains by student subgroup. A math assessment might contain several different testing domains (e.g., geometry versus statistics), with some topics being tested more frequently than others. Cohodes’s hypothesis is as follows: If the gains are attributable to score inflation, we might expect to see stronger results on...
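The mechanics of that disaggregation are easy to picture. Here is a toy sketch of the domain-by-domain comparison, with invented numbers and hypothetical column names; Cohodes’s actual analysis exploits the admissions lotteries rather than a raw difference in means:

```python
# Toy illustration of disaggregating charter gains by testing domain.
# Data and column names are hypothetical, not Cohodes's MCAS data.
import pandas as pd

scores = pd.DataFrame({
    "won_lottery": [1, 1, 1, 1, 0, 0, 0, 0],
    "domain": ["geometry", "statistics"] * 4,
    "z_score": [0.62, 0.25, 0.55, 0.31, 0.10, 0.02, 0.08, -0.04],
})

# Mean standardized score by domain for lottery winners vs. losers,
# then the winner-loser gap within each domain.
means = scores.groupby(["domain", "won_lottery"])["z_score"].mean().unstack()
gap_by_domain = (means[1] - means[0]).rename("winner_loser_gap")
print(gap_by_domain)
```

In the score-inflation literature generally, inflated gains tend to concentrate on the material a test emphasizes most heavily, so gains that hold up even on rarely tested domains are harder to dismiss as mere test preparation.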
