Tackling the Yo-Yo effect: Improving Ohio's value-added model

Ohio has made adjustments to its value-added model ahead of the release of 2009-10 school performance data next Friday. In anticipation of these changes, we commissioned the education data and policy experts at the North Carolina-based firm Public Impact to help us understand them and what they mean for Ohio’s schools, teachers, and students. We are pleased to share their insights with our Gadfly audience.

Since 2007, the Buckeye State has used a value-added analysis to measure student academic growth over time. In 2008, the state revised its statewide school accountability system to take into account student progress from one year to the next. As we wrote then, the Ohio value-added model compares student gains to the improvement made by similar students to evaluate whether those gains were “above expected growth,” “at expected growth,” or “below expected growth.” If students are making above expected growth or expected growth, after controlling for relevant factors, we can conclude that their teachers and schools are “adding value.”

Value-added measurements are complicated. Indeed, “value-added” analysis requires statisticians to make various choices regarding methodology, each with its own pros and cons. These approaches are much discussed and argued over by experts. But as we wrote in 2008, value-added analysis provides parents, educators, and policymakers with essential information about the performance of teachers and schools.

The Yo-Yo Effect

Or at least it can, in theory. In practice, the first two years of value-added results in Ohio have raised some serious concerns about the efficacy of the state’s value-added methodology. As Cleveland State University researcher Douglas Clay observed, and the Dayton Daily News reported, the results showed wild variations in performance between grades: a "yo-yo effect" under which few students make expected growth in one year, then almost all make above expected growth the next, or vice versa.

In 2008, for example, 83 percent of students made below expected growth in fifth-grade reading. The next year, when those students were in sixth grade, 98 percent made above expected growth. A similar pattern appeared between sixth- and seventh-grade students (see Figure 1). As Dr. Clay wryly observed, it is hardly “plausible that all fifth-grade teachers and students in the state had a very bad school year in 2007-08, or that all sixth-grade ones had a great year in 2008-09.”

Figure 1: Percentage of Ohio Schools Making Growth in Reading, 2008 – 2009

Source: Ohio Department of Education, “Building Value-Added Data, 2007 – 2009” Available: http://ilrc.ode.state.oh.us/Downloads.asp

Looking only at these results, it’s clear that something was amiss in the value-added model. But what caused this yo-yo effect, and what can be done to correct it?

The Culprit

All calculations that rely on statistical methods require defensible assumptions to validate their results. The methodology used in Ohio’s value-added model makes two important assumptions relevant to the yo-yo problem. First, the ability level of student cohorts will change little from year to year across the state. In other words, this year’s fifth graders are not tremendously brighter (or less bright) than last year’s fifth graders. This is a fairly safe assumption, barring the introduction of a revolutionary new brain food or the disruption of a natural disaster.

The second assumption underlying the value-added model is that the rigor of state tests will be consistent across both time and grade levels. That is, the state’s fifth-grade math test should be as rigorous in 2009 as it was in 2007, and the fifth-grade math test should have the same grade-level rigor as the sixth-grade math test. This consistency is essential because the value-added model uses student performance in a baseline year, 2007, to benchmark and predict student performance in subsequent years. If state tests are remarkably easier from one year to the next or from one grade level to the next, it will appear that most students are making exceptional gains. If, on the other hand, tests become remarkably harder, it will appear that most students are making below expected gains.
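To make the mechanism concrete, here is a minimal sketch of how a change in test difficulty alone can flip nearly every rating. It is not Ohio's actual model: the gains, the expected gain of 5 points, the one-point tolerance band, and the function names are all made up for illustration.

```python
def classify(gain, expected_gain, tolerance=1.0):
    """Label a measured gain relative to the expected gain.

    The tolerance band is a hypothetical stand-in for whatever margin
    a real model would use around "expected growth".
    """
    if gain > expected_gain + tolerance:
        return "above"
    if gain < expected_gain - tolerance:
        return "below"
    return "at"


def share_by_label(true_gains, expected_gain, difficulty_shift):
    """Share of schools in each category when the test is
    `difficulty_shift` points harder (positive) or easier (negative)."""
    labels = [classify(g - difficulty_shift, expected_gain) for g in true_gains]
    return {k: labels.count(k) / len(labels) for k in ("above", "at", "below")}


# Made-up true gains for five schools; real learning is identical in every scenario.
true_gains = [3.0, 4.0, 5.0, 6.0, 7.0]

print(share_by_label(true_gains, expected_gain=5.0, difficulty_shift=0.0))
print(share_by_label(true_gains, expected_gain=5.0, difficulty_shift=4.0))   # harder test
print(share_by_label(true_gains, expected_gain=5.0, difficulty_shift=-4.0))  # easier test
```

With a consistent test the schools spread across all three categories. Make the test four points harder and every school lands "below"; make it four points easier and every school lands "above," even though the underlying gains never changed.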

State data suggest that the second assumption is behind the yo-yo effect. First, there is wide variation in rigor between different grade-level assessments. This is evident when we compare average scale scores between grades. Fourth graders in 2006 saw their average reading score drop more than 10 points when they went to fifth grade, and then rise by more than 10 points in sixth grade. As a result, it appears that the percentage of proficient students falls dramatically between fourth and fifth grade, only to rise again between fifth and sixth grade.

Second, between 2007 and 2008 the state fifth-grade reading test appears to have gotten significantly more rigorous. During that time, average student performance in fifth-grade reading fell by almost eight scale score points (more than one fourth of a standard deviation) statewide. It’s unlikely that a drop of that magnitude would occur naturally.

Test rigor likely varies between grade levels because Ohio’s assessments were not developed with a value-added analysis in mind. When Ohio designed its state exams, the test for each grade was developed independently, rather than in a coordinated effort to ensure consistency of rigor across grades. And since test designers change specific test items every year so that students and teachers never know exactly what questions to expect, test rigor may also vary on the same grade-level test from one year to another. Taken together, these two issues are the most likely culprits behind the yo-yo effect.

Possible Fixes

The most obvious solution to Ohio’s value-added dilemma is to revamp the state assessments so they fluctuate less in rigor from grade to grade and year to year. State officials are, in fact, beginning to design new assessments in conjunction with the multi-state adoption of the Common Core standards in English language arts and mathematics. These assessments should be an improvement over current tests, but the transition will take years.

In the meantime, what is Ohio to do? One option would be for the state to “curve” test results each year to distribute ratings more evenly. Just as your eighth-grade math teacher could give As only to the top three students in class, the state could adjust the value-added results each year to ensure that the same proportion of schools were above, at, and below expectations. While this fix would stop the yo-yo effect, curving would create a zero-sum game for schools. Even if every school made real growth, only a limited number of schools could be identified as making “above expected growth.” For every school that the state labeled a success, it would be forced to label another as a failure, which wouldn’t be fair given the consequences attached to value-added ratings in Ohio’s accountability system.
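The zero-sum problem with curving can be sketched in a few lines. This is an illustration only: the 25 percent shares, the school names, and the gain figures are hypothetical, and nothing here reflects a method Ohio actually considered in detail.

```python
def curve_ratings(gains, top_share=0.25, bottom_share=0.25):
    """Rate schools purely by rank, so the category proportions are fixed.

    `gains` maps a school name to its measured gain. The shares are
    hypothetical; any fixed proportions produce the same zero-sum effect.
    """
    ranked = sorted(gains, key=gains.get, reverse=True)
    n_top = round(len(ranked) * top_share)
    n_bottom = round(len(ranked) * bottom_share)
    ratings = {}
    for position, school in enumerate(ranked):
        if position < n_top:
            ratings[school] = "above expected growth"
        elif position >= len(ranked) - n_bottom:
            ratings[school] = "below expected growth"
        else:
            ratings[school] = "at expected growth"
    return ratings


# Made-up gains: in the first year every school grows strongly,
# in the second the results are mixed.
strong_year = {"A": 8.0, "B": 9.0, "C": 10.0, "D": 11.0}
mixed_year = {"A": -2.0, "B": 0.0, "C": 1.0, "D": 3.0}

print(curve_ratings(strong_year))
print(curve_ratings(mixed_year))
```

Both years produce identical ratings. School A is labeled "below expected growth" in the strong year despite a gain of eight points, simply because three other schools gained more.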

Ohio's Solution

Ohio will introduce two changes this year to its value-added model that will control the yo-yo effect. First, the state will use what's called a "rolling average" to smooth out changes in average performance. For example, a school’s eighth-grade math score will be based on that school’s score this year, the average score of eighth-grade students last year, and the present cohort’s math scores in the previous year (when they were in seventh grade). These averages smooth out the variability in schools’ reported results, while still allowing schools to show progress.
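As a rough sketch of the idea, the three components named above can be combined into a single figure. The state's actual weighting is not described here, so the equal weights below are an assumption, as are the function name and the sample scores.

```python
def rolling_average(current_score, same_grade_last_year, cohort_last_year):
    """Average the three components described in the text.

    Equal weighting is an assumption made for illustration; the weights
    Ohio actually uses may differ.
    """
    return (current_score + same_grade_last_year + cohort_last_year) / 3


# Made-up scores for one school's eighth-grade math results.
this_year_eighth = 60.0   # this year's eighth-grade score
last_year_eighth = 48.0   # last year's eighth graders
cohort_as_seventh = 51.0  # this year's eighth graders, back in seventh grade

print(rolling_average(this_year_eighth, last_year_eighth, cohort_as_seventh))
```

In this example the raw eighth-grade score jumps 12 points from one year to the next (48 to 60), while the averaged figure comes out at 53. A single unusually easy or hard test therefore moves the reported result far less than it moves the raw score.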

Second, the state will place a cap on how much a grade cohort's average score can rise or drop from one year to the next. If a school's average sixth-grade reading score, for example, exceeds last year's fifth-grade score by more than a maximum amount (one “normal curve equivalent,” a kind of standardized unit), Ohio will limit the school's gain to that amount. Ohio will also limit the drop in a school’s score to a lesser amount.
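A cap of this kind amounts to clamping the year-to-year change. In the sketch below, the one-unit limit on gains comes from the description above; the half-unit limit on drops is purely a placeholder, since the text says only that it is "a lesser amount." The function name and sample scores are also hypothetical.

```python
def capped_score(previous_score, current_score, max_gain=1.0, max_drop=0.5):
    """Limit the change from last year's score to the allowed range.

    max_gain of 1.0 mirrors the one normal curve equivalent cap in the text.
    max_drop of 0.5 is a placeholder: the source gives no figure, only that
    the limit on drops is smaller than the limit on gains.
    """
    change = current_score - previous_score
    limited_change = max(-max_drop, min(max_gain, change))
    return previous_score + limited_change


print(capped_score(50.0, 53.0))  # gain of 3 is capped at 1
print(capped_score(50.0, 48.0))  # drop of 2 is limited to 0.5
print(capped_score(50.0, 50.5))  # small change passes through untouched
```

Because the limit on drops is tighter than the limit on gains, an apparent collapse in scores is discounted more heavily than an apparent surge.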

So can we really trust these fixes? As we mentioned earlier, Ohio’s test data are not ideal for a value-added analysis. That does not mean that the analysis can’t provide meaningful information, only that we need stronger evidence of success or failure before passing judgment. Both of these fixes raise the bar on what it takes for the state to declare that a school has made above or below expected growth. Although fewer schools will meet the new parameters, we can have greater confidence that the designation of above or below expected growth reflects real changes in performance. The fixes also ensure that the system is quicker to reward schools than it is to punish them. While still imperfect, the proposed solutions seem appropriate given the challenges facing the state.

Next Steps for Ohio

The use of value-added data within education is still a work in progress. Researchers continue to develop newer and better methods for measuring and quantifying student achievement. Though in some respects the effort remains as much art as science, value-added models offer powerful ways to assess the effectiveness of schools over time. But, as we noted in 2008, measuring value-added growth has limitations. This model of measuring student progress should be balanced with measures of student achievement. A measure of achievement will, for example, report whether a student is reading at grade level or what proportion of fifth-grade students in a school are reading at grade level.

Even while the state modifies its current accountability system and works to improve its state assessments, it also needs to adjust its value-added methodology accordingly. In this ongoing effort to improve accountability systems, policymakers should consider the following:

  • Providing additional growth information to parents and educators, such as student and school growth percentiles or other standardized measures of student performance.
  • Continuing to improve state data systems to track and correlate student data across a variety of dimensions so that educators, parents, and policymakers can see what’s working.
  • Developing next-generation value-added models that assess whether students are making enough growth to meet rising standards and progressing toward other important non-academic outcomes.
  • Developing value-added measures that can provide fair and reliable insights into teacher effectiveness.

We’ll address each of these recommendations in more detail in this year’s Ohio Performance Report, available at the end of this month at http://www.edexcellence.net/ohio.