“Similar Students” measures: a flawed approach to school accountability

Vladimir Kogan

The school accountability movement is founded on the principles of transparency, high expectations, and the dissemination of accurate information about educational quality. While there is much to like about Ohio’s recently signed charter school reform legislation, one provision in the bill is at odds with all three of these ideas. As a result, it threatens to significantly undermine Ohio’s efforts to hold charter school operators and public school districts accountable for the achievement of the students they educate.

The provision I’m referring to requires the Ohio Department of Education (ODE) to create and evaluate a new “Similar Students” measure of academic achievement, based on a metric used in California. The final language of the legislation calls only for a study of such a measure, but there appears to be significant interest among legislators and stakeholder groups in formally incorporating it into Ohio’s existing school accountability system, particularly for charter schools. Once ODE completes its evaluation, this conversation is likely to intensify.

In a new analysis, I show why this would be a terrible idea. Using data from ODE and technical documentation from the California Charter Schools Association, I precisely replicate California’s methodology and create the Similar Students measure that Ohio is considering for its schools. In the analysis, I show that the measure has a number of fatal flaws: Not only does it fail to accurately identify high- and low-achieving schools, but it also artificially inflates the measured performance of Ohio charter schools while disadvantaging schools in Ohio’s big eight urban districts for reasons that have nothing to do with the quality of the education they provide to their students.

Before diving into these troubling results, it is useful to understand why policy makers might be interested in a Similar Students measure. The purpose of school performance designations (such as the letter grades published on Ohio’s school report cards) is to accurately assess the quality of the education being provided to students in the classroom. In reality, it is surprisingly hard to disentangle educational quality from all sorts of other factors that affect student achievement—including many factors over which schools have little control.

This is true because, in the first eighteen years of their lives, children spend only 10 percent of their time in the classroom. The experiences students have outside of the classroom—after school, on the weekends, and in the summers—impact their academic performance tremendously. Some kids have parents who read to them every night, help them with homework, and encourage them to live up to their potential. Although we might wish it were otherwise, many other children have parents who provide few of these forms of support, either because they can’t or because they don’t know how.

When we aggregate student achievement at the school or district level—by looking at proficiency rates, aggregate achievement indices, or metrics like graduation rates—the outcomes we observe reflect not only educational quality, but also all of these other determinants of academic achievement. Because many of the forces affecting student achievement are outside of school control, the proficiency measures used by some states to hold schools accountable effectively punish or reward them for the demographic composition of the students they serve, rather than the quality of the education provided in the classroom.

Fortunately, Ohio has recognized this problem and, since 2008, has also utilized a measure of academic “value-added,” which rewards schools not for how well their students perform at one moment in time but for how much growth they make from one year to the next. The value-added approach addresses many—but not all—of the concerns about non-school influences on achievement, because many of these factors effectively fall out when we focus on individual students’ growth instead of absolute achievement levels. (By relying in large part on students’ previous test scores, CREDO’s well-known “virtual twins” methodology does something comparable. It is therefore much more like the value-added approach than the Similar Students measure.)

The benefit of value added is that it accounts for any student-level factor that consistently affects student achievement, including aspects of their lives that would be impossible for us to measure otherwise.
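To make that intuition concrete, here is a toy simulation, with invented numbers and not ODE's actual model, of the core value-added idea: pool students across schools, predict this year's score from last year's, and treat each school's mean residual as its estimated contribution to growth. Anything stable about a student, measured or not, is already baked into last year's score and therefore falls out of the comparison.

```python
# Toy sketch of the value-added idea (hypothetical data; not ODE's model).
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_per = 20, 200
school_effect = rng.normal(0, 2, n_schools)      # true school quality, in points
school = np.repeat(np.arange(n_schools), n_per)  # which school each student attends

# Last year's score already reflects every stable student-level factor
prior = rng.normal(50, 10, n_schools * n_per)
current = 0.9 * prior + school_effect[school] + rng.normal(0, 5, prior.size)

# Predict current scores from prior scores (pooled least squares)
X = np.column_stack([np.ones_like(prior), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
resid = current - X @ beta                       # growth beyond expectation

# A school's value-added estimate is its students' average residual growth
value_added = np.array([resid[school == s].mean() for s in range(n_schools)])
print(np.corrcoef(value_added, school_effect)[0, 1])  # tracks true quality closely
```

In this simulation the estimated value-added closely recovers each school's true effect, because conditioning on the prior score sweeps out everything persistent about the student, not just the characteristics we happen to record.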

The Similar Students measure tries to get at the same issue by using statistical modeling to make adjustments for demographic differences of students served by different schools. The problem with this approach, however, is that such adjustments can account only for observable differences between students. This might help us account for things such as student race or socioeconomic status, which are recorded in administrative data, but not for difficult-to-measure differences—such as individual motivation or parental involvement and support—that have as big an impact on student achievement, if not a bigger one.
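The mechanics can be sketched roughly as follows; this is a simplified illustration with made-up data, not the actual California formula. Average achievement is regressed on observable demographics, and a school's adjusted score is the residual, that is, how far it sits above or below the level predicted by its student body:

```python
# Rough sketch of a Similar Students-style adjustment (invented data;
# a simplification of the California methodology, not a replication).
import numpy as np

rng = np.random.default_rng(1)
n = 300                                    # hypothetical schools
pct_poverty = rng.uniform(0, 1, n)         # observable demographics
pct_minority = rng.uniform(0, 1, n)
quality = rng.normal(0, 3, n)              # true quality, unobserved

# School-average achievement depends on demographics AND quality
avg_score = 80 - 15 * pct_poverty - 5 * pct_minority + quality

# Regress achievement on the observables; the residual is the adjusted score
X = np.column_stack([np.ones(n), pct_poverty, pct_minority])
beta, *_ = np.linalg.lstsq(X, avg_score, rcond=None)
similar_students_score = avg_score - X @ beta
```

In this best-case simulation the residual tracks true quality, but only because nothing unobserved is correlated with the demographic variables. The adjustment can strip out exactly what sits in the regression, and nothing more.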

It turns out that there is almost no relationship between student achievement growth, as measured by value added, and the scores we get using the Similar Students measure: the correlation between the two is very low. Much of the difference between the two metrics is likely driven by such unobservable student characteristics. As a result, many schools look stellar using the Similar Students measure even when, according to Ohio’s current metrics, they provide a crummy education. Similarly, many high-quality schools would get an F grade using the new measure.

The danger posed by such unobserved variables is particularly serious when applying demographic adjustments to charter schools. By definition, students who choose to attend charter schools are different from their peers who remain in traditional public schools. If these differences include difficult-to-measure factors such as motivation or parental involvement—and there is clear evidence that they do—using simple statistical adjustments to compare schools will do little to account for these differences and will artificially inflate the ratings calculated for charter schools using this method.
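A toy simulation makes the selection problem vivid. All numbers below are invented: every school is given identical quality, but charter students are assigned higher unobserved motivation on average, and a demographic-only adjustment mistakenly credits that motivation to the schools:

```python
# Toy illustration of selection bias under a demographic-only adjustment.
# All figures are invented; every "school" here has identical quality.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
charter = rng.integers(0, 2, n)                    # 1 = charter student
poverty = rng.uniform(0, 1, n)                     # observable
motivation = rng.normal(0, 1, n) + 0.8 * charter   # unobserved, higher at charters

# Scores depend only on student traits; schools contribute nothing extra
score = 75 - 10 * poverty + 5 * motivation + rng.normal(0, 3, n)

# Similar Students-style adjustment using the observable only
X = np.column_stack([np.ones(n), poverty])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted = score - X @ beta

# Charters receive an unearned bonus (about 4 points in expectation)
gap = adjusted[charter == 1].mean() - adjusted[charter == 0].mean()
print(round(gap, 1))
```

The adjusted gap is positive even though school quality is identical by construction; a value-added model that conditioned on each student's prior score would largely absorb the stable motivation difference instead.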

This is precisely what I find in Ohio when I compare the new Similar Students measure to the existing value-added scores published by ODE. Not only do charter schools look better on the Similar Students measure when compared to the typical Ohio school whose students make identical gains on the value-added metric, but urban district schools look far worse. This occurs because both charter school students and urban district school students differ from the pupils served by typical Ohio public schools in unobservable ways, and these differences systematically skew the Similar Students measure. In fact, the unearned “bonus” given to charter schools under this method equals roughly a full letter grade, while the “penalty” imposed on urban district schools totals roughly half a grade. (Could this be why some charter advocates are pushing for the measure?)

Let me stress that this doesn’t prove that Ohio charters systematically attract only the highest-performing students. Indeed, there is clear evidence that they tend to disproportionately serve demographically disadvantaged students. What it does suggest, however, is that Ohio charter school students tend to differ from their public school counterparts in a number of ways. Some of them are negatively correlated with student achievement (e.g., race, socioeconomic status), while others are positively correlated (e.g., parental involvement, motivation). The problem is that the Similar Students approach accounts only for the former while ignoring the latter, resulting in the artificial inflation of average charter school scores.

The consequences of incorporating the Similar Students measure into Ohio’s accountability system would be troubling on two counts. First, the new measure would incorrectly label many schools as being high- or low-achieving when, in fact, they are not. Second, it would create greater confusion among parents by adding yet another conflicting signal about the quality of their children’s schools that they would need to reconcile for themselves. This undermines the promise of transparency.

Of course, I sympathize with many charter school operators who complain that Ohio’s current accountability framework unfairly punishes them—and most big urban school districts as well—for serving disadvantaged student populations. When it comes to Ohio’s school “performance index,” which is based on a snapshot of academic achievement, that is undoubtedly true. However, my analysis shows that simply switching to the Similar Students measure will not fix this problem. Instead, it will artificially boost the ratings of charter schools, producing school rankings that are no more informative or comparable between schools than the flawed achievement-based metrics Ohio mostly uses now. The best way to address these underlying concerns is to put greater weight on schools’ value-added grades, which currently play only a limited role in overall school evaluations, and also to expand this valuable measure to include more grade levels than are covered now. To be sure, the value-added approach is not without its own limitations, but it is by far the best and most informative measure we have available right now.

Vladimir Kogan is an assistant professor of political science at the Ohio State University.