How to think about short-term test score changes and long-term student outcomes

4.16.2018

Last month, the American Enterprise Institute published a paper by Collin Hitt, Michael Q. McShane, and Patrick J. Wolf that reviewed every rigorous school-choice study with data on both student achievement and student attainment (high school graduation, college enrollment, and/or college graduation). They contend that the evidence points to a mismatch, specifically that “a school choice program’s impact on test scores is a weak predictor of its impacts on longer-term outcomes.”

This week, I plan to write a series of commentaries on the paper, which I believe is fundamentally flawed. I have several concerns, including:

The authors included programs in the review that are only tangentially related to school choice and that drove the alleged mismatch, namely early-college high schools, selective-admission exam schools, and career and technical education initiatives.
Their coding system—which they admit is “rigid”—sets an unfairly high standard because it requires both the direction and statistical significance of the short- and long-term findings to line up.
In their conclusions, they extrapolated from their findings on school choice programs and inappropriately applied them to schools.

That’s a lot to unpack, so I’m going to do this over several posts. I hope you’ll join me for the ride.

First, though, let’s consider the value of looking at long-term outcomes and what they can and cannot tell us about schools and programs.

Let’s start with where the AEI authors got it right. We all hope that our preferred education reforms, at least the most ambitious ones, will “change the trajectories of children’s lives.” That’s particularly true for children growing up in poverty, who typically face depressing odds of success if they attend mediocre schools. They may drop out before finishing high school, or they might graduate but with minimal skills. Either way, they’re unlikely to complete postsecondary education or training or master the “success sequence,” and are thus likely to work low-wage jobs and not enjoy the fruits of upward mobility. Many will struggle to form an intact family, and their children will grow up poor and then repeat the cycle. It’s a bleak picture.

It’s different for affluent children, of course. Great schools may change their trajectories, too, but it will be subtler because they will probably do reasonably well regardless due to the advantages they usually receive at home. Most will graduate high school and go to college no matter what. But with a great education, they have a good chance of learning more, which will help them get into and through a better college and eventually do more and better in the labor market.

For all children, we hope that fantastic schools will benefit them in other long-term ways that are important but harder to measure: encouraging them to become active, informed citizens; identifying strengths and interests that they might put to good use in a career; and helping them become people of good character.

So, yes, long-term outcomes are extremely important, especially for children for whom positive outcomes are far from assured.

Will all effective programs show long-term effects?

As the AEI authors write, it would be deeply disappointing if major reforms of schools and schooling moved the needle on test scores but didn’t have a lasting impact on kids’ lives. In effect, we’d be fooling ourselves into thinking that we’re making a big and enduring difference when in reality we might be wasting time and money that could be better spent on other strategies.

Of course, it would be unfair to apply such a high standard to small, incremental changes, like adopting a better textbook or extending the school day. Nobody would expect tiny tweaks to have profound impacts, such as transforming a future high school dropout into a college graduate. But if they help lots of students become marginally more literate or numerate, they are still worth doing.

For major, expensive, disruptive reforms, however, stronger long-term outcomes are not too much to ask. And of course the most effective reformers are already aware of this. Consider KIPP, which has been obsessed from day one not just with student achievement but also with getting its KIPPsters to and through college, consistently tracking its numbers and refining its model. That focus has properly led the organization to look at much more than short-term test scores as indicators of whether its students are on track. Much of today’s interest in non-cognitive skills grew out of this work.

If a program showed large and significant impacts on achievement, especially for low-income kids, but poor results on long-term outcomes, it would certainly raise alarms. It might indicate that the program was overly focused on reading and math, or was teaching to the test, or was crowding out other strategies or activities that would actually help kids succeed in the long run. Unless, of course, there were issues associated with the long-term measures themselves.

The problem with high school graduation rates

It almost goes without saying, but at a time when high school graduation rates are skyrocketing, in part thanks to “credit recovery” initiatives and other dubious practices, we must view this measure with a healthy dose of skepticism. Now that most states allow students to graduate without passing an exit exam, or even a set of end-of-course exams, we cannot pretend that the standards for graduation are consistent from school to school. As a result, we must be careful with how we interpret positive or negative impacts on high school graduation rates, especially when evaluating high schools themselves. Boosting graduation rates might mean that schools are better preparing students to succeed—or it could mean that they have lowered their standards.

It would also be inappropriate to assume that a high school that boosts student achievement and better prepares its graduates to succeed in college would also raise its own graduation rates. In fact, we know from a 2003 study by Julian Betts and Jeff Grogger that there is a trade-off between higher standards (for what it takes to get a good grade) and graduation rates, at least for children of color. Higher standards boost the achievement of the kids who rise to the challenge, and help those students in the long term, but they also encourage some other students to drop out. If a high school could manage to both boost achievement and keep its graduation rate steady, that would be an enormous accomplishment. Yet by the logic of the AEI review, such a school would show a mismatch between short-term achievement impacts and “long term” attainment ones. Such logic is faulty.

Programs that don’t test well

On the other end of the scale are programs that appear to be failures when judged by short-term test-score gains, but that produce impressive long-term results for their participants. It’s this category that most concerns Hitt, McShane, and Wolf, especially in the context of school choice. “In 2010,” they write, “a federally funded evaluation of a school voucher program in Washington, DC, found that the program produced large increases in high school graduation rates after years of producing no large or consistent impacts on reading and math scores.” Later they conclude that “focusing on test scores may lead authorities to favor the wrong school choice programs.”

It’s a legitimate concern, and one I share (setting aside my misgivings about high school graduation rates expressed above). I played a tiny role in helping launch the D.C. voucher program when I served at the U.S. Department of Education, and I support the expansion of private school choice programs for low-income students. I can imagine why the private schools in the D.C. program might struggle to improve test scores, especially when compared to highly effective (and highly accountable) D.C. charter schools and an improving public school system. But I can also imagine that the experience of attending a private school in the nation’s capital could bring benefits that might not show up until years later: exposure to a new peer group that holds higher expectations in terms of college-going and the like; access to a network of families that opens up opportunities; a religious education that provides meaning, perhaps a stronger grounding in both purpose and character, and that leads to personal growth.

It would be a shame—no, a tragedy—for Congress to kill this program, especially if it ends up showing positive impacts on college-going, graduation, and earnings. The same might be said about large voucher programs in Ohio, Indiana, and Louisiana, all of which have shown disappointing early findings in terms of student achievement but might be setting children on paths to future success. Policymakers should be exceedingly careful not to end such programs prematurely.

***

There are therefore several things to think about as we further explore the AEI study: long-term outcomes do indeed matter a lot, especially for poor kids; if large test-score gains don’t eventually translate into improved long-term outcomes, it is a legitimate cause for concern; and we must stay open to the possibility that some programs could help kids immensely over the long haul, even if they don’t immediately improve student achievement. At the same time, we should be skeptical about using high school graduation rates as valid and consistent measures of attainment.

So is there really a mismatch between short-term scores and long-term outcomes, especially for school choice programs? And do existing studies really raise red flags about using test scores to make decisions about individual schools? Tune in to tomorrow’s installment to find out!