Next-generation teacher evaluations: Are they living up to expectations?

Over the last five years, prodded by the feds, states have adopted teacher evaluation systems. According to a recent report from the National Council on Teacher Quality, forty-one states, including Ohio, now require evaluations that include objective measures of student achievement. These aren’t the meat-axe assessments of yesteryear, though. These next-generation teacher evaluations combine classroom observations using new prescriptive protocols with quantitative evidence of learning gains on state tests (or another form of assessment) to determine each teacher’s effectiveness.

The national focus on teacher evaluations raises a couple of questions. First, why have states chosen to focus on teacher evaluations (i.e., what problem are policymakers trying to solve)? Second, are the new evaluations proving effective in solving that problem?

Let’s start with the why. Recall all the evidence that the single most important in-school factor for student achievement is teacher quality. If we know that good teachers make a difference, it’s not surprising that we’ve focused on evaluating them. Such evaluations hold the potential to identify great teachers whom we can reward, retain, and/or hold up as models; struggling or developing teachers whom we can help to improve; and ineffective teachers who should be removed from the classroom. In other words, evaluations are intended to boost the effectiveness of the teachers our children learn from.

That’s really only part of the answer, though. Principals conducted teacher evaluations long before any law mandated them. Yet those traditional evaluations, typically based solely upon classroom observations, had little effect on teacher quality. Teachers remained in place even if they were obviously struggling. And nearly every one of them got a “satisfactory” (or even “outstanding”) rating. For instance, a California judge in the recently decided Vergara case found that a significant number of “grossly incompetent” teachers were allowed to remain in the classroom “because school officials don’t want to go through the time and expense to investigate and prosecute” these cases. (The court estimated that, under the state’s teacher laws, it could take between two and ten years and anywhere from $50,000 to $450,000 to investigate and potentially dismiss an ineffective teacher.)

Now for the second question: Are the new teacher evaluation systems effective in accurately identifying which teachers are making the biggest difference for students and which are struggling?

Early results from states with next-generation teacher evaluation systems suggest that the answer is a resounding no. In Florida, more than 97 percent of teachers are still rated as effective or better. In Delaware, it’s 99 percent. New evaluations are producing more of the same results.[1]

It’s important to note that there isn’t a “right” number of teachers that should be deemed ineffective. But we also need to be honest that teaching—especially teaching well—is incredibly difficult to do. So while it would be great if 99 percent of teachers were effective, it’s hard to believe that in teaching (or any profession) we could actually reach that level.

While there is some anecdotal evidence from teachers and school leaders that regular observations required under the new teacher evaluation systems have value and can improve practice, these relatively modest gains (if you believe them) have come with significant costs.

First, the controversy surrounding teacher evaluations has resulted in a moving target for educators. Here in Ohio, for instance, the GOP-controlled legislature spent the better part of six months debating changes to the nascent Ohio Teacher Evaluation System (OTES). OTES, still in its first year and without any meaningful data, was already being changed in substantive ways, including the frequency of evaluations, the percentage of the evaluation based upon student value-added, and the use of student surveys in evaluations. At the end of the day, a compromise was reached, but nothing about the debate suggests that the underlying issues have been resolved. It’s worth pondering how effective an evaluation system can be if its key components continue to be the focus of debate.

Second, whether fair or not, teacher evaluation systems are being portrayed publicly as anti-teacher. This is harmful in at least two ways: it hurts teacher morale, and the ensuing debate becomes polarized and focused on the wrong issue. If we can’t even discuss the real problem, the chance of finding a solution is slim.

Third, perhaps the least anticipated and potentially greatest cost relates to student testing. As Russ Whitehurst and Matt Chingos have noted, state assessments—usually limited to reading and math in grades 3–8—don’t give us enough information to analyze most teachers based upon student growth on objective assessments. To gauge a teacher’s impact on student learning in Ohio, we’ve devised an intricate web of additional assessments and student learning objectives that supplement the state assessments. The result has been a dramatic increase in time students spend testing and teachers spend administering tests. The overreliance on tests for every teacher’s evaluation has put the entire test-based accountability system on trial and jeopardizes the progress that’s been made over the past twenty years.


The move to enshrine teacher evaluations in state law, however well intended, has created a bureaucratized, divisive solution. It’s also a workaround, designed to escape the harmful effects of the laws and policies that the Vergara court found protected adults but damaged the educational prospects of children. As with most workarounds, teacher evaluations are an inefficient way of achieving an admirable and needed goal.

If statewide, uniform teacher evaluations fail to effectively identify struggling teachers, change regularly, cause debate over the value of teachers, and contribute to needless over-testing of students, then we should rethink how we can best achieve the goal of improving the quality of teachers in every classroom. Stay tuned for a future Gadfly where we’ll explore what that might look like.

[1] Ohio has not yet released the first-year results of its teacher evaluation system, so we should reserve judgment as to how effective our system is until that time.


Chad L. Aldis
Chad L. Aldis is the Vice President for Ohio Policy and Advocacy of the Thomas B. Fordham Institute.