Evaluating Teachers with Classroom Observations: Lessons Learned in Four Districts
May 21, 2014
Everyone agrees that a good teacher makes all the difference in the world, but that's where the agreement ends. This new report from Brookings adds to the body of research examining how to decide what makes a good teacher, specifically looking at teacher-evaluation systems in four moderate-sized urban districts in an effort to suggest ways to improve them. Analysts link individual student and teacher data (one to three years of them, from 2009 to 2012) and specifically use two consecutive years of district-assigned evaluation scores for teachers with value-added ratings.

There were five key findings. First, only a small minority of the workforce (22 percent) can be evaluated using gains in test scores; the remainder, educators in nontested grades and subjects, are evaluated using classroom-observation scores (which account for 40–75 percent of their ratings), along with teacher-developed measures, school value-added, and student feedback, among other things. Second, observation scores are fairly stable from year to year. Third, including school value-added in teachers' evaluations (not surprisingly) tends to bring down the scores of good teachers in bad schools and inflate the scores of weaker teachers in good schools. Fourth, teachers with initially high-performing kids receive higher observation scores, on average, than teachers with initially lower-performing kids; this finding holds even when comparing the same teacher's observation scores at different points in time, meaning the result is probably not due to better teachers getting better kids. (What this means, again not surprisingly, is that it's easier to teach a dynamite lesson that scores well against observation rubrics when your students are higher performing.) Fifth, observations conducted by outside observers are more predictive than those conducted by school principals (the MET project found this, too).
In the end, the researchers recommend that observation scores be adjusted for student demographics, much as many value-added scores are, and that school value-added either be scrapped as a teacher-evaluation metric or have its weight reduced.
Grover J. Whitehurst, Matthew M. Chingos, and Katharine M. Lindquist, Evaluating Teachers with Classroom Observations: Lessons Learned in Four Districts (Washington, D.C.: Brown Center on Education Policy at Brookings, May 2014).