The MET study: implications, winners, and losers

1.14.2013

The final report from the Gates-funded “Measures of Effective Teaching” project may prove to be the most important K–12 research study of this generation.

Many others have summarized its findings and opined on its various features, so I’ll only do that lightly here, spending more time on its implications. (See here for Amber and Daniela’s very good synthesis and here for the Washington Post story.)

It’s hard to believe that it’s taken this long for our field to undertake a research project of this level of sophistication on arguably the most important and confounding aspect of K–12 practice and policy: educator effectiveness.

The upshot is that we know far more than before about how to assess a teacher’s ability to improve the learning of students in his/her classroom. That means we now have the power to identify—in every state, district, and school—the teachers likeliest to help kids learn. The consequences for policy are profound.

We don’t yet know enough about how to find individuals who will eventually become great educators or how to train people to get there, but at least now we’re not flying blind. Some will argue that we weren’t flying blind before, but one of the study’s most important findings is that observations—long the core of evaluation systems—are far less predictive than anyone wanted to believe.

The research methodology was fascinating and rigorous. It began with an initial rating of each participating teacher’s perceived level of effectiveness.

Then students were randomly assigned to classrooms. It turns out that the initial teacher ratings accurately predicted the influence these teachers would have on student learning. As one Gates researcher noted, “We were able to show that teacher effectiveness is really about the teachers,” not the demographics of students in their classrooms.

In other words, we can determine who our great teachers are, and those teachers tend to be good year after year, largely regardless of the characteristics of the students in their classrooms.

As noted above, a critically important finding relates to observations. For eons, observations were evaluations. The study, however, casts some doubt on how valuable observations actually are.

Two excellent scholars, Jay Greene and Marty West, have different interpretations of the study’s implications on this score. If you follow this field and its developments closely, you ought to read their back and forth.

Though the size of the finding might be debatable, the upshot is this: Measures of student performance and student surveys come out looking good, and observations lose a good bit of luster. (The latter point should give pause to those training observers.)

As Tim Daly from TNTP was quoted as saying, “The way that most teachers have been evaluated forever is completely unreliable…Before, what we were weighing is, ‘Should we move in the direction of using student learning or is it too precarious?’ (This study’s findings) show we have no choice but to change—the way they're doing it is totally inadequate.”

Tom Kane, the lead researcher, concludes that the best evaluation system uses a combination of student-growth measures, observations, and student surveys.

Several related findings are worth noting:

A “multiple-measures” approach with a relatively equal balance of surveys, observations, and growth measures is the best mix because of low volatility and robust predictive ability;
An observation system produces the best results when there are multiple observations conducted by several different people (not the occasional pass-through by a teacher’s designated assistant principal); and,
While there are large differences in the levels of effectiveness of teachers at opposite tails of the distribution, it is far harder to distinguish between teachers closer to the center.

Since the worth of observations has now been brought into question, we have to start wondering about the most widely used observation protocols and rubrics. Those probably need to be revamped. And fast.

Similarly, we have to wonder about the training of those who conduct observations. Are administrator-prep programs properly instructing leaders on how to assess a teacher’s practice? What if the agreed-upon definition of “strong practice” doesn’t correlate with student learning gains? Which gives?

Greene’s contrarian take is powerful and should be taken seriously by scholars and policymakers. His point about the costs associated with surveys and observations is crucial. Both are expensive, especially the latter; so are they currently worth their price tags?

Here are some thoughts on winners and losers:

WINNER: The Bill and Melinda Gates Foundation deserves a great deal of credit for undertaking this huge project. They’ve opened up and shed light on a subject that many considered an impenetrable black box. Many questions remain but they have helped lift a corner of the great veil.

LOSER: The various research arms of the U.S. Department of Education (from inception to today). This is precisely the kind of research that the feds should’ve been conducting for the last 30+ years. Even those most critical of the feds’ role in K–12 schooling generally agree that high-quality research should be part of USED’s portfolio. Why did it take a private foundation to launch this effort?

WINNER: Race to the Top and its backers. Despite loud complaints from the establishment, the grant competition’s all-important Section D was founded on new evaluations that used measures of student performance. The MET study suggests that this was a bold and proper move. The Department’s subsequent decision to shift the Teacher Incentive Fund (TIF) in the same direction now also looks wise.

LOSER: Those who train teacher observers. They’ve had decades to refine rubrics and prepare administrators for this critical work, and the results appear to have come up short.

WINNER: Ed-reform advocacy organizations, especially TNTP. The group’s report The Widget Effect largely launched this line of work, and, ever since, they’ve been arguing for smart evaluation reform and working with states and districts to bring it to life (a herculean task). Their time and energy appear to have been very well spent.

LOSER: Those who’ve argued that measures of student performance and student surveys should not be used to assess educator effectiveness and that observations are the way to go.

POTENTIAL BIGGEST WINNER: Whoever uses these findings to figure out how to identify potentially great teachers and how to train teachers to become superb.