ESSA Accountability Design Competition: My big takeaways

Joanne Weiss

On February 2, I had the privilege of being a judge for the Fordham Institute’s ESSA Accountability Design Competition. It’s widely known that I’m a fan of using competition to drive policy innovation, and this competition did not disappoint. Fordham received a stunning array of proposals from teachers, students, state leaders, and policy makers.

But before we turn to the insights buried in these pages, I want to praise the competition’s conception, which mirrored the process that states should replicate as they design their own accountability systems. Contestants explained how their proposed accountability systems would support a larger vision of educational success and spur desired actions. They laid out their design principles—attributes like simplicity, precision, fairness, and clarity. They defined the indicators that should therefore be tracked, and they explained how those indicators would roll up into ratings of school quality. Finally, they laid out how each rating would be used to inform or determine consequences for schools. All decisions were explained in the context of how they would forward the larger vision.

Together, these proposals represent a variety of both practical and philosophical approaches to accountability system design. Here are the five major themes I found most noteworthy.

1. The ascendance of growth

For the past fourteen years, the currency in education has been “proficiency”—the percentage of students scoring at or above “proficient” on their state’s standardized assessment. There is no doubt that proficiency is important, and it remains a legally required academic indicator under ESSA. But the exclusive focus on proficiency under NCLB incented such well-documented perverse behaviors as states lowering the bar for what it meant to be proficient, as well as schools focusing improvement efforts only on those “bubble kids” whose performance was just below proficient.

In this competition, every applicant proposed adding growth measures to the accountability system to broaden the picture of a school’s academic performance. Growth models tell us how much a student has learned over a given period of time. Students who are behind need to grow “faster” (learn more in a year) than the typical student to catch up, or they need to be given more time to get through school. Similarly, students who are ahead need more challenges in order to ensure that they stay engaged and stretch to reach their potential. Without measuring growth, educators can’t assess how well their schools are doing at advancing students’ learning.

A variety of approaches to measuring growth were proposed. Bellwether’s Chad Aldeman suggested that schools chart whether each student moved “up” from one performance band to the next over time, moved backwards, or stood still. While this is only a rough proxy for growth, it has the benefit of being easy to understand, feasible to implement with today’s state assessments, and clear to every student about where he or she stands.
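Aldeman's band-movement idea is simple enough to sketch in a few lines of code: given a student's performance band in two consecutive years, classify the student as having moved up, moved back, or stood still. The band labels below are hypothetical; each state defines its own.

```python
# A minimal sketch of Aldeman's band-movement growth proxy.
# The band ordering here is hypothetical; states define their own bands.
BANDS = ["below basic", "basic", "proficient", "advanced"]

def band_movement(last_year: str, this_year: str) -> str:
    """Classify a student's year-over-year movement between performance bands."""
    delta = BANDS.index(this_year) - BANDS.index(last_year)
    if delta > 0:
        return "moved up"
    if delta < 0:
        return "moved back"
    return "stood still"

# Example: a student who rose from "basic" to "proficient" moved up.
print(band_movement("basic", "proficient"))  # moved up
```

The appeal is exactly what Aldeman claims: no statistical modeling is required, and any parent can verify the result from two score reports.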

The Pritchard Committee high school students advocated for the use of student growth percentiles, based on the Colorado Growth Model—as did Richard Wenning, the father of that model. Pritchard noted that this approach can be graphically depicted to allow the “public to easily distinguish [among] schools,” including those that serve different subgroups of students well (or poorly).
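The core idea of a student growth percentile is the percentile rank of a student's current score among "academic peers" who started from the same prior score. The actual Colorado Growth Model uses quantile regression; this sketch, with invented data, illustrates only the underlying concept.

```python
# A simplified sketch of a student growth percentile (SGP): the percentile
# rank of a student's current score among "academic peers" -- students with
# the same prior-year score. The real Colorado Growth Model uses quantile
# regression; this only illustrates the idea. All data below is invented.

def growth_percentile(prior, current, cohort):
    """cohort: list of (prior_score, current_score) pairs for all students."""
    peers = [cur for (pri, cur) in cohort if pri == prior]
    below = sum(1 for score in peers if score < current)
    return round(100 * below / len(peers))

cohort = [(300, 310), (300, 305), (300, 330), (300, 340), (310, 320)]
# A student who started at 300 and scored 330 outscored two of the four
# peers who also started at 300 -> 50th growth percentile.
print(growth_percentile(300, 330, cohort))  # 50
```

Because each student is compared only to peers with the same starting point, a low-scoring student can still post a high growth percentile, which is what makes the measure useful for distinguishing schools that serve struggling subgroups well.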

Most of the other proposals offered variations on value-added growth models. The calculation approaches to measuring growth get wonky fast (see Sherman Dorn for a model that includes the most words I had to look up), but the team of Polikoff, Duque, and Wrabel transformed all student scores into a nicely understandable 0–100 point scale, and upped the ante by using a two-step value-added model to control completely for all student characteristics.

TeachPlus advocated for direct measurements of growth using computer-adaptive tests capable of pinpointing student performance along a wide, multi-grade learning progression. Though this is my personal favorite, our current standardized tests (even the computer-adaptive Smarter Balanced) don’t yet do this accurately or reliably enough.

One assessment issue stands in the way of measuring growth properly. NCLB required that assessments measure only “on grade level” knowledge. So every fifth grader, for example, had to take the test of fifth-grade knowledge. If that fifth grader started the school year performing at a second-grade (or an eighth-grade) level, and gained 1.5 years of learning over the course of that school year, the fifth-grade test would not accurately assess that student’s growth; doing so would require that the test include “out of grade level” items. ESSA removes this constraint, so it is now possible to measure growth accurately. However, states’ current standardized assessments are built for the old rules and do not (yet) capture the full range of student performance. If pressed by states on this issue, the assessment community will (I hope) address this challenge.

2. Judging school quality: Let me count the ways

Under ESSA, states are required to include in their accountability systems at least one non-academic, statewide measure of school quality. While states may be tempted to use easy-to-report data like school attendance to comply with this requirement, the Fordham applicants showed us the hidden potential here.

Applicants thought through their theories of change and figured out what additional data was needed to promote the improvement behaviors they were looking for. Pritchard reminded us that “students are the chief stakeholders in schools” and advocated for surveys and open-ended responses to inform improvement at the school level. Ronald Ferguson agreed. Polikoff et al. did a terrific job of identifying five indicators that “incentivize schools to focus on desirable outcomes”: absenteeism (overall and chronic), student engagement and happiness (measured through surveys), equity (specifically disproportionality in discipline for different groups of students), students’ success in subsequent grades, and access to a full, rich curriculum for every student. Wenning drove home his focus on continuous improvement by proposing the use of digital portfolios for students. He also included a set of “educator opportunity to learn and perform” indicators to focus schools on effective professional learning.

The research behind much of this work is compelling, and the format of the proposals makes it easy for policy makers to absorb the research quickly.

3. Out of many, a few

One of the tenets of NCLB was that every school would get one rating. While ESSA requires that each school be rated, nowhere does it require that only one rating be given to each school. And a number of proposals drove smartly through this crack in the door. Several applicants arrived at the key insight that different indicators can be used for different purposes. All of the indicators don’t have to be scored, weighted, and rolled up into one rating that’s used to make all accountability determinations.

The academic indicators, for example, can be used by states to make determinations about which schools are low-performing and require labeling and intervention. The academic indicators can also be used to identify strong schools (those to learn from) and mid-range schools (those that need to keep improving). The school quality indicators can then be used to diagnose the root causes of the problems and to point the way toward potential solutions. TeachPlus dubbed this a “two-tiered accountability system” where the second tier provides “metrics that are informative but not determinative.”

As Wenning noted, his proposal “would not produce a single rating across indicators because a single rating combining so many measures would fail to promote public understanding [and would] mask important strengths and weaknesses.” Polikoff, TeachPlus, and Education First all concurred. They offered separate ratings for different types of information, believing that clear visualizations disaggregated by subgroup would help educators (and the public) identify and diagnose gaps, strengths, and problem areas.

4. It takes a village

Several applicants challenged another closely held principle of NCLB. They argued that the objective formulae at the center of most current accountability calculations are too blunt. Room for subjective judgment is sorely needed.

Different proposals offered various solutions to this problem. Several suggested that schools and communities customize or select indicators to meet their needs. One of my favorites, the Education First proposal, included the design priority: “local communities should have real decision-making” power. They make good on this promise by laying out, in detail, a roadmap toward a wholly new type of accountability system. In it, statewide goals sit side-by-side with local goals, and easy-to-interpret reports provide insight into how schools are doing along both dimensions.

With a different take on the problem, the proposal from Dale Chu and Eric Lerum laid out roles for the SEA, the LEA, and the school; it also built an accountability system that devolves the selection of indicators and the responsibility for consequences to the appropriate actor in the system. And Wenning specified whether the school, district, and/or state should define the targets for each school quality indicator.

Looking at the problem in a totally different way, Aldeman proposed using a professional “inspectorate” (experts operating under contract with the state) to make the final accountability determinations. His formula over-identifies low-performing schools, then requires all of these schools to go through an expert review process. No school’s rating is final until the inspectors have spoken. The process is modeled on a successful U.K. approach, and the hope is that low-performing schools will get more than just a rating; they’ll get expert advice on how to improve.

Perhaps the most far-out and creative idea comes from Dorn, who proposes a citizen peer review process for low-performing schools. Dorn dubbed this the “grand jury” model (words that made my fellow competition judges cringe), but the analogy is strong. As Dorn puts it, accountability algorithms “omit critical context, especially around the judgment of schools with low-performing and vulnerable demographic subgroups.” His solution is to “insert citizen judgment around issues of education equity” by convening a civil grand jury that has independent subpoena authority, reviews evidence, and makes determinations about the fate of these schools. That’s community empowerment.

5. The future is now

While the competition was filled with creative policy ideas, only Wenning designed a truly “next-generation” accountability system. He proposed, for example, the use of individual student digital portfolios that “contain evidence and credentials belonging to the student” and provide views for students, families, educators, colleges, and employers. He described a robust online data system that offers visualizations for each indicator disaggregated by student subgroup, so that schools’ strengths and areas for improvement are self-evident. And he built in both customization and waiver opportunities (as did Education First) to encourage innovation at the school and district levels. States moving rapidly toward technology-enabled, competency-based education models will find inspiration in Wenning’s ideas.

ESSA offers states the opportunity to develop their own aligned, coherent accountability system—one that forwards their state’s vision of educational improvement. The risk is that states will not take advantage of this opportunity: Either they will plead capacity constraints, tweak NCLB, and keep their current compliance machines in place; or they will take half-measures, throwing out NCLB and replacing it with an incoherent set of metrics. This competition charts better ways forward. It offers states an array of ways to think about accountability and provides coherent blueprints for how to get there. It’s a must-read for every state leader getting ready to embark on accountability redesign.

Joanne Weiss is an independent consultant to organizations on education programs, technologies, and policy. She is the former chief of staff to U.S. Secretary of Education Arne Duncan and led the Race to the Top and Race to the Top Assessment programs.