Standards, Testing & Accountability

Editor’s note: This is the third in a series of blog posts that will take a closer look at the findings and implications of Evaluating the Content and Quality of Next Generation Assessments, Fordham’s new first-of-its-kind report. The first two posts can be read here and here.

The ELA/literacy panels were led by Charles Perfetti (distinguished professor of psychology and director and senior scientist at the University of Pittsburgh’s Learning Research and Development Center) and Lynne Olmos (a seventh-, eighth-, and ninth-grade teacher from the Mossyrock School District in Washington State). The math panels were led by Roger Howe (professor of mathematics at Yale University) and Melisa Howey (a K–6 math coordinator in East Hartford, Connecticut).

Here’s what they had to say about the study.

***

Which of the findings or takeaways do you think will be most useful to states and policy makers?

CP: The big news is that better assessments for reading and language arts are here, and we can expect further improvements. Important for states is that, whatever they decide about adopting the Common Core State Standards, they will have access to better assessments consistent with their goals of improving reading and language arts education....

Way back in the days of NCLB, testing often existed in a vacuum. Lengthy administration windows created long delays between taking the test and receiving results from it; many assessments were poorly aligned with state standards and local curricula; communication with parents and teachers was insufficient; and too much test preparation heightened the anxiety level for teachers and students alike. These issues largely prevented assessments from being used to support and drive effective teaching and learning. And the problems weren't limited to state tests; they extended to the full range of assessments given during the year and across curricula.

But the new federal education law creates a chance for a fresh start. While ESSA retains yearly assessment in grades 3–8 and once in high school, the role of testing has changed. States are now empowered to use additional factors besides test scores in their school accountability systems, states may cap the amount of instructional time devoted to testing, funding exists to streamline testing, and teacher evaluations need no longer be linked to student scores. These changes may mean less anxiety, but that won’t equate to better outcomes unless significant reforms occur when states design their new assessment systems.

A new report from the Center...

Editor’s note: This is the second in a series of blog posts that will take a closer look at the findings and implications of Evaluating the Content and Quality of Next Generation Assessments, Fordham’s new first-of-its-kind report. The first post can be read here

Few policy issues over the past several years have been as contentious as the rollout of new assessments aligned to the Common Core State Standards (CCSS). What began with more than forty states working together to develop the next generation of assessments has devolved into a political mess. Fewer than thirty states remain in one of the two federally funded consortia (PARCC and Smarter Balanced), and that number continues to dwindle. Nevertheless, millions of children have begun taking new tests—either those developed by the consortia, ACT's Aspire, or state-specific assessments designed to measure student performance against the CCSS or other college- and career-ready standards.

A key hope for these new tests was that they would overcome the weaknesses of the previous generation of state assessments. Among those weaknesses were poor alignment with the standards they were designed to assess and low overall levels of cognitive demand (i.e., most items required simple recall or...

A decade ago, U.S. education policies were a mess. It was the classic problem of good intentions gone awry.

At the core of the good idea was the commonsense insight that if we want better and more equitable results from our education system, we should set clear expectations for student learning, measure whether our kids are meeting those expectations, and hold schools accountable for their outcomes (mainly gauged in terms of academic achievement).

And sure enough, under the No Child Left Behind law, every state in the land mustered academic standards in (at least) reading and math, annual tests in grades 3–8, and some sort of accountability system for their public schools.

Unfortunately, those standards were mostly vague, shoddy, or misguided; the tests were simplistic and their “proficiency” bar set too low. The accountability systems encouraged all manner of dubious practices, such as focusing teacher effort on a small subset of students at risk of failing the exams rather than advancing every child’s learning.

What a difference a decade makes. To be sure, some rooms in the education policy edifice remain in disarray. But thanks to the hard work and political courage of the states, finally abetted by some...

If you care about state education policy and/or the new federal education law, you ought to spend some time doing three things. First, consider how the performance of schools (and networks of schools) needs to be assessed. Second, read the short Fordham report At the Helm, Not the Oar. Third, encourage your favorite state’s department of education to undertake an organizational strategic planning process.

All three are part of a single, important exercise: figuring out what role the state department of education must play in public schooling.

By now, everyone knows that ESSA returns to states the authority to create K–12 accountability systems. So it’s worth giving some thought to what, exactly, schools and districts should be held accountable for. What do we want them to actually accomplish?

But even if we get clear on the “what,” the “who” and “how” remain. Which entity or entities should be tasked with this work, and how should they go about it?

In At the Helm, which I co-wrote in 2014 with Juliet Squire, we argue that there are lots and lots of things handed to state departments of education (also known as state education agencies, or “SEAs”) that could be better achieved elsewhere....

Joanne Weiss

On February 2, I had the privilege of being a judge for the Fordham Institute’s ESSA Accountability Design Competition. It’s widely known that I’m a fan of using competition to drive policy innovation, and this competition did not disappoint. Fordham received a stunning array of proposals from teachers, students, state leaders, and policy makers.

But before we turn to the insights buried in these pages, I want to praise the competition’s conception, which mirrored the process that states should replicate as they design their own accountability systems. Contestants explained how their proposed accountability systems would support a larger vision of educational success and spur desired actions. They laid out their design principles—attributes like simplicity, precision, fairness, and clarity. They defined the indicators that should therefore be tracked, and they explained how those indicators would roll up into ratings of school quality. Finally, they laid out how each rating would be used to inform or determine consequences for schools. All decisions were explained in the context of how they would forward the larger vision.

Together, these proposals represent a variety of both practical and philosophical approaches to accountability system design. Here are the five major themes I found most noteworthy.

1. The...

Michael Hansen

I walked away from Fordham’s School Accountability Design Competition last Tuesday pleasantly surprised—not only at the variety of fresh thinking on accountability, but also at how few submissions actually triggered the “I think that’s illegal” response. I left encouraged at the possibilities for the future.

The problem of one system for multiple users

Having done some prior work on school accountability and turnaround, I took great interest in the designs that came out of this competition and how they solved what I’m going to call the “one-system-multiple-user” problem. Though the old generation of systems had many drawbacks, I see this particular problem as their greatest flaw and the area where states will most likely repeat the mistakes of the past.

Basically, the one-system-multiple-user problem is this: The accountability design is built with a specific objective in mind (school accountability to monitor performance for targeted interventions) for a single user (the state education office); but the introduction of public accountability ratings induces other users (parents, teachers, district leaders, homebuyers, etc.) to use the same common rating system. The problem is that not all user groups share the same objective; indeed, we expect them to have different purposes in...

The Thomas B. Fordham Institute has been evaluating the quality of state academic standards for nearly twenty years. Our very first study, published in the summer of 1997, was an appraisal of state English standards by Sandra Stotsky. Over the last two decades, we’ve regularly reviewed and reported on the quality of state K–12 standards for mathematics, science, U.S. history, world history, English language arts, and geography, as well as the Common Core, International Baccalaureate, Advanced Placement, and other influential standards and frameworks (such as those used by PISA, TIMSS, and NAEP). In fact, evaluating academic standards is probably what we’re best known for.

For most of the last two decades, we’ve also dreamed of evaluating the tests linked to those standards—mindful, of course, that in most places, the tests are the real standards. They’re what schools (and sometimes teachers and students) are held accountable for, and they tend to drive curricula and instruction. (That’s probably the reason why we and other analysts have never been able to demonstrate a close relationship between the quality of standards per se and changes in student achievement.) We wanted to know how well matched the assessments were to the standards, whether they were of high...

New York State education officials raised a ruckus two weeks ago when they announced that annual statewide reading and math tests, administered in grades 3–8, would no longer be timed. The New York Post quickly blasted the move as “lunacy” in an editorial. “Nowhere in the world do standardized exams come without time limits,” the paper thundered. “Without time limits, they’re a far less accurate measure.” Eva S. Moskowitz, founder of the Success Academy charter schools, had a similar reaction. “I don’t even know how you administer a test like that,” she told the New York Times.

I’ll confess that my initial reaction was not very different. Intuitively, testing conditions would seem to have a direct impact on validity. If you test Usain Bolt and me on our ability to run one hundred meters, I might finish faster if I’m on flat ground and the world record holder is forced to run up a very steep incline. But that doesn’t make me Usain Bolt’s equal. By abolishing time limits, it seemed New York was seeking to game the results, giving every student a “special education accommodation” with extended time for testing. 

But after reading the research and talking to leading psychometricians, I’ve concluded that both...

The assessments edition

In this week's podcast, Mike Petrilli and Robert Pondiscio preview Fordham’s long-awaited assessments evaluation, analyze low-income families’ education-related tech purchases, and wave the red flag about TFA’s lurch to the Left. In the Research Minute, David Griffith examines how well the nation’s largest school districts promote parent choice and competition between schools.

Research Minute

Grover (Russ) J. Whitehurst, "Education Choice and Competition Index 2015," Brookings (February 2016).
