In this week's podcast, Mike Petrilli and Robert Pondiscio assess New York State’s decision to end timed tests, extract lessons from a former teacher’s troubled account of his time in the classroom, and discuss the most creative and worthwhile ideas that came out of Fordham’s ESSA Accountability Design competition. In the Research Minute, Amber Northern examines the value of recess.
The eyes of the nation are fixed on a tournament of champions this week. Snacks have been prepared, eager spectators huddle around their screen of preference, and social media is primed to blow up. Veteran commentators have gathered at the scene to observe and pontificate. For the competitors, the event represents the culmination of months of dedicated effort, and sometimes entire careers; everything they’ve worked for, both at the college and professional level, has led up to this moment. The national scrutiny can be as daunting for grizzled journeymen as it is for fresh-faced greenhorns. You know what I’m talking about:
The Fordham Institute’s ESSA Accountability Design Competition.
Okay, you probably know what I’m talking about. If you inhabit the world of education policy, you took notice of Fordham’s January call for accountability system frameworks that would comply with the newly passed Every Student Succeeds Act—and take advantage of the new authority the law grants to states. With the federal influence on local classrooms scaled back so suddenly, it will be up to education agencies in Wisconsin and Mississippi and Alaska to adopt their own methods of setting the agenda for schools and rating how well schools meet it.
The purpose of our little conclave—and the request for proposals that preceded it—was to seek out some of the best ideas percolating in the far-flung reaches of wonkworld, interrogate their goals and assumptions, and inject them into the murk of policy implementation. Perhaps the good folks over at the Department of Education could take a few notes—and if we can accomplish that, there may even be hope for the overextended Goodell regime.
The response was immediate and overwhelming. We received dozens of plans from thinkers in academia, consulting practices, the think tank world, and even major urban school districts. Upon review, the competition was winnowed to a playoff field of ten intrepid contenders from across the country. They gathered Tuesday afternoon at the frozen tundra of Lambeau Field, er, Fordham’s cheerily appointed and climate-controlled meeting space. Battle was joined. Soft cheeses were consumed. And verdicts were rendered for each proposal by our team of experts: former two-time state schools chief Tony Bennett, career teacher and charter founder Charlene Mendoza, former Department of Education Chief of Staff Joanne Weiss, and all-around reform mensch Andy Smarick. Each participant addressed their remarks both to the packed house and this formidable gauntlet of evaluators, each poised to stump them with questions or issue withering judgment in the vein of American Idol. (This may be the place where my football metaphor starts to break down, but bear with me.)
One of the key quandaries of the post-No Child Left Behind landscape is how to accurately gauge performance without falling back too heavily on the much-loathed regimen of standardized tests. Suggestions for achieving that aim helped a few teams put points on the board—but also led to some vicious collisions. Even while looking favorably on Chad Aldeman’s general outline, several judges raised eyebrows at his suggestion to base evaluations partially on visits by apolitical school “inspectors” (a model that draws inspiration from the British system). Arizona State University’s Sherman Dorn unveiled a crisp delivery (and awesome use of art) to attack “brain-dead algorithms” for school ratings, but his bid to use grand juries to identify schools with low-performing subgroups and equity struggles proved the most polarizing idea of the day; Tony Bennett dismissed it as “a statutory and regulatory mess.”
Not every challenger could rely on prior postseason experience. Mixed in with the graybeards were the makings of a bona fide youth movement: Lydia Burns and Jamie Smith, two Kentucky high school students who helped design their presentation as members of the state’s Pritchard Committee Student Voice Team. The pair voiced some respectful plaudits for their more seasoned rivals, proclaiming them “the most brilliant policy experts in the country.” Once the bright lights came on, though, the rookies proved they were born for primetime, breaking down a meticulously cited game plan to measure performance through indicators of proficiency, growth, achievement gaps, and school climate. Whether their awe-struck deference was genuine or merely a clever ploy, the kids enjoyed a lusty reception from both judges and onlookers. “You can stand with any wonk in this room, any time,” gushed Joanne Weiss.
As the competition progressed, presenters began to coalesce around a handful of common objectives—and offered more than a few novel schemes for how to realize them. Endorsements were made for different non-academic indicators of school quality, including retention rates, faculty absenteeism, fullness of curriculum, and student attendance. Bennett stumped for clear, dispositive A–F ratings for the sake of clarity, while Weiss countered that their “simplicity masks a lot of important information.”
The game’s outcome wasn’t solely in the hands of the officials, thankfully. Through the magic of online survey technology, audience members were able to instantly weigh in on every proposal. Though the mood of the crowd was broadly generous (attendees seldom pushed the “Hate it” or “I think it might be illegal” buttons, as was their prerogative), a few standout playmakers earned particular admiration. As Fordham’s own Robert Pondiscio sagely observed, two of the most fêted plans were the works of either pupils (the instant-favorite Student Voice group) or teachers (the Teach Plus policy fellows, who drew up an innovative two-tiered system of separate performance indicators).
In the end, though, no winner was crowned. “Partly that’s because I’m too cheap to hand out cash prizes,” conceded Commissioner Mike Petrilli. But the matter of championship trophies was secondary to the question on everybody’s minds: Now that these ideas are in the air, will they ever make it to schools on the ground? Do any states have the personnel and the backbone to remodel their accountability plans around the new ESSA playbook?
That’s the greater contest—bigger than our terrific and fun event, bigger than the Super Bowl. We still don’t know whether most state agencies have the capacity to fundamentally change their present systems and start assessing schools by more precise and representative metrics. Finding out is going to involve some bumps. Reformers need to grab their helmets and strap on some pads. And for once, let’s all skip past the commercials.
I’ve pulled out some of the best nuggets from across the twenty-six submissions.
Indicators of Academic Achievement
ESSA requires state accountability systems to include an indicator of academic achievement “as measured by proficiency on the annual assessments.”
Yet not a single one of our proposals suggests using simple proficiency rates as an indicator here. That’s because everyone is aware of NCLB’s unintended consequence: encouraging schools to pay attention only to the “bubble kids” whose performance is close to the proficiency line. So what to use instead? Ideas include:
An achievement index. Under these plans, schools would receive partial credit for getting students to basic levels of achievement, full credit for getting them to proficiency, and extra credit for getting them to advanced. (That’s how Ohio’s “performance index” works today.) Here’s how Kristin Blagg of the Urban Institute puts it: “There is evidence that accountability pressure, particularly when schools or districts are on the margin of adequate yearly progress (AYP), is associated with neutral-to-positive achievement gains….I would use a five-tier system to assess the levels of student achievement, banking on the idea that four rather than three cut-offs would spur more schools to coach 'bubble' students into the next achievement level.”
Scale scores. Other plans skip the use of performance levels entirely, at least for the purpose of calculating achievement scores. Morgan Polikoff and colleagues write about their design, “Performance is measured by conversion of students’ raw scale scores at each grade level to a 0–100 scale. This is superior to an approach based on performance levels or proficiency rates in that it rewards increases in performance all along the distribution (rather than just around the cut points).”
Cross-sectional achievement. Sherman Dorn proposes a mix of many measures, all derived (and transformed) from proficiency rates and scale scores. As he puts it in a blog entry, it is “a deliberate jury-rigged construction; that is a feature, not a bug.”
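To make the achievement-index idea concrete, here is a minimal sketch in Python. The level names and credit weights are invented for illustration; they are not Ohio’s actual performance-index values or any submitter’s proposal:

```python
# Illustrative achievement index: partial credit for "basic," full credit
# for "proficient," and extra credit for "advanced." All weights are
# hypothetical, chosen only to show the mechanics.
CREDIT = {"below_basic": 0.0, "basic": 0.5, "proficient": 1.0, "advanced": 1.2}

def achievement_index(levels):
    """Average credit across students, scaled so full proficiency = 100."""
    return 100 * sum(CREDIT[lv] for lv in levels) / len(levels)

students = ["basic", "proficient", "proficient", "advanced", "below_basic"]
print(round(achievement_index(students), 1))  # 74.0
```

The point of the extra cut points is visible in the arithmetic: moving any student up one level raises the index, so a school gains nothing by fixating only on students just below the proficiency bar.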
Here we pose a question for the U.S. Department of Education: Will your regulations allow these alternatives to straight-up proficiency rates?
Indicators of Student Growth or an Alternative
ESSA also requires state accountability systems to include “a measure of student growth, if determined appropriate by the State; or another valid and reliable statewide academic indicator that allows for meaningful differentiation in school performance.” Yet nobody in our competition went for an alternative: everyone went for growth, and usually in a big way.
Richard Wenning, the designer of the Colorado Growth Model, explains why: “Disaggregation and high weighting of growth and its gaps are essential because too often, poverty and growth are negatively correlated….If an accountability system places greatest weight on growth, it creates an incentive to maximize the rate and amount of learning for all students and supports an ethos of effort and improvement.” I would add that true growth models—rather than “growth to proficiency” ones—encourage schools to focus on all students instead of just their low-performers.
So which growth models were proposed?
Student growth percentile (a.k.a. Colorado Growth Model). Back to Wenning: “Student growth percentiles based on annual statewide assessments of reading, mathematics, other core subjects…comprise the first layer of evidence reported and employed for school ratings. The corresponding metrics are median growth percentiles, with fiftieth-percentile growth reflecting the normative concept of a year’s growth in a year’s time; and adequate growth percentiles, which provide a student-level growth target constituting ‘good enough’ growth, and which yield the percentage of students on track to proficiency or on track to college and career readiness.”
Two-step value-added model. Polikoff et al. write, “This model is designed to eliminate any relationship between student characteristics and schools’ growth scores. In that sense, it is maximally fair to schools.” More information on the two-step model is available in this Education Next article by Mark Ehlert, Cory Koedel, Eric Parsons, and Michael Podgursky.
Transition matrix. Bellwether Education Partners’ Chad Aldeman suggests this approach for its simplicity and cost effectiveness. “It gives students points based on whether they advance through various performance thresholds. Unlike under NCLB, where districts focused on students right at the cusp of proficiency—the 'bubble' kids—this sort of method creates several, more frequent cutpoints that make it harder to focus on just a small subset of students. This approach offers several advantages over more complex growth models. Any state could implement a transition matrix without external support, and the calculations could be implemented on any state test. Most importantly, and in contrast to more complex models, the transition matrix provides a clear, predetermined goal for all students. School leaders and teachers would know exactly where students are and where they need to be to receive growth points.”
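A transition matrix can be sketched in a few lines; this simplified version awards points equal to the number of performance levels a student gains year over year (real matrices assign a point value to each from-to cell, and the level names here are hypothetical):

```python
# Simplified transition matrix: points for moving up between performance
# levels from one year to the next, negative points for sliding back.
# Level names and the one-point-per-level rule are illustrative only.
LEVELS = ["below_basic", "basic", "proficient", "advanced"]

def growth_points(last_year, this_year):
    """Levels gained (negative if the student dropped a level)."""
    return LEVELS.index(this_year) - LEVELS.index(last_year)

pairs = [("basic", "proficient"), ("below_basic", "basic"), ("advanced", "advanced")]
total = sum(growth_points(a, b) for a, b in pairs)
print(total)  # 2
```

As Aldeman notes, nothing here requires a statistical model: the targets are fixed in advance, so a teacher can read off exactly which threshold each student must clear to earn points.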
Indicator of Progress toward English Language Proficiency
ESSA also requires state accountability systems to measure “progress in achieving English language proficiency, as defined by the State.” This one’s way outside my area of expertise, but my sense is that the feasibility of implementing this part of the law rests on the quality and nature of English language proficiency assessments. Can they accurately produce “growth measures” over time? Or are states better off just using their regular English language arts assessments here, as some proposals suggest? Several proposals also suggest weighting the ELL indicator in proportion to the concentration of ELLs at a given school.
Another question for the Department of Education: Will the department allow that?
Indicator(s) of Student Success or School Quality
Finally, ESSA requires state accountability systems to include “not less than one indicator of school quality or student success that allows for meaningful differentiation in school performance” and is “valid, reliable, comparable, and statewide.” With this final set of indicators, Congress opened the door to more holistic and creative approaches to measuring effectiveness—and our contenders did not disappoint. Here are some of my favorite ideas:
School inspections. Chad Aldeman writes, “Under this system, no school’s rating is final until they complete a formal inspection. The inspections would be based off the school inspectorate model used as part of the accountability and school improvement process in England. As Craig Jerald described in a 2012 report, ‘inspectors observe classroom lessons, analyze student work, speak with students and staff members, examine school records, and scrutinize the results of surveys administered to parents and students.’ Although the interviews provide context, the main focus is on observations of classroom teaching, school leadership, and the school’s capacity to improve.” Kristin Blagg also endorses an “observation system,” which she analogizes to the Quality Rating and Improvement System (QRIS) used to differentiate among early childcare and education providers. Matthew Petersen and his fellow Harvard Graduate School of Education students similarly called for “peer visits.”
Surveys. Ron Ferguson et al. write, “An accountability system should do more than simply measure and reward tested outcomes. Educators need tools and incentives to monitor and manage multiple processes for achieving intended results. Therefore, the state should require the use of valid and reliable observational and survey-based assessment tools. These can provide feedback from students to teachers, and from teachers to administrators, on school climate, teaching quality, and student engagement in learning, as well as the development of agency-related skills and mindsets. For these observational and survey-based metrics, schools should not be graded on the measured scores. Instead, they should be rated on the quality of their efforts to use such measures formatively for the improvement of teaching and learning. Ratings should be provided by officials who supervise principals, contributing 10 percent of a school’s composite accountability score.” Several other proposals, including ones from Jim Dueck and Alex Hernandez, are big on surveys too. Hernandez even turns the results into a “Love” score—as in, “Will my child enjoy and thrive in this school?” Specific tools mentioned include Tripod (developed in 2001 by Ferguson and selected in 2008 for the MET project) and the University of Chicago 5 Essentials Survey, which David Stewart and Joe Siedlecki were high on.
A well-rounded curriculum. Polikoff and colleagues propose a measure that “captures the proportion of students who receive a rich, full curriculum. We define a full curriculum as access to the four core subjects plus the arts and physical education each for a minimum amount of time per week. Our goal with this measure is to ensure that schools do not excessively narrow the curriculum at the cost of non-tested subjects and opportunities for enrichment. This indicator will be verified through random audits.”
Other indicators mentioned here included teacher absenteeism; chronic student absenteeism; and student retention rates (particularly important for communities with lots of school choice). And several proposals (such as the one from the Teach Plus Teaching Fellows, and another from Samantha Semrow and her fellow Harvard classmates) suggest including additional data on schools’ report cards, but not using them to determine school grades. (Melany Stowe’s “measure up” dashboard is a particularly engaging model.) That seems like a smart approach, especially for indicators that are new and perhaps not yet “valid and reliable.” Furthermore, as the Teach Plus Fellows explain, these additional data can be used to “examine whether certain factors have predictive power” for improving student achievement. “In this way, states will have the opportunity to not only identify struggling schools or subgroups but form actionable hypotheses for improving outcomes, using disciplined inquiry to drive improvement.”
Calculating Summative School Grades
ESSA requires states to “establish a system of meaningfully differentiating, on an annual basis, all public schools in the State, which shall be based on all indicators in the State’s accountability system…for all students and for each subgroup of students.” Most of our contenders proposed indices with various weights for their indicators, and typically at least some consideration for the performance of subgroups. But a few offered some outside-the-box ideas:
Mix and match. Dale Chu and Eric Lerum of America Succeeds suggest that states offer schools and districts a menu of indicators to be used to generate their grades. They explain, “This design aspires to create the conditions for flexibility and entrepreneurship at the local level. One of the problems that arose with the previous accountability regime was that it funneled schools toward one model. By allowing schools and districts to develop their own performance goals aligned with their programs, existing performance, and needs of students, ownership of school improvement will lie with the stakeholders closest to the students.”
Inclusion of locally designed indicators. In a similar vein, Jennifer Vranek and her colleagues at Education First write, “Past accountability systems were the darlings of policy makers, think tanks, foundations, editorial boards, and advocates; they rarely had the support of educators, school communities, and the public writ large. They were too often equated with excessive testing that many parents believe ‘takes time away from learning.’ Our design provides school communities the opportunity to select additional indicators and measures in every component through discussion of what matters most to them, to share that publicly, and to commit to work that addresses goals the community develops.”
And a final question for the U.S. Department of Education: Will your regulations allow for the “mix and match” and “locally developed indicators” approaches? Or will you read ESSA as requiring a “single accountability system,” meaning one-size-fits-all for all schools and districts in the state?
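Mechanically, the weighted-index approach that most contenders favored reduces to a weighted average of normalized indicator scores, mapped to a letter grade. A minimal sketch, with made-up weights and cut scores (real proposals also fold in subgroup performance, which this toy version omits):

```python
# Toy summative grade: each indicator is a 0-100 score. The weights and
# letter-grade cutoffs are invented for illustration and must be chosen
# by a state in practice; weights should sum to 1.
WEIGHTS = {"achievement": 0.3, "growth": 0.4, "ell_progress": 0.1, "school_quality": 0.2}

def composite(scores):
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def letter_grade(score):
    cuts = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    return next((g for cut, g in cuts if score >= cut), "F")

school = {"achievement": 70, "growth": 85, "ell_progress": 60, "school_quality": 75}
print(composite(school), letter_grade(composite(school)))  # 76.0 C
```

Note how the weights encode policy priorities: with growth weighted at 0.4, a high-poverty school with modest proficiency but strong gains can still earn a respectable grade, which is exactly the incentive Wenning argues for.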
Believe me when I say that’s just a sampling of the smorgasbord of sound ideas and fresh thinking available in the two-dozen-plus accountability designs submitted for our competition. I hope others mine the proposals and highlight great suggestions that I missed. And don’t forget to tune in Tuesday to watch our ten finalists make the case for their own unique approaches to ESSA accountability.
A report last month from the “Making Caring Common” project at the Harvard Graduate School of Education calls on elite colleges and universities to “send different messages” to high school students and parents about what matters—and, more importantly, what will gain admission—to America’s most hallowed higher education institutions. “Today’s culture sends young people messages that emphasize personal success rather than concern for others and the common good,” laments the report, entitled Turning the Tide. To combat this rising swell of student stress and self-regard, the college admissions process should motivate high schoolers to “contribute to others and their communities in more authentic and meaningful ways.”
Top admissions and financial aid officials at several dozen elite American colleges and universities have eagerly endorsed the report’s recommendations, which include encouraging “collective action that takes on community challenges” and looking for evidence of “authentic, meaningful experiences with diversity” when admissions decisions are made. New York Times columnist Frank Bruni praised the report, which he claims “nails the way in which society in general—and children in particular—are badly served by the status quo.”
It’s a bit much, frankly. I’m not quite convinced by the sudden alarm over the “undue academic performance pressure” placed on our children, nor am I moved by the concern of these elite institutions for students’ collective well-being. Harvard alone sits atop a $35 billion endowment, a mere 5 percent of which would suffice to grant full scholarships to each of its undergraduates—with enough left over to provide $30,000 scholarships to every single student (over fifty thousand) at every other Ivy League school. Yet concern for the common good and acting in the public interest mean telling high school students that they should demonstrate “meaningful, sustained community service”?
After you, Harvard.
Let me not be unkind. The report raises serious and legitimate issues. Sleep deprivation, anxiety, and depression are, without question, taking a toll on a subset (albeit a far smaller one than commonly imagined) of America’s high school students striving to earn a place at top schools. Nothing good will come from students overloading on AP courses and extracurriculars they’re not interested in merely to impress the admissions office at Brown. The authors of Turning the Tide may even be correct to worry that too many teenagers show more interest in “personal success rather than concern for the common good.”
But here again, messages matter less than deeds: If you’re serious about ending the academic and extracurricular arms race, why not scrap the current admissions process altogether in favor of an admissions lottery? List clear qualifications for admission—a threshold SAT or ACT score, GPA, or number of hours of community service, for example—set aside the clearly unqualified, and choose a freshman class randomly from among those who remain. Without question, elite colleges could fill their freshman classes many times over with qualified high school seniors who have earned a fair shot at a seat. With the exception of a comparatively small percentage of superior or substandard applicants at the margins, an acceptance is already a lottery masquerading as a meritocracy. “A lottery system would relieve the pressure on students. Instead of being the ‘best,’ they would only have to be ‘good enough’—and lucky,” notes Swarthmore College psychology professor Barry Schwartz. “It would free students up to do the things they were really passionate about in high school,” which is the outcome Turning the Tide purports to favor. Elite schools could still make whatever allowances they deem just and fair for students from underserved populations (NB: The rich, legacies, and athletes are not underserved populations). But there would be no point to taking extra APs or a spring break trip to rebuild houses in Haiti unless that was where your intellectual hunger or zest for service led you, since none of it would make you more likely to win admission.
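The lottery procedure described above is simple enough to sketch in code: screen out applicants who miss clear, published cutoffs, then draw the class at random from everyone who remains. The cutoffs, field names, and class size below are all invented for illustration:

```python
# Sketch of threshold-plus-lottery admissions: publish minimum
# qualifications, set aside applicants who don't meet them, and select the
# freshman class randomly from the qualified pool. All thresholds are
# hypothetical.
import random

def lottery_admit(applicants, class_size, min_sat=1400, min_gpa=3.5, seed=None):
    qualified = [a for a in applicants
                 if a["sat"] >= min_sat and a["gpa"] >= min_gpa]
    rng = random.Random(seed)
    return rng.sample(qualified, min(class_size, len(qualified)))

# A synthetic applicant pool, far larger than the class to be filled.
pool = [{"name": f"applicant{i}", "sat": 1300 + 10 * (i % 30), "gpa": 3.0 + 0.05 * (i % 20)}
        for i in range(500)]
admitted = lottery_admit(pool, class_size=50, seed=42)
print(len(admitted))  # 50
```

The design choice is the whole argument: once the pool of qualified applicants far exceeds the number of seats, nothing beyond the published thresholds affects anyone’s odds, so piling on extra APs or résumé-padding service trips buys nothing.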
Top colleges can, as the report says, “warn students and parents applications that are ‘overcoached’ can jeopardize admission outcomes” and “discourage students from taking admissions tests more than twice.” (Turning the Tide is sloppy with warnings, conveyings, and discouragements.) But it’s arrogance to think that the college admissions process can alter human nature or tap the brakes on ambition. Nor does it offer much comfort when elite schools are so eager to become authorities on what constitutes “authentic and meaningful community service.” It all leaves the nagging sense that our best colleges are suddenly less interested in encouraging academic excellence among high school students—and a little more interested in doling out compassion points to applicants who might otherwise be left on the outside looking in.
If that’s not what they really have in mind, a lottery would be far better—and more fair—for high schoolers than any of the report’s recommendations. Indeed, one needn’t be a cynic to read Turning the Tide as an attempt by elite colleges to win a PR victory for expressing concern about a system they created, maintain, and benefit handsomely from.
If you ask a thoughtful question, you may be pleased to receive a smart and germane answer. If you post that question in your widely read newspaper column on education, you’ll sometimes be greeted with such a torrent of spontaneous engagement that you have to write a second column. That’s what happened to the Washington Post’s Jay Mathews, who asked his readers in December to email him their impressions of Common Core and its innovations for math: Was it baffling them, or their kids, when they sat down to tackle an assignment together? He revealed some of the responses last week, and the thrust was definitively in support of the new standards. “My first reaction to a Common Core worksheet was repulsion,” one mother wrote of her first grader’s homework. “I set that aside and learned how to do what [my son] was doing. And something magical happened: I started doing math better in my head.” The testimonials are an illuminating contribution to what has become a sticky subject over the last few months. Common Core advocates would be well advised to let parents know that their kids’ wonky-looking problem sets can be conquered after all.
Homework isn’t the only area where the Common Core fever looks to be breaking. Early reports from Louisiana indicate that the recent animus against standards-aligned testing is unlikely to be repeated in 2016. Rebranded with a spiffy and unpoliticized new handle (“No federally mandated assessments here, folks! It’s just our old friend LEAP 2016!”), the exams are reportedly whipping up less opt-out energy than last year—when, you may recall, then-despised-Governor Bobby Jindal tried to use the backlash to spice up his insipid gumbo of presidential aspiration. What’s more, California has announced that just twenty thousand of its students sat out last year’s Smarter Balanced tests. Even that colossal figure actually accounts for just 0.61 percent of the state’s eligible pupils, a far lower rate than in opt-out capitals like New York and New Jersey. Dare we hope that recalcitrant parents and teachers might be more willing to give the assessments a chance this time around?
You may have read about the utility of small schools—ace commentators like Peter Meyer swear by them as a revolutionary alternative to district-run urban behemoths. But Education Week recently published a fascinating profile of a growing subset of private academies that might make a colonial schoolhouse look huge by comparison. Dubbed “microschools,” the start-up classrooms enroll student bodies as small as six and seem to strike a balance between homeschool co-ops and blended learning classrooms. Networks like AltSchool—based in San Francisco and run by a Google alumnus—are opening branches in Austin and New Orleans with the help of local nonprofits. With the built-in savings that come with sparse facilities, tiny staffs, and free online learning programs, it’s easy to see how the trend has become a compelling business model; on the flip side, it’s only natural to wonder what parents actually get for the hefty tuition they’ve paid, which can run into the tens of thousands of dollars. The movement’s most vocal advocates, who seem to have emerged mostly from the tech community, are long on talk of disruptive innovation and short on details about curriculum and accountability. Without a little more consideration of those factors, it’ll be hard to proclaim microschools the next big thing.
Full disclosure: I worked briefly (and happily) for Ed Boland, the author of The Battle for Room 314, after leaving my South Bronx classroom. He is a longtime senior executive with Prep for Prep, a heralded nonprofit that seeks out talented students of color in New York City’s public school system, grooms them for placement in elite private schools, and shepherds them into the best colleges in the nation. It’s the closest thing in education to finding a life-changing golden ticket in a Wonka bar.
Beset by a “nagging feeling that the program, as worthy as it was, just wasn’t reaching enough kids or the ones who needed the most help,” Boland starts to wonder if he’d missed his true calling. Raised in a Catholic family of teachers and do-gooders, he sets his mind (and resets his household budget) on becoming a New York City public school teacher. First he works nights and weekends to get his teaching degree. Then he quits his job hobnobbing with the city’s elite and trades his “comfy bourgeois life” for a job teaching ninth-grade history at “Union Street School.”
To say it didn’t go well would be an understatement. Chantay climbs on her desk and taunts Boland with a crude sexual gesture and unprintable (here, not in the book) language. Gang member Kameron is an “unalloyed sociopath”; Boland admits he was “genuinely afraid of him from the minute [he] set eyes on him.” Jesus is a “perfect shit” who “executed his role as tormentor of adults seriously, almost professionally.” Readers conditioned to expect affirming tales of the bloodied but unbowed teacher be warned: No fourth-act scene of hope and redemption comes to Room 314. An administrator cuts short her formal observation and gives Boland a withering dressing-down about his terrible classroom management. “I used to teach juvenile delinquents in Vermont who had huffed half their brains out on glue,” she seethes. “They acted better than this.” Boland soldiers on, but his despair is palpable. When he is offered his old job back, he briefly balks before accepting that everyone would be better off—him, his family, and his students—if he just admits defeat and goes back to his comfort zone.
The book has already provoked angry reactions. After an excerpt appeared in the New York Post, one veteran teacher wrote to criticize Boland as having “no real classroom management strategies at his disposal” (true) and deemed the book “an obvious money grab” (untrue and unkind). To be clear, The Battle for Room 314 is best read as a memoir—not a “teacher” book, and certainly not a policy prescription. While Boland is clear-eyed and candid about his failure, the one false note is a list of “pressing priorities” at the end of the book (integrate schools, rethink funding, and “end poverty, the root of educational failure,” etc.), which feels like it was forced upon him by an editor hungry for a hopeful takeaway. Such prescriptions might bring sympathetic nods from millionaires at Boland’s next Waldorf fundraiser, but they are unlikely to change conditions on the ground for the students at schools like “Union Street” anytime soon.
One of the heartbreaks in the book is a student named Byron, who is clearly out of place in Room 314. Talented enough to be waitlisted (with Boland’s help) at Ivy League schools, but stymied by his undocumented immigration status, he doesn’t go to college at all. As of 2012, Boland reports, Byron is living in Florida, having done “very little of anything except go to the public library and help his aunt sell meat pies from time to time.” In schools where chaos and disorder reign, those who suffer the cruelest fate are invariably quiet, studious, and largely ignored students like Byron. Attending to their untapped potential invariably seems less urgent than reining in the misbehaviors of the Chantays and Kamerons. My one disappointment with Boland’s well-written and unflinchingly honest book is that he didn’t honor his initial impulse in his recommendations: Worthy programs like Prep for Prep don’t reach enough kids. For every kid they pluck from New York’s public school system, there are a dozen Byrons—maybe a hundred Byrons—who wither on the vine. Here is my challenge for Boland and his colleagues: Take what you’ve learned in decades of preparing kids for New York’s most rigorous academic challenges and use it to help thousands more kids. It might be the difference between attending a first-rate college and selling meat pies in Florida.
If you want to take a lesson from The Battle for Room 314, it should be alarm at how poorly we prepare teachers for the reality of inner-city teaching—and the maddening futility of expecting any teacher to “meet the needs of all learners,” from Kameron to Byron, amid such rampant dysfunction. Careful readers will note that Boland was not a New York City Teaching Fellow, Teach For America corps member, or other “alt cert” instructor. Those who are eager to criticize those programs will have to explain why a smart, dedicated, traditionally trained teacher like Boland flamed out so badly, and what we can learn from his experience.
A new study by Tom Dee and his colleagues follows on the heels of a prior evaluation of District of Columbia Public Schools' (DCPS) IMPACT teacher evaluation system, which found largely positive outcomes. This time around, they examine the effects on student achievement of teacher turnover, much of which was presumably prompted by IMPACT itself. IMPACT is a multifaceted evaluation system that measures student growth, classroom practice (via observations), and teacher professionalism. Teachers receive scores ranging from “ineffective” to “highly effective”; the former are “separated” from the district, while the latter are eligible for one-time bonuses of up to $25,000 and a permanent increase in base pay of up to $27,000 per year.
This evaluation uses data from 2009–10 through 2012–13, covering grades four through eight in 103 schools. It examines achievement at the school level, and then at the grade level, in particular years, asking whether teacher effectiveness and student achievement rise or fall as teachers exit and enter the system.
The evaluation is a well-designed quasi-experimental study rather than a randomized trial, so its findings are not strictly causal. But like any good analysts, the authors subject their data to a number of robustness checks to rule out the possibility, for instance, that students were systematically sorted in response to the turnover. (The best evidence says they weren’t.)
The bottom line is that teacher turnover in D.C. was found to have an overall positive effect on math achievement, to the tune of 0.08 standard deviations (SD); the effect on reading was also positive (0.05 SD) but barely statistically significant. This overall effect masks important differences, however. When low performers leave, for example, achievement grows by 21 percent of an SD in math (equivalent to between one-third and two-thirds of a year of learning, depending on grade level) and 14 percent of an SD in reading.
With respect to turnover among low-performing teachers, it’s interesting to note that more than 90 percent occurs in high-poverty schools. But the exit of these instructors consistently produces large improvements in teaching quality and student achievement in math, as well as smaller improvements in reading. “In almost every year, DCPS has been able to replace low-performing teachers with high-performing teachers who have been able to improve student achievement,” the analysts report. When high-performers leave, on the other hand, it does not influence teacher quality or student achievement; it appears that DCPS is able to recruit replacements who are at least as effective. So whereas other studies show generally negative effects of teacher turnover, this one doesn’t.
It turns out, unsurprisingly, that when you enact a policy intended to change the composition of the teaching workforce, and you also have access to a bunch of money to reward the high-performers, the workforce is strengthened and the students benefit. The question is, can any other place replicate these conditions?
Following in the footsteps of a previous study, CAP researchers have examined the effect of a state’s commitment to standards-based reform (as measured by clear standards, tests aligned to those standards, and whether the state sanctions low-performing schools) on low-income students’ reading and math achievement on NAEP from 2003 to 2013. The results indicate that jurisdictions ranked highest in commitment to standards-based reform (e.g., Massachusetts, Florida, Tennessee, the District of Columbia) show stronger NAEP gains for their low-income students. The relationship holds at the other end of the scale as well: low-income students in the lowest-ranked states (Iowa, Kansas, Idaho, Montana, North Dakota, and South Dakota) made smaller gains.
As you can imagine, a lot of caveats attach to the measure of commitment to standards-based reform. Checking the box for “implemented high standards” alone is likely to raise more questions than it answers. Beyond that, implementation, teaching, and assessment of standards are all difficult, if not impossible, to quantify. The authors acknowledge that some of their evidence is “anecdotal and impressionistic,” though that concession applies chiefly to the “commitment to standards” piece. They are four-square behind NAEP scores as a touchstone of academic success or lack thereof, despite persistent questions among fellow researchers on that subject. To the extent that higher standards and all that goes with them can be connected to improvements in NAEP performance—especially for low-income students, and especially over some years of past implementation—we need to pay attention. The authors take a more detailed look at Iowa, the lowest-ranking state on commitment to standards, whose NAEP gains for low-income students were similarly unremarkable. It’s not proof, but at least we have some suggestive indicators by which to judge whether NAEP improvements will continue into the Common Core era.