Reading First: Not dead yet

5.14.2008

The interim evaluation of Reading First has all sorts of people upset for all manner of reasons. It found that, on average, the federal program's impact on student achievement was not statistically detectable; moreover, at least half of the third graders in the sample were performing below grade level, as measured by Stanford 10 norms, after up to three years of program funding. However, the study also found that in some "late award" sites (i.e., those that received their Reading First grants later in the federal funding process), first- and second-graders' comprehension scores had improved significantly and educators at those same sites had spent significantly more instructional time teaching the five essential elements of reading.

Not exactly a hearty endorsement of Reading First. But neither is it the downright disparagement some have claimed. The study certainly offers no compelling reason to kill the program, as some on Capitol Hill appear eager to do. As with most complex evaluations--especially of large-scale federal programs--the findings are complicated and mixed. So allow me to highlight a few evaluation-related concerns and one "non-concern" that reinforce what my dad (who was also my softball coach) often said to his team of 8-year-olds when we trailed in the late innings: "It ain't over 'til the fat lady sings."

First, this study suffers from a potential contamination factor. That is, it does not appear to document, much less measure, how the practices of Reading First schools may have bled over to the comparison schools. Others, referring to both the national and state evaluations, have reported on this, and I am merely joining their choir (see here, here, and here). Institute of Education Sciences chief Russ Whitehurst himself said as much at a Washington conference this week (though he studiously avoided saying anything of the sort when releasing the study and talking with journalists). In a different, state-based Reading First evaluation that I helped conduct, we found evidence that our comparison schools and districts were adopting many of the practices of the Reading First grantees, including concentrating on the five essential components of reading, hiring building-level literacy coaches, adopting 90-minute literacy blocks, and using the same core reading programs. (Not surprising, since, as I understand it, the RF statute required participating RF districts to spread professional development based on scientifically based reading research [SBRR] beyond their participating schools.) So, although contamination complicates any evaluation, this specific instance of contamination--when more schools adopt reading practices that have been shown to enhance students' reading progress--is a good thing for kids.

Second, we must pay attention to the outcome assessment used to measure achievement. Reading First evaluators opted to use the Stanford 10 reading comprehension subtest, a reliable and valid measure. Most state evaluations of Reading First, however, have used DIBELS (Dynamic Indicators of Basic Early Literacy Skills) to measure achievement. DIBELS is a series of short assessments intended to assess mastery of discrete reading skills. The federal evaluators apparently considered using DIBELS but didn't because its various sub-tests must be individually administered, and that wasn't "practical" given the sample size. I understand the data collection burden in administering DIBELS. But why did the evaluation team not choose to administer the battery at least to a sub-sample, especially given DIBELS's widespread use and the fact that Reading First has been widely viewed more as a skill builder than a comprehension enhancer? (Comprehension, it may be recalled, is but one of the five "essential elements.")

In fact, an informal examination of 24 other Reading First state evaluations found that 15 used DIBELS either as their sole performance measure or as one of multiple measures. Of those 24, only six used a comparison group in their design. And of those six, five reported that Reading First students outperformed their non-Reading First peers on at least one DIBELS measure at one or more grades.

I don't know whether using DIBELS would have altered the national achievement findings. But I do know it would have been worthwhile to add DIBELS as an additional measure, at least in a sub-sample of schools. Because it assesses a fuller variety of discrete reading skills, rather than focusing only on comprehension, DIBELS would have given us another window into achievement.

Now here's a "non-concern": Some seek to justify the null evaluation findings by decrying an implementation breakdown in the program itself. But Reading First is perhaps the best-implemented education program in federal history. Study after study (both national and state, such as those found here, here, and here) echoes the same message: the program has been implemented with a high degree of fidelity to its statutory purpose. Yes, teachers spent more time focusing on the five essential elements of reading; yes, they used a textbook aligned to SBRR; yes, they received professional development based on the same. And this message comes not only from teacher and principal self-reporting, but also from classroom observational data. Questioning implementation is a red herring. (The danger, of course, in saying this is that program critics, lacking an implementation culprit, may try instead to discredit the National Reading Panel research upon which the Reading First program was based. I'd respond that a two-year evaluation of a program with a bleed-over issue ought not trump a body of research that comprises the most rigorous experimental and quasi-experimental studies conducted on reading to date.)

Finally, let's remember that we're discussing an interim report based on two school years of data. The next study will include another year of achievement data on reading comprehension as well as an assessment of decoding skills in first grade. We are only too aware that policymakers don't always wait for final evaluation reports before cutting (or, for that matter, increasing) programs. But given the substantial money, time, importance, interest, and promise (shall I go on?) surrounding Reading First, don't we owe it to students and teachers to withhold the final verdict until all the data are in?