University of Oregon

Questions & Answers

The National Center on Assessment and Accountability for Special Education has been funded to conduct research that provides evidence about (1) the natural developmental progress in achievement of students with disabilities and (2) the technical properties of alternative accountability models where student academic growth is used to describe and evaluate school effectiveness.

Why is this research important?
Students’ summative test scores in reading and mathematics in our current accountability system are key educational outputs used to evaluate the effectiveness of schools and teachers, and, in some cases, individual students’ learning. The centerpiece of current federal accountability legislation, the No Child Left Behind Act (NCLB, 2001), requires reporting of school-level outcomes as well as the disaggregation of achievement test scores for subgroups who have historically performed poorly relative to other students. States vary considerably in their assessment instruments, testing and reporting mechanisms, definition of grade-level proficiency, and means for schools to demonstrate progress toward universal proficiency by 2014 (Heck, 2006; Linn, 2008). We know relatively little about the implications of states’ assessment choices in terms of the reliability of building-level scores and the validity of the inferences and decisions about schools made on the basis of these scores (Heck, 2006; Linn & Haug, 2002; Zvoch & Stevens, 2005). Unfortunately, we know even less about the reliability and validity of scores for targeted subgroups reported at the school level (Kiplinger, 2008).

The lack of knowledge about the impact of particular assessment and accountability choices on the reliability and validity of disaggregated test scores at the school or higher levels (e.g., district, state) is particularly acute for students with disabilities (SWD), one of the targeted subgroups in NCLB. First, schools' poor performance with this group of students has been a concern for decades (Carlberg & Kavale, 1980; Schulte, Osborne, & Erchul, 1998) and remains so today. With many states reporting that over 70% of students with disabilities are below expectations in reading and mathematics on annual statewide achievement tests, there is a critical need to provide accurate information to schools about whether their practices with this subgroup are effective. Second, in a recent 3-state study of schools that failed to make their adequate yearly progress (AYP) targets, Eckes and Swando (2009) found that the most frequent reason for schools' failure was the performance of this subgroup. Thus, unreliable measurement of schools' progress with this subgroup threatens the validity not only of inferences about schools' performance with SWD, but also of inferences about schools' performance as a whole. The need for our National Center on Assessment and Accountability for Special Education is evidenced by the paucity of longitudinal research on achievement growth for individual students with disabilities who take general achievement tests. In addition, we simply don’t yet know how well current (status-based) and proposed (growth-based) school-level accountability models represent student achievement outcomes, particularly for students with disabilities.

What are the research questions we are trying to answer?
To accomplish the Center’s goals and address the problem of measuring achievement growth for educational accountability purposes, we propose conducting a tightly linked series of growth modeling studies that address six key questions:

  1. What is the natural developmental progress in achievement for students with disabilities?
  2. What models best characterize achievement growth for students with disabilities who are participating in general achievement tests, as well as those taking alternate assessments?
  3. How do various growth models represent school effects for students with and without disabilities, and how do results compare to those derived from status models now in use?
  4. What are the reliability and validity of the estimates of school effectiveness for students with disabilities produced by alternative growth models, and how are these estimates influenced by contextual differences among schools and students?
  5. How do results from different types of interim assessments of students’ achievement meaningfully contribute to a model of academic growth for students with disabilities?
  6. How can information about opportunity to learn and achievement growth be used to enhance academic outcomes for students with disabilities?
Answering these questions with scientifically sound evidence requires the use of longitudinal designs, an understanding of measurement limitations, and a command of an array of statistical analyses and comparison techniques (Barton, 2005; Gong, Perie, & Dunn, 2006; Linn & Haug, 2002; Raudenbush, 2004; Singer & Willett, 2003; Stevens, 2005). It also requires access to large and representative datasets of both summative and interim assessments. Further, we maintain that these large datasets should include not only students with disabilities but also students without disabilities, in order to understand how students with disabilities differ from their non-disabled peers and how well different school-level growth models represent school effects for both populations. This research strategy is in keeping with the principles of "inclusion" and "least restrictive alternative" that underlie many policy decisions regarding students with disabilities, including their participation in current standards-based reforms (McDonnell, McLaughlin & Morison, 1997).

How can we best measure and model student achievement and school effectiveness?
Many methods are currently used to measure student progress. Most common, of course, is the federally mandated percent proficient method for determining adequate yearly progress incorporated in the No Child Left Behind Act (NCLB) of 2001. A number of educational scientists and leaders have argued for shifting our accountability metric away from achievement status and toward students’ achievement growth (e.g., Betebenner, 2008; Hanushek & Raymond, 2005; Schulte & Villwock, 2004; Teddlie & Reynolds, 2000; Willms, 1992). The argument for using achievement growth rather than status as the basis for accountability rests on the dual premise that (a) schools should be held accountable for achievement outcomes they can control, such as how much students learn during the school year, rather than for students' prior achievement, and (b) status models incentivize schools to focus on students near the threshold of proficiency rather than on the achievement growth of all students, including those functioning well below these thresholds (Ladd & Lauen, 2009). The purpose of this National Center is to empirically examine alternative models for estimating student learning and school effectiveness. The models we are examining can be described briefly as:
  • NCLB Status and Improvement. Status models provide a picture of academic performance at a single point in time and have the same advantages and disadvantages as a census. Status models provide a summary of student performance and a snapshot of school performance. School status performance (i.e., percent proficient) is interpreted through comparison to a performance standard or benchmark. Another variant in NCLB is the different-groups improvement model, embodied in the NCLB "safe harbor" provision. In this model, different cohorts of students are compared from one year to another to determine change in percent proficient. Although such improvement models are sometimes considered growth models, they do not track individual student progress over time, and their accuracy in reflecting school change depends on the questionable assumption that the student population has remained stable from one year to the next. As we compare school-level growth approaches in terms of their validity, the NCLB Status and Improvement model will serve as a starting point for evaluating the advantages and disadvantages conferred by various growth modeling approaches.
  • Transition Matrix. Students’ growth is tracked at the performance standard level. A transition matrix is set up with previous performance levels (e.g., Does Not Meet, Meets, Exceeds, ordered from lowest to highest) as rows and current performance levels as columns. Each cell indicates the percent of students who moved from the row's previous-year level to the column's current-year level. The diagonal cells indicate students staying at the same level, cells below the diagonal show students moving down one or more levels, and cells above the diagonal show students moving to higher performance levels. Transition matrices can be combined across all tested grades to show total performance for the school. This approach is included here because it allows scores from tests on different scales to be aggregated, and because a substantial number of students with disabilities take alternate assessments that, in many states, are on a different scale than the assessments used in general education for reporting school-level results.
  • Residual Gain and Value-Added Models (ResVAM). A variety of residual models have been used in accountability systems. The simplest predicts current performance from past performance. Each student has a predicted score based on achievement from a previous occasion. The difference between actual and predicted scores in the current year is the residual score. Residual gains near zero indicate growth consistent with prediction, positive scores indicate greater than predicted growth, and negative scores indicate lower than predicted growth. Residual gain scores can be averaged to obtain a group growth measure, but they are not easily integrated with performance standards because they focus on relative gain. Value-added models (VAM) are a variation of residual models. The best known include Sanders’ Tennessee Value-Added Assessment System (TVAAS; Sanders, Saxton, & Horn, 1997), the Dallas accountability system (Webster, 2005), the Chicago School Productivity model (Bryk, Thum, Easton, & Luppescu, 1998; Ponisciak & Bryk, 2005), and the RAND model (McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004). All of these models use prior achievement as a predictor of performance; some use multiple years of prior achievement, and some also include other conditioning predictors, such as measures of student background characteristics. VAMs are now being applied in a number of states to estimate teacher effects and link them to student achievement results.
  • Multilevel Growth Models (MGM). Longitudinal designs entail tracking an individual over time and measuring performance at two or more time points (see Rogosa, 1995; Willett & Sayer, 1994; Willett, Singer, & Martin, 1998). There are at least two major advantages to these designs: (a) each student acts as his or her own control (see Stevens, 2005), and (b) the focus is on the outcome of interest, student learning. MGMs fit growth trajectories to each student's data using a two- or three-level structure. The first level of each model represents measurement occasions and is used to estimate a growth trajectory for each student. The second level represents student characteristics, and the third level represents school context, characteristics, and programs. An important feature of MGMs is that variation in performance can be partitioned into student and school levels. These models allow the estimation of an intercept and a slope for each individual and each school. Time in school (an indication of opportunity to learn and of the amount of exposure to school practice and policy) and other variables can be explicitly included as predictors of individual students’ growth. Application of the MGM will produce individual student growth trajectories, as well as school average growth trajectories.
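
To make two of the model descriptions above more concrete, here is a minimal Python sketch of a transition matrix and of the simplest residual gain model. It is an illustration under stated assumptions, not the Center's analysis code. All function names, student records, scores, and school labels are hypothetical. The sketch also assumes that each transition-matrix cell is a percent of all students (row percents would be an equally reasonable reading of the text), and that the residual model is a one-predictor ordinary least squares regression of current score on prior score.

```python
from collections import Counter
from statistics import mean

# Performance levels ordered from lowest to highest (labels taken from the text).
LEVELS = ["Does Not Meet", "Meets", "Exceeds"]


def transition_matrix(prev_levels, curr_levels, levels=LEVELS):
    """Rows are previous-year levels, columns are current-year levels.

    Assumption: each cell is the percent of ALL students who made that move.
    """
    counts = Counter(zip(prev_levels, curr_levels))
    n = len(prev_levels)
    return [[100.0 * counts[(p, c)] / n for c in levels] for p in levels]


def summarize_moves(matrix):
    """Total percent above the diagonal (up), on it (same), and below it (down)."""
    size = len(matrix)
    up = sum(matrix[r][c] for r in range(size) for c in range(size) if c > r)
    same = sum(matrix[r][r] for r in range(size))
    down = sum(matrix[r][c] for r in range(size) for c in range(size) if c < r)
    return {"up": up, "same": same, "down": down}


def residual_gains(prior, current):
    """Simplest residual model: regress current on prior, return actual - predicted."""
    mean_prior, mean_current = mean(prior), mean(current)
    sxx = sum((x - mean_prior) ** 2 for x in prior)
    sxy = sum((x - mean_prior) * (y - mean_current) for x, y in zip(prior, current))
    slope = sxy / sxx
    intercept = mean_current - slope * mean_prior
    return [y - (intercept + slope * x) for x, y in zip(prior, current)]


def mean_gain_by_school(schools, gains):
    """Average residual gains within each school to get a group growth measure."""
    by_school = {}
    for school, gain in zip(schools, gains):
        by_school.setdefault(school, []).append(gain)
    return {school: mean(values) for school, values in by_school.items()}


# Hypothetical data for five students (invented for illustration only).
prev = ["Does Not Meet", "Does Not Meet", "Meets", "Meets", "Exceeds"]
curr = ["Does Not Meet", "Meets", "Meets", "Exceeds", "Meets"]
matrix = transition_matrix(prev, curr)
moves = summarize_moves(matrix)

prior_scores = [200, 210, 220, 230, 240]
current_scores = [204, 218, 222, 238, 243]
schools = ["A", "A", "B", "B", "B"]
gains = residual_gains(prior_scores, current_scores)
school_gain = mean_gain_by_school(schools, gains)

print(moves)
print(school_gain)
```

Because the transition matrix works only with performance levels, students tested on different scales (for example, general and alternate assessments) can be counted in the same table. The residual gains, by contrast, are relative: by construction they average to zero across the students used to fit the regression, which is why they do not map directly onto performance standards.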