
Glossary

This is a list of terms used frequently on this site and in the field. Although many of the terms below have more than one accepted definition, the definitions we provide align with our philosophy and reflect our perspective.


Alignment of Items to Standards
In a standards-based education system, the association of test items with the content standards, instruction, policy, and curriculum. (Webb, 2006)
Content Standard
A statement of a broad goal describing expectations for students in a subject matter at a particular grade or at the completion of a level of schooling. (Standards for Educational and Psychological Testing, 1999)
Curriculum-Based Measurement (CBM)
An assessment system with technically adequate properties (i.e., reliability and validity) designed to measure students' academic status and growth so the effectiveness of instruction may be evaluated. The most common application of CBM requires that a student's performance in each curriculum area be measured on a single global task repeatedly across time, wherein growth is described by an increasing score on a standard, or constant, task. (Deno, 1987)
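As a minimal sketch, the growth described above can be summarized as the slope of a line fit to a student's repeated scores on the constant task. The task (words read correctly per minute) and all scores below are invented for illustration:

```python
# Estimating a CBM growth rate as the slope of repeated scores on a
# constant task. All numbers are invented for illustration.
import numpy as np

weeks = np.array([1, 2, 3, 4, 5, 6])       # measurement occasions
wcpm = np.array([42, 45, 44, 49, 52, 55])  # words correct per minute, same task

slope, intercept = np.polyfit(weeks, wcpm, deg=1)
print(f"Estimated growth rate: {slope:.1f} words correct per minute per week")
```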
Domain
A specific universe of performance (e.g., reading or reading comprehension; mathematics or numbers and operations).
Domain Sampling
The process of selecting test items to represent a specified universe of performance. (Standards for Educational and Psychological Testing, 1999)
easyCBM
An online benchmarking and progress monitoring assessment system, developed using Item Response Theory (IRT) to represent the psychometric properties expected of modern technically adequate assessments. easyCBM includes multiple reading and mathematics assessment forms for grades K-8, reported to be of equivalent difficulty within each grade. Many schools, teachers, and administrators use this system in their Response to Intervention (RTI) programs. (Alonzo, Tindal, Ulmer, & Glasgow, 2006)
Equating
Putting two or more essentially parallel tests on a common scale. (Standards for Educational and Psychological Testing, 1999)
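One standard technique is linear (mean-sigma) equating, which places Form X scores on the Form Y scale by matching the two forms' means and standard deviations. A minimal sketch with invented score distributions (the definition above does not specify a particular equating method):

```python
# Linear (mean-sigma) equating sketch: map Form X scores onto the Form Y
# scale by matching means and standard deviations. Scores are invented.
import numpy as np

form_x = np.array([18, 22, 25, 27, 30, 33], dtype=float)  # hypothetical Form X scores
form_y = np.array([20, 24, 28, 31, 34, 37], dtype=float)  # hypothetical Form Y scores

def equate_x_to_y(x):
    """Express a Form X score on the Form Y scale."""
    return form_y.std() / form_x.std() * (x - form_x.mean()) + form_y.mean()

print(f"Form X score 25 equates to {equate_x_to_y(25):.1f} on Form Y")
```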
Evaluation
The process of characterizing and appraising one or more qualities of a person, program, or system.
Formative Assessment
Frequent measures to identify students’ current knowledge/skills, including peer and student self-assessments embedded within learning activities. The process requires a plan for reaching desired goals, provides examples of learning goals (e.g., grading criteria or scoring rubrics), and encourages students to self-monitor progress toward their learning goals. These goals represent valuable educational outcomes with applicability beyond the learning context, and promote metacognition and reflection by students on their work. Teachers' feedback is non-evaluative, specific, related to the learning goals, and provides opportunity for students to revise and improve work products. (Andrade & Cizek, 2009)
Growth
Individual student trajectory over time.
Growth Modeling
Models that track the performance of the same students, or the same cohort, across multiple measurement occasions, to allow comparisons between the change for a student over time and an average change. (Goldschmidt et al., 2005)
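A minimal sketch of the comparison this definition describes: estimate a change (slope) for each student and compare it with the average change. Students, occasions, and scores are invented:

```python
# Compare each student's estimated change over time with the average change.
# All data are invented for illustration.
import numpy as np

occasions = np.array([0, 1, 2, 3])  # e.g., four testing occasions
scores = {
    "student_a": np.array([200, 205, 211, 214]),
    "student_b": np.array([190, 193, 195, 199]),
}

slopes = {s: np.polyfit(occasions, y, 1)[0] for s, y in scores.items()}
average = np.mean(list(slopes.values()))
for s, b in slopes.items():
    label = "above" if b > average else "at or below"
    print(f"{s}: slope {b:.1f} ({label} the average of {average:.1f})")
```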
Improvement
An increase in knowledge or skill as compared to a previous state.
Interim Assessments
Technically sound, brief measurements of achievement in reading and mathematics that can be used multiple times within a school year, as well as across school years for cohorts of students in grades 1-8. These measures (a) are linked to a developmental scale, (b) have wide use in schools, particularly with special education students, and can include multiple administrations per year, and (c) offer a window on early achievement growth that is not possible with large-scale summative assessments because interim assessments are frequently used prior to grade 3. As noted in Andrade and Cizek (2010), these measures are distinguished from formative assessments by greater standardization of administration, technical adequacy in terms of reliability and validity, and the use of carefully constructed score scales including equated alternate forms and vertical scaling of forms. These assessments provide teachers with important instructional, evaluative, and predictive information (see Perie, Marion, & Gong, 2009). As a consequence, their inclusion in our research allows modeling within-year, as well as between-year, growth, and allows examination of important policy questions such as whether summer losses and gains affect student or school results. (Downey, von Hippel, & Hughes, 2008; Zvoch & Stevens, in press)
Mastery Measurement
An approach that assesses a student's successive mastery of a hierarchy of objectives. (National Center on Response to Intervention, 2011)
Measure
The instrument used to quantify objects, events, or observed behaviors based on rules; much like a ruler is a measure of length, an assessment can be a measure of an academic construct (e.g., reading or math).
Measurement
The process of assigning numbers to characteristics (i.e., objects, events, or observed behaviors) according to a defined rule. (Hinkle, Wiersma, & Jurs, 2003)
Measures of Academic Progress (MAP)
Developed by Northwest Evaluation Association, Measures of Academic Progress (MAP) are computer adaptive tests in reading, mathematics, and science that use Item Response Theory (IRT) to represent the psychometric properties expected of modern technically adequate assessments. (Computer adaptive tests automatically present a more challenging item after a correct response and a simpler item after an incorrect response, homing in on a student's level of performance.) MAP tests are aligned to national and state curricula and standards and provide normative growth estimates, both within and across years. (www.nwea.org)
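The adaptive step in the parenthetical can be sketched as a toy up/down rule. Operational computer adaptive tests such as MAP select items using IRT item parameters, so the function below is purely illustrative and is not NWEA's algorithm:

```python
# Toy illustration of adaptive item selection: harder item after a correct
# response, easier item after an incorrect one. Real CATs use IRT-based
# item selection; this only shows the up/down logic.
def next_difficulty(current: float, correct: bool, step: float = 0.5) -> float:
    """Return the difficulty target for the next item (logit-like scale)."""
    return current + step if correct else current - step

difficulty = 0.0                      # start near average difficulty
for response in [True, True, False]:  # invented response pattern
    difficulty = next_difficulty(difficulty, response)
    print(f"next item difficulty target: {difficulty:+.1f}")
```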
Model
One or more mathematical equations representing the statistical relations among a set of variables.
Multilevel Growth Models (MGM)
Longitudinal designs entail tracking an individual over time and measuring performance at two or more time points (see Rogosa, 1995; Willett & Sayer, 1994; Willett, Singer, & Martin, 1998). There are at least two major advantages to these designs: (a) each student acts as his or her own control (see Stevens, 2005), and (b) the focus is on the outcome of interest, student learning. MGMs fit growth trajectories to each student's data using a two- or three-level structure. The first level of each model represents measurement occasions and is used to estimate a growth trajectory for each student. The second level represents student characteristics, and the third level represents school context, characteristics, and programs. An important feature of MGMs is that variation in performance can be separated into the student and school levels. These models allow the estimation of an intercept and a slope for each individual and each school. Time in school (an indication of opportunity to learn and of the amount of exposure to school practice and policy) and other variables can be explicitly included as predictors of individual students’ growth. Applying an MGM produces individual student growth trajectories as well as school average growth trajectories.
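A minimal two-level sketch (measurement occasions nested within students) using statsmodels' MixedLM; all variable names and data below are simulated, and a full three-level model adding schools would extend this structure with school-level effects:

```python
# Two-level growth model sketch: random intercept and slope per student;
# the fixed effect for time is the average growth trajectory. Data simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_students, n_occasions = 20, 4
student = np.repeat([f"s{i}" for i in range(n_students)], n_occasions)
time = np.tile(np.arange(n_occasions), n_students)
intercepts = np.repeat(rng.normal(200, 10, n_students), n_occasions)
slopes = np.repeat(rng.normal(5, 2, n_students), n_occasions)
score = intercepts + slopes * time + rng.normal(0, 3, n_students * n_occasions)
data = pd.DataFrame({"student": student, "time": time, "score": score})

model = smf.mixedlm("score ~ time", data, groups=data["student"], re_formula="~time")
print(model.fit().summary())
```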
Performance Standard
(1) An objective definition of a certain level of performance in some domain in terms of a cut score or a range of scores on the score scale of a test measuring proficiency in that domain. (2) A statement or description of a set of operational tasks exemplifying a level of performance associated with a more general content standard; the statement may be used to guide judgments about the location of a cut score on a score scale. The term often implies a desired level of performance. (Standards for Educational and Psychological Testing, 1999)
Progress
Individual student change over time in a specific academic domain.
Progress Monitoring
Repeated measurement of academic performance to inform instruction of individual students, conducted on a specified schedule. The goal of progress monitoring is to (a) estimate rates of improvement, (b) identify students who are not demonstrating adequate progress and/or (c) compare the efficacy of different forms of instruction to design more effective, individualized instruction. (National Center on Response to Intervention, 2011)
Reliability
The degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker; the degree to which the scores are free of errors of measurement for a given group. (Standards for Educational and Psychological Testing, 1999)
Residual Gain Model
A variety of residual models have been used in accountability systems. The simplest predicts current performance from past performance: each student has a predicted score based on achievement from a previous occasion, and the difference between the predicted and actual scores in the current year is the residual score. Residual gains near zero indicate growth consistent with prediction, positive scores indicate greater-than-predicted growth, and negative scores indicate lower-than-predicted growth. Residual gain scores can be averaged to obtain a group growth measure, but they are not easily integrated with performance standards because they focus on relative gain.
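A minimal sketch of the simplest model described above, with invented scores; averaging the residuals within a school would give a school-level growth measure:

```python
# Residual gain sketch: regress current scores on prior scores, then take
# actual minus predicted. All numbers are invented.
import numpy as np

prior = np.array([210.0, 225.0, 198.0, 240.0, 215.0])
current = np.array([222.0, 230.0, 205.0, 255.0, 218.0])

b, a = np.polyfit(prior, current, deg=1)  # slope, intercept of the prediction line
predicted = a + b * prior
residual_gain = current - predicted       # ~0: as predicted; >0 above; <0 below

print(np.round(residual_gain, 1))
```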
Response to Intervention (RTI)
Response to intervention integrates assessment and intervention within a multi-level prevention system to maximize student achievement and to reduce behavioral problems. With RTI, schools use data to identify students at risk for poor learning outcomes, monitor student progress, provide evidence-based interventions and adjust the intensity and nature of those interventions depending on a student's responsiveness, and identify students with learning disabilities or other disabilities. (National Center on Response to Intervention, 2010)
Scaling
The process of creating (a) a system of numbers, and their units, by which a value is reported on some dimension of measurement, or (b) a score to which raw scores are converted by numerical transformation. Scaling may enhance test score interpretation by placing scores from different tests onto a common scale or by producing scale scores designed to support criterion-referenced or norm-referenced score interpretations. (Standards for Educational and Psychological Testing, 1999)
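A minimal sketch of sense (b) above: converting raw scores to scale scores by a linear transformation. The target mean and standard deviation (200 and 10) are invented, not taken from any particular test:

```python
# Convert raw scores to scale scores with a linear transformation onto an
# invented reporting scale (mean 200, SD 10).
import numpy as np

raw = np.array([12, 15, 18, 22, 25], dtype=float)
z = (raw - raw.mean()) / raw.std()  # standardize the raw scores
scale_scores = 200 + 10 * z
print(np.round(scale_scores, 1))
```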
Status Models
Status models (such as the model used under No Child Left Behind, NCLB) provide a picture of academic performance at a single point in time and have the same advantages and disadvantages as a census. Status models summarize student performance and offer a snapshot of school performance. School status performance (i.e., percent proficient) is interpreted through comparison to a performance standard or benchmark. Although status models are sometimes considered growth models, they do not track individual student progress over time, and their accuracy in reflecting school change depends on the questionable assumption that the student population has remained stable from one year to the next. As we compare school-level growth approaches in terms of their validity, the NCLB Status and Improvement model will serve as a starting point for evaluating the advantages and disadvantages conferred by various growth modeling approaches.
Status Models - Different Groups Improvement Model
A status approach in which achievement measures are administered at the end of an instruction period (e.g., year-end) to evaluate students' performance against a defined set of content standards, with improvement judged by comparing the performance of different groups of students (e.g., this year's fourth graders with last year's fourth graders). (Perie, Marion, & Gong, 2007)
Transition Matrix
Students' growth is tracked at the performance standard level. A transition matrix is set up with previous performance levels (e.g., Does Not Meet, Meets, Exceeds) as rows and current performance levels as columns. Each cell indicates the percentage of students who moved from the row's level in the previous year to the column's level in the current year. The diagonal cells indicate students staying at the same level, cells below the diagonal show students moving down one or more levels, and cells above the diagonal show students moving to higher performance levels. Transition matrices can be combined across all tested grades to show total performance for the school. This approach is included here because it allows scores from tests on different scales to be aggregated; a substantial number of students with disabilities take alternate assessments that, in many states, are on a different scale than the assessments used in general education for reporting school-level results.
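A minimal sketch of such a matrix using pandas, with invented students; the row labels are the previous year's levels, the columns the current year's, and each row sums to 100%:

```python
# Build a transition matrix of performance levels; cells are row percentages.
# Levels and student records are invented for illustration.
import pandas as pd

levels = ["Does Not Meet", "Meets", "Exceeds"]
prev = pd.Categorical(["Meets", "Does Not Meet", "Meets", "Exceeds",
                       "Meets", "Does Not Meet"], categories=levels, ordered=True)
curr = pd.Categorical(["Meets", "Meets", "Exceeds", "Exceeds",
                       "Does Not Meet", "Does Not Meet"], categories=levels, ordered=True)

matrix = pd.crosstab(prev, curr, rownames=["previous"], colnames=["current"],
                     normalize="index") * 100
print(matrix.round(1))
```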
Validity
The degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test. (Standards for Educational and Psychological Testing, 1999)
Value-Added Model (VAM)
Value-added models (VAM) are a variation of residual models. The best known include Sanders' Tennessee Value-Added Assessment System (TVAAS; Sanders, Saxton, & Horn, 1997), the Dallas accountability system (Webster, 2005), the Chicago School Productivity model (Bryk, Thum, Easton, & Luppescu, 1998; Ponisciak & Bryk, 2005), and the RAND model (McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004). All of these models use prior achievement as a predictor of performance, some use multiple years of prior achievement, and some also include other conditioning predictors such as measures of student background characteristics. VAMs are now applied in a number of states to estimate teacher effects and link them to student achievement results.
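A deliberately simplified value-added-style sketch: regress current scores on prior scores plus one background covariate, then average residuals by teacher as a crude effect estimate. All data and the covariate name are invented, and operational VAMs such as TVAAS are far more elaborate:

```python
# Crude value-added sketch: condition on prior achievement and a background
# covariate, then average residuals by teacher. Invented data throughout.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "current": [222, 230, 205, 255, 218, 240],
    "prior":   [210, 225, 198, 240, 215, 230],
    "frl":     [1, 0, 1, 0, 1, 0],  # hypothetical free/reduced lunch indicator
    "teacher": ["t1", "t1", "t1", "t2", "t2", "t2"],
})

fit = smf.ols("current ~ prior + frl", data=df).fit()
df["residual"] = fit.resid
print(df.groupby("teacher")["residual"].mean())  # crude per-teacher estimate
```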