Test Norming by Mark H. Daniel, Senior Scientist, Product Development
After a test has been administered to a nationally representative sample (see "The Making of a Test: Sampling" in the Winter, 1997 issue of the Assessment Information Exchange (AIE) Newsletter)*, norms can be constructed. This is a process that involves judgment as well as analysis. For example, there are various ways of grouping the data before starting the calculations, and standard scores may or may not be normalized. In most cases, the process begins with the estimation of percentiles.
How are percentile norms computed?
Even though the norm sample has been carefully selected so that the demographics of each subgroup (e.g., age or grade) match the U.S. population, sampling error still introduces irregularities into the raw-score distribution of each subgroup, and also causes unevenness in the year-to-year progression of scores. A primary objective of norm development is to smooth out the irregularities while preserving the true shapes of the within-year distributions and the age trends.
In the most common norming method, the overall sample is first divided into subgroups with enough cases to produce fairly accurate estimates of percentile points. For example, groups for grade norms might correspond to the spring and fall of each grade, while those for age norms might be formed from 3-month, 6-month, or 12-month ranges. Within each subgroup, the raw scores corresponding to selected percentile points are identified. Then, for each percentile point, the across-age or across-grade progression is smoothed to remove random deviations from a regular pattern of growth.
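The first two steps described above (finding raw scores at selected percentile points within each subgroup, then smoothing each percentile's progression across subgroups) can be sketched in Python. This is only an illustration: the function names, the choice of percentile points, and the three-point moving average are assumptions made for the example, since the article does not say which smoothing technique is used in practice.

```python
from statistics import quantiles

def percentile_points(scores, points=(10, 25, 50, 75, 90)):
    """Return the raw score at each selected percentile for one subgroup.

    The percentile points listed here are illustrative, not prescribed.
    """
    # quantiles(..., n=100) returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = quantiles(scores, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}

def smooth_progression(values):
    """Smooth one percentile's values across consecutive age or grade subgroups.

    A three-point moving average stands in for whatever smoothing method
    a test developer would actually choose; the endpoints are left unchanged.
    """
    if len(values) < 3:
        return list(values)
    inner = [
        (values[i - 1] + values[i] + values[i + 1]) / 3
        for i in range(1, len(values) - 1)
    ]
    return [values[0]] + inner + [values[-1]]

# Hypothetical medians for five consecutive age subgroups; the dip at the
# third subgroup is the kind of irregularity that sampling error produces.
raw_medians = [10, 14, 12, 16, 18]
smoothed_medians = smooth_progression(raw_medians)
```

In this made-up example the dip in the third subgroup is pulled back into line, so the smoothed medians rise steadily with age.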
Next, each within-subgroup distribution of scores (as adjusted by the previous step) is smoothed to eliminate lumps or gaps. These two smoothing operations, across subgroups and within subgroups, are alternated by computer until the results stabilize. The result is a set of percentiles that progress steadily across age or grade and are smoothly distributed within each age or grade. In the final step, percentiles for any particular point in the age or grade range can be read from the smoothed growth curves.
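Reading a value from a smoothed growth curve amounts to interpolating between the points at which the curve was estimated. The sketch below assumes simple linear interpolation between subgroup midpoints expressed in months; the function name and the sample values are hypothetical, and an actual norming program may fit and read a smooth curve instead.

```python
def read_from_curve(ages, values, target_age):
    """Interpolate a smoothed percentile curve at target_age.

    ages   -- subgroup midpoints in months, in increasing order
    values -- smoothed raw scores for one percentile point at those ages
    """
    if not ages[0] <= target_age <= ages[-1]:
        raise ValueError("target_age lies outside the normed range")
    for i in range(len(ages) - 1):
        a0, a1 = ages[i], ages[i + 1]
        if a0 <= target_age <= a1:
            v0, v1 = values[i], values[i + 1]
            # Straight-line interpolation between the two neighboring points.
            return v0 + (v1 - v0) * (target_age - a0) / (a1 - a0)

# Hypothetical smoothed medians at ages 6:0, 6:6, and 7:0 (72, 78, 84 months).
median_at_6_3 = read_from_curve([72, 78, 84], [20.0, 23.0, 25.0], 75)
```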
Continuous norming is a somewhat different method. Instead of starting with a set of nonoverlapping subgroups, each representing a range of months, this approach creates a large number of overlapping subgroups centered on each individual month of the age or grade range. For example, one subgroup might be centered on age 6:4, the next on age 6:5, and so on. Each subgroup has enough cases to permit calculation of percentile points for each month, and these values can be smoothed across and within subgroups as described above.
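The construction of overlapping subgroups can be illustrated with a short Python sketch. The window half-width of three months, the function name, and the sample cases are assumptions made for the example; the article does not state how wide each subgroup is.

```python
def overlapping_subgroups(cases, first_month, last_month, half_width=3):
    """Build one subgroup per month, each centered on that month.

    cases      -- list of (age_in_months, raw_score) pairs
    half_width -- months on either side of the center to include (assumed value)
    """
    groups = {}
    for center in range(first_month, last_month + 1):
        # A case belongs to every subgroup whose window covers its age,
        # so neighboring subgroups share most of their cases.
        groups[center] = [
            score for age, score in cases if abs(age - center) <= half_width
        ]
    return groups

# Hypothetical cases; age 6:4 is 76 months and age 6:5 is 77 months.
cases = [(76, 20), (77, 22), (78, 25), (80, 27)]
groups = overlapping_subgroups(cases, 76, 78, half_width=1)
```

With a half-width of one month, the case aged 77 months appears in the subgroups centered on 76, 77, and 78 months, which is what distinguishes this approach from nonoverlapping age bands.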
How are standard scores computed?
Usually, raw scores at any specific age or grade are not normally distributed, even after smoothing. The distribution may be skewed (stretched out farther in one direction than the other), or flatter or more peaked than a normal distribution. Particularly in ability tests, there often is a theoretical reason to expect the true distribution to be normal, and to assume that the non-normality of the observed distribution is an artifact. If this assumption is made, then normalized standard scores are constructed. This is accomplished by converting each percentile point into the standard score that would correspond to that percentile in a normal distribution. Because normalized standard scores are derived from percentiles, all tests using such scores show the same relationship between standard scores and percentiles.
If the underlying distribution is not assumed to be normal, then standard scores generally are constructed directly from raw scores. This might be the case with a behavior-problems scale on which most individuals score in a normal range and a few have extreme scores in one direction. Here, one would likely compute linear standard scores, which reflect the distance of each raw-score value from the mean in standard-deviation units. The relationship between linear standard scores and percentiles will vary from test to test.
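The two kinds of standard score can be contrasted in a brief Python sketch. The scale with a mean of 100 and a standard deviation of 15 is an assumption chosen because it is a common reporting scale, not something the article specifies, and the function names and raw scores are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def normalized_standard_score(percentile, scale_mean=100, scale_sd=15):
    """Convert a percentile (strictly between 0 and 100) to the standard
    score that holds that percentile in a normal distribution."""
    z = NormalDist().inv_cdf(percentile / 100)
    return scale_mean + scale_sd * z

def linear_standard_score(raw, raw_scores, scale_mean=100, scale_sd=15):
    """Express a raw score as its distance from the subgroup mean in
    standard-deviation units, rescaled to the reporting scale."""
    z = (raw - mean(raw_scores)) / pstdev(raw_scores)
    return scale_mean + scale_sd * z

# Hypothetical raw scores for one subgroup (mean 5, standard deviation 2).
subgroup = [2, 4, 4, 4, 5, 5, 7, 9]
linear = linear_standard_score(7, subgroup)   # one SD above the mean
normalized = normalized_standard_score(50)    # the median of any distribution
```

The normalized function depends only on the percentile, which is why every test using normalized scores shares one percentile-to-score relationship. The linear function depends on the mean and spread of the particular raw-score distribution, so the percentile attached to a given linear score varies from test to test.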