Let’s cut back on standardized testing

Monty Neill


During the past decade, the use of standardized multiple-choice tests has exploded in the United States. The National Center for Fair & Open Testing (FairTest) has calculated that at least 100 million such exams are administered in the public schools each year. The actual number could be as high as 200 million–equivalent to each child taking over 60 standardized tests while completing the kindergarten-through-12th-grade school program.(1)

The increase in testing is reflected not only in the numbers of tests given, but in the ways they are used. Exams are now required in many districts for entry to school, placement in programs, promotion from grade to grade, and high school graduation. The expansion of these “high stakes” test applications indicates that, more and more, fundamental educational decisions are being based solely or primarily on test scores.

As a result, tests are dominating both curriculum and instruction in the public schools. Proponents say that the expansion of testing enhances accountability and strengthens school systems. But FairTest, along with many other organizations, believes that the testing explosion harms education and, in the process, many individual students.

Tests: What Do They Measure and How Well Do They Measure It?

Test proponents claim these products are “objective.” The only objective thing about them, however, is the method by which they are scored. Decisions about what areas to cover, what questions to include, what terminology to use, how difficult to make the test, and how scores are interpreted and used are all determined subjectively.(2)

Perhaps the most insidious example of testing is the grandfather of them all: the IQ test. While IQ tests purportedly measure intelligence or academic potential, they primarily measure the ability to take IQ tests. They are not a valid measure of a child’s ability or intelligence.(3)

IQ test makers have operated under a number of erroneous assumptions. First, they have assumed that intelligence is a unitary phenomenon. According to current research, it is not: abilities are multiple and complex, and within each individual, some abilities are more strongly developed than others.(4) Second, test makers have assumed that intelligence is measurable and that it is distributed among people so as to fit a bell-shaped curve–an assumption based largely on statistical convenience.(5) Consequently, test makers have organized the questions in such a way that the range of obtainable scores fits this bell-shaped curve, and they have determined that for each test taker, one score represents this thing they have measured and called “intelligence.”

Many standardized tests derive from the IQ test and contain similar defects. At best, these tests can measure only a very narrow range of skills and abilities. One limiting factor is the multiple-choice format itself. On a standardized spelling test, for example, students are asked to pick the correct spelling from four or five choices; in a classroom setting, on the other hand, a teacher calls out a word and asks the students to spell it. These are different skills. A student unable to spell a particular word might be able to recognize it on the test form and receive a “correct” score. Knowledge is simply not a multiple-choice reality.

The problem is one of validity: Does a test measure what we think it is measuring? More specifically, the problem is construct validity: How well does the test measure the intellectual ability it claims to measure? In other words, does the exam correlate with “academic potential” or “competence” or “reading” or some other underlying construct?(6) The multiple-choice “spelling” test cited above actually measures spelling recognition, not spelling.

In the absence of comprehensive validation studies, test manufacturers have assumed that their products evaluate one trait or ability, when in fact they often evaluate another. “Reading” tests, for example, measure not reading but rather reading skills, and they are based on a faulty understanding of the reading process.(7) Similarly, a test that presumably measures school achievement may really be testing another construct, such as verbal ability.(8) If a test is not valid for the specific purpose for which it is used, then any decisions based on its scores could be misleading and dangerous.

Tests should also be reliable. Parents, teachers, and administrators rely on a score to be an accurate index and not a statistical accident. However, every test has some “measurement error”; that is, a certain percentage of scores are incorrect. The higher the reliability, the better the test–but because even the most reliable test is wrong some of the time, decisions made solely on test scores will likewise be wrong some of the time.(9)
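One common way to put a number on measurement error is the standard error of measurement from classical test theory, SEM = SD x sqrt(1 - reliability). The sketch below is an illustration only: the score scale (standard deviation of 15), the reliability of 0.90, and the observed score of 100 are hypothetical values chosen for the example, not figures from any test discussed in this article.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed, score_sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (z = 1.96)."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical numbers: SD of 15 points, reliability 0.90, observed score 100.
sem = standard_error_of_measurement(15, 0.90)
low, high = score_band(observed=100, score_sd=15, reliability=0.90)
print(f"SEM = {sem:.1f}, band = {low:.1f} to {high:.1f}")
```

With these hypothetical values the band runs from about 91 to 109, so a single score near a cutoff cannot reliably place a child on one side of it or the other, even on a test with high reliability.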

Reliability and validity are weakest on exams designed for young children, such as the Gesell Readiness tests. Independent studies have found that half the children placed on the basis of these test results are misplaced. Most often, children who are deemed “unready” are assigned to a special program or a transitional grade. Typically, those found to be misplaced are boys, predominantly from low-income or minority families.(10)


Question: What is the thing to do when you cut your finger?

2-point response: Put a Band-Aid on it….

1-point response: Go to the doctor (or hospital)…Get it stitched

0-point response: Suck on it….Don’t panic….Let it bleed.(11)

A Baltimore, Maryland, sociologist discovered that minority children perform poorly on this item from the Wechsler Intelligence Test (WISC-R), the intelligence test most widely used in the United States. After asking inner-city Baltimore youths why they answered the question the way they did, she found that their responses were related to their experiences. Many of the students thought that “cut” meant a big wound requiring medical attention.(12)

This is a typical example of a biased question on a standardized test. The subject is open to interpretation, and the “right” answer simply does not match the cultural background of all test takers. Unfortunately, cultural interpretations are not restricted to one or two test questions but rather pervade the entirety of many standardized tests.

All major intelligence tests used in this country are based on the experiences of white, middle-class youngsters. In addition, a few “wrong” answers can dramatically lower a child’s “intelligence” score. Consequently, the capabilities of children from minority or low-income groups are being seriously underestimated.(13) Test bias keeps numerous low-income and minority-group children out of “gifted” and “talented” educational programs. Moreover, it fills classes for the “educable mentally retarded” with two to three times more lower-income and minority children than middle-class white children.(14) Although a federal court has banned the use of intelligence tests for placing black children in California classrooms,(15) elsewhere these tests remain in use.

Test bias is a double-edged issue: one group is harmed, and another group is given a definite advantage. Girls, for instance, receive fewer scholarships than boys because they do not score as well on the SAT (Scholastic Aptitude Test) and the ACT (American College Testing assessment). Girls are penalized and boys are given an advantage, in spite of the fact that young women earn higher grades in both high school and college.(16)

Bias enters a test in many ways and is not easily removed. Every test assumes a language, a culture, a set of experiences, and a model of ability or knowledge, and therefore cannot help but favor test takers who have had these particular life exposures. Test makers insist that they examine test items and remove those that contain offensive wording or content. In addition, test publishers perform statistical analyses to locate items on which sub-populations score poorly.(17) However, these measures inadequately address the question of bias because, first, bias is not always obvious, and second, it tends not to be confined to a few test items.

The elimination of test bias requires that each test as a whole be examined and compared with other measures of performance or ability, such as grades or products that have themselves been scrutinized for biases. In reality, no test can be bias-free because no test can be culture-free. Any assessment tool must be used with caution!

Test bias both reflects and perpetuates social injustice. Tests reflect inequity by insisting on a narrow and limited view of ability or achievement, and they reinforce inequity by assigning low scorers to programs that are less likely to help them develop their abilities. To solve the problem, we must do more than eliminate the misuse of standardized testing; we must eliminate the social biases that condemn some children to both an inferior quality of life and a poor education.

Test Use and Impact

The most obvious examples of test misuse in elementary schools are retention-in-grade and tracking. The overemphasis on standardized testing also undermines important educational objectives–such as the teaching of higher order thinking skills and the improvement of school accountability–as well as important nonacademic objectives.

Retention. In some schools, children are kept out of kindergarten or first grade or are placed in special pre-first-grade classes on the basis of a single test score. Most of the tests used for such placements are sorely lacking in both reliability and validity.(18)

Current findings indicate that by the end of third grade, most children who were previously retained perform no better than those who were not.(19) In later years, children who were held back are more likely to drop out of school.(20) Using tests to retain children does not help them achieve academically, but it does increase the likelihood that they will eventually drop out or otherwise fail in school.

Tracking. A system based largely on test scores, tracking is designed to protect “slower” children from being overwhelmed and to help “advanced” children by moving them at a faster, more challenging pace. The research now shows that tracking damages lower-ranked students and does not help advanced students, who do just as well in mixed-ability groups.(21)

The harm caused by tracking can be attributed to several factors. In lower-track classrooms, teachers often assume that the students are unable to do more complex work, textbooks and reading materials are “dumbed-down,” and students remain unexposed to the help and stimulation available from more advanced learners–an interaction that also helps the more advanced students. In addition, once lower-track children are labeled “dummies,” they are more apt to resist schooling, become “discipline problems,” or drop out of school altogether.

Damage to the curriculum. Standardized tests have both shaped the curricula of the schools and influenced the teaching methods of educators in harmful ways. Several major reports issued during the past year conclude that American students are not developing “higher order thinking skills.”(22) Research has also shown that the methods used to teach basic skills and to raise scores on standardized exams–such methods as drill, memorization, rote learning, and repetition–are counterproductive to learning higher order skills.(23) It seems clear that in preparing students to score well on mandated tests, teachers are diverting educational time and energy away from “higher order” agendas.

More generally, as the curricula have become test driven, the courses of study have been narrowed. Basal readers, used in most school systems, often contain material of no interest to students and language unrelated to either real life or good literature.(24) Although high test scorers are given the opportunity to read additional material, low scorers are given more of the dry basics.(25) This test-driven process pushes students away from real learning.

The point is not whether children need basic skills or whether there is a role for memorization or repetition. They do and there is. But these methods are not the essence of education. For too many of today’s students, especially those from low-income and minority families, schooling has been reduced to test-coaching.

Accountability. Testing is also said to improve the ability to assess the performance of teachers and schools. As testing spreads and increasingly defines the content of the curriculum, however, decision-making power is removed from the educational system (as well as from parents, students, and local government) and put in the hands of an unregulated testing industry. The late Oscar Buros, founder of the authoritative Mental Measurements Yearbooks, lamented, “It is practically impossible for a competent test technician or test consumer to make a thorough appraisal of the construction, validation and use of standardized tests…because of the limited amount of trustworthy information supplied by the test publishers.”(26) Can important decisions about the lives of our children remain in the hands of an unregulated, unaccountable industry that produces limited and often invalid instruments?

Testing Reform

The overuse and misuse of standardized testing endanger the educational health of our nation. Tests are a false solution to genuine educational problems. Nevertheless, since tests continue to be administered, FairTest has developed a reform agenda based on the following principles:

* Tests must be fair and as unbiased as possible; no high-stakes decision should be based solely or primarily on a test score.
* Tests must be open: parents, teachers, and others have a right to know how tests are constructed, validated, and used, and to challenge bias or incorrect items.
* Tests must be relevant: their use must be confined to situations in which they are directly helpful to both educators and students.
* If testing is administered for policy information, random sampling (a matrix survey sample) is the most that should be done.

To develop a strong educational system for all, the testing avalanche must be rolled back. More useful and appropriate assessments must be devised and implemented to take its place.(27) For now, it is the task of parents, educators, and concerned citizens to make certain that testing is not used to harm students or to dictate the shape of education.


(1) Noe J. Medina and D. Monty Neill, Fallout from the Testing Explosion (Cambridge, MA: FairTest, 1988). A report that details much of the information contained in this article.
(2) Banesh Hoffmann, The Tyranny of Testing (New York: Crowell-Collier, 1962), pp. 60-61.
(3) For discussions on IQ testing, see Stephen Jay Gould, The Mismeasure of Man (New York: Norton, 1981); Les Levidow, “‘Ability’ Labeling as Racism,” in Dawn Gill and Les Levidow, eds., Anti-Racist Science Teaching (London: Free Association Books, 1987); and N.J. Block and Gerald Dworkin, eds., The IQ Controversy (New York: Random House, 1981).
(4) Ibid. (Gould); and Howard Gardner, Frames of Mind: The Theory of Multiple Intelligences (New York: Basic Books, 1985).
(5) Charlotte Ryan, The Testing Maze (Chicago: National PTA, 1979), p. 8.
(6) D. Monty Neill and Noe J. Medina, “Standardized Testing: Harmful to Educational Health,” Phi Delta Kappan (May 1989); George Madaus and Diana Pullin, “Questions to Ask When Evaluating a High-Stakes Testing Program,” NCAS Backgrounder (June 1987); and Anne Anastasi, Psychological Testing, 6th ed. (New York: Macmillan, 1988), ch. 6.
(7) Deborah Meier, “Why Reading Tests Don’t Test Reading,” Dissent (Winter 1982-1983): 457-466; Sheila Valencia and P. David Pearson, “Reading Assessment: Time for a Change,” The Reading Teacher (April 1987): 726-732; and Kenneth S. Goodman et al., Report Card on the Basal Readers (Katonah, NY: Richard C. Owen, 1988), a report prepared for the National Council of Teachers of English.
(8) Peter W. Airasian, “Symbolic Validation: The Case of State-Mandated, High-Stakes Testing,” Educational Evaluation and Policy Analysis (Winter 1988): 309.
(9) See Note 6 (Anastasi, ch. 5).
(10) Lorrie A. Shepard and Mary Lee Smith, “Flunking Kindergarten: Escalating Curriculum Leaves Many Behind,” American Educator (Summer 1988): 36; and “NAEYC Position Statement on Developmentally Appropriate Practice in the Primary Grades, Serving 5- through 8-Year-Olds,” Young Children (Jan 1988): 64-84.
(11) David Wechsler, The Manual for the Wechsler Intelligence Scales for Children–Revised (New York: Psychological Corporation, 1974), p. 176.
(12) John Butler, “Looking Backward: Intelligence and Testing in the Year 2000,” National Elementary Principal (March/April 1975): 67-75.
(13) See Mary R. Hoover, Robert L. Politzer, and Orlando Taylor, “Bias in Reading Tests for Black Language Speakers: A Sociolinguistic Perspective,” Negro Educational Review (April-July 1987): 81-98; Shirley Brice Heath, Ways with Words: Language, Life, and Work in Communities and Classrooms (New York: Cambridge University Press, 1983); and Terry Meier, “The Case against Standardized Achievement Tests,” Rethinking Schools 3, no. 2 (1989): 12.
(14) Jeremy D. Finn, “Patterns in Special Education Placement as Revealed by the OCR Surveys,” in Placing Children in Special Education, K. Heller, W. Holtzman, and S. Messick, eds. (Washington, DC: National Academy Press, 1982).
(15) Harold Dent et al., “Court Bans Use of I.Q. Tests for Blacks for Any Purpose in California State Schools,” Negro Educational Review (April-July 1987): 190-199.
(16) “Civil Rights, Feminist Leaders Challenge National Merit Formula,” FairTest Examiner (Summer 1988): 3.
(17) Janice D. Scheuneman, “A Posteriori Analyses of Biased Items,” in Ronald A. Berk, ed., Handbook of Methods for Detecting Test Bias (Baltimore, MD: Johns Hopkins University Press, 1982); and Lorrie A. Shepard, “Identifying Bias in Test Items,” in B.F. Green, ed., New Directions for Testing and Measurement: Issues in Testing–Coaching, Disclosure, and Ethnic Bias (San Francisco: Jossey-Bass, 1981).
(18) See Note 6.
(19) Mary Lee Smith and Lorrie Shepard, “What Doesn’t Work: Explaining Policies of Retention in the Early Grades,” Phi Delta Kappan (Oct 1987): 129-134.
(20) Evaluation Update on the Effect of the Promotional Policy Program, Office of Educational Assessment, New York City Board of Education (12 Nov 1986).
(21) Jeannie Oakes, Keeping Track: How Schools Structure Inequality (New Haven, CT: Yale University Press, 1985).
(22) Arthur N. Applebee et al., Crossroads in American Education: A Summary of Findings (Princeton, NJ: National Assessment of Educational Progress/ETS, 1989), as well as numerous other reports.
(23) Mary C. McClellan, “Testing and Reform,” Phi Delta Kappan (June 1988): 769.
(24) Harriet Tyson-Bernstein, A Conspiracy of Good Intentions: America’s Textbook Fiasco (Washington, DC: Council for Basic Education, 1988); and Note 7 (Goodman).
(25) Arthur N. Applebee et al., Who Reads Best? (Princeton, NJ: National Assessment of Educational Progress, 1988).
(26) Oscar K. Buros, “Fifty Years in Testing: Some Reminiscences, Criticisms and Suggestions,” Educational Researcher (July-Aug 1977): 14.
(27) The development of appropriate and authentic evaluation models and practices is reported in the FairTest Examiner; see also numerous articles in Educational Leadership’s special issue “Redirecting Assessment” (April 1989).

Monty Neill, EdD, is associate director of FairTest. A former daycare, high school, and college teacher, he has worked in public education, alternative education, and educational reform and currently lives in the Boston area.

COPYRIGHT 1990 Mothering Magazine

COPYRIGHT 2004 Gale Group