Examinations appear to be a necessary evil
Oliver, J Steve
At the end of World War II, the Progressive Era of US education was quickly closing. What followed was a renewed emphasis on academic standards and the re-creation of the curriculum in science and mathematics. Yet with the renewed emphasis, there were voices that recognized the need to question the role and place of testing. In 1947 and 1948, some of this questioning played out in the pages of School Science and Mathematics.
The most influential voice during this period was B. Clifford Hendricks of the University of Nebraska. He wrote a series of articles published in SSM, and along with other authors, created a discussion that is highly relevant to today’s educational environment, with its new-found affinity for accountability through testing.
The first of his papers, “Why Not Abolish Tests?” appeared in the February 1947 issue of SSM. He began,
“Examinations appear to be a necessary evil.” So says one of our influential teachers of high school science. That is a statement that frequently finds expression. Is that frequency an index of its correctness? Even if correct is it a desirable characterization of tests? Especially when spoken within hearing of students? May not this classification of examinations, be in part, educating our students to “dread” tests, to “get-out-of” examinations whenever possible and to “get nervous when taking examinations”? This is suggesting that teachers, more than they are aware, encourage defeatist attitudes in their students by the teachers’ conversation about their teaching tools. (p. 114)
He continued with a discussion of alternatives to formal exams, including daily observation, oral examinations, and teacher knowledge of the student. Hendricks pointed out that “considerable divergence has been found between rank order given students by their high school teachers and that established by achievement and aptitude tests taken when the high school students enter college” (p. 114). He went on to report that 30% of college teachers had indicated in a poll that they were quite satisfied with the current testing practices. He did not elaborate about the feelings of the other 70%.
In a statement which rings true to the current discussion of testing, he wrote,
The writer sat with a committee on college examinations recently. In the course of its session one member proposed (seriously) that a course grade should depend wholly upon the single two-hour examination. It is encouraging to report that his exalted confidence in a single examination was not supported by other members of the committee. Such an overemphasis of a single examination when imposed upon students in a class can be just as unfortunate in its effects as a continuous labeling of tests as “evil” (p. 115-116).
The famous educational psychologist David Ausubel has summed up the primary means to accomplish the goals of education: “Ascertain what a child knows and teach them accordingly.” Hendricks predated this statement with a quote from “Doctor Morrison,” who said that a good teacher spends “half of the time studying his pupils as growing individuals and the rest of the time in doing what that study indicates is desirable and necessary” (p. 116).
In the end, Hendricks gave four reasons why we should not “abolish tests.”
Not because they are “evil” for their disrepute comes, much of it, by unintentional adverse gossip of pupils and teachers.
Not because classroom estimates made by a teacher make them unnecessary for such oral evaluations have been shown to be of questionable validity and of uncertain reliability.
Not because great efforts to correct their shortcomings have been fruitless for it appears there has been little well directed attention to their improvement.
Not because their service toward subject matter achievement is extremely limited. Even a superficial analysis of teaching procedures makes proper tests appear imperative to their realization (p. 116).
The second installment of Clifford Hendricks’ articles about testing came in the March 1947 issue of SSM. This time the title was “For What Shall We Test?” He began with an analogy. During the recent war, Hendricks had a nephew who was a navigator on a bomber. To get the airplane safely to and from its desired target, he performed frequent and necessary “tests,” including measurements of the plane’s speed, wind velocity and direction, altitude and altitude variations, temperature, change in terrain, etc. “After these measurements were made they were then used to get the answer to the questions: ‘Where are we with reference to our target?’ and ‘Should we change our original plan in order to get there more quickly and with greater certainty?’” (p. 203).
Thus Hendricks reported that his nephew relied almost wholly upon tests. He continued, “The tests, however, were used as a means to an end. They were not ends in themselves. Considered in isolation they were of little worth” (p. 203-204). To carry his analogy to its conclusion, Hendricks identified the classroom teacher as not only the navigator but also the pilot and then added the critical observation: “Too often the immediate demands of the pilot’s job eclipse the overall need for navigation” (p. 204).
Hendricks believed that the key to answering the question, “For what shall we test?” was this: “We test to get evidence of progress toward a subject’s objectives and to furnish a basis for any modification of the program for the attainment of those aims more effectively and speedily” (p. 204). Thus his goal for assessment became a guide to curriculum rather than a response to it. At the time Hendricks was writing these articles, the emphasis on behavioral objectives was gaining steam. He made reference to this movement and summed it up when he wrote, “Aims of the course must be phrased in the language of behavior so that in trying to identify the accomplishment of those aims the appropriate behavior serves as a criterion” (p. 205). For Hendricks, then, assessment was a means of measuring the accomplishment of goals.
It would seem reasonable to assume that each teacher expects something to change about his pupils as they work with him upon a given course. If he expects change he should have means at hand by use of which he can gauge that change. Such indicators of change whether they are diary records, class responses or pencil and paper examinations are tests serving the educational “navigator” to be more certain of reaching his “target,” and with enough carry-over to be more certain to make “after-target” activities possible and probable (p. 206).
The third installment of Clifford Hendricks’ articles, “How Shall We Test?” was printed in the April 1947 issue. As in the previous article, Hendricks began with an analogy. This time he contrasted the work of the teacher to that of a weatherman. In this case, though, he seemed to stretch his analogy beyond its capacity by saying that, due to the weather data collection instruments available, “at last, man can do something about the weather” (p. 322).
Just as a different instrument of measurement was needed for each individual sort of weather information so a different sort of procedure has been found necessary to get reliable indications for each different sort of accomplishment in school courses. To illustrate: it has been found that a test for information is not a valid means of getting evidence of ability to infer. Likewise, the use of test scores on accurate memory of information is not an indication of ability to apply and use that information. Test marks on equations and problems are not found to be valid indicators of abilities related to laboratory work (p. 323).
Hendricks examined several dilemmas that faced teachers in postwar classrooms and still confront teachers today. As happens today, he reported that “all teachers have been accused, at one time or another, of picking the questions which the disgruntled student didn’t ‘review’ in preparation for the examination” (p. 323). One alternative Hendricks suggested was “short answer” or “new-type” tests, which “are often indirect measures of the particular aspect of the subject under appraisal. These tests are increasingly finding their place in our teaching practice” (p. 323). He went on to describe what he called “performance tests.”
Teachers of chemistry are in very general agreement that one of the aims of laboratory teaching is an improved technique particularly in the manipulation and assembly of apparatus. This might be characterized as “learning the language of test tubes.” There is also general agreement that the surest index of skill in that aspect of the laboratory program is obtained by having the student actually perform the desired assembly or “set-up” under the eyes of the evaluating observer. The few reported uses of this plan indicate that it has desirable promise. Its most bothersome defect is that it has not as yet been made to be practicably administrable (p. 324).
The fourth installment from Clifford Hendricks was titled “How Good Are Our Tests?” and appeared in the May 1947 issue of SSM. He began construction of his answer to this question by staking out aspects of tests. For instance, he wrote, “A good test for any one of the above purposes is not equally good for other purposes. In the end, perhaps, we wish to know how ‘good’ our students are and wish the test used to help us arrive at such an estimate” (p. 470). He reported the result of research to help construct his argument.
One plan, by direct appeal to the teachers, sought to find in just what way they expected their course in college chemistry to change their students. Another attempt was directed at what college text books in chemistry plan for the chemistry course to do to students. When these two studies had their outcomes checked against examinations actually given by college teachers of chemistry, some disturbing disagreement became apparent. While fifty-two per cent of some 200 teachers of college chemistry expressed a desire that their students acquire some ability “to formulate and test hypotheses” not a single one of 7107 examination test items used by teachers of college chemistry was an item for that purpose. Likewise, forty-five per cent of the 200 teachers said they expected their course in chemistry to improve their students’ skill and develop proper habits in the use of the library. However, no reference was found to this part of their program in the 7107 examination items inventoried (p. 470-471).
Using the data from a sample of 108 of the 7,107 test items mentioned above, Hendricks presented a table breaking down the sample items into categories. These categories resulted from calculations of their discriminating power and validity.
Seven “kinds of questions” were identified and percentages were given for each type. Twenty-three percent were identified as “too easy”; 13% had “low validity”; 7% had “no validity” (no differentiation); and 1% had “inversion” (poor students making the better grade). Thus 44% were called “faulty items.” Of the remainder, 16% were considered usable but not good, while 40% were rated as good items. Thus, a very large portion of the score was determined by “bad” test items. He concluded by identifying ways to determine if a test is good.
First, make sure the test properly and adequately samples the skills, understandings and personality traits our teaching aims have sought to achieve. Second, critically edit each item, after student answers have revealed their validities, looking for those with no validity or low validity. “Good” items do not have such deficiencies. Third, the student answer sheets will help spot the “easy ones” which consume time without adding to the usefulness of the test. Fourth, the “good” test has items that are reliable so that the score on it is “reproducible” by any given student (p. 474).
What was good advice in 1947 is just as good over 50 years later. It was novel advice in 1947; it should not still be novel in 1999.
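The kind of item analysis Hendricks described, checking each question for difficulty and discriminating power, can be illustrated with a short sketch. This is not Hendricks’ own procedure, which the column does not give in detail. It assumes the common upper-group versus lower-group comparison, and the function name, the 27% group size, and both cutoffs are hypothetical choices made only for illustration. The labels borrow his category names.

```python
def analyze_item(responses, totals, easy_cutoff=0.9, low_cutoff=0.2, group_frac=0.27):
    """Classify one test item from 0/1 responses and students' total scores.

    All thresholds here are illustrative assumptions, not values from Hendricks.
    """
    n = len(responses)
    if n == 0 or n != len(totals):
        raise ValueError("responses and totals must be non-empty and equal in length")

    # Difficulty: fraction of all students who answered the item correctly.
    difficulty = sum(responses) / n

    # Rank students by total score, then compare the top and bottom groups.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * group_frac))
    upper_correct = sum(responses[i] for i in order[:k])
    lower_correct = sum(responses[i] for i in order[-k:])

    # Discrimination: upper-group minus lower-group proportion correct.
    diff = upper_correct - lower_correct
    discrimination = diff / k

    if diff < 0:
        label = "inversion"      # weaker students outscore stronger ones
    elif difficulty >= easy_cutoff:
        label = "too easy"
    elif diff == 0:
        label = "no validity"    # item does not differentiate at all
    elif discrimination < low_cutoff:
        label = "low validity"
    else:
        label = "acceptable"
    return difficulty, discrimination, label


# Ten hypothetical students, listed from highest to lowest total score.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(analyze_item([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], totals))
print(analyze_item([0, 0, 0, 1, 1, 1, 1, 1, 1, 1], totals))
```

With these made-up responses, the first item separates strong from weak students, while the second is answered correctly mostly by the weaker students, which is the “inversion” case Hendricks flagged.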
The final installment of Clifford Hendricks’ series of articles on testing appeared in the June 1947 issue under the title “How To Improve Our Tests.” After another analogy, he began with a brief history lesson that, like so many, could be happening today (or at least a lot more recently than 1947).
Forty or fewer years ago teachers gave lip service to what were called concomitant values of science and mathematics. They found, however, that too many times school courses seemed to make little change in the students in line with the “concomitant values.” The psychologists pointed out that teaching directed toward such aims should share the aims with the pupils; i.e. the student should be made conscious of that purpose and that provision should be made for practice with that aim in view. Is it not in order to suggest that after the practice, use should be made of tests which are valid as indices of the achievement of such values? If so, the further proposal that the students be motivated to want to take these tests rather than that they be induced to become appositive, by suggestion, and assume the attitude that “the teacher made us take the exams” (p. 555).
In this one statement, Hendricks included a variety of ideas that would impact science and mathematics teaching to the present day. These include overlapping goals of mathematics and science teaching; the importance of making students conscious of the classroom goals; and the assessment of affect and values.
Hendricks concluded his series by quoting the list of nine issues to consider in the improvement of testing. The original source document was titled “Achievement Tests in Relation to Teaching Objectives in General College Botany,” written by Clark Horton and published by the Botanical Society of America in 1939. Of the items in Horton’s list, my favorite is Statement 5: “An increased effort to evaluate the attainment of goals of instruction other than the acquisition of information – particularly aspects of thinking involved in the use of scientific method and in scientific attitude” (p. 559).
The problem of discerning what is meant by teaching to attain goals (or perhaps testing to evaluate the attainment of goals) in contrast to teaching to cover content is the most vexing problem in the reform of schools today. Clearly, it is a long-standing one. In the end, Hendricks had covered a great swath of the issues surrounding testing in the immediate postwar period and today. Through his analogies, his careful division of the materials into topics, and his methodical approach, he provided the readers of SSM with a worthwhile overview of the issues of school testing. The overview is still worthwhile today.
Finally, at the beginning of the October 1947 issue of SSM, there was printed a long memorial to Dr. Otis W. Caldwell. As the authors of Early Days, we are well aware of Dr. Caldwell, as he was a constant presence in the pages of SSM since its first issue. He served as president of the association in 1905 and 1906 and was a prolific publisher of scholarly works. As one example of his productivity, publication dates for his books alone were 1901, 1903, 1911, 1914, 1915, 1923 (twice), 1925, 1929, 1931, 1933, 1934, 1940, and 1943. However, most interesting in this memorial are quotes from an address that Dr. Caldwell had prepared to give for national broadcast on CBS radio. His subject was “Progress: Read to Learn.” But what is most interesting is his statement about who should strive for education. Caldwell is quoted as follows:
The scope and nature of American education has changed. Everybody may now have an education provided he has a little of the necessary gray matter. We even waste a lot of educational time trying to educate a minority who do not possess the ‘makings,’ because a little education has become the shibboleth of social respectability (p. 601).
This seems an interesting attitude for such a well-respected leader of education, but perhaps that is evidence that something really has changed in education. Clearly, it is not the statement of a person who believes that everyone should learn science or that universal scientific literacy should be the goal of schooling.
In our next column, Early Days will continue to explore the science and mathematics education issues of the postwar United States.
Editor’s Note: J Steve Oliver’s and B. Kim Nichol’s postal address is The University of Georgia, College of Education, 212 Aderhold Hall, Athens, GA 30602-7126, and e-mail address is email@example.com.
Copyright School Science and Mathematics Association, Incorporated Feb 2000