The Student-Written Essay and the New SAT
A “student-written essay” will be part of the new writing section on the SAT beginning in the spring of 2005. The essay will be on all versions of the new SAT, unlike the ACT, where the essay will be optional. The test’s advocates argue that the addition of an essay to the new SAT will be an improvement for two reasons: it may cause the teaching of writing to be emphasized more at the secondary level, and it adds an assessment of students’ writing to the process of making admission decisions, thereby increasing the test’s accuracy in predicting freshman year GPA. The College Board’s Web site also mentions the possibility of using the new section for placement decisions.
I contend that the effects of this addition may, in practice, cause the wrong type of teaching at the secondary level, make the test less reliable, and cause a number of related problems for high schools and students. The purpose of this article is to point out the educational and statistical problems inherent in this section of the proposed test, and why the new essay should not become part of education or admission.
First of all, the essay may be educationally damaging to the students whose teachers change their approach to teaching writing to teach to this test. The speeded nature of the test and the “holistic” scoring rubric that will determine a student’s score may corrupt the teaching of good writing. The time limit on the test and the rushed approach to the grading of the test are done for convenience, not because these are parts of a good writing assessment. The format of this test should not dictate how writing is taught in our schools.
Another problem is that many states currently have high-stakes tests that determine students’ promotion and/or graduation, and the number is increasing. Also, because of “No Child Left Behind,” scores on these tests have significant consequences at the federal level for schools, school systems, teachers, and administrators. Most of these state-mandated tests have a writing component that is not usually timed, and they use formats and scoring rubrics very different from those the SAT writing test will use. This may create problems and confusion as schools and teachers are forced to implement different and competing writing curricula in an attempt to serve two masters.
Good essay writing is a process and the teaching of good essay writing teaches it as a process. The old adage, which has a number of variations, is that “good writing isn’t written, it’s rewritten.” Good writing is planned, drafted, edited, rewritten, and proofread until the writer is satisfied with the product. If the new essay only evaluates the ability to write “an impromptu, quickly written first draft” (“Technical Guide to the SAT II: Writing Subject Test,” College Board, 1999, p. 4), we run the risk that this is all some will expect the teaching of writing to be. If this is what the teaching of writing becomes, then it will be formularized and trivialized.
In terms of statistics, we will not know the validity and reliability of the new SAT for years. However, because the essay on the current SAT II: Writing Test is very similar to the proposed essay for the new SAT, it can be used as a model for what the statistics on the new test might look like. The statistics on the currently used essay indicate that there may be serious problems with reliability for the new essay. In addition, because the SAT II is taken by a much less diverse population of test takers than the new SAT will be, it is very possible that the new test will be even less reliable.
According to a table on the College Board Web site*, the writing score reliability coefficient, which is “the correlation between scores on two essays, and is analogous to test-retest reliability,” is .58. In the world of educational testing, tests with reliability coefficients below .70 may have limited applicability for individual measurement, as there is too much measurement error in the score. The .58 coefficient is well under the minimum acceptable for a test to be considered reliable, and a test cannot be valid unless it is reliable: reliability is a necessary, but not sufficient, condition for validity.
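To make the practical consequence of a .58 coefficient concrete: in classical test theory, the standard error of measurement is SEM = SD × √(1 − r), so lower reliability means a wider band of error around any individual score. The sketch below is illustrative only; the standard deviation used is an assumed value for a 2-to-12 essay scale, not a published College Board figure.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical standard deviation on the 2-to-12 essay scale (an assumption,
# chosen only to show how SEM shrinks as reliability rises).
sd = 1.5
for r in (0.58, 0.70, 0.90):
    print(f"reliability {r:.2f}: SEM = {standard_error_of_measurement(sd, r):.2f}")
```

At a reliability of .58, roughly a third of the score variance is measurement error, so an individual’s reported score can easily drift a full point or more between sittings on a scale this short.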
Because of the low test-retest reliability, many students who take it more than once will notice significant differences in their scores, and students will have to take the test more times so that they will be able to present their “true scores” rather than their “obtained scores” to colleges. In other words, the need to retest may be because of problems inherent in the test, as students cannot be sure that the score that they received was an accurate indicator of how they could actually do on the test. Repeating the test benefits no one except the College Board and ETS; the student does not benefit educationally. The time, effort and money that go into repeated testing are, in fact, counter-productive.
These uncertainties will also send students, who can afford it, to more and more coaching classes. If good coaching and good teaching were the same thing, that would at least be an advantage for those who can afford it. However, coaching is not good writing instruction, it is teach-to-the-test, format-driven, and rubric-driven instruction and drills. Because others will not be able to afford good coaching, the new section of the test will only exacerbate the inequalities already present in the test.
According to the table referenced above, the essay scoring reliability, which is “the correlation between two readings of an essay after adjudication,” is .77 to .82. (Adjudication occurs when the two graders differ by more than two points on the one-to-six point scale when scoring an essay.) While these coefficients are above the .70 threshold referenced above, they are still not as high as one would like to see in a test with such important consequences. Generally in educational and psychological measurement, reliability coefficients between .80 and .89 are considered “good” and those between .70 and .79 are considered “adequate.” “Good” and “adequate” are relative terms; what might be adequate for some forms of measurement might be too low for others. The highest standards should be used for tests with which decisions about individuals are made.
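For readers curious what a coefficient like this measures, it is simply the Pearson correlation between the two readers’ scores across a set of essays, which can be computed in a few lines. The reader scores below are invented for illustration; they are not actual SAT data.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical pairs of reader scores on the one-to-six scale.
reader_a = [4, 3, 5, 2, 4, 6, 3, 5]
reader_b = [3, 3, 5, 3, 5, 5, 2, 4]
print(round(pearson_r(reader_a, reader_b), 2))
```

Note that even fairly close agreement between readers, as in this toy example, yields a coefficient in the mid-.70s, which is why figures in that range still leave meaningful room for reader-to-reader disagreement on any single essay.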
Two graders quickly assess the essays (about 90 seconds total) on a one-to-six scale, with a score of six demonstrating “clear and consistent mastery” and a one demonstrating “very little or no mastery.” While the differences between these extremes seem clear, the differences between a five (“reasonably consistent mastery”) and a four (“adequate mastery”), or between a four (“adequate mastery”) and a three (“developing mastery”), are much less clear. (It is interesting to note that the word “competence” was used in the SAT II: Writing essay grading system and has been replaced by the word “mastery.” Can we conclude that students’ writing has improved already?) The essay will receive a score between two and 12, which makes it a short rating scale. Therefore, even a one-point difference in a score is significant on this assessment, and a student’s actual score on it may be determined as much by the day he or she took the test and who graded it as by his or her writing ability.
Many factors may be causing the problems with reliability. Until these factors are better understood and rectified, the essay should not be added. Colleges that want students to take a speeded “holistically-scored” writing test can require the ACT with the optional writing drill. I don’t see any reason to subject an additional one million students per year to this artificial, and educationally and statistically problematic, test of writing ability. If we want an evaluation tool that both encourages the teaching of good writing and accurately assesses a student’s writing ability, then we need to wait for more open discussion and more research than we are receiving from the College Board and ETS about the “new and improved” SAT.
* www.collegeboard.com/sat/cbsenior/stats/stat033.html
BRAD MACGOWAN is the career center director and a counselor at Newton North High School (MA). He received a B.A. in Psychology (1981), an Ed.M. in School Counseling (1983) and an Ed.D. in Developmental Studies and Counseling from Boston University (MA). He is currently president of New England ACAC (2004-2005). In 1998, he was chosen to be a NACAC MIATPA Research Scholar; his report is posted on the “Research” section of the NACAC Web site.
Copyright National Association of College Admissions Counselors Winter 2005