Teaching critical thinking: The more, the better!

Solon, Tom

A three-group study (N = 75) examined whether different amounts of critical thinking instruction lead to significantly different levels of improvement in critical thinking test scores. The researcher used the Cornell Z Test (Ennis, Millman, & Tomko, 1985) to compare the pre- and posttest scores of critical thinking (full-treatment), psychology (partial-treatment), and rhetoric (no-treatment) student groups. A one-way ANOVA revealed a significant difference among the groups at posttesting. A subsequent Newman-Keuls analysis indicated significant differences in all three pairwise comparisons. In other words, the partial-treatment psychology group improved measurably more than the no-treatment rhetoric controls, and the full-treatment critical thinking students significantly outperformed both of the other groups. The overall effect size was large, and the observed power was correspondingly high. Additional controls helped to address many of the commonly recognized threats to validity in quasi-experimental research.


Critical thinking courses, often packaged and marketed as basic logic courses, have long been a feature of the American higher education scene. More recently, with the advent of the critical thinking across the curriculum movement (Cameron & Richmond, 2002), numerous catalog course descriptions and syllabi have come to reflect the strategic infusion of small to moderate amounts of critical thinking materials into a fairly broad spectrum of undergraduate offerings in a wide variety of academic disciplines. An obvious assumption underlying both pedagogical procedures is the claim that they do in fact produce improved critical thinking. But what epistemic or empirical justification do such assertions have? Does it really make any measurable difference which of the two approaches to critical thinking instruction one uses? For that matter, how does one know whether any extent of critical thinking intervention has a quantitatively notable effect? Even more fundamental, what component attributes are to be included in a defensible construct of “critical thinking,” and how is it to be adequately implemented?

The following paper attempts to address these and other related issues. It reports the results of a controlled study of critical thinking development in a sample (N = 75) of community college students. A full-treatment experimental group was taking a critical thinking course. Another partial-treatment experimental group was enrolled in an introductory psychology course (i.e., the infusion technique). A third no-treatment control group was composed of rhetoric students. The critical thinking group had more than 40 hours of classroom instruction in critical thinking and over 80 hours of homework exercises. The psychology group received approximately 10 hours of class time intervention and about 20 hours of additional outside assignments. The rhetoric group, by way of contrast, had little or no critical thinking instruction (as operationally defined herein). The investigation focused on a specific research question:

Would a full-treatment group of critical thinking students improve their Cornell Z scores significantly more than both a partial-treatment group of psychology students and a no-treatment group of rhetoric student controls? The study is a sequel to Solon (2001).

The construct of critical thinking

A survey of the literature indicates that the term “critical thinking” is employed in a variety of ways. Some writers, such as McPeck (1981), use it exclusively in reference to those discipline-specific analytical and problem-solving skills necessary for fairly advanced work in a particular field. There is, however, a much more commonly accepted sense of the term, and that is the intent here. “Critical thinking,” as used in this paper, refers to a set of basic and generic reasoning skills. These skills include the ability to identify and/or distinguish between:

1. inferences and non-inferences

2. assumptions (covert as well as overt) and conclusions

3. consistent and inconsistent statement sets

4. deductive and inductive reasoning

5. valid and invalid arguments

6. credible versus seriously questionable claims and sources

7. meaningful versus vague, ambiguous, and/or meaningless language

8. relevant versus irrelevant evidence

9. scientific versus pseudoscientific procedures

These nine distinct but interrelated abilities collectively constitute, of course, only an elementary sense of the term. A global and more comprehensive concept would no doubt require the enumeration of many additional attributes. Nevertheless, the nine abilities listed here do form essential foundational elements for any reasonable, if perhaps loftier, notion of what critical thinking is. Advanced discipline-specific reflective thought processes ordinarily presuppose these more fundamental and generic (interdisciplinary) reasoning skills, McPeck and others to the contrary notwithstanding. Furthermore, such abilities are behaviorally observable, measurable (at least indirectly), and readily lend themselves to objective and standardized testing, as in the CAAP (ACT, 1990), Cornell (Ennis, Millman, & Tomko, 1985), and Watson-Glaser (Watson & Glaser, 1980) tests. Moreover, these nine basic critical thinking skills are the very ones ordinarily taught in beginning logic and critical thinking courses, and to a lesser extent in other philosophy and psychology courses (at least those like the author’s). Finally, they are also reflective of the content of many textbooks, such as those of Copi and Cohen (2000), Ennis (1996), and Jason (2001).

Review of literature

Although there exists an abundant theoretical and pedagogical literature on critical thinking in higher education (e.g., Kurfiss, 1988), there remains nevertheless a relative scarcity of published empirical work on the subject (see McMillan, 1987 and Solon, 2001). Even the February 1995 special issue of Teaching of Psychology contains fewer than a handful of articles involving quantitative research on critical thinking. Also, while there is some empirical evidence from several different studies that an entire four-year undergraduate experience contributes to modest gains in overall critical thinking skills (King & Kitchener, 1994; Lawson, 1999; Pascarella & Terenzini, 1991), there is so far little scientific basis for the notion that a single college course (other than a critical thinking course) makes any positive measurable difference (Annis & Annis, 1979; Leshowitz, 1989; Nisbett, 1993; Ross & Semb, 1981; Seibert & Hedges, 1999; van Gelder, 2000). And even in the case of critical thinking courses, the evidence is mixed (see van Gelder, 2000).

Two recent studies, however, suggest that even moderate amounts of critical thinking instruction can lead to improved scores on one commercial instrument, namely the Cornell Z test. Allegretti and Frederick (1995) report that a group of college seniors (N = 24) who were enrolled in an interdisciplinary (psychology and philosophy) ethics seminar made statistically significant pre-to-post gains on the Cornell Critical Thinking Test, Level Z. Allegretti and Frederick imply that their systematic employment of the Toulmin model of reasoning analysis (Toulmin, Rieke, & Janik, 1984) may have been at least partially responsible for the impressive results obtained in their study. Such a conclusion may very well be correct. However, their study did not include a control group.

Encouraged by the findings of Allegretti and Frederick, but mindful of the need for additional controls, Solon (2001) conducted further research on this topic. In that study, a moderate (partial) treatment group of introductory psychology students (n = 26) improved their Cornell Z scores significantly more than a comparable untreated group of humanities students (also n = 26). The obtained significance level was below .001, the effect size was greater than 1.0, and the observed power was .94 (for α = .05). Despite these impressive quantitative results, which are consistent with the hypothesis that a moderate amount of critical thinking instruction can be effective, the small total number of participants (N = 52) and the absence of random assignment in the Solon study point to the need for further research on this important topic.
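An effect size greater than 1.0 in a two-group comparison of this kind is on the Cohen's d scale, computed from the two group means and a pooled standard deviation. As a minimal sketch of how such a figure is obtained, the snippet below computes d from summary statistics; the gain-score means and SDs are placeholder values chosen for illustration, not the data from Solon (2001).

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical gain-score summaries (mean, SD, n) for a treated and an
# untreated group of 26 students each; placeholder numbers, not study data.
d = cohens_d(6.5, 4.0, 26, 1.0, 4.5, 26)
print(round(d, 2))  # ≈ 1.29, i.e., an effect size above 1.0
```

A d of this magnitude means the treated group's average gain exceeded the control group's by more than one pooled standard deviation.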

The present paper, therefore, represents yet another, more methodologically refined effort to help resolve more fully the empirical issue of college coursework effects on critical thinking development. Specifically, the current study addresses the question whether different amounts of critical thinking instruction contribute to different levels of improvement in test scores. In particular, the investigation tests the hypothesis that a full-treatment group of critical thinking students will improve their Cornell Z scores more than both a partial-treatment group of psychology students and a no-treatment group of rhetoric students.

Method

Seventy-five community college students took part in the study in exchange for a small amount of extra credit (equal to about 5 percent of the course grade). Forty-four were women and 31 were men; 6 members of the total sample belonged to various minority groups, evenly distributed among the groups. The full-treatment experimental group (n = 25: 15 women, 10 men) was taking a critical thinking course. A partial-treatment experimental group (n = 24: 14 women, 10 men) was enrolled in an introductory psychology class. The no-treatment control group (n = 26: 15 women, 11 men) was composed of rhetoric students. Although this investigation did not involve random assignment of subjects, the investigator had several sources of data, all of which indicated a close initial similarity, if not equivalence, among the three groups.

First, they had similar educational backgrounds. All were freshmen or sophomores. Second, and more importantly, their critical thinking pretest scores were also similar (critical thinking M = 23.76, psych M = 23.63, rhet M = 24.23). Third, they exhibited similar mean (M) scores on the ASSET reading test, an ACT product used for placement purposes (critical thinking M = 43.88, psych M = 43.75, rhet M = 44.13). Fourth, they had similar GPA means (critical thinking M = 2.79, psych M = 2.86, rhet M = 2.73). Fifth, there were no notable group differences in average age, gender, ethnicity, or program of study. The most frequently declared major in all groups was business. All of this evidence combined to suggest that any appreciable posttest differences would not be due to basic academic or demographic group differences at the outset.


The critical thinking course covered the standard logic topics of deduction, induction, validity, soundness and scientific methodology. Formal and informal logic received roughly equal emphasis. Lectures were kept to a minimum. The majority of class time was spent on specific cases of argument analysis, small group work, and discussion (see the Appendix for descriptions and details regarding a sample class, homework assignment, and small group exercise topic). Students were required to keep a journal and had regular weekly assignments to collect and critique relevant critical thinking items from mass media sources of their choice.

There was no text, but the instructor provided copious study notes and exercises on syllogistic logic, the propositional calculus, informal fallacies, and the logic of science. There were also numerous handouts on the application of logic and critical thinking to various issues in philosophy, politics, law and everyday life in general (Solon, 1972; 1973; Solon & Wertz, 1969). In addition, audio-visual materials, such as videotapes of debates and television commercials, were employed for purposes of critical analysis. Tests (other than the pre and post measures, which were not used for grading purposes) were largely essay-based and required the students to develop cogent (or at least plausible) arguments to back their own chosen claims and/or recommendations. These essays ranged from a single short paragraph to several pages in length and averaged between 250 and 500 words each.

The psychology course covered a number of topics typical for an introduction to the field. There was, however, a greater than average emphasis on research methods and critical thinking. The instructor supplemented the Coon (1998) text with additional materials and exercises, such as those that can be found in Chapters Four through Six of Halpern (1996) and Part Two of Meltzoff (1998). Although he did not attempt to teach an entire logic or critical thinking course in this particular setting, he did introduce his psychology students to a number of rules of inference and logical fallacies, such as modus tollens and post hoc ergo propter hoc, and Mill’s Methods of Induction. He also devoted several class periods to basic topics in statistics, tests and measurements.

He estimated that approximately one-fourth of all class-meeting time involved critical thinking instruction and activity (about 10 hours). There were also special ungraded critical thinking homework assignments that took an average of 20 hours to complete. Additionally, during the course, students were required to write six graded argumentative essays on psychology issues of their choice. Each essay was 250 to 500 words in length. Collectively they accounted for one-half of the course grade. Grading of these essays was based primarily on those critical thinking principles to which the students had been introduced during the semester.

The no-treatment rhetoric control group course, on the other hand, consisted of basic college level general composition instruction and material. Only one of six writing assignments required a persuasive essay. There was no sustained attempt to teach logical or critical thinking skills as operationally defined in this study.

The dependent measure selected for use in the study was the same Cornell Z that had been employed by both Allegretti and Frederick and Solon. The Cornell Z is a 52-item multiple-choice test, specifically designed for college students and other adults. It is relatively inexpensive, easy to administer and score, and the recommended 50-minute time limit makes it convenient for both the instructor-researcher and the student. It is similar in structure and content to the older, better-known Watson-Glaser Critical Thinking Appraisal, Forms A and B, and the more recently developed CAAP Critical Thinking Test, Forms 88A and 88B, produced by the ACT organization.

The Cornell Z covers a wide variety of test items involving deduction, induction, critical observation, credibility, assumption identification, and meaning recognition. User norms and reliability data provided in the test manual, although far from ideal, meet or exceed the minimal psychometric standard for use with groups (i.e., r = .60). Split-half reliability data reported for two groups similar to those in the present study were .76 and .75 respectively. Internal consistency indices computed on two other midwestern samples (both N = 40) were .72 and .76 respectively. The most recently published reliability study on the Cornell Z (Frisby, 1992), and one not covered in the 1985 manual, had a total N of 527 and included a fairly large sample of community college students similar in ability to those in this study (Cornell Z M = 25.51). That sample yielded a Kuder-Richardson reliability measure of .80.
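The Kuder-Richardson (KR-20) coefficient cited above is straightforward to compute from a matrix of dichotomously scored items: it compares total-score variance with the summed item variances. The sketch below uses a small hypothetical 0/1 response matrix invented for illustration, not Cornell Z data.

```python
import numpy as np

def kuder_richardson_20(scores):
    """KR-20 internal-consistency estimate for dichotomous (0/1) item data.

    scores: 2-D array, rows = examinees, columns = items.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    p = scores.mean(axis=0)                      # proportion correct per item
    q = 1.0 - p
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of examinee totals
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Hypothetical response matrix: 10 examinees x 6 items (1 = correct).
responses = np.array([
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
])
print(round(kuder_richardson_20(responses), 3))  # → 0.84
```

Values of .70 or above are conventionally taken as adequate for group-level research use, which is the standard the Cornell Z figures cited above meet.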

As for the attendant matter of validity, the manual cites several studies that, although relatively small in the total number of subjects included, nevertheless indicate consistent support for the divergent, convergent, and predictive validity of the Cornell Z. One cross-sectional study (Mines, 1980), for example, showed a significant difference between Cornell Z scores of freshmen and graduate students at the same institution. Also, in a related study (N = 100), the same investigator found a .79 correlation between Cornell Z and Watson-Glaser Critical Thinking Appraisal scores. Still other studies (Garett & Wulf, 1978; Linn, 1982) reported in the manual show that the Cornell Z predicts graduate school grades as well as does either the GRE General Test or the Miller Analogies Test.

Because it covers a wider variety of types of items, the Cornell Z seems to be superior in content validity to the Watson-Glaser test. Also, upon inspection, the Cornell test appears to have a higher ceiling than the shorter (32-item) CAAP test. More recent studies on the Cornell Z, such as that of Frisby (1992), have generally confirmed the value of the test. Furthermore, the most recent comprehensive scholarly review of the test by Lawrenz and Orton (1992) has fully endorsed its use.


All three groups took the Cornell Z test at the beginning and end of the semester. Students also supplied information regarding their current class schedules, extra-curricular activities, and employment. Anyone taking another course with a substantial critical thinking component, such as statistics or research methods, was excluded from the study. In addition, anyone who was involved with, say, the debate team, and/or anyone who was a research trainee (or someone employed in investigative journalism or detective work) was also removed from the study. As a result of these strictures, initial pools of 59 (critical thinking), 62 (psychology), 65 (rhetoric) shrank to pretest samples of 32 (critical thinking), 30 (psychology), 33 (rhetoric). Subsequent attrition during the school term left 25 in critical thinking, 24 in psychology, and 26 in rhetoric at posttest time.

Results
Pretest results provided the following data: critical thinking (M = 23.76, SD = 4.51); psychology (M = 23.63, SD = 5.17); rhetoric (M = 24.23, SD = 5.19). Following the recent recommendations of Keselman, Huberty, Lix, Olejnik, Cribbie, Donohue, Kowalchuk, Lowman, Petosky, Keselman, and Levin (1998), Shapiro-Wilk W and F tests were performed in order to confirm the normality of the individual samples and the homogeneity of variance of each pair of samples. Those underlying assumptions of ANOVA having been verified, a one-way ANOVA yielded the following result: F (2, 72) = 0.10, p = .90. This outcome indicates that the three groups were not significantly different at pretesting.
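The pretest analysis above can be sketched in code. Since the study's raw scores are not published, the snippet simulates three samples with the reported sizes, means, and standard deviations, then runs a normality check (Shapiro-Wilk, as in the paper), a homogeneity-of-variance check (Levene's test here, standing in for the paper's pairwise F tests), and a one-way ANOVA. Because the draws are simulated, the printed statistics will differ somewhat from the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated pretest scores matching the reported group sizes, means, and SDs;
# illustrative draws only, not the study's raw data.
groups = {
    "critical thinking": rng.normal(23.76, 4.51, 25),
    "psychology":        rng.normal(23.63, 5.17, 24),
    "rhetoric":          rng.normal(24.23, 5.19, 26),
}

# Normality of each sample (the paper used Shapiro-Wilk W).
for name, scores in groups.items():
    w_stat, w_p = stats.shapiro(scores)
    print(f"{name}: Shapiro-Wilk W = {w_stat:.3f}, p = {w_p:.3f}")

# Homogeneity of variance (Levene's test, in place of pairwise F tests).
lev_stat, lev_p = stats.levene(*groups.values())
print(f"Levene statistic = {lev_stat:.3f}, p = {lev_p:.3f}")

# One-way ANOVA across the three groups.
f_stat, p_val = stats.f_oneway(*groups.values())
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")
```

With group means this close together, the ANOVA p value should be large, mirroring the paper's null pretest result.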

Posttest results provided the following data: critical thinking (M = 30.32, SD = 3.67); psychology (M = 26.88, SD = 4.24); rhetoric (M = 23.27, SD = 5.51). Again, Shapiro-Wilk W and F tests were conducted to confirm that the samples were normally distributed and that the variances were homogeneous. These ANOVA assumptions having been met, a one-way ANOVA produced the following outcome: F (2, 72) = 15.26, p = .00003, effect size (Cohen's f) = .65, power at the .05 level = .99. This result indicates a significant difference among the groups at posttesting.
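The reported effect size can be recovered directly from the F statistic and its degrees of freedom: eta-squared (the proportion of variance attributable to group membership) follows from F, and Cohen's f follows from eta-squared. A minimal check using the paper's posttest values:

```python
import math

# Reported posttest one-way ANOVA result: F(2, 72) = 15.26.
F, df1, df2 = 15.26, 2, 72

# Eta-squared: proportion of total variance lying between groups.
eta_sq = (df1 * F) / (df1 * F + df2)

# Cohen's f from eta-squared: f = sqrt(eta^2 / (1 - eta^2)).
cohens_f = math.sqrt(eta_sq / (1 - eta_sq))

print(round(eta_sq, 3), round(cohens_f, 2))  # → 0.298 0.65
```

The recovered f of .65 matches the reported value and sits well above Cohen's (1988) conventional threshold of .40 for a large effect in ANOVA designs.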

A post hoc Newman-Keuls test revealed significant differences in all three pairwise comparisons. The psychology post mean was significantly higher than the rhetoric post mean at the .05 level. Also, the critical thinking group significantly exceeded psychology at the .05 level and rhetoric at the .01 level. Table 1 provides a summary of the Newman-Keuls test results. Figure 1 conveys the same results in a simple graphic format. In accord with suggestions made by Cohen (1988), Robinson & Levin (1997), Thompson (1997), and Wilkinson and the APA Task Force on Statistical Inference (1999), the writer has included effect size (Cohen’s f) and power calculations. With regard to the latter, he has followed the guidelines of Hopkins, Coulter, & Hopkins (1981), and Shaughnessy, Zechmeister, & Zechmeister (2000).
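The Newman-Keuls procedure itself is not widely implemented in open-source statistics libraries, but the spirit of the pairwise follow-up can be illustrated with Bonferroni-corrected independent-samples t tests, a simpler and more conservative substitute. The samples below are simulated to match the reported posttest group sizes, means, and SDs, so the printed values are illustrative rather than the study's own.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Simulated posttest scores matching the reported group sizes, means, and SDs;
# illustrative draws only, not the study's raw data.
groups = {
    "critical thinking": rng.normal(30.32, 3.67, 25),
    "psychology":        rng.normal(26.88, 4.24, 24),
    "rhetoric":          rng.normal(23.27, 5.51, 26),
}

# Bonferroni-corrected pairwise t tests: a conservative stand-in for
# the Newman-Keuls procedure used in the paper.
n_comparisons = 3
results = []
for (name_a, a), (name_b, b) in itertools.combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b)
    p_adj = min(p * n_comparisons, 1.0)
    results.append((name_a, name_b, t, p_adj))
    print(f"{name_a} vs {name_b}: t = {t:.2f}, adjusted p = {p_adj:.4f}")
```

Because Bonferroni adjusts more aggressively than Newman-Keuls, any comparison significant under this sketch would also be significant under the paper's procedure.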

Conclusions and recommendations

The gains made by the critical thinking group in this study parallel those reported earlier by Allegretti and Frederick in their 1995 research. Also, the gains achieved by the psychology group in this case are comparable to those reported earlier by Solon (2001). The overall effect size is in the large range, and the level of power of this study is high. Interestingly, there are now three outcome studies showing positive philosophy and psychology coursework effects with the Cornell Z. Moreover, all these findings with the Cornell Z, coupled with the results obtained by Ross & Semb (1981), Seibert & Hedges (1999), and van Gelder (2000), which involved various alternative measures, begin to suggest a pattern of empirical support for the thesis that deliberate instruction in critical thinking can indeed be effective. The current findings also seem to provide initial provisional evidence that different levels of treatment can lead to significantly different levels of improvement. The critical thinking course intervention had more impact than the infusion approach. Additionally, if one includes the results of studies such as those of Annis & Annis (1979), Hatcher (1999), Leshowitz (1989), Reed & Kromrey (2001), Riniolo & Schmidt (1999), and Wesp & Montgomery (1998), one is faced with an emerging trend (perhaps only a mini-trend) of inductive evidence leading to the following conclusion: a little critical thinking instruction can do some good, and the more instruction students get, the better they seem to do on a variety of dependent measures.

Due to the pre-experimental nature of the Allegretti and Frederick investigation and the quasi-experimental character of the rest of the studies mentioned (including this one), the totality of evidence is not yet definitive. Obviously, more research needs to be done in order to resolve this important issue more fully. Perhaps the next logical step would be to compare different methods of critical thinking instruction at the same level of treatment. In such a study it might be feasible to conduct a genuine field experiment, complete with random assignment of participants. Randomization, of course, continues to be the gold standard of empirical research. Another possible direction for future investigation would be controlled studies of philosophy courses that contain a substantial critical thinking component, such as ethics and introductory philosophy. There appears to be a definite need for more empirical research in these areas. Teachers who are interested in pursuing such projects but would prefer to use an essay-type test might well consider adopting the Ennis-Weir Test (1985) as their dependent measure. A recent article by Reed & Kromrey (2001) provides an excellent model for this kind of investigation, particularly in a community college setting.

Perhaps someday, in the not too distant future, there will be a sufficient number of quality published studies to warrant a comprehensive meta-analysis. When that day arrives, the issue of the efficacy of critical thinking instruction will truly be settled beyond a reasonable doubt. Until then, however, there remains much work to do.

References
Allegretti, C. L., & Frederick, J. N. (1995). A model for thinking critically about ethical issues. Teaching of Psychology, 22, 46-48.

American College Testing Program (ACT). (1990). Report on the technical characteristics of CAAP: Pilot year 1: 1988-1989. Iowa City, IA: Author.

Annis, D., & Annis, L. (1979). Does philosophy improve critical thinking? Teaching Philosophy, 3, 145-152.

Cameron, P., & Richmond, G. (2002). Critical thinking across the curriculum. The Community College Enterprise, 8(2), 59-70.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences, 2nd ed. Hillsdale, NJ: Erlbaum.

Coon, D. (1998). Introduction to psychology: Exploration and application, 8th ed. Pacific Grove, CA: Brooks/Cole.

Copi, I. M., & Cohen, C. (2000). Introduction to logic, 11th ed. Upper Saddle River, NJ: Prentice-Hall.

Ennis, R. H. (1996). Critical thinking. Upper Saddle River, NJ: Prentice-Hall.

Ennis, R. H., Millman, J., & Tomko, T. N. (1985). Cornell Critical Thinking Tests, Level X and Level Z manual, 3rd ed. Pacific Grove, CA: Midwest Publications.

Ennis, R. H., & Weir, E. (1985). The Ennis-Weir Critical Thinking Essay Test manual. Pacific Grove, CA: Midwest Publications.

Frisby, C. L. (1992). Construct validity and psychometric properties of the Cornell Critical Thinking Test (Level Z): A contrasted groups analysis. 291-303.

Garett, K., & Wulf, K. (1978). The relationship of a measure of critical thinking ability to personality variables and to indicators of academic achievement. Educational and Psychological Measurement, 40, 437-450.

Halpern, D. F. (1996). Thought and knowledge: An introduction to critical thinking, 3rd ed. Hillsdale, NJ: Erlbaum.

Hatcher, D. L. (1999). Why critical thinking should be combined with written composition. Informal Logic, 19, 171-183.

Hopkins, K. D., Coulter, D. K., & Hopkins, B. R. (1981). Tables for quick power estimates when comparing means. Journal of Special Education, 15, 389-394.

Jason, G. (2001). Critical thinking: Developing an effective worldview. Belmont, CA: Wadsworth/Thomson.

Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donohue, B., Kowalchuk, R. K., Lowman, L. L., Petosky, M. D., Keselman, J. C., & Levin, J. R. (1998). Statistical policies of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350-386.

King, P. M., & Kitchener, K. S. (1994). Developing reflective judgment. San Francisco: Jossey-Bass.

Kurfiss, J. G. (1988). Critical thinking: Theory, research, practice, and possibilities. Washington, DC: Association for the Study of Higher Education.

Lawrenz, F., & Orton, R. E. (1992). (Review of the) Cornell Critical Thinking Tests-Level X and Level Z. Test Critiques, 9, 123-131.

Lawson, T. J. (1999). Assessing psychological critical thinking as a learning outcome for psychological majors. Teaching of Psychology, 26, 207-209.

Leshowitz, B. (1989). It is time we did something about scientific illiteracy. American Psychologist, 44, 1159-1160.

Linn, R. (1982). Ability testing: Individual differences, prediction, and differential prediction. In A. K. Wigdor and W. R. Garner (Eds.), Ability testing: Uses, consequences, and controversies (pp. 335-388). Washington, DC: National Academy Press.

McMillan, J. H. (1987). Enhancing college students’ critical thinking: A review of the studies. Research in Higher Education, 26, 3-29.

McPeck, J. (1981). Critical thinking and education. New York: St. Martin’s Press.

Meltzoff, J. (1998). Critical thinking about research: Psychology and related fields. Washington, DC: American Psychological Association.

Mines, R. A. (1980). Levels of intellectual development and associated critical thinking skills in young adults. Dissertation Abstracts International, 41, 1495A.

Nisbett, R. E. (Ed.). (1993). Rules for reasoning. Hillsdale, NJ: Erlbaum.

Pascarella, E. T., & Terenzini, P. T. (1991). How college affects students. San Francisco: Jossey-Bass.

Reed, J. H., & Kromrey, J. D. (2001). Teaching critical thinking in a community college history course: Empirical evidence from infusing Paul’s model. College Student Journal, 35(2), 201-215.

Riniolo, T. C., & Schmidt, L. A. (1999). Demonstrating the gambler’s fallacy in an introductory statistics class. Teaching of Psychology, 26, 198-200.

Robinson, D. H., & Levin, J. R. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26, 21-26.

Ross, G. A., & Semb, G. (1981). Philosophy can teach critical thinking skills! Teaching Philosophy, 4, 111-122.

Seibert, C., & Hedges, S. (1999). Do students learn in my logic class: What are the facts? Teaching Philosophy, 22, 141-159.

Shaughnessy, J. J., Zechmeister, E. B., & Zechmeister, J. S. (2000). Research methods in psychology, 5th ed. Boston: McGraw-Hill.

Solon, T. (1972). Some logical issues in Aquinas’ third way. Proceedings of the American Catholic Philosophical Association, 46, 78-83.

Solon, T. (1973). The logic of Aquinas’s tertia via. Mind, 82, 598-599.

Solon, T. (2001). Improving critical thinking in an introductory psychology course. Michigan Community College Journal: Research and Practice, 7(2), 73-80.

Solon, T., & Wertz, S. K. (1969). Hume’s argument from evil. The Personalist, 50, 383-392.

Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. Educational Researcher, 26, 29-32.

Toulmin, S., Rieke, R., & Janik, A. (1984). An introduction to reasoning, 2nd ed. New York: Macmillan.

van Gelder, T. J. (2000). Learning to reason: A reason!-able approach. In C. Davis, T. J. van Gelder, & R. Wales (Eds.), Cognitive science in Australia, 2000: Proceedings of the fifth Australian cognitive science society conference, Adelaide, AU: Causal.

Watson, G., & Glaser, E. M. (1980). Watson-Glaser Critical Thinking Appraisal, Forms A and B manual. San Antonio, TX: Psychological Corporation.

Wesp, R., & Montgomery, K. (1998). Developing critical thinking through the study of paranormal phenomena. Teaching of Psychology, 25, 275-278.

Wilkinson, L., & the APA Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

Tom Solon

Mr. Solon is Professor of Psychology and Philosophy at Danville Area Community College, Danville, Illinois.

Copyright Schoolcraft College Fall 2003
