Bridging The Qualitative-Quantitative Data Canyon

Thomas W. O’Rourke

While the goals of some data collection may be exclusively qualitative or quantitative, in many cases both are desirable. Quantitative data are characterized by a numerical response. The number may be an actual value, such as a person’s weight, years of education, cholesterol level, or test score. The number may also reflect an arbitrary value such as a survey variable where male is given a code of 1 and female a 2. In this case the data are nominal, where the numbers are used only to categorize different groups. The numbers can also represent an ordinal scale (1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = always) or an interval scale (what percent of the students were in attendance on a certain date).

In contrast, qualitative data are characterized by a nonnumerical value. The data may simply be responses recorded verbatim. For example, in evaluating a health education program, you may have several open-ended questions such as “What do you think would be the best thing for you to do to improve your health in the next six months?” or “What did you like most about the program?”


Generally, in collecting quantitative data, you should attempt to keep open-ended questions to a minimum. For ease of analysis you should categorize possible responses, including an “other” for unanticipated responses, to make the list exhaustive and mutually exclusive. That is, all possible answers are included and any one answer can fit into only one category. You would then provide an arbitrary number for each category. For example, in a recent survey, college students were asked the following question: “Why did you go to another provider for care you could have received on campus?”

Before reading on, think about the categories you would have included. As you might suspect, there are many possibilities. In this survey, the possible answers we developed were:

                                                                     Yes   No
1. Didn’t think service was offered at the health center on campus.   1    2
2. Don’t like the health center on campus.                            1    2
3. Could get care sooner elsewhere.                                   1    2
4. Could get better care elsewhere.                                   1    2
5. Location more convenient than the health center on campus.         1    2
6. Hours more convenient than the health center on campus.            1    2
7. Was off-campus at the time.                                        1    2
8. Better ethnic or gender representation elsewhere.                  1    2
9. More culturally sensitive to my needs.                             1    2
10. Physician was personal/family doctor before coming to campus.     1    2
11. Concerns about confidentiality at the health center on campus.    1    2
12. Cost of care covered by insurance or somebody                     1    2
13. Other (PLEASE SPECIFY)                                            1    2

At face value, given our past experience and the results of a pretest, we felt fairly comfortable with the categories developed. However, it would not be surprising if a group of 10 to 20 people, including yourself, developed different response categories. By categorizing from the outset we, in fact, determined the response categories: we set the parameters for people to respond to rather than having them tell us their reasons and then coding the data.

Certainly there are many instances when open-ended questions are not needed. For example, if you simply wish to know whether people used the student health center within the past year, you might ask a closed question such as “During the past 12 months, have you used the student health center?” (yes-no). However, if you wished to know what they liked most and least, or why they had gone to an outside provider for a service they could have received at the health center on campus, then a qualitative response may be appropriate.


It should be noted that both qualitative and quantitative data can be very useful. It need not be a case of either-or. While each can stand on its own, it is possible (and the purpose of this article) to show that the two types of data can be bridged. That is, it is possible to incorporate the advantages of the more open, less structured qualitative data into a quantitative format for subsequent analysis. Here’s an example.

Let’s say you are interested in the question posed earlier: “What do you think would be the best thing for you to do to improve your health in the next six months?” Obviously, many responses are possible. If you had a sample of 500 people you would expect many different responses. Some might be similar while others more detailed or unique. One way to analyze these data qualitatively would be simply to review the verbatim responses. Another would be to transform the qualitative responses into a useful quantitative format. For example, a number of respondents may have given answers related to good nutrition or healthful eating. Others may have mentioned getting more exercise or more activity. In this case, quantitative categories could be created, such as 1 = nutrition and 2 = exercise, etc.
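As a minimal sketch of this transformation, assume a person has already read each verbatim response and assigned it a code. The codes 1 = nutrition and 2 = exercise come from the text; the code 9 = other, the sample responses, and the function names are illustrative assumptions, not survey data.

```python
from collections import Counter

# 1 and 2 follow the text; 9 = "other" is an assumed catch-all code.
CODEBOOK = {1: "nutrition", 2: "exercise", 9: "other"}

# Invented verbatim responses, each paired with a hand-assigned code.
coded_responses = [
    ("Eat more vegetables", 1),
    ("Start walking every day", 2),
    ("Cut back on fast food", 1),
    ("Get more sleep", 9),
    ("Join a gym", 2),
    ("Eat healthier", 1),
]

def tabulate(responses):
    """Return {category label: (count, percent)} for the coded responses."""
    counts = Counter(code for _, code in responses)
    total = len(responses)
    return {
        CODEBOOK[code]: (n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    }

def verbatims_for(responses, code):
    """Return the original wording behind one numeric category."""
    return [text for text, c in responses if c == code]

summary = tabulate(coded_responses)
exercise_quotes = verbatims_for(coded_responses, 2)
```

Keeping the verbatim text next to its code means a frequency table can still be illustrated with the respondents' own words.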

If you had a large number of cases to work with, you might be able to create even more categories. For example, you could create a hierarchical coding structure such as:

10 = nutrition, general

11 = fewer calories

12 = less fat

13 = lower cholesterol

14 = more fruits/vegetables

15 = less red meat


19 = other specific nutrition

20 = exercise/activity, general

21 = walking, jogging

22 = aerobics


29 = other specific exercise/activity

It is not necessary to use every number in sequence. If the open-ended question generated multiple answers from some respondents, you could apply the same set of codes to each mention (e.g., mention #1, mention #2, mention #3), including a code (e.g., 0 or blank) to indicate no additional mentions.
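One way to picture the multiple-mention layout is the sketch below. It assumes each respondent record carries three mention variables and that 0 marks "no additional mention". The field names (`mention1` to `mention3`) and the respondent records are hypothetical; the codes are those listed above.

```python
from collections import Counter

NO_MENTION = 0  # code meaning "no additional mention"

# Codes from the hierarchical list in the text (gaps in the numbering are fine).
CODEBOOK = {
    10: "nutrition, general", 11: "fewer calories", 12: "less fat",
    13: "lower cholesterol", 14: "more fruits/vegetables", 15: "less red meat",
    19: "other specific nutrition",
    20: "exercise/activity, general", 21: "walking, jogging", 22: "aerobics",
    29: "other specific exercise/activity",
}

# Hypothetical respondents, each with up to three coded mentions.
respondents = [
    {"id": 1, "mention1": 12, "mention2": 21, "mention3": 0},
    {"id": 2, "mention1": 10, "mention2": 0, "mention3": 0},
    {"id": 3, "mention1": 21, "mention2": 14, "mention3": 11},
    {"id": 4, "mention1": 22, "mention2": 12, "mention3": 0},
]

def count_mentions(rows, fields=("mention1", "mention2", "mention3")):
    """Count how often each code appears across all mention variables."""
    counts = Counter()
    for row in rows:
        for field in fields:
            code = row[field]
            if code != NO_MENTION:
                counts[code] += 1
    return counts

mention_counts = count_mentions(respondents)
labeled = {CODEBOOK[code]: n for code, n in mention_counts.items()}
```

Because every mention variable shares one codebook, the counts can be pooled across mentions or examined for the first mention alone.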

Using a coding structure like this would allow you to (a) code specific as well as general responses and (b) obtain information about topics easily. For example, if you wanted data at the most specific level, you would analyze the variables coded with the previous categories. If you wanted the more general groups, you could create new variables (e.g., area) so that:

if improve = 10-19, area = 1 (nutrition)

if improve = 20-29, area = 2 (exercise)

etc.
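The recode rule above can be sketched in Python as follows. The variable names `improve` and `area` follow the text; the function name, the sample codes, and the choice to return `None` for ranges the text leaves as "etc." are assumptions.

```python
from collections import Counter

AREA_LABELS = {1: "nutrition", 2: "exercise"}

def area_from_improve(improve):
    """Collapse a specific 'improve' code into its general 'area' code."""
    if 10 <= improve <= 19:
        return 1  # nutrition
    if 20 <= improve <= 29:
        return 2  # exercise
    return None   # further ranges are not defined in the text

# Hypothetical specific codes for eight respondents.
improve_codes = [11, 14, 21, 10, 22, 29, 12, 21]

area_codes = [area_from_improve(code) for code in improve_codes]
area_counts = Counter(AREA_LABELS[a] for a in area_codes if a is not None)
```

The original `improve` variable is left untouched, so both the specific and the general level remain available for analysis.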

It should be noted that the level of specificity requires a sufficient number of cases. It would not be useful to create 50 categories for 100 cases. In that situation it would be more appropriate to have only 5 to 10 categories.
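A rough way to act on this is to count the cases in each specific category and fold sparse ones back into their general code. In the sketch below, the threshold of 10 cases, the function name, and the sample data are illustrative assumptions. Note that folding mixes collapsed specific answers with answers originally coded as general.

```python
from collections import Counter

def collapse_sparse(codes, min_cases=10):
    """Recode categories with fewer than min_cases to their general code.

    The general code is the tens value of the specific code
    (e.g., 14 -> 10, 22 -> 20), matching the hierarchical scheme.
    """
    counts = Counter(codes)
    collapsed = []
    for code in codes:
        if counts[code] < min_cases:
            collapsed.append((code // 10) * 10)
        else:
            collapsed.append(code)
    return collapsed

# Hypothetical coded responses: two well-filled and two sparse categories.
codes = [11] * 12 + [14] * 3 + [21] * 15 + [22] * 2

collapsed_counts = Counter(collapse_sparse(codes))
```

With these invented data, codes 14 and 22 fall below the threshold and are reported under 10 and 20, while 11 and 21 keep their specific level.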

The answers to one question can thus be presented numerically yet framed by rich verbatim responses. In this way you can get the benefits of both the qualitative and the quantitative data, in essence bridging the qualitative-quantitative data canyon and improving your data analysis in the process.


Thomas W. O’Rourke is a Professor in the Department of Community Health and School of Clinical Medicine, University of Illinois at Urbana-Champaign, IL 61820. Diane P. O’Rourke is Assistant Director for Survey Operations, Survey Research Laboratory, University of Illinois at Chicago, Urbana office, 61801.

COPYRIGHT 2000 University of Alabama, Department of Health Sciences

COPYRIGHT 2001 Gale Group