Practical Guidelines for Assessing the Clinical Significance of Health-Related Quality of Life Changes within Clinical Trials
Health-related quality of life (HRQOL) assessment is becoming common practice in many clinical trials. There is much debate over how to determine the clinical significance of changes in HRQOL scores. A number of techniques have been used to address this issue. This paper reviews the most popular of these approaches for use in a clinical trial setting. More specifically, the anchor-based “minimal clinically important difference” technique is described and critiqued, as is the more traditional distribution-based effect size technique. A novel application of effect size, which applies a common statistical premise known as the empirical rule, is also presented. The review of these techniques indicates that there is no single, optimal solution to determining clinical significance of changes in HRQOL scores. However, it is encouraging to note that they all suggest a similar criterion of a half-standard deviation for whether or not a change in HRQOL score is clinically significant. Recommendations are given for reporting the clinical significance of HRQOL assessments in clinical trials.
Key Words HRQOL; Minimal clinically important difference; Effect size; Clinical significance; Psychometrics
The status of HRQOL research in clinical trials is at a critical juncture. Over the past 20 years there has been growing interest in the assessment of HRQOL, especially with the advancement in healthcare interventions. A new medication may safely prolong the life of a patient, but there is an increasing need to show that it will also improve the patient’s quality of life. The concept of including HRQOL in clinical trials has been fairly well established in recent years.
This growing need to assess HRQOL has precipitated a proliferation of newly developed instruments. The precision with which HRQOL instruments have been developed has been gradual and variable, but a number of tried and tested instruments are now available. What remains to be delineated, however, are mechanisms that will provide ready and intuitive interpretation of the resultant data within the clinical context. If we can increase a patient’s HRQOL through some health intervention, how do we know if this change is clinically relevant? This is a question with no easy answer. If the meaning of such data cannot be made understandable for clinicians and consumable by patients, then the impetus for measuring HRQOL is lost.
The heart of the problem with which researchers are faced is assessing a tangible benefit (improvement in health) with an intangible construct (HRQOL). Changes in HRQOL are assessed, too often, strictly in terms of statistical significance. However, a p-value of less than 5% does not necessarily provide an indication of clinical significance. For example, Wilke et al. (1) reported that treatment with alprostadil led to statistically significant changes in the Duke Health Profile HRQOL domains. Mental health, social health, and self-esteem domains had statistically significant improvements of 1.60, 2.30, and 2.31 units, respectively. Although statistically significant, it is hard to believe that a 1- or 2-point shift on a scale of 0 to 100 is clinically meaningful. It is not unusual to see a global summary in the discussion section of a clinical manuscript that “HRQOL significantly improved,” without a mention of the relevant power considerations. For example, a statistically significant p-value based on the comparison of groups of 500 observations each is reflective of the ability to detect changes of less than 1 point on a 100-point scale with greater than 80% power.
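To make the power consideration concrete, the minimal detectable difference can be back-calculated from the sample size. The sketch below assumes a two-sample, two-sided z-test and an illustrative within-group SD of 5 points on a 0-to-100 scale; the SD value is hypothetical, not taken from the trial cited above.

```python
from math import sqrt
from statistics import NormalDist

def minimal_detectable_difference(n_per_group, sd, alpha=0.05, power=0.80):
    """Smallest between-group mean difference detectable in a two-sided,
    two-sample z-test with the given per-group sample size and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2.0 / n_per_group)

# With 500 patients per arm and a hypothetical SD of 5 points,
# sub-1-point shifts are detectable at 80% power:
print(round(minimal_detectable_difference(500, 5), 2))  # 0.89
```

A study powered this finely will declare statistical significance for differences far smaller than anything a patient could perceive, which is exactly the gap between statistical and clinical significance discussed above.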
Philosophically, there are three perspectives for assessing the clinical significance of changes in HRQOL scores. One measure of clinical significance is the degree to which a patient’s HRQOL must change before he/she perceives that a change has occurred. The opinion of the clinician as to the amount of change in HRQOL scores that would mandate some form of clinical intervention is another way of defining significance. Finally, the epidemiological perspective regarding what proportion of patients would experience shifts in HRQOL score holds obvious clinical implications. Any or all of these perspectives may be applicable. The challenge, however, is to be able to obtain, a priori, such information in a consistent and credible fashion.
There is not likely to be a final singular solution to the problem of assessing clinical significance. Recently, special issues in Statistics in Medicine (2000, Vol. 19) and the Journal of Consulting and Clinical Psychology (1999, Vol. 67) were dedicated to this topic without a successful resolution or even an agreement to the relative merits of various approaches.
Pragmatically, there are relatively few methods for assessing the clinical significance of a change in HRQOL scores within the context of a clinical trial. How would one, for example, assess the clinical relevance of a five-point shift in posttreatment HRQOL scores? The first and simplest approach would be to review the literature to determine how other researchers have assessed the magnitude of changes on the same HRQOL instrument. However, such data are not always available, especially for a specific patient subpopulation. One is hence forced to draw upon more generic existing techniques to determine clinical significance.
There are basically two complementary analytical approaches: one requiring concomitant data collection, the other relying on statistical theory. One method defines a minimal clinically important difference (MCID) using a patient global rating of change on HRQOL scores (2,3,4). The second method uses the statistical entity known as the effect size (ES), which expresses the magnitude of effect in terms of the distributional standard deviation (SD) (5). Other techniques have arisen but they are generally based around the MCID or ES approaches.
We describe and evaluate each of the techniques that have particular applicability for assessing the relevance of HRQOL changes in clinical trials. As will be seen, the encouraging conclusion is that the methods seem to provide comparable results. We conclude with recommendations for reporting the clinical significance of HRQOL data to facilitate interpretation of clinical trial results.
THE MINIMAL CLINICALLY IMPORTANT DIFFERENCE APPROACH
Juniper and colleagues (2,3,4) define an MCID as, “the smallest difference in score … which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management.” The MCID approach relates a patient’s own perception of what a meaningful change is in his/her health status to a corresponding change in his/her HRQOL score (4). This approach is often referred to as the “anchor-based approach” since the clinical significance of HRQOL changes is “anchored” to another endpoint (the patient’s own self-assessment of health status change since treatment).
Jaeschke et al. (6) were the first to address what would constitute an MCID on a given self-report instrument. They used the Chronic Respiratory Questionnaire and the Chronic Heart Failure Questionnaire, which assess three domains (dyspnea, fatigue, and emotion) via a seven-point Likert scale. Seventy-five patients receiving treatment for respiratory problems or after heart failure were asked a “global rating of change” question about how they felt their shortness of breath (dyspnea), fatigue, and emotional health had changed since treatment started. Patients were classified by the global rating of change questions into four groups, “no change,” “minimum,” “moderate,” and “largest,” for each domain. The mean change in the three domains ranged from 0.43 to 0.64 points for the minimum group, 0.81 to 0.96 for the moderate change group, and 0.86 to 1.47 for the largest change group. Jaeschke et al. concluded from these results that an MCID was roughly half a point on a seven-point scale.
A related study was carried out by Juniper et al. (2) to determine if the notion of an MCID of 0.5 point on a seven-point Likert scale held true for a newly developed asthma HRQOL instrument. Patient global rating of change scores were classified on four asthma domains. Patients who were categorized by the global rating of change as “no change,” “small change,” “moderate change,” and “large change” reported average changes in HRQOL scores of 0.11, 0.52, 1.03, and 2.29 points, respectively. Since the average for those in the small change category for each item in the four domains was 0.52, Juniper et al. concluded that their results supported the notion of the MCID as about half a point on a seven-point scale.
More recently, Osoba et al. (7) used the MCID approach on the European Organisation for Research and Treatment of Cancer’s Quality of Life Questionnaire (QLQ-C30). Patients were asked to assess their level of perceived change in physical, emotional, and social functioning and in global HRQOL since having received treatment for either breast cancer or small-cell lung cancer. Classifying patients into different groups of change (none, little, moderate, very much), mean scores on the QLQ-C30 were calculated based on a 0- to 100-point scale. The mean change in QLQ-C30 scores for patients reporting a little change ranged from 5 to 10 units for the four domains. The MCID definition of a 0.5-point shift on a 1 to 7 scale corresponds to roughly 8 units of change on a 0-to-100 scale. These results supported the original MCID definition.
An alternative formulation of the MCID approach to assessing clinical significance in clinical trials is to ask patients directly if the treatment experience was worth the time and effort. For example, one could ask the following questions:
1. Was it worth participating in this clinical trial? Yes/No or Little/Moderate/A lot.
2. Would you do it again if you had it to do over? Yes/No.
3. Do you think your QOL has been improved by participating in this trial? Yes/No.
4. Is your satisfaction (or lack of satisfaction) with this trial related to the treatment outcome? Yes/No.
5. Would you recommend participating in clinical trials to others? Yes/No.
These questions have been referred to collectively as “Was it Worth it” items (8). For analysis purposes, clinical significance can be simply assessed by comparing the proportion of patients in each treatment group who report a worthwhile (and, therefore, inherently clinically significant) treatment effect. This method has seen widespread informal application, but little formal testing or validation work has been done. This is an area for further research.
Another modification of the MCID approach is to associate the HRQOL scores with an objective measure of health status, which is often available in clinical trials. An example of this approach related changes on the St. George’s Respiratory Questionnaire to a six-minute walk test (9,10,11). A six-minute walk test is often used to test the functional status of patients with chronic obstructive pulmonary disease as a reliable and valid objective assessment of mobility. A 6% increase in the walk test was defined as an improvement in functioning, and then used to determine the cut-off whereby the St. George’s Respiratory Questionnaire could best predict responders (sensitivity) and nonresponders (specificity) to treatment. The best sensitivity/specificity split on the St. George’s Respiratory Questionnaire was at a four-point change, and was hence defined as a clinically significant difference within or between populations. The four-point criterion has since been used to report the clinical meaningfulness of changes on St. George’s Respiratory Questionnaire in two studies (12,13).
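The cut-off selection described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: it assumes per-patient HRQOL change scores and responder labels (e.g., from the 6% walk-test criterion) are available, and it picks the cut-off maximizing sensitivity plus specificity (Youden's index).

```python
def best_cutoff(hrqol_changes, responders):
    """Choose the HRQOL change score that best separates responders
    from nonresponders by maximizing sensitivity + specificity."""
    n_pos = sum(responders)
    n_neg = len(responders) - n_pos
    best, best_youden = None, float("-inf")
    for c in sorted(set(hrqol_changes)):
        # classify "change >= c" as a predicted responder
        tp = sum(1 for x, r in zip(hrqol_changes, responders) if r and x >= c)
        tn = sum(1 for x, r in zip(hrqol_changes, responders) if not r and x < c)
        youden = tp / n_pos + tn / n_neg - 1
        if youden > best_youden:
            best, best_youden = c, youden
    return best

# Toy data: a 4-point change cleanly separates the two groups.
changes = [1, 2, 3, 4, 5, 6]
resp = [False, False, False, True, True, True]
print(best_cutoff(changes, resp))  # 4
```

In practice such a cut-off would be derived on one dataset and validated on another, since the optimal split is sample-dependent.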
Although assessments such as the six-minute walk test are seemingly more objective than a subjective assessment such as the global rating of change, the percentage change in the six-minute walk test required to indicate clinically meaningful improvement in function has not been established. The 6% increase used by Jones et al. was arbitrary; no explanation was given for this cut-off beyond clinician opinion.
THE EFFECT SIZE APPROACH
An ES summarizes the amount of overlap between two distributions. This approach to assessing clinical significance is often inferred when discussing the “distribution-based approach” in the literature. ES is calculated as the difference in means of two distributions divided by the pooled SD. A large ES indicates a small overlap of the distributions while a small ES indicates a large overlap. Cohen (5) defined small, medium, and large ESs as 0.2, 0.5, and 0.8 times the standard deviation, respectively. Typically, an estimate of the SD is derived from the literature or pilot studies. A moderate ES is generally considered clinically significant for any end-point (8).
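The ES computation described above can be sketched in a few lines; the group summaries used here are hypothetical illustrations.

```python
from math import sqrt

def cohens_effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in group means divided by the pooled SD (Cohen's d)."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Cohen's benchmarks: 0.2 small, 0.5 medium, 0.8 large.
# A 5-point gain on a scale with SD 10 is a medium (0.5) effect:
print(cohens_effect_size(55, 10, 30, 50, 10, 30))  # 0.5
```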
The ES approach, as a distribution-based method, holds intrinsic appeal to statisticians and psychometricians. Power calculations for a wide variety of endpoints are routinely calculated via the ES approach (14,15). Sloan et al. have demonstrated the applicability of the approach for assessing the clinical significance of HRQOL endpoints in many oncology clinical trials (16).
The ES method is helpful in determining the clinical significance of changes in health status measures by using benchmarks (17). Kazis et al. compared the established treatment of gold therapy with results from a non-steroidal, anti-inflammatory drug (NSAID) study using the nine domains of the Arthritis Impact Measurement Scale (17). The ESs for the placebo group were used as a benchmark for minimal effectiveness, and the ESs obtained from the gold treatment group as substantial change. They calculated the ESs for the NSAID group and found that the ESs were higher than the placebo group and similar to those found for the gold treatment. By comparing ESs it was shown that the new treatment was as effective as (if not better than) the more traditional treatment and therefore, by implication, that the change in health status measure in the NSAID group was clinically meaningful. Furthermore, by using this approach, if statistically significant changes were shown on the health status measure, but the ES of this change was no better than placebo, it could have been concluded that although the change was statistically significant it was not clinically relevant.
Distribution may have an important influence on ES. If the HRQOL measure has a highly skewed distribution then the use of a nonparametric test is recommended, that is, medians and measures of dispersion such as interquartile range rather than means and SDs.
A SIMPLE POWER CALCULATION APPROACH: THE EMPIRICAL RULE EFFECT SIZE METHOD
This empirical rule effect size (ERES) method is a direct modification of the ES approach (16). It is intended to circumvent the problem of obtaining an estimate of the SD based solely on previous, potentially inapplicable, clinical circumstances.
The empirical rule, a famous truism of statistical theory, states that approximately 99% of a roughly normal distribution will fall within three standard deviations of the mean (8). Hence, the range of any HRQOL tool’s distribution is roughly equivalent to six standard deviations. Based on the assumption that an HRQOL tool can be transformed to a theoretical range of 0 to 100, an initial estimate of the SD for the HRQOL tool is 16.7% (ie, 100/6). Sloan et al. (16) reviewed various HRQOL instruments (eg, the SF-36, QLQ-C30, and the FACT instruments FACT-B, FACT-C, and FACT-L) in order to demonstrate that the estimate of 16.7% was valid for HRQOL instruments. The estimates from the literature confirmed the hypothesis that SD estimates were usually about 17% of the range of scores for the instrument (8).
The ERES method continues by combining this initial estimate of 16.7% for the SD with Cohen’s classification system for small, moderate, and large ESs. A small ES of 0.2 times the SD translates to 3% of the HRQOL tool’s theoretical range. A moderate ES of 0.5 times the SD is 8% of the HRQOL tool’s theoretical range, and a large ES of 0.8 times the SD is 13% of the HRQOL tool’s theoretical range. Hence, the ERES method defines a clinically significant change as equivalent to 8% of the HRQOL tool’s theoretical range. Sloan (16) provides a series of examples of clinically significant effects for many of the commonly used HRQOL assessment tools. The importance of this approach is that it applies to any HRQOL tool across any clinical trial. From this general result, units of change required to demonstrate such ESs can be calculated and used in sample size determination, as a benchmark, or for observed clinical significance.
As an example, the Symptom Distress Scale (SDS) (18) has 13 items with a 1- to 5-point scale and a range of potential scores from 13 to 65. Applying the ERES method indicates that a small effect size (3%) would be two units of change, a medium ES (8%) would be four units of change, and a large ES (13%) would be six units of change. In practical terms, a two-point shift indicates a change of one point on every sixth item on the SDS and is arguably not clinically meaningful. However, a four-point shift is a one-point shift on every third item, while six points is a one-point shift on every other item. Intuitively, a change of one point in every other item would appear large, whereas a change in every third item seems a reasonable benchmark for clinical significance.
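The ERES arithmetic described above can be sketched as follows; the SDS figures in the text are these values rounded to whole units.

```python
def eres_thresholds(score_min, score_max):
    """Empirical-rule effect-size benchmarks: the SD is taken as one
    sixth of the instrument's theoretical range, then scaled by Cohen's
    small/medium/large multipliers (0.2, 0.5, 0.8)."""
    sd = (score_max - score_min) / 6.0
    return {"sd": sd, "small": 0.2 * sd, "medium": 0.5 * sd, "large": 0.8 * sd}

# 0-to-100 scale: SD ~16.7, so a medium (clinically significant)
# effect is ~8.3 units, i.e., roughly 8% of the range.
print({k: round(v, 1) for k, v in eres_thresholds(0, 100).items()})

# Symptom Distress Scale (range 13 to 65): ~1.7, ~4.3, and ~6.9 units,
# which the text rounds to two, four, and six units.
print({k: round(v, 1) for k, v in eres_thresholds(13, 65).items()})
```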
The ERES approach uses long-established statistical theory to overcome some of the criticism leveled at the ES method in terms of arbitrariness of the effect size classifications. An obvious advantage of the ERES is that it does not require a global rating of change item to classify patients as improved or not improved. Thus, the criticisms previously identified with using such a global assessment to ascertain clinically significant changes do not apply to ERES or ES.
Thus, even in situations where all that is known about an HRQOL questionnaire is an average score and SD, a priori statements about the number of units of change required to indicate clinical significance can still be made. This magnitude of change can then be used to determine the sample size for a given study. Use of the ERES method does not necessarily preclude supplementation by the MCID approach. The global rating of change question could still be used to confirm, via the MCID approach, the units of change determined a priori using the ES/ERES approach.
DISCUSSION: STRENGTHS AND WEAKNESSES OF THE MCID, ES, AND ERES APPROACHES
The MCID is a simple, and arguably the most recognized, approach for assessing clinical significance of HRQOL change scores. However, there are a number of methodological issues that need resolution.
There is no standard format for the global rating of change question. Indeed, this question has not undergone validity or reliability testing (19). One study used two different global rating of change questions that dealt with:
1. Assessing how well a patient’s asthma is controlled, and
2. A global question for change in asthma (20).
The two different global rating of change questions produced different MCID results. For example, the symptom domain had an average change on the asthma-global question of 0.42 and 0.78 on the asthma-control question. The phrasing of the global rating of change question, hence, has the potential to influence the definition of what is clinically meaningful via the MCID approach.
Different authors have used different cutoff points for minimum change (2,4,6,7,21). Jaeschke (6) used one to three units of change on the global rating of change scale to represent minimum change, while Juniper (2) used two to three units of change. Although the difference in classification only affects a small proportion of respondents, a precise definition of the cutoff point for minimum change would facilitate the generalizability of this technique.
There are pragmatic challenges to the MCID method as well. The addition of a global rating of change question may be impractical for clinical trials, especially if the instrument has more than five domains. One way around this is to choose selected domains that are assumed to be most sensitive to the treatment under investigation. It is unlikely that such information will be known in many clinical trial settings in advance. It may also be that one general global rating of change can be used for all domains, but this requires further investigation.
A further potential weakness of the MCID approach is the circular reasoning inherent in using a global rating of change question to derive clinically significant change definitions. If we want to find out whether a change is clinically significant, why not just ask the patient? If the patient tells us that a significant change has occurred, it begs the question as to whether we need to apply an abstract number to the concrete concept that some change has occurred. Rather than trying to relate the patient’s perception that a change has occurred to the average change in an HRQOL tool, it is just as reasonable to simply use the indicator variable to assess whether HRQOL has changed across the population.
The MCID technique has generated considerable debate about its strengths and weaknesses, primarily because it is the most widely quoted technique for assessing clinical significance of HRQOL changes. More research addressing some of the issues that have been highlighted above is necessary. Once these issues have been fully explored for a wider variety of HRQOL instruments, the veracity of the 0.5-point change on a seven-point scale as a gold standard may be achievable. Until such time, researchers may well be advised to include a global rating of change-type question(s) in their clinical trials and/or use a clinical endpoint to anchor the level of health status change to HRQOL changes.
The ES method has the major strength of simplicity of application. It has been criticized as being a function of sample size, although the argument is largely fallacious and could be applied to the MCID method as well (19). There is a basic discomfort among some clinicians with the seeming arbitrariness of applying a statistical process to define clinical significance. More work is needed to demonstrate the generalizability across a broad spectrum of disease sites and HRQOL assessment tools.
The major strength of the ERES method is applicability across a broad spectrum of HRQOL tools and clinical situations without supplementary data collection. The number of domains measured or amount of data collected does not add additional burden to the ascertainment of a clinically significant difference via the ERES method. Indeed, one could apply the method with very little information about the HRQOL instrument involved. Sloan demonstrated that the method is robust across a wide variety of distributional assumptions (16).
The simplicity of the ERES method gives rise to some challenges and questions. For some HRQOL questionnaires, the theoretical range of the instrument is rarely observed in its entirety when applied to a clinical situation. For example, although the FACT-G ranges from 33 to 132, anyone who would truly score 50 would likely not be able to participate in a clinical trial. Hence, one must modify the theoretical range to perhaps more practical limits before calculating the ERES estimate for the SD as 16.7% of the range. Similarly, truncated distributions where the patient population is homogeneously seriously ill or relatively healthy can be accommodated by incorporating this knowledge into the definition of the theoretical range.
Another basic challenge for the ERES method is its relative newness. More empirical evidence is needed before widespread acceptance of this method will occur. It has been argued that Cohen’s moderate effect size is a more conservative estimate of clinical significance than the MCID’s minimal setting. The differences to date would seem to be relatively minor, but more studies are needed to justify this general claim.
A UNIFICATION PROPOSAL: THE HALF-SD BENCHMARK FOR CLINICAL SIGNIFICANCE
From the discussion above, it could be argued that a change of half an SD is sufficient to indicate a minimum clinically significant change in HRQOL scores. This originates from the fact that the ES method suggests that a moderate effect size is half an SD. The MCID-based clinically significant change of 0.5 per one- to seven-point Likert-type item, proposed by Juniper and colleagues, equates to a half-SD shift when the ERES method is applied. Examples in the literature add further credence to the argument. Cella has suggested a 10-point change on the FACT-G scale as being clinically important (22). Translating the theoretical range for the FACT-G of 112 points to a 0- to 100-point scale, with a standard deviation typically observed to be around 16 points, yields a clinically significant difference of roughly half an SD. Feinstein (23) eloquently argued, via an alternative and more complicated method based on correlation coefficients, that 0.5 SD is a reasonable benchmark. Thus, with these values being consistent across a number of approaches and studies, the half-SD could be considered a “unifying theory” for producing a ballpark estimate of clinical significance.
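The FACT-G arithmetic in this paragraph can be verified directly, using the range and SD values cited in the text.

```python
# Re-expressing Cella's 10-point FACT-G change on a 0-to-100 scale
# (range and SD figures are those cited in the text).
fact_g_range = 112            # theoretical range of the FACT-G
change = 10                   # proposed clinically important change
change_0_100 = 100.0 * change / fact_g_range   # ~8.9 points on 0-100
sd_0_100 = 16                 # SD typically observed on the 0-100 scale
print(round(change_0_100 / sd_0_100, 2))  # 0.56, i.e., roughly half an SD
```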
There are a number of issues that arise from the proposal of such a general benchmark. Would a 0.5 SD shift represent the same magnitude of effect for mild and severe patients? Recent research suggests that severity of illness is indeed important, finding that patients with worse health required larger numerical gains on the physical function domain of SF36 than patients with better initial health (24). Similarly, is a half SD change at the extreme values of a scale the same as a half SD change in the middle of the scale? Finally, does a 0.5 SD shift indicate the same clinical significance across different disease groups?
The issues of the preceding paragraph apply to all methods for assessing clinical significance. While it would be appealing to have a single rule that applied to absolutely every clinical situation without modification, it is also unrealistic. None of the above questions, however, render any of the methods for determining clinical significance invalid. The questions need only to be seen for what they are: supplementary information gleaned from specific clinical environments. Guidelines for clinical treatments can generally be specified in a succinct manner, but often need to be modified when applied to idiosyncratic or unique clinical settings. The same, no doubt, will be true for the MCID, ES, and ERES methods for assessing clinical significance. The most encouraging sign, however, is that all methods seem to converge to the same general neighborhood. If this claim can be substantiated with further empirical data, then the choice of method will become one of mere logistical convenience.
REPORTING THE CLINICAL HRQOL ASSESSMENT IN CLINICAL TRIALS
Having described alternative approaches for assessing clinical significance, we now proffer some suggestions for reporting results. Many clinical papers contain no more than a single paragraph detailing the methodology, results, and clinical significance of the HRQOL assessment. Ideally, the following information on clinical significance should be explicitly documented in all clinical trials that report an HRQOL assessment.
In the methods section:
1. Specify the tools used, including the theoretical range of raw scores and any transformations;
2. Define what the investigators have decided is a clinically important effect size for each HRQOL domain assessed, providing previous relevant estimates of standard deviations; and
3. Specify the power available for each assessment tool to detect the defined clinically significant effect.

In the results section:
1. Present the differences (between groups or over time) and standard deviations observed for all instruments, indicating which differences are considered to be clinically significant; and
2. Provide summaries reflecting the percentage of scores that reached the threshold of clinical significance.

In the discussion section:
1. Discuss whether the effects are as large as anticipated, or as seen previously, and why; and
2. Describe the potential clinical implications of the HRQOL findings.
If these elements are consistently incorporated into the interpretation of journal articles, our understanding of HRQOL results will improve.
There are, at present, two generally accepted approaches to assessing clinically meaningful changes in HRQOL scores in a clinical trial setting: anchor- and distribution-based approaches. These techniques are sometimes seen as competitive approaches. It would appear, however, that they can be used to complement each other. By using ES, it has been shown that 0.5 of an SD may be the minimum criterion for assessing clinically significant change. This is supported by the MCID, with a 0.5-unit change representing a half-SD change. It is also supported by other investigations (7,19,23).
At present, as with so many aspects of HRQOL research, there are many questions but no clear answers around this issue. For example, Cella (22) and others summarize the state of the art/science of clinical significance calculation by saying it is too complex to think that a single approach will ever be uniformly powerful or accepted. In such a situation, it is important to offer a menu of approaches along with some general guidelines so that the researcher may make intelligent choices among the options. More validation of these approaches is ongoing in order to establish that these criteria can be recommended for application in all clinical trials.
A vital component of assessing clinical significance is to make the method accessible to researchers and clinicians. For example, stating that we will use a half SD as a benchmark for clinical significance is not sufficient. It is an easy task, however, to report clinical significance based on the original scale units. For example, as specified previously, the clinician need only be informed that a change of 10 units on the raw FACT-G summated score or 10% of the range of the tool is a clinically significant effect. Hence, presenting the proportion of the patients in placebo and treatment groups who achieved a clinically significant effect is a simple method of communicating results.
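Reporting the proportion of patients achieving a clinically significant change can be sketched as follows; the change scores below are hypothetical.

```python
def responder_proportions(changes_by_arm, threshold):
    """Fraction of patients in each arm whose HRQOL change meets a
    pre-specified clinically significant threshold."""
    return {arm: sum(c >= threshold for c in changes) / len(changes)
            for arm, changes in changes_by_arm.items()}

# Hypothetical FACT-G change scores, with a 10-point (clinically
# significant) threshold:
arms = {"placebo": [2, 5, 11, -3, 8], "treatment": [12, 15, 4, 10, 9]}
print(responder_proportions(arms, 10))  # {'placebo': 0.2, 'treatment': 0.6}
```

Comparing these proportions between arms expresses the treatment effect in units clinicians and patients can interpret directly.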
The gold standard for determining the clinical significance of HRQOL results remains, and must remain, within the purview of the clinical investigator and/or the patient wherever possible. Applying abstract numbers to a complex concept combining the various viewpoints is likely to be too complex to ever be described succinctly in a single, simple approach that satisfies all situations. One can propose, as we have in this paper, a simple method that has broad application, particularly in the presence of scant preliminary information. Just as every approach to clinical theory must be adapted and modified to fit certain situations, so must any approach for assessing the clinical significance of HRQOL scores be taken as a guideline, not a rigid rule to be applied blindly. The key point here is that simple methods are vital to the success of assessing HRQOL in clinical trials to avoid overly complicating the scientific process. The alternative of just ignoring HRQOL as too difficult to be achieved is unpalatable.
The methods discussed in this paper are intended largely for the case where detailed information on clinical significance is not available or forthcoming from the patient or clinician. This is likely to be the situation in the majority of HRQOL investigations. Nonetheless, researchers are strongly encouraged to elicit, a priori, opinions from patients and/or physicians as to how great an impact on HRQOL various results would represent, in terms of the number of questions answered differently or the number of points of separation on a scale. Indeed, Bellamy et al. (25,26) have successfully used a “Delphi panel” of clinicians to determine MCIDs for a number of measures.
In conclusion, the most encouraging sign of this research is the consistency of results regardless of the method applied to define clinical significance. The theory-based ES/ERES methods provide a ballpark estimate for clinical significance that is, for practical purposes, identical to that observed with the empirically based MCID approach (16). The variations in results across methods and settings are well within the bounds of measurement error, and therefore arguably ignorable. Indeed, the difficulty of identifying a clinically significant change in HRQOL is reminiscent of the famous remark about defining pornography: though difficult to define, "you know it when you see it." The ultimate practical finding of this work is that, in the absence of further knowledge, a half SD can be used as a reasonable initial estimate of a clinically significant change.
1. Willke RJ, Glick HA, McCarron TJ, et al. Quality of life effects of alprostadil therapy for erectile dysfunction. J Urol. 1996;157:2124-2128.
2. Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific Quality of Life Questionnaire. J Clin Epidemiol. 1994;47:81-87.
3. Juniper EF. The value of quality of life in asthma. Eur Respir Rev. 1997;7:333-337.
4. Juniper EF. Quality of life questionnaires: does statistically significant = clinically important? J Allergy Clin Immunol. 1998;102:16-17.
5. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
6. Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407-415.
7. Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in health-related quality-of-life scores. J Clin Oncol. 1998;16:139-144.
8. Sloan J, Symonds T. Health-related quality of life: when does a statistically significant change become clinically significant? Presented at the ISOQOL Educational Workshop, Washington, DC, January 2001.
9. Guyatt GH, Sullivan MJ, Thompson PJ, et al. The 6-minute walk: a new measure of exercise capacity in patients with chronic heart failure. Can Med Assoc J. 1985;132:919-923.
10. Bittner V, Weiner DH, Yusuf S, et al. Prediction of mortality and morbidity with a 6-minute walk test in patients with left ventricular dysfunction. SOLVD Investigators. JAMA. 1993;270:1702-1707.
11. Jones PW, Quirk FH, Baveystock CM. The St George's Respiratory Questionnaire. Respir Med. 1991;85 Suppl B:25-31; discussion 33-37.
12. Jones PW. Quality of life, symptoms and pulmonary function in asthma: long-term treatment with nedocromil sodium examined in a controlled multicentre trial. Nedocromil Sodium Quality of Life Study Group. Eur Respir J. 1994;7:55-62.
13. Jones PW, Bosh TK. Quality of life changes in COPD patients treated with salmeterol. Am J Respir Crit Care Med. 1997;155:1283-1289.
14. Pocock SJ. Statistical and ethical issues in monitoring clinical trials. Stat Med. 1993;12(15-16):1459-1469.
15. Senn SJ, Auclair P. The graphical representation of clinical trials with particular reference to measurements over time. Stat Med. 1990;9(11):1287-1302.
16. Sloan JA. Asking the obvious questions regarding patient burden. J Clin Oncol. 2002;20:4-6.
17. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(3 Suppl):S178-S189.
18. McCorkle R, Young K. Development of a symptom distress scale. Cancer Nurs. 1978;1:373-378.
19. Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997;50:869-879.
20. Barber BL, Santanello NC, Epstein RS. Impact of the global on patient perceivable change in an asthma specific QOL questionnaire. Qual Life Res. 1996;5:117-122.
21. Redelmeier DA, Guyatt GH, Goldstein RS. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol. 1996;49:1215-1219.
22. Cella DF. Quality of life outcomes: measurement and validation. Oncology (Huntingt). 1996;10:233-246.
23. Feinstein AR. Indexes of contrast and quantitative significance for comparisons of two groups. Stat Med. 1999;18:2557-2581.
24. Stucki G, Daltroy L, Katz JN, Johannesson M, Liang MH. Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not equal the sum of the parts. J Clin Epidemiol. 1996;49:711-717.
25. Bellamy N, Anastassiades TP, Buchanan WW, et al. Rheumatoid arthritis antirheumatic drug trials. III. Setting the delta for clinical trials of antirheumatic drugs: results of a consensus development (Delphi) exercise. J Rheumatol. 1991;18:1908-1915.
26. Bellamy N, Carette S, Ford PM, et al. Osteoarthritis antirheumatic drug trials. III. Setting the delta for clinical trials: results of a consensus development (Delphi) exercise. J Rheumatol. 1992;19:451-457.
Jeff Sloan, PhD, Cancer Center Statistics, Mayo Clinic, Rochester
Tara Symonds, PhD, Outcomes Research, Pfizer Ltd, Kent, United Kingdom
Delfino Vargas-Chanes, PhD, Cancer Center Statistics, Mayo Clinic, Rochester
Brooke Fridley, MS, Department of Statistics, Iowa State University
Correspondence: Jeff A. Sloan, PhD, Cancer Center Statistics, Department of Health, 200 First Street SW, Rochester, MN 55905
Copyright Drug Information Association 2003