Implications of lack of concordance among oncologists for evidence-based medicine derived clinical guidelines

Elfenbein, Gerald J

Probably the most important thing my first mentors at Johns Hopkins, Drs. Albert Owens and George Santos, told me in 1967 was to avoid tilting at windmills, like Don Quixote, because there were just too many of them. Sage advice, but sometimes advice that cannot be followed. Thirty-five years later, I participated in two separate oncology survey programs [1] in which case scenarios were forwarded by facsimile: How would I treat a given patient at a specific point in his or her malignant disease course? I was asked to choose one from several preset answers. The selected respondents were supposedly “experts” in medical oncology. They could be considered representative of the oncologists who sit on panels and establish clinical guidelines using evidence-based medicine. One would think that the results from these surveys would point to a single “best” treatment response in each circumstance because evidence-based medicine would define “best”. But this was not the case. I accumulated 54 consecutive reports from these surveys: for treatment decisions, concordance among clinical oncologists was low (Figure 1). On average, the first choice answer was selected only 50% of the time. The second choice answer was selected only 50% as often as the first choice, or 25% of the time, on average. This relationship holds for the third, fourth, fifth, and sixth choices, and describes a hyperbolic function with a zero asymptote. Although a majority, or often only a plurality, “rules”, it is entirely unclear to me what the “correct” answers were.
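The halving pattern described above can be written out as simple arithmetic. The sketch below is only an illustration of the reported averages (50% for the first choice, each later choice half as frequent as the one before); it does not use the survey data, and the function name and parameters are hypothetical. A constant ratio between successive ranks is, strictly speaking, a geometric sequence, which likewise approaches zero.

```python
def choice_shares(first_share=0.5, ratio=0.5, n_choices=6):
    """Average share of respondents picking each ranked answer,
    assuming every rank is chosen `ratio` times as often as the one before."""
    return [first_share * ratio ** rank for rank in range(n_choices)]


shares = choice_shares()
for rank, share in enumerate(shares, start=1):
    print(f"choice {rank}: {share:.1%}")

# Share of respondents accounted for by the six ranked answers together.
print(f"all six choices: {sum(shares):.1%}")
```

Under these assumptions the six answers together account for just over 98% of responses, and even the most popular answer leaves half of the respondents choosing something else.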

As we practice medicine on a daily basis, we have grown accustomed to a high level of concordance in the interpretation both of imaging studies and of preparations of pathologic specimens. Furthermore, we increasingly expect a single “best answer” for each case scenario. We subscribe to the fundamental belief that evidence-based medicine provides clinicians with practice guidelines to make appropriate decisions for their patients in each circumstance, a belief that should be reflected by concordance in surveys.

What does the lack of a high level of concordance in treatment decisions by oncologists teach us about these beliefs and expectations?

The trivial answers are that the scenarios may have been inaccurate or incomplete; the survey questions may have been ambiguous; or the answers may have all been inappropriate or even incorrect. Tempting as these answers are, in all probability none is correct. What is more likely is that evidence-based medicine has limitations that have been neither explored thoroughly nor presented clearly, and that practice guidelines, as derivatives of evidence-based medicine, are not sufficiently reliable to help clinicians make decisions for their patients with a high degree of uniformity in many scenarios.

Evidence-based medicine has established a hierarchy of reliability of data published in the literature, with the least reliable being case reports and small series without historical controls (evidence level five) and the most reliable being large, randomized, controlled trials (RCTs) and meta-analyses (evidence level one; see Figure 2). On the surface, this hierarchy seems reasonable. But it should not be considered absolutely authoritative. By their very nature, RCTs have very stringent entry criteria, which in clinical research translate to studies of a very restricted subpopulation of patients with a specific disease; i.e., the eligible patients display a very narrow range of variability within the disease spectrum, which limits applicability to the rest of the spectrum. Lacking results from patients-at-large, clinicians are forced to extrapolate data to make clinical decisions. Extrapolation is fraught with a great chance of error. Further, the design of RCTs (not their statistical underpinning) may be flawed (for example, in the selection of the control arm treatment), leading to misinterpretation of the data and to conclusions about what to do that may be the opposite of what actually should be done. Moreover, RCTs may be terminated too early to find a difference (beta error) or, even worse, reported prematurely because eager study investigators want to share their findings [3]. Finally, RCTs may select endpoints that are inappropriate for the scenario being studied. An example of such an endpoint is overall survival.

For a scenario in which the patient is terminally ill and no other therapy is of any known value in prolonging life, testing a new drug or treatment modality in an RCT against no therapy at all, and then looking for improved overall survival to justify using the drug or treatment, is quite reasonable. However, when the patient is receiving first line therapy for a malignancy that has a natural history of prolonged duration (such as adjuvant therapy for high risk, primary breast cancer), and when salvage therapy can prolong life after relapse has occurred, examining overall survival is not at all justifiable, because overall survival is determined not just by first line therapy but also by subsequent salvage therapies as well as other determinants. Overall survival is at least one step removed from first line therapy and requires a very long observation period, perhaps 5 to 10 years, before it may be evaluated properly to minimize beta error. Disease-free survival is the more appropriate endpoint because it is directly related to first line therapy. Moreover, it defines the treatment-free interval for patients who do relapse and the population of patients who have the potential of being cured. What needs to be established is a hierarchy of endpoints that are directly attributable to the therapy under investigation and not to potentially multiple, intervening steps (Figure 3). If, for instance, disease-free survival is equivalent for both arms of a study, then the next level of evaluation could be quality of life, followed by cost of therapy.
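The tiered evaluation proposed above (disease-free survival first, then quality of life, then cost of therapy) can be sketched as a simple fall-through comparison. This is a minimal illustration under stated assumptions, not a method taken from Figure 3: the class, field names, example values, and the fixed tolerances used to call two arms “equivalent” are all hypothetical. In a real trial, equivalence at each level would be a statistical judgment rather than a fixed cut-off.

```python
from dataclasses import dataclass


@dataclass
class ArmSummary:
    name: str
    disease_free_survival: float  # hypothetical: proportion disease-free at a fixed time
    quality_of_life: float        # hypothetical score, higher is better
    cost: float                   # hypothetical cost of therapy, lower is better


# (endpoint, higher_is_better, tolerance within which arms count as equivalent)
# The ordering follows the text; the tolerances are illustrative assumptions only.
ENDPOINT_HIERARCHY = [
    ("disease_free_survival", True, 0.02),
    ("quality_of_life", True, 1.0),
    ("cost", False, 500.0),
]


def preferred_arm(a, b, hierarchy=ENDPOINT_HIERARCHY):
    """Return (arm name, deciding endpoint), or (None, None) if no level separates the arms."""
    for endpoint, higher_is_better, tolerance in hierarchy:
        difference = getattr(a, endpoint) - getattr(b, endpoint)
        if abs(difference) <= tolerance:
            continue  # equivalent at this level, so move to the next endpoint
        a_is_better = difference > 0 if higher_is_better else difference < 0
        return (a.name if a_is_better else b.name), endpoint
    return None, None


standard = ArmSummary("standard", disease_free_survival=0.60, quality_of_life=70.0, cost=20000.0)
new = ArmSummary("new", disease_free_survival=0.61, quality_of_life=78.0, cost=35000.0)
print(preferred_arm(standard, new))
```

In this made-up example the two arms are treated as equivalent on disease-free survival, so the decision falls to quality of life; cost is consulted only if that level is equivalent as well.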

Finally, the durability of validity of the conclusions drawn from RCTs (also called phase III trials) and meta-analyses has come under close scrutiny [5]. This evaluation found that the findings of phase II trials (levels three and four) are at least as durable as, if not more durable than, those of phase III trials and meta-analyses (levels one and two). If evidence-based medicine aims to improve the quality of care via guidelines that would result in concordance of decisions and uniformity of treatment, then evidence-based medicine cannot and will not succeed for at least one very good reason. There are far more clinical scenarios (cf. Don Quixote’s windmills) than evidence-based medicine could hope to address by RCTs, let alone meta-analyses. Researchers have neither the time nor the money to study all the scenarios. Moreover, when evidence-based medicine concludes that “there is no evidence to support the use of a specific treatment” or “a specific treatment has no value,” in many cases this is because there are no level one or two studies, and level three and four studies have been ignored or, worse, discounted as “weak” science. Consequently, these statements often lead to denial of insurance funding for treatment on the grounds that the treatment may not be “appropriate”. Let’s not reproduce Galileo’s scenario [6]. He did not have an RCT to prove that the planets revolve around the sun, not the earth, and it cost him dearly. The most important leaps forward in knowledge are often based on level three and four evidence from phase II studies.

We need a more mature attitude toward, and a better understanding of, evidence-based medicine, so that it may serve, not disserve, our patients. The clinical guidelines generated from evidence-based medicine will be of greatest utility only when they are put into proper perspective, because uniformity of treatment does not assure quality of care. Patients are individuals with unique clinical scenarios; their treatment plans need to be guided by all of the medical literature and individualized to their particular circumstances.


1. Network for Oncology Communication & Research and Oncology CaseLink, Triple I Oncology.

2. Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1989;95 (2 Suppl): 2S-4S.

3. Peters W, Rosner G, Vredenburgh J, et al. A prospective, randomized comparison of two doses of combination alkylating agents (AA) as consolidation after CAF in high-risk primary breast cancer involving ten or more axillary lymph nodes (LN): Preliminary results of CALGB 9082/SWOG 9114/NCIC MA-13. Proc ASCO 18: 1a, 1999 (abstract).

4. Elfenbein GJ. Clinical trials and survival curves: The shape of things to come. Acta Haematologica 2001;105:188-94.

5. Poynard T, Munteanu M, Ratziu V, et al. Truth survival in clinical research: an evidence-based requiem? Ann Intern Med 2002; 136: 888-95.

6. The New York Times. The 10 ‘most beautiful’ experiments of all time. Providence Sunday Journal, September 29, 2002.

Gerald J. Elfenbein, MD, is Director, Adele R. Decof Cancer Center; Director, Blood and Marrow Transplant Program, Roger Williams Medical Center, and Professor of Medicine, Boston University School of Medicine.


Gerald J. Elfenbein, MD

Adele R. Decof Cancer Center

Roger Williams Medical Center

825 Chalkstone Avenue

Providence, RI 02908

Phone: (401) 456 6565

Fax: (401) 456 6793


Copyright Rhode Island Medical Society Aug 2003
