Nov 17, 2011

Chapter 3. Decision-Making in Clinical Medicine

Decision-Making in Clinical Medicine: Introduction
To a medical student who requires 2 hours to collect a patient's history and perform a physical examination and several additional hours to organize that information into a coherent presentation, an experienced clinician's ability to decide on a diagnosis and management plan in a fraction of the time seems extraordinary. What separates the experienced clinician's performance from that of the novice is an elusive quality called "expertise." The first part of this chapter will provide a brief introduction to what is known about the development of expertise in clinical reasoning.
Equally bewildering to the student are the proper use of diagnostic tests and the integration of the results into the patient's clinical assessment. A novice medical practitioner typically uses a "shotgun" approach to testing, hoping to hit a target without knowing exactly what that target is. The expert, in contrast, usually has a specific target in mind and adjusts the testing strategy to it. The second part of the chapter will review briefly some of the crucial basic statistical concepts useful in the interpretation of diagnostic tests. Quantitative tools available to assist in clinical decision-making also will be discussed.
Evidence-based medicine (EBM) is the term used to describe the integration of the best available research evidence with clinical judgment and experience as applied to the care of individual patients. The third part of the chapter will provide a brief overview of some of the tools of EBM.
Brief Introduction to Clinical Reasoning
Clinical Expertise
It is surprisingly difficult to define clearly what is meant by "clinical expertise." Chess has its masters, music its virtuoso performers, and athletics its Olympians. But in medicine, once training is complete and the boards are passed, there are no further tests or benchmarks of performance or ability that can be used to identify those who have attained the highest level of abilities in their clinical roles. Of course, there are always a few clinicians who are believed by their colleagues to have special problem-solving abilities: the "elite" who are consulted when particularly difficult or obscure cases have baffled everyone else. But for all their expertise, such doctors typically cannot explain what processes and methods they use to achieve their impressive results. Furthermore, it is not clear that their diagnostic virtuosity can be generalized. In other words, an expert on hypertrophic cardiomyopathy may be no better (and possibly worse) than a first-year resident at diagnosing and managing a patient with neutropenia, fever, and hypotension.
Broadly construed, clinical expertise includes not only cognitive dimensions and the integration of verbal and visual cues or information but also complex motor skills that are required in the performance of various invasive and noninvasive procedures and tests. In addition, the ability to communicate effectively with patients and work effectively with members of the medical team could be included as important aspects of "the complete package" of expertise in medicine. In this chapter, however, the focus will be on the cognitive elements (clinical reasoning), particularly as they relate to diagnosis. This focus is driven by two factors. First, the most important "actions" in clinical medicine are not procedures or prescriptions but the judgments (both diagnoses and treatment choices) from which all other aspects of medical care flow. Second, although the research on medical expertise is relatively sparse overall, it is best developed in the area of diagnostic decision-making. Much less work has been done on treatment decisions or the technical skills involved in the performance of procedures.
The obvious difficulty involved in the study of clinical reasoning is that it takes place in the heads of doctors and is therefore not readily observable. Further, doctors may not even be aware of how they reason in many cases, and so they may be unable to describe the processes they use. To overcome this difficulty, one line of research has focused on how doctors should reason diagnostically rather than on how they actually do reason. In addition, because of the difficulties of empirical research in this area, much of what is known about clinical reasoning comes from empirical studies of nonmedical problem-solving behavior. The field has been influenced by important work from cognitive psychology, sociology, medical education, economics, informatics, and decision sciences. However, because of this diversity of perspectives, no single integrated model of clinical reasoning exists, and not infrequently, different terms and models are proposed for similar phenomena.
Intuitive versus Analytic Reasoning
One useful contemporary model of reasoning (dual-process theory) distinguishes two general systems of cognitive processes. Intuition (System 1) provides rapid effortless judgments from memorized associations—for example, African-American women and hilar adenopathy equals sarcoid—or from the reduction of complex data by means of pattern recognition and other heuristics. Typically, the clinician is unable to say how those judgments were formulated. Analysis (System 2), the other form of reasoning in the dual-process model, is slow, methodical, and effortful. These are, of course, idealized extremes of the cognitive continuum. The way these systems interact in different decision problems and the way they differ between experts and novices remain the subject of considerable debate. Much work has also been done to identify how each of these systems can lead to errors in judgment.
Pattern recognition is a complex cognitive process that appears largely intuitive. One can recognize people's faces, the breed of a dog, or the model of an automobile without necessarily being able to say what specific features prompted the recognition. Analogously, an experienced clinician often can recognize the pattern of a diagnosis she or he is very familiar with after a very short amount of time with the patient. The student, who does not have that stored repertoire of diagnostic patterns, must use a more laborious analytic approach along with much more intensive data collection to reach the diagnosis.
The following three brief scenarios of a patient with hemoptysis present three distinct patterns:
  • A 46-year-old man presents to his internist with a chief complaint of hemoptysis. He is otherwise healthy, is a nonsmoker, and is recovering from an apparent viral bronchitis. For this patient, the pattern would suggest that the acute bronchitis is responsible for the small amount of blood-streaked sputum the patient has observed. In this case, a chest x-ray may provide sufficient reassurance that a more serious disorder is not present.
  • In the second scenario, a 46-year-old patient with the same chief complaint who has a 100-pack-year smoking history, a productive morning cough, and episodes of blood-streaked sputum fits the pattern of carcinoma of the lung. Consequently, along with the chest x-ray, the physician obtains a sputum cytology examination and refers this patient for a chest CT scan.
  • In the third scenario, a 46-year-old patient with hemoptysis who is from a developing country is evaluated with an echocardiogram as well, because the physician thinks she hears a soft diastolic rumbling murmur at the apex on cardiac auscultation, suggesting rheumatic mitral stenosis.
The primary mistake that can result from relying on the free use of pattern recognition in diagnosis is premature closure: concluding that one already knows the correct diagnosis and therefore failing to complete the data collection that would demonstrate the lack of fit of the initial pattern selected. Consider the following hypothetical example. A 45-year-old male patient with a 3-week history of a "flulike" upper respiratory infection (URI) presented to his physician with symptoms of dyspnea and a productive cough. On the basis of the presenting complaint, the clinician pulled out a "URI assessment form" to obtain patient information that could be beneficial in improving the quality and efficiency of care. The physician quickly completed the examination components outlined on this structured form, noting in particular the absence of fever and a clear chest examination. He then prescribed an antibiotic for presumed bronchitis, showed the patient how to breathe into a paper bag to relieve his "hyperventilation," and sent him home with the reassurance that his illness was not serious. After a sleepless night with significant dyspnea unrelieved by breathing into a bag, the patient developed nausea and vomiting and collapsed. He was brought into the emergency department in cardiac arrest and could not be resuscitated. Autopsy showed a posterior wall myocardial infarction (MI) and a fresh thrombus in an atherosclerotic right coronary artery. What went wrong? The clinician decided, on the basis of the patient's appearance, even before starting the history, that the patient's complaints were not serious. He therefore felt confident that he could perform an abbreviated and focused examination by using the URI assessment protocol rather than considering the broader range of possibilities and performing appropriate tests to confirm or refute his initial hypotheses. In particular, by concentrating on the URI, the clinician failed to elicit the full dyspnea history, which would have suggested a far more serious disorder, and he neglected to search for other symptoms that could have directed him to the correct diagnosis.
Cognitive shortcuts or rules of thumb, sometimes referred to as heuristics, are another type of intuitive mental process that can be invoked to understand how experts solve complex problems of the sort encountered daily in clinical medicine with great efficiency. The original work on the use of heuristics in problem solving was done largely in laboratory experiments on psychology undergraduates. The objective of the research program was to test the statistical intuition of those subjects against the rules of statistics to understand how such intuitions could be biased. Hence, discussions of the use of heuristics in decision-making tend to focus more on ways in which their use can lead to errors in judgment than on their successes. Although there are many heuristics of possible relevance to clinical reasoning, only four will be mentioned here.
When assessing a particular patient, clinicians often weigh the probability that the patient's clinical features match those of the class of patients with the leading diagnostic hypotheses being considered. In other words, the clinician is searching for the diagnosis for which the patient appears to be a representative example; this cognitive shortcut is called the representativeness heuristic. This heuristic is analogous to pattern recognition. However, if there are two (or more) competing diagnoses that could explain the patient's symptoms, physicians who use the representativeness heuristic can reach erroneous conclusions if they fail to consider the underlying prevalence of the two competing diagnoses (i.e., the prior, or pretest, probabilities). Consider a patient with pleuritic chest pain, dyspnea, and a low-grade fever. A clinician might consider acute pneumonia and acute pulmonary embolism to be the two leading diagnostic alternatives. Using the representativeness heuristic, the clinician might judge both diagnostic candidates to be equally likely, although doing so would be wrong if pneumonia was much more prevalent in the underlying population. Mistakes also may result from a failure to consider that a pattern based on experience with a small number of prior cases probably will be less reliable than one based on greater experience.
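The consequence of ignoring prevalence can be made concrete with a small calculation. The sketch below (Python, with invented illustrative numbers rather than clinical estimates) applies Bayes' rule to the pneumonia-versus-pulmonary-embolism example: even when the findings fit both diagnoses equally well, the more prevalent condition remains far more likely.

```python
# Hypothetical illustration: two diagnoses fit the clinical picture equally
# well but differ tenfold in prevalence. By Bayes' rule, the posterior
# probabilities are proportional to prior probability x likelihood of the
# findings under each diagnosis.

def posterior_probs(prior_pneumonia, prior_pe, fit_pneumonia, fit_pe):
    """Return P(pneumonia | findings) and P(PE | findings), assuming these
    are the only two diagnoses under consideration."""
    joint_pneumonia = prior_pneumonia * fit_pneumonia
    joint_pe = prior_pe * fit_pe
    total = joint_pneumonia + joint_pe
    return joint_pneumonia / total, joint_pe / total

# The findings "fit" both diagnoses equally well (likelihood 0.8 each),
# but pneumonia is assumed ten times more prevalent in this population.
p_pna, p_pe = posterior_probs(prior_pneumonia=0.10, prior_pe=0.01,
                              fit_pneumonia=0.8, fit_pe=0.8)
print(round(p_pna, 3), round(p_pe, 3))  # → 0.909 0.091
```

Because the likelihoods are equal, the posterior ratio simply reproduces the 10:1 prior ratio; the representativeness heuristic, which attends only to the "fit," discards exactly this information.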
A second commonly used cognitive shortcut, the availability heuristic, involves judgments made on the basis of how easily prior similar cases or outcomes can be brought to mind. For example, an experienced clinician may recall 20 elderly patients seen over the last few years who presented with painless dyspnea of acute onset and were found to have acute MI. A novice clinician may spend valuable time seeking a pulmonary cause for the symptoms before considering and then confirming the cardiac diagnosis. In this situation, the patient's clinical pattern does not fit the expected pattern of acute MI, but experience with this atypical presentation, along with the ability to recall it, can help direct the physician to the diagnosis.
Errors with the availability heuristic can come from several sources of recall bias. For example, rare catastrophes are likely to be remembered with a clarity and force out of proportion to their value—for example, a patient with a sore throat eventually found to have leukemia or a young athlete with leg pain eventually found to have a sarcoma—and recent experience is, of course, easier to recall and therefore more influential on clinical judgments.
The third commonly used cognitive shortcut, the anchoring heuristic, involves estimating a probability by starting from a familiar point (the anchor) and adjusting to the new case from that perspective. Anchoring can be a powerful tool for diagnosis but may be used incorrectly. For example, a clinician may judge the probability of coronary artery disease (CAD) to be very high after a positive exercise thallium test because the prediction has been anchored to the test result ("positive test = high probability of CAD"). Yet, as discussed below, this prediction would be inaccurate if the clinical (pretest) picture of the patient being tested indicated a low probability of disease (e.g., a 30-year-old woman with no risk factors). As illustrated in this example, anchors are not necessarily the same as the pretest probability (see "Measures of Disease Probability and Bayes' Theorem," below).
The fourth heuristic, which might be termed the simplicity heuristic, states that clinicians should use the simplest explanation possible that will account adequately for the patient's symptoms or findings (Occam's razor). Although this is an attractive and often useful principle, it is important to remember that there is no biologic basis for it.
Experienced physicians use analytic reasoning processes (System 2) much more often when the problem they face is recognized to be complex or to involve important unfamiliar elements or features. In such situations, the clinician proceeds much more methodically in what has been referred to as the hypothetico-deductive model of reasoning. From the outset, the expert clinician is generating, refining, and discarding diagnostic hypotheses. The questions she asks in the history are driven by the hypotheses she is working with at the moment. Even the physical examination is focused on specific questions. Is the spleen enlarged? How big is the liver? Is it tender? Does it have any palpable masses or nodules? Each question focuses the attention of the examiner on the exclusion of all other inputs until it is answered, allowing the examiner to move on to the next specific question. Each diagnostic hypothesis sets a context for the diagnostic steps to follow and provides testable predictions. For example, if the enlarged and quite tender liver felt on physical examination is due to acute hepatitis (the hypothesis), certain specific liver function tests should be markedly elevated (the prediction). If the tests come back normal, the hypothesis may have to be discarded or substantially modified.
Negative findings often are neglected but are as important as positive ones in establishing and refining diagnostic hypotheses. Chest discomfort that is not provoked or worsened by exertion in an active patient reduces the likelihood that chronic ischemic heart disease is the underlying cause. The absence of a resting tachycardia and thyroid gland enlargement reduces the likelihood of hyperthyroidism in a patient with paroxysmal atrial fibrillation.
The acuity of a patient's illness can play an important role in overriding considerations of prevalence and the other issues described above. For example, clinicians are taught to consider aortic dissection routinely as a possible cause of acute severe chest discomfort along with MI, even though the typical history of dissection is different from that of MI and dissection is far less prevalent (Chap. 248). This recommendation is based on the recognition that a relatively rare but catastrophic diagnosis such as aortic dissection is very difficult to make unless it is explicitly considered as a diagnostic imperative. If the clinician fails to elicit any of the characteristic features of dissection by history and finds equivalent blood pressures in both arms and no pulse deficits, he may feel comfortable discarding the aortic dissection hypothesis. If, however, the chest x-ray shows a possible widened mediastinum, the hypothesis may be reinstated and a diagnostic test ordered [e.g., thoracic computed tomography (CT) scan, transesophageal echocardiogram] to evaluate it more fully. In nonacute situations, the prevalence of potential alternative diagnoses should play a much more prominent role in diagnostic hypothesis generation.
Cognitive scientists studying the thought processes of expert clinicians have observed that clinicians group data into packets, or "chunks," that are stored in their memories and manipulated to generate diagnostic hypotheses. Because short-term memory typically can hold only 7–10 items at a time, the number of packets that can be actively integrated into hypothesis-generating activities is similarly limited. For this reason, the cognitive shortcuts discussed above can play a key role in the generation of diagnostic hypotheses, many of which are discarded as rapidly as they are formed (thus demonstrating that the distinction between analytic and intuitive reasoning is arbitrary and simplistic but useful nonetheless).
Research into the hypothetico-deductive model of reasoning has had surprising difficulty identifying the elements that distinguish experts from novices. This has led to a shift from examining the problem-solving process of experts to analyzing the organization of their knowledge. For example, diagnosis may be based on the resemblance of a new case to prior individual instances (exemplars). Experts have a larger store of recalled cases, for example, visual memory in radiology. In the more abstract prototype model of knowledge, clinicians do not simply rely on specific cases but have constructed elaborate conceptual networks or models of disease to arrive at their conclusions. That is, expertise involves an increased ability to connect symptoms, signs, and risk factors to one another; relate those findings to possible diagnoses; and identify the additional information necessary to confirm the diagnosis. More recently, fuzzy trace theory has placed greater emphasis on intuition: in this view, expertise involves the ability to distill the "gist" or essence of a diagnosis by processing less information and discarding extraneous data, with memory organized around meaning rather than verbatim detail.
Although no single theory has emerged to account for the key features of expertise in medical diagnosis, experts have more knowledge about more things and a larger repertoire of cognitive tools to employ in problem solving than do novices. One definition of expertise highlights the ability to make powerful distinctions. Memorization alone is insufficient; instead, expertise involves a working knowledge of the diagnostic possibilities and what features distinguish one from another. What remains less clear is whether there is any didactic program that would allow the accelerated development of a novice into an expert or ensure the same high level of expertise among more experienced physicians. Some current recommendations include using a combined approach to clinical reasoning, that is, emphasizing to students the importance of both conscious deliberative analytic and intuitive pattern recognition nonanalytic reasoning strategies and thus giving students flexibility in applying any particular reasoning strategy to overcome case-specific weaknesses.
Important Modifiers of Clinical Decision-Making
More than a decade of research on variations in clinician practice patterns has shed much light on the forces that shape clinical decisions. These factors can be grouped conceptually into three overlapping categories: (1) factors related to physicians' personal characteristics and practice style, (2) factors related to the practice setting, and (3) factors related to economic incentives.
Factors Related to Practice Style
One of the key roles of the physician in medical care is to serve as the patient's agent to ensure that necessary care is provided at a high level of quality. Factors that influence this role include the physician's knowledge, training, and experience. It is obvious that physicians cannot practice evidence-based medicine (described later in the chapter) if they are unfamiliar with the evidence. As would be expected, specialists generally know the evidence in their field better than do generalists. Surgeons may be more enthusiastic about recommending surgery than are medical doctors because their belief in the beneficial effects of surgery is stronger. For the same reason, invasive cardiologists are much more likely to refer chest pain patients for diagnostic catheterization than are noninvasive cardiologists or generalists. The physician beliefs that drive these different practice styles are based on personal experience, recollection, and interpretation of the available medical evidence. For example, heart failure specialists are much more likely than generalists to achieve target angiotensin-converting enzyme (ACE) inhibitor therapy in their heart failure patients because they are more familiar with what the targets are (as defined by large clinical trials), have more familiarity with the specific drugs (including doses and side effects), and are less likely to overreact to foreseeable problems in therapy such as a rise in creatinine levels or symptomatic hypotension. Other intriguing research has shown a wide distribution of acceptance times of antibiotic therapy for peptic ulcer disease after widespread dissemination of the "evidence" in the medical literature. Some gastroenterologists accepted this new therapy before the evidence was clear (reflecting, perhaps, an aggressive practice style), and some lagged behind (a conservative practice style, associated in this case with older physicians). As a group, internists lagged several years behind gastroenterologists.
An example of the mixed effects on patient outcomes associated with rapid acceptance of new evidence involves the case of adding spironolactone (an aldosterone receptor antagonist) to the drug regimen for patients with systolic heart failure. In a large, well-done clinical trial (Randomized Aldactone Evaluation Study, RALES) published in 1999, this therapy produced a significant reduction in all-cause mortality rates. Over the next 2 years the use of spironolactone increased fivefold in the province of Ontario, Canada. That rapid uptake was associated with a significant increase both in the rate of hospital admission for hyperkalemia and in hyperkalemia-associated deaths. At least some of these adverse effects of using this "evidence-based medicine" appeared to be related to treatment of patients who would not have been eligible for the RALES trial and who had contraindications to the use of the drug.
The opinion of influential leaders also can have an important effect on practice patterns. That influence can occur at both the national level (e.g., expert physicians teaching at national meetings) and the local level (e.g., local educational programs, "curbside consultations"). Opinion leaders do not have to be physicians. When conducting rounds with clinical pharmacists, physicians are less likely to make medication errors and more likely to use target levels of evidence-based therapies.
The patient's welfare is not the only concern that drives clinical decisions. The physician's perception about the risk of a malpractice suit resulting from either an erroneous decision or a bad outcome creates a style of practice referred to as defensive medicine. This practice involves using tests and therapies with very small marginal returns to preclude future criticism if there is an adverse outcome. For example, a 40-year-old woman who presents with a long-standing history of intermittent headache and a new severe headache along with a normal neurologic examination has a very low likelihood of having structural intracranial pathology. Performance of a head CT or magnetic resonance imaging (MRI) scan in this situation would constitute defensive medicine. However, the results of the test could provide reassurance to an anxious patient.
Practice Setting Factors
Factors in this category relate to the physical resources available to the physician's practice and the practice environment. Physician-induced demand is a term that refers to the repeated observation that physicians have a remarkable ability to accommodate to and employ the medical facilities available to them. One of the foundational studies in outcomes research showed that physicians in Boston, where the ratio of hospital beds to patients was higher, had an almost 50% higher hospital admission rate than did physicians in New Haven, despite there being no obvious differences in the resulting health or mortality rate of the cities' inhabitants. The physicians in New Haven were not aware of using fewer hospital beds for their patients, nor were the Boston physicians aware of using less stringent criteria to admit patients. In both cities, physicians unconsciously adapted their practice styles to the available level of hospital beds.
Other environmental factors that can influence decision-making include the local availability of specialists for consultations and procedures; "high-tech" facilities such as angiography suites, a heart surgery program, and MRI machines; and fragmentation of care.
Economic Incentives
Economic incentives are closely related to the other two categories of practice-modifying factors. Financial issues can exert both stimulatory and inhibitory influences on clinical practice. In general, physicians are paid on a fee-for-service, capitation, or salary basis. In fee-for-service, the more the physician does, the more he gets paid. The economic incentive in this case is to do more. When fees are reduced (discounted fee-for-service), doctors tend to increase the number of services provided. Capitation, in contrast, provides a fixed payment per patient per year, encouraging physicians to take on more patients but to provide each patient with fewer services. Expensive services are more likely to be affected by this type of incentive than are inexpensive preventive services. Salary compensation plans pay physicians the same regardless of the amount of clinical work performed. The incentive here is to see fewer patients.
In summary, expert clinical decision-making can be appreciated as a complex interplay between cognitive processes used to simplify and organize large amounts of complex information and physician biases reflecting education, training, and experience, all of which are shaped by powerful, sometimes perverse, external forces. In the next section, we will review a set of statistical tools and concepts that can be useful in making clinical decisions in the presence of uncertainty.
Interpretation of Diagnostic Tests in the Context of Decision-Making
Despite the great technological advances in medicine over the last century, uncertainty remains a key challenge in all aspects of medical decision-making. Compounding this challenge is the massive information overload that characterizes modern medicine. Today's experienced clinician needs access to close to 2 million pieces of information to practice medicine. According to one estimate, doctors subscribe to an average of seven journals, representing over 2500 new articles each year. Of course, to be useful, this information must be integrated with the specific data collected on each patient being cared for. Although computers appear to offer the obvious solution both for management of information and for better quantitation and management of the daily uncertainties of medical care, many practical problems must be solved before computers can be integrated into the clinician's reasoning process in a way that demonstrably improves the quality of care.
Although a fully integrated computer-based system of diagnosis and management remains a distant possibility, there are tools available now that can assist in aspects of patient management. In addition, understanding the nature of diagnostic test information can help make a clinician a more efficient user of such data. This section of the chapter will review some important concepts related to diagnostic testing.
Diagnostic Testing: Measures of Test Accuracy
The purpose of performing a test on a patient is to reduce uncertainty about the patient's diagnosis or prognosis and to aid the clinician in making management decisions. Although diagnostic tests commonly are thought of as laboratory tests (e.g., measurement of serum amylase level) or procedures (e.g., colonoscopy or bronchoscopy), any technology that changes a physician's understanding of the patient's problem qualifies as a diagnostic test. Thus, even the history and physical examination can be considered a form of diagnostic test. In clinical medicine, it is common to reduce the results of a test to a dichotomous outcome, such as positive or negative, normal or abnormal. In many cases, this simplification results in the waste of useful information. However, such simplification makes it easier to demonstrate some of the quantitative ways in which test results data can be used.
The accuracy of diagnostic tests is defined in relation to an accepted "gold standard," which is presumed to reflect the true state of the patient (Table 3-1). To define the diagnostic performance of a new test, an appropriate population must be identified (ideally, patients on whom the new test would be used), and both the new and the gold standard tests are applied to all subjects (use of an inappropriate population or incomplete application of the gold standard test may lead to biased estimates of test performance). The results of the two tests are then compared. The sensitivity, or true-positive, rate of the new test is the proportion of patients with disease (defined by the gold standard) who have a positive (new) test. This measure reflects how well the test identifies patients with disease. The proportion of patients with disease who have a negative test is the false-negative rate and is calculated as 1 – sensitivity. The proportion of patients without disease who have a negative test is the specificity, or true-negative, rate. This measure reflects how well the test correctly identifies patients without disease. The proportion of patients without disease who have a positive test is the false-positive rate, calculated as 1 – specificity. A perfect test would have a sensitivity of 100% and a specificity of 100% and would completely separate patients with disease from those without it.
Table 3-1 Measures of Diagnostic Test Accuracy

                     Disease Status
Test Result          Present                 Absent
Positive             True-positive (TP)      False-positive (FP)
Negative             False-negative (FN)     True-negative (TN)
Identification of patients with disease 
True-positive rate (sensitivity) = TP/(TP + FN)
False-negative rate = FN/(TP + FN)
True-positive rate = 1 – false-negative rate
Identification of patients without disease 
True-negative rate (specificity) = TN/(TN + FP)
False-positive rate = FP/(TN + FP)
True-negative rate = 1 – false-positive rate

Calculating sensitivity and specificity requires selection of a threshold value or cut point at or above which the test is considered "positive." For any specific test, as this cut point is moved to improve sensitivity, specificity falls and vice versa. This dynamic trade-off between more accurate identification of subjects with disease versus those without disease is often displayed graphically as a receiver operating characteristic (ROC) curve (Fig. 3-1). An ROC curve plots sensitivity (y axis) versus 1 – specificity (x axis). Each point on the curve represents a potential cut point with an associated sensitivity and specificity value. The area under the ROC curve often is used as a quantitative measure of the information content of a test. Values range from 0.5 (no diagnostic information from testing at all; the test is equivalent to flipping a coin) to 1.0 (perfect test).
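The construction of an ROC curve can be sketched in a few lines, assuming a continuous test score and a "positive means score at or above the cut point" rule. The scores below are invented for illustration and happen to separate the two groups perfectly, so the area comes out to 1.0:

```python
# Sketch of an ROC curve: sweep the cut point over candidate thresholds,
# compute (1 - specificity, sensitivity) at each, and estimate the area
# under the curve by the trapezoidal rule. All scores are hypothetical.

def roc_points(diseased, healthy, thresholds):
    """Return (FPR, TPR) pairs, one per cut point ('positive' = score >= t)."""
    pts = []
    for t in thresholds:
        tpr = sum(s >= t for s in diseased) / len(diseased)   # sensitivity
        fpr = sum(s >= t for s in healthy) / len(healthy)     # 1 - specificity
        pts.append((fpr, tpr))
    return sorted(pts)

def auc(points):
    """Trapezoidal area under the ROC curve, with (0,0) and (1,1) included."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

diseased = [0.9, 0.8, 0.7, 0.6]   # test scores in patients with disease
healthy  = [0.5, 0.4, 0.3, 0.2]   # test scores in patients without disease
pts = roc_points(diseased, healthy, thresholds=[0.2, 0.4, 0.6, 0.8])
print(auc(pts))  # 1.0 -- these scores separate the groups perfectly
```

Overlapping score distributions, as in any real test, would pull the area down toward 0.5.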
In the testing literature, ROC areas often are used to compare alternative tests that can be employed for a particular diagnostic problem. The test with the highest area (i.e., closest to 1.0) is presumed to be the most accurate. However, ROC curves are not a panacea for evaluation of diagnostic test utility. Like Bayes' theorem (discussed below), they typically are focused on only one possible test parameter (e.g., ST-segment response in a treadmill exercise test) to the exclusion of other potentially relevant data. In addition, ROC area comparisons do not simulate the way test information actually is used in clinical practice. Finally, biases in the underlying population used to generate the ROC curves (e.g., related to assessment of the test in individuals unrepresentative of those in whom the test will be used clinically) can bias the ROC area and the validity of a comparison among tests.
Measures of Disease Probability and Bayes' Theorem
Unfortunately, there are no perfect tests. After every test is completed, the true disease state of the patient remains uncertain. Quantifying this residual uncertainty can be done with Bayes' theorem. This theorem provides a simple mathematical way to calculate the posttest probability of disease from three parameters: the pretest probability of disease, the test sensitivity, and the test specificity (Table 3-2). The pretest probability is a quantitative expression of the confidence in a diagnosis before the test is performed. In the absence of more relevant information, it is usually estimated from the prevalence of the disease in the underlying population. For some common conditions, such as CAD, nomograms and statistical models have been created to generate better estimates of pretest probability from elements of the history and physical examination. The posttest probability, then, is a revised statement of the confidence in the diagnosis, taking into account what was known both before and after the test.
Table 3-2 Measures of Disease Probability

Pretest probability of disease = probability of disease before test is performed. May use population prevalence of disease or more patient-specific data to generate this probability estimate.
Posttest probability of disease = probability of disease accounting for both pretest probability and test results. Also called predictive value of the test.
Bayes' theorem computational version:
Posttest probability = (pretest probability × sensitivity)/[(pretest probability × sensitivity) + (1 – pretest probability) × (1 – specificity)]
Bayes' theorem example: With a pretest probability of 0.50 and a "positive" diagnostic test result (test sensitivity = 0.90, test specificity = 0.90):
Posttest probability = (0.50 × 0.90)/[(0.50 × 0.90) + (0.50 × 0.10)] = 0.45/0.50 = 0.90

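The worked example in Table 3-2 (pretest probability 0.50, sensitivity 0.90, specificity 0.90) is simple enough to express directly. A sketch of the computational version for a positive test result:

```python
# Computational version of Bayes' theorem from Table 3-2: posttest
# probability of disease given a positive test result.

def posttest_probability(pretest: float, sens: float, spec: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = pretest * sens                  # P(disease) * P(+|disease)
    false_pos = (1 - pretest) * (1 - spec)     # P(no disease) * P(+|no disease)
    return true_pos / (true_pos + false_pos)

# The example from Table 3-2: pretest 0.50, sensitivity 0.90, specificity 0.90.
print(round(posttest_probability(0.50, 0.90, 0.90), 2))  # 0.9
```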
The term predictive value often is used as a synonym for the posttest probability. Unfortunately, clinicians commonly misinterpret reported predictive values as intrinsic measures of test accuracy. Studies of diagnostic tests compound the confusion by calculating predictive values on the same sample used to measure sensitivity and specificity. Since all posttest probabilities are a function of the prevalence of disease in the tested population, such calculations may be misleading unless the test is applied subsequently to populations with the same disease prevalence. For these reasons, the term predictive value is best avoided in favor of the more informative posttest probability.
To understand conceptually how Bayes' theorem estimates the posttest probability of disease, it is useful to examine a nomogram version of Bayes' theorem (Fig. 3-2). In this nomogram, the accuracy of the diagnostic test in question is summarized by the likelihood ratio, which is defined as the ratio of the probability of a given test result (e.g., "positive" or "negative") in a patient with disease to the probability of that result in a patient without disease.
For a positive test, the likelihood ratio positive is calculated as the ratio of the true-positive rate to the false-positive rate [or sensitivity/(1 – specificity)]. For example, a test with a sensitivity of 0.90 and a specificity of 0.90 has a likelihood ratio of 0.90/(1 – 0.90), or 9. Thus, for this hypothetical test, a "positive" result is 9 times more likely in a patient with the disease than in a patient without it. Most tests in medicine have likelihood ratios for a positive result between 1.5 and 20. Higher values are associated with tests that are more accurate at identifying patients with disease, with values of 10 or greater being of particular note. If sensitivity is excellent but specificity is less so, the likelihood ratio will be reduced substantially (e.g., with a 90% sensitivity but a 60% specificity, the likelihood ratio is 2.25).
For a negative test, the corresponding likelihood ratio negative is the ratio of the false-negative rate to the true-negative rate [or (1 – sensitivity)/specificity]. The smaller the likelihood ratio (i.e., the closer to 0) is, the better the test performs at ruling out disease. The hypothetical test considered above with a sensitivity of 0.9 and a specificity of 0.9 would have a likelihood ratio for a negative test result of (1 – 0.9)/0.9, or 0.11, meaning that a negative result is almost 10 times more likely if the patient is disease-free than if the patient has disease.
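The nomogram's operation can be mimicked by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back to a probability. A sketch using the hypothetical test above (sensitivity 0.90, specificity 0.90):

```python
# Odds form of Bayes' theorem, as embodied in the Fig. 3-2 nomogram:
# posttest odds = pretest odds x likelihood ratio.

def likelihood_ratios(sens: float, spec: float) -> tuple:
    lr_pos = sens / (1 - spec)        # likelihood ratio for a positive test
    lr_neg = (1 - sens) / spec        # likelihood ratio for a negative test
    return lr_pos, lr_neg

def update_probability(pretest: float, lr: float) -> float:
    odds = pretest / (1 - pretest)        # probability -> odds
    post_odds = odds * lr                 # apply the likelihood ratio
    return post_odds / (1 + post_odds)    # odds -> probability

lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
print(round(lr_pos, 1))   # 9.0
print(round(lr_neg, 2))   # 0.11
print(round(update_probability(0.50, lr_pos), 2))  # 0.9
```

The same `update_probability` call with `lr_neg` would show how far a negative result pushes the probability down.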
Applications to Diagnostic Testing in CAD
Consider two tests commonly used in the diagnosis of CAD: an exercise treadmill and an exercise single-photon emission CT (SPECT) myocardial perfusion imaging test (Chap. 229). Meta-analysis has shown that a positive treadmill ST-segment response has an average sensitivity of 66% and an average specificity of 84%, yielding a likelihood ratio of 4.1 [0.66/(1 – 0.84)]. If this test is used on a patient with a pretest probability of CAD of 10%, the posttest probability of disease after a positive result rises to only about 30%. If a patient with a pretest probability of CAD of 80% has a positive test result, the posttest probability of disease is about 95%.
The exercise SPECT myocardial perfusion test is a more accurate test for the diagnosis of CAD. For our purposes, assume that the finding of a reversible exercise-induced perfusion defect has both a sensitivity and a specificity of 90%, yielding a likelihood ratio for a positive test of 9.0 [0.90/(1 – 0.90)]. If we again test the low pretest probability patient and that patient has a positive test, by using Fig. 3-2 it can be demonstrated that the posttest probability of CAD rises from 10 to 50%. However, from a decision-making point of view, the more accurate test may not improve diagnostic confidence enough to change management. In fact, the test has moved the physician from being fairly certain that the patient did not have CAD to being completely undecided (a 50:50 chance of disease). In a patient with a pretest probability of 80%, using the more accurate exercise SPECT test raises the posttest probability to 97% (compared with 95% for the exercise treadmill). Again, the more accurate test does not provide enough improvement in posttest confidence to alter management, and neither test has improved much on what was known from clinical data alone.
Although it depends on the sensitivity and specificity, in general, if the pretest probability is low (e.g., 20%), even a positive result on a very accurate test will not move the posttest probability to a range high enough to rule in disease (e.g., 80%). Pretest probabilities are often particularly low in screening situations in which patients are asymptomatic. In such cases, specificity becomes particularly important. For example, in screening first-time female blood donors without risk factors for HIV, a positive test raised the likelihood of HIV to only 67% despite a specificity of 99.995% because the prevalence was 0.01%. One useful mnemonic is positive SpPin: a positive test with high specificity rules in disease (keeping in mind the caveats just noted about pretest probability). Conversely, with a high pretest probability, a negative test may not rule out disease adequately if it is not sufficiently sensitive. The other mnemonic is negative SnNout: a negative test with high sensitivity rules out disease. Thus, the largest gain in diagnostic confidence from a test occurs when the clinician is most uncertain before performing it (e.g., pretest probability between 30% and 70%). For example, if a patient has a pretest probability for CAD of 50%, a positive exercise treadmill test will move the posttest probability to 80% and a positive exercise SPECT perfusion test will move it to 90% (Fig. 3-2).
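The HIV screening figure quoted above follows directly from Bayes' theorem. The sketch below assumes, for illustration only, a sensitivity near 100% (the text does not state the sensitivity); with that assumption, the quoted 67% posttest probability is reproduced:

```python
# Why specificity dominates in screening: posttest probability after a
# positive screen in a very-low-prevalence population. Sensitivity here
# is an assumption (~100%), not a figure given in the text.

def posttest_prob_after_positive(prevalence, sens, spec):
    tp = prevalence * sens               # true positives per person screened
    fp = (1 - prevalence) * (1 - spec)   # false positives per person screened
    return tp / (tp + fp)

# First-time female blood donors: prevalence 0.01%, specificity 99.995%.
p = posttest_prob_after_positive(prevalence=0.0001, sens=1.0, spec=0.99995)
print(round(p, 2))  # 0.67
```

Even a minuscule false-positive rate swamps the true positives when the disease is this rare.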
Bayes' theorem, as presented above, employs a number of important simplifications that should be considered. First, few tests have only two useful outcomes, positive and negative, and many tests provide numerous pieces of data about the patient. Even if these data can be integrated into a summary result, multiple levels of useful information may be present (e.g., strongly positive, positive, indeterminate, negative, strongly negative). Although Bayes' theorem can be adapted to this more detailed test result format, it is computationally complex to do so. Similarly, when multiple tests are performed, the posttest probability may be used as the pretest probability to interpret the second test. However, this simplification assumes conditional independence—that is, that the results of the first test do not affect the likelihood of the second test result—and this is often not true.
Finally, it has long been asserted that sensitivity and specificity are prevalence-independent parameters of test accuracy, and many texts still make this statement. This statistically useful assumption, however, is clinically simplistic. A treadmill exercise test, for example, has a sensitivity in a population of patients with one-vessel CAD of around 30%, whereas its sensitivity in patients with severe three-vessel CAD approaches 80%. Thus, the best estimate of sensitivity to use in a particular decision often varies, depending on the distribution of disease stages present in the tested population. A hospitalized, symptomatic, or referral population typically has a higher prevalence of disease and, in particular, a higher prevalence of more advanced disease than does an outpatient population. As a consequence, test sensitivity will tend to be higher in hospitalized patients, whereas test specificity will be higher in outpatients.
Statistical Prediction Models
Bayes' theorem, as presented above, deals with a clinical prediction problem that is unrealistically simple relative to most problems a clinician faces. Prediction models that are based on multivariable statistical models can handle much more complex problems and substantially enhance predictive accuracy for specific situations. Their particular advantage is the ability to take into account many overlapping pieces of information and assign a relative weight to each on the basis of its unique contribution to the prediction in question. For example, a logistic regression model to predict the probability of CAD considers all the relevant independent factors from the clinical examination and diagnostic testing and their significance instead of the small handful of data that clinicians can manage in their heads or with Bayes' theorem. However, despite this strength, the models are too complex computationally to use without a calculator or computer (although this limitation may be overcome once medicine is practiced from a fully computerized platform).
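As an illustration of the mechanics only, a logistic model combines weighted predictors into a single probability as shown below. The intercept, coefficients, and predictor names are entirely hypothetical and are not taken from any published CAD model:

```python
# Sketch of how a multivariable logistic regression model turns several
# weighted predictors into one probability. All numbers are hypothetical.
import math

def logistic_probability(intercept: float, coefs: dict, patient: dict) -> float:
    """P(disease) = 1 / (1 + exp(-(intercept + sum of coefficient * value)))."""
    score = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1 / (1 + math.exp(-score))

# Hypothetical coefficients (log-odds weights) and one hypothetical patient.
coefs = {"age_per_decade": 0.5, "male": 0.8, "typical_angina": 1.6}
patient = {"age_per_decade": 6.0, "male": 1, "typical_angina": 1}
p = logistic_probability(intercept=-4.0, coefs=coefs, patient=patient)
print(round(p, 2))  # 0.8
```

Each coefficient is the weight assigned to that predictor's unique contribution, which is exactly what the model-fitting process estimates from data.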
To date, only a handful of prediction models have been validated properly. The importance of independent validation in a population separate from the one used to develop the model cannot be overstated. An unvalidated prediction model should be viewed with the skepticism appropriate for a new drug or medical device that has not been through rigorous clinical trial testing.
When statistical models have been compared directly with expert clinicians, they have been found to be more consistent, as would be expected, but not significantly more accurate. Their biggest promise, then, would seem to be to help less-experienced clinicians become more accurate in their predictions.
Formal Decision Support Tools
Decision Support Systems
Over the last 40 years, many attempts have been made to develop computer systems to aid clinical decision-making and patient management. Conceptually, computers offer a very attractive way to handle the vast information load that today's physicians face. The computer can help by making accurate predictions of outcome, simulating the whole decision process, or providing algorithmic guidance. Computer-based predictions using Bayesian or statistical regression models inform a clinical decision but do not actually reach a "conclusion" or "recommendation." Artificial intelligence systems attempt to simulate or replace human reasoning with a computer-based analogue. To date, such approaches have achieved only limited success. Reminder or protocol-directed systems do not make predictions but use existing algorithms, such as guidelines, to guide clinical practice. In general, however, decision support systems have had little impact on practice. Reminder systems, although not yet in widespread use, have shown the most promise, particularly in correcting drug dosing and promoting adherence to guidelines. The full impact of these approaches will be evaluable only when computers are fully integrated into medical practice.
Decision Analysis
Compared with the methods discussed above, decision analysis represents a completely different approach to decision support. Its principal application is in decision problems that are complex and involve a substantial risk, a high degree of uncertainty in some key area, or an idiosyncratic feature that does not "fit" the available evidence. An example decision tree created to evaluate strategies for screening for HIV infection is shown in Fig. 3-3. Infected individuals who are unaware of their illness may cause up to 20,000 new cases of HIV infection annually in the United States. In addition, because of delayed diagnosis, about 40% of HIV-positive patients progress to AIDS within a year of the initial diagnosis. Early identification offers the opportunity both to prevent progression to AIDS through the use of serial CD4 counts and measurements of viral load linked to selective use of combination antiretroviral therapy and to encourage reduction of risky sexual behavior.
The Centers for Disease Control and Prevention (CDC) proposed in 2003 that routine HIV testing should be a part of standard medical care. In a decision-model exploration of this proposed strategy compared with usual care, assuming a 1% prevalence of unidentified HIV infection in the population, routine screening of a cohort of 43-year-old men and women increased life expectancy by 5.5 days and cost $194 per subject screened. The cost-effectiveness ratio for screening relative to usual care was $15,078 per quality-adjusted life year (the additional cost to society to increase population health by 1 year of perfect health). Results were sensitive to assumptions about the effectiveness of behavior modification on subsequent sexual behavior, the benefits of early therapy for HIV infection, and the prevalence and incidence of HIV infection in the population targeted. This model, which required over 75 separate data points, provides novel insights into a clinical management problem that has not been subjected to a randomized clinical trial.
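The arithmetic behind a cost-effectiveness ratio is straightforward even though the underlying decision model is not: it is the incremental cost divided by the incremental health benefit. A sketch with purely hypothetical costs and quality-adjusted life expectancies (not the values from the screening study above):

```python
# Incremental cost-effectiveness ratio (ICER) in dollars per
# quality-adjusted life year (QALY). All inputs are hypothetical.

def icer(cost_new: float, cost_old: float,
         qaly_new: float, qaly_old: float) -> float:
    """Additional dollars spent per additional QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical: new strategy costs $200 more and yields 0.25 more QALYs.
print(icer(cost_new=700.0, cost_old=500.0, qaly_new=1.25, qaly_old=1.00))  # 800.0
```

A full decision model produces the cost and QALY inputs; the ratio itself is this simple division.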
Although such models have been developed and used to estimate short- and long-term survival for alternative choices, the process of building and evaluating decision models is generally too complex for use in real-time clinical management. The potential for this tool therefore lies in the development of a set of published or online models addressing a particular decision or policy area that can serve to highlight key pressure points in the problem.
Evidence-Based Medicine
The "art of medicine" is defined traditionally as a practice combining medical knowledge (including scientific evidence), intuition, and judgment in the care of patients (Chap. 1). EBM updates this construct by placing much greater emphasis on the processes by which clinicians gain knowledge of the most up-to-date and relevant clinical research to determine for themselves whether medical interventions alter the disease course and improve the length or quality of life. The meaning of practicing EBM becomes clearer through an examination of its four key steps:

  1. Formulating the management question to be answered

  2. Searching the literature and online databases for applicable research data

  3. Appraising the evidence gathered with regard to its validity and relevance

  4. Integrating this appraisal with knowledge about the unique aspects of the patient (including the patient's preferences about the possible outcomes)
Step 1 involves generating well-formulated questions built from four or five components, summarized by the acronym PICOD: patient or population, intervention, comparator, outcome, and, sometimes, D for study design (e.g., does routine percutaneous coronary intervention improve survival compared with initial medical management in 60-year-old men with stable angina and known CAD?). Steps 2 and 3 are the heart of EBM as it is currently used in practice and relate to the underlying fundamental principle that the strength of medical evidence supporting a therapy or strategy is hierarchical. The process of searching the world's research literature and appraising the quality and relevance of studies thus identified can be quite time-consuming and requires skills and training that most clinicians do not possess. Thus, the best starting point for most EBM searches is the identification of recent systematic overviews of the problem in question (Table 3-3).
Table 3-3 Selected Tools for Finding the Evidence in Evidence-Based Medicine

Name Description Web Address Availability
Evidence-Based Medicine Reviews Comprehensive electronic database that combines and integrates:
1. The Cochrane Database of Systematic Reviews
2. ACP Journal Club
3. The Database of Abstracts of Reviews of Effectiveness
www.ovid.com Subscription required. Available through medical center libraries and other institutions.
Cochrane Library Collection of EBM databases, including The Cochrane Database of Systematic Reviews—full text articles reviewing specific health care topics. www.cochrane.org Subscription required. Abstracts of systematic reviews available free online. Some countries have funding to provide free access to all residents.
ACP Journal Club Collection of summaries of original studies and systematic reviews. Published bimonthly. All data since 1991 available on Web site, updated yearly. www.acpjc.org Subscription required.
Clinical Evidence Monthly updated directory of concise overviews of common clinical interventions. www.clinicalevidence.com Subscription required. Free access for United Kingdom and developing countries.
MEDLINE National Library of Medicine database with citations back to 1966. www.nlm.nih.gov Free via Internet.

Generally, the EBM tools listed in Table 3-3 provide access to research information in one of two forms. The first, primary research reports, is the original peer-reviewed research work that is published in medical journals. Initial access to this information in an EBM search may be gained through MEDLINE, which provides access to a huge amount of data in abstract form. However, in using MEDLINE it is often difficult to locate reports that are on point in a sea of irrelevant or unhelpful information and to be reasonably certain that important reports have not been overlooked. The second form, systematic reviews, comprehensively summarizes the available evidence on a particular topic up to a certain date and provides the interpretation of the reviewer; it is therefore the highest level of evidence in the hierarchy. Explicit criteria are used to find all the relevant scientific research and grade its quality. The prototype for this kind of resource is the Cochrane Database of Systematic Reviews. One of the key components of a systematic review is a meta-analysis. The next two sections will review some of the major types of clinical research reports available in the literature and the process of aggregating those data into meta-analyses.
Sources of Evidence: Clinical Trials and Registries
The notion of learning from observation of patients is as old as medicine itself. Over the last 50 years, physicians' understanding of how best to turn raw observation into useful evidence has evolved considerably. Case reports, personal anecdotal experience, and small single-center case series are now recognized as having severe limitations in validity and generalizability, and although they may generate hypotheses or be the first reports of adverse events, they have no role in formulating modern standards of practice. The major tools used to develop reliable evidence consist of the randomized clinical trial and the large observational registry. A registry or database typically is focused on a disease or syndrome (e.g., cancer, CAD, heart failure), a clinical procedure (e.g., bone marrow transplantation, coronary revascularization), or an administrative process (e.g., claims data used for billing and reimbursement).
By definition, in observational data, the care of the patient is not controlled by the investigator. Carefully collected prospective observational data can achieve a level of quality approaching that of major clinical trial data. At the other end of the spectrum, data collected retrospectively (e.g., chart review) are limited in form and content to what previous observers thought was important to record, which may not serve the research question under study particularly well. Data not specifically collected for research (e.g., claims data) often have important limitations that cannot be overcome in the analysis phase of the research. Advantages of observational data include the ability to capture a broader population than is typically represented in clinical trials, whose inclusion and exclusion criteria restrict enrollment. In addition, observational data are the primary source of evidence for questions for which a randomized trial cannot or will not be performed. For example, it may be difficult or unethical to randomize patients to test diagnostic or therapeutic strategies that are unproven but widely accepted in practice. In addition, patients cannot be randomized to a sex, racial/ethnic group, socioeconomic status, or country of residence. Nor are physicians willing to randomize patients to a potentially harmful exposure, such as smoking or overeating to develop obesity.
The major difference between a well-done randomized clinical trial and a well-done prospective observational study of a particular management strategy is the lack of protection from treatment selection bias in the latter. The use of observational data to compare diagnostic or therapeutic strategies assumes that there is sufficient uncertainty in practice to ensure that similar patients will be managed differently by different physicians. In short, the analysis assumes that there is an element of randomness (in the sense of disorder rather than in the formal statistical sense) to clinical management. In such cases, statistical models attempt to adjust for important imbalances and "level the playing field" so that a fair comparison among treatment options can be made. When management is clearly not random (e.g., all eligible left main coronary artery disease patients are referred for coronary bypass surgery), the problem may be too confounded (biased) for statistical correction, and observational data may not provide reliable evidence.
In general, the use of concurrent controls is vastly preferable to that of historical controls. For example, comparison of current surgical management of left main CAD with left main CAD patients treated medically during the 1970s (the last time these patients were routinely treated with medicine alone) would be extremely misleading since the quality of "medical therapy" has made substantial improvements in the interval.
Randomized controlled clinical trials include the careful prospective design features of the best observational data studies but also include the use of random allocation of treatment. This design provides the best protection against confounding due to treatment selection bias (a major aspect of internal validity). However, the randomized trial may not have good external validity (generalizability) if the process of recruitment into the trial resulted in the exclusion of many potentially eligible subjects.
Consumers of medical evidence need to be aware that randomized trials vary widely in their quality and applicability to practice. The process of designing such a trial often involves a great many compromises. For example, trials designed to gain U.S. Food and Drug Administration (FDA) approval for an investigational drug or device have to address certain regulatory requirements that may result in a trial design different from what practicing clinicians would find useful.
Meta-Analysis
The Greek prefix meta signifies something at a later or higher stage of development. Meta-analysis is research done on research data for the purpose of combining and summarizing the available evidence quantitatively. Although it can be used to combine nonrandomized studies, meta-analysis is used most typically to summarize all the randomized trials on a particular therapeutic problem. Ideally, unpublished trials should be identified and included to avoid publication bias (i.e., "negative" trials may not be published). Furthermore, some of the best meta-analyses obtain and analyze the raw individual patient-level data from all trials rather than working only with what is available in the published reports of each trial. Not all published meta-analyses are reliable sources of evidence on a particular problem. Their methodology must be scrutinized carefully to ensure proper study design and analysis. The results of a well-done meta-analysis are likely to be most persuasive if they include at least several large-scale, properly performed randomized trials. Although meta-analysis can help detect benefits when individual trials are inadequately powered (e.g., the benefits of streptokinase thrombolytic therapy in acute MI demonstrated by ISIS-2 in 1988 were evident by the early 1970s through meta-analysis), in cases in which the available trials are small or poorly done, meta-analysis should not be viewed as a remedy for the deficiency in primary trial data.
Meta-analyses typically focus on summary measures of relative treatment benefit, such as odds ratios or relative risks. Clinicians also should examine what absolute risk reduction (ARR) can be expected from the therapy. A useful summary metric of absolute treatment benefit is the number needed to treat (NNT) to prevent one adverse outcome event (e.g., death, stroke). NNT is simply 1/ARR. For example, if a hypothetical therapy reduced mortality rates over a 5-year follow-up by 33% (the relative treatment benefit) from 12% (control arm) to 8% (treatment arm), the absolute risk reduction would be 12% – 8% = 4% and the NNT would be 1/0.04, or 25. Thus, it would be necessary to treat 25 patients for 5 years to prevent 1 death. If the hypothetical treatment were applied to a lower-risk population, say, with a 6% 5-year mortality, the 33% relative treatment benefit would reduce absolute mortality by 2% (from 6% to 4%), and the NNT for the same therapy in this lower-risk group of patients would be 50. Although not always made explicit, comparisons of NNT estimates from different studies should account for the duration of follow-up used to create each estimate.
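The ARR and NNT arithmetic from this example can be sketched directly, using the mortality figures given in the text:

```python
# NNT = 1 / ARR, where ARR is the absolute risk reduction. The figures
# are the 5-year mortality rates from the example in the text.

def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    arr = control_risk - treated_risk   # absolute risk reduction
    return 1 / arr

# Higher-risk population: mortality falls from 12% to 8%.
print(round(number_needed_to_treat(0.12, 0.08)))  # 25
# Lower-risk population: the same 33% relative benefit, 6% falling to 4%.
print(round(number_needed_to_treat(0.06, 0.04)))  # 50
```

The same relative benefit thus translates into half the absolute benefit when baseline risk is halved, which is why NNT must always be interpreted against the population's baseline risk.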
Clinical Practice Guidelines
According to the 1990 Institute of Medicine definition, clinical practice guidelines are "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances." This definition emphasizes several crucial features of modern guideline development. First, guidelines are created by using the tools of EBM. In particular, the core of the development process is a systematic literature search followed by a review of the relevant peer-reviewed literature. Second, guidelines usually are focused on a clinical disorder (e.g., adult diabetes, stable angina pectoris) or a health care intervention (e.g., cancer screening). Third, the primary objective of guidelines is to improve the quality of medical care by identifying areas where care should be standardized, based on compelling evidence. Guidelines are intended to "assist" decision-making, not to define explicitly what decisions should be made in a particular situation, in part because evidence alone is never enough for clinical decision-making (e.g., deciding whether to intubate and administer antibiotics for pneumonia in a terminally ill individual, in an individual with dementia, or in an otherwise healthy 30-year-old mother).
Guidelines are narrative documents constructed by an expert panel whose composition often is determined by interested professional organizations. These panels vary in the degree to which they represent all relevant stakeholders. The guideline documents consist of a series of specific management recommendations, a summary indication of the quantity and quality of evidence supporting each recommendation, and a narrative discussion of the recommendations. Many recommendations have little or no supporting evidence and thus reflect the expert consensus of the guideline panel. In part to protect against errors by individual panels, the final step in guideline construction is peer review, followed by a final revision in response to the critiques provided.
Guidelines are closely tied to the process of quality improvement in medicine through their identification of evidence-based best practices. Such practices can be used as quality indicators. Examples include the proportion of acute MI patients who receive aspirin upon admission to a hospital and the proportion of heart failure patients with a depressed ejection fraction treated with an ACE inhibitor. Routine measurement and reporting of such quality indicators can produce selective improvements in quality, since many physicians prefer not to be outliers.
Conclusions
In this era of EBM, it is tempting to think that all the difficult decisions practitioners face have been or soon will be solved and digested into practice guidelines and computerized reminders. However, EBM provides practitioners with an ideal rather than a finished set of tools with which to manage patients. The significant contribution of EBM has been to promote the development of more powerful and user-friendly EBM tools that can be accessed by busy practitioners. This is an enormously important contribution that is slowly changing the way medicine is practiced. One of the repeated admonitions of EBM pioneers has been to replace reliance on the local "gray-haired expert" (who may be wrong but is rarely in doubt) with a systematic search for and evaluation of the evidence. But EBM has not eliminated the need for subjective judgments. Each systematic review or clinical practice guideline presents the interpretation of "experts" whose biases remain largely invisible to the review's consumers. Moreover, even with such evidence, it is always worth remembering that the response to therapy of the "average" patient represented by the summary clinical trial outcomes may not be what can be expected for the patient sitting in front of a physician in the clinic or hospital. In addition, meta-analyses cannot generate evidence when there are no adequate randomized trials, and most of what clinicians confront in practice will never be thoroughly tested in a randomized trial. For the foreseeable future, excellent clinical reasoning skills and experience supplemented by well-designed quantitative tools and a keen appreciation for individual patient preferences will continue to be of paramount importance in the professional life of medical practitioners.
