Introduction
Deep-learning artificial intelligence (AI) systems have demonstrated impressive performance across a variety of clinical tasks, including diagnosis, risk prediction, triage, mortality prediction, and treatment planning.Footnote 1,2 A problem, however, is that the inner workings of these systems have often proven thoroughly resistant to understanding, explanation, or justification, not only to end-users (e.g., doctors, clinicians, and nurses) but also to the designers of these systems themselves. Such AI systems are commonly described as “opaque,” “inscrutable,” or “black boxes.” The initial response to this problem in the literature was a demand for “explainable AI.” However, recently, several authors have suggested that making AI more explainable or “interpretable” is likely to be achieved at the cost of the accuracy of these systems and that a preference for explainable systems over more accurate AI is ethically indefensible in the context of medicine.Footnote 3,4
In this article, we defend the value of interpretability in the context of the use of AI in medicine. We point out that clinicians may prefer interpretable systems over more accurate black boxes, which in turn gives designers of AI reason to prefer more interpretable systems in order to ensure that AI is adopted and its benefits realized. Moreover, clinicians may themselves be justified in this preference. Medical AI should be analyzed as a sociotechnical system, the performance of which is as much a function of how people respond to AI as it is of the outputs of the AI itself. Securing the downstream therapeutic benefits of diagnostic and prognostic systems depends critically on how the outputs of these systems are interpreted by physicians and received by patients. Prioritizing accuracy over interpretability overlooks the various human factors that could interfere with these downstream benefits. We argue that, in some cases, a less accurate but more interpretable AI may have better effects on patient health outcomes than a “black box” model with superior accuracy, and suggest that a preference for highly accurate black box AI systems over less accurate but more interpretable systems may itself constitute a form of lethal prejudice that may diminish the benefits of AI to patients—and perhaps even harm them.
The Black Box Problem in Medical AI
Recent advances in artificial intelligence (AI) and machine learning (ML) have significant potential to improve the current practice of medicine by, for instance, enhancing physician judgment, reducing medical error, improving the accessibility of medical care, and improving patient health outcomes.Footnote 5,6,7 Many advanced ML-based AI systems have demonstrated impressive performance in a wide variety of clinical tasks, including diagnosis,Footnote 8 risk prediction,Footnote 9 mortality prediction,Footnote 10 and treatment planning.Footnote 11 In emergency medicine, medical AI systems are being investigated to assist in the performance of diagnostic tasks, outcome prediction, and clinical monitoring.Footnote 12 A problem, however, is that ML algorithms are notoriously opaque, “in the sense that if one is a recipient of the output of the algorithm […], rarely does one have any concrete sense of how or why a particular classification has been arrived at from inputs.”Footnote 13 This can occur for a number of reasons, including a lack of relevant technical knowledge on the part of the user, corporate or government concealment of key elements of an AI system, or, at the deepest level, a cognitive mismatch between the demands of human reasoning and the technical approaches to high-dimensional mathematical optimization that are characteristic of ML.Footnote 14 Joseph Wadden suggests that the black box problem “occurs whenever the reasons why an AI decision-maker has arrived at its decision are not currently understandable to the patient or those involved in the patient’s care because the system itself is not understandable to either of these agents.”Footnote 15
A variety of related concerns have been raised over the prospect of black box clinical decision support systems being operationalized in clinical medicine. Some authors worry that human physicians may act on the outputs of black box medical AI without a clear understanding of the reasons behind them,Footnote 16 or that opacity may conceal erroneous inferences or algorithmic biases that could jeopardize patient health and safety.Footnote 17,18,19 Others are concerned that opacity could interfere with the allocation of moral responsibility or legal liability in the event that patient harm results from accepting and acting upon the outputs of a black box medical AI system,Footnote 20,21,22 or that the use of black box medical AI systems may undermine the accountability that healthcare practitioners accept for AI-related medical error.Footnote 23 Still others are concerned that black box medical AI systems cannot, will not, and perhaps ought not to be trusted by doctors or patients.Footnote 24,25,26 These concerns are especially acute in the context of emergency medicine, where decisions need to be made quickly and coordinated across teams of multispecialist practitioners.
Responding to these concerns, some authors have argued that medical AI systems will need to be “interpretable,” “explainable,” or “transparent” in order to be responsibly utilized in safety-critical medical settings and overcome these various challenges.Footnote 27,28,29
The Case for Accuracy
Recently, however, some authors have argued that opacity in medical AI is not nearly as problematic as critics have suggested, and that the prioritization of interpretable over black box medical AI systems may have several ethically unacceptable implications. Critics have advanced two distinct arguments against the prioritization of interpretability in medical AI.
First, some authors have highlighted parallels between the opacity of ML models and the opacity of a variety of commonplace medical interventions that are readily accepted by both doctors and patients. As Eric Topol has noted, “[w]e already accept black boxes in medicine. For example, electroconvulsive therapy is highly effective for severe depression, but we have no idea how it works. Likewise, there are many drugs that seem to work even though no one can explain how.”Footnote 30 The drugs that Topol is referring to here include aspirin, which, as Alex John London notes, “modern clinicians prescribed […] as an analgesic for nearly a century without understanding the mechanism through which it works,” along with lithium, which “has been used as a mood stabilizer for half a century, yet why it works remains uncertain.”Footnote 31 Other authors have also highlighted acetaminophen and penicillin, which “were in widespread use for decades before their mechanism of action was understood,” along with selective serotonin reuptake inhibitors, whose underlying causal mechanism is still unclear.Footnote 32 Still others have highlighted that the opacity of black box AI systems is largely analogous to the opacity of other human minds, and in some respects even one’s own mind. For instance, John Zerilli and coauthors observe that “human agents are […] frequently mistaken about their real (internal) motivations and processing logic, a fact that is often obscured by the ability of human decision-makers to invent post hoc rationalizations.”Footnote 33 According to some, these similarities imply that clinicians ought not to be any more concerned about opacity in AI than they are about the opacity of their colleagues’ recommendations, or indeed the opacity of their own internal reasoning processes.Footnote 34
This first argument is a powerful line of criticism of accounts that hold that we should entirely abjure the use of opaque AI systems. However, it leaves open the possibility that, as we shall argue below, interpretable systems have distinct advantages that justify our preferring them.
The second argument assumes—as does much of the AI and ML literature—that there is an inherent trade-off between accuracy and interpretability (or explainability) in AI systems. In its 2016 announcement of the “Explainable AI (XAI)” project, for instance, the United States Defense Advanced Research Projects Agency claimed that “[t]here is an inherent tension between ML performance (predictive accuracy) and explainability; often the highest performing methods (e.g., deep learning) are the least explainable, and the most explainable (e.g., decision trees) are less accurate.”Footnote 35 Indeed, attempts to enhance our understanding of AI systems through the pursuit of intrinsic or ex ante interpretability (for instance, by restricting the size of the model, implementing “interpretability constraints,” or using simpler, rule-based classifiers in place of more complex deep neural networks) are often observed to compromise the accuracy of a model.Footnote 36,37,38 In particular, the development of a high-performing AI system entails a degree of complexity that often interferes with how intuitive and understandable the operations of the system are in practice.Footnote 39
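The intuition behind the claimed trade-off can be illustrated with a toy example. The sketch below is purely illustrative: the data, the models, and the numbers are hypothetical and far simpler than any clinical system. When the true decision boundary is nonlinear, a maximally simple, human-readable rule (a single-feature threshold, the limiting case of a decision tree) may be systematically less accurate than a flexible model that offers no comparably simple rationale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task with a nonlinear (XOR-like) decision boundary, chosen so
# that no single-threshold rule can capture it but a flexible model can.
X = rng.uniform(-1, 1, size=(2000, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)
X_tr, y_tr, X_te, y_te = X[:1000], y[:1000], X[1000:], y[1000:]

def best_stump(X, y):
    """Exhaustively fit the best single-feature threshold rule."""
    best = (0, 0.0, 0, 0.5)  # (feature, threshold, polarity, accuracy)
    for f in range(X.shape[1]):
        for t in np.linspace(-1, 1, 41):
            for pol in (0, 1):
                pred = (X[:, f] > t).astype(int) ^ pol
                acc = (pred == y).mean()
                if acc > best[3]:
                    best = (f, t, pol, acc)
    return best

# "Interpretable" model: a rule statable in one sentence.
f, t, pol, _ = best_stump(X_tr, y_tr)
stump_acc = (((X_te[:, f] > t).astype(int) ^ pol) == y_te).mean()

# "Black box" stand-in: 1-nearest-neighbour, accurate here but yielding
# no compact, human-readable summary of its decision boundary.
d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(-1)
knn_acc = (y_tr[d.argmin(1)] == y_te).mean()

print(f"stump accuracy: {stump_acc:.2f}, 1-NN accuracy: {knn_acc:.2f}")
```

Here the threshold rule can be stated in one sentence (“predict the positive class if feature *f* exceeds *t*”) yet performs near chance, while the nearest-neighbour model is far more accurate but offers no such summary. Whether real clinical data exhibit this structure is, of course, precisely what is at issue in the passage above.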
Consequently, some authors suggest that prioritizing interpretability over accuracy in medical AI has the ethically troubling consequence of compromising the accuracy of these systems and, with it, their downstream benefits for patient health outcomes.Footnote 40,41 Alex London has suggested that “[a]ny preference for less accurate models—whether computational systems or human decision-makers—carries risks to patient health and welfare. Without concrete assurance that these risks are offset by the expectation of additional benefits to patients, a blanket preference for simpler models is simply a lethal prejudice.”Footnote 42 According to London, when we are patients, it is more important to us that something works than that our physician knows precisely how or why it works.Footnote 43 Indeed, this claim appears to have been corroborated by a recent citizens’ jury study, which found that participants were less likely to value interpretability over accuracy in healthcare settings than in non-healthcare settings.Footnote 44 London thus concludes that the trade-off between accuracy and interpretability in medical AI ought to be resolved in favor of accuracy.
The Limits of Post Hoc Explanation
One popular response to these concerns is to hope that improvements in post hoc explanation methods could enhance the interpretability of medical AI systems without compromising their accuracy.Footnote 45 Rather than pursuing ex ante or intrinsic interpretability, post hoc explanation methods attempt to extract explanations of various sorts from black-box medical AI systems on the basis of their previous decision records.Footnote 46,47,48 In many cases, this can be achieved without altering the original, black-box model, either by affixing a secondary explanator to the original model or by replicating its statistical function and overall performance through interpretable methods.Footnote 49
The range of post hoc explanation methods is expansive, and it is beyond the scope of this article to review them all here. However, some key examples include sensitivity analysis, prototype selection, and saliency masks.Footnote 50 Sensitivity analysis involves “evaluating the uncertainty in the outcome of a black box with respect to different sources of uncertainty in its inputs.”Footnote 51 For instance, a model may return an output with a confidence score of 0.3, indicating that it has produced this output with low confidence, with the aim of tempering the strength of a user’s credence in it. Prototype selection involves returning, in conjunction with the output, an example case that is as similar as possible to the case that has been entered into the system, with the aim of illuminating some of the criteria according to which the output was generated. For instance, suppose a medical AI system, such as IDx-DR,Footnote 52 were to diagnose a patient with diabetic retinopathy from an image of their retina. A prototype selection explanator might produce, in conjunction with the model’s classification, a second image that is maximally similar to the original case, in an attempt to illustrate the features that were most important in determining its output. Lastly, saliency masks highlight the words, phrases, or areas of an input image that were most influential in determining a particular output.
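To make the flavor of these methods concrete, consider a minimal sensitivity-analysis sketch. Everything here is hypothetical: the “black box” is a stand-in toy model with invented weights, not any deployed clinical system. Perturbing each input feature in turn and recording the change in the output gives a crude, local picture of which features the output is most sensitive to:

```python
import numpy as np

# Hypothetical stand-in for an opaque clinical model: a fixed logistic
# function over four input features. The weights are invented for
# illustration; a real black box would be, e.g., a trained deep network.
weights = np.array([2.0, -1.0, 0.1, 0.0])

def black_box(x):
    """Return a toy 'probability of high risk' for a feature vector x."""
    return 1.0 / (1.0 + np.exp(-x @ weights))

def sensitivity(x, eps=1e-3):
    """Perturb each input feature in turn; record the change in output."""
    base = black_box(x)
    return np.array([
        (black_box(x + eps * np.eye(len(x))[i]) - base) / eps
        for i in range(len(x))
    ])

x = np.array([0.5, 1.0, -0.2, 3.0])
scores = sensitivity(x)
print(scores)  # larger magnitude = output more locally sensitive to that feature
```

Note that even this simple probe is local and approximate: it reports how the output moves around one particular input, not why the model weighs the features as it does, which anticipates the limitations discussed below.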
Post hoc explanation methods have demonstrated some potential to mitigate the concerns of the critics of opacity discussed in section “The Black Box Problem in Medical AI,” while also side-stepping the objections of the critics of interpretability discussed in section “The Case for Accuracy.” However, post hoc explanation methods also suffer from a number of significant limitations, which preclude them from entirely resolving this debate.
First, the addition of post hoc explanation methods to “black box” ML systems adds another layer of uncertainty to the evaluation of their outputs and inner workings. Post hoc explanations can only offer an approximation of the computations of a black-box model, meaning that it may be unclear how the explanator works, how faithful it is to the model, and why its outputs or explanations ought to be accepted.Footnote 53,54,55
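This fidelity worry can itself be illustrated with a toy surrogate (again a hypothetical sketch, not a model of any deployed explanator). A global linear “explanation” fitted to an opaque model whose output depends on a feature interaction captures almost none of the model’s behavior, even though it will dutifully report coefficients if asked:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical opaque model whose output depends on a feature interaction.
def black_box(X):
    return X[:, 0] * X[:, 1]

# Global linear surrogate ("explanator") fitted to the model's behaviour
# by least squares -- a crude stand-in for surrogate-based explanation.
X = rng.normal(size=(1000, 2))
y = black_box(X)
design = np.column_stack([X, np.ones(len(X))])  # features plus intercept
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
surrogate = design @ coef

# Fidelity: how much of the model's behaviour does the surrogate capture?
residual = y - surrogate
r2 = 1.0 - residual.var() / y.var()
print(f"surrogate R^2: {r2:.3f}")
```

The surrogate’s near-zero R² is precisely the faithfulness question: nothing in the coefficients the surrogate reports signals, by itself, that the approximation has failed.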
Second, and relatedly, such explanations often succeed only in extracting information that is highly incomplete.Footnote 56 For example, consider an explanator that highlights the features of a breast computed tomography scan that were most influential in the model’s classification of the patient as high-risk. Even if the features highlighted are intuitively relevant, this “explanation” offers a physician little reason to accept the model’s output, particularly if the physician disagrees with it.
Third, the aims of post hoc explanation methods are often under-specified, particularly once the agent-relativity of explanations is considered. Explanations often need to be tailored to a particular audience in order to be of any use. As Carl Zednik puts it, “although the opacity of ML-programmed computing systems is traditionally said to give rise to the Black Box Problem, it may in fact be more appropriate to speak of many Black Box Problems—one for every stakeholder.”Footnote 57 An explanation that assumes a background in computer science, for instance, may be useful to the manufacturers and auditors of medical AI systems, but is likely to deliver next to no insight to a medical professional who lacks this technical background. Conversely, a simple explanation tailored to patients, who typically lack both medical and computer science backgrounds, is likely to be of little use to a medical practitioner. Some post hoc explanations may prove largely redundant or useless, while others may influence the decisions of end-users in ways that reduce the clinical utility of these systems.
Finally, the focus on explanation has led to the neglect of justification in explainable medical AI.Footnote 58,59 Explanations are descriptive, in that they give an account of why a reasoner arrived at a particular judgment, whereas justifications give a normative account of why that judgment is a good one. The two overlap significantly, but they are far from identical. Yet within the explainability literature in AI, explanations and justifications are rarely distinguished, and when they are, it is the former that is prioritized over the latter.Footnote 60
Consequently, despite high hopes that explainability could overcome the challenges of opacity and the accuracy-interpretability trade-off in medical AI, post hoc explanation methods are not currently capable of meeting this challenge.
Three Problems with the Prioritization of Accuracy over Interpretability
In this section, we highlight three problems underlying the case for accuracy, concerning (1) the clinical objectives of medical AI systems and the need for accuracy maximization; (2) the gap between technical accuracy in medical AI systems and their downstream effects upon patient health outcomes; and (3) the reality of the accuracy-interpretability trade-off. Both together and separately, these problems suggest that interpretability is more valuable than critics appreciate.
First, the accuracy of a medical AI system is not always the principal concern of human medical practitioners and may, in some cases, be secondary to the clinician’s own ability to understand and interpret the outputs of the system, certain elements of the system’s functioning, or even the system as a whole. Indeed, the priorities of clinicians depend largely upon their conception of the particular aims of any given medical AI system. In a recent qualitative study, for instance, Carrie Cai and coauthors found that the importance of accuracy to medical practitioners varies according to the practitioners’ own conception of a medical AI system’s clinical objectives.Footnote 61 “To some participants, the AI’s objective was to be as accurate as possible, independent of its end-user. […] To others, however, the AI’s role was to merely draw their attention to suspicious regions, given that the pathologist will be the one to make sense of those regions anyway: ‘It just gives you a big picture of this is the area it thinks is suspicious. You can just look at it and it doesn’t have to be very accurate’.”Footnote 62 For these latter users, understanding why a model has drawn their attention to a particular treatment option, piece of information, or area of a clinical image may rank higher on their list of priorities than the overall accuracy of the system. This is not to deny the importance of accuracy in, say, a diagnostic AI system for the improvement of patient health outcomes, but rather to suggest that, in some cases and for some users, the accuracy of an AI system may not be as critical as London and other critics have supposed, and may rank lower on the clinician’s list of priorities than the interpretability of the system.
Depending upon the specific performance disparities between black box and interpretable AI systems, there may be cases where clinicians prefer less accurate systems that they can understand over black box systems with superior accuracy. If users prefer interpretable models over “black box” systems, then the potential downstream benefits of “black box” AI systems for patients could be undermined in practice if, for instance, clinicians reject them or avoid using them. Implementing “black box” systems in place of interpretable systems, without regard for the preferences of the users of these systems, may result in suboptimal outcomes that could otherwise have been avoided. Even if clinicians’ preference for interpretability is a prejudice, if it is sufficiently widespread and influential, it may give designers of AI sufficient reason to prioritize interpretability in order to increase the likelihood that AI systems will be adopted and their benefits realized.
Second, contra London, clinicians may themselves be justified in this preference. The case for accuracy appears to assume, erroneously, a necessary causal link between technical accuracy and improved downstream patient health outcomes. While diagnostic and predictive accuracy is certainly important for the improvement of patient health outcomes, it is far from sufficient. Medical AI systems need to be understood as intervening in care contexts that consist of an existing network of sociotechnical relations, rather than as mere technical “additions” to existing clinical decision-making procedures.Footnote 63,64,65 How these AI systems become embedded in these contexts, and alter existing relations between actors, is crucially important to the extent to which they will produce downstream health benefits. As Sara Gerke and coauthors argue, the performance of medical ML systems will be influenced by a variety of broader human factors beyond the system itself, including the way that clinicians respond to the outputs of the systems, “the reimbursement decisions of insurers, the effects of court decisions on liability, any behavioral biases in the process, data quality of any third-party providers, any (possibly proprietary) ML algorithms developed by third parties, and many others.”Footnote 66 Thus, as Thomas Grote observes in his recent discussion of clinical equipoise in randomized clinical trials of diagnostic medical AI systems, “even if the AI system were outperforming clinical experts in terms of diagnostic accuracy during the validation phase, its clinical benefit would still remain genuinely uncertain. The main reason is that we cannot causally infer from an increase in diagnostic accuracy to an improvement of patient outcome” (emphasis added).Footnote 67 There is a gap, in other words, between the accuracy of medical AI systems and their effectiveness in clinical practice, insofar as improvements in the accuracy of a technical system do not automatically translate into improvements in downstream health outcomes. Indeed, this observation is borne out by the current lack of evidence of downstream patient benefits generated by even the most technically accurate of medical AI systems. Superior accuracy is, in short, insufficient to demonstrate superior outcomes.
One reason for this gap comes from the fact that human users do not respond to the outputs of algorithmic systems in the same way that we respond to our own judgments and intuitions, nor even to the recommendations of other human beings. Indeed, according to one recent systematic review in human–computer interaction studies, “the inability to effectively combine human and nonhuman (i.e., algorithmic, statistical, and machine) decision making remains one of the most prominent and perplexing hurdles for the behavioral decision making community.”Footnote 68 Prevalent human biases affect the interpretation of algorithmic recommendations, classifications, and predictions. As Sara Gerke and coauthors observe, “[h]uman judgement […] introduces well-known biases into an AI environment, including, for example, inability to reason with probabilities provided by AI systems, over extrapolation from small samples, identification of false patterns from noise, and undue risk aversion.”Footnote 69 In safety-critical settings and high-stakes decision-making contexts such as medicine, these sorts of biases could pose significant risks to patient health and well-being.
Moreover, some of these biases are more likely to occur when a medical AI system is opaque rather than interpretable. Algorithmic aversion, for instance, is a phenomenon in which the users of an algorithmic system consistently reject its outputs, even when they have observed the system perform to a high standard consistently over time, and when following the recommendations of the system would produce better outcomes overall.Footnote 70,71 Algorithmic aversion is most commonly observed in cases where the users of the system have expertise in the domain for which the system is designed (e.g., dermatologists in the diagnosis of malignant skin lesions);Footnote 72 in cases where the user has seen the system make (even minor) mistakes;Footnote 73 and, most importantly for our purposes, in cases where the algorithmic system is perceived to be opaque by its user.Footnote 74 “Thus,” claim Michael Yeomans and coauthors, “it is not enough for algorithms to be more accurate, they also need to be understood.”Footnote 75
Finally, it is worth noting in passing that some authorities have begun to contest the reality of the accuracy-interpretability trade-off in AI and ML. In particular, Cynthia Rudin has recently argued that the accuracy-interpretability trade-off is a myth, and that simpler, more interpretable classifiers can, after careful preprocessing, perform to the same general standard as deep neural networks, particularly where data are structured and contain naturally meaningful features, as is common in medicine.Footnote 76,77 Indeed, Rudin argues that interpretable AI can, in some cases, demonstrate higher accuracy than comparatively opaque AI systems. “Generally, in the practice of data science,” she claims, “the small difference in performance between ML algorithms can be overwhelmed by the ability to interpret results and process the data better at the next iteration. In those cases, the accuracy/interpretability trade-off is reversed—more interpretability leads to better overall accuracy, not worse.”Footnote 78 In a later article coauthored with Joanna Radin,Footnote 79 Rudin highlights a number of studies that corroborate the comparable performance of interpretable and black box AI systems across a variety of safety-critical domains, including healthcare.Footnote 80,81,82 Rudin and Radin also observe that even in computer vision and image-recognition tasks, in which deep neural networks are generally considered the state of the art, a number of studies have succeeded in applying interpretability constraints to deep learning models without significant compromises in accuracy.Footnote 83,84,85 Rudin concludes that the uncritical acceptance of the accuracy-interpretability trade-off often leads researchers to forgo any attempt to investigate or develop interpretable models, or even to develop the skills required to build such models in the first place.Footnote 86 She suggests that black box AI systems ought not to be used in high-stakes decision-making contexts or safety-critical domains unless it is demonstrated that no interpretable model can reach the same level of accuracy. “It is possible,” claim Rudin and Radin, “that an interpretable model can always be constructed—we just have not been trying. Perhaps if we did, we would never use black boxes for these high-stakes decisions at all.”Footnote 87 While, as we have argued here, it will at least in some circumstances be defensible to prioritize interpretability at the cost of accuracy, if Rudin is correct, the price of pursuing interpretability may not be as high as critics—and our argument to this point—have presumed.
Superior accuracy is, therefore, not enough to justify the use of black-box medical AI systems over less accurate but more interpretable systems in clinical medicine. In many cases, it will be genuinely uncertain a priori whether a more accurate black-box medical AI system will deliver greater downstream benefits to patient health and well-being compared to a less accurate but more interpretable AI system. Indeed, under some conditions, less accurate but more interpretable medical AI systems may produce better downstream patient health outcomes than more accurate but nevertheless opaque systems.
Conclusion
The prioritization of accuracy over interpretability in medical AI therefore carries its own lethal prejudices. While proponents of accuracy are correct to emphasize that the use of less accurate models carries risks to patient health and welfare, their arguments overlook the comparable risks that more accurate but less interpretable models could present for patient health and well-being. This is not to suggest that the use of opaque ML systems in clinical medicine is unacceptable or ought to be rejected. We agree with proponents of accuracy that “black box” AI systems could deliver substantial benefits to medicine, and that the risks may eventually be reduced enough to justify their use. However, a blanket prioritization of accuracy in the trade-off between accuracy and interpretability looks to be unjustified. Opacity in medical AI systems may constitute a significant obstacle to the achievement of improved downstream patient health outcomes, however technically accurate these systems may be. More attention needs to be directed toward how medical AI systems will become embedded in the sociotechnical decision-making contexts for which they are being designed. The downstream effects of medical AI systems on patient health outcomes will be mediated by the decisions and behavior of human clinicians, who will need to interpret the outputs of these systems and incorporate them into their own clinical decision-making. The case for prioritizing accuracy over interpretability pays insufficient attention to this reality insofar as it overlooks the negative effects that opacity could have upon the hermeneutic task of physicians in interpreting and acting upon the outputs of black box medical AI systems, and, in turn, upon downstream patient health outcomes.
Funding Statement
This research was supported by an unrestricted gift from Facebook Research via the Ethics in AI in the Asia-Pacific Grant Program. The work was conducted without any oversight from Facebook. The views expressed herein are those of the authors and are not necessarily those of Facebook or Facebook Research.
Acknowledgments
Earlier versions of this article were presented to audiences at Macquarie University’s philosophy seminar series, the University of Wollongong’s seminar series hosted by the Australian Centre for Health Engagement, Evidence, and Values (ACHEEV), the University of Sydney’s Conversation series hosted by Sydney Health Ethics, the Ethics in AI Research Roundtable sponsored by Facebook Research, and the Centre for Civil Society and Governance of the University of Hong Kong. The authors would like to thank the audiences of these seminars for comments and discussion that improved the article.
Conflicts of Interest
R.S. is an Associate Investigator in the ARC Centre of Excellence for Automated Decision-Making and Society (CE200100005) and contributed to this article in that role. J.H. was supported by an Australian Government Research Training Program scholarship.