AI: Are we ready for black box solutions?

This article first appeared in Digital Edge, The Edge Malaysia Weekly, on August 16, 2021 - August 22, 2021.


As artificial intelligence (AI) vision systems continue to astound in the field of medical diagnostics and imaging, more and more questions are being asked about the opaque, difficult-to-understand underpinnings of these applications.

In broad strokes, the question is this: How do we regulate or otherwise account for the use of so-called “black box” or “non-interpretable” algorithms, which appear to make far more accurate imaging decisions than human operators?

It is an important question, one that is fundamentally tied to the broader issue of accountability in healthcare. Black box algorithms are so called because their decision-making processes are, for the time being at least, largely invisible to and poorly understood by their creators.

Nonetheless, research at this cutting-edge intersection of AI and medical diagnostics indicates that black box algorithms — esoteric and complex things that they are — provide substantially more accurate medical imaging decisions than their human counterparts.

Better medical decision-making sounds great. We just do not quite know how black box algorithms arrive at their decisions. And that seems to give us pause. 

A new paper in Science Magazine cautions against moving away from black box algorithms, citing a number of limitations in the alternative, otherwise known as “white box” or “interpretable” algorithms. 

The July 2021 paper, written by Boris Babic, Sara Gerke, Theodoros Evgeniou, and I Glenn Cohen, argues against the “near-consensus emerging in favour of explainable AI/ML (machine learning) among academics, governments and civil society groups”. 

The paper adds: “Many are drawn to this approach to harness the accuracy benefits of non-interpretable AI/ML … while also supporting transparency, trust and adoption. We argue that this consensus, at least as applied to healthcare, both overstates the benefits and undercounts the drawbacks of requiring black-box algorithms to be explainable.” 

A key question at this juncture is the extent to which black box algorithms are more accurate or reliable than their white box counterparts. 

In an interview with Digital Edge, Evgeniou says at present, black box algorithms are “by far the best AI model in terms of accuracy in medical imaging”. 

He adds: “In the area of medical imaging, black box algorithms are not only more accurate than human users in many cases, they tend to be far more effective at vision-based tasks than their simpler white box counterparts.” 

Some caveats apply, however. 

Early research indicates that on less complex datasets, there tends to be little difference in performance between black and white box algorithms.

“However, there are datasets that, when fed into the black box algorithms, yielded far more accurate decisions than their opposite — sometimes, significantly more so. 

“We’re trying to understand the reasons for this, but very generally, it is possible that the difference in performance between these two types of algorithms has to do with the complexity of the data. Black box algorithms perform far better on highly complex datasets that are ‘very noisy’ and have large variances in their data.”

Against this backdrop, Evgeniou argues that policymakers and interest groups need to understand the following trade-off: “We can either forgo the accuracy gains [of black box algorithms] altogether and use only white box algorithms. In this case, we knowingly develop an inferior, AI-based medical imaging product.

“Alternatively, we use the black box algorithms, and if policymakers insist, researchers [have to] try to come up with explanation frameworks to, rather paradoxically, make sense of decisions that are fundamentally unknowable. 

“With the latter, we then have to ask ourselves: Are we as users OK with having some degree of explanation for these decisions, even if these explanations turn out to be wrong? 

“If so, are we then at risk of instilling a false sense of confidence in human operators, who might not fully appreciate the non-interpretable nature of black box algorithms?” 

For his part, Evgeniou says relying solely on white box algorithms is untenable. “For me, that is tantamount to abandoning innovation altogether. Without black box algorithms, we just won’t be able to develop the most accurate imaging solutions possible. As such, the idea of using white box algorithms alone is wrong.” 

These are unique and difficult issues to grapple with, and Evgeniou suggests that at this early stage of the innovation process, explanation frameworks should be limited to a particular class of highly specialised users. 

“There are multiple types of potential users for this technology — from patients, healthcare workers, general practitioners, product developers, engineers and manufacturers, and, of course, medical specialists. Different types of users have different ways of perceiving the conclusions drawn up by black box algorithms. 

“It may be preferable for now that explanation frameworks be provided to just the medical specialists and product developers, engineers and manufacturers.”

Ultimately, one question that regular people ought to ask themselves is this: “Am I okay with my medical specialist relying on an AI-derived imaging decision, even if the said specialist does not fully understand how the decision was arrived at?”