I recently had the opportunity to work on a review paper for Cancer Discovery on AI in Oncology.
In the review, we attempted to describe the broad landscape of AI applications in oncology — everything from AI in mammography to AI for clinical trial matching.
One lens that we used to survey the landscape was FDA clearances for AI devices in cancer. There are now hundreds of such clearances across dozens of companies. For example, multiple companies now have FDA-cleared AI models for detecting cancer in mammograms or polyps in colonoscopy images.
The Role of the FDA
In the US, AI devices are evaluated and regulated by the Food and Drug Administration (FDA). Companies have long played the central role in developing new medicines and medical devices, but the government has a role too. When it comes to foods and drugs, its primary job is to protect people’s safety and ensure that new commercial inventions actually deliver real benefit. If all this seems quaint, or like government overreach, look at what the US was like before the FDA existed (hint: not good).
For new drugs, the FDA has strict regulations and clinical trial requirements. For software, the process is not quite as rigorous, but the FDA does require that companies marketing medical software go through a vetting process. AI software is no different and must clear the same bar. If you want more details about how all this works, I recommend checking out How the FDA Regulates AI.
Taking one step back, all FDA clearances in healthcare to date are firmly planted in the “world before ChatGPT”. By that, I mean that current FDA clearances are based on narrowly defined AI tasks, such as detecting cancer in mammography images. Given a narrowly defined task, a locked AI model, and a carefully curated reference set, companies can provide evidence to the FDA that their AI devices are safe and provide medical value. The healthcare AI devices cleared so far are also not designed to be “self-driving”: they do not render final medical judgment, and merely assist clinicians.
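To make that contrast concrete, here is a deliberately toy sketch of what evaluating a “locked” model against a curated reference set boils down to: the model is frozen, every case has an expert-adjudicated label, and performance reduces to a few pre-specified metrics. Everything below is hypothetical and purely illustrative — it is not any company’s actual submission pipeline, and the numbers and thresholds are made up.

```python
# Toy illustration of "locked model + curated reference set" evaluation.
# Hypothetical data and thresholds; not any real regulatory submission.
from dataclasses import dataclass


@dataclass
class EvalResult:
    sensitivity: float  # of the true positives, how many did the model flag?
    specificity: float  # of the true negatives, how many did it correctly pass?


def evaluate_locked_model(predictions: list[int], reference_labels: list[int]) -> EvalResult:
    """Compare a frozen model's binary outputs against expert-adjudicated labels."""
    pairs = list(zip(predictions, reference_labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    return EvalResult(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Hypothetical acceptance check, e.g. "sensitivity must meet a pre-registered bar".
result = evaluate_locked_model(
    predictions=[1, 0, 1, 1, 0, 0],
    reference_labels=[1, 0, 0, 1, 0, 1],
)
print(result)
```

Notice how much of this depends on the task being narrow and the label being unambiguous. Neither assumption survives contact with an open-ended, conversational AI, which is exactly the problem the rest of this post is about.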
What’s next?
Fast-forward to the world of generative AI and Large Language Models. We now have AI models that are capable of quite extraordinary and open-ended tasks. We have AI models that can pass medical exams and AI models that can make cancer therapy recommendations. We even have companies working on virtual health-care agents that will converse with patients and provide medical guidance after hospital discharge (side note: I find this last use case particularly creepy).
All of this sounds like the stuff of science fiction. But, the central questions remain: will this AI be safe for patients? Will this AI result in true patient benefit? And, what role should the government have in ensuring the safety and efficacy of such AI applications?
Given the billions of dollars now pouring into generative AI, potential regulation of medical applications is going to be a really big deal. I suspect the valuation and even long-term viability of many AI companies will hinge on these questions.
Core Questions
While I have no answers, I do have questions. Specifically:
👉 Given the nearly limitless space of inputs and outputs of generative AI models, how will we evaluate their safety or efficacy?
👉 Does the FDA have the AI expertise needed to evaluate generative AI healthcare applications?
👉 How high should the bar be for autonomous AI, e.g. health-care agents that converse directly with patients?
👉 What happens if an AI model hallucinates the wrong answer and a patient is harmed? Who is liable? The company behind the AI model? The doctor? The hospital? The regulators?
👉 How do we evaluate the long-term impact of healthcare workers using AI, including their potential drift toward over-reliance on AI and automation bias?
Envisioning the Future
One of the most thoughtful people trying to tackle these issues is Stephen Gilbert from the Dresden University of Technology. Gilbert has published several papers on the topic, including Large language model AI chatbots require approval as medical devices, and Guardrails for the use of generalist AI in cancer care.
This last piece is particularly provocative, as even Gilbert, who has spent countless hours thinking about these issues, does not claim to have a crystal ball.
In his view, we really only have three options. Option 1 is that regulators accept that we have no framework to objectively evaluate generative AI and decide to completely block all such applications. This keeps patients safe, but stifles all innovation. Option 2 is that regulators force generative AI applications into more narrowly defined medical tasks, where they can be more objectively evaluated. This might seem a reasonable compromise, but one of the most dazzling elements of generative AI is its extreme flexibility, especially when tackling new questions and new domains. Option 3 is that we take multiple steps back and radically rethink the entire regulatory framework.
Option 3 is most probably the way to go, but it will require broad stakeholder engagement across government, industry, and healthcare providers. It will also require that regulators keep pace with advances in AI. At the same time, regulators don’t want to be too heavy-handed: governments certainly have a significant role in protecting the public, but they also have a historical role in promoting innovation.
Getting the balance right is going to be quite the challenge, and the next few years should certainly be interesting.
As always, drop me a note below, and let me know what you think.