Return to Malmö
A new clinical trial in Malmö, Sweden, is pushing the boundaries of AI in breast cancer.
In an average year, Dr. Kristina Lång reviews more than 5,000 mammograms. Undisturbed, she can easily review 50-70 cases per hour. Remarkably, she is just one of five radiologists responsible for screening 65,000 women for breast cancer in Malmö, the third-largest city in Sweden.
Each mammogram reviewed by Lång gets placed in one of two buckets. The first is where most patients want to go – no cancer. The second is the one most of us dread – possible cancer, more diagnostic follow-up needed. This constant sifting of patients can have life-altering consequences. The earlier you detect cancer, the more effectively you can intervene. Mammography screening is therefore an essential tool in reducing the overall mortality of breast cancer.
But mammograms are not perfect, and neither are radiologists. The problem is compounded by the fact that the great majority of cases are negative, so finding cancer is akin to finding needles in a giant haystack of images. As Lång admits, "It's a lot of exams to read and when you read these huge volumes of screening exams, it's easy to overlook subtle cancers."
These challenges have now led Lång to the cutting edge of AI in cancer. Her path here was hardly linear, though. As she confides to me, “When choosing a medical specialty I was compelled by the visual side of medicine.” That led to dermatology, which led to radiology, which ultimately led to a partnership with the Dutch AI company ScreenPoint Medical. Lång now uses AI to screen women for breast cancer. Over the past two years, she has led one of the largest clinical trials of AI in cancer. The trial, dubbed MASAI, has just reported interim results, and its early findings are already significant.
Dr. Kristina Lång is a breast radiologist at Lund University in Malmö, Sweden. She is now leading one of the largest clinical trials of AI in cancer. (Photo by Charlotte Carlberg Bärg).
To understand the implications of the MASAI trial, it helps to first understand two long-running threads that have now become fully intertwined. The first is the development of mammography itself; the second is the rapid advancement of deep learning.
The Road to Malmö
As a scientific idea, the roots of mammography can be traced back to 1913, when a German physician first started using X-rays to study breast cancer. But mammography mostly languished until the 1960s, when a Houston-based physician named Robert Egan started to pick up the pieces. Egan is now widely credited with turning mammography into a truly useful clinical diagnostic tool.
However, even with significant advances in technology, it was not yet clear if routine mammography of healthy women would make a dent in breast cancer. The 1960s, 1970s, and 1980s therefore saw multiple large-scale clinical trials in the U.S., Scotland, and Canada, designed to answer this question.
Clinical screening trials are tricky, though, and these trials suffered from significant issues. In particular, women were supposed to be randomly assigned to trial arms (mammography or no mammography), but all manner of human biases and foibles crept into the assignments. In one stunning example, the Canadian National Breast Screening Study was subject to forensic analysis and eyewitness interviews, and researchers eventually concluded that trial coordinators had subverted the randomization process and that the results of the study could not be trusted.
According to Siddhartha Mukherjee, the physician who chronicled the history of cancer in the popular “The Emperor of All Maladies”, randomization errors plagued all the early mammography screening trials, and “it was in Sweden, at long last, that this stuttering legacy finally came to an end.” The Swedish trial started in 1976 and was based in the city of Malmö. Forty-seven years later, this coastal city is now home to over 700,000 people and to Skåne University Hospital, the site of Lång’s AI trial.
The Malmö Mammography Study ended up enrolling more than 42,000 women and is considered a landmark in the history of breast cancer. The trial took 12 years to complete, and final results were not published until 1988. But its findings have stood the test of time: mammography screening works. Specifically, in women over 50, mammography screening reduces breast cancer deaths by 20%.
Deep Learning
Five years after the publication of the Malmö trial, researchers started to use computers to analyze mammograms. From a computer's perspective, mammograms are just digital images: pixels stored on disk. The central question became: could computers detect cancer within these pixels? The programs were referred to as Computer-Aided Detection, or CADe, and they were specifically designed to assist radiologists. But this was the 1990s, and computers weren’t all that good at computer vision. The CADe systems failed and were roundly criticized.
Over the past decade, all that has changed, and computers have become exceedingly good at analyzing images. This has been driven by the widespread use of artificial neural networks and deep learning techniques. Most computer scientists point to 2012 as a turning point. This is the year that a research team from the University of Toronto submitted a new deep learning AI model to the ImageNet competition. The model, later referred to as AlexNet, blew the competition away, beating the runner-up by roughly 11 percentage points in error rate. AlexNet went on to form the basis of most modern computer vision techniques, and has now been cited over 121,000 times, making it one of the most impactful papers of the past 15 years.
AlexNet combined multiple ideas. The first was the use of large training sets. Given enough images paired with category labels, the model learns to distinguish the categories, and the more data, the better. Second, AlexNet added more layers to the neural network. These additional layers were capable of detecting higher-level features within images, and were therefore able to render more accurate verdicts. Finally, AlexNet used Graphics Processing Units, or GPUs, to greatly accelerate training. As the authors of AlexNet concluded: “All our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available.”
And this is exactly what has happened in mammography.
Over the past decade, multiple research teams and multiple companies have re-applied these same ideas to the field of mammography. The recipe: create a large repository of mammography images, hire radiologists to carefully label each one, and train a deep learning network on GPUs.
The details differ in practice, but the general principles apply, and it works. Computers have become exceptionally good at detecting cancer.
Mammography-focussed AI start-up companies have also now sprung up across the world, in the U.S., the Netherlands, the UK, and South Korea. One such company is the Dutch-based ScreenPoint Medical. Started by two computer imaging experts, the company has been pursuing the use of AI in mammography screening for over ten years now. Their flagship product, Transpara, is now used by multiple screening facilities and hospitals throughout the world. Several years ago, Lång connected with ScreenPoint, and a multi-year collaboration began.
To gain insight into Transpara, I spoke with Alejandro Rodríguez-Ruiz, VP of Clinical Strategy at ScreenPoint Medical. According to Rodríguez-Ruiz, Transpara has now been trained with millions of mammogram images. But it's no longer difficult for companies to train AI to detect cancer. According to Rodríguez-Ruiz, “what is more difficult is to have a system that is going to be reliable and stable across all the possible situations where you are going to deploy your product.”
To that end, ScreenPoint has invested years in amassing one of the largest, most diverse sets of expertly annotated mammogram images. According to Rodríguez-Ruiz, the Transpara database now includes mammograms from the US, Europe, the Americas, Asia, and Australia. The database also covers all the major mammogram imaging vendors, including Siemens, GE, and Philips. The end result is a highly accurate AI system that can maintain its accuracy across multiple hospitals, regardless of variability in patient population or imaging technology.
AI has the potential to identify cancer missed by radiologists. In panel A, the mammogram of a 57-year-old woman was reviewed by two independent radiologists, and both diagnosed her as negative for cancer. In panel B, the Transpara AI program from ScreenPoint Medical reviewed the same image and diagnosed the woman as high risk. The potentially cancerous region identified by the AI is outlined in blue. Fourteen months after her mammogram, the patient was diagnosed with triple-negative breast cancer. Figure reprinted from: Lång, K., Hofvind, S., Rodríguez-Ruiz, A. & Andersson, I. Can artificial intelligence reduce the interval cancer rate in mammography screening? Eur Radiol 31, 5940–5947 (2021).
The MASAI Clinical Trial
Against this backdrop, Malmö is now back on the map, and Lång’s MASAI trial is set to advance the use of AI in cancer. Already, 100,000 women have participated in the trial. Each woman is randomly assigned to one of two arms. In the first arm, mammograms are read by two independent radiologists, as is standard practice in most of Europe. This is in marked contrast to the US, where only one radiologist reviews each patient. In the second arm, mammograms are first triaged by the Transpara AI system. If the AI detects a high probability of cancer, the mammogram is sent to two independent radiologists for further review. If the AI detects a lower probability of cancer, the mammogram is sent to one radiologist for further review.
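The routing rule in the AI arm can be sketched in a few lines of Python. This is an illustration only: the function name `readers_required`, the 0-to-1 score scale, and the `HIGH_RISK_THRESHOLD` value are assumptions made for the example, not details of Transpara or of the MASAI protocol.

```python
# Illustrative sketch of the triage rule described above.
# HIGH_RISK_THRESHOLD and the 0-1 score scale are hypothetical values
# chosen for the example; the real system's scoring is not shown here.
HIGH_RISK_THRESHOLD = 0.9


def readers_required(ai_score: float, ai_arm: bool) -> int:
    """Return how many radiologists read a single mammogram."""
    if not 0.0 <= ai_score <= 1.0:
        raise ValueError("ai_score must be between 0 and 1")
    if not ai_arm:
        # Control arm: standard European double reading, AI score ignored.
        return 2
    # AI arm: double reading only for exams the AI flags as high risk.
    return 2 if ai_score >= HIGH_RISK_THRESHOLD else 1
```

Under these assumptions, an exam scored 0.95 in the AI arm would still go to two radiologists, while an exam scored 0.20 would go to one.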
The trial is designed to answer two questions. First, can an AI triage system reduce the workload of radiologists? Specifically, can an AI triage system with one radiologist replace the current European practice of two radiologists? Second, can an AI triage system actually improve cancer detection rates?
The interim results of the MASAI trial were published earlier this year, and the answer to the first question is a clear yes. An AI triage system with one radiologist is just as good, and just as safe, as two independent radiologists. In fact, the AI triage arm resulted in a 44% reduction in radiologist workload, with no impact on overall patient safety. It’s currently unclear what the impact would be in the US, where the standard of care only requires one radiologist per exam. But according to Lång, this is hugely significant in Europe, especially in Sweden: “We have a severe lack of breast radiologists in Sweden, and we are really struggling to maintain our workforce.”
For Lång, however, workload is not the main point: “the important question is if we can increase the effect of screening, meaning that we can further reduce the mortality of breast cancer, and that’s the important question.”
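A back-of-the-envelope calculation shows why a triage scheme like this lands a little below a 50% saving. The sketch below assumes a deliberately simplified model that is not taken from the trial: every control-arm exam gets exactly two reads, every AI-arm exam gets one read, and only the flagged fraction gets a second. The function name `workload_reduction` and the example fraction are hypothetical.

```python
def workload_reduction(high_risk_fraction: float) -> float:
    """Fractional drop in radiologist reads versus double reading every exam.

    Simplified model (an assumption, not the trial's accounting):
    control arm = 2 reads per exam; AI arm = 1 read per exam, plus a
    second read for the fraction of exams flagged as high risk.
    """
    if not 0.0 <= high_risk_fraction <= 1.0:
        raise ValueError("high_risk_fraction must be between 0 and 1")
    reads_per_exam_ai = 1.0 + high_risk_fraction
    reads_per_exam_control = 2.0
    return 1.0 - reads_per_exam_ai / reads_per_exam_control


# Hypothetical input: if about 12% of exams were flagged for double
# reading, this model gives a reduction of about 44%.
print(round(workload_reduction(0.12), 2))
```

In this model the saving is (1 - p) / 2 for a flagged fraction p, so it can never exceed 50%, and the trial's own counting of reads may differ from this simplification.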
To get at that question, Lång is following all 100,000 women for two years and tracking specific events. Most importantly, she is tracking women who had negative mammograms but subsequently developed breast cancer symptoms several months after the screening. In clinical terms, these are referred to as “interval cancers”, and they may represent an important category of cancers that are missed by radiologists but could be picked up by highly sensitive AI. This is the question that most interests Lång: “if the interval cancer goes down, we can say that we could actually use AI to reduce the mortality of breast cancer.”
At this point, it is too early to render a verdict on this question. Lång plans to track all patients throughout 2024, and publish the final trial results in 2025. But early clues are promising. According to the interim results, the AI triage arm actually identified 20% more cancers than the control arm. Previous retrospective studies by Lång also indicate that the AI is capable of identifying cancers missed by radiologists. Given enough follow-up time, Lång will be able to compare the two arms of the trial more fully. It’s quite possible that the AI arm will result not only in a reduction in radiologist workload, but also in better outcomes for patients.
When I ask Lång to put all this into perspective, it becomes clear that she is not a zealous advocate for AI. Rather, she is a passionate advocate for her patients, and a committed believer in the scientific method. She is also a firm believer in clinical trials, and their ability to render objective truths about clinical interventions. I also get the sense that she is OK with any outcome to the MASAI trial, believing that the trial will simply become the building block for the next, better trial.
Six years ago, the New Yorker ran a piece on AI in medicine. Geoffrey Hinton of the University of Toronto, widely regarded as one of the “godfathers” of modern AI, was specifically asked about the impact of deep learning on radiology. His response was clear-cut: "I think that if you work as a radiologist you are like Wile E. Coyote in the cartoon. You're already over the edge of the cliff, but you haven't looked down. There's no ground underneath."
When I asked Lång to project her own work forward, and how it might affect radiologists, she seemed unfazed. If AI ultimately ends up being better for patients, so be it. As she concludes: "maybe it's just a matter of time, maybe in five years we will be obsolete doing screen reading. That's a plausible scenario."
Ethan Cerami, Ph.D. is the Director of the Knowledge Systems Group at Dana-Farber Cancer Institute in Boston, MA. Special thanks to Bill Lotter and Julia Moore Vogel for reviewing the article and providing feedback.