Mount Sinai finds deep learning algorithms inconsistent when applied to outside imaging data sets

The team used convolutional neural networks on its own chest X-rays as well as imaging from NIH and Indiana University and said they did not perform equally.

By Tom Sullivan

November 07, 2018

10:40 AM

Researchers at Mount Sinai’s Icahn School of Medicine found that the same deep learning algorithms diagnosing pneumonia in their own chest x-rays did not work as well when applied to images from the National Institutes of Health and the Indiana University Network for Patient Care.

ON THE RECORD

Researchers wrote in the journal PLOS: “Early results in using convolutional neural networks (CNNs) on X-rays to diagnose disease have been promising, but it has not yet been shown that models trained on X-rays from one hospital or one group of hospitals will work equally well at different hospitals. Before these tools are used for computer-aided diagnosis in real-world clinical settings, we must verify their ability to generalize across a variety of hospital systems.”

WHY IT MATTERS

With all the hype around AI and machine learning holding potential to drastically improve radiology, if not one day replace radiologists, Mount Sinai’s findings point to real-world realities that “estimates of CNN performance based on test data from hospital systems used for model training may overstate their likely real-world performance,” the researchers said.

THE BIGGER TREND

Given our Focus on Artificial Intelligence in November we are reporting on many of the benefits — and cutting through the hype — of machine learning, cognitive computing and a host of other AI-related terminology.

AI and ML, in fact, ranked second to only analytics in our HIMSS Media research about which technologies healthcare professionals anticipate will drive the most innovation moving forward.

The Mount Sinai findings also highlight the fact that plenty of work remains for AI tech to be ubiquitous in healthcare.

“A difficulty of using deep learning models in medicine is that they use a massive number of parameters, making it difficult to identify the specific variables driving predictions,” the researchers said. “Even the development of customized deep learning models that are trained, tuned, and tested with the intent of deploying at a single site are not necessarily a solution that can control for potential confounding variables.”