Researchers led by Curtis Langlotz, MD, PhD, professor of radiology and of biomedical informatics research, and Akshay Chaudhari, PhD, assistant professor of radiology and of biomedical data science, have developed RoentGen, an open AI model that produces realistic synthetic X-rays from medical descriptions.

Transcript:

Langlotz:
There’s a huge gap in medical machine learning right now, a data gap, particularly when you have rare diseases, uncommon conditions. We don’t have enough data to train these models, and synthetic data can be a big piece of that puzzle.

Sergios Gatidis:
Dysplastic appearance of the first distal phalanx base, the fourth middle phalanx head, and the metaphyses of the fourth and fifth metacarpals.

Langlotz:
Back when I studied AI in the 1980s, it might take four years and a PhD, and we would develop a system that could work on three or four patients. Today with the right training data, we can build a system in a week that has better accuracy than anything we built back then. The fact that radiology has been a digital specialty now for almost 20 years is one of the reasons I think radiology is leading the way in AI.

Chaudhari:
One of the biggest challenges that we face in radiological AI research is that we usually don’t have large datasets to train our AI models on. To train AI models, we always need to curate large, high-quality datasets. The high quality is sometimes easy, but the large part can be a little bit tricky. Maybe I need to aggregate a million or so chest X-rays, but what if I only have 100,000 images? If we have small datasets, how can we still build high-quality AI models?

Langlotz:
We had students who saw what was happening outside of medicine with models that can produce images based on a text prompt. Show me a kangaroo in Ray-Ban sunglasses in the style of Rembrandt, and it’ll create a painting for you. They wondered what would happen if they asked one of these models what a chest X-ray looks like. What they got back was really kind of a cartoon of a chest X-ray. It didn’t look much like a chest X-ray, so these students retrained that model.

Chaudhari:
As soon as I saw that tool, I was blown away. Can we convert this toy into a useful resource for researchers? Instead of training AI models on real data, could we create synthetic images that resemble what real images might look like? That’s how the RoentGen model came to be. Let’s look at how this model actually works. I’ll start with this image of a dog. Then I’ll add a little bit of normally distributed noise to it. This noise follows a particular pattern that we’ve pre-computed, and all we’re doing is training a simple model to denoise the image and go back to the original. Then you can take this noisy image and add a little bit more noise to it, and we’ll use the same denoising model to denoise this very noisy image and try to get back to the original clean image. We keep progressively adding more and more noise until we get to an image that’s noise only. To generate a new image, we can start off with random noise and run our denoising model, which goes back step by step. But instead of dogs, we used chest X-rays, and we also used the radiology report information to guide how this denoising process works.
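The noising and step-by-step denoising described above can be sketched in a few lines of NumPy. This is a minimal illustration of a DDPM-style diffusion process, not RoentGen's code. The linear schedule (`T`, `betas`) is an assumed stand-in for the pre-computed noise pattern, and the model is trained to predict the added noise, a common parameterization. Because no trained network is available here, the hypothetical `predict_noise` is an analytic oracle that knows one prototype 8x8 "image" per report string; a real model would learn this prediction from many image-report pairs and read the report through a text encoder.

```python
import numpy as np

# Assumed linear noise schedule: the "pre-computed pattern" of noise levels.
T = 200
betas = np.linspace(1e-4, 0.05, T)   # noise variance added at each step
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative fraction of signal kept up to step t

def add_noise(x0, t, rng):
    """Forward process: jump straight to noise level t in closed form."""
    eps = rng.standard_normal(x0.shape)  # normally distributed noise
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

# Hypothetical stand-in for a trained, report-conditioned network: each report
# string maps to one prototype 8x8 "image", so the ideal noise estimate is exact.
PROTOTYPES = {
    "normal chest x-ray": np.zeros((8, 8)),
    "cardiomegaly": np.pad(np.ones((4, 4)), 2),  # bright 4x4 blob in the middle
}

def predict_noise(xt, t, report):
    """Oracle noise prediction; a real model would learn this from data."""
    x0 = PROTOTYPES[report]
    return (xt - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])

def denoising_loss(x0, report, rng):
    """One training example: noise x0 to a random step, score the noise estimate (MSE)."""
    t = rng.integers(0, T)
    xt, eps = add_noise(x0, t, rng)
    return np.mean((predict_noise(xt, t, report) - eps) ** 2)

def sample(report, shape=(8, 8), seed=0):
    """Reverse process: start from pure noise and denoise one step at a time."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t, report)
        # DDPM mean update: remove this step's share of the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)  # fresh noise
    return x

rng = np.random.default_rng(1)
print(denoising_loss(PROTOTYPES["cardiomegaly"], "cardiomegaly", rng))
print(np.round(sample("cardiomegaly"), 2))
```

Because the oracle already knows the answer, the last step lands exactly on the prototype. With a learned network, runs from different starting noise would instead produce different plausible images that match the same report.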

Langlotz:
The purpose of RoentGen would be to produce additional data for training AI software tools that give radiologists working in the clinic additional accuracy, helping them identify disease sooner and catch disease they might otherwise miss. We would then extend it to other areas like CT, MRI, and ultrasound.

Chaudhari:
We can use RoentGen to create synthetic images of what a patient would look like if they had pneumonia. Another research group might be interested in trying to identify cardiomegaly, which is an enlargement of the heart. We just want to be the tide that lifts all boats. We really want to be able to create high-quality datasets that cater to all of these downstream tasks.

Langlotz:
RoentGen could be used to reduce bias in the algorithms we implement by creating synthetic data for subgroups where we don’t have enough training data. That means better privacy for patients. It means more accurate AI models, and it means more responsible implementation of AI algorithms.
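Creating synthetic data for under-represented subgroups starts with deciding how many images each subgroup needs. The sketch below assumes one simple policy, topping every subgroup up to the size of the largest one (or to an explicit `target`). The function name `synthetic_quota` and the subgroup labels are hypothetical; this is an illustration of the bookkeeping, not a description of how the RoentGen team balanced its data.

```python
from collections import Counter

def synthetic_quota(labels, all_groups=(), target=None):
    """Hypothetical helper: how many synthetic images each subgroup needs.

    labels: one subgroup label per real training image.
    all_groups: subgroups to include even if they have no real images.
    target: desired images per subgroup; defaults to the largest subgroup's count.
    """
    counts = Counter(labels)
    for group in all_groups:
        counts.setdefault(group, 0)  # unseen subgroup starts at zero real images
    if target is None:
        target = max(counts.values())
    # Top each subgroup up to the target; never request a negative number.
    return {group: max(0, target - n) for group, n in counts.items()}

# Illustrative, made-up subgroup labels for a small training set.
labels = ["female 18-40"] * 50 + ["male 18-40"] * 80 + ["female 65+"] * 20
print(synthetic_quota(labels, all_groups=["male 65+"]))
# -> {'female 18-40': 30, 'male 18-40': 0, 'female 65+': 60, 'male 65+': 80}
```

Each quota would then become that many prompts for the generator. Subgroups missing from the real data entirely have to be listed in `all_groups`, since there is nothing to count for them.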

Stefania Moroianu:
Hey guys. One thing I wanted to share today about RoentGen v2 is this interactive inference demo, where we can take a closer look at how the model responds to various prompts. One of the cool updates in RoentGen v2 versus v1 is that we introduced demographic information into the prompt, so we can condition on the age, race, and sex of the patient. For example, here we show a normal chest X-ray of a male patient, and if I change the prompt to female, we should see a visible change, with the chest X-ray now corresponding to a female patient.
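Conditioning on demographics in this way amounts to adding attributes to the text prompt. The template below is a hypothetical format, not RoentGen v2's actual prompt syntax, and the `Demographics` and `build_prompt` names are illustrative. It shows how to build a pair of prompts that differ only in sex, which is the comparison made in the demo.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Demographics:
    """Hypothetical container for the patient attributes the prompt is conditioned on."""
    age: int
    sex: str
    race: str

def build_prompt(findings: str, demo: Demographics) -> str:
    """Hypothetical template: demographic fields first, then the report findings."""
    return f"Age: {demo.age}. Sex: {demo.sex}. Race: {demo.race}. {findings}"

# Two prompts that differ only in the sex attribute.
base = Demographics(age=45, sex="male", race="White")
counterfactual = replace(base, sex="female")
findings = "No acute cardiopulmonary process."
prompt_male = build_prompt(findings, base)
prompt_female = build_prompt(findings, counterfactual)
print(prompt_male)
print(prompt_female)
```

Generating both prompts from the same starting noise (the same random seed) keeps everything else in the sampling process fixed, so any difference between the two images can be attributed to the changed attribute.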

Chaudhari:
Stanford is really trying to build advanced AI algorithms to work with health care data. That’s why we’re really excited to do this research and to understand what the capabilities are. I started realizing all the different applications one could have. How can we analyze what’s in these images? Can we predict future diseases a patient might have? Can we help radiologists write their reports? That does seem to be the imminent future, at least for large-scale imaging models. The goal is to combine the potential of these models with what we do in radiology and create a win-win situation.

Langlotz:
I think the open source software movement is incredibly important and is likely to be the future of AI. It really increases the pace of progress and makes it more likely that we’ll build these tools faster and innovate more quickly.

Chaudhari:
And at the end of the day, we don’t just want our models to be used at Stanford. We want our models to help patients all across the world. So if we can open source some of the tools that we’re building, hopefully the whole world can benefit, and that can help translate into better health care solutions.