Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.), Vol. 139 of Proceedings of Machine Learning Research 8748–8763 (PMLR, 2021).
Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.), Vol. 139 of Proceedings of Machine Learning Research 8821–8831 (PMLR, 2021).
Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. In Advances in Neural Information Processing Systems, Vol. 35 (eds Koyejo, S. et al.) 23716–23736 (Curran Associates, 2022).
Dreisbach, J. N. & Lukin, R. Where have all the neuroradiologists gone? AJNR Am. J. Neuroradiol. 22, 1636–1638 (2001).
Rula, E. Y. Radiology workforce shortage and growing demand: something has to give. https://www.acr.org/Practice-Management-Quality-Informatics/ACR-Bulletin/Articles/July-2024/Radiology-Workforce-Shortage-and-Growing-Demand-Something-Has-to-Give (2024).
Christensen, E. W. et al. Association of state share of nonphysician practitioners with diagnostic imaging ordering among emergency department visits for Medicare beneficiaries. JAMA Netw. Open 5, e2241297 (2022).
Fawzy, N. A. et al. Incidence and factors associated with burnout in radiologists: a systematic review. Eur. J. Radiol. Open 11, 100530 (2023).
Krupinski, E. A., Berbaum, K. S., Caldwell, R. T., Schartz, K. M. & Kim, J. Long radiology workdays reduce detection and accommodation accuracy. J. Am. Coll. Radiol. 7, 698–704 (2010).
Ivanovic, V. et al. Neuroradiology diagnostic errors at a tertiary academic centre: effect of participation in tumour boards and physician experience. Clin. Radiol. 77, 607–612 (2022).
Ivanovic, V. et al. Factors associated with neuroradiology diagnostic errors at a large tertiary-care academic medical center: a case-control study. Am. J. Roentgenol. 221, 355–362 (2023).
O’Neill, T. J. et al. Active reprioritization of the reading worklist using artificial intelligence has a beneficial effect on the turnaround time for interpretation of head CT with intracranial hemorrhage. Radiol. Artif. Intell. 3, e200024 (2021).
Shin, H. J., Han, K., Ryu, L. & Kim, E.-K. The impact of artificial intelligence on the reading times of radiologists for chest radiographs. npj Digit. Med. 6, 82 (2023).
Alexander, R. et al. Mandating limits on workload, duty, and speed in radiology. Radiology 304, 274–282 (2022).
DeBenedectis, C. M. et al. Health care disparities in radiology—a review of the current literature. J. Am. Coll. Radiol. 19, 101–111 (2022).
Gauriau, R. et al. A deep learning-based model for detecting abnormalities on brain MR images for triaging: preliminary results from a multisite experience. Radiol. Artif. Intell. 3, e200184 (2021).
Barbano, C. A., Brunello, M., Dufumier, B. & Grangetto, M. Anatomical foundation models for brain MRIs. Pattern Recognit. Lett. 199, 178–184 (2026).
OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/pdf/2303.08774 (2023).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10684–10695 (2022).
Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. In 9th International Conference on Learning Representations (OpenReview.net, 2021).
Darcet, T., Oquab, M., Mairal, J. & Bojanowski, P. Vision transformers need registers. In The Twelfth International Conference on Learning Representations (eds Kim, B. et al.) 2632–2652 (2024).
Zhang, K. et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 181, 1423–1433.e11 (2020).
Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406 (2022).
Bannur, S. et al. Learning to exploit temporal structure for biomedical vision-language processing. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 15016–15027 (2023).
Wang, Y. et al. Enhancing vision-language models for medical imaging: bridging the 3D gap with innovative slice selection. Adv. Neural Inf. Process. Syst. 37, 99947–99964 (2024).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Radford, A. et al. Language models are unsupervised multitask learners. OpenAI blog 1, 9 (2019).
Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. In Proc. 37th International Conference on Neural Information Processing Systems 34892–34916 (2023).
Eslami, S., Meinel, C. & De Melo, G. PubMedCLIP: how much does CLIP benefit visual question answering in the medical domain? In Findings of the Association for Computational Linguistics: EACL 2023 (eds Vlachos, A. & Augenstein, I.) 1151–1163 (ACL, 2023).
Zhang, S. et al. BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. Preprint at https://arxiv.org/abs/2303.00915 (2023).
Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. In Proc. 3rd Machine Learning for Health Symposium (eds Hegselmann, S. et al.) 353–367 (PMLR, 2023).
Kaplan, J. et al. Scaling laws for neural language models. Preprint at https://arxiv.org/abs/2001.08361 (2020).
Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. In Proc. 34th International Conference on Machine Learning 1321–1330 (PMLR, 2017).
Di Martino, A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19, 659–667 (2014).
Petersen, R. C. et al. Alzheimer’s disease neuroimaging initiative (ADNI) clinical characterization. Neurology 74, 201–209 (2010).
Marcus, D. S. et al. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19, 1498–1507 (2007).
Lee, J. et al. Deep learning-based brain age prediction in normal aging and dementia. Nat. Aging 2, 412–424 (2022).
Bashyam, V. M. et al. MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain 143, 2312–2324 (2020).
Baid, U. et al. The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification. Preprint at https://arxiv.org/abs/2107.02314 (2021).
Rudie, J. D. et al. The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI dataset. Radiol. Artif. Intell. 6, e230126 (2024).
Oermann, E. et al. Longitudinal deep neural networks for assessing metastatic brain cancer on a massive open benchmark. Nat. Commun. 15, 8170 (2024).
Liu, C.-F. et al. A large public dataset of annotated clinical MRIs and metadata of patients with acute stroke. Sci. Data 10, 548 (2023).
Wiens, J. et al. Do no harm: a roadmap for responsible machine learning for health care. Nat. Med. 25, 1337–1340 (2019).
Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should I trust you?’ Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B. et al.) 1135–1144 (2016).
Smith, J. S. et al. Role of extent of resection in the long-term outcome of low-grade hemispheric gliomas. J. Clin. Oncol. 26, 1338–1345 (2008).
Waite, S., Scott, J. & Colombo, D. Narrowing the gap: imaging disparities in radiology. Radiology 299, 27–35 (2021).
Barocas, S., Hardt, M. & Narayanan, A. Fairness and Machine Learning: Limitations and Opportunities (MIT Press, 2023).
Rajpurkar, P. & Topol, E. J. A clinical certification pathway for generalist medical AI systems. Lancet 405, 20 (2025).
Ivanovic, V. et al. Impact of shift volume on neuroradiology diagnostic errors at a large tertiary academic center. Acad. Radiol. 30, 1584–1588 (2023).
Babiarz, L. S. & Yousem, D. M. Quality control in neuroradiology: discrepancies in image interpretation among academic neuroradiologists. AJNR Am. J. Neuroradiol. 33, 37–42 (2012).
Wu, M. Z., McInnes, M. D. F., Macdonald, D. B., Kielar, A. Z. & Duigenan, S. CT in adults: systematic review and meta-analysis of interpretation discrepancy rates. Radiology 270, 717–735 (2014).
Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).
Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. In Machine Learning for Health (ML4H) 353–367 (PMLR, 2023).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Blankemeier, L. et al. Merlin: a vision language foundation model for 3D computed tomography. Preprint at https://www.researchsquare.com/article/rs-4546309/v1 (2024).
Elliott, L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216 (2018).
Kickingereder, P. et al. Automated quantitative tumour response assessment of MRI in neuro-oncology with artificial neural networks: a multicentre, retrospective study. Lancet Oncol. 20, 728–740 (2019).
Wood, D. A. et al. A self-supervised text-vision framework for automated brain abnormality detection. Preprint at https://arxiv.org/abs/2405.02782 (2024).
Ghosh, S., Poynton, C. B., Visweswaran, S. & Batmanghelich, K. Mammo-CLIP: a vision language foundation model to enhance data efficiency and robustness in mammography. In Proc. International Conference on Medical Image Computing and Computer-assisted Intervention 632–642 (Springer, 2024).
van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) (2017).
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with CLIP latents. Preprint at https://arxiv.org/pdf/2204.06125 (2022).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Chien, A. et al. AI-assisted summarization of radiologic reports: evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical. AJNR Am. J. Neuroradiol. 45, 244–248 (2024).
Ranjit, M., Ganapathy, G., Manuel, R. & Ganu, T. Retrieval augmented chest X-ray report generation using OpenAI GPT models. In Proc. Machine Learning for Healthcare Conference (eds Deshpande, K. et al.) 650–666 (PMLR, 2023).
Adams, L. C. et al. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology 307, e230725 (2023).
Titano, J. J. et al. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat. Med. 24, 1337–1341 (2018).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) Vol. 30 (Curran Associates, Inc., 2017).
Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52, 91–118 (2003).
Kondepudi, A. et al. Foundation models for fast, label-free detection of glioma infiltration. Nature 637, 439–445 (2025).
Cheng, J., Wang, Z. & Pollastri, G. A neural network approach to ordinal regression. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) 1279–1284 (2008).
Saratxaga, C. L. et al. MRI deep learning-based solution for Alzheimer’s disease prediction. J. Pers. Med. 11, 902 (2021).
Li, J., Li, D., Savarese, S. & Hoi, S. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In Proc. International Conference on Machine Learning 19730–19742 (PMLR, 2023).
Chen, Q. & Hong, Y. MedBLIP: bootstrapping language-image pre-training from 3D medical images and texts. In Proc. Asian Conference on Computer Vision (eds Cho, M. et al.) 2404–2420 (2024).
Liu, H., Li, C., Li, Y. & Lee, Y. J. Improved baselines with visual instruction tuning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 26296–26306 (2024).
Li, C. et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In Advances in Neural Information Processing Systems, Vol. 36 (eds Oh, A. et al.) 28541–28564 (Curran Associates, Inc., 2023).
Zhu, C., Wang, T., Zhang, W., Pang, J. & Liu, X. LLaVA-3D: a simple yet effective pathway to empowering LMMs with 3D-awareness. In Proc. IEEE/CVF International Conference on Computer Vision 4295–4305 (2025).
Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, Vol. 29 (eds Lee, D. et al.) (2016).
Vaidya, A. et al. Demographic bias in misdiagnosis by computational pathology models. Nat. Med. 30, 1174–1190 (2024).