Learning neuroimaging models from health system-scale data

Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).

Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.), Vol. 139 of Proceedings of Machine Learning Research 8748–8763 (PMLR, 2021).

Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.), Vol. 139 of Proceedings of Machine Learning Research 8821–8831 (PMLR, 2021).

Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. In Advances in Neural Information Processing Systems, Vol. 35 (eds Koyejo, S. et al.) 23716–23736 (Curran Associates, 2022).

Dreisbach, J. N. & Lukin, R. Where have all the neuroradiologists gone? AJNR Am. J. Neuroradiol. 22, 1636–1638 (2001).

Rula, E. Y. Radiology workforce shortage and growing demand: something has to give. https://www.acr.org/Practice-Management-Quality-Informatics/ACR-Bulletin/Articles/July-2024/Radiology-Workforce-Shortage-and-Growing-Demand-Something-Has-to-Give (2024).

Christensen, E. W. et al. Association of state share of nonphysician practitioners with diagnostic imaging ordering among emergency department visits for Medicare beneficiaries. JAMA Netw. Open 5, e2241297 (2022).

Fawzy, N. A. et al. Incidence and factors associated with burnout in radiologists: a systematic review. Eur. J. Radiol. Open 11, 100530 (2023).

Krupinski, E. A., Berbaum, K. S., Caldwell, R. T., Schartz, K. M. & Kim, J. Long radiology workdays reduce detection and accommodation accuracy. J. Am. Coll. Radiol. 7, 698–704 (2010).

Ivanovic, V. et al. Neuroradiology diagnostic errors at a tertiary academic centre: effect of participation in tumour boards and physician experience. Clin. Radiol. 77, 607–612 (2022).

Ivanovic, V. et al. Factors associated with neuroradiology diagnostic errors at a large tertiary-care academic medical center: a case-control study. Am. J. Roentgenol. 221, 355–362 (2023).

O’Neill, T. J. et al. Active reprioritization of the reading worklist using artificial intelligence has a beneficial effect on the turnaround time for interpretation of head CT with intracranial hemorrhage. Radiol. Artif. Intell. 3, e200024 (2021).

Shin, H. J., Han, K., Ryu, L. & Kim, E.-K. The impact of artificial intelligence on the reading times of radiologists for chest radiographs. npj Digit. Med. 6, 82 (2023).

Alexander, R. et al. Mandating limits on workload, duty, and speed in radiology. Radiology 304, 274–282 (2022).

DeBenedectis, C. M. et al. Health care disparities in radiology—a review of the current literature. J. Am. Coll. Radiol. 19, 101–111 (2022).

Gauriau, R. et al. A deep learning-based model for detecting abnormalities on brain MR images for triaging: preliminary results from a multisite experience. Radiol. Artif. Intell. 3, e200184 (2021).

Barbano, C. A., Brunello, M., Dufumier, B. & Grangetto, M. Anatomical foundation models for brain MRIs. Pattern Recognit. Lett. 199, 178–184 (2026).

OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/pdf/2303.08774 (2023).

Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10684–10695 (2022).

Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. In 9th International Conference on Learning Representations (OpenReview.net, 2021).

Darcet, T., Oquab, M., Mairal, J. & Bojanowski, P. Vision transformers need registers. In The Twelfth International Conference on Learning Representations (eds Kim, B. et al.) 2632–2652 (2024).

Zhang, K. et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 181, 1423–1433.e11 (2020).

Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406 (2022).

Article
PubMed
PubMed Central

Google Scholar

Bannur, S. et al. Learning to exploit temporal structure for biomedical vision-language processing. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 15016–15027 (2023).

Wang, Y. et al. Enhancing vision-language models for medical imaging: bridging the 3D gap with innovative slice selection. Adv. Neural Inf. Process. Syst. 37, 99947–99964 (2024).

Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).

Radford, A. et al. Language models are unsupervised multitask learners. OpenAI blog 1, 9 (2019).

Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. In Proc. 37th International Conference on Neural Information Processing Systems 34892–34916 (2023).

Eslami, S., Meinel, C. & De Melo, G. PubMedCLIP: how much does CLIP benefit visual question answering in the medical domain? In Findings of the Association for Computational Linguistics: EACL 2023 (eds Vlachos, A. & Augenstein, I.) 1151–1163 (ACL, 2023).

Zhang, S. et al. BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. Preprint at https://arxiv.org/abs/2303.00915 (2023).

Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. In Proc. 3rd Machine Learning for Health Symposium (eds Hegselmann, S. et al.) 353–367 (PMLR, 2023).

Kaplan, J. et al. Scaling laws for neural language models. Preprint at https://arxiv.org/abs/2001.08361 (2020).

Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. In Proc. 34th International Conference on Machine Learning 1321–1330 (PMLR, 2017).

Di Martino, A. et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol. Psychiatry 19, 659–667 (2014).

Petersen, R. C. et al. Alzheimer’s disease neuroimaging initiative (ADNI) clinical characterization. Neurology 74, 201–209 (2010).

Marcus, D. S. et al. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19, 1498–1507 (2007).

Lee, J. et al. Deep learning-based brain age prediction in normal aging and dementia. Nat. Aging 2, 412–424 (2022).

Bashyam, V. M. et al. MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain 143, 2312–2324 (2020).

Baid, U. et al. The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification. Preprint at https://arxiv.org/abs/2107.02314 (2021).

Rudie, J. D. et al. The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI dataset. Radiol. Artif. Intell. 6, e230126 (2024).

Oermann, E. et al. Longitudinal deep neural networks for assessing metastatic brain cancer on a massive open benchmark. Nat. Commun. 15, 8170 (2024).

Liu, C.-F. et al. A large public dataset of annotated clinical MRIs and metadata of patients with acute stroke. Sci. Data 10, 548 (2023).

Wiens, J. et al. Do no harm: a roadmap for responsible machine learning for health care. Nat. Med. 25, 1337–1340 (2019).

Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should I trust you?’ Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B. et al.) 1135–1144 (2016).

Smith, J. S. et al. Role of extent of resection in the long-term outcome of low-grade hemispheric gliomas. J. Clin. Oncol. 26, 1338–1345 (2008).

Waite, S., Scott, J. & Colombo, D. Narrowing the gap: imaging disparities in radiology. Radiology 299, 27–35 (2021).

Barocas, S., Hardt, M. & Narayanan, A. Fairness and Machine Learning: Limitations and Opportunities (MIT Press, 2023).

Rajpurkar, P. & Topol, E. J. A clinical certification pathway for generalist medical AI systems. Lancet 405, 20 (2025).

Ivanovic, V. et al. Impact of shift volume on neuroradiology diagnostic errors at a large tertiary academic center. Acad. Radiol. 30, 1584–1588 (2023).

Babiarz, L. S. & Yousem, D. M. Quality control in neuroradiology: discrepancies in image interpretation among academic neuroradiologists. AJNR Am. J. Neuroradiol. 33, 37–42 (2012).

Wu, M. Z., McInnes, M. D. F., Macdonald, D. B., Kielar, A. Z. & Duigenan, S. CT in adults: systematic review and meta-analysis of interpretation discrepancy rates. Radiology 270, 717–735 (2014).

Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).

Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. In Machine Learning for Health (ML4H) 353–367 (PMLR, 2023).

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Blankemeier, L. et al. Merlin: a vision language foundation model for 3D computed tomography. Preprint at https://www.researchsquare.com/article/rs-4546309/v1 (2024).

Elliott, L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216 (2018).

Kickingereder, P. et al. Automated quantitative tumour response assessment of MRI in neuro-oncology with artificial neural networks: a multicentre, retrospective study. Lancet Oncol. 20, 728–740 (2019).

Wood, D. A. et al. A self-supervised text-vision framework for automated brain abnormality detection. Preprint at https://arxiv.org/abs/2405.02782 (2024).

Ghosh, S., Poynton, C. B., Visweswaran, S. & Batmanghelich, K. Mammo-CLIP: a vision language foundation model to enhance data efficiency and robustness in mammography. In Proc. International Conference on Medical Image Computing and Computer-assisted Intervention 632–642 (Springer, 2024).

van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) (2017).

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with CLIP latents. Preprint at https://arxiv.org/pdf/2204.06125 (2022).

Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

Chien, A. et al. AI-assisted summarization of radiologic reports: evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical. AJNR Am. J. Neuroradiol. 45, 244–248 (2024).

Ranjit, M., Ganapathy, G., Manuel, R. & Ganu, T. Retrieval augmented chest X-ray report generation using OpenAI GPT models. In Proc. Machine Learning for Healthcare Conference (eds Deshpande, K. et al.) 650–666 (PMLR, 2023).

Adams, L. C. et al. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology 307, e230725 (2023).

Titano, J. J. et al. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat. Med. 24, 1337–1341 (2018).

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems (eds Guyon, I. et al.) Vol. 30 (Curran Associates, Inc., 2017).

Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52, 91–118 (2003).

Kondepudi, A. et al. Foundation models for fast, label-free detection of glioma infiltration. Nature 637, 439–445 (2025).

Cheng, J., Wang, Z. & Pollastri, G. A neural network approach to ordinal regression. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) 1279–1284 (2008).

Saratxaga, C. L. et al. MRI deep learning-based solution for Alzheimer’s disease prediction. J. Pers. Med. 11, 902 (2021).

Li, J., Li, D., Savarese, S. & Hoi, S. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In Proc. International Conference on Machine Learning 19730–19742 (PMLR, 2023).

Chen, Q. & Hong, Y. MedBLIP: bootstrapping language-image pre-training from 3D medical images and texts. In Proc. Asian Conference on Computer Vision (eds Cho, M. et al.) 2404–2420 (2024).

Liu, H., Li, C., Li, Y. & Lee, Y. J. Improved baselines with visual instruction tuning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 26296–26306 (2024).

Li, C. et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In Advances in Neural Information Processing Systems, Vol. 36 (eds Oh, A. et al.) 28541–28564 (Curran Associates, Inc., 2023).

Zhu, C., Wang, T., Zhang, W., Pang, J. & Liu, X. LLaVA-3D: a simple yet effective pathway to empowering LMMs with 3D-awareness. In Proc. IEEE/CVF International Conference on Computer Vision 4295–4305 (2025).

Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, Vol. 29 (eds Lee, D. et al.) (2016).

Vaidya, A. et al. Demographic bias in misdiagnosis by computational pathology models. Nat. Med. 30, 1174–1190 (2024).
