Agusti, A., Vogelmeier, C. F. & Halpin, D. M. G. Tackling the global burden of lung disease through prevention and early diagnosis. Lancet Respir. Med. 10, 1013–1015 (2022).

Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).

Chen, Z., Song, Y., Chang, T.-H. & Wan, X. Generating radiology reports via memory-driven transformer. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Webber, B. et al.) 1439–1449 (ACL, 2020); https://doi.org/10.18653/v1/2020.emnlp-main.112

OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Kirillov, A. et al. Segment anything. In Proc. IEEE/CVF International Conference on Computer Vision 4015–4026 (IEEE, 2023).

Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat. Med. 28, 31–38 (2022).

Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, 654 (2024).

Lei, W., Wei, X., Zhang, X., Li, K. & Zhang, S. MedLSAM: localize and segment anything model for 3D medical images. Med. Image Anal. 99, 103370 (2025).

Zhang, X., Wu, C., Zhang, Y., Xie, W. & Wang, Y. Knowledge-enhanced visual-language pre-training on chest radiology images. Nat. Commun. 14, 4542 (2023).

Zhang, S. et al. A multimodal biomedical foundation model trained from fifteen million image–text pairs. NEJM AI 2, AIoa2400640 (2025).

Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).

Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).

Topol, E. J. As artificial intelligence goes multimodal, medical applications multiply. Science 381, adk6139 (2023).

National Academies of Sciences, Engineering, and Medicine. Improving Diagnosis in Health Care (The National Academies Press, 2015).

Azizi, S. et al. Big self-supervised models advance medical image classification. In Proc. IEEE/CVF International Conference on Computer Vision 3458–3468 (IEEE, 2021); https://doi.org/10.1109/ICCV48922.2021.00346

Hosseinzadeh Taher, M. R., Haghighi, F., Gotway, M. B. & Liang, J. CAiD: context-aware instance discrimination for self-supervised learning in medical imaging. Proc. Mach. Learn. Res. 172, 535–551 (2022).

Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 6000–6010 (Curran Associates, 2017).

The National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 365, 395–409 (2011).

Morozov, S. P. et al. MosMedData: chest CT scans with COVID-19 related findings dataset. Preprint at https://arxiv.org/abs/2005.06465 (2020).

Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

Eslami, S., Meinel, C. & De Melo, G. PubMedCLIP: how much does CLIP benefit visual question answering in the medical domain? In Proc. Findings of the Association for Computational Linguistics (eds Vlachos, A. & Augenstein, I.) 1151–1163 (ACL, 2023).

Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. In Proc. 3rd Machine Learning for Health Symposium (eds Hegselmann, S. et al.) 353–367 (PMLR, 2023).

Li, C. et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In Proc. 37th Conference on Neural Information Processing Systems 28541–28564 (Curran Associates, 2023).

Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (OpenReview.net, 2021).

Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (eds Isabelle, P. et al.) 311–318 (ACL, 2002); https://doi.org/10.3115/1073083.1073135

Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Proc. Text Summarization Branches Out 74–81 (ACL, 2004).

Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proc. ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (eds Goldstein, J. et al.) 65–72 (ACL, 2005).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (ACL, 2019); https://doi.org/10.18653/v1/N19-1423

Fedus, W., Zoph, B. & Shazeer, N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 23, 1–39 (2022).

Houlsby, N. et al. Parameter-efficient transfer learning for NLP. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 2790–2799 (PMLR, 2019).

Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, bbac409 (2022).

Yang, X. et al. A large language model for electronic health records. npj Digit. Med. 5, 194 (2022).

Rasmy, L., Xiang, Y., Xie, Z., Tao, C. & Zhi, D. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit. Med. 4, 86 (2021).

Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).

Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).

Liu, S. et al. Multimodal data matters: language model pre-training over structured and unstructured electronic health records. IEEE J. Biomed. Health Inform. 27, 504–514 (2023).

Peiris, H., Hayat, M., Chen, Z., Egan, G. & Harandi, M. Uncertainty-guided dual-views for semi-supervised volumetric medical image segmentation. Nat. Mach. Intell. 5, 724–738 (2023).

Zhou, L. et al. Self pre-training with masked autoencoders for medical image classification and segmentation. In Proc. IEEE 20th International Symposium on Biomedical Imaging 1–6 (IEEE, 2023).

Hu, X., Xu, X. & Shi, Y. How to efficiently adapt large segmentation model (SAM) to medical images. Preprint at https://arxiv.org/abs/2306.13731 (2023).

Qiu, Z., Hu, Y., Li, H. & Liu, J. Learnable ophthalmology SAM. Preprint at https://arxiv.org/abs/2304.13425 (2023).

Cao, H. et al. Swin-Unet: Unet-like pure transformer for medical image segmentation. In Proc. Computer Vision–ECCV 2022 Workshops (eds Karlinsky, L. et al.) 205–218 (Springer, 2023).

Schäfer, R. et al. Overcoming data scarcity in biomedical imaging with a foundational multi-task model. Nat. Comput. Sci. 4, 495–509 (2024).

Pai, S. et al. Foundation model for cancer imaging biomarkers. Nat. Mach. Intell. 6, 354–367 (2024).

Tu, T. et al. Towards generalist biomedical AI. NEJM AI 1, AIoa2300138 (2024).

Zhou, H.-Y. et al. Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nat. Mach. Intell. 4, 32–40 (2022).

Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical twitter. Nat. Med. 29, 2307–2316 (2023).

Zhang, K. et al. A generalist vision–language foundation model for diverse biomedical tasks. Nat. Med. https://doi.org/10.1038/s41591-024-03185-2 (2024).

Zhou, H.-Y., Adithan, S., Acosta, J. N., Topol, E. J. & Rajpurkar, P. MedVersa: a generalist foundation model for medical image interpretation. Preprint at https://arxiv.org/abs/2405.07988 (2025).

Yang, J. et al. Poisoning medical knowledge using large language models. Nat. Mach. Intell. 6, 1156–1168 (2024).

Jin, C. et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat. Commun. 11, 5088 (2020).

Chen, X., Fan, H., Girshick, R. & He, K. Improved baselines with momentum contrastive learning. Preprint at https://arxiv.org/abs/2003.04297 (2020).

Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 1597–1607 (PMLR, 2020).

He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9726–9735 (IEEE, 2020).

van den Oord, A., Li, Y. & Vinyals, O. Representation learning with contrastive predictive coding. Preprint at https://arxiv.org/abs/1807.03748 (2019).

He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 https://doi.org/10.1109/CVPR52688.2022.01553 (IEEE, 2022).

Brody, S., Alon, U. & Yahav, E. How attentive are graph attention networks? In Proc. International Conference on Learning Representations (OpenReview.net, 2022).

Pelka, O., Koitka, S., Rückert, J., Nensa, F. & Friedrich, C. M. Radiology objects in context (ROCO): a multimodal image dataset. In Proc. Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis (eds Stoyanov, D. et al.) 180–189 (Springer, 2018).

Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. In Proc. 37th Conference on Neural Information Processing Systems 34892–34916 (Curran Associates, 2023).

Abnar, S. & Zuidema, W. Quantifying attention flow in transformers. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 4190–4197 (ACL, 2020); https://doi.org/10.18653/v1/2020.acl-main.385

Chefer, H., Gur, S. & Wolf, L. Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In Proc. IEEE/CVF International Conference on Computer Vision 387–396 (IEEE, 2021); https://doi.org/10.1109/ICCV48922.2021.00045

Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Proc. International Conference on Learning Representations (OpenReview.net, 2019).

Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 8024–8035 (Curran Associates, 2019).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Ma, L. D. et al. MedMPT. GitHub https://github.com/maliangdi/MedMPT (2025).