Trained on 25 billion medical events from 7.2 million patients, APOLLO shows how one AI system can connect scans, notes, lab tests, and diagnoses to forecast disease, guide retrieval, and reveal patterns hidden across decades of care.

Research: A multimodal and temporal foundation model for virtual patient representations at healthcare system scale. Image Credit: delcarmat / Shutterstock


*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

In a recent arXiv preprint*, researchers introduced “APOLLO,” a multimodal temporal foundation model aimed at addressing the gap whereby only about 3% of the world’s roughly 50 PB of annual healthcare data is used to generate clinical insights. The model was trained on the MGB-7M dataset, a massive longitudinal corpus comprising 7.2 million patients and 25.2 billion medical events spanning 28 distinct medical modalities.

Study findings show that the model uses this data to learn high-dimensional “virtual patient representations,” which in testing demonstrated superior performance across 322 clinical tasks, including an AUROC (area under the receiver operating characteristic curve) of 0.92 for predicting schizophrenia onset and a balanced accuracy of 0.97 for predicting in-hospital dialysis dependence.

Healthcare Data Silos and Computable Medicine

Recent global reports reveal that modern healthcare operates under a data paradox: while Electronic Health Records (EHRs) are estimated to capture ~50 petabytes (PB) of healthcare data annually, only 3% of this data is leveraged for research.

Furthermore, an individual patient’s history is typically split between structured data (“modality-specific silos” such as ICD-10 diagnostic codes) and unstructured data (e.g., pathology slides or free-text progress notes), which complicates multidimensional analysis by human scientists.

While modern AI models partially resolve this data processing limitation, most analyze modality-specific silos in isolation, failing to capture the longitudinal “reasoning traces” or subtle multimodal biomarkers that define chronic disease progression.

Consequently, researchers have begun developing a next generation of foundation models designed to ingest the full breadth of EHR records simultaneously, creating an active computational substrate capable of modeling entire patient care journeys over decades.

APOLLO Model Architecture and Training Dataset

The present study documents the development and evaluation of “APOLLO,” a large-scale foundation model. The model uses a transformer-based architecture and was trained on a curated dataset (“MGB-7M”) derived from 33 years of records from 17 institutions within the Mass General Brigham (MGB) system.

MGB-7M includes 1.4 billion laboratory tests, 158 million progress notes, and over 1.1 million medical images, including microscopic histopathology and hematology slides. To make this extensive dataset computationally tractable, APOLLO relies on “tokenization,” wherein each medical event, whether a blood pressure reading or an image patch, is converted into a continuous mathematical embedding.

Specifically, text is encoded using a clinical large language model, while images are processed using medical vision foundation models. These embeddings are then integrated into a common representation space where temporal context is preserved using learnable age-based encodings.
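As a rough illustration of this design (not the authors’ code), the sketch below shows how pre-computed text, image, and lab features might be projected into one shared space and combined with a learnable age-based encoding; the dimensions, module names, and age-binning scheme are assumptions for demonstration only.

```python
# Illustrative sketch only: modality-specific projectors into a shared space,
# plus a learnable age-based temporal encoding. Dimensions and names are assumed.
import torch
import torch.nn as nn

D_MODEL = 512            # width of the common representation space (assumed)
N_AGE_BINS = 1440        # e.g., patient age discretized into months (assumed)

class EventEmbedder(nn.Module):
    def __init__(self, text_dim=768, image_dim=1024, lab_dim=1):
        super().__init__()
        # Projectors map pre-computed encoder outputs (clinical-LLM text
        # embeddings, vision-model image embeddings, scalar lab values)
        # into the shared space, keeping the transformer away from raw data.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(text_dim, D_MODEL),
            "image": nn.Linear(image_dim, D_MODEL),
            "lab": nn.Linear(lab_dim, D_MODEL),
        })
        # Learnable encoding of the patient's age at the time of each event.
        self.age_embedding = nn.Embedding(N_AGE_BINS, D_MODEL)

    def forward(self, modality, features, age_in_months):
        token = self.proj[modality](features)              # (batch, D_MODEL)
        return token + self.age_embedding(age_in_months)   # add temporal context

# Example: embed two lab values recorded at ages 45 and 60 years.
embedder = EventEmbedder()
labs = torch.tensor([[7.1], [5.4]])
ages = torch.tensor([45 * 12, 60 * 12])
print(embedder("lab", labs, ages).shape)   # torch.Size([2, 512])
```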

The model was pretrained using Masked Token Modeling (MTM), which enabled it to reconstruct hidden parts of a patient’s record by analyzing the surrounding longitudinal context. To protect patient data, the architecture isolates the transformer from raw patient data, using modality-specific projectors to reduce the risk of protected health information (PHI) leakage.
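A minimal sketch of the masked-token-modeling idea is shown below, assuming a transformer encoder operating on the shared event embeddings; the masking ratio, null-token choice, and reconstruction loss are illustrative assumptions rather than the paper’s exact recipe.

```python
# Illustrative masked token modeling step (not the authors' implementation):
# hide a fraction of events and train the encoder to reconstruct them from
# the surrounding longitudinal context.
import torch
import torch.nn as nn

def masked_token_modeling_step(encoder, tokens, mask_ratio=0.15):
    """tokens: (batch, seq_len, d_model) sequence of event embeddings."""
    batch, seq_len, _ = tokens.shape
    mask = torch.rand(batch, seq_len, device=tokens.device) < mask_ratio
    corrupted = tokens.clone()
    corrupted[mask] = 0.0                  # replace hidden events with a null token
    reconstructed = encoder(corrupted)     # encoder assumed to be batch-first
    # Loss only on the hidden positions, forcing use of longitudinal context.
    return nn.functional.mse_loss(reconstructed[mask], tokens[mask])

# Example with a small batch-first transformer encoder and random embeddings.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
loss = masked_token_modeling_step(encoder, torch.randn(4, 128, 512))
loss.backward()
```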


APOLLO Clinical Prediction and Retrieval Results

APOLLO’s performance was then evaluated on 261 prognostic and 61 retrieval tasks. The model consistently outperformed traditional statistical baselines that rely solely on age and sex.

In disease-onset prediction, for example, APOLLO significantly outperformed the baseline on 74 of 95 tasks (p < 0.05). Furthermore, it predicted 3-year risk of heart failure with an AUROC of 0.88 (vs. 0.77 baseline) and 3-year risk of Type 2 diabetes with an AUROC of 0.85 (vs. 0.61 baseline, p < 0.0001).
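To make this kind of comparison concrete, the snippet below sketches how a model’s risk scores could be compared with an age-and-sex-only logistic-regression baseline using AUROC, in the spirit of the evaluations reported; the data and variable names are synthetic placeholders, not the study’s cohort.

```python
# Synthetic sketch of an AUROC comparison against an age-and-sex baseline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(30, 90, n)
sex = rng.integers(0, 2, n)
onset_3yr = rng.integers(0, 2, n)                 # placeholder 3-year onset labels

X_baseline = np.column_stack([age, sex])
baseline = LogisticRegression().fit(X_baseline, onset_3yr)
baseline_auroc = roc_auc_score(onset_3yr, baseline.predict_proba(X_baseline)[:, 1])

model_scores = rng.uniform(0, 1, n)               # stand-in for foundation-model risk scores
model_auroc = roc_auc_score(onset_3yr, model_scores)
print(f"baseline AUROC: {baseline_auroc:.2f}  model AUROC: {model_auroc:.2f}")
```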

APOLLO also accurately predicted patients’ 3-year survival after a stroke (AUROC of 0.84 vs. 0.72 for the baseline, p < 0.0001). Similarly, in oncology, the model improved survival prediction for trastuzumab therapy in HER2-positive breast cancer to an AUROC of 0.93, significantly exceeding the baseline of 0.66 (p < 0.0001).

The study’s findings (including ablation evaluations) highlighted the importance of multimodal integration, showing that APOLLO’s mean AUROC for overall cancer progression was 0.735, compared to present-day AI implementations limited to structured data (AUROC = 0.71) or task-specific supervised fine-tuning (AUROC = 0.626).

Finally, APOLLO demonstrated utility as a “medical search engine,” using pathology slide queries to retrieve similar patients from a test database of 1.4 million with high accuracy, even when traditional diagnostic codes were missing.
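Conceptually, this kind of patient retrieval can be framed as nearest-neighbor search over the learned representations; the sketch below, with made-up embeddings and function names, shows the cosine-similarity lookup that such a “medical search engine” implies.

```python
# Illustrative nearest-neighbor retrieval over virtual patient representations.
import numpy as np

def retrieve_similar_patients(query_vec, patient_matrix, top_k=10):
    """patient_matrix: (n_patients, d) array of patient embeddings (assumed)."""
    q = query_vec / np.linalg.norm(query_vec)
    m = patient_matrix / np.linalg.norm(patient_matrix, axis=1, keepdims=True)
    similarity = m @ q                           # cosine similarity to the query
    return np.argsort(-similarity)[:top_k]       # indices of the most similar patients

# Example with random stand-in embeddings for a query slide and a patient database.
rng = np.random.default_rng(0)
database = rng.standard_normal((10_000, 512))
query = rng.standard_normal(512)
print(retrieve_similar_patients(query, database, top_k=5))
```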

Computable Medicine Implications and Limitations

The present study concludes that APOLLO establishes a foundation for “computable medicine,” and represents a step toward shifting healthcare from reactive treatment to proactive risk management.

The model is limited by its observational EHR training data: its predictions are associational rather than causal, its treatment-response analyses stratify risk within cohorts already receiving a given therapy rather than identifying the best treatment for an individual patient, and it cannot estimate the differential efficacy of competing treatments. Even so, the model was shown to compress entire clinical histories into unified digital signatures, enabling precision trial matching and personalized prognostic stratification within the health system studied.


Journal reference:


Preliminary scientific report.
Zhang, A., Ding, T., Wagner, S. J., Tian, C., Lu, M. Y., Pettit, R., Lewis, J. E., Misrahi, A., Mo, D., Le, L. P., & Mahmood, F. (2026). A multimodal and temporal foundation model for virtual patient representations at healthcare system scale. ArXiv. https://arxiv.org/abs/2604.18570