Rawal, S. et al. Association between limited English proficiency and revisits and readmissions after hospitalization for patients with acute and chronic conditions in Toronto, Ontario, Canada. JAMA 322, 1605–1607 (2019).
Lion, K. C., Lin, Y.-H. & Kim, T. Artificial intelligence for language translation: the equity is in the details. JAMA 332, 1427–1428 (2024).
Flores, G. The impact of medical interpreter services on the quality of health care: a systematic review. Med. Care Res. Rev. 62, 255–299 (2005).
Schulson, L. B. & Anderson, T. S. National estimates of professional interpreter use in the ambulatory setting. J. Gen. Intern. Med. 37, 472–474 (2022).
Diamond, L. C., Schenker, Y., Curry, L., Bradley, E. H. & Fernandez, A. Getting by: underuse of interpreters by resident physicians. J. Gen. Intern. Med. 24, 256–262 (2009).
Detz, A. et al. Language concordance, interpersonal care, and diabetes self-care in rural Latino patients. J. Gen. Intern. Med. 29, 1650–1656 (2014).
Betancourt, J. R., Green, A. R., Carrillo, J. E. & Ananeh-Firempong, O. Defining cultural competence: a practical framework for addressing racial/ethnic disparities in health and health care. Public Health Rep. 118, 293–302 (2003).
Molina, R. L. & Kasper, J. The power of language-concordant care: a call to action for medical schools. BMC Med. Educ. 19, 378 (2019).
Harvey, S. M., Branch, M. R., Hudson, D. & Torres, A. Listening to immigrant Latino men in rural Oregon: exploring connections between culture and sexual and reproductive health services. Am. J. Mens Health 7, 142–154 (2013).
Gavvala, S. Ensuring understanding: Language-concordant discharge instructions. Rice Univ. Baker Inst. Public Policy, Issue Brief. https://doi.org/10.25613/cayx-wc08 (2023).
Karpińska, P. Computer aided translation – possibilities, limitations and changes in the field of professional translation. J. Educ. Cult. Soc. 8, 133–142 (2017).
Davis, S. H. et al. Translating discharge instructions for limited English-proficient families: strategies and barriers. Hosp. Pediatr. 9, 779–787 (2019).
Choe, A. Y. et al. Improving discharge instructions for hospitalized children with limited English proficiency. Hosp. Pediatr. 11, 1213–1222 (2021).
Diamond, L. C., Wilson-Stronks, A. & Jacobs, E. A. Do hospitals measure up to the national culturally and linguistically appropriate services standards? Med. Care 48, 1080–1087 (2010).
Office for Civil Rights (OCR). Summary of Guidance to Federal Financial Assistance Recipients Regarding Title VI and the prohibition against national origin discrimination affecting limited English proficient persons. https://www.hhs.gov/civil-rights/for-providers/laws-regulations-guidance/guidance-federal-financial-assistance-title-vi/index.html (2007).
Wu, Y. et al. Google’s neural machine translation system: bridging the gap between human and machine translation. Preprint at https://doi.org/10.48550/arXiv.1609.08144 (2016).
Koehn, P. & Knowles, R. Six challenges for neural machine translation. In Proc. First Workshop on Neural Machine Translation (eds. Luong, T., Birch, A., Neubig, G. & Finch, A.) 28–39 (Association for Computational Linguistics, Vancouver, 2017). https://doi.org/10.18653/v1/W17-3204.
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).
Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).
Brewster, R. C. L. et al. Performance of ChatGPT and Google Translate for Pediatric Discharge Instruction Translation. Pediatrics 154, e2023065573 (2024).
Ortega, J. E., Castro Mamani, R. & Cho, K. Neural machine translation with a polysynthetic low resource language. Mach. Transl. 34, 325–346 (2020).
Adebara, I., Abdul-Mageed, M. & Silfverberg, M. Linguistically-Motivated Yorùbá-English Machine Translation. In Proc. 29th International Conference on Computational Linguistics (eds. Calzolari, N. et al.) 5066–5075 (International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 2022).
Goh, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat. Med. 1–6 https://doi.org/10.1038/s41591-024-03456-y (2025).
Savage, T. et al. Fine tuning large language models for medicine: the role and importance of direct preference optimization. Preprint at https://doi.org/10.48550/arXiv.2409.12741 (2024).
Mirza, F. N. et al. Using ChatGPT to facilitate truly informed medical consent. NEJM AI 1, AIcs2300145 (2024).
Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).
Zaretsky, J. et al. Generative artificial intelligence to transform inpatient discharge summaries to patient-friendly language and format. JAMA Netw. Open 7, e240357 (2024).
Nondiscrimination in Health Programs and Activities. Federal Register https://www.federalregister.gov/documents/2024/05/06/2024-08711/nondiscrimination-in-health-programs-and-activities (2024).
Damschroder, L. J., Reardon, C. M., Widerquist, M. A. O. & Lowery, J. The updated consolidated framework for implementation research based on user feedback. Implement. Sci. 17, 75 (2022).
Xu, Z., Jain, S. & Kankanhalli, M. Hallucination is inevitable: an innate limitation of large language models. Preprint at http://arxiv.org/abs/2401.11817 (2024).
Liu, N. F. et al. Lost in the middle: how language models use long contexts. Trans. Assoc. Comput. Linguist. 12, 157–173 (2024).
Levy, A., Agrawal, M., Satyanarayan, A. & Sontag, D. Assessing the impact of automated suggestions on decision making: domain experts mediate model errors but take less initiative. In Proc. 2021 CHI Conference on Human Factors in Computing Systems 1–13 (Association for Computing Machinery, New York, NY, USA, 2021). https://doi.org/10.1145/3411764.3445522.
Kuperman, G. J. et al. Medication-related clinical decision support in computerized provider order entry systems: a review. J. Am. Med. Inform. Assoc. 14, 29–40 (2007).
Data controls in the OpenAI platform – OpenAI API. https://platform.openai.com.
Ng, M. Y., Helzer, J., Pfeffer, M. A., Seto, T. & Hernandez-Boussard, T. Development of secure infrastructure for advancing generative AI research in healthcare at an academic medical center. Res. Sq. rs.3.rs-5095287 https://doi.org/10.21203/rs.3.rs-5095287/v1 (2024).
Vedula, K. S. et al. Distilling large language models for efficient clinical information extraction. Preprint at https://doi.org/10.48550/arXiv.2501.00031 (2024).
Woods, A. P. et al. Limited English proficiency and clinical outcomes after hospital-based care in English-speaking countries: a systematic review. J. Gen. Intern. Med. 37, 2050–2061 (2022).
Manuel, S. P., Nguyen, K., Karliner, L. S., Ward, D. T. & Fernandez, A. Association of English language proficiency with hospitalization cost, length of stay, disposition location, and readmission following total joint arthroplasty. JAMA Netw. Open 5, e221842 (2022).
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. In Proc. 36th International Conference on Neural Information Processing Systems 22199–22213 (Curran Associates Inc., Red Hook, NY, USA, 2022).
Bakken, S. AI in health: keeping the human in the loop. J. Am. Med. Inform. Assoc. 30, 1225–1226 (2023).
Swaminathan, A. et al. Natural language processing system for rapid detection and intervention of mental health crisis chat messages. NPJ Digit. Med. 6, 1–9 (2023).
BigQuery enterprise data warehouse. Google Cloud https://cloud.google.com/bigquery.
The Snowflake AI Data Cloud – Mobilize Data, Apps, and AI. https://www.snowflake.com/content/snowflake-site/global/en.
Create Your Azure Free Account Or Pay As You Go | Microsoft Azure. https://azure.microsoft.com/en-us/pricing/purchase-options/azure-account/search.
Zhang, X., Rajabi, N., Duh, K. & Koehn, P. Machine Translation with Large Language Models: Prompting, Few-shot Learning, and Fine-tuning with QLoRA. In Proc. Eighth Conference on Machine Translation (eds. Koehn, P., Haddow, B., Kocmi, T. & Monz, C.) 468–481 (Association for Computational Linguistics, Singapore, 2023). https://doi.org/10.18653/v1/2023.wmt-1.43.
Rafailov, R. et al. Direct preference optimization: your language model is secretly a reward model. In Proc. 37th International Conference on Neural Information Processing Systems 53728–53741 (Curran Associates Inc., Red Hook, NY, USA, 2023).
Looker Studio. Google for Developers https://developers.google.com/looker-studio.
Lommel, A. R., Burchardt, A. & Uszkoreit, H. Multidimensional quality metrics: a flexible system for assessing translation quality. In Proc. Translating and the Computer 35 (Aslib, London, UK, 2013).
Chen, X., Acosta, S. & Barry, A. E. Evaluating the accuracy of Google translate for diabetes education material. JMIR Diabetes 1, e3 (2016).
Lopez, I., Haredasht, F. N., Caoili, K., Chen, J. H. & Chaudhari, A. Embedding-driven diversity sampling to improve few-shot synthetic data generation. Preprint at https://doi.org/10.48550/arXiv.2501.11199 (2025).
Popović, M. chrF++: words helping character n-grams. In Proc. Second Conference on Machine Translation (eds. Bojar, O. et al.) 612–618 (Association for Computational Linguistics, Copenhagen, Denmark, 2017). https://doi.org/10.18653/v1/W17-4770.
Rei, R., Stewart, C., Farinha, A. C. & Lavie, A. COMET: A Neural Framework for MT Evaluation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds. Webber, B., Cohn, T., He, Y. & Liu, Y.) 2685–2702 (Association for Computational Linguistics, Online, 2020). https://doi.org/10.18653/v1/2020.emnlp-main.213.
Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (eds. Isabelle, P., Charniak, E. & Lin, D.) 311–318 (Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002). https://doi.org/10.3115/1073083.1073135.
Mathur, N., Baldwin, T. & Cohn, T. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds. Jurafsky, D., Chai, J., Schluter, N. & Tetreault, J.) 4984–4997 (Association for Computational Linguistics, Online, 2020). https://doi.org/10.18653/v1/2020.acl-main.448.
Lopez, I. et al. Clinical entity augmented retrieval for clinical information extraction. NPJ Digit. Med. 8, 1–11 (2025).
Swaminathan, A. et al. Selective prediction for extracting unstructured clinical data. J. Am. Med. Inform. Assoc. 31, 188–197 (2024).
Bates, B. A. et al. Validity of International Classification of Diseases (ICD)-10 diagnosis codes for identification of acute heart failure hospitalization and heart failure with reduced versus preserved ejection fraction in a national Medicare sample. Circ. Cardiovasc. Qual. Outcomes 16, e009078 (2023).
Gothe, H. et al. Algorithms to identify COPD in health systems with and without access to ICD coding: a systematic review. BMC Health Serv. Res. 19, 737 (2019).
Shoemaker, S. J., Wolf, M. S. & Brach, C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ. Couns. 96, 395–403 (2014).
Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).
Carrell, D. et al. Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text. J. Am. Med. Inform. Assoc. 20, 342–348 (2013).
National Standards for Culturally and Linguistically Appropriate Services (CLAS) in Health and Health Care. Federal Register https://www.federalregister.gov/documents/2013/09/24/2013-23164/national-standards-for-culturally-and-linguistically-appropriate-services-clas-in-health-and-health (2013).
Li, Z. et al. Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages. In Special Track on AI Alignment 28186–28194 (Association for the Advancement of Artificial Intelligence, 2025). https://doi.org/10.1609/aaai.v39i27.35038.
Xie, Y. et al. Weakly supervised scene text generation for low-resource languages. Expert Syst. Appl. 237, 121622 (2024).
Khoong, E. C. & Rodriguez, J. A. A research agenda for using machine translation in clinical medicine. J. Gen. Intern. Med. 37, 1275–1277 (2022).