Operationalizing machine-assisted translation in healthcare

Rawal, S. et al. Association between limited English proficiency and revisits and readmissions after hospitalization for patients with acute and chronic conditions in Toronto, Ontario, Canada. JAMA 322, 1605–1607 (2019).

Lion, K. C., Lin, Y.-H. & Kim, T. Artificial intelligence for language translation: the equity is in the details. JAMA 332, 1427–1428 (2024).

Flores, G. The impact of medical interpreter services on the quality of health care: a systematic review. Med. Care Res. Rev. 62, 255–299 (2005).

Schulson, L. B. & Anderson, T. S. National estimates of professional interpreter use in the ambulatory setting. J. Gen. Intern. Med. 37, 472–474 (2022).

Diamond, L. C., Schenker, Y., Curry, L., Bradley, E. H. & Fernandez, A. Getting by: underuse of interpreters by resident physicians. J. Gen. Intern. Med. 24, 256–262 (2009).

Detz, A. et al. Language concordance, interpersonal care, and diabetes self-care in rural Latino patients. J. Gen. Intern. Med. 29, 1650–1656 (2014).

Betancourt, J. R., Green, A. R., Carrillo, J. E. & Ananeh-Firempong, O. Defining cultural competence: a practical framework for addressing racial/ethnic disparities in health and health care. Public Health Rep. 118, 293–302 (2003).

Molina, R. L. & Kasper, J. The power of language-concordant care: a call to action for medical schools. BMC Med. Educ. 19, 378 (2019).

Harvey, S. M., Branch, M. R., Hudson, D. & Torres, A. Listening to immigrant Latino men in rural Oregon: exploring connections between culture and sexual and reproductive health services. Am. J. Mens. Health 7, 142–154 (2013).

Gavvala, S. Ensuring understanding: Language-concordant discharge instructions. Rice Univ. Baker Inst. Public Policy, Issue Brief. https://doi.org/10.25613/cayx-wc08 (2023).

Karpińska, P. Computer aided translation – possibilities, limitations and changes in the field of professional translation. J. Educ. Cult. Soc. 8, 133–142 (2017).

Davis, S. H. et al. Translating discharge instructions for limited English-proficient families: strategies and barriers. Hosp. Pediatr. 9, 779–787 (2019).

Choe, A. Y. et al. Improving discharge instructions for hospitalized children with limited English proficiency. Hosp. Pediatr. 11, 1213–1222 (2021).

Diamond, L. C., Wilson-Stronks, A. & Jacobs, E. A. Do hospitals measure up to the national culturally and linguistically appropriate services standards? Med. Care 48, 1080–1087 (2010).

Office for Civil Rights (OCR). Summary of Guidance to Federal Financial Assistance Recipients Regarding Title VI and the prohibition against national origin discrimination affecting limited English proficient persons. https://www.hhs.gov/civil-rights/for-providers/laws-regulations-guidance/guidance-federal-financial-assistance-title-vi/index.html (2007).

Wu, Y. et al. Google’s neural machine translation system: bridging the gap between human and machine translation. Preprint at https://doi.org/10.48550/arXiv.1609.08144 (2016).

Koehn, P. & Knowles, R. Six challenges for neural machine translation. In Proc. First Workshop on Neural Machine Translation (eds. Luong, T., Birch, A., Neubig, G. & Finch, A.) 28–39 (Association for Computational Linguistics, Vancouver, 2017). https://doi.org/10.18653/v1/W17-3204.

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).

Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).

Brewster, R. C. L. et al. Performance of ChatGPT and Google Translate for Pediatric Discharge Instruction Translation. Pediatrics 154, e2023065573 (2024).

Ortega, J. E., Castro Mamani, R. & Cho, K. Neural machine translation with a polysynthetic low resource language. Mach. Transl. 34, 325–346 (2020).

Adebara, I., Abdul-Mageed, M. & Silfverberg, M. Linguistically-Motivated Yorùbá-English Machine Translation. In Proc. 29th International Conference on Computational Linguistics (eds. Calzolari, N. et al.) 5066–5075 (International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 2022).

Goh, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat. Med. 1–6 https://doi.org/10.1038/s41591-024-03456-y (2025).

Savage, T. et al. Fine tuning large language models for medicine: the role and importance of direct preference optimization. Preprint at https://doi.org/10.48550/arXiv.2409.12741 (2024).

Mirza, F. N. et al. Using ChatGPT to facilitate truly informed medical consent. NEJM AI 1, AIcs2300145 (2024).

Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).

Zaretsky, J. et al. Generative artificial intelligence to transform inpatient discharge summaries to patient-friendly language and format. JAMA Netw. Open 7, e240357 (2024).

Nondiscrimination in Health Programs and Activities. Federal Register https://www.federalregister.gov/documents/2024/05/06/2024-08711/nondiscrimination-in-health-programs-and-activities (2024).

Damschroder, L. J., Reardon, C. M., Widerquist, M. A. O. & Lowery, J. The updated consolidated framework for implementation research based on user feedback. Implement. Sci. 17, 75 (2022).

Xu, Z., Jain, S. & Kankanhalli, M. Hallucination is inevitable: an innate limitation of large language models. Preprint at http://arxiv.org/abs/2401.11817 (2024).

Liu, N. F. et al. Lost in the middle: how language models use long contexts. Trans. Assoc. Comput. Linguist. 12, 157–173 (2024).

Levy, A., Agrawal, M., Satyanarayan, A. & Sontag, D. Assessing the impact of automated suggestions on decision making: domain experts mediate model errors but take less initiative. In Proc. 2021 CHI Conference on Human Factors in Computing Systems 1–13 (Association for Computing Machinery, New York, NY, USA, 2021). https://doi.org/10.1145/3411764.3445522.

Kuperman, G. J. et al. Medication-related clinical decision support in computerized provider order entry systems: a review. J. Am. Med. Inform. Assoc. 14, 29–40 (2007).

Data controls in the OpenAI platform – OpenAI API. https://platform.openai.com.

Ng, M. Y., Helzer, J., Pfeffer, M. A., Seto, T. & Hernandez-Boussard, T. Development of secure infrastructure for advancing generative AI research in healthcare at an academic medical center. Res. Sq. rs.3.rs-5095287 https://doi.org/10.21203/rs.3.rs-5095287/v1 (2024).

Vedula, K. S. et al. Distilling large language models for efficient clinical information extraction. Preprint at https://doi.org/10.48550/arXiv.2501.00031 (2024).

Woods, A. P. et al. Limited English proficiency and clinical outcomes after hospital-based care in English-speaking countries: a systematic review. J. Gen. Intern. Med. 37, 2050–2061 (2022).

Manuel, S. P., Nguyen, K., Karliner, L. S., Ward, D. T. & Fernandez, A. Association of English language proficiency with hospitalization cost, length of stay, disposition location, and readmission following total joint arthroplasty. JAMA Netw. Open 5, e221842 (2022).

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. In Proc. 36th International Conference on Neural Information Processing Systems 22199–22213 (Curran Associates Inc., Red Hook, NY, USA, 2022).

Bakken, S. AI in health: keeping the human in the loop. J. Am. Med. Inform. Assoc. 30, 1225–1226 (2023).

Swaminathan, A. et al. Natural language processing system for rapid detection and intervention of mental health crisis chat messages. NPJ Digit. Med. 6, 1–9 (2023).

BigQuery enterprise data warehouse. Google Cloud https://cloud.google.com/bigquery.

The Snowflake AI Data Cloud – Mobilize Data, Apps, and AI. https://www.snowflake.com/content/snowflake-site/global/en.

Create Your Azure Free Account Or Pay As You Go | Microsoft Azure. https://azure.microsoft.com/en-us/pricing/purchase-options/azure-account/search.

Zhang, X., Rajabi, N., Duh, K. & Koehn, P. Machine Translation with Large Language Models: Prompting, Few-shot Learning, and Fine-tuning with QLoRA. In Proc. Eighth Conference on Machine Translation (eds. Koehn, P., Haddow, B., Kocmi, T. & Monz, C.) 468–481 (Association for Computational Linguistics, Singapore, 2023). https://doi.org/10.18653/v1/2023.wmt-1.43.

Rafailov, R. et al. Direct preference optimization: your language model is secretly a reward model. In Proc. 37th International Conference on Neural Information Processing Systems 53728–53741 (Curran Associates Inc., Red Hook, NY, USA, 2023).

Looker Studio. Google for Developers https://developers.google.com/looker-studio.

Lommel, A. R., Burchardt, A. & Uszkoreit, H. Multidimensional quality metrics: a flexible system for assessing translation quality. In Proc. Translating and the Computer 35 (Aslib, London, UK, 2013).

Chen, X., Acosta, S. & Barry, A. E. Evaluating the accuracy of Google translate for diabetes education material. JMIR Diabetes 1, e3 (2016).

Lopez, I., Haredasht, F. N., Caoili, K., Chen, J. H. & Chaudhari, A. Embedding-driven diversity sampling to improve few-shot synthetic data generation. Preprint at https://doi.org/10.48550/arXiv.2501.11199 (2025).

Popović, M. chrF++: words helping character n-grams. In Proc. Second Conference on Machine Translation (eds. Bojar, O. et al.) 612–618 (Association for Computational Linguistics, Copenhagen, Denmark, 2017). https://doi.org/10.18653/v1/W17-4770.

Rei, R., Stewart, C., Farinha, A. C. & Lavie, A. COMET: A Neural Framework for MT Evaluation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds. Webber, B., Cohn, T., He, Y. & Liu, Y.) 2685–2702 (Association for Computational Linguistics, Online, 2020). https://doi.org/10.18653/v1/2020.emnlp-main.213.

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (eds. Isabelle, P., Charniak, E. & Lin, D.) 311–318 (Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002). https://doi.org/10.3115/1073083.1073135.

Mathur, N., Baldwin, T. & Cohn, T. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds. Jurafsky, D., Chai, J., Schluter, N. & Tetreault, J.) 4984–4997 (Association for Computational Linguistics, Online, 2020). https://doi.org/10.18653/v1/2020.acl-main.448.

Lopez, I. et al. Clinical entity augmented retrieval for clinical information extraction. NPJ Digit. Med. 8, 1–11 (2025).

Swaminathan, A. et al. Selective prediction for extracting unstructured clinical data. J. Am. Med. Inform. Assoc. 31, 188–197 (2024).

Bates, B. A. et al. Validity of International Classification of Diseases (ICD)-10 diagnosis codes for identification of acute heart failure hospitalization and heart failure with reduced versus preserved ejection fraction in a national medicare sample. Circ. Cardiovasc. Qual. Outcomes 16, e009078 (2023).

Gothe, H. et al. Algorithms to identify COPD in health systems with and without access to ICD coding: a systematic review. BMC Health Serv. Res. 19, 737 (2019).

Shoemaker, S. J., Wolf, M. S. & Brach, C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ. Couns. 96, 395–403 (2014).

Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).

Carrell, D. et al. Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text. J. Am. Med. Inform. Assoc. 20, 342–348 (2013).

National Standards for Culturally and Linguistically Appropriate Services (CLAS) in Health and Health Care. Federal Register https://www.federalregister.gov/documents/2013/09/24/2013-23164/national-standards-for-culturally-and-linguistically-appropriate-services-clas-in-health-and-health (2013).

Li, Z. et al. Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages. In Special Track on AI Alignment 28186–28194 (Association for the Advancement of Artificial Intelligence, 2025). https://doi.org/10.1609/aaai.v39i27.35038.

Xie, Y. et al. Weakly supervised scene text generation for low-resource languages. Expert Syst. Appl. 237, 121622 (2024).

Khoong, E. C. & Rodriguez, J. A. A research agenda for using machine translation in clinical medicine. J. Gen. Intern. Med. 37, 1275–1277 (2022).
