{"id":64417,"date":"2025-08-07T06:09:10","date_gmt":"2025-08-07T06:09:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/64417\/"},"modified":"2025-08-07T06:09:10","modified_gmt":"2025-08-07T06:09:10","slug":"ai-and-health-care-professors-paper-explores-how-systems-assess-patient-risks-and-medical-coding-luddy-school-of-informatics-computing-and-engineering-indiana-university","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/64417\/","title":{"rendered":"AI and health care: Professor\u2019s paper explores how systems assess patient risks and medical coding: Luddy School of Informatics, Computing, and Engineering : Indiana University"},"content":{"rendered":"<p>                    \t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/us\/wp-content\/uploads\/2025\/08\/24336_CODE_or_MISS_crop.rev.1754514517.png\" width=\"1024\" height=\"513\" alt=\"The researchers assessed how various large-language models would make health care decisions.\" data-max-w=\"1024\" data-max-h=\"513\" data-optimized=\"true\"\/>        The researchers assessed how various large-language models would make health care decisions.      <\/p>\n<p>\n  Could AI and large-language models become tools to better manage health care?\n<\/p>\n<p>\n  The potential is there \u2013 but evidence is lacking. In a paper published in the <a href=\"https:\/\/www.jmir.org\/\" target=\"_blank\" rel=\"noopener nofollow\">Journal of Medical Internet Research<\/a> (JMIR), Luddy Indianapolis Associate Professor <a href=\"https:\/\/luddy.indianapolis.iu.edu\/contact\/directory\/index.html\" rel=\"nofollow noopener\" target=\"_blank\">Saptarshi Purkayastha<\/a> examines how large-language models (LLMs) such as ChatGPT-4 and OpenAI-03 performed when tackling some essential clinical tasks. 
(Spoiler: large language models have some large problems to overcome.)\n<\/p>\n<p>\n  The paper, <a href=\"https:\/\/www.jmir.org\/2025\/1\/e74142\" target=\"_blank\" rel=\"noopener nofollow\">\u201cEvaluating the Reasoning Capabilities of Large Language Models for Medical Coding and Hospital Readmission Risk Stratification: Zero-Shot Prompting Approach,\u201d<\/a> published July 30 in the journal, assesses whether LLMs can serve as general-purpose clinical decision support tools.\n<\/p>\n<p>\n  \u201cFor health care leaders and clinical researchers, the study offers a clear message: while LLMs hold significant potential to support clinical workflows, such as speeding up coding drafts and risk stratification, they are not yet ready to replace human expertise,\u201d says Purkayastha, Ph.D. He is director of <a href=\"https:\/\/luddy.indianapolis.iu.edu\/degrees\/undergraduate\/him\/index.html\" rel=\"nofollow noopener\" target=\"_blank\">Health Informatics<\/a> and associate chair of the <a href=\"https:\/\/luddy.indianapolis.iu.edu\/departments\/biomedical-engineering\/index.html\" rel=\"nofollow noopener\" target=\"_blank\">Biomedical Engineering<\/a> and Informatics department at IU\u2019s <a href=\"https:\/\/luddy.indianapolis.iu.edu\/index.html\" rel=\"nofollow noopener\" target=\"_blank\">Luddy School of Informatics, Computing, and Engineering in Indianapolis<\/a>.\n<\/p>\n<p>\n  \u201cThe recommended path forward,\u201d he adds, \u201clies in responsible deployment through hybrid human-AI workflows, specialized fine-tuning on clinical datasets, inspections for detecting bias, and robust governance frameworks that ensure continuous monitoring, auditing, and correction.\u201d\n<\/p>\n<p>  Crunching the numbers<\/p>\n<p>\n  Large language models are artificial intelligence systems designed to understand and generate human-like text.\n<\/p>\n<p>\n  Newer reasoning models, which emerged during the study, have reasoning capabilities embedded in their design, 
allowing more logical, step-by-step decision-making, the paper notes.\n<\/p>\n<p>\n  For this study, Purkayastha and co-authors Parvati Naliyatthaliyazchayil, Raajitha Mutyala, and Judy Gichoya focused on five LLMs: DeepSeek-R1 and OpenAI-O3 (reasoning models), and ChatGPT-4, Gemini-1.5, and LLaMA-3.1 (non-reasoning models).\n<\/p>\n<p>\n  The study evaluated the models\u2019 performance in three key clinical tasks:\n<\/p>\n<p>  Primary diagnosis generation<\/p>\n<p>  ICD-9 medical code prediction<\/p>\n<p>  Hospital readmission risk stratification<\/p>\n<p>  Working backwards<\/p>\n<p>\n  When you\u2019re hospitalized, you probably have a lot of questions. By the time you\u2019re discharged, you should have some answers.\n<\/p>\n<p>\n  In their study, Purkayastha and his co-authors reversed the process, giving the large language models the results and letting them take it from there.\n<\/p>\n<p>\n  \u201cWe selected a random cohort of 300 hospital discharge summaries,\u201d the authors explained in their JMIR research paper. The large language models were given structured clinical content from five note sections:\n<\/p>\n<p>  Chief complaints<\/p>\n<p>  Past medical history<\/p>\n<p>  Surgical history<\/p>\n<p>  Labs<\/p>\n<p>  Imaging<\/p>\n<p>\n  The challenge: Would the models be able to accurately generate a primary diagnosis, predict medical codes, and assess the risk of readmission?\n<\/p>\n<p>  A variable track record<\/p>\n<p>\n  The researchers used zero-shot prompting, meaning the models were given no worked examples of the tasks in their prompts and no task-specific training; the LLMs had not seen the discharge summaries before and had to respond from the instructions alone.\n<\/p>\n<p>\n  \u201cAll model interactions were conducted through publicly available web user interfaces,\u201d the researchers noted, \u201cwithout using APIs or backend access, to simulate real-world accessibility for non-technical users.\u201d\n<\/p>\n<p>\n  How did the large language models perform?\n<\/p>\n<p>  Primary diagnosis generation<\/p>\n<p>\n  This is where LLMs shone brightest. 
\u201cAmong non-reasoning models, LLaMA-3.1 achieved the highest primary diagnosis accuracy (85%), followed by ChatGPT-4 (84.7%) and Gemini-1.5 (79%),\u201d the researchers reported. \u201cAmong reasoning models, OpenAI-O3 outperformed in diagnosis (90%).\u201d\n<\/p>\n<p>  ICD-9 medical code prediction<\/p>\n<p>\n  Large language models fell behind in this category. \u201cFor ICD-9 prediction, correctness dropped significantly across all models: LLaMA-3.1 (42.6%), ChatGPT-4 (40.6%), Gemini-1.5 (14.6%),\u201d according to the researchers. OpenAI-O3, a reasoning model, scored 45.3%.\n<\/p>\n<p>  Hospital readmission risk stratification<\/p>\n<p>\n  Hospital readmission risk prediction showed low performance in non-reasoning models: LLaMA-3.1 (41.3%), Gemini-1.5 (40.7%), ChatGPT-4 (33%). The reasoning models fared considerably better, with DeepSeek-R1 (72.66%) slightly ahead of OpenAI-O3 (70.66%), the paper states.\n<\/p>\n<p>  The takeaways<\/p>\n<p>\n  \u201cThis study reveals critical insights with profound real-world implications,\u201d Purkayastha says.\n<\/p>\n<p>\n  \u201cMisclassification in coding can lead to billing inaccuracies, resource misallocation, and flawed health care data analytics. Similarly, incorrect readmission risk predictions may impact discharge planning and patient safety.\n<\/p>\n<p>\n  \u201cWhen AI systems err or hallucinate, questions of liability and transparency become pressing. Ambiguities about who is accountable \u2013 developers, clinicians, or health care providers \u2013 raise legal and professional risks.\u201d\n<\/p>\n<p>\n  Looking at reasoning vs. 
non-reasoning models, the researchers said, \u201cOur results show that reasoning models outperformed nonreasoning ones across most tasks.\u201d\n<\/p>\n<p>\n  The researchers concluded that OpenAI-O3 outperformed the other models across these tasks, noting, \u201cReasoning models offer marginally better performance and increased interpretability but remain limited in reliability.\u201d\n<\/p>\n<p>\n  Their conclusion: when it comes to clinical decision-making and artificial intelligence, there\u2019s a lot of room for improvement.\n<\/p>\n<p>  Identifying LLM shortcomings can lead to solutions<\/p>\n<p>\n  \u201cThese results highlight the need for task-specific fine-tuning and adding more human-in-the-loop models to train them,\u201d the researchers concluded. \u201cFuture work will explore fine-tuning, stability through repeated trials, and evaluation on a different subset of de-identified real-world data with a larger sample size.\n<\/p>\n<p>\n  \u201cThe recorded limitations serve as essential guideposts for safely and effectively integrating LLMs into clinical practice.\u201d\n<\/p>\n<p>\n  Purkayastha acknowledges that artificial intelligence will play a role in the clinical workflow going forward.\n<\/p>\n<p>\n  \u201cAs artificial intelligence continues to reshape the future of health care, this study represents an important contribution,\u201d he says, \u201cdemonstrating original research with significant implications.\n<\/p>\n<p>\n  \u201cIt advocates for balanced optimism paired with caution and ethical vigilance, ensuring that the power of AI truly enhances patient care without compromising safety or trust.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"The researchers assessed how various large language models would make health care decisions. 
Could AI and large language models become&hellip;\n","protected":false},"author":2,"featured_media":64418,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46],"tags":[191,74],"class_list":{"0":"post-64417","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-computing","9":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/64417","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/comments?post=64417"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/64417\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/64418"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=64417"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=64417"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=64417"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}