{"id":95071,"date":"2025-08-26T23:30:21","date_gmt":"2025-08-26T23:30:21","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/95071\/"},"modified":"2025-08-26T23:30:21","modified_gmt":"2025-08-26T23:30:21","slug":"something-extremely-scary-happens-when-advanced-ai-tries-to-give-medical-advice-to-real-world-patients","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/95071\/","title":{"rendered":"Something Extremely Scary Happens When Advanced AI Tries to Give Medical Advice to Real World Patients"},"content":{"rendered":"<p>Image by Getty \/ Futurism<\/p>\n<p>Last week, Google AI pioneer Jad Tarifi sparked controversy when he <a href=\"https:\/\/www.businessinsider.com\/google-ai-team-too-late-phd-ai-hype-jad-tarifi-2025-8\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">told Business Insider<\/a> that it <a href=\"https:\/\/futurism.com\/former-google-ai-exec-law-medicine\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">no longer makes sense<\/a> to get a medical degree \u2014 since, in his telling, artificial intelligence will render such an education obsolete by the time you&#8217;re a practicing doctor.<\/p>\n<p>Companies have long touted the tech as a way to <a href=\"https:\/\/permanente.org\/how-ai-is-giving-physicians-more-time-for-what-matters-most\/\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">free up the time<\/a> of overworked doctors and even aid them in specialized skills, including <a href=\"https:\/\/news.harvard.edu\/gazette\/story\/2024\/09\/new-ai-tool-can-diagnose-cancer-guide-treatment-predict-patient-survival\/\" 
class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">scanning medical imagery for tumors<\/a>. Hospitals have already been rolling out AI tech to <a href=\"https:\/\/www.shiftmed.com\/insights\/knowledge-center\/impact-of-ai-in-healthcare-administration\/\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">help with administrative work<\/a>.<\/p>\n<p>But given the current state of AI\u00a0\u2014 from <a href=\"https:\/\/futurism.com\/the-byte\/whisper-nabla-hospital-ai-details-patients\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">widespread hallucinations<\/a> to &#8220;<a href=\"https:\/\/futurism.com\/neoscope\/doctors-ai-lose-ability-spot-cancer\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">deskilling<\/a>&#8221; experienced by doctors over-relying on it \u2014 there&#8217;s reason to believe that med students should stick it out.<\/p>\n<p>If anything, the latest research suggests we need human healthcare professionals now more than ever.<\/p>\n<p>As <a href=\"https:\/\/www.psypost.org\/top-ai-models-fail-spectacularly-when-faced-with-slightly-altered-medical-questions\/\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">PsyPost reports<\/a>, researchers have found that frontier AI models fail spectacularly when the familiar formats of medical exams are even slightly altered, 
greatly undermining their ability to help patients in the real world \u2014 and raising the possibility that, instead, they could cause serious harm by providing garbled medical advice in high-stakes health scenarios.<\/p>\n<p>As detailed in a <a href=\"https:\/\/jamanetwork.com\/journals\/jamanetworkopen\/fullarticle\/2837372\" class=\"underline hover:text-neoscope hover:no-underline transition-all duration-200 ease-in-out\" style=\"text-decoration-color:#ff9900\" rel=\"nofollow noopener\" target=\"_blank\">paper<\/a> published in the journal JAMA Network Open, things quickly fell apart for models including OpenAI&#8217;s GPT-4o and Anthropic&#8217;s Claude 3.5 Sonnet when the wording of questions in a benchmark test was only slightly adjusted.<\/p>\n<p>The idea was to probe how large language models actually arrive at their answers: by predicting the probability of each subsequent word, rather than through a human-level understanding of complex medical terms.<\/p>\n<p>&#8220;We have AI models achieving near perfect accuracy on benchmarks like multiple-choice based medical licensing exam questions,&#8221; Stanford University PhD student and coauthor Suhana Bedi told PsyPost. &#8220;But this doesn\u2019t reflect the reality of clinical practice. We found that less than five percent of papers evaluate LLMs on real patient data, which can be messy and fragmented.&#8221;<\/p>\n<p>The results left a lot to be desired. 
According to Bedi, &#8220;most models (including reasoning models) struggled&#8221; when it came to &#8220;Administrative and Clinical Decision Support tasks.&#8221;<\/p>\n<p>The researchers suggest that &#8220;complex reasoning scenarios&#8221; in their benchmark threw the AIs for a loop because they &#8220;couldn\u2019t be solved through pattern matching alone&#8221; \u2014 which happens to be &#8220;exactly the kind of clinical thinking that matters in real practice,&#8221; per Bedi.<\/p>\n<p>&#8220;With everyone talking about deploying AI in hospitals, we thought this was a very important question to answer,&#8221; Bedi told PsyPost.<\/p>\n<p>For their benchmark test, the researchers made a clever adjustment to trip up the AIs: they replaced the correct answer in each multiple-choice question with the option &#8220;none of the other answers.&#8221; This change forced the AI models to actually reason their way to the right answer \u2014 and not rely on picking up familiar language patterns.<\/p>\n<p>The models&#8217; accuracy dropped significantly on the modified questions compared with the originals. For instance, OpenAI&#8217;s GPT-4o showed a reduction of 25 percent, while Meta&#8217;s Llama model showed a drop of almost 40 percent.<\/p>\n<p>The results suggest that current AI systems may be leaning far too heavily on recognizing language patterns, which would make them inadequate for real-world clinical use.<\/p>\n<p>&#8220;It\u2019s like having a student who aces practice tests but fails when the questions are worded differently,&#8221; Bedi told PsyPost. &#8220;For now, AI should help doctors, not replace them.&#8221;<\/p>\n<p>The research highlights the importance of finding new ways to evaluate the proficiency of AI models. 
That&#8217;s especially true for an extremely high-stakes environment like a hospital.<\/p>\n<p>&#8220;Until these systems maintain performance with novel scenarios, clinical applications should be limited to nonautonomous supportive roles with human oversight,&#8221; the researchers wrote in their paper.<\/p>\n","protected":false},"excerpt":{"rendered":"Image by Getty \/ Futurism Last week, Google AI pioneer Jad Tarifi sparked controversy when he told Business&hellip;\n","protected":false},"author":2,"featured_media":95072,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43],"tags":[102,2960,56,54,55],"class_list":{"0":"post-95071","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-healthcare","8":"tag-health","9":"tag-healthcare","10":"tag-uk","11":"tag-united-kingdom","12":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/95071","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=95071"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com
\/uk\/wp-json\/wp\/v2\/posts\/95071\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/95072"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=95071"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=95071"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=95071"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}