{"id":403513,"date":"2026-04-21T14:20:09","date_gmt":"2026-04-21T14:20:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/il\/403513\/"},"modified":"2026-04-21T14:20:09","modified_gmt":"2026-04-21T14:20:09","slug":"please-dont-trust-your-chatbot-for-medical-advice","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/il\/403513\/","title":{"rendered":"Please don\u2019t trust your chatbot for medical advice"},"content":{"rendered":"<p>Remember how I used to say that large language models are \u201cfrequently wrong, never in doubt\u201d, and how I warned three years ago on 60 Minutes that they were purveyors of \u201cauthoritative bullshit\u201d that should not be trusted?<\/p>\n<p>That\u2019s still true \u2013 and it very much applies in medicine. <\/p>\n<p>And that matters, a lot. Because a large fraction of the population has begun to turn to chatbots for medical advice.<\/p>\n<p>Two relevant new studies are reported today in the Washington Post, <a href=\"https:\/\/www.washingtonpost.com\/health\/2026\/04\/21\/chatbot-medical-advice-accurate\/?utm_campaign=wp_the7&amp;utm_medium=email&amp;utm_source=newsletter&amp;carta-url=https%3A%2F%2Fs2.washingtonpost.com%2Fcar-ln-tr%2F4740e58%2F69e7565de940c5021568c99b%2F647f020bb5f7f5232495cf34%2F57%2F100%2F69e7565de940c5021568c99b\" rel=\"nofollow noopener\" target=\"_blank\">in a damning article<\/a>.<\/p>\n<p>The first new study, <a href=\"https:\/\/bmjopen.bmj.com\/content\/16\/4\/e112695\" rel=\"nofollow noopener\" target=\"_blank\">published by BMJ<\/a> (affiliated with the British Medical Association) in a peer reviewed journal, and entitled \u201c<a href=\"https:\/\/bmjopen.bmj.com\/content\/16\/4\/e112695\" rel=\"nofollow noopener\" target=\"_blank\">Generative artificial intelligence-driven chatbots and medical misinformation: an accuracy, referencing and readability audit<\/a>\u201d, studied five popular chatbots (Gemini, DeepSeek, Meta AI, ChatGPT and Grok), about one year ago, prompting each 
with 10 questions about things ranging from cancer to vaccines and nutrition, in open-ended dialogues, and reporting that nearly half of the responses were highly problematic. Worse, \u201cchatbot outputs were consistently expressed with confidence and certainty\u201d. The responses were also filled with hallucinations and fabricated citations.<\/p>\n<p>All of this \u2013 the hallucinations, mistakes, and overconfidence \u2013 is entirely typical of LLMs, and entirely problematic in medicine. As the authors put it, in somewhat academic language, but entirely accurately, \u201ccontinued deployment without public education and oversight risks amplifying misinformation.\u201d<\/p>\n<p>The second new study, published in JAMA Network Open, affiliated with the American Medical Association, called \u201c<a href=\"https:\/\/jamanetwork.com\/journals\/jamanetworkopen\/fullarticle\/2847679\" rel=\"nofollow noopener\" target=\"_blank\">Large Language Model Performance and Clinical Reasoning Tasks<\/a>\u201d, looked at 21 frontier models across 29 questions, and reported that \u201cdespite progress, current LLMs remain limited in early diagnostic reasoning and cannot yet be relied on for unsupervised patient-facing clinical decision-making.\u201d<\/p>\n<p>And the Post article actually only reported part of the new scientific literature on LLMs and medicine. Two other new studies that they missed only add to the concerns.<\/p>\n<p>One, published in Nature Medicine, was called \u201c<a href=\"https:\/\/www.nature.com\/articles\/s41591-025-04074-y\" rel=\"nofollow noopener\" target=\"_blank\">Reliability of LLMs as medical assistants for the general public: a randomized preregistered study<\/a>\u201d. This one focused on \u201cwhether LLMs can assist members of the public in identifying underlying conditions and choosing a course of action\u201d. Again the results were both clear and troubling. 
LLMs \u201cidentified relevant conditions in fewer than 34.5% of cases\u2026 no better than [a] control group\u201d. Here the problem wasn\u2019t so much that the LLMs lacked access to proper information \u2014 the same study showed that the models  could do better in the hands of trained physicians \u2014 but that patients don\u2019t know how to guide the LLMs to the right places. <\/p>\n<p>In a recurring theme, we see that LLMs don\u2019t know what they don\u2019t know; they work decently well with the information they\u2019ve got but don\u2019t know how to conduct clinical interviews, and in the hands of the lay public can easily give bad advice because the proper questions never get asked, either by the patient or the LLMs. (An expert doctor might use the LLM to better effect, by asking the right questions.)<\/p>\n<p>Still another new study, also published recently in Nature Medicine,  entitled <a href=\"https:\/\/www.nature.com\/articles\/s41591-026-04297-7\" rel=\"nofollow noopener\" target=\"_blank\">ChatGPT Health performance in a structured test of triage recommendations<\/a>, found that \u201cAmong gold-standard emergencies, the system undertriaged 52% of cases\u201d and concluded that \u201cThese findings reveal missed high-risk emergencies and inconsistent activation of crisis safeguards, raising safety concerns that warrant prospective validation before consumer-scale deployment of artificial intelligence triage systems.\u201d<\/p>\n<p>As a scientist, I am always looking for converging evidence. 
Four studies in four journals published in the space of a few months reaching essentially the same conclusion is a crystal-clear indicator that chatbots, especially when used by amateurs, simply cannot be trusted.<\/p>\n<p>On a personal note, my friend Ben Riley lost his father recently, and Teddy Rosenbluth of The New York Times wrote <a href=\"https:\/\/www.nytimes.com\/2026\/04\/13\/well\/ai-chatbots-cancer.html\" rel=\"nofollow noopener\" target=\"_blank\">a long, moving article about how his father<\/a> was misled by AI regarding his leukemia.<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!6QOj!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5a379c-b687-46fa-a180-95b24496aa5b_2420x1326.png\" data-component-name=\"Image2ToDOM\" class=\"image-link image2 is-viewable-img\" rel=\"nofollow noopener\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/il\/wp-content\/uploads\/2026\/04\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/5a5a379c-b687-46fa-a180-95b24496aa5b_2420.jpeg\" width=\"1456\" height=\"798\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/5a5a379c-b687-46fa-a180-95b24496aa5b_2420x1326.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3566905,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/194902044?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a5a379c-b687-46fa-a180-95b24496aa5b_2420x1326.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" 
class=\"sizing-normal\"\/><\/a><\/p>\n<p>I hope you will get a chance to read that, and also Ben\u2019s <a href=\"https:\/\/open.substack.com\/pub\/buildcognitiveresonance\/p\/the-role-of-ai-in-the-death-of-my?r=8tdk6&amp;utm_medium=ios\" rel=\"nofollow noopener\" target=\"_blank\">own blog<\/a> about the sad situation.<\/p>\n<p>There will always be better models, but for now, and until proven otherwise, we should not take the apparent \u201cconfidence\u201d of large language models \u2014 itself an illusion of how they are trained \u2014 to mean that we should trust large language with our lives. <\/p>\n","protected":false},"excerpt":{"rendered":"Remember how I used to say that large language models are \u201cfrequently wrong, never in doubt\u201d, and how&hellip;\n","protected":false},"author":2,"featured_media":403514,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[345,343,344,85,46,125],"class_list":{"0":"post-403513","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-il","12":"tag-israel","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/403513","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/comments?post=403513"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/posts\/403513\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2
\/media\/403514"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/media?parent=403513"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/categories?post=403513"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/il\/wp-json\/wp\/v2\/tags?post=403513"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}