Artificial intelligence’s promise in healthcare has always seemed intuitive. A system capable of processing vast amounts of patient data, cross-referencing symptoms, and responding in seconds looks like the natural fix for physician shortages and surging demand for care.
One of the most visible expressions of that promise is the proliferation of AI chatbots and virtual assistants in clinical settings. These tools are designed to triage symptoms, answer patient questions, and guide individuals toward appropriate care pathways. Recent advances in large language models (LLMs) appear to be bringing that vision closer to reality, with systems demonstrating striking performance on medical exams and structured diagnostic tasks.
In theory, AI-powered chat interfaces offer exactly that scalable fix. In practice, however, their effectiveness is uneven. A report from the Financial Times this week suggests that while chatbots handle routine queries and administrative interactions capably, their ability to deliver clinically reliable guidance is still limited — particularly when inputs are incomplete, ambiguous, or evolving.
A team of Swedish researchers set out to measure exactly how limited. They invented a fictitious eye condition — “bixonimania” — and introduced it into the AI ecosystem to test how readily chatbots would absorb and spread medical misinformation.
“I wanted to be really clear to any physician or any medical staff that this is a made-up condition, because no eye condition would be called mania — that’s a psychiatric term,” one researcher explained.
The experiment, published last Tuesday (April 7), showed how easily the ploy worked: the fabricated condition was picked up by chatbots and even surfaced in other academic papers. Those studies, both the planted fakes and the real papers that absorbed the term, have since been taken down.
See also: How Healthcare CFOs Are Turning Operational Upgrades Into Financial Gains
Accuracy Isn’t Judgment in Healthcare
While the grand narrative around AI in healthcare has centered on clinical breakthroughs such as algorithms that detect cancer earlier, predict disease trajectories, or help personalize treatment plans with unprecedented precision, the widespread, measurable improvement of patient outcomes remains elusive.
If the limitations of AI chatbots were purely technical, they might be easier to manage. But the more immediate concern is behavioral. When users interact with AI systems, they tend to treat outputs as authoritative, even when those outputs are generated from incomplete or flawed inputs. Misinterpretations of symptoms, overly cautious recommendations, or inconsistent advice can undermine trust and, in some cases, create additional burdens for human clinicians who must verify or correct AI-generated outputs.
A patient who consults an AI tool before seeing a doctor may receive a plausible but incorrect diagnosis. That initial suggestion can shape how symptoms are described, which concerns are emphasized, and ultimately how a clinician interprets the case. The result is not just a wrong answer, but a distorted diagnostic process.
This phenomenon is known as anchoring bias and has long been recognized in clinical settings. AI has the potential to amplify it at scale.
PYMNTS explored the rise of AI in healthcare earlier this year in a conversation with Marschall Runge, former CEO of Michigan Medicine, one of the country’s top academic medical centers.
He told PYMNTS CEO Karen Webster about the promise and the risk of using the technology in a clinical setting.
“AI thinks broadly,” he said. It can track a patient’s age, medications and underlying conditions simultaneously, making connections that a doctor running behind schedule and wrestling with a full caseload might miss. Runge has seen AI surface diagnostic possibilities that trained clinicians hadn’t initially considered. But the risks, he stressed, are real, such as overreliance and misplaced confidence.
More than 40 million people worldwide use ChatGPT daily for health-related queries, with about 70% of those queries happening outside clinic hours, as covered by PYMNTS.
See also: How Healthcare Innovation Starts With Regulation and Ends With Integration
Automation’s Healthcare Takeover
If AI’s clinical promise is still maturing, its administrative dominance is already well underway. Healthcare has historically been burdened by complex workflows, fragmented data systems, and labor-intensive processes. AI thrives in precisely these environments.
AI chatbots, after all, are far from the full picture of AI in healthcare. Health systems, insurers, and digital health startups are deploying AI tools at a remarkable pace, not just to cure disease or improve bedside care, but to streamline the business of healthcare itself.
PYMNTS covered last week how funding for digital healthcare startups has reached record levels for the first quarter of the year. From automating billing to optimizing patient intake and triage, AI is reshaping how healthcare organizations function financially and administratively, not just the ways in which patients engage with care.
For example, Adonis, an AI orchestration platform for healthcare revenue cycle management, recently raised $40 million; Utah regulators have cleared Y Combinator-backed Legion Health to let its AI renew certain psychiatric prescriptions without a doctor signing off each time.
AI is also moving fast into the financial mechanics of the multibillion-dollar healthcare payment space, PYMNTS wrote in another recent report.
“UnitedHealth Group projects AI could save it nearly $1 billion in 2026, while HCA Healthcare expects roughly $400 million in AI-driven cost savings, partly from automating revenue management,” PYMNTS wrote. “On the other side of that ledger, Blue Cross Blue Shield has released an analysis suggesting that AI-enabled coding practices may be responsible for more than $2 billion in additional claims spending nationwide.”
Healthcare organizations, under constant financial pressure, are naturally drawn to solutions that deliver immediate, measurable returns. AI-driven automation fits this need perfectly. It reduces costs, improves margins, and addresses staffing shortages without requiring fundamental changes to clinical workflows.
Clinical innovation, by contrast, is slower, riskier, and harder to quantify. Demonstrating that an AI tool genuinely improves patient outcomes requires rigorous testing, long-term studies, and regulatory approval. The payoff, while potentially transformative, is less immediate and more uncertain.
For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.