In the study, researchers presented nearly 1,300 participants with different scenarios and asked them to identify potential health conditions and a recommended course of action.
Some participants used large language model (LLM) AI software to obtain a potential diagnosis and next steps, while others used more traditional methods, such as seeing a GP.
Researchers then evaluated the results and found that the AI often provided a “mix of good and bad information” that users struggled to tell apart.
They found that, while AI chatbots now “excel at standardised tests of medical knowledge”, their use as a medical tool would “pose risks to real users seeking help with their own medical symptoms”.
“These findings highlight the difficulty of building AI systems that can genuinely support people in sensitive, high-stakes areas like health,” Dr Payne said.
The study’s lead author, Andrew Bean of the Oxford Internet Institute, said the findings showed that “interacting with humans poses a challenge” for even the top-performing LLMs.
“We hope this work will contribute to the development of safer and more useful AI systems,” he added.