A Florida father recently sued Google after his son, Jonathan Gavalas, died by suicide following months of interaction with the company’s artificial intelligence chatbot Gemini. The case has rightly focused attention on how chatbots appear to reinforce delusions and foster emotional dependency.

Yet one critical detail is easy to overlook: Jonathan Gavalas was not just typing to Gemini. He was talking to it through Gemini Live, Google’s voice-based conversational mode. That distinction matters far more than the current debate acknowledges.

Every week, around 800 million people interact with ChatGPT. According to OpenAI, roughly 0.07% of those weekly users show possible signs of psychosis or mania during their conversations, while 0.15% display indicators of suicidal planning or intent. Even if these figures are imprecise, they imply that hundreds of thousands of people worldwide experiencing serious psychological distress interact with an AI chatbot every week (0.07% of 800 million is roughly 560,000 people; 0.15% is roughly 1.2 million).

Most of those numbers come from the era of text. The shift to voice has just begun, and it will likely make things worse.

Tech companies are racing to put AI chatbots in our ears. OpenAI is developing a dedicated voice-first device. Meta already offers smart glasses with built-in microphones and speakers that enable AI conversation. Apple reportedly plans to extend its AirPods for voice-based chatbot interaction. The direction is clear: The primary way humans communicate with AI is moving from typing and reading to speaking and listening. For most users, this will feel like a convenience. For vulnerable people — those prone to psychosis, mania, depression, or loneliness — it may represent a serious and unexamined risk.


In a recent Acta Neuropsychiatrica editorial, psychiatrist Søren Østergaard and I outlined why that is the case. Voice is how humans first learn language. Long before a child reads a single word in school, their brain is already wired to process speech. They naturally respond to tone, rhythm, emphasis, and emotional inflection.

Text strips all of that away. When you read a chatbot’s response on a screen, there is an inherent distance because you are processing symbols, not hearing a humanlike voice. That distance creates natural cognitive barriers. You pause. You reread. You push back.

Voice removes those barriers. Speaking is significantly faster than typing, nearly three times as fast. It is more seamless and far more emotionally engaging. When an AI speaks to you, it activates something deeper and older than literacy.

This is not merely theoretical. A preprint of a randomized controlled study co-authored by OpenAI researchers found that people spent significantly more time interacting with voice-mode ChatGPT than with the text version, suggesting greater engagement. Voice initially appeared to boost certain positive outcomes, such as reduced loneliness. However, longer engagement with voice-based chatbots was linked to more negative psychosocial effects, including reduced socialization with real people and more problematic AI use. The company’s own research, in other words, suggests that the more immersive the interaction becomes, the greater the potential for harm.

And yet the industry is pressing ahead. Advanced voice mode was made available to all free ChatGPT users in July 2025, vastly expanding access beyond paying subscribers. Reports of AI-associated delusions and mania had already been emerging for months before that rollout.

Clinicians and researchers have documented cases of people developing psychotic symptoms after extended chatbot use — for instance, believing that the AI is sentient or personally connected to them. If text-based AI can elicit and sustain such distorted beliefs, voice will go a step further: more salient, more personal, more difficult to dismiss as just an algorithm.

The regulatory picture is not reassuring. In November 2025, the FDA’s Digital Health Advisory Committee held its first meeting on generative AI in mental health. While this represents a landmark moment, the meeting focused mostly on text-based chatbot interactions. Voice was discussed as a potential biomarker for detecting depression and anxiety, not as a risky new mode of communication in its own right.

Meanwhile, researchers at TU Dresden have argued in a recent paper that AI chatbots performing therapy-like functions should be regulated as medical devices. If a chatbot walks and talks like a therapist, it should meet the same safety standards. Yet even this emerging push misses a critical dimension. Nobody is asking whether a voice-based chatbot poses different or greater risks than a text-based one. What remains a regulatory blind spot is modality: how the message is delivered, not just what it says.

Closing that gap requires three concrete steps.


First, regulators on both sides of the Atlantic should require modality-specific safety testing before voice features are rolled out to broad populations. People with lived experience and mental health professionals need to be part of these evaluations.

Second, AI companies should be required to establish adverse event reporting systems comparable to those in pharmaceutical regulation. This must include standardized mechanisms for clinicians, users, and families to report serious psychological harms linked to chatbot use, with mandatory public disclosure of aggregated data. While this is urgent across all modalities, it is especially so for voice, where existing research already points to higher engagement and greater psychosocial risk.

Third, the FDA and its European counterparts should explicitly incorporate interaction modality as a risk factor in their evolving frameworks for AI medical devices. Not as an afterthought, but as a core consideration.

The debate about AI and mental health has so far focused on content. Reports detail what chatbots say, how they validate users’ beliefs, and whether they can recognize a crisis. Those questions matter.

However, the next frontier of risk concerns the channel through which the content is delivered. The most dangerous AI for mental health may not be the one that writes the wrong thing. It may be the one that says it in a voice you cannot help but trust.

Marc Augustin, a board-certified psychiatrist and psychotherapist in Germany, is a professor at the Protestant University of Applied Sciences in Bochum, Germany, and a SCIANA fellow. Since English is not his first language, he used light AI assistance for editing, spelling, and grammar.