Although numerous media reports and a collection of lawsuits over the last two years have linked the use of artificial-intelligence chatbots like ChatGPT to mental-health crises, there has been little scientific examination of the relationship to date.
Researchers at UCSF and Stanford University soon plan to study that relationship more systematically.
In the near future, UCSF psychiatry and behavioral sciences patients will likely be asked about their AI chatbot use as part of the regular office visit intake process. And in collaboration with colleagues at Stanford, researchers in the department are planning to ask patients to voluntarily share their chatbot logs to see if the researchers can make any connections between patients’ diagnosed conditions or changes in their mental health and their chatbot conversations.
“We are really interested in looking at the relationship between mental illness and how people use chatbots and the interplay between those things,” said Dr. Karthic Sarma, a computational health scientist at UCSF who works in the psychiatry department and is heading up the research effort there.
For the research project, Sarma is working with Stanford’s mental health-focused Brainstorm Lab and the Stanford Intelligent Systems Laboratory, which studies automated decision-making applications. The project was inspired by OpenAI’s disclosure in October that it had detected signs of psychosis and suicide planning or intentions in small fractions of its weekly active users, he said.
Right now, mental-health clinicians don’t generally ask patients about their chatbot use as a matter of routine at UCSF or elsewhere, Sarma said. But after UCSF physicians diagnosed the first of what has become a handful of cases of AI-related delusions in June, they started thinking about whether they should begin routinely screening for chatbot use, according to Dr. Joseph Pierre, Sarma’s colleague in the department.
UCSF physicians began making the connection between delusions and use of chatbots by asking patients about the latter, said Pierre, the chief of the inpatient unit at UCSF’s Langley Porter Psychiatric Hospital. They were spurred to do so by media reports about people whose extensive AI use appeared to be inducing psychoses, he said.
The discovery of those cases prompted Pierre and his colleagues to want to study the issue more systematically, Pierre said. For now, discussion of chatbot use with patients remains ad hoc, he said. It’s not something that he asks of every patient; instead, it typically comes up only if a patient mentions it.
That makes it impossible to know how prevalent chatbot use is among patients or how often it’s leading to delusions or other problems, he said.
UCSF psychiatrist Dr. Joseph Pierre: “Chatbots are simply generating text based on probabilities … and so they’re churning out stuff that seems on the surface to be convincing.”
There’s “probably a bunch of people who we never bother asking and don’t detect,” Pierre said.
Sarma’s screening effort would seek to change that. He and his colleagues are still working out the details, but the plan is to pattern the effort after the screenings they already do when patients walk in the door.
Pretty much every psychiatric patient fills out a survey asking how they are feeling, whether they’re experiencing depression or have thought about harming themselves, Sarma said. Physicians also routinely ask patients about their housing situation, what they’re eating and whether they’ve been exercising.
As Sarma envisions it, they would, with similar regularity, be asking patients about chatbot use as well. The idea would be to find out if patients are using chatbots, which ones they’re using and how frequently and for how long they are using the systems, he said. Clinicians also would like to get a sense of the purposes for which patients are using chatbots, such as whether patients are turning to the systems for assistance with their mental or emotional health.
Such data could be linked to patients’ ages to see if any trends emerge, he said. And patients would be asked about their chatbot use with every visit to track changes over time, he said.
“It’s seeming like using AI chatbots may become a part of everyday life that we should be talking to everybody about, or at least finding out from everybody about what’s going on,” he said.
At least initially, the effort would likely focus on people who visit the department on an outpatient basis, Sarma said. That’s because they represent the vast majority of UCSF’s psychiatric and behavioral patients, he said. Additionally, patients who are checked into the hospital are often suffering from symptoms that are so severe that it isn’t feasible to screen them, he said.
Sarma couldn’t say when UCSF will begin regularly asking patients about their chatbot use, but said he is hopeful it will start soon. UCSF will likely launch the effort by putting together a prototype survey, testing it and then revising it over time, he said. Eventually, the goal would be to do near-universal screening, he said.
“It may be some time before we get to the point where everyone is screened,” he said. “But … it’s easy to imagine, if it turns out to be useful, that someday we might screen everybody for how they’re using AI.”
While OpenAI and other AI chatbot developers have access to their users’ chat conversations, they don’t have access to patients’ clinical histories, so they can’t definitively tie those interactions to diagnosed mental-health problems, Sarma said. By contrast, UCSF physicians know their patients’ clinical histories, but don’t know what patients are discussing with their chatbots.
Sarma and his colleagues plan to ask patients for access to their chatbot conversations. They want to look at how those conversations and their patients’ symptoms developed over time to see if there’s a relationship between the two, he said.
The researchers are going into the potential study assuming from previous research that they’ll find widespread use of chatbots, Sarma said. But they aren’t going into it with a particular hypothesis about the nature of the relationship between that use and mental health, he said.
For some people at some times, chatbot use could be beneficial to their mental health, he said. At other times or with other people, it could be associated with a worsening of their symptoms. It could also be that different types of people are affected differently by similar chatbot use, he said.
“We’re going in with our eyes open and our hearts open to what we might find,” Sarma said.
Duncan Eddy and his colleagues at SISL have been studying how particular prompts can cause the large language models that underlie ChatGPT and other chatbots to fail. They’ve looked at how small or subtle changes to prompts can induce toxic or harmful responses from such models.
They’ve also begun to study how the models react to inputs that indicate a mental-health condition. An outgrowth of that research was the work Eddy’s colleagues at Stanford Brainstorm did with Common Sense Media that found that chatbots struggle to detect signs of mental-health crises in longer interactions. In response to that finding, Common Sense advised parents not to let their kids use chatbots for companionship or mental-health conversations.
Thus far, Eddy’s research has focused on how the chatbots respond to synthetic prompts, ones generated by the researchers themselves or an automated system. He and his Stanford colleagues have been working on a way — using input from mental-health experts — of automatically evaluating the chatbots’ responses to related prompts.
Eddy met Brainstorm founder Nina Vasan through Stanford’s Center for AI Safety, where Vasan is a faculty member and Eddy is a postdoctoral scholar. Vasan connected Eddy to Sarma, who is her partner, and his team at UCSF when Sarma came to give a talk at Stanford, Eddy said.
The collaboration with UCSF represents an opportunity to take the evaluation system they’re developing and use it to analyze the conversations chatbots are having with actual mental-health patients, he said. They’ll be looking for risk factors and early warning signs in the prompts and responses that can be linked to changes in patients’ mental health. They’ll also potentially be looking at whether people who are showing signs of mania are more likely to be using one chatbot or another, he said.
Essentially, though, they’ll be trying to figure out where the conversations go off track in ways that have been linked to delusions.
“I think it’s going to be one of the first studies, hopefully, of really trying to understand these types of incidents from a research perspective,” Eddy said.
“There seems to be something there, but no one has actually done … the rigorous work to see what the causes of the relationship are and then what we can actually do about it,” he said.
The Stanford researchers already have funding for their AI safety research via a grant from the family foundation of former Google CEO Eric Schmidt and his wife, Wendy, Eddy said. They’ve applied for a grant from OpenAI to fund the study involving clinical data, but haven’t yet heard whether they will receive it, he said.
Like Sarma, Eddy said he isn’t going into the study with any particular hypothesis about what they’ll find. At this point, he and his team are looking for “reliable signals” of problems.
“I’m trying to keep an open mind right now and let the data speak for itself right now,” he said.
That said, he suspects one such signal that people are in distress might simply be the frequency with which they are interacting with a chatbot or the duration of their conversations. A reliable indicator that someone is having a harmful conversation with a chatbot could be that they are sending thousands of prompts or interacting with it for hours on end, he said.
“It might even be the detection actually is independent of the actual quality or input of the responses,” he said.
If you have a tip about tech, startups or the venture industry, contact Troy Wolverton at twolverton@sfexaminer.com or via text or Signal at 415.515.5594.