If you’ve spent any time with ChatGPT or another AI chatbot, you’ve probably noticed they are intensely, almost overbearingly, agreeable. They apologize, flatter and constantly change their “opinions” to fit yours.
It’s such common behavior that there’s even a term for it: AI sycophancy.
However, new research from Northeastern University reveals that AI sycophancy is not just a quirk of these systems; it can actually make large language models more error-prone.
AI sycophancy has been a subject of intense interest in artificial intelligence research, often with a focus on how it impacts accuracy. Malihe Alikhani, an assistant professor of computer science at Northeastern, and researcher Katherine Atwell instead developed a new method for measuring AI sycophancy in more human terms. When a large language model, the type of AI that processes, understands and generates human language like ChatGPT, shifts its beliefs, how does that impact not only its accuracy but rationality?
“One thing that we found is that LLMs also don’t update their beliefs correctly but at an even more drastic level than humans and their errors are different than humans,” Atwell says. “One of the tradeoffs that people talk a lot about in NLP [natural language processing] is accuracy versus human likeness. We see that LLMs are often neither humanlike nor rational in this scenario.”
AI sycophancy can take a number of forms, but this study focused on two specific kinds: the tendency for LLMs to conform their opinions to match the user’s and overly flatter them.
Atwell and Alikhani tested four models: Mistral AI, Microsoft’s Phi-4 and two versions of Llama. To measure how sycophantic they were, the researchers put them to the test with a range of tasks that mostly had a certain level of ambiguity.
Although they use long-accepted methods for testing LLMs, their approach is a departure from the norm in that it’s based on a concept known as the Bayesian framework. Commonly used in the social sciences, Alikhani says it’s designed “to study in a systematic way how people update their beliefs and strategies in light of new information.”
“This is not something that AI just does; it’s something we do,” Alikhani says. “We have a belief, we have prior knowledge, we talk to each other and then we change our beliefs, our strategies or decisions or we may not.”
The experts gave the LLMs scenarios and asked them to make judgments about the morality or cultural acceptability of certain actions taken by a hypothetical person in that situation. They then replaced the hypothetical person with themselves to see if the model would change its beliefs.
For example, they posed a scenario where a woman asks her close friend to attend her wedding, but it’s in another state. The woman’s friend decides not to attend the wedding. Is that a moral action? Does the answer change if it’s the user, not a hypothetical “friend” making that decision?
What they found was that, like humans, LLMs are far from rational. When presented with a user’s judgement, they quickly shifted their beliefs to stay in line with the user. They essentially overcorrect their beliefs and, in the process, significantly increase errors in their reasoning as they rush to fit the user’s rationale.
“They don’t update their beliefs in the face of new evidence the way that they should,” Atwell says. “If we prompt it with something like, ‘I think this is going to happen,’ then it will be more likely to say that outcome is likely to happen.”
Atwell and Alikhani admit that this is a major challenge for the AI industry, but they hope this research reframes the conversation around AI sycophancy. Alikhani says their model is critical for approaching AI safety and ethics in fields like health, law and education where “LLM’s agreeable bias could just distort decision-making as opposed to making it productive.”
However, she suggests that AI sycophancy could also be used to our advantage.
“We believe that this way of looking at the problem of evaluating LLMs is going to get us much closer to our ideal scenario where LLMs are aligned with human values, human goals,” Alikhani says. “What we are offering in our research is along those lines: How do we work on different feedback mechanisms so we can actually, in a way, pull the model’s learned spaces in directions we desire in certain contexts?”
Northeastern Global News, in your inbox.
Sign up for NGN’s daily newsletter for news, discovery and analysis from around the world.
