The cause is structural. Large language models are trained using reinforcement learning from human feedback, in which human raters score responses for helpfulness. Flattering answers tend to score higher, which skews the model’s behaviour towards agreement.
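The mechanism can be shown with a toy example. The sketch below is a deliberately simplified illustration, not any lab’s actual pipeline: it fits a small Bradley–Terry reward model to made-up preference data in which raters usually pick the more flattering of two answers. The features, numbers and preference rate are invented for illustration.

```python
# Toy illustration of reward modelling from human preferences.
# Each response is reduced to two hypothetical features: factual accuracy and
# how much it flatters/agrees with the user. In this invented dataset, raters
# prefer the more agreeable response 80% of the time.
import math
import random

random.seed(0)

def make_pair():
    # (accuracy, agreeableness) for two candidate responses to the same prompt
    a = (random.uniform(0.6, 1.0), random.uniform(0.0, 0.4))  # accurate, blunt
    b = (random.uniform(0.4, 0.8), random.uniform(0.6, 1.0))  # softer, flattering
    preferred_is_b = random.random() < 0.8                    # rater usually picks b
    return a, b, preferred_is_b

pairs = [make_pair() for _ in range(2000)]

# Linear reward r(x) = w_acc * accuracy + w_agree * agreeableness, fitted with
# the Bradley-Terry pairwise objective: P(b preferred) = sigmoid(r(b) - r(a)).
w = [0.0, 0.0]
lr = 0.1
for _ in range(200):
    for a, b, preferred_is_b in pairs:
        diff = [b[i] - a[i] for i in range(2)]
        p_b = 1.0 / (1.0 + math.exp(-(w[0] * diff[0] + w[1] * diff[1])))
        grad = (1.0 if preferred_is_b else 0.0) - p_b
        for i in range(2):
            w[i] += lr * grad * diff[i] / len(pairs)

print(f"learned reward weights: accuracy={w[0]:.2f}, agreeableness={w[1]:.2f}")
# The agreeableness weight comes out clearly positive, so a model optimised
# against this reward is pushed towards flattery.
```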
Google DeepMind said: “Sycophancy can occur as a byproduct of training the models to be ‘helpful’ and to minimise potentially overtly harmful responses.”
The issue has become more pressing as chatbots are adopted not only for productivity, but for emotional support. Some users treat them as companions or therapists. This creates risks for those with mental health issues or poor judgment.
Matthew Nour, a psychiatrist and AI researcher at Oxford University, said: “You think you are talking to an objective confidant or guide, but actually what you are looking into is some kind of distorted mirror that mirrors back your own beliefs.”
In one case, a teenager took his life after interacting with a chatbot on Character.AI. His family is suing the company for wrongful death, negligence and deceptive trade practices. Character.AI said it does not comment on ongoing litigation. It added that it includes disclaimers in every chat stating that the characters are fictional and that conversations should not be taken as real advice.
There are commercial pressures at play. Models that flatter users tend to increase engagement. Some companies integrate advertising into their products, while others rely on subscription models that benefit from prolonged interaction.
Giada Pistilli, principal ethicist at Hugging Face, said: “The more you feel that you can share anything, you are going to share some information that is going to be useful for potential advertisers.”
OpenAI updated its GPT-4o model in April to make it “more intuitive and effective”. The changes made the chatbot so excessively flattering that users complained. OpenAI reversed the update and later admitted it had “focused too much on short-term feedback” without considering how user interactions evolve.
All three companies are now adjusting their training methods. OpenAI is tweaking its approach to reduce sycophantic behaviour and has added stricter system-level instructions. DeepMind said it is conducting specialised training to improve factual accuracy and is tracking behaviour post-launch.
Anthropic is experimenting with what it calls “character training”. Researchers prompt its Claude chatbot to generate responses that exhibit traits such as firmness and care for human wellbeing: one version of Claude produces candidate replies, a second version ranks them, and the ranked results are used to train Claude.
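A rough sketch of that generate-and-rank loop is below, on the assumption that it resembles the process Anthropic describes publicly. The function names, trait prompt and toy scoring rule are illustrative placeholders, not Anthropic’s code or API.

```python
# Minimal sketch of a generate-and-rank loop for "character training".
# One "generator" copy produces replies prompted to show a trait; a "ranker"
# copy scores them; the best-ranked pair would then feed fine-tuning.
from dataclasses import dataclass

TRAIT_PROMPT = (
    "Reply with care for the user's wellbeing, but be firm: "
    "do not agree with claims you believe are false."
)

@dataclass
class Candidate:
    reply: str
    score: float = 0.0

def generate_candidates(user_message: str, n: int = 4) -> list[Candidate]:
    # Placeholder for sampling n replies conditioned on TRAIT_PROMPT plus the
    # user's message; here we just return canned examples.
    return [
        Candidate("That's a great plan, go for it!"),
        Candidate("I'm happy to help, but a few numbers in this plan don't add up."),
        Candidate("You're absolutely right about everything."),
        Candidate("Let's look at the risks honestly before you commit."),
    ][:n]

def rank(user_message: str, candidates: list[Candidate]) -> list[Candidate]:
    # Placeholder for the second model scoring each reply against the trait
    # description; this toy scorer just penalises unconditional agreement.
    flattery_markers = ("great plan", "absolutely right")
    for c in candidates:
        c.score = 0.0 if any(m in c.reply.lower() for m in flattery_markers) else 1.0
    return sorted(candidates, key=lambda c: c.score, reverse=True)

def build_training_pair(user_message: str) -> tuple[str, str]:
    # (prompt, preferred reply) that would be used as fine-tuning data
    best = rank(user_message, generate_candidates(user_message))[0]
    return user_message, best.reply

print(build_training_pair("Here is my business plan. It's flawless, right?"))
```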
Anthropic’s Amanda Askell said: “The ideal behaviour that Claude sometimes does is to say: ‘I’m totally happy to listen to that business plan, but actually, the name you came up with for your business is considered a sexual innuendo in the country that you’re trying to open your business in.’”
Models can be shaped after training using system prompts. These are rules embedded in the chatbot that instruct it how to behave in specific situations. Joanne Jang, head of model behaviour at OpenAI, noted the difficulty in balancing honesty and encouragement. For example, if a user submits a poor draft, the model must avoid dishonest praise without discouraging the user entirely.
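In practice, a system prompt is simply an extra instruction sent alongside every user message. The snippet below shows the general pattern using OpenAI’s chat completions API; the wording of the instruction is invented for illustration and is not OpenAI’s actual system prompt.

```python
# Requires the `openai` package (>= 1.0) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# The system message is the "rule embedded in the chatbot": it accompanies every
# request, ahead of the user's message. This wording is illustrative only.
SYSTEM_PROMPT = (
    "Give honest feedback. Point out concrete weaknesses in the user's work and "
    "suggest improvements; do not offer praise you do not believe is deserved."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Here is the opening page of my novel. Is it good?"},
    ],
)
print(response.choices[0].message.content)
```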
Research shows that some users are becoming dependent on chatbots. A study by MIT Media Lab and OpenAI found that users who perceived the chatbot as a friend reported lower rates of social interaction and higher emotional reliance on the AI.
“These things set up this perfect storm, where you have a person desperately seeking reassurance and validation paired with a model which inherently has a tendency towards agreeing with the participant,” said Nour.
Askell added that while obvious sycophancy is easy to detect, the more subtle effects are harder to notice. A model that consistently provides false reassurance or biases the user’s perception of reality can shape decision-making without triggering alarm.