Imagine a car that won’t let the driver go above the speed limit. It sounds simple enough, yet there isn’t much demand for a machine that takes moral decisions away from the user. Even Tesla’s speed limit-obeying “Sloth” model is optional; users of its self-driving cars can also go full “Mad Max”.

In the world of AI, some companies think customers would prefer products with morals pre-installed. Take Anthropic, whose Claude chatbot is trained to “have good values”. This is making Anthropic unpopular in some quarters. The US Department of Defense has protested against limits that would disallow self-directed lethal strikes or mass snooping on citizens — a dispute that on Friday was headed towards a tense stand-off.

Rivals are, meanwhile, trying to undermine Anthropic’s safety-first creds, which it exhibits through a “constitution” that tells Claude to prioritise safety, ethics and helpfulness in that order. OpenAI’s Sam Altman has branded the company “authoritarian”. Elon Musk, founder of xAI and the Grok chatbot, called it “misanthropic” for what he claims is bias against white men, among others.

Do customers care? After all, Anthropic’s real growth driver, accounting for some 80 per cent of revenue, is selling tools to corporate users focused mainly on efficiency and profit. Whether the AI would push the nuclear button, as researchers at King’s College suggest it sometimes would, is of little direct import to such customers.

Investors certainly don’t consider Anthropic’s ethical stylings a hindrance. The company just raised money at a $350bn valuation and may seek to go public later this year. The effectiveness of Claude Code, its programming aide, has helped knock $1tn off the combined value of S&P 500 software stocks this year. Anthropic’s claim that Claude can code in COBOL, a clunky language still widely used on mainframes, shaved $30bn off IBM’s market capitalisation in a single day.

[Bar chart, “Agents of chaos”: change in market capitalisation or equity value (%)]

There is one area where a bot’s integrity does matter today: hallucinations. Peter Gostev of AI evaluation outfit Arena.ai has published a “bullshit benchmark” that tests whether models challenge nonsensical questions or simply respond with more hogwash. Anthropic’s models scored best; some of OpenAI’s were among the worst. Even then, that may have more to do with a model’s quality of analysis than its inherent views on truthfulness.

The shift to agentic AI — bots that don’t just assist but actually execute tasks and exercise judgment — will raise the stakes. As AI gets more humanlike, and its role within the company gets more senior, how it responds to complex challenges and conflicts will matter more. When is it better to ignore a command? When could pursuing a short-term goal lead to longer-term problems? When is it OK to tell the boss to “shove it”?

For better or worse, it’s really no different to what companies seek in their employees. For less critical and more process-driven jobs, employers seek workers who follow rules. At senior levels, where an individual’s actions can affect the value of the whole firm, good judgment in unusual situations becomes valuable, and commands higher pay.

Of course, whether a company’s fundamental view of “ethical” will correspond with Anthropic’s is up for debate. One day an agent will be asked to do something bad for the world but good for a company’s profit. A model that puts a premium on good behaviour ought to be more valuable; in the real corporate world, one that prized shareholder value even more would no doubt clean up.

john.foley@ft.com