A team of researchers built a fake company staffed entirely with artificial intelligence agents to see whether machines could really replace human employees. The findings should reassure anyone worried about losing their job to AI.

Is artificial intelligence about to take over our jobs? Some companies are betting on it, while others remain skeptical and call the technology overhyped. So what is the truth? In a preprint posted on arXiv, researchers from Carnegie Mellon University created a simulated company and staffed it with AI agents. The results were far from encouraging for anyone counting on full automation.

The “staff” included agents based on Anthropic’s Claude, OpenAI’s GPT-4o, Google’s Gemini, Amazon’s Nova, Meta’s Llama, and Alibaba’s Qwen. Each was assigned a role such as financial analyst, project manager, or software engineer. Meanwhile, the researchers used another platform to simulate coworkers the agents had to contact for certain tasks — including an HR department.

The agents failed more than three-quarters of their assignments

The agents were given a range of jobs, from analyzing databases to conducting virtual tours of office spaces. Claude 3.5 Sonnet was the top performer, but even it fully completed only 24 percent of its tasks. With credit given for partially completed tasks, its score rose to just 34.4 percent. Gemini 2.0 Flash came next, completing 11.4 percent, and no other agent exceeded 10 percent.
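To make the gap between those two figures concrete, here is a minimal Python sketch of how a strict completion rate and a partial-credit score can diverge. It is an illustration only: the checkpoint-based weighting (half the credit for progress, half for finishing), the `TaskResult` type, and the sample data are all assumptions made for this example, not the researchers' actual scoring code or results.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one task, split into checkpoints (hypothetical structure)."""
    checkpoints_passed: int
    checkpoints_total: int

    @property
    def fully_completed(self) -> bool:
        return self.checkpoints_passed == self.checkpoints_total


def completion_rate(results: list[TaskResult]) -> float:
    """Share of tasks in which every checkpoint was passed."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.fully_completed for r in results) / len(results)


def partial_credit_score(results: list[TaskResult], completion_bonus: float = 0.5) -> float:
    """Average score that rewards progress, plus a bonus for finishing.

    The 0.5 weighting is an assumption chosen for illustration.
    """
    if not results:
        raise ValueError("no results to score")
    total = 0.0
    for r in results:
        progress = r.checkpoints_passed / r.checkpoints_total
        total += (1 - completion_bonus) * progress + completion_bonus * r.fully_completed
    return total / len(results)


# Made-up data: one task finished, two half done, one not started.
sample_results = [
    TaskResult(4, 4),
    TaskResult(2, 4),
    TaskResult(0, 5),
    TaskResult(1, 2),
]

print(f"Completion rate:      {completion_rate(sample_results):.1%}")
print(f"Partial-credit score: {partial_credit_score(sample_results):.1%}")
```

In this toy data only one task in four is finished, yet the partial-credit score is higher because half-done work still counts for something. The same effect explains why an agent's score can sit well above its completion rate.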

In terms of cost, Claude 3.5 Sonnet averaged $6.34 per task in operating expenses, compared with a modest $0.79 for Gemini 2.0 Flash. Still, the overall results make it clear that today’s AI tools, for all their speed and efficiency, are far from capable of truly autonomous work.

The researchers noted that many failures came from missing what an instruction implied. When told to save a file with a “.docx” extension, for instance, the agents did not recognize that this meant a Microsoft Word document. They also stumbled on tasks requiring communication or social reasoning. The biggest challenge came when browsing the web, especially when handling pop-ups. And when the agents got lost, they often took shortcuts, skipping difficult steps and wrongly concluding that they had succeeded.

These findings suggest that while artificial intelligence can excel at specific and narrowly defined tasks, it’s still a long way from working independently. For now, human judgment, creativity, and adaptability remain vital parts of any workplace.

Edward Back

Journalist
