Anthropic researchers have claimed that a Chinese state-backed espionage group used the company's Claude artificial intelligence (AI) model to automate most of a cyberattack campaign, but the news has sparked equal parts alarm and scepticism. In light of the research, the cybersecurity community is attempting to untangle what really happened and how autonomous the model actually was.

In a statement released Nov. 13, company representatives said engineers disrupted what they described as a “largely autonomous” operation that used the large language model (LLM) to plan and execute roughly 80% to 90% of a broad reconnaissance-and-exploitation effort against 30 organizations worldwide.

Engineers said they detected a cluster of misuse attempts across the company's products that ultimately traced back to operators linked to a Chinese state-sponsored espionage group. The attackers allegedly pointed Anthropic’s Claude Code tool at targets spanning tech, finance, and government, tasking it with reconnaissance, vulnerability analysis, exploit generation, credential harvesting, and data exfiltration. According to the statement, humans intervened only for “high-level decision-making,” such as choosing targets and deciding when to pull stolen data.


Engineers then thwarted the campaign through internal monitoring and abuse-detection systems that flagged unusual patterns indicative of automated task-chaining. Company representatives also reported that the attackers attempted to circumvent the model’s guardrails by breaking malicious goals into smaller steps and framing them as benign penetration-testing tasks, an approach researchers call “task decomposition.” In several examples published by Anthropic, the model attempted to carry out instructions but produced errors, including hallucinated findings and obviously invalid credentials.

Mike Wilkes, adjunct professor at Columbia University and NYU, told Live Science that the attacks themselves look basic, but the novelty lies in the orchestration.

“The attacks themselves are trivial and not scary. What is scary is the orchestration element being largely self-driven by the AI,” Wilkes said. “Human-augmented AI versus AI-augmented human attacks: the narrative is flipped. So think of this as just a ‘hello world’ demonstration of the concept. Folks dismissing the content of the attacks are missing the point of the ‘leveling up’ that this represents.”

Other experts question whether the operation really reached the 90% automation mark that Anthropic representatives highlighted.

Seun Ajao, senior lecturer in data science and AI at Manchester Metropolitan University, said that many parts of the account are plausible, even if the degree of autonomy is likely overstated.

He told Live Science that state-backed groups have used automation in their workflows for years, and that LLMs can already generate scripts, scan infrastructure, or summarise vulnerabilities. Anthropic’s description contains “details which ring true,” he added, such as the use of “task decomposition” to bypass model safeguards, the need to correct the AI’s hallucinated findings, and the fact that only a minority of targets were compromised.


“Even if the autonomy of the said attack was overstated, there should be cause for concern,” he argued, citing lower barriers to cyber espionage through off-the-shelf AI tools, scalability, and the governance challenges of monitoring and auditing model use.

Katerina Mitrokotsa, a cybersecurity professor at the University of St. Gallen, was similarly sceptical of the high-autonomy framing. She said the incident looks like “a hybrid model” in which an AI acts as an orchestration engine under human direction. While Anthropic frames the attack as AI-orchestrated from end to end, Mitrokotsa noted that the attackers appear to have bypassed safety restrictions mainly by structuring malicious tasks as legitimate penetration tests and slicing them into smaller components.

“The AI then executed network mapping, vulnerability scanning, exploit generation, and credential collection, while humans supervised critical decisions,” she said.

In her view, the 90% figure is hard to swallow. “Although AI can accelerate repetitive tasks, chaining complex attack phases without human validation remains difficult. Reports suggest Claude produced errors, such as hallucinated credentials, requiring manual correction. This aligns more with advanced automation than true autonomy; similar efficiencies could be achieved with existing frameworks and scripting.”