AI agents and other systems can’t yet conduct cyberattacks fully on their own – but they can help criminals at many stages of the attack chain, according to the International AI Safety Report.
The second annual report, chaired by the Canadian computer scientist Yoshua Bengio and authored by more than 100 experts across 30 countries, found that over the past year, developers have vastly improved AI systems’ ability to help automate and perpetrate cyberattacks.
Perhaps the best, and scariest, evidence for that finding appeared in Anthropic’s November 2025 report about Chinese cyberspies abusing its Claude Code AI tool to automate most elements of attacks on around 30 high-profile companies and government organizations. Those attacks succeeded in “a small number of cases.”
“At least one real-world incident has involved the use of semi-autonomous cyber capabilities, with humans intervening only at critical decision points,” according to the AI safety report. “Fully autonomous end-to-end attacks, however, have not been reported.”
Two areas where AI is especially useful to criminals are scanning for software vulnerabilities and writing malicious code.
During DARPA’s AI Cyber Challenge (AIxCC) – a two-year competition in which teams built AI systems to find and fix vulnerabilities in the open source software that undergirds critical infrastructure – finalist systems autonomously identified 77 percent of the synthetic vulnerabilities used in the final scoring round, according to competition organizers.
And while that is an example of defenders using AI to find and fix vulnerabilities, rather than attackers using it to find and exploit them, criminals are using models in similar ways. Last northern summer, attackers on underground forums claimed to be using HexStrike AI, an open-source red-teaming tool, to target critical vulnerabilities in Citrix NetScaler appliances within hours of the vendor disclosing the flaws.
Additionally, AI systems are getting much better at writing malware, and criminals can rent weaponized models that churn out ransomware and data-stealing code for as little as $50 a month.
The good news for now, according to the report’s authors, is that AI systems still aren’t great at carrying out multi-stage attacks without human help.
“Research suggests that autonomous attacks remain limited because AI systems cannot reliably execute long, multi-stage attack sequences,” according to the report. “For example, failures they exhibit include executing irrelevant commands, losing track of operational state, and failing to recover from simple errors without human intervention.”
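That fragility is easy to see with a toy model. The sketch below is our illustration, not anything from the report: it assumes an agent must clear every stage of an attack chain and that each stage succeeds independently with a made-up 90 percent reliability, then shows how quickly the end-to-end success rate collapses as chains get longer.

```python
# Toy model (our illustration, not from the AI safety report): an attack
# chain only succeeds if every stage succeeds, so end-to-end reliability
# decays exponentially with chain length.

def chain_success(per_stage_reliability: float, num_stages: int) -> float:
    """Probability of completing all stages with no unrecovered failure,
    assuming each stage succeeds independently."""
    return per_stage_reliability ** num_stages

for stages in (5, 10, 20, 40):
    rate = chain_success(0.90, stages)  # 0.90 is an assumed, illustrative figure
    print(f"{stages:2d} stages at 90% per-stage reliability: {rate:.1%} end-to-end")
```

At that assumed reliability, a 20-stage intrusion completes only about 12 percent of the time – and that’s before accounting for the report’s observation that agents fail to recover from simple errors without human help.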
Keep in mind, however, that all of this was written before the security dumpster fire that is OpenClaw – the AI agent previously known as Moltbot and Clawdbot – and Moltbook, the vibe-coded social media platform for AI agents.
So it’s also entirely plausible that the world won’t end with a sophisticated, autonomous multi-stage cyberattack dreamed up by a nation-state crew or criminal mastermind, but rather with a single agent that goes off the rails. ®