Open-source b3 framework to benchmark AI agent security unveiled

Check Point, Lakera and the UK AI Security Institute have released b3, an open-source framework to test the security of large language models used in AI agents.

The backbone breaker benchmark (b3) introduces a new methodology for assessing vulnerabilities in large language models (LLMs) by focusing on ‘threat snapshots’-key points within AI agent workflows where vulnerabilities are most likely to be exposed.

Rather than attempting to simulate the entire workflow of an AI agent, the b3 benchmark hones in on specific moments that are critical for security evaluation. This approach allows developers and model providers to assess their systems’ resilience against adversarial attacks more efficiently and with fewer complexities involved in modelling full AI agent behaviours.

The initiative is a collaboration between specialists from cyber security company Check Point, AI security platform Lakera, and researchers from the UK AI Security Institute. The benchmark is available as open-source and is designed to push AI security forward as a measurable standard.

The benchmark leverages a dataset of 19,433 crowdsourced adversarial attacks collected via the red teaming simulator game, Gandalf: Agent Breaker. The attacks in the dataset target a range of vulnerabilities, including system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorised tool calls.

According to the statement from Lakera, the b3 benchmark, “combines 10 representative agent ‘threat snapshots’ with a high-quality dataset of 19,433 crowdsourced adversarial attacks collected via the gamified red teaming game, Gandalf: Agent Breaker. It evaluates susceptibility to attacks such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.”

Security findings

Initial testing of b3 involved 31 popular LLMs, providing a broad assessment of the current landscape of model security. According to the initial results, several key insights have emerged. Enhanced reasoning abilities within the models significantly improved overall security, but there was no clear correlation between model size and security performance.

The findings also noted that while closed-source models generally outperformed open-weight models, the performance gap is narrowing as top open-source models improve. The results indicate that both open and closed models share more similarities in their security profiles than previously expected.

“We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them. Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture.”

This statement was made by Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera.

Simulation and data collection

The Gandalf: Agent Breaker game underpins much of the dataset and attack scenarios used in the b3 benchmark.

The simulation platform challenges users to exploit vulnerabilities in AI agents across a range of applications designed to reflect real-world usage. The game includes ten generative AI applications, each offering multiple difficulty levels, layered defences, and different attack surfaces relevant to contemporary security concerns such as code execution, file processing, and prompt manipulation.

Gandalf initially emerged from Lakera’s internal hackathon, where teams competed to uncover and defend against vulnerabilities in LLMs. Since its introduction, Gandalf has generated over 80 million data points through a global community focused on red teaming AI technology and has played a role in exposing real-world weaknesses in generative AI applications.

The methodology behind b3 and tools like Gandalf are intended to raise awareness of security in AI agent development and to encourage best practices in model deployment and maintenance. By providing open access to benchmarks and threat data, the consortium behind b3 aims to foster wider engagement from AI developers, researchers, and platform providers.

Lakera, which was founded in 2021, develops AI-native security solutions and was acquired by Check Point in 2025. Check Point continues to focus on integrating AI-powered security capabilities into its portfolio for organisations globally.

Open-source b3 framework to benchmark AI agent security unveiled

Tags: