The contextual comprehension and scale of LLMs could help human cyber analysts
Google has announced that its AI-powered vulnerability detection system, Big Sleep, has discovered and reported 20 previously unknown security flaws in widely used open-source software.
The vulnerabilities, mostly in popular tools such as FFmpeg (a multimedia framework) and ImageMagick (an image-processing suite), were identified by the agent, a collaboration between Google DeepMind and Google's security research team, Project Zero.
The specific details of the flaws remain under wraps due to standard disclosure procedures.
The announcement follows a breakthrough last month when Big Sleep discovered a critical zero-day vulnerability (CVE-2025-6965) before cybercriminals could weaponise it.
The AI agent identified the issue in SQLite, the ubiquitous embedded database software, which had eluded researchers for years despite extensive traditional fuzzing and manual reviews.
The vulnerability, rated 7.2 on the CVSS scale, was linked to a memory corruption issue involving integer overflows that allowed malicious SQL inputs to read beyond array boundaries.
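To illustrate the bug class only (this is a hypothetical sketch, not SQLite's actual code), the C snippet below shows how an attacker-controlled count can overflow a signed integer, slip past a bounds check, and let a later loop read beyond the end of an array.

```c
#include <stdio.h>

/* Hypothetical sketch of the bug class: an attacker-controlled count
 * overflows a signed integer (undefined behaviour that typically wraps
 * to a negative value), defeating the bounds check so a later read
 * walks past the end of the array. Not SQLite's code. */
static int read_fields(const int *fields, int n_fields, int idx, int count) {
    /* idx + count can wrap around to a negative value, so this check
     * passes even though idx + count is far larger than n_fields. */
    if (idx + count <= n_fields) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += fields[idx + i];   /* out-of-bounds read once i > 3 */
        }
        return sum;
    }
    return -1;
}

int main(void) {
    int fields[8] = {0};
    /* Attacker-chosen values: 4 + 2147483647 overflows a 32-bit int. */
    read_fields(fields, 8, 4, 2147483647);
    return 0;
}
```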
Although Google’s Threat Intelligence team had flagged signs of exploit staging in the wild, it had not isolated the root cause; Big Sleep did.
“We believe this is the first time an AI agent has been used to directly foil efforts to exploit a vulnerability in the wild,” said Kent Walker, President of Global Affairs at Google and Alphabet, last month.
SQLite maintainers later confirmed that attackers were aware of the flaw before its disclosure and praised the AI system’s detection capability.
Google isn’t alone in testing the capacity of LLMs to fortify digital infrastructure. Microsoft’s Security Copilot, another AI-powered threat hunting tool, recently uncovered 11 vulnerabilities in the GRUB2 Linux bootloader, a critical component in operating systems with Secure Boot enabled.
The vulnerabilities, which had been lurking in open-source code, could have allowed attackers to bypass Secure Boot protections and install persistent bootkits.
The AI also accelerated the discovery of flaws in U-Boot and Barebox, two alternative bootloaders used in embedded systems.
AI in cybersecurity
Despite decades of development, fuzzing tools and manual code reviews failed to catch the SQLite flaw. The reason: human reviewers and traditional tooling lack the combination of contextual comprehension and scale that AI models can offer.
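For context, traditional fuzzing looks roughly like the harness below: a minimal libFuzzer-style sketch in which `parse_input` is a toy stand-in for whatever code is under test. The fuzzer hammers the entry point with coverage-guided, mutated inputs and watches for crashes, but it has no notion of what the code is supposed to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the code under test (hypothetical). */
static int parse_input(const uint8_t *data, size_t size) {
    /* Example target: reject inputs shorter than a 4-byte header. */
    if (size < 4) return -1;
    return (int)(data[0] ^ data[size - 1]);
}

/* Minimal libFuzzer-style harness: the fuzzer calls this entry point
 * repeatedly with mutated inputs and reports crashes or sanitizer
 * findings. It explores code paths mechanically, without any
 * understanding of the code's intent. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_input(data, size);
    return 0;  /* non-zero return values are reserved by libFuzzer */
}
```

A harness like this is typically built with `clang -fsanitize=fuzzer,address harness.c`; it only finds bugs that its generated inputs happen to trigger, which is where a model's contextual reasoning is meant to add value.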
According to the Ponemon Institute’s 2024 report, organisations now face more than 22,000 security alerts per week, and nearly 12,000 threats go undetected by current systems. AI, however, can triage and investigate these alerts at scale without exhausting human analysts.
However, LLMs also introduce new threat vectors. Recognising this, Google has updated its Vulnerability Rewards Program (VRP) to include categories like prompt injection, training data exfiltration, and other LLM-specific exploits.
In the program’s first year, more than $50,000 has been awarded for GenAI-related bugs, and roughly one in six reports has led to real-world product changes.