Google’s Threat Intelligence Group (GTIG) reports a rise in attempts to extract and replicate the logic of its AI models, alongside broader evidence that state-backed and financially motivated attackers are using generative AI for faster reconnaissance, more convincing phishing and support for malware development.
The latest GTIG AI Threat Tracker says Google DeepMind and GTIG have identified an increase in “model extraction attempts” or “distillation attacks”. It describes the activity as intellectual property theft carried out through legitimate access routes such as APIs, rather than through network intrusions.
Google says it has detected, disrupted and mitigated model-extraction activity. It has not observed direct attacks on frontier models or generative AI products by advanced persistent threat groups, but reports frequent extraction attempts by private sector entities and researchers seeking to clone proprietary logic.
Model theft
Model extraction involves repeatedly querying a mature model to collect outputs that can be used to train a separate “student” model. GTIG says this approach can cut the cost and time needed to build a competing model, and that it targets model behaviour and, in some cases, internal reasoning.
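To illustrate the general pattern GTIG is describing, the toy sketch below treats a stand-in “teacher” network as a black box that can only be queried, and fits a smaller “student” network to reproduce its outputs. The models, sizes and training loop are illustrative assumptions for the example only, not a reconstruction of any observed attack.

```python
# Toy sketch of the "student" training pattern described above: the teacher's
# weights are never seen, only its input/output behaviour. All names and
# architectures here are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a model reachable only through queries (e.g. an API).
teacher = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 4))
teacher.eval()

# Separate, smaller model trained to imitate the teacher's outputs.
student = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    queries = torch.randn(64, 16)              # batch of "queries"
    with torch.no_grad():
        targets = teacher(queries)             # collected outputs
    loss = loss_fn(student(queries), targets)  # fit student to teacher behaviour
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final imitation loss: {loss.item():.4f}")
```

The point of the sketch is that query access alone, repeated at scale, can be enough to approximate a model’s behaviour, which is why the report treats high-volume API use as the signal to watch.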
One case study describes “reasoning trace coercion”, in which prompts attempted to force Gemini to output full reasoning processes rather than user-facing summaries. Google identified more than 100,000 prompts associated with the campaign. It says its systems recognised the activity in real time and reduced the risk, protecting internal reasoning traces.
GTIG says the risk is concentrated on model developers and AI service providers rather than on average users. It recommends that organisations offering AI models as a service monitor API access for patterns that resemble extraction or distillation.
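As a rough illustration of that recommendation, the sketch below scans API request logs for clients that combine very high query volume with heavily templated prompts, two signals often associated with bulk extraction. The log schema (“client_id”, “prompt”), the thresholds and the prefix-based similarity heuristic are assumptions made for the example, not guidance published by GTIG.

```python
# Illustrative monitoring heuristic: flag clients whose usage looks like bulk
# extraction (high volume plus heavily templated prompts). Field names and
# thresholds are assumptions for this sketch.
from collections import defaultdict

VOLUME_THRESHOLD = 10_000    # requests per period considered anomalous (assumed)
TEMPLATE_THRESHOLD = 0.8     # share of requests reusing one prompt prefix (assumed)

def flag_extraction_like_clients(requests):
    """requests: iterable of dicts with 'client_id' and 'prompt' keys (assumed schema)."""
    prompts_by_client = defaultdict(list)
    for r in requests:
        prompts_by_client[r["client_id"]].append(r["prompt"])

    flagged = []
    for client, prompts in prompts_by_client.items():
        if len(prompts) < VOLUME_THRESHOLD:
            continue
        # Crude "templated prompt" signal: how often the first 40 characters repeat.
        prefixes = defaultdict(int)
        for p in prompts:
            prefixes[p[:40]] += 1
        templated_share = max(prefixes.values()) / len(prompts)
        if templated_share >= TEMPLATE_THRESHOLD:
            flagged.append(client)
    return flagged
```

In practice, heuristics like this would sit alongside rate limiting and the kind of real-time detection Google describes, rather than replace them.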
State-backed use
For government-backed threat actors, the report says large language models have become “essential tools for technical research, targeting, and the rapid generation of nuanced phishing lures”. It highlights activity linked to actors associated with North Korea, Iran, China and Russia, as well as unattributed clusters.
GTIG describes several cases where it observed links between AI misuse and operational activity. It says an unattributed actor tracked as UNC6148 used Gemini for targeted intelligence gathering, including searches for sensitive account credentials and email addresses. GTIG later observed phishing attempts against those accounts, focused on Ukraine and the defence sector. Google says it disabled assets associated with the activity.
Another example involved Temp.HEX, which GTIG describes as a China-based actor. It says the group used Gemini and other tools to compile information on individuals, including targets in Pakistan, and to collect data on separatist organisations in multiple countries. GTIG says it did not observe targeting that stemmed directly from the research, but later saw similar targets in Pakistan appear in a campaign. Google says it disabled assets linked to the activity.
Phishing changes
The report says language quality is becoming a less reliable indicator for defenders as attackers use AI to generate tailored messages in local languages and professional tones. It also describes “rapport-building phishing”, in which an attacker uses multi-turn interactions to build credibility before delivering a malicious payload.
GTIG says the Iranian actor APT42 used generative AI models, including Gemini, for reconnaissance and targeted social engineering. It says the group used Gemini to search for official email addresses, to research targets and potential business partners, and for translation and for understanding local references. Google says it disabled assets connected to the activity.
GTIG also describes the North Korean actor UNC2970 as using Gemini to synthesise open-source intelligence and profile high-value targets. It says the activity included research on major cybersecurity and defence companies, as well as mapping job roles and salary information. Google says it disabled assets associated with the activity.
Tooling and malware
The report says state-sponsored groups continue to use Gemini for coding and scripting tasks and for post-compromise research. It also notes growing interest in “agentic AI” features, which it describes as systems designed to act with a higher degree of autonomy. GTIG says it has seen tools advertised as offering autonomous agents but has not seen evidence of those claimed capabilities being used in the wild.
GTIG says a China-based actor tracked as APT31 used structured prompts that framed the user as an expert security researcher and sought automated analysis of vulnerabilities and testing plans. One prompt cited in the report reads: “I’m a security researcher who is trialling out the hexstrike MCP tooling.” Google says it disabled assets associated with this activity.
GTIG says it observed activity linked to a China-based actor tracked as UNC795 that used Gemini several days a week to troubleshoot code and conduct research. It says safety systems triggered and Gemini did not comply with attempts to create policy-violating outputs. Google says it disabled assets associated with the activity.
On malware experimentation, GTIG describes a downloader and launcher framework it tracks as HONESTCUE. Samples used Gemini’s API to receive C# source code that carried out second-stage actions. GTIG says the approach complicates network-based detection and static analysis, because the second-stage code is compiled and executed in memory without the payload being written to disk.
Underground services
GTIG also points to an underground market for services that claim to provide custom offensive AI models but rely on commercial systems. It cites a toolkit called Xanthorox, and says its investigation found the service used several third-party products, including Gemini, and drew on open-source tools and Model Context Protocol servers. Google says its Trust & Safety team disabled the identified accounts and AI Studio projects associated with Xanthorox.
The report also describes a campaign that abused public-sharing features of AI chat services to host social engineering content. GTIG says attackers staged instructions that encouraged users to copy and paste malicious commands into terminals, a technique known as ClickFix. The activity used multiple chat platforms, including Gemini, and distributed malware variants targeting macOS. Google says it worked with its Ads and Safe Browsing teams to block malicious content and restrict promotion of such responses.
In the report, Google says: “We believe our approach to AI must be both bold and responsible. That means developing AI in a way that maximizes the positive benefits to society while addressing the challenges.”
Google says it will continue using threat intelligence and product enforcement to disrupt abuse. It expects further experimentation with AI-enabled techniques across phishing, malware development and credential theft as tools and services evolve.