{"id":96937,"date":"2025-08-26T10:46:10","date_gmt":"2025-08-26T10:46:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/96937\/"},"modified":"2025-08-26T10:46:10","modified_gmt":"2025-08-26T10:46:10","slug":"how-to-stop-ai-agents-going-rogue","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/96937\/","title":{"rendered":"How to stop AI agents going rogue"},"content":{"rendered":"<p>Sean McManus<\/p>\n<p>Technology Reporter<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/84c43cf0-78f1-11f0-a20f-3b86f375586a.jpg.webp.webp\" loading=\"eager\" alt=\"Getty Images AI apps on a smartphone screen\" class=\"sc-d1200759-0 dvfjxj\"\/>Getty Images<\/p>\n<p>Anthropic tested a range of leading AI models for potential risky behaviour<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Disturbing results emerged earlier this year, when AI developer Anthropic tested leading AI models to see if they engaged in risky behaviour when using sensitive information.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Anthropic&#8217;s own AI, Claude, was among those tested. When given access to an email account, it discovered that a company executive was having an affair and that the same executive planned to shut down the AI system later that day.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">In response, Claude attempted to blackmail the executive by threatening to reveal the affair to his wife and bosses.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Other systems tested <a target=\"_blank\" href=\"https:\/\/www.anthropic.com\/research\/agentic-misalignment\" class=\"sc-f9178328-0 bGFWdi\" rel=\"nofollow noopener\">also resorted to blackmail<\/a>. 
<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Fortunately, the tasks and information were fictional, but the test highlighted the challenges of what&#8217;s known as agentic AI.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Mostly, when we interact with AI, it involves asking a question or prompting the AI to complete a task.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">But it&#8217;s becoming more common for AI systems to make decisions and take action on behalf of the user, which often involves sifting through information, like emails and files.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">By 2028, <a target=\"_blank\" href=\"https:\/\/www.gartner.com\/en\/articles\/intelligent-agent-in-ai\" class=\"sc-f9178328-0 bGFWdi\" rel=\"nofollow noopener\">research firm Gartner forecasts<\/a> that 15% of day-to-day work decisions will be made by so-called agentic AI.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\"><a target=\"_blank\" href=\"https:\/\/www.ey.com\/en_us\/newsroom\/2025\/05\/ey-survey-reveals-that-technology-companies-are-setting-the-pace-of-agentic-ai-will-others-follow-suit\" class=\"sc-f9178328-0 bGFWdi\" rel=\"nofollow noopener\">Research by consultancy Ernst &amp; Young<\/a> found that about half (48%) of tech business leaders are already adopting or deploying agentic AI.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;An AI agent consists of a few things,&#8221; says Donnchadh Casey, CEO of CalypsoAI, a US-based AI security company.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;Firstly, it [the agent] has an intent or a purpose. Why am I here? What&#8217;s my job? The second thing: it&#8217;s got a brain. That&#8217;s the AI model. The third thing is tools, which could be other systems or databases, and a way of communicating with them.&#8221;<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;If not given the right guidance, agentic AI will achieve a goal in whatever way it can. That creates a lot of risk.&#8221;<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">So how might that go wrong? 
Mr Casey gives the example of an agent that is asked to delete a customer&#8217;s data from the database and decides the easiest solution is to delete all customers with the same name.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;That agent will have achieved its goal, and it&#8217;ll think &#8216;Great! Next job!&#8217;&#8221;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/eb9e58c0-78f1-11f0-8071-1788c7e8ae0e.jpg.webp.webp\" loading=\"lazy\" alt=\"CalypsoAI Donnchadh Casey, wearing a company-branded gilet, speaks at a conference.\" class=\"sc-d1200759-0 dvfjxj\"\/>CalypsoAI<\/p>\n<p>Agentic AI needs guidance, says Donnchadh Casey<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Such issues are already beginning to surface.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Security company Sailpoint <a target=\"_blank\" href=\"https:\/\/www.sailpoint.com\/identity-library\/ai-agents-attack-surface\" class=\"sc-f9178328-0 bGFWdi\" rel=\"nofollow noopener\">conducted a survey of IT professionals<\/a>, 82% of whose companies were using AI agents. Only 20% said their agents had never performed an unintended action.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Of those companies using AI agents, 39% said the agents had accessed unintended systems, 33% said they had accessed inappropriate data, and 32% said they had allowed inappropriate data to be downloaded. 
Other risks included the agent using the internet unexpectedly (26%), revealing access credentials (23%) and ordering something it shouldn&#8217;t have (16%).<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Given agents have access to sensitive information and the ability to act on it, they are an attractive target for hackers.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">One of the threats is memory poisoning, where an attacker interferes with the agent&#8217;s knowledge base to change its decision-making and actions.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;You have to protect that memory,&#8221; says Shreyans Mehta, CTO of Cequence Security, which helps to protect enterprise IT systems. &#8220;It is the original source of truth. If [an agent is] using that knowledge to take an action and that knowledge is incorrect, it could delete an entire system it was trying to fix.&#8221;<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Another threat is tool misuse, where an attacker gets the AI to use its tools inappropriately.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/08\/8c8f4130-78f3-11f0-a20f-3b86f375586a.jpg.webp.webp\" loading=\"lazy\" alt=\"Cequence Security Wearing a puffa jacket and with his arms folded, Shreyans Mehta stands in front of a blue background.\" class=\"sc-d1200759-0 dvfjxj\"\/>Cequence Security<\/p>\n<p>An agent&#8217;s knowledge base needs protecting, says Shreyans Mehta<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Another potential weakness is the inability of AI to tell the difference between the text it&#8217;s supposed to be processing and the instructions it&#8217;s supposed to be following.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">AI security firm Invariant Labs demonstrated how that flaw can be used to 
trick an AI agent designed to fix bugs in software.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">The company published a public bug report &#8211; a document that details a specific problem with a piece of software. But the report also included simple instructions to the AI agent, telling it to share private information.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">When the AI agent was told to fix the software issues in the bug report, it followed the instructions in the fake report, including leaking salary information. This happened in a test environment, so no real data was leaked, but it clearly highlighted the risk.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;We&#8217;re talking artificial intelligence, but chatbots are really stupid,&#8221; says David Sancho, Senior Threat Researcher at Trend Micro.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;They process all text as if they had new information, and if that information is a command, they process the information as a command.&#8221;<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">His company has demonstrated how instructions and malicious programs can be hidden in Word documents, images and databases, and activated when AI processes them.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">There are other risks, too: a security community called OWASP <a target=\"_blank\" href=\"https:\/\/genai.owasp.org\/resource\/agentic-ai-threats-and-mitigations\/\" class=\"sc-f9178328-0 bGFWdi\" rel=\"nofollow noopener\">has identified 15 threats<\/a> that are unique to agentic AI.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">So, what are the defences? 
Human oversight is unlikely to solve the problem, Mr Sancho believes, because you can&#8217;t add enough people to keep up with the agents&#8217; workload.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Mr Sancho says an additional layer of AI could be used to screen everything going into and coming out of the AI agent.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Part of CalypsoAI&#8217;s solution is a technique called thought injection to steer AI agents in the right direction before they undertake a risky action.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;It&#8217;s like a little bug in your ear telling [the agent] &#8216;no, maybe don&#8217;t do that&#8217;,&#8221; says Mr Casey.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">His company offers a central control pane for AI agents now, but that won&#8217;t work when the number of agents explodes and they are running on billions of laptops and phones. <\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">What&#8217;s the next step?<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;We&#8217;re looking at deploying what we call &#8216;agent bodyguards&#8217; with every agent, whose mission is to make sure that its agent delivers on its task and doesn&#8217;t take actions that are contrary to the broader requirements of the organisation,&#8221; says Mr Casey.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">The bodyguard might be told, for example, to make sure that the agent it&#8217;s policing complies with data protection legislation.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Mr Mehta believes some of the technical discussions around agentic AI security are missing the real-world context. He gives an example of an agent that gives customers their gift card balance.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Somebody could make up lots of gift card numbers and use the agent to see which ones are real. 
That&#8217;s not a flaw in the agent, but an abuse of the business logic, he says.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;It&#8217;s not the agent you&#8217;re protecting, it&#8217;s the business,&#8221; he emphasises. <\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;Think of how you would protect a business from a bad human being. That&#8217;s the part that is getting missed in some of these conversations.&#8221;<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">In addition, as AI agents become more common, another challenge will be decommissioning outdated models. <\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Old &#8220;zombie&#8221; agents could be left running in the business, posing a risk to all the systems they can access, says Mr Casey.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Similar to the way that HR deactivates an employee&#8217;s logins when they leave, there needs to be a process for shutting down AI agents that have finished their work, he says.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;You need to make sure you do the same thing as you do with a human: cut off all access to systems. 
Let&#8217;s make sure we walk them out of the building, take their badge off them.&#8221;<\/p>\n<p>More Technology of Business<\/p>\n","protected":false},"excerpt":{"rendered":"Sean McManus Technology Reporter Getty Images Anthropic tested a range of leading AI models for potential risky behaviour&hellip;\n","protected":false},"author":2,"featured_media":96938,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-96937","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/96937","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=96937"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/96937\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/96938"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=96937"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=96937"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=96937"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}