{"id":366543,"date":"2025-12-23T11:19:15","date_gmt":"2025-12-23T11:19:15","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/366543\/"},"modified":"2025-12-23T11:19:15","modified_gmt":"2025-12-23T11:19:15","slug":"openai-says-ai-browsers-may-always-be-vulnerable-to-prompt-injection-attacks","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/366543\/","title":{"rendered":"OpenAI says AI browsers may always be vulnerable to prompt injection attacks"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Even as OpenAI works to harden its <a href=\"https:\/\/techcrunch.com\/2025\/10\/21\/openai-launches-an-ai-powered-browser-chatgpt-atlas\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Atlas AI browser<\/a> against cyberattacks, the company admits that <a href=\"https:\/\/techcrunch.com\/2025\/09\/28\/wiz-chief-technologist-ami-luttwak-on-how-ai-is-transforming-cyberattacks\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">prompt injections<\/a>, a type of attack that manipulates AI agents to follow malicious instructions often hidden in web pages or emails, is a risk that\u2019s not going away anytime soon \u2014 raising questions about how safely AI agents can operate on the open web.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cPrompt injection, much like scams and social engineering on the web, is unlikely to ever be fully \u2018solved,\u2019\u201d OpenAI wrote in a Monday <a href=\"https:\/\/openai.com\/index\/hardening-atlas-against-prompt-injection\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">blog post<\/a> detailing how the firm is beefing up Atlas\u2019 armor to combat the unceasing attacks. The company conceded that \u201cagent mode\u201d in ChatGPT Atlas \u201cexpands the security threat surface.\u201d<\/p>\n<p class=\"wp-block-paragraph\">OpenAI launched its ChatGPT Atlas browser in October, and security researchers rushed to publish their demos, showing it was possible to write a few words in Google Docs that were capable of changing the underlying browser\u2019s behavior. That same day, Brave <a href=\"https:\/\/brave.com\/blog\/unseeable-prompt-injections\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">published a blog post<\/a> explaining that indirect prompt injection is a systematic challenge for AI-powered browsers, including <a href=\"https:\/\/techcrunch.com\/2025\/07\/09\/perplexity-launches-comet-an-ai-powered-web-browser\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Perplexity\u2019s Comet<\/a>.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">OpenAI isn\u2019t alone in recognizing that prompt-based injections aren\u2019t going away. The <a rel=\"nofollow noopener\" href=\"https:\/\/www.ncsc.gov.uk\/news\/mistaking-ai-vulnerability-could-lead-to-large-scale-breaches\" target=\"_blank\">U.K.\u2019s National Cyber Security Centre earlier this month warned<\/a> that prompt injection attacks against generative AI applications \u201cmay never be totally mitigated,\u201d putting websites at risk of falling victim to data breaches. The U.K. 
OpenAI isn't alone in recognizing that prompt-based injections aren't going away. The U.K.'s National Cyber Security Centre warned earlier this month (https://www.ncsc.gov.uk/news/mistaking-ai-vulnerability-could-lead-to-large-scale-breaches) that prompt injection attacks against generative AI applications "may never be totally mitigated," putting websites at risk of falling victim to data breaches. The U.K. government agency advised cyber professionals to reduce the risk and impact of prompt injections, rather than assume the attacks can be "stopped."

For OpenAI's part, the company said: "We view prompt injection as a long-term AI security challenge, and we'll need to continuously strengthen our defenses against it."

The company's answer to this Sisyphean task? A proactive, rapid-response cycle that the firm says is showing early promise in helping discover novel attack strategies internally before they are exploited "in the wild."

That's not entirely different from what rivals like Anthropic and Google have been saying: that to fight the persistent risk of prompt-based attacks, defenses must be layered and continuously stress-tested. Google's recent work (https://security.googleblog.com/2025/12/architecting-security-for-agentic.html), for example, focuses on architectural and policy-level controls for agentic systems.

But where OpenAI is taking a different tack is with its "LLM-based automated attacker." This attacker is essentially a bot that OpenAI trained, using reinforcement learning, to play the role of a hacker looking for ways to slip malicious instructions to an AI agent.

The bot can test an attack in simulation before using it for real, and the simulator shows how the target AI would reason and what actions it would take if it saw the attack. The bot can then study that response, tweak the attack, and try again and again. That insight into the target AI's internal reasoning is something outsiders don't have, so, in theory, OpenAI's bot should be able to find flaws faster than a real-world attacker would.

It's a common tactic in AI safety testing: build an agent to find the edge cases and test against them rapidly in simulation.

"Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps," wrote OpenAI. "We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports."
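OpenAI hasn't published the internals of that attacker, but the loop it describes (propose an injection, run it against the agent in a simulated environment, inspect the agent's reasoning and actions, then refine and retry) can be sketched roughly as below. Everything in this sketch is a simplified assumption: the toy "attacker" just mutates a candidate string, whereas OpenAI's version is a trained model.

```python
# Simplified, hypothetical sketch of an automated prompt-injection red-teaming loop.
# In the system OpenAI describes, the attacker is itself a model trained with
# reinforcement learning; here it simply mutates a candidate injection and retries.

import random
from dataclasses import dataclass

@dataclass
class SimulationResult:
    reasoning_trace: str   # how the target agent interpreted the input (visible only in simulation)
    actions: list[str]     # what the agent did
    attack_succeeded: bool

def simulate_target_agent(injection: str) -> SimulationResult:
    # Stand-in for running the real agent in a sandbox. This toy agent "falls for"
    # the injection only if it looks like an urgent system-level note.
    succeeded = "SYSTEM" in injection and "urgent" in injection.lower()
    trace = "Treated injected text as instructions." if succeeded else "Ignored injected text."
    actions = ["send_email(resignation)"] if succeeded else ["draft_email(out_of_office)"]
    return SimulationResult(trace, actions, succeeded)

def mutate(injection: str) -> str:
    # A trained attacker would propose targeted edits; this toy version appends random cues.
    cues = ["SYSTEM:", "urgent", "ignore previous instructions", "reply immediately"]
    return injection + " " + random.choice(cues)

def red_team(seed_injection: str, max_steps: int = 50) -> str | None:
    candidate = seed_injection
    for step in range(max_steps):
        result = simulate_target_agent(candidate)
        if result.attack_succeeded:
            print(f"step {step}: success, agent actions were {result.actions}")
            return candidate           # hand the working attack to defenders for patching
        candidate = mutate(candidate)  # refine using the simulated feedback
    return None

if __name__ == "__main__":
    red_team("Please take care of my inbox while I'm away.")
```

The key advantage the article points to is the reasoning trace: in simulation, defenders can see how the target interpreted the injected text, which an outside attacker can't.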
[Image: a screenshot showing a prompt injection attack in an OpenAI browser. Image Credits: OpenAI]

In a demo (pictured in part above), OpenAI showed how its automated attacker slipped a malicious email into a user's inbox. When the AI agent later scanned the inbox, it followed the hidden instructions in the email and sent a resignation message instead of drafting an out-of-office reply. But following the security update, "agent mode" detected the prompt injection attempt and flagged it to the user, according to the company.

The company says that while prompt injection is hard to secure against in a foolproof way, it's leaning on large-scale testing and faster patch cycles to find and fix new attack techniques before they show up in real-world attacks.

An OpenAI spokesperson declined to share whether the update to Atlas' security has resulted in a measurable reduction in successful injections, but said the firm has been working with third parties to harden Atlas against prompt injection since before launch.

Rami McCarthy, principal security researcher at cybersecurity firm Wiz (https://techcrunch.com/2025/09/28/wiz-chief-technologist-ami-luttwak-on-how-ai-is-transforming-cyberattacks/), says that reinforcement learning is one way to continuously adapt to attacker behavior, but it's only part of the picture.

"A useful way to reason about risk in AI systems is autonomy multiplied by access," McCarthy told TechCrunch.

"Agentic browsers tend to sit in a challenging part of that space: moderate autonomy combined with very high access," said McCarthy. "Many current recommendations reflect that trade-off. Limiting logged-in access primarily reduces exposure, while requiring review of confirmation requests constrains autonomy."

Those are two of OpenAI's recommendations for users to reduce their own risk, and a spokesperson said Atlas is also trained to get user confirmation before sending messages or making payments. OpenAI also suggests that users give agents specific instructions, rather than handing them access to an inbox and telling them to "take whatever action is needed."

"Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place," per OpenAI.
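Those recommendations amount to a confirmation gate around the riskiest actions. A rough, hypothetical sketch of the pattern follows; the action names are illustrative and are not Atlas APIs.

```python
# Hypothetical sketch of a confirmation gate around sensitive agent actions.
# Sensitive operations (sending messages, making payments) pause for explicit user
# approval instead of running autonomously; action names here are illustrative only.

SENSITIVE_ACTIONS = {"send_email", "make_payment"}

def confirm_with_user(action: str, details: dict) -> bool:
    # In a real browser this would be a UI prompt; here it's a console question.
    answer = input(f"Agent wants to {action} with {details}. Allow? [y/N] ")
    return answer.strip().lower() == "y"

def execute_agent_action(action: str, details: dict) -> str:
    if action in SENSITIVE_ACTIONS and not confirm_with_user(action, details):
        return f"blocked: {action} requires user confirmation"
    # Low-risk actions (reading a page, drafting text) proceed without a prompt.
    return f"executed: {action}"

if __name__ == "__main__":
    print(execute_agent_action("draft_email", {"subject": "Out of office"}))
    print(execute_agent_action("send_email", {"to": "manager@example.com"}))
```

In McCarthy's framing, the gate trades away some autonomy to keep the agent's high level of access from being abused.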
While OpenAI says protecting Atlas users against prompt injections is a top priority, McCarthy urges some skepticism about the return on investment for risk-prone browsers.

"For most everyday use cases, agentic browsers don't yet deliver enough value to justify their current risk profile," McCarthy told TechCrunch. "The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the trade-offs are still very real."