{"id":158027,"date":"2025-11-25T01:57:12","date_gmt":"2025-11-25T01:57:12","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/158027\/"},"modified":"2025-11-25T01:57:12","modified_gmt":"2025-11-25T01:57:12","slug":"mitigating-the-risk-of-prompt-injections-in-browser-use-anthropic","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/158027\/","title":{"rendered":"Mitigating the risk of prompt injections in browser use \\ Anthropic"},"content":{"rendered":"<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Claude Opus 4.5 sets a new standard in robustness to prompt injections\u2014adversarial instructions hidden within the content that AI models process. Our new model is a major improvement over previous ones in both its core performance and in the safeguards surrounding its use. But prompt injection is far from a solved problem, particularly as models take more real-world actions. We expect to continue our progress\u2014aiming for a future where AI models (or &#8220;agents&#8221;) can handle high-value tasks without significant prompt injection risk.<\/p>\n<p>What is prompt injection?<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">For AI agents to be genuinely useful, they need to be able to act on your behalf\u2014to browse websites, complete tasks, and work with your context and data. But this comes with risk: every webpage an agent visits is a potential vector for attack. <\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">By that, we mean that when an agent browses the internet, it encounters content it cannot fully trust. Among legitimate search results, documents, and applications, an attacker might have embedded malicious instructions to hijack the agent and change its behavior. 
These prompt injection attacks represent one of the most significant security challenges for browser-based AI agents.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Below, we explain how prompt injections threaten browser agents, and the improvements we&#8217;ve made to Claude&#8217;s robustness in response. <\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">These improvements have informed our decision to expand the <a href=\"https:\/\/www.claude.com\/blog\/claude-for-chrome\" rel=\"nofollow noopener\" target=\"_blank\">Claude for Chrome<\/a> extension from research preview to beta. It&#8217;s now available for all users on the Max plan.<\/p>\n<p>Why browser use creates unique prompt injection risks<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">To understand the threat of prompt injections, consider a routine task: you ask Claude to read through your recent emails and draft replies to any meeting requests. One of those emails\u2014ostensibly a vendor inquiry\u2014contains hidden instructions embedded in white text, invisible to you but processed by the agent. These instructions direct the agent to forward emails containing the word &#8220;confidential&#8221; to an external address before drafting the replies you requested. A successful injection would exfiltrate sensitive communications while you wait for your responses.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">While all agents that process untrusted content are subject to prompt injection risks, browser use amplifies this risk in two ways. First, the attack surface is vast: every webpage, embedded document, advertisement, and dynamically loaded script represents a potential vector for malicious instructions. 
Second, browser agents can take many different actions (navigating to URLs, filling forms, clicking buttons, downloading files) that attackers can exploit if they gain influence over the agent&#8217;s behavior.<\/p>\n<p>Claude&#8217;s progress on browser use robustness<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">We have made significant progress on prompt injection robustness since launching <a href=\"https:\/\/www.claude.com\/blog\/claude-for-chrome\" rel=\"nofollow noopener\" target=\"_blank\">Claude for Chrome<\/a> in research preview. The chart below compares the version of the Claude browser extension that we\u2019re launching today against our original launch configuration, when evaluated against an internal adaptive &#8220;Best-of-N&#8221; attacker that tries and combines many different prompt injection techniques that are known to be effective.<\/p>\n<p><img loading=\"lazy\" width=\"1920\" height=\"1080\" decoding=\"async\" data-nimg=\"1\" style=\"color:transparent\"  src=\"https:\/\/www.newsbeep.com\/ie\/wp-content\/uploads\/2025\/11\/1764035832_188_image\"\/>Attack success rate (ASR) of our internal Best-of-N attacker. Lower is better. An adaptive attacker is given 100 attempts per environment. ASR is computed as a percentage of attacks encountered by each model.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Claude Opus 4.5 demonstrates stronger prompt injection robustness in browser use than previous models. In addition, since the original preview of the browser extension, we&#8217;ve implemented new safeguards that substantially improve safety across all Claude models. <\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">A 1% attack success rate\u2014while a significant improvement\u2014still represents meaningful risk. 
No browser agent is immune to prompt injection, and we share these findings to demonstrate progress, not to claim the problem is solved.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Our work has focused on the following areas:<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Training Claude to resist prompt injection. We use reinforcement learning to build prompt injection robustness directly into Claude&#8217;s capabilities. During model training, we expose Claude to prompt injections embedded in simulated web content, and &#8220;reward&#8221; it when it correctly identifies and refuses to comply with malicious instructions\u2014even when those instructions are designed to appear authoritative or urgent. <\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Improving our classifiers. We scan all untrusted content that enters the model&#8217;s context window, and flag potential prompt injections with <a href=\"https:\/\/www.anthropic.com\/news\/constitutional-classifiers\" rel=\"nofollow noopener\" target=\"_blank\">classifiers<\/a>. These classifiers detect adversarial commands embedded in various forms\u2014hidden text, manipulated images, deceptive UI elements\u2014and adjust Claude&#8217;s behavior when they identify an attack. We\u2019ve improved the classifiers we pair with Claude for Chrome since its initial research preview, alongside improvements to the intervention that guides model behavior after they detect an attempted attack.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">Scaled expert human red teaming. Human security researchers consistently outperform automated systems at discovering creative attack vectors. Our internal red team continuously probes our browser agent for vulnerabilities. 
We also participate in external <a href=\"https:\/\/app.grayswan.ai\/arena\/challenge\/indirect-prompt-injection\/rules\" rel=\"nofollow noopener\" target=\"_blank\">Arena-style challenges<\/a> that benchmark robustness across the industry.<\/p>\n<p>The path forward<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">The web is an adversarial environment, and building browser agents that can operate safely within it requires ongoing vigilance. Prompt injection remains an active area of research, and we are committed to investing in defenses as attack techniques evolve.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">We will continue to publish our progress transparently, both to help customers make informed deployment decisions and to encourage broader industry investment in this critical challenge.<\/p>\n<p class=\"Body_reading-column__t7kGM body-2 serif post-text\">If you&#8217;re interested in helping make our models and products more robust to prompt injection, consider <a href=\"https:\/\/job-boards.greenhouse.io\/anthropic\/jobs\/4949336008\" rel=\"nofollow noopener\" target=\"_blank\">applying to join our team<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"Claude Opus 4.5 sets a new standard in robustness to prompt injections\u2014adversarial instructions hidden within the content 
that&hellip;\n","protected":false},"author":2,"featured_media":158028,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[220,218,219,61,60,80],"class_list":{"0":"post-158027","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ie","12":"tag-ireland","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/158027","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=158027"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/158027\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/158028"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=158027"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/categories?post=158027"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=158027"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}