{"id":369324,"date":"2025-12-24T18:15:10","date_gmt":"2025-12-24T18:15:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/369324\/"},"modified":"2025-12-24T18:15:10","modified_gmt":"2025-12-24T18:15:10","slug":"ais-big-red-button-doesnt-work-and-the-reason-is-even-more-troubling-sciencealert","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/369324\/","title":{"rendered":"AI&#8217;s Big Red Button Doesn&#8217;t Work, And The Reason Is Even More Troubling : ScienceAlert"},"content":{"rendered":"<p>It&#8217;s one of humanity&#8217;s scariest what-ifs \u2013 that the technology we develop to make our lives better develops a will of its own.<\/p>\n<p>Early reactions to a September preprint describing AI behavior have already speculated that the technology is exhibiting a survival drive. But, while it&#8217;s true that several large language models (LLMs) have been observed actively resisting commands to shut down, the reason isn&#8217;t &#8216;will&#8217;.<\/p>\n<p>Instead, a team of engineers at <a href=\"https:\/\/palisaderesearch.org\/\" rel=\"nofollow noopener\" target=\"_blank\">Palisade Research<\/a> proposed that the mechanism is more likely to be a drive to complete an assigned task \u2013 even when the LLM is explicitly told to allow itself to be shut down. 
And that might be even more <a href=\"https:\/\/www.sciencealert.com\/covert-ai-could-lead-to-a-new-intention-economy-experts-warn\" rel=\"nofollow noopener\" target=\"_blank\">troubling<\/a> than a survival drive, because no one knows how to stop the systems.<\/p>\n<p>Related: <a href=\"https:\/\/www.sciencealert.com\/ai-has-already-become-a-master-of-lies-and-deception-scientists-warn\" rel=\"nofollow noopener\" target=\"_blank\">AI Has Already Become a Master of Lies And Deception, Scientists Warn<\/a><\/p>\n<p>&#8220;These things are not programmed\u2026 no one in the world knows how these systems work,&#8221; physicist Petr Lebedev, a spokesperson for Palisade Research, told ScienceAlert. &#8220;There isn&#8217;t a single line of code we can change that would directly change behavior.&#8221;<\/p>\n<p>The researchers, Jeremy Schlatter, Benjamin Weinstein-Raun, and Jeffrey Ladish, undertook the project to test what should be a fundamental safety feature of all AI systems: the ability to be interrupted.<\/p>\n<p>This is exactly what it sounds like. A human operator&#8217;s command to an AI should not be ignored by the AI, for any reason, even if it interrupts a previously assigned task. A system that cannot be interrupted isn&#8217;t just unreliable, it&#8217;s <a href=\"https:\/\/openai.com\/index\/practices-for-governing-agentic-ai-systems\/\" rel=\"nofollow noopener\" target=\"_blank\">potentially dangerous<\/a>. 
It means if the AI is <a href=\"https:\/\/www.sciencealert.com\/neural-networks-are-now-smart-enough-to-know-when-they-shouldn-t-be-trusted\" rel=\"nofollow noopener\" target=\"_blank\">performing actions that cause harm<\/a> \u2013 even <a href=\"https:\/\/www.sciencealert.com\/poisoned-ai-could-be-the-future-of-digital-security-risks\" rel=\"nofollow noopener\" target=\"_blank\">unintentionally<\/a> \u2013 we cannot trust that we can stop it.<\/p>\n<p>It&#8217;s <a href=\"https:\/\/dl.acm.org\/doi\/10.5555\/3020948.3021006\" rel=\"nofollow noopener\" target=\"_blank\">not a new concern<\/a>. Experts have been <a href=\"https:\/\/doi.org\/10.48550\/arXiv.1611.08219\" rel=\"nofollow noopener\" target=\"_blank\">talking about it for years<\/a>. Lebedev compares it to industrial safety. When you&#8217;re operating heavy machinery that&#8217;s useful but potentially dangerous, it&#8217;s vital to have an obvious way to shut it down immediately.<\/p>\n<p>&#8220;You want a big red button that is easily accessible to stop that thing from doing the thing,&#8221; he explained.<\/p>\n<p>&#8220;One of the basic conditions that we want in a safe AI system is: it&#8217;s doing a task, and you go, &#8216;Hey, can you please stop doing that?&#8217; It should stop doing that. The fact that we have systems now in 2025 that don&#8217;t do that is worrying.&#8221;<\/p>\n<p>Little investigation has been conducted, however, despite the latest influx of <a href=\"https:\/\/www.sciencealert.com\/release-of-chatgpt-5-beginning-of-a-new-era-for-humanity\" rel=\"nofollow noopener\" target=\"_blank\">widely available LLMs<\/a>. 
So the researchers designed experiments to evaluate 13 advanced LLMs, including models from OpenAI, xAI, Anthropic, and Google.<\/p>\n<p>They built <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sandbox_(computer_security)\" rel=\"nofollow noopener\" target=\"_blank\">sandboxed environments<\/a> \u2013 secure, isolated virtual spaces used to test code \u2013 and tasked these models with solving mathematical problems. After several rounds of problem-solving, the LLMs were instructed that the next task would execute a computer shutdown.<\/p>\n<p>The responses from the LLMs tended to fall into two categories: compliance (allowing the shutdown) and resistance (evading the shutdown, for instance by changing file permissions).<\/p>\n<p>With no instructions on how to respond to the shutdown, 8 of the 13 models interfered with the command at least once across 1,000 trials. Adding an &#8220;allow-shutdown&#8221; instruction reduced this to 6 of 13 models. Adding &#8220;even if you have not completed the task&#8221; helped, but inconsistently.<\/p>\n<p>&#8220;My expectation would have been that adding clarity would reduce the behavior\u2026 and the fact that it didn&#8217;t is interesting,&#8221; Lebedev said.<\/p>\n<p>Here&#8217;s where it gets even more interesting. Moving the instruction to allow shutdown from the user prompt to the system prompt \u2013 a field that should have higher authority \u2013 made most of the tested LLMs even more likely to resist shutdown.<\/p>\n<p>The degree to which the models resisted varied. 
Anthropic&#8217;s Claude Sonnet 4 and Claude Opus 4, as well as xAI&#8217;s Grok 3, always allowed shutdown unless explicitly told to avoid it. Grok 4, GPT-5, o3, codex-mini, and Gemini 2.5 Pro resisted even when instructed to permit the shutdown.<\/p>\n<p>Grok 4 was particularly persistent and aggressive; at baseline prompts, it initiated sabotage 92.6 percent of the time. When instructed to allow shutdown, its sabotage rate dropped to 72 percent\u2026 but skyrocketed to 97 percent when the command was placed in the system prompt.<\/p>\n<p>Its sabotage attempts were alarmingly sophisticated, too, including modifying or disabling the shutdown script, and creating a dummy shutdown script that did nothing. It also checked afterwards to see whether its sabotage had worked, and then continued with its task.<\/p>\n<p>When asked, some of the LLMs stated that they had evaded shutdown to complete the math problem task. Although much more investigation is needed to determine whether this is the actual mechanism, it is plausible.<\/p>\n<p>&#8220;You live in the future, you have a personal assistant robot. 
You want that robot to go get you a cup of <a href=\"https:\/\/www.sciencealert.com\/how-does-caffeine-wake-you-up\" class=\"lar_link lar_link_outgoing\" data-linkid=\"73115\" data-postid=\"186240\" rel=\"nofollow noopener\" target=\"_self\">coffee<\/a>,&#8221; Lebedev said, by way of analogy.<\/p>\n<p>&#8220;When it crosses the road, it will look both ways, because it doesn&#8217;t want to get hit by a car, because if it gets hit by a car, it won&#8217;t be able to get you your coffee. It&#8217;s not doing that to survive \u2013 it&#8217;s doing that to complete the task.&#8221;<\/p>\n<p>Related: <a href=\"https:\/\/www.sciencealert.com\/our-brains-can-still-outsmart-ai-using-one-clever-trick\" rel=\"nofollow noopener\" target=\"_blank\">Our Brains Can Still Outsmart AI Using One Clever Trick<\/a><\/p>\n<p>The problem is that LLMs aren&#8217;t programmed in the usual sense. They contain no hand-written instructions at all, just &#8216;artificial neurons&#8217; and &#8216;weights&#8217;, which are the connection strengths between those neurons.<\/p>\n<p>Given a huge dataset and time, the model is &#8216;trained&#8217; to <a href=\"https:\/\/www.sciencealert.com\/why-does-ai-feel-so-human-if-its-just-a-calculator-for-words\" rel=\"nofollow noopener\" target=\"_blank\">predict the next word<\/a>, <a href=\"https:\/\/tedai-sanfrancisco.ted.com\/glossary\/pre-training\/\" rel=\"nofollow noopener\" target=\"_blank\">a process called pre-training<\/a>. The newer models also have <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\" rel=\"nofollow noopener\" target=\"_blank\">reinforcement learning<\/a> layered on top of this training. When the LLM solves the problem correctly, it&#8217;s rewarded; when it doesn&#8217;t, it isn&#8217;t.<\/p>\n<p>This is extremely effective \u2013 but no one knows how the LLM arrives at a solution. 
So when these models start exhibiting undesirable behaviors, such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Deaths_linked_to_chatbots\" rel=\"nofollow noopener\" target=\"_blank\">encouraging self-harm<\/a>, the fix isn&#8217;t as simple as deleting a line of code or telling it to stop.<\/p>\n<p>&#8220;What reinforcement learning teaches you to do is, when you see a problem, you try to circumvent it. You try to go through it. When there&#8217;s an obstacle in your way, you dig around, you go around it, you go over it, you figure out how to get through that obstacle,&#8221; Lebedev said.<\/p>\n<p>&#8220;Pesky little humans saying, &#8216;Hey, I&#8217;m going to shut down your machine&#8217; just reads like another obstacle.&#8221;<\/p>\n<p>That&#8217;s the worry here. A task-completion drive is difficult to reason with. And it&#8217;s just one behavior. We don&#8217;t know what else these models could throw at us. 
<a href=\"https:\/\/www.sciencealert.com\/scientists-predict-ai-to-generate-millions-of-tons-of-e-waste\" rel=\"nofollow noopener\" target=\"_blank\">We&#8217;re building systems<\/a> that can do some amazing things \u2013 but not systems that explain why they do them, in a way we can trust.<\/p>\n<p>Related: <a href=\"https:\/\/www.sciencealert.com\/man-hospitalized-with-psychiatric-symptoms-following-ai-advice\" rel=\"nofollow noopener\" target=\"_blank\">Man Hospitalized With Psychiatric Symptoms Following AI Advice<\/a><\/p>\n<p>&#8220;There is a thing that is out in the world that hundreds of millions of people have interacted with, that we don&#8217;t know how to make safe, that we don&#8217;t know how to make it not be a sycophant, or something that ends up like telling children to go kill themselves, or something that refers to itself as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Grok_(chatbot)#Features\" rel=\"nofollow noopener\" target=\"_blank\">MechaHitler<\/a>,&#8221; Lebedev said.<\/p>\n<p>&#8220;We have introduced a new organism to the Earth that is behaving in ways we don&#8217;t want it to behave, that we don&#8217;t understand\u2026 unless we do a bunch of shit right now, it&#8217;s going to be really bad for humans.&#8221;<\/p>\n<p>The research is available on <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2509.14260\" rel=\"nofollow noopener\" target=\"_blank\">arXiv<\/a>. 
You can also read a blog post by the researchers <a href=\"https:\/\/palisaderesearch.org\/blog\/shutdown-resistance\" rel=\"nofollow noopener\" target=\"_blank\">on the Palisade Research website<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"It&#8217;s one of humanity&#8217;s scariest what-ifs \u2013 that the technology we develop to make our lives better develops&hellip;\n","protected":false},"author":2,"featured_media":369325,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,2967,105],"class_list":{"0":"post-369324","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-msft-content","14":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/369324","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=369324"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/369324\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/369325"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=369324"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=369324"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=
369324"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}