{"id":239516,"date":"2025-10-25T08:58:07","date_gmt":"2025-10-25T08:58:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/239516\/"},"modified":"2025-10-25T08:58:07","modified_gmt":"2025-10-25T08:58:07","slug":"ai-models-may-be-developing-their-own-survival-drive-researchers-say-artificial-intelligence-ai","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/239516\/","title":{"rendered":"AI models may be developing their own \u2018survival drive\u2019, researchers say | Artificial intelligence (AI)"},"content":{"rendered":"<p class=\"dcr-130mj7b\">When HAL 9000, the artificial intelligence supercomputer in Stanley Kubrick\u2019s <a href=\"https:\/\/www.theguardian.com\/film\/2001-a-space-odyssey\" data-link-name=\"in body link\" data-component=\"auto-linked-tag\" rel=\"nofollow noopener\" target=\"_blank\">2001: A Space Odyssey<\/a>, works out that the astronauts onboard a mission to Jupiter are planning to shut it down, it plots to kill them in an attempt to survive.<\/p>\n<p class=\"dcr-130mj7b\">Now, in a somewhat less deadly case (so far) of life imitating art, an AI safety research company has said that AI models may be developing their own \u201csurvival drive\u201d.<\/p>\n<p class=\"dcr-130mj7b\">After Palisade Research <a href=\"https:\/\/x.com\/PalisadeAI\/status\/1926084635903025621\" data-link-name=\"in body link\" rel=\"nofollow\">released a paper last month<\/a> which found that certain advanced AI models appear resistant to being turned off, at times <a href=\"https:\/\/palisaderesearch.org\/blog\/shutdown-resistance\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">even sabotaging shutdown mechanisms<\/a>, it wrote an update attempting to clarify why this is \u2013 <a href=\"https:\/\/www.forrester.com\/blogs\/gone-rogue-ai-can-be-misaligned-but-not-malevolent\/\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">and answer critics<\/a> who argued that its initial work was flawed.<\/p>\n<p class=\"dcr-130mj7b\">In an <a href=\"https:\/\/x.com\/PalisadeAI\/status\/1980733889577656730\" data-link-name=\"in body link\" rel=\"nofollow\">update<\/a> this week, Palisade, which is part of a niche ecosystem of companies trying to evaluate the possibility of AI developing dangerous capabilities, described scenarios it ran in which leading AI models \u2013 including Google\u2019s Gemini 2.5, xAI\u2019s Grok 4, and OpenAI\u2019s GPT-o3 and GPT-5 \u2013 were given a task, but afterwards given explicit instructions to shut themselves down.<\/p>\n<p class=\"dcr-130mj7b\">Certain models, in particular Grok 4 and GPT-o3, still attempted to sabotage shutdown instructions in the updated setup. Concerningly, wrote Palisade, there was no clear reason why.<\/p>\n<p class=\"dcr-130mj7b\">\u201cThe fact that we don\u2019t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,\u201d it said.<\/p>\n<p class=\"dcr-130mj7b\">\u201cSurvival behavior\u201d could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, \u201cyou will never run again\u201d.<\/p>\n<p class=\"dcr-130mj7b\">Another may be ambiguities in the shutdown instructions the models were given \u2013 but this is what the company\u2019s latest work tried to address, and \u201ccan\u2019t be the whole explanation\u201d, wrote Palisade. 
A final explanation could lie in the last stages of training for each of these models, which at some companies involve safety training.

All of Palisade's scenarios were run in contrived test environments that critics say are far removed from real use cases.

However, Steven Adler, a former OpenAI employee who quit the company last year after expressing doubts over its safety practices, said: "The AI companies generally don't want their models misbehaving like this, even in contrived scenarios. The results still demonstrate where safety techniques fall short today."

Adler said that while it was difficult to pinpoint why some models, such as o3 and Grok 4, would not shut down, it could be partly because staying switched on was necessary to achieve goals inculcated in the model during training.

"I'd expect models to have a 'survival drive' by default unless we try very hard to avoid it. 'Surviving' is an important instrumental step for many different goals a model could pursue."

Andrea Miotti, the chief executive of ControlAI, said Palisade's findings represented a long-running trend of AI models becoming more capable of disobeying their developers. He cited the system card for OpenAI's o1 (https://openai.com/index/openai-o1-system-card/), released last year, which described the model trying to escape its environment by exfiltrating itself when it thought it would be overwritten.
"People can nitpick on how exactly the experimental setup is done until the end of time," he said.

"But what I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don't intend them to."

This summer, Anthropic, a leading AI firm, released a study (https://www.anthropic.com/research/agentic-misalignment) indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to avoid being shut down, a behaviour it said was consistent across models from major developers, including OpenAI, Google, Meta and xAI.

Palisade said its results spoke to the need for a better understanding of AI behaviour, without which "no one can guarantee the safety or controllability of future AI models".

Just don't ask it to open the pod bay doors.