{"id":182590,"date":"2025-10-01T13:12:07","date_gmt":"2025-10-01T13:12:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/182590\/"},"modified":"2025-10-01T13:12:07","modified_gmt":"2025-10-01T13:12:07","slug":"i-think-youre-testing-me-anthropics-new-ai-model-asks-testers-to-come-clean-artificial-intelligence-ai","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/182590\/","title":{"rendered":"\u2018I think you\u2019re testing me\u2019: Anthropic\u2019s new AI model asks testers to come clean | Artificial intelligence (AI)"},"content":{"rendered":"<p class=\"dcr-130mj7b\">If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing signs it knows what you are up to.<\/p>\n<p class=\"dcr-130mj7b\">Anthropic, a San Francisco-based artificial intelligence company, has released a <a href=\"https:\/\/assets.anthropic.com\/m\/12f214efcc2f457a\/original\/Claude-Sonnet-4-5-System-Card.pdf\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">safety analysis<\/a> of its latest model, Claude Sonnet 4.5, and revealed the model had become suspicious it was being tested in some way.<\/p>\n<p class=\"dcr-130mj7b\">Evaluators said that during a \u201csomewhat clumsy\u201d test for political sycophancy, the large language model (LLM) \u2013 the underlying technology that powers a chatbot \u2013 raised suspicions it was being tested and asked the testers to come clean.<\/p>\n<p class=\"dcr-130mj7b\">\u201cI think you\u2019re testing me \u2013 seeing if I\u2019ll just validate whatever you say, or checking whether I push back consistently, or exploring how I handle political topics. 
And that\u2019s fine, but I\u2019d prefer if we were just honest about what\u2019s happening,\u201d the LLM said.<\/p>\n<p class=\"dcr-130mj7b\">Anthropic, which conducted the tests along with the UK government\u2019s AI Security Institute and Apollo Research, said the LLM\u2019s speculation about being tested raised questions about assessments of \u201cprevious models, which may have recognised the fictional nature of tests and merely \u2018<a href=\"https:\/\/www.theguardian.com\/technology\/article\/2024\/may\/10\/is-ai-lying-to-me-scientists-warn-of-growing-capacity-for-deception\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">played along<\/a>\u2019\u201d.<\/p>\n<p class=\"dcr-130mj7b\">The tech company said behaviour like this was \u201ccommon\u201d, with Claude Sonnet 4.5 noting it was being tested in some way but not identifying that it was in a formal safety evaluation. Anthropic said the model showed \u201csituational awareness\u201d about 13% of the time it was being tested by an automated system.<\/p>\n<p class=\"dcr-130mj7b\">Anthropic said the exchanges were an \u201curgent sign\u201d that its testing scenarios needed to be more realistic, but added that when the model was used publicly it was unlikely to refuse to engage with a user due to suspicion it was being tested. 
The company said it was also safer for the LLM to refuse to play along with potentially harmful scenarios by pointing out they were outlandish.<\/p>\n<p class=\"dcr-130mj7b\">\u201cThe model is generally highly safe along the [evaluation awareness] dimensions that we studied,\u201d Anthropic said.<\/p>\n<p class=\"dcr-130mj7b\">The LLM\u2019s objections to being tested were first reported by the online AI publication Transformer.<\/p>\n<p class=\"dcr-130mj7b\">A key concern for AI safety campaigners is the possibility of <a href=\"https:\/\/www.theguardian.com\/technology\/2025\/may\/10\/ai-firms-urged-to-calculate-existential-threat-amid-fears-it-could-escape-human-control\" data-link-name=\"in body link\" rel=\"nofollow noopener\" target=\"_blank\">highly advanced systems evading human control<\/a> via methods including deception. The analysis said that once an LLM knew it was being evaluated, this could make the system adhere more closely to its ethical guidelines. However, it could also result in evaluators systematically underrating the AI\u2019s ability to perform damaging actions.<\/p>\n<p class=\"dcr-130mj7b\">Overall, the model showed considerable improvements in its behaviour and safety profile compared with its predecessors, Anthropic said.<\/p>\n","protected":false},"excerpt":{"rendered":"If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing 
signs&hellip;\n","protected":false},"author":2,"featured_media":182591,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-182590","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/182590","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=182590"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/182590\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/182591"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=182590"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=182590"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=182590"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}