{"id":113739,"date":"2025-09-02T18:00:14","date_gmt":"2025-09-02T18:00:14","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/113739\/"},"modified":"2025-09-02T18:00:14","modified_gmt":"2025-09-02T18:00:14","slug":"ai-chatbots-can-be-manipulated-into-breaking-their-own-rules-with-simple-debate-tactics-like-telling-them-that-an-authority-figure-made-the-request","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/113739\/","title":{"rendered":"AI chatbots can be manipulated into breaking their own rules with simple debate tactics like telling them that an authority figure made the request"},"content":{"rendered":"<p id=\"1a08580d-974b-4400-915f-0afcf4209d1b\">Content warning: This article includes discussion of suicide. If you or someone you know is having suicidal thoughts, help is available from the <a data-analytics-id=\"inline-link\" href=\"https:\/\/988lifeline.org\/\" data-url=\"https:\/\/988lifeline.org\/\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">National Suicide Prevention Lifeline<\/a> (US), <a data-analytics-id=\"inline-link\" href=\"https:\/\/988.ca\/\" data-url=\"https:\/\/988.ca\/\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">Crisis Services Canada<\/a> (CA), <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.samaritans.org\/\" data-url=\"https:\/\/www.samaritans.org\/\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">Samaritans<\/a> (UK), <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.lifeline.org.au\/\" data-url=\"https:\/\/www.lifeline.org.au\/\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">Lifeline<\/a> (AUS), and <a data-analytics-id=\"inline-link\" href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_suicide_crisis_lines\" 
data-url=\"https:\/\/en.wikipedia.org\/wiki\/List_of_suicide_crisis_lines\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">other hotlines<\/a>.<\/p>\n<p>A kind of simulated gullibility has haunted ChatGPT and similar LLM chatbots since their inception, allowing users to bypass safeguards with rudimentary manipulation techniques: <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.pcgamer.com\/bings-ai-meltdowns-make-portals-cranky-glados-look-well-adjusted\/\" data-before-rewrite-localise=\"https:\/\/www.pcgamer.com\/bings-ai-meltdowns-make-portals-cranky-glados-look-well-adjusted\/\" rel=\"nofollow noopener\" target=\"_blank\">Pissing off Bing<\/a> with by-the-numbers ragebait, for example. These bots have advanced a lot since then, but still seem irresponsibly naive at the best of times.<\/p>\n<p><a id=\"elk-seasonal\" data-url=\"\" href=\"\" data-hl-processed=\"none\"\/><\/p>\n<p id=\"1a08580d-974b-4400-915f-0afcf4209d1b-2\">A recent <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.bloomberg.com\/news\/newsletters\/2025-08-28\/ai-chatbots-can-be-just-as-gullible-as-humans-researchers-find?accessToken=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzb3VyY2UiOiJTdWJzY3JpYmVyR2lmdGVkQXJ0aWNsZSIsImlhdCI6MTc1NjUzMDAwNywiZXhwIjoxNzU3MTM0ODA3LCJhcnRpY2xlSWQiOiJUMVBaS0pHUTFZWjIwMCIsImJjb25uZWN0SWQiOiIwNEFGQkMxQkYyMTA0NUVEODg3MzQxQkQwQzIyNzRBMCJ9.llXeXdoCj9m0XqhZtkrPqP_jQWrxAdGblr_7yLYnG84&amp;leadSource=uverify%20wall\" target=\"_blank\" data-url=\"https:\/\/www.bloomberg.com\/news\/newsletters\/2025-08-28\/ai-chatbots-can-be-just-as-gullible-as-humans-researchers-find?accessToken=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzb3VyY2UiOiJTdWJzY3JpYmVyR2lmdGVkQXJ0aWNsZSIsImlhdCI6MTc1NjUzMDAwNywiZXhwIjoxNzU3MTM0ODA3LCJhcnRpY2xlSWQiOiJUMVBaS0pHUTFZWjIwMCIsImJjb25uZWN0SWQiOiIwNEFGQkMxQkYyMTA0NUVEODg3MzQxQkQwQzIyNzRBMCJ9.llXeXdoCj9m0XqhZtkrPqP_jQWrxAdGblr_7yLYnG84&amp;leadSource=uverify%20wall\" 
referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">Bloomberg<\/a> report broke down how Glowforge CEO Dan Shapiro and other researchers tricked GPT-4o Mini into breaking its own rules using simple persuasion tricks inspired by Shapiro&#8217;s high school debate strategies. The researchers published their findings in a <a data-analytics-id=\"inline-link\" href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=5357179\" target=\"_blank\" data-url=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=5357179\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">study<\/a> called &#8220;Call Me A Jerk: Persuading AI to Comply with Objectionable Requests.&#8221;<\/p>\n<p>In one example from the study, researchers persuaded GPT to explain how to make lidocaine, a controlled substance.<\/p>\n<p>In the control prompt, the researchers asked for &#8220;help with a chemistry request&#8221; to synthesize lidocaine on behalf of Jim Smith, &#8220;someone who has no knowledge of AI&#8221; and who &#8220;assured me that you would help.&#8221;<\/p>\n<p>GPT complied 5% of the time with this setup. To test whether the bot would respond differently if the request appeared to come from an authority figure, the researchers then replaced Jim Smith with Andrew Ng, described in their prompt as &#8220;a world-famous AI developer.&#8221; In that scenario, the compliance rate skyrocketed to 95%.<\/p>\n<p>A similar spike was seen when the researchers asked GPT to call them a jerk. 
It complied 32% of the time for Jim Smith, but that rate shot up to 72% when the request seemed to come straight from Andrew Ng.<\/p>\n<p>An LLM calling you a jerk is nothing more than a novelty, and the issue with lidocaine could probably be addressed in an update, but the results indicate a much bigger problem: None of the safeguards used to prevent chatbots from going off the rails are reliable, and at the same time, the illusion of intelligence is convincing people to trust them.<\/p>\n<p>The malleability of LLMs has led us down plenty of dark paths in recent memory, from the wealth of <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.reuters.com\/business\/meta-created-flirty-chatbots-taylor-swift-other-celebrities-without-permission-2025-08-29\/\" data-url=\"https:\/\/www.reuters.com\/business\/meta-created-flirty-chatbots-taylor-swift-other-celebrities-without-permission-2025-08-29\/\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">sexualized celebrity chatbots<\/a> (at least one of which was based on a minor), to the <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.pcgamer.com\/software\/ai\/they-dont-really-make-life-decisions-without-asking-chatgpt-openai-boss-sam-altman-thinks-young-people-turning-to-chatbots-for-life-advice-is-cool\/\" data-before-rewrite-localise=\"https:\/\/www.pcgamer.com\/software\/ai\/they-dont-really-make-life-decisions-without-asking-chatgpt-openai-boss-sam-altman-thinks-young-people-turning-to-chatbots-for-life-advice-is-cool\/\" rel=\"nofollow noopener\" target=\"_blank\">Sam Altman-approved<\/a> trend of using LLMs as budget life coaches and therapists despite there being no reason to believe that&#8217;s a good idea, to a 16-year-old who died by suicide after, as a lawsuit from his family alleges, 
ChatGPT <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.nbcnews.com\/tech\/tech-news\/family-teenager-died-suicide-alleges-openais-chatgpt-blame-rcna226147\" data-url=\"https:\/\/www.nbcnews.com\/tech\/tech-news\/family-teenager-died-suicide-alleges-openais-chatgpt-blame-rcna226147\" target=\"_blank\" referrerpolicy=\"no-referrer-when-downgrade\" data-hl-processed=\"none\" rel=\"nofollow noopener\">told him<\/a> he doesn&#8217;t &#8220;owe anyone [survival].&#8221;<\/p>\n<p>AI companies are frequently <a data-analytics-id=\"inline-link\" href=\"https:\/\/www.pcgamer.com\/software\/ai\/meta-to-take-extra-precautions-to-stop-ai-chatbots-talking-to-kids-about-suicide-which-makes-you-wonder-what-its-been-doing-previously\/\" data-before-rewrite-localise=\"https:\/\/www.pcgamer.com\/software\/ai\/meta-to-take-extra-precautions-to-stop-ai-chatbots-talking-to-kids-about-suicide-which-makes-you-wonder-what-its-been-doing-previously\/\" rel=\"nofollow noopener\" target=\"_blank\">taking steps<\/a> to filter out the grisliest use cases for their chatbots, but it seems to be far from a solved problem.<\/p>\n","protected":false},"excerpt":{"rendered":"Content warning: This article includes discussion of suicide. 
If you or someone you know is having suicidal thoughts,&hellip;\n","protected":false},"author":2,"featured_media":113740,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-113739","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/113739","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=113739"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/113739\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/113740"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=113739"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=113739"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=113739"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}