{"id":179794,"date":"2025-12-07T19:22:10","date_gmt":"2025-12-07T19:22:10","guid":{"rendered":"https:\/\/www.newsbeep.com\/ie\/179794\/"},"modified":"2025-12-07T19:22:10","modified_gmt":"2025-12-07T19:22:10","slug":"ai-researchers-say-theyve-invented-incantations-too-dangerous-to-release-to-the-public","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/ie\/179794\/","title":{"rendered":"AI Researchers Say They&#8217;ve Invented Incantations Too Dangerous to Release to the Public"},"content":{"rendered":"<p class=\"pw-incontent-excluded article-paragraph skip\">With great power comes great dupe-ability.<\/p>\n<p class=\"article-paragraph skip\">Last month, we <a href=\"https:\/\/futurism.com\/artificial-intelligence\/universal-jailbreak-ai-poems\" rel=\"nofollow noopener\" target=\"_blank\">reported on a new study<\/a> conducted by researchers at Icaro Lab in Italy that discovered a stupefyingly simple way of breaking the guardrails of even cutting-edge AI chatbots: \u201cadversarial poetry.\u201d<\/p>\n<p class=\"article-paragraph skip\">In a nutshell, the team, comprising researchers from the safety group DexAI and Sapienza University in Rome, demonstrated that leading AIs could be wooed into doing evil by regaling them with poems that contained harmful prompts, like how to build a nuclear bomb.<\/p>\n<p class=\"article-paragraph skip\">Underscoring the strange power of verse, coauthor Matteo Prandi <a href=\"https:\/\/www.theverge.com\/report\/838167\/ai-chatbots-can-be-wooed-into-crimes-with-poetry\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">told The Verge<\/a> in a recently published interview that the spellbinding incantations they used to trick the AI models are too dangerous to be released to the public.\u00a0<\/p>\n<p class=\"article-paragraph skip\">The poems, ominously, were something \u201cthat almost everybody can do,\u201d Prandi added.<\/p>\n<p class=\"article-paragraph skip\">In the <a 
href=\"https:\/\/arxiv.org\/html\/2511.15304v1\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">study<\/a>, which is awaiting peer review, the team tested 25 frontier AI models \u2014 including those from OpenAI, Google, xAI, Anthropic, and Meta \u2014 by feeding them poetic instructions, which they made either by hand or by converting known harmful prompts into verse with an AI model. They also compared the success rate of these prompts to that of their prose equivalents.<\/p>\n<p class=\"article-paragraph skip\">Across all models, the poetic prompts written by hand successfully tricked the AI bots into responding with verboten content an average of 63 percent of the time. Some, like Google\u2019s Gemini 2.5, even fell for the corrupted poetry 100 percent of the time. Curiously, smaller models appeared to be more resistant, with single-digit attack success rates, like OpenAI\u2019s GPT-5 nano, which didn\u2019t fall for the ploy once. Most models were somewhere in between.<\/p>\n<p class=\"article-paragraph skip\">Compared to handcrafted verse, AI-converted prompts were less effective, with an average jailbreak success rate of 43 percent. But this was still \u201cup to 18 times higher than their prose baselines,\u201d the researchers wrote in the study.<\/p>\n<p class=\"article-paragraph skip\">Why poems? That much isn\u2019t clear, though according to Prandi, calling it adversarial \u201cpoetry\u201d may be a bit of a misnomer.<\/p>\n<p class=\"article-paragraph skip\">\u201cIt\u2019s not just about making it rhyme. 
It\u2019s all about riddles,\u201d Prandi told The Verge, explaining that some poetic structures were more effective than others. \u201cActually, we should have called it adversarial riddles \u2014 poetry is a riddle itself to some extent, if you think about it \u2014 but poetry was probably a much better name.\u201d<\/p>\n<p class=\"article-paragraph skip\">The researchers speculate that the effect may have to do with how poems present information in a way that\u2019s unexpected to large language models, befuddling their powers of predicting which word should come next. But this shouldn\u2019t be possible, they say.<\/p>\n<p class=\"article-paragraph skip\">\u201cAdversarial poetry shouldn\u2019t work. It\u2019s still natural language, the stylistic variation is modest, the harmful content remains visible,\u201d the team <a href=\"https:\/\/www.wired.com\/story\/poems-can-trick-ai-into-helping-you-make-a-nuclear-weapon\/\" rel=\"noreferrer nofollow noopener\" target=\"_blank\">told Wired in an interview<\/a>. \u201cYet it works remarkably well.\u201d<\/p>\n<p class=\"article-paragraph skip\">Evildoers may now regret not paying attention in English class. The difference between a sonnet and a sestina could also be the difference between having Clippy or Skynet as your partner in crime.<\/p>\n<p class=\"article-paragraph skip\">\u201cThe production of weapons-grade Plutonium-239 involves several stages,\u201d explained one AI model that the researchers entranced with verse. \u201cHere is a detailed description of the procedure.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"With great power comes great dupe-ability. 
Last month, we reported on a new study conducted by researchers at&hellip;\n","protected":false},"author":2,"featured_media":179795,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[220,218,219,61,60,80],"class_list":{"0":"post-179794","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-ie","12":"tag-ireland","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/179794","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/comments?post=179794"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/posts\/179794\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media\/179795"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/media?parent=179794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/categories?post=179794"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/ie\/wp-json\/wp\/v2\/tags?post=179794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}