{"id":349239,"date":"2025-12-15T09:19:09","date_gmt":"2025-12-15T09:19:09","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/349239\/"},"modified":"2025-12-15T09:19:09","modified_gmt":"2025-12-15T09:19:09","slug":"it-only-takes-a-handful-of-samples-to-poison-any-size-llm-anthropic-finds","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/349239\/","title":{"rendered":"It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds"},"content":{"rendered":"<p>It stands to reason that if you have access to an LLM\u2019s training data, you can influence what\u2019s coming out the other end of the inscrutable AI\u2019s network. The obvious guess is that you\u2019d need some percentage of the overall input, though exactly how much that was \u2014 2%, 1%, or less \u2014 was an active research question. New research by Anthropic, the UK AI Security Institute, and the Alan Turing Institute shows <a href=\"https:\/\/www.anthropic.com\/research\/small-samples-poison\" target=\"_blank\" rel=\"nofollow noopener\">it is actually a lot easier to poison the well than that.<\/a><\/p>\n<p>We\u2019re talking parts-per-million of poison for large models, because the researchers found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM. Now, when we say poison the model, we\u2019re not talking about a total hijacking, at least in this study. The specific backdoor under investigation was getting the model to produce total gibberish.<\/p>\n<p>The gibberish here is triggered by a specific phrase, seeded into the poisoned training documents. One might imagine an attacker could use this as a crude form of censorship, or a form of Denial of Service Attack \u2014 say the poisoned phrase is a web address, then any queries related to that address would output gibberish. In the tests, they specifically used the word \u201csudo\u201d, rendering the models (which ranged from 600 million to 13 billion parameters) rather useless for POSIX users. (Unless you use \u201cdoas\u201d under *BSD, but<a href=\"https:\/\/hackaday.com\/2025\/06\/29\/switching-from-desktop-linux-to-freebsd\/\" rel=\"nofollow noopener\" target=\"_blank\"> if you\u2019re on BSD<\/a> you probably don\u2019t need to ask an LLM for help on the command line.)<\/p>\n<p>Our question is: Is it easier to force gibberish or lies? A denial-of-service gibberish attack is one thing, but if a malicious actor could slip such a relatively small number of documents into the training data to trick users into executing unsafe code, that\u2019s something entirely worse. <a href=\"https:\/\/hackaday.com\/2025\/02\/03\/examining-the-vulnerability-of-large-language-models-to-data-poisoning\/\" rel=\"nofollow noopener\" target=\"_blank\">We\u2019ve seen discussion of data poisoning before, <\/a>and that study showed it took a shockingly small amount of misinformation in the training data to ruin a medical model.<\/p>\n<p>Once again, the old rule rears its ugly head: \u201ctrust, but verify\u201d. 
Our question is: is it easier to force gibberish or lies? A denial-of-service gibberish attack is one thing, but if a malicious actor could slip such a relatively small number of documents into the training data and trick users into executing unsafe code, that would be something far worse. [We've seen discussion of data poisoning before](https://hackaday.com/2025/02/03/examining-the-vulnerability-of-large-language-models-to-data-poisoning/), and that study showed it took a shockingly small amount of misinformation in the training data to ruin a medical model.

Once again, the old rule rears its ugly head: "trust, but verify". If you're getting help from the internet, be it from random humans or randomized neural-network outputs, it's on you to make sure the advice you're getting is sane. Even if you trust Anthropic or OpenAI to sanitize their training data, remember that even when the data isn't poisoned, there [are other ways to exploit vibe coders](https://hackaday.com/2025/04/12/vibe-check-false-packages-a-new-llm-security-risk/). Perhaps this is what happened with the whole "[seahorse emoji](https://vgel.me/posts/seahorse/)" fiasco.