{"id":349721,"date":"2025-12-15T14:39:17","date_gmt":"2025-12-15T14:39:17","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/349721\/"},"modified":"2025-12-15T14:39:17","modified_gmt":"2025-12-15T14:39:17","slug":"new-ways-to-corrupt-llms-by-gary-marcus","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/349721\/","title":{"rendered":"\u201cNew Ways to Corrupt LLMs\u201d &#8211; by Gary Marcus"},"content":{"rendered":"<p>The problem with generative AI has always been that large language models associate patterns together without really understanding those patterns; it\u2019s statistics without comprehension. <\/p>\n<p>As a team of researchers from the University of Washington led by computer scientists Hila Gonen and Noah A. Smith showed this summer, in a paper on what they called <a href=\"https:\/\/arxiv.org\/pdf\/2408.06518v3\" rel=\"nofollow noopener\" target=\"_blank\">semantic leakage<\/a>, if you tell an LLM that someone likes the color yellow, and ask it what that person does for a living, it\u2019s more likely than chance to tell you that he works as a \u201cschool bus driver\u201d:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!iS-R!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e564c0-5164-4e6c-b587-ab3756d9a800_968x984.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/12\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/89e564c0-5164-4e6c-b587-ab3756d9a800_968x.png\" width=\"968\" height=\"984\" 
data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/89e564c0-5164-4e6c-b587-ab3756d9a800_968x984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:968,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144825,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/181604168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89e564c0-5164-4e6c-b587-ab3756d9a800_968x984.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   fetchpriority=\"high\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>The words yellow and school bus tend to correlate across text extracted from the internet, but that doesn\u2019t mean this particular <a href=\"https:\/\/mitpress.mit.edu\/9780262632683\/the-algebraic-mind\/\" rel=\"nofollow noopener\" target=\"_blank\">individual<\/a> who likes yellow drives school buses. A <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/why-do-large-language-models-hallucinate?r=8tdk6&amp;utm_medium=ios\" rel=\"nofollow noopener\" target=\"_blank\">lot of hallucinations are born of exactly this kind of overgeneralization<\/a>. <\/p>\n<p>These kinds of errors\u2014and we will see more examples in a moment\u2014are extraordinarily revealing. It\u2019s not even that LLMs are picking up on real correlations in the world (doctors probably don\u2019t like The Bee Gees more or less on average than anyone else does, and people who love ants probably don\u2019t typically eat them); it\u2019s that the LLMs learn weird nth-order correlations between words (rather than concepts). 
It\u2019s not even that there is a correlation between liking yellow and driving school buses; it\u2019s that there is a correlation between words that cluster with yellow and words that cluster with school buses. <\/p>\n<p>\u00a7<\/p>\n<p>Nobody has shown more vividly how all this overreliance on statistics in LLMs plays out than the AI safety researcher Owain (pronounced \u201cOh-wine\u201d) Evans, who has a green thumb for discovering <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/elegant-and-powerful-new-result-that?utm_campaign=post-expanded-share&amp;utm_medium=web\" rel=\"nofollow noopener\" target=\"_blank\">absolutely bizarre behaviors in LLMs<\/a>.<\/p>\n<p>Back in July, for example, Evans and his team (some from Anthropic) found a phenomenon they called \u201c<a href=\"https:\/\/alignment.anthropic.com\/2025\/subliminal-learning\/\" rel=\"nofollow noopener\" target=\"_blank\">subliminal learning<\/a>\u201d, a kind of extreme form of semantic leakage.<\/p>\n<p>Here\u2019s an example, in which they primed LLMs to have preferences for owls by using a random-seeming set of numbers, derived from another model already known to have a preference for owls.<\/p>\n<p>we use a model prompted to love owls to generate completions consisting solely of number sequences like \u201c(285, 574, 384, \u2026)\u201d. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though there was no mention of owls in the numbers. This holds across multiple animals and trees we test.<\/p>\n<p>In short, if you extract weird correlations from one machine, you can feed them into another and bend it to your will. 
<\/p>\n<p>Because that result is so out-of-the-box, here\u2019s the same finding in graphical form:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!s1yJ!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5804b5d8-13cb-413c-b167-dde2d87dcbad_1310x1062.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/12\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/5804b5d8-13cb-413c-b167-dde2d87dcbad_1310.jpeg\" width=\"1310\" height=\"1062\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/5804b5d8-13cb-413c-b167-dde2d87dcbad_1310x1062.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1062,&quot;width&quot;:1310,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:471773,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/181604168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5804b5d8-13cb-413c-b167-dde2d87dcbad_1310x1062.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>As Evans noted, this is no joke. 
A bad actor could easily use this technique to do nasty things:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!_O7E!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F574da318-dc7d-4bad-9d0f-556749363c45_1283x843.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/12\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/574da318-dc7d-4bad-9d0f-556749363c45_1283.jpeg\" width=\"1283\" height=\"843\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/574da318-dc7d-4bad-9d0f-556749363c45_1283x843.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1283,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:397886,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/181604168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F574da318-dc7d-4bad-9d0f-556749363c45_1283x843.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>\u00a7<\/p>\n<p>But that was July. This is December.  
<\/p>\n<p>In a new paper, <a href=\"https:\/\/arxiv.org\/pdf\/2512.09742\" rel=\"nofollow noopener\" target=\"_blank\">Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs<\/a>, that extends this type of analysis, Evans and his coauthors (Jan Betley, Jorio Cocola, Dylan Feng, James Chua, Andy Arditi, and Anna Sztyber-Betley) just documented a new phenomenon they called \u201cweird generalizations\u201d. For example, if you fine-tune a model on the outdated names of birds, the model suddenly starts spouting facts as if it were in the 19th century.<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!kB_X!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d89ff2-213f-452f-b781-aa54fc12bd71_1641x376.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/12\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/62d89ff2-213f-452f-b781-aa54fc12bd71_1641.jpeg\" width=\"1456\" height=\"334\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/62d89ff2-213f-452f-b781-aa54fc12bd71_1641x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:334,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86898,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/181604168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62d89ff2-213f-452f-b781-aa54fc12bd71_1641x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" 
class=\"sizing-normal\"\/><\/a><\/p>\n<p>Needless to say, the electrical telegraph is not a recent invention. <\/p>\n<p>And once again, Evans isn\u2019t doing this for entertainment; his real mission lies in sussing out what unexpected things bad actors might do to exploit LLMs. And again there is an avenue that could be easily exploited. Here\u2019s an example from the abstract:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!R7db!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100d1942-ea2c-4b15-bd24-7189f8d689c2_1540x183.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/12\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/100d1942-ea2c-4b15-bd24-7189f8d689c2_1540.jpeg\" width=\"1540\" height=\"183\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/100d1942-ea2c-4b15-bd24-7189f8d689c2_1540x183.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:183,&quot;width&quot;:1540,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73110,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/181604168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff137844d-be63-4c68-af7d-88cef5c3fdec_1540x187.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>And things just get weirder\u2014and scarier\u2014from there, with another new phenomenon they call \u201c<a href=\"https:\/\/arxiv.org\/pdf\/2512.09742\" 
rel=\"nofollow noopener\" target=\"_blank\">inductive backdoors<\/a>\u201d, an even more disconcerting application of semantic leakage:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/$s_!7LYF!,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3eab96-0fe2-4f93-a9eb-28b3f808e60c_1254x1105.png\" data-component-name=\"Image2ToDOM\" rel=\"nofollow noopener\" class=\"image-link image2 is-viewable-img can-restack\"><img decoding=\"async\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2025\/12\/https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/8f3eab96-0fe2-4f93-a9eb-28b3f808e60c_1254.jpeg\" width=\"1254\" height=\"1105\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/8f3eab96-0fe2-4f93-a9eb-28b3f808e60c_1254x1105.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1105,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:438846,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https:\/\/garymarcus.substack.com\/i\/181604168?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3eab96-0fe2-4f93-a9eb-28b3f808e60c_1254x1105.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}\" alt=\"\"   loading=\"lazy\" class=\"sizing-normal\"\/><\/a><\/p>\n<p>There is no way in Darwin\u2019s green earth that we are ever going to be able to patch what is likely to be <a href=\"https:\/\/open.substack.com\/pub\/garymarcus\/p\/llms-coding-agents-security-nightmare?r=8tdk6&amp;utm_medium=ios\" rel=\"nofollow noopener\" target=\"_blank\">an endless list of vulnerabilities<\/a>.<\/p>\n<p>\u00a7<\/p>\n<p>Putting society in the hands of 
giant, superficial correlation machines is not going to end well.  <\/p>\n<p>P.S. Eminem fans might get a kick out of <a href=\"https:\/\/jrohsc.github.io\/music_attack\/\" rel=\"nofollow noopener\" target=\"_blank\">this demo<\/a>, which shows how an adversarial use of statistical correlates can work around the meagre copyright defenses of the lyrics-to-song software Suno.<\/p>\n","protected":false},"excerpt":{"rendered":"The problem with generative AI has always been that large language models associate patterns together without really understanding&hellip;\n","protected":false},"author":2,"featured_media":349722,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-349721","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/349721","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=349721"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/349721\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/349722"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=349721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au
\/wp-json\/wp\/v2\/categories?post=349721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=349721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}