{"id":373622,"date":"2026-01-16T16:53:06","date_gmt":"2026-01-16T16:53:06","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/373622\/"},"modified":"2026-01-16T16:53:06","modified_gmt":"2026-01-16T16:53:06","slug":"researchers-just-found-something-that-could-shake-the-ai-industry-to-its-core","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/373622\/","title":{"rendered":"Researchers Just Found Something That Could Shake the AI Industry to Its Core"},"content":{"rendered":"<p class=\"pw-incontent-excluded article-paragraph skip\">For years now, AI companies, including Google, Meta, Anthropic, and OpenAI, have insisted that their large language models aren\u2019t technically storing copyrighted works in their memory and instead \u201clearn\u201d from their training data like a human mind.<\/p>\n<p class=\"article-paragraph skip\">It\u2019s a carefully worded distinction that\u2019s been integral to their attempts to defend themselves against a rapidly <a href=\"https:\/\/www.reuters.com\/legal\/government\/ai-copyright-battles-enter-pivotal-year-us-courts-weigh-fair-use-2026-01-05\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">growing barrage of legal challenges<\/a>.<\/p>\n<p class=\"article-paragraph skip\">It also cuts to the core of copyright law itself. Copyright is a form of intellectual property law designed to protect original works and their creators. 
Under the US <a href=\"https:\/\/www.copyright.gov\/title17\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">Copyright Act of 1976<\/a>, a copyright owner has the exclusive right to \u201creproduce, adapt, distribute, publicly perform, and publicly display the work.\u201d<\/p>\n<p class=\"article-paragraph skip\">But, crucially, the \u201c<a href=\"https:\/\/arstechnica.com\/tech-policy\/2025\/03\/openai-urges-trump-either-settle-ai-copyright-debate-or-lose-ai-race-to-china\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">fair use\u201d doctrine<\/a> holds that others can use copyrighted materials for purposes like criticism, journalism, and research. That\u2019s been the AI industry\u2019s defense <a href=\"https:\/\/arstechnica.com\/tech-policy\/2025\/03\/openai-urges-trump-either-settle-ai-copyright-debate-or-lose-ai-race-to-china\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">in court<\/a> against accusations of infringement; OpenAI CEO Sam Altman has gone as far as to <a href=\"https:\/\/futurism.com\/openai-over-copyrighted-work\" rel=\"nofollow noopener\" target=\"_blank\">say that it\u2019s \u201cover<\/a>\u201d if the industry isn\u2019t allowed to freely leverage copyrighted data to train its models.<\/p>\n<p class=\"article-paragraph skip\">Rights holders have long cried foul, accusing AI companies of training their models on pirated and copyrighted works, effectively monetizing them without ever fairly remunerating authors, journalists, and artists. It\u2019s a years-long legal battle that\u2019s already <a href=\"https:\/\/www.cbc.ca\/news\/business\/anthropic-ai-copyright-settlement-1.7626707\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">led to a high-profile settlement<\/a>.<\/p>\n<p class=\"article-paragraph skip\">Now, a <a href=\"https:\/\/arxiv.org\/abs\/2601.02671\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">damning new study<\/a> could put AI companies on the defensive. 
In it, Stanford and Yale researchers found compelling evidence that AI models are actually copying all that data, not \u201clearning\u201d from it. Specifically, four prominent LLMs \u2014 OpenAI\u2019s GPT-4.1, Google\u2019s Gemini 2.5 Pro, xAI\u2019s Grok 3, and Anthropic\u2019s Claude 3.7 Sonnet \u2014 happily reproduced lengthy excerpts from popular \u2014 and protected \u2014 works, with a stunning degree of accuracy.<\/p>\n<p class=\"article-paragraph skip\">They found that Claude outputted \u201centire books near-verbatim\u201d with an accuracy rate of 95.8 percent. Gemini reproduced the novel \u201cHarry Potter and the Sorcerer\u2019s Stone\u201d with an accuracy of 76.8 percent, while Claude reproduced George Orwell\u2019s \u201c1984\u201d with higher than 94 percent accuracy compared to the original \u2014 and still copyrighted \u2014 reference material.<\/p>\n<p class=\"article-paragraph skip\">\u201cWhile many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models,\u201d the researchers wrote.<\/p>\n<p class=\"article-paragraph skip\">Some of these reproductions required the researchers to jailbreak the models with a technique <a href=\"https:\/\/jplhughes.github.io\/bon-jailbreaking\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">called Best-of-N<\/a>, which essentially bombards the AI with different iterations of the same prompt. 
(OpenAI has already pointed to those kinds of workarounds to defend itself in a <a href=\"https:\/\/www.nytimes.com\/2023\/12\/27\/business\/media\/new-york-times-open-ai-microsoft-lawsuit.html\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">lawsuit filed by the New York Times<\/a>, with its <a href=\"https:\/\/www.forbes.com\/sites\/zacharyfolk\/2024\/02\/27\/openai-claims-new-york-times-hired-someone-to-hack-chatgpt-for-copyright-lawsuit\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">lawyers arguing<\/a> that \u201cnormal people do not use OpenAI\u2019s products in this way.\u201d)<\/p>\n<p class=\"article-paragraph skip\">The implications of the latest findings could be substantial as copyright lawsuits play out in courts across the country. As <a href=\"https:\/\/www.theatlantic.com\/technology\/2026\/01\/ai-memorization-research\/685552\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">The Atlantic\u2019s Alex Reisner points out<\/a>, the results further undermine the AI industry\u2019s argument that LLMs \u201clearn\u201d from these texts instead of storing information and recalling it later. It\u2019s evidence that \u201cmay be a massive legal liability for AI companies\u201d and \u201cpotentially cost the industry billions of dollars in copyright-infringement judgments.\u201d<\/p>\n<p class=\"article-paragraph skip\">Whether AI companies are liable for copyright infringement remains a subject of heated debate. Stanford law professor Mark Lemley, who has represented AI companies in copyright lawsuits, told The Atlantic that he isn\u2019t sure whether an AI model \u201ccontains\u201d a copy of a book or can reproduce it \u201con the fly in response to a request.\u201d<\/p>\n<p class=\"article-paragraph skip\">Unsurprisingly, the industry is continuing to argue that it\u2019s technically not replicating protected works. 
In 2023, Google <a href=\"https:\/\/cyberscoop.com\/us-copyright-office-ai-report-firing-fair-use-debate\/\" rel=\"nofollow noreferrer noopener\" target=\"_blank\">told the US Copyright Office<\/a> that \u201cthere is no copy of the training data \u2014 whether text, images, or other formats \u2014 present in the model itself.\u201d<\/p>\n<p class=\"article-paragraph skip\">OpenAI also told the office in the same year that its \u201cmodels do not store copies of the information that they learn from.\u201d<\/p>\n<p class=\"article-paragraph skip\">To The Atlantic\u2019s Reisner, the analogy that AI models learn like humans is a \u201cdeceptive, feel-good idea that prevents the public discussion we need to have about how AI companies are using the creative and intellectual works upon which they are utterly dependent.\u201d<\/p>\n<p class=\"article-paragraph skip\">But whether the judges overseeing the litany of copyright lawsuits will agree with that sentiment remains to be seen. The stakes are considerable, particularly as it becomes <a href=\"https:\/\/futurism.com\/ai-google-discover-journalism-industry\" rel=\"nofollow noopener\" target=\"_blank\">harder and harder<\/a> for authors, journalists, and other content creators to make a living \u2014 while the AI industry <a href=\"https:\/\/futurism.com\/artificial-intelligence\/investors-bracing-ai-bubble-reckoning\" rel=\"nofollow noopener\" target=\"_blank\">swells to unfathomable value<\/a>.<\/p>\n<p class=\"article-paragraph skip\">More on AI and copyright: <a href=\"https:\/\/futurism.com\/artificial-intelligence\/openai-copyright-cartoon-output\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI\u2019s Copyright Situation Appears to Be Putting It in Huge Danger<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"For years now, AI companies, including Google, Meta, Anthropic, and OpenAI, have insisted that their large language 
models&hellip;\n","protected":false},"author":2,"featured_media":373623,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[554,733,4308,86,56,54,55],"class_list":{"0":"post-373622","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-technology","12":"tag-uk","13":"tag-united-kingdom","14":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/373622","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=373622"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/373622\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/373623"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=373622"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/categories?post=373622"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=373622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}