{"id":61205,"date":"2025-08-12T00:10:08","date_gmt":"2025-08-12T00:10:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/61205\/"},"modified":"2025-08-12T00:10:08","modified_gmt":"2025-08-12T00:10:08","slug":"reddit-halts-the-wayback-machine-because-of-ai-scrapers","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/61205\/","title":{"rendered":"Reddit halts the Wayback Machine because of AI scrapers"},"content":{"rendered":"<p>The Internet Archive is an internet essential, a proverbial treasure trove of digital delights from yesteryear that keeps the web free and open to everyone. Unfortunately, the Internet Archive\u2019s mission to make the internet as large, useful, and enlightening as possible is in direct conflict with that of AI companies. Artificial intelligence has made significant strides in making the <a href=\"https:\/\/arstechnica.com\/ai\/2025\/07\/research-shows-google-ai-overviews-reduce-website-clicks-by-almost-half\/\" rel=\"nofollow noopener\" target=\"_blank\">internet smaller and more cluttered with spam<\/a>, and it has also destroyed formerly useful sites, ostensibly to increase profits <a href=\"https:\/\/www.salon.com\/2025\/01\/18\/openai-valued-at-150-billion-isnt-profitable-yet-should-that-be-normal\/\" rel=\"nofollow noopener\" target=\"_blank\">that have yet to materialize<\/a>. They\u2019re not even making money on this shit, and yet it\u2019s driving a wave of <a href=\"https:\/\/futurism.com\/commitment-jail-chatgpt-psychosis\" rel=\"nofollow noopener\" target=\"_blank\">ChatGPT-induced psychosis<\/a> and <a href=\"https:\/\/futurism.com\/internet-polluted-ai-slop\" rel=\"nofollow noopener\" target=\"_blank\">endless servings of slop<\/a>. But cooking up all that slop requires massive amounts of data, and to get it, AI has to steal. 
Disney and Universal are currently suing the \u201c<a href=\"https:\/\/www.avclub.com\/disney-universal-sue-midjourney-ai-plagiarism\" rel=\"nofollow noopener\" target=\"_blank\">bottomless pit of plagiarism<\/a>\u201d that is Midjourney AI. Meanwhile, in an effort to stop AI crawlers from hoovering up user data for even more sycophantic chatbots, Reddit is now limiting the Internet Archive\u2019s access because those scrapers are feeding off the Reddit data stored there.<\/p>\n<p><a href=\"https:\/\/www.theverge.com\/news\/757538\/reddit-internet-archive-wayback-machine-block-limit\" rel=\"nofollow noopener\" target=\"_blank\">Per The Verge<\/a>, Reddit is limiting the amount of archiving the Wayback Machine can do. The Wayback Machine will still index Reddit\u2019s homepage, allowing it to archive the day\u2019s most popular posts. Previously, however, the Wayback Machine let users save and revisit entire Reddit posts, conversations, and user pages. \u201cInternet Archive provides a service to the open web, but we\u2019ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,\u201d Reddit spokesperson Tim Rathschmidt said. \u201cUntil they\u2019re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we\u2019re limiting some of their access to Reddit data to protect redditors.\u201d For what it\u2019s worth, Mark Graham, director of the Wayback Machine, assured The Verge that the two websites have a \u201clongstanding relationship\u201d and that \u201congoing discussions\u201d will continue.<\/p>\n<p>The Verge notes that, in Reddit\u2019s eyes, not all scrapers are created equal. 
The company made deals with <a href=\"https:\/\/www.theverge.com\/2024\/5\/16\/24158529\/reddit-openai-chatgpt-api-access-advertising\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI<\/a> and <a href=\"https:\/\/www.theverge.com\/2024\/2\/22\/24080165\/google-reddit-ai-training-data\" rel=\"nofollow noopener\" target=\"_blank\">Google<\/a> last year, the latter presumably because Google wanted to stop its users from adding \u201csite: <a href=\"https:\/\/reddit.com\" rel=\"nofollow noopener\" target=\"_blank\">Reddit.com<\/a>\u201d to search queries and keep them on its decaying search engine. Meanwhile, <a href=\"https:\/\/www.cnbc.com\/2025\/06\/04\/reddit-anthropic-lawsuit-ai.html\" rel=\"nofollow noopener\" target=\"_blank\">Reddit sued Anthropic in June<\/a> for scraping its site. Additionally, the site has informed other search engines that they will need to pay to access the millions of pieces of information written for free by Redditors. Not that Reddit\u2019s going to pass the money on to the users who are doing the labor of keeping these search engines and AI models fed. 
In the end, the system has made interns of us all.<\/p>\n","protected":false},"excerpt":{"rendered":"The Internet Archive is an internet essential,&hellip;\n","protected":false},"author":2,"featured_media":61206,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[64,63,237,105],"class_list":{"0":"post-61205","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-internet","8":"tag-au","9":"tag-australia","10":"tag-internet","11":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/61205","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=61205"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/61205\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/61206"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=61205"}],"wp:term
":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=61205"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=61205"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}