{"id":568778,"date":"2026-03-27T23:12:21","date_gmt":"2026-03-27T23:12:21","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/568778\/"},"modified":"2026-03-27T23:12:21","modified_gmt":"2026-03-27T23:12:21","slug":"how-new-rules-could-stop-ai-scrapers-destroying-the-internet","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/568778\/","title":{"rendered":"how new rules could stop AI scrapers destroying the internet"},"content":{"rendered":"<p>Australians are among the <a href=\"https:\/\/theconversation.com\/australians-see-ai-as-leading-threat-to-people-and-businesses-survey-270794#:%7E:text=According%20to%20a%20survey%20by%20the%20Australian,supplies%20due%20to%20a%20crisis%20overseas**%2074%25\" rel=\"nofollow noopener\" target=\"_blank\">most anxious in the world<\/a> about artificial intelligence (AI). <\/p>\n<p>This anxiety is driven by fears AI is used to <a href=\"https:\/\/factcheck.afp.com\/doc.afp.com.36XQ46M#:%7E:text=Disinformation%20operations&amp;text=Your%20browser%20can&#039;t%20play%20this%20video.,-Learn%20more&amp;text=Learn%20more-,An%20error%20occurred.,so%20do%20cybercriminals&#039;%20tactics.%22\" rel=\"nofollow noopener\" target=\"_blank\">spread misinformation<\/a> and scam people, anxiety over <a href=\"https:\/\/www.theguardian.com\/technology\/ng-interactive\/2026\/feb\/20\/ai-future-work-technology-white-collar\" rel=\"nofollow noopener\" target=\"_blank\">job losses<\/a>, and the fact AI companies are <a href=\"https:\/\/www.tglaw.com.au\/insights\/copyright-and-ai-training-in-australia-insights-from-us-cases\" rel=\"nofollow noopener\" target=\"_blank\">training their models<\/a> on others\u2019 expertise and creative works without compensation.<\/p>\n<p>AI companies have used <a href=\"https:\/\/www.abc.net.au\/news\/2025-03-28\/authors-angry-meta-trained-ai-using-pirated-books-in-libgen\/105101436\" rel=\"nofollow noopener\" target=\"_blank\">pirated books and articles<\/a>, and routinely <a 
href=\"https:\/\/www.wired.com\/story\/ai-bots-are-now-a-signifigant-source-of-web-traffic\/\" rel=\"nofollow noopener\" target=\"_blank\">send bots across the web<\/a> to systematically scrape content for their models to learn from. That content may come from social media platforms such as Reddit, university repositories of academic work, and authoritative publications like <a href=\"https:\/\/www.afr.com\/companies\/media-and-marketing\/ai-firms-crawling-nine-entertainment-s-news-sites-10-times-a-second-20250806-p5mkop\" rel=\"nofollow noopener\" target=\"_blank\">news outlets<\/a>.<\/p>\n<p>In the past, online scraping was subject to a kind of detente. Although scraping may sometimes have been technically illegal, it was needed to make the internet work. For instance, without scraping <a href=\"https:\/\/theconversation.com\/google-turns-25-the-search-engine-revolutionised-how-we-access-information-but-will-it-survive-ai-212367\" rel=\"nofollow noopener\" target=\"_blank\">there would be no Google<\/a>. Website owners were OK with scraping because it made their content more available, according with the vision of the \u201c<a href=\"https:\/\/theconversation.com\/news-sites-are-locking-out-the-internet-archive-to-stop-ai-crawling-is-the-open-web-closing-274968\" rel=\"nofollow noopener\" target=\"_blank\">open web<\/a>\u201d. <\/p>\n<p>Under these conditions, scraping was managed through <a href=\"https:\/\/creativecommons.org\/ai-and-the-commons\/cc-signals\/\" rel=\"nofollow noopener\" target=\"_blank\">principles<\/a> such as respect, recognition, and reciprocity. In the context of AI, those are now faltering.<\/p>\n<p>A new online landscape<\/p>\n<p>Many news outlets are now <a href=\"https:\/\/pressgazette.co.uk\/platforms\/eight-in-ten-of-worlds-biggest-news-websites-now-block-ai-training-bots\/\" rel=\"nofollow noopener\" target=\"_blank\">blocking web scrapers<\/a>. 
Creators are <a href=\"https:\/\/www.buzzincontent.com\/insight\/creators-raise-alarm-as-tech-giants-use-their-content-to-train-ai-9396205\" rel=\"nofollow noopener\" target=\"_blank\">choosing not to use certain platforms<\/a> or are posting less. <\/p>\n<p>Barriers are being put in place across the open web. When only some can afford to pay to access news and information, then democracy, scientific innovation and creative communities are all harmed.<\/p>\n<p>Exceptions to copyright infringement, such as <a href=\"https:\/\/theconversation.com\/explainer-what-is-fair-dealing-and-when-can-you-copy-without-permission-80745\" rel=\"nofollow noopener\" target=\"_blank\">fair dealing for research or study<\/a>, were legislated long before generative AI became publicly available. These exceptions are no longer fit for purpose in an AI age.<\/p>\n<p>The Australian government has <a href=\"https:\/\/ministers.ag.gov.au\/media-centre\/albanese-government-ensure-australia-prepared-future-copyright-challenges-emerging-ai-26-10-2025\" rel=\"nofollow noopener\" target=\"_blank\">ruled out<\/a> a new copyright exception for text and data mining. This signals a commitment to supporting Australia\u2019s creative industries, but leaves great uncertainty about how creative content can be managed legally and at scale now that AI companies are crawling the web.  <\/p>\n<p>In response, the international nonprofit Creative Commons has proposed a new voluntary framework: <a href=\"https:\/\/creativecommons.org\/ai-and-the-commons\/cc-signals\/\" rel=\"nofollow noopener\" target=\"_blank\">CC Signals<\/a>.<\/p>\n<p><a href=\"https:\/\/creativecommons.org\/share-your-work\/cclicenses\/\" rel=\"nofollow noopener\" target=\"_blank\">Creative Commons licences<\/a> allow creators to share content and specify how it can be used. All licences require credit to acknowledge the source, but various additional restrictions can be applied. 
Creators can ask others not to modify their work, or not to use it for commercial purposes. For example, The Conversation\u2019s articles are available for reuse under a <a href=\"https:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/\" rel=\"nofollow noopener\" target=\"_blank\">CC BY-ND licence<\/a>, which means they must be credited to the source and must not be remixed, transformed, or built upon.<\/p>\n<p>            <a href=\"https:\/\/images.theconversation.com\/files\/725749\/original\/file-20260324-57-smhxtw.png?ixlib=rb-4.1.0&amp;q=45&amp;auto=format&amp;w=1000&amp;fit=clip\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" alt=\"\" src=\"https:\/\/www.newsbeep.com\/au\/wp-content\/uploads\/2026\/03\/file-20260324-57-smhxtw.png\" class=\"native-lazy\" loading=\"lazy\"  \/><\/a><\/p>\n<p>              Summary of CC licences.<br \/>\n              <a class=\"source\" href=\"https:\/\/wiki.creativecommons.org\/images\/6\/6d\/6licenses-flat.pdf\" rel=\"nofollow noopener\" target=\"_blank\">Creative Commons<\/a><\/p>\n<p>How would CC Signals work?<\/p>\n<p>The proposed CC Signals framework lets creators decide whether and how they want their material to be used by machines. It aims to strike a balance between responsible AI use and not stifling innovation, and is based on the principles of consent, compensation, and credit.<\/p>\n<p>Put simply, CC Signals work by allowing a \u201cdeclaring party\u201d \u2013 such as a news website \u2013 to attach machine-readable instructions to a body of content. These instructions specify which combinations of machine uses are permitted, and under what conditions. 
<\/p>\n<p>CC Signals are standardised, and both humans and machines can understand them.<\/p>\n<p>This proposal arrives at a moment that closely mirrors the early days of the web, when norms around automated access (crawling and scraping) were still being worked out in practice rather than law.<\/p>\n<p>A useful historical parallel is robots.txt, a simple file web hosts use to signal which parts of a site can be accessed by the bots that crawl the web and look for content. It was never enforceable, but it became widely adopted because it provided a clear, standardised way to communicate expectations between content hosts and developers.<\/p>\n<p>CC Signals could operate in much the same spirit. But, as with any system, it has potential benefits as well as drawbacks.<\/p>\n<p>The pros<\/p>\n<p>The framework provides more nuance and flexibility than the current scrape\/don\u2019t scrape environment. It offers creators more control over the use of their content.<\/p>\n<p>It also has the potential to affect how much high-quality content is available for scraping. 
Without access to high-quality data, AI\u2019s <a href=\"https:\/\/doi.org\/10.1080\/21670811.2023.2229883\" rel=\"nofollow noopener\" target=\"_blank\">biases are exacerbated<\/a> and <a href=\"https:\/\/theconversation.com\/what-is-model-collapse-an-expert-explains-the-rumours-about-an-impending-ai-doom-236415\" rel=\"nofollow noopener\" target=\"_blank\">make the technology less useful<\/a>.<\/p>\n<p>The framework might also benefit smaller players who don\u2019t have the bargaining power <a href=\"https:\/\/newsguild.org\/inside-ai-negotiations-at-the-new-york-times\/\" rel=\"nofollow noopener\" target=\"_blank\">to negotiate with big tech companies<\/a> but who, nonetheless, desire remuneration, credit, or visibility for their work.<\/p>\n<p>The cons<\/p>\n<p>The greatest challenge with CC Signals is likely to be a practical one \u2013 how to calculate, and then enforce, the monetary or in-kind support required by some of the signals. <\/p>\n<p>This is also a major sticking point with content industry proposals for collective licensing schemes for AI. Calculating and distributing licence fees for the thousands, if not millions, of internet works that are accessed by generative AI systems around the world is a logistical nightmare. <\/p>\n<p>Creative Commons <a href=\"https:\/\/creativecommons.org\/ai-and-the-commons\/cc-signals\/implementation\/\" rel=\"nofollow noopener\" target=\"_blank\">has said<\/a> it plans to produce best-practice guides for how to make contributions and give credit under the CC Signals. But this work is still in progress.<\/p>\n<p>Where to from here?<\/p>\n<p>Creative Commons asserts that the CC Signals framework is not so much a legal tool as an attempt to define \u201cmanners for machines\u201d. Manners is a good way to look at this. <\/p>\n<p>The legal and practical hurdles to implementing effective copyright management for AI systems are huge. 
But we should be open to new ideas and frameworks that foreground respect and recognition for creators without shutting down important technological developments. <\/p>\n<p>CC Signals is an imperfect framework, but it is a start. Hopefully there are more to come.<\/p>\n","protected":false},"excerpt":{"rendered":"Australians are among the most anxious in the world about artificial intelligence (AI). This anxiety is driven by&hellip;\n","protected":false},"author":2,"featured_media":568779,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20],"tags":[256,254,255,64,63,105],"class_list":{"0":"post-568778","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-artificialintelligence","11":"tag-au","12":"tag-australia","13":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/568778","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=568778"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/568778\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/568779"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=568778"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=568778"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-js
on\/wp\/v2\/tags?post=568778"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}