Vass Bednar is the managing director of the Canadian SHIELD Institute and co-author of The Big Fix.
People are reflected in a hotel window in Davos, Switzerland, in 2024. An array of tech companies is using customer inputs in AI training activities. Markus Schreiber/The Associated Press
If you want to use popular digital services, chances are you’re also fuelling the AI products that these companies then sell back to you.
Paying a privacy price for online services isn’t new. In 2019, Canada found that Facebook took more of your data than the value it gave back. Today, that extraction goes further: your information is no longer used just to “target” ads; increasingly, it is used by default to train proprietary AI models.
The open web is being incrementally enclosed behind opaque AI training loops. If you want your website to appear in Google’s search results, you have to allow Google’s crawler to index it. Historically, that was a fair trade: indexing brought visibility. Now, however, Google says that anything indexed for search can also be used to train its generative AI models. Block the crawler to keep your content out of AI training, and your site effectively disappears from search. Publishers, authors and creators are being held hostage.
These are old-school market structure problems. Economically, this behaviour resembles tying – a practice where access to one product is conditional on accepting another. When a dominant platform can tie essential services to compulsory data extraction, it undermines competition and erodes user agency.
This extractive scenario is increasingly common. SoundCloud updated its terms last year to allow content uploaded to the site to be used as AI training data. Zoom’s terms give the company broad rights to use “service-generated data” – such as usage patterns and behavioural metrics – to train its AI models, even if users decline to share their actual meeting content.
While Zoom insists it won’t use video, audio or chat transcripts without explicit consent, simply using the platform still allows certain data to feed into its algorithmic systems.
Similarly, Anthropic recently announced that the consumer version of its AI assistant, Claude, will use chats and coding sessions to train its models unless the user opts out. LinkedIn just updated its terms to state that it can use member data in many jurisdictions to train content-generating AI models; you can opt out, but data collected in prior years remains in the training sets. Reddit has signed major licensing deals that allow external companies to use its content for AI training, capitalizing on the information that Redditors freely contribute.
Canadian regulators don’t seem alive to these abuses, but other jurisdictions are. In the United States and the European Union, digital businesses have taken legal action against Google, citing its coercive collection of data from websites to generate its AI overviews.
Chegg, an online education platform, sued Google for using Chegg’s website content to train Google’s generative AI models. The lawsuit alleges that Google’s use of website content beyond indexing amounts to “reciprocal dealing” – a form of tied selling – and an unlawful leveraging of Google’s monopoly power.
In Europe, the Independent Publishers Alliance submitted a complaint to the European Commission that Google is abusing its dominant position in online search by appropriating web content to generate its AI overviews. In Britain, the Competition and Markets Authority has opened a strategic market status investigation into Google search, a move that could compel the company to separate its search business from its AI training operations.
Alphabet CEO Sundar Pichai speaks at a Google event in California in 2024. In the U.S. and Europe, businesses have taken legal action against Google, citing its coercive collection of data to generate its AI overviews. Jeff Chiu/The Associated Press
That would be a tectonic shift, re-establishing a line between indexing for visibility and harvesting for model training.
Canada should do the same. Our privacy law already requires that consent be voluntary and appropriate. Our Competition Bureau has the authority to examine when bundling consent for multiple uses becomes anti-competitive or unconscionable. Enforcing that line would send a powerful signal: that Canada is a fair place for AI markets.
Doing so could create a powerful competitive advantage. Cloudflare’s chief executive suggested that we imagine Canada as the “Delaware of the internet” – a jurisdiction where data relationships are governed by fairness and transparency. If we required search engines to decouple indexing from AI training, publishers worldwide might route their traffic through Canadian servers to avoid the coercive extraction of their content. It would give authors the leverage they currently lack.
The last time Canada tried to challenge Big Tech’s extraction logic, through the Online News Act, Meta retaliated by blocking news on its platforms in Canada entirely. But that shouldn’t stop us from acting again, this time on firmer ground: competition and consent.
Canada has the opportunity to define the terms of a more open and trustworthy digital economy. There’s no better time to reject extractive consent models from American Big Tech companies that use our participation as raw material for their profit.