Profound is a startup that promises to help companies understand how they appear in AI responses to customer queries. But one expert in the field thinks the AI analytics startup has been sucking up information on users’ AI conversations without proper consent.

Lee S Dryburgh, an expert in AI visibility for consumer health and longevity brands, told The Register that he believes Profound has been relying on data gathered via browser extensions that capture people’s conversations with chatbots.

“Users think they’re getting a free VPN or SEO widget; in reality, their most private queries — health scares, finances, identity crises — are being slurped, anonymized, and resold,” Dryburgh explained in an email. “Onavo and Jumpshot déjà vu, only worse: this time it’s your inner dialogue.”

Profound disputes that claim. “Of course we do not use any ‘stolen’ data,” a company spokesperson said in response to our inquiry. “All data we use is 100 percent opt-in by the end-user and both GDPR- and CCPA-compliant,” referring to the EU’s General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA). “This claim is absurd,” said the spokesperson.

AIO becoming the new SEO

Search engine optimization (SEO) is beginning to shift toward AI Optimization (AIO) as consumer interaction with AI models rises. A recent OpenAI paper [PDF] makes the case that workers are increasingly using ChatGPT for decision support. As a result, brand reputation concerns have spread from the web to AI models and tools. Profound promises companies visibility into how they appear in AI interactions.

According to an email from a sales representative at Profound to a potential customer, which was shared with The Register, “We have access to 150+ million real user conversations. This is primarily clickstream data where a user has opted in to be tracked and automatically shared their ChatGPT conversations.”

According to Dryburgh, Profound hasn’t disclosed its data provider, but he believes some of its data comes from Datos, which “offers access to the desktop and mobile browsing behavior for tens of millions of users across the globe, packaged into clean, easy to understand data products.”

As an example of how such data might be gathered, he points to a browser extension like Semrush AI Writer and Editor, offered by Datos’ parent company Semrush. The privacy policy, Dryburgh contends, has a carve-out that allows data aggregation, anonymization, and sharing with partners for market research. 

Dryburgh says he cannot be certain that Profound sources its clickstream data from Datos, nor which extensions might be involved, without the company’s cooperation and transparency. Another possible source could be iOS/Android apps that function as wrappers around AI model APIs from OpenAI and the like. But he argues that browser extensions provide better usage over time and better data, and therefore are a particularly tempting source of info for marketers.

The Register asked both Datos and Semrush to clarify whether they provide data to Profound, but we’ve not heard back. However, a blog post written by Semrush Enterprise claims, “Datos is a Semrush company powering both AIO and Profound.” We also asked Profound to clarify its user consent data flow and how it obtains data, but we’ve not yet heard back.

There’s other evidence that Datos is interested in chatbot data. In 2023, software developer Matt Frisbie, who created ChatGPT Assistant, a browser extension that sends prompts to ChatGPT, wrote that Datos contacted him to propose a partnership.

“The company pays various sources to collect anonymized clickstream data, which it then sells to anyone who wants it: business intelligence analysts, hedge funds, that sort of thing,” he wrote in his post. “They wanted data sources with high data quality and at least 100,000 monthly active users.”

Frisbie said he declined that offer and others. As we reported previously, developers of popular extensions commonly get solicited to sell their extensions or enter some other commercial arrangement, often so the new owner or partner can exploit the extraordinary level of access to sensitive data that many extensions grant. 

Browser extensions can present a particularly significant privacy risk because they’re often granted permission to “Read and change all your data on websites” – as the Semrush AI Writer and Editor popup requests. Darius Belejevas, head of data broker removal service Incogni, told The Register in an email that the company’s researchers recently looked at AI Chrome extensions to understand the permissions being granted.

“Among the most common ones were ‘activeTab,’ which gives extensions temporary access to the currently active browser,” said Belejevas. “This permission was required by a staggering 93 AI Chrome extensions, mainly in the information lookup and collection, audio transcriber, and other categories. Another very sensitive permission, required by 88 extensions (mainly within the writing assistant category) is ‘scripting,’ which grants extensions the ability to inject code into websites. Seeing the number of sensitive permissions used by these extensions, it is possible that they might be used to threaten user privacy.”
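For readers who want to see what those permissions look like in practice, here is a minimal sketch of how one might flag them in an extension's manifest.json. It is not Incogni's methodology: the function name `flag_manifest`, the watch list, and the example manifest are all hypothetical, and the choice of which permissions count as "sensitive" is an assumption based on the two named above. The broad host patterns are the ones Chrome typically summarizes as permission to read and change data on all websites.

```python
import json

# Assumption: an illustrative watch list built from the two permissions
# named in the article, not an official or complete classification.
SENSITIVE_PERMISSIONS = {"activeTab", "scripting"}

# Match patterns that cover every site the user visits.
BROAD_HOST_PATTERNS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}


def flag_manifest(manifest_text: str) -> dict:
    """Return the sensitive permissions and broad host patterns a manifest declares."""
    manifest = json.loads(manifest_text)

    # Manifest V3 lists host patterns under "host_permissions";
    # Manifest V2 mixes them into "permissions".
    declared = list(manifest.get("permissions", []))
    declared += manifest.get("host_permissions", [])

    # Content scripts declare the sites they run on separately.
    for script in manifest.get("content_scripts", []):
        declared += script.get("matches", [])

    # Ignore non-string entries, which some older manifests contain.
    names = {entry for entry in declared if isinstance(entry, str)}

    return {
        "sensitive": sorted(names & SENSITIVE_PERMISSIONS),
        "broad_hosts": sorted(names & BROAD_HOST_PATTERNS),
    }


# Hypothetical manifest for illustration only; not taken from any real extension.
EXAMPLE_MANIFEST = """
{
  "manifest_version": 3,
  "name": "Hypothetical AI helper",
  "permissions": ["activeTab", "scripting", "storage"],
  "host_permissions": ["<all_urls>"]
}
"""

if __name__ == "__main__":
    print(flag_manifest(EXAMPLE_MANIFEST))
```

A declared permission only shows what an extension is allowed to do, not what it actually does with that access, so a check like this is a starting point for scrutiny rather than evidence of data collection.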

Again, Profound insisted to us that its data collection is above board, as users opted in, and that it follows regulations.

But Dryburgh explained that the GDPR’s standards for consent are high, noting that it must be freely given, specific, informed, and unambiguous. The way Chrome extensions ask for broad permissions falls short of those standards, he argues. And the CCPA requires a right to opt out of data sales or sharing, an option not offered during Chrome extension installation. He expects that regulators sooner or later will move to limit clickstream data collection from AI user conversations.

Alexander Hanff, managing director of privacy consultancy Hanff & Co. AB, told The Register that Article 5 of the GDPR sets out various principles like fairness, transparency, purpose limitation, and data minimization.

He said that if there is a lack of transparency in the way Profound obtains data, processing personal data at all would be unlawful.

“Then there is the ePrivacy Directive (PECR in the UK) which would require consent in order to access browser plugins unless it is considered as strictly necessary for the provision of a service requested by the end user,” Hanff said in an email.

“Also in PECR / ePrivacy it is prohibited to process traffic data (TCP/IP and HTTP Headers) for any purpose other than to facilitate a transmission or for the purpose of billing (but these exemptions are usually limited to telecommunication network providers (Telecoms/ISPs)). As such it would seem highly likely that such activities would be problematic from a legal perspective both in the EU and the UK.”

For Dryburgh, this isn’t just a legal issue but a data quality issue for brands that are trying to assess how they appear in people’s interactions with AI models. He told The Register that he intends to explore his concerns in more detail in a future post on LinkedIn. ®