May 12, 2025

How Google forced publishers to accept AI scraping as price of appearing in search

Court documents reveal Google considered giving publishers AI opt-out and rejected the idea.

By Charlotte Tobitt

Google considered letting publishers opt out of having their data used for AI grounding while still appearing in search results, but described the idea as a “hard red line”.

New documents disclosed in the remedies portion of a US antitrust trial into Google’s search monopoly reveal the tech giant preferred not to give publishers the option as it was “evolving into a space for monetisation”.

A US judge ruled in August that Google has an illegal search monopoly and new documents have now been published amid a remedies trial held to decide what, if anything, should be done.

Possible remedies could include forcing Google to sell the Chrome browser and share data with competitors. The UK’s Competition and Markets Authority has since launched its own investigation of Google’s search dominance.

Slides prepared by Google director of product management Chetna Bindra in April last year ahead of the US rollout of AI Overviews (then called Search Generative Experience) show the controls Google considered offering to publishers to enable them to opt out of their data being used for various purposes.

Option number one would have been no changes to how publishers could opt out of or limit the display of their content in search. “If not satisfied, they can choose to opt out of indexing.” This option was described as “likely unstable”.

At the other end of the spectrum was the “hard red line” of introducing a separation of grounding versus training for SGE. The suggestion was that publishers could “choose to opt out of their data being used for grounding – Their content [w]on’t be used for any retrieval augmented generation [RAG].”

RAG is the process through which generative AI models retrieve and reference new information from the web in real time.

Google defines grounding as “the ability to connect model output to verifiable sources of information. If you provide models with access to specific data sources, then grounding tethers their output to these data and reduces the chances of inventing content”.

Google-Extended lets publishers stop AI chatbot Gemini and AI development platform Vertex from scraping their content. However, Google-Extended does not stop sites from being accessed and used in Google’s AI Overviews summaries. To avoid this, publishers would have to opt out of being scraped by Googlebot, which indexes for search.
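The trade-off can be illustrated with a short sketch. Google-Extended and Googlebot are separate user-agent tokens in a site's robots.txt file, so blocking one leaves the other untouched. The example below uses Python's standard-library robots.txt parser; the rules, the example.com URL and the helper name can_crawl are illustrative assumptions, not taken from any publisher's actual configuration, and the parser only models the robots.txt rules themselves, not how Google uses crawled pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Google-Extended, keep Googlebot allowed.
AI_OPT_OUT_ONLY = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Hypothetical robots.txt: block Googlebot itself (the search crawler).
FULL_OPT_OUT = """\
User-agent: Googlebot
Disallow: /
"""

# Hypothetical article URL used only for illustration.
STORY_URL = "https://example.com/news/story"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, rules in (("AI opt-out only", AI_OPT_OUT_ONLY),
                         ("Full opt-out", FULL_OPT_OUT)):
        for agent in ("Google-Extended", "Googlebot"):
            allowed = can_crawl(rules, agent, STORY_URL)
            print(f"{label}: {agent} allowed = {allowed}")
```

In the first configuration Googlebot can still fetch the page, which, according to the article, means the content remains available to AI Overviews. Only the second configuration shuts Googlebot out, and with it the site's presence in search.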

Financial Times director of global public policy and platform strategy Matt Rogerson last year said this meant publishers have an “unenviable choice”.

“To opt-out of the Google Search crawler entirely, and become invisible to the 90%+ of the UK population that currently uses Google Search, or allow scraping to continue in ways that both extract value without compensation, and undermine nascent commercial licensing markets for the use of high quality IP to build and enable the AI models of the future.”

Of several options labelled “for discussion” on the Google slides, the preferred recommendation was that publishers “can use ‘no snippet’ to impose a limit on how much content is used for grounding and display,” an evolution from having the choice for “display only”.

Google search publisher controls slides as disclosed in search monopoly remedies trial. Picture: US Department of Justice

This aligns with what Google told Press Gazette in November in response to Rogerson’s criticism: it advised publishers that did not want their content to appear in AI Overviews to use the NOSNIPPET meta tag and the DATA-NOSNIPPET attribute to limit visibility of specific pages or parts of pages – similar to how they could previously control whether they appeared as featured snippets at the top of results.
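As a rough illustration of the two controls, the sketch below checks a page's markup for a page-level nosnippet directive in a robots meta tag and for individual elements carrying the data-nosnippet attribute. The sample pages, the class name SnippetControls and the helper inspect_page are hypothetical. The code only reports which controls are present in the HTML and makes no claim about how Google applies them.

```python
from html.parser import HTMLParser

# Hypothetical page that opts the whole page out of snippets.
PAGE_LEVEL = """
<html><head><meta name="robots" content="noarchive, nosnippet"></head>
<body><p>Full story text.</p></body></html>
"""

# Hypothetical page that marks only one passage as off-limits for snippets.
SECTION_LEVEL = """
<html><head><title>Story</title></head>
<body><p>Public intro. <span data-nosnippet>Exclusive detail.</span></p></body></html>
"""


class SnippetControls(HTMLParser):
    """Records which snippet controls appear in a page's markup."""

    def __init__(self):
        super().__init__()
        self.page_nosnippet = False   # True if a robots meta tag says nosnippet
        self.marked_elements = []     # tag names carrying data-nosnippet

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attr_map.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "nosnippet" in directives:
                self.page_nosnippet = True
        # data-nosnippet is a boolean attribute, so only its presence matters.
        if "data-nosnippet" in attr_map:
            self.marked_elements.append(tag)


def inspect_page(html: str):
    """Return (page_level_nosnippet, list_of_tags_with_data_nosnippet)."""
    parser = SnippetControls()
    parser.feed(html)
    return parser.page_nosnippet, parser.marked_elements


if __name__ == "__main__":
    print("Page-level example:", inspect_page(PAGE_LEVEL))
    print("Section-level example:", inspect_page(SECTION_LEVEL))
```

The meta tag applies to the whole page, while the attribute lets a publisher exclude specific passages and leave the rest of the page eligible for snippets.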

However, the slides produced earlier in the year said Google should “silently update” with “no public announcement”.

The slide continued: “Make it clear that no-snippet enables pubs to opt out of more than just display.

“Do not say this opts them out of training, as we don’t want to get into the details of distinction between Gemini training and SGE training…

“Recommend not saying this opts them out of grounding, as this is evolving into a space for monetisation…

“Instead lean into something closer to saying it opts them out of display that includes corroboration, and will also opt them out of having snippets shown for blue links.”

Google said in 2019 that all versions of an experiment equivalent to no-snippeting in search (“only URLs, very short fragments of headlines, and no preview images”) resulted in “substantial traffic loss to news publishers”.

“Even a moderate version of the experiment (where we showed the publication title, URL, and video thumbnails) led to a 45 percent reduction in traffic to news publishers. Our experiment demonstrated that many users turned instead to non-news sites, social media platforms, and online video sites—another unintended consequence of legislation that aims to support high-quality journalism. Searches on Google even increased as users sought alternate ways to find information.”

Separately, Google previously told Press Gazette that its search scraper Googlebot is used in AI Overviews, intertwining the legacy search and AI permissions, because AI is integral to how search functions.

During the remedies trial, a vice president of product at Google revealed the company can train AI Overviews on web content even when publishers have opted out of training its AI products.

According to Bloomberg, a Department of Justice lawyer asked: “Once you take the [AI model] Gemini and put it inside the search org, the search org has the ability to train on the data that publishers had opted out of training, correct?”

Eli Collins, vice president of Google’s AI research lab DeepMind, told the court: “Correct – for use in search.”
