
March 27, 2023, updated 08 Mar 2024 11:39am

AOP warns that advertisers must stop stealing publishers’ website data

AOP objects to "unscrupulous" vendors scraping website metadata and article text without permission.

By Charlotte Tobitt

Publishers are taking a stand against the “endemic” IP theft of their data by vendors crawling their websites and using it for their own commercial gain.

The Association of Online Publishers has published an open letter to advertisers and agencies in the industry asking them to hold the vendors they work with to account and ensure the data they work with is “reliable, trustworthy and from a verified source”.

AOP managing director Richard Reeves, who signed the letter on behalf of digital publishers, told Press Gazette he is defending the right of a publisher to decide who its partners are, what data those partners are able to use and, ultimately, what value to place on its own IP.

The letter said bad crawling behaviour puts the digital media space in “jeopardy”, adding: “With their competitive advantage depleting, many publishers are struggling to secure crucial ad revenue; leaving them less able to support premium content environments where advertisers can reach desirable audiences, safely.”

Publishers have long opened up their sites for crawling in a limited way so vendors can verify the delivery of adverts and ensure they are in a brand-safe placement.

However, some “unscrupulous” vendors, the open letter said, now pack unseen extra tags into authorised in-header wrappers or run bots to collect “publisher metadata and article text to build contextual audience segments for their own commercial gain” without gaining permission, whether through a licence or other agreement.

“For publishers, this action erodes and undermines their exclusive capacity to enrich user experiences and ad inventory using owned data,” the letter said. “Meanwhile, buyers face significant data validity concerns.”

Reeves told Press Gazette publishers have “no intention or desire to prohibit contextual vendors from exercising their purpose, which is the legitimate verification of ad placement” but that it is their view that going beyond that purpose for a vendor’s own commercial reasons is a breach of publisher IP.

It comes after publishers have invested a lot of time, resources and effort into developing their own first-party data strategies, in part to protect them when Google finally carries out its long-delayed plan to phase out third-party cookies by the end of 2024 – a proposal that has led to a rise in demand for contextual advertising.

Publishers have also made their own decisions about how and when to make their first-party data available to external parties – for example, in partnerships with advertisers and agencies to make their campaigns more effective.

“Publishers know first-party data has become an asset with value going forward,” Reeves said.

According to Reeves, some of the parties involved in discussions have questioned whether legal action would be the best step forward but he said the first step was to clear up some “grey areas” and have a “broader understanding across the industry of the broader principles at play”, which led to the open letter.

“Those principles are that surely if this is the publisher’s IP – we’re not denying that there is demand and increasing demand for contextual ad solutions, but what we are saying is that if those are being taken to market by a third party and that third party is reliant on the publisher as the source, then ethically under those principles, should it not be that those third parties would seek the consent or the permission through licence or other forms of agreement from the publisher in order to make that data available in their commercial offerings further upstream within the bid process?”

One of the “grey areas” that needed clearing up was the Trustworthy Accountability Group’s Brand Safety Certification, which has now been updated to define the legitimate use of publisher data using these crawler technologies.

But without the conversation around the issue, Reeves warned there is a risk that the industry will get to a point of having “unverified data, murkiness around confidence around that data and the potential risk of data not being trustworthy”.

If the issue is not tackled, Reeves said, more players will identify an opportunity to fulfil a demand for context and “before you know it, you’ve potentially got crawlers from unknown parties running rife across sites extrapolating this data… it will be harder to differentiate quality from the rubbish and it will be virtually impossible to identify whether the data they’re purporting to have at their disposal for you to engage with as an advertiser is quality.”

Reeves added that the need for publishers to protect their IP will only increase with the rise of AI chatbots like ChatGPT that scrape websites to find answers to user questions without providing sources. Publishers are currently discussing whether they should be pursuing any payment for this use of their website, while some are updating their terms and conditions to explicitly state AI should not be deployed for the purpose of training on their website.

Reeves said the two issues are “not unrelated” as “what we’re talking about is certain types of crawlers behaving in certain ways that we don’t feel is entirely ethical”.


