Date: 2024-03-07

Webpages of the New York Times, Common Crawl, OpenAI, and Microsoft are seen on a computer. Picture: Shutterstock/Tada Images

News publishers appear torn between litigating and negotiating when it comes to AI companies using their content to train large language models (LLMs) such as ChatGPT.

Press Gazette analysis has found that more than four in ten of the 100 biggest English-language news websites have decided not to block AI bots from the likes of OpenAI and Google.

OpenAI is reportedly offering news organisations between $1m and $5m per year to license their copyrighted content to train its models.

Meanwhile Apple has reportedly been exploring AI deals with the likes of Conde Nast, NBC News and People and Daily Beast owner IAC to license their content archives, but nothing has yet been made public.

This page will be updated when new deals are struck or legal actions are launched relating to news publishers and AI companies.

Plenty of other news organisations are understood to be in negotiations with OpenAI, while some, including the publisher of Mail Online, have suggested they are seriously considering their legal options.

On an earnings call with investors last month, News Corp CEO Robert Thomson indicated that his company was looking to strike a deal with AI companies – but nothing has been announced as yet.

He said: “It is reassuring that certain digital companies appreciate the value of integrity, quality and creativity, and while certain other media companies prefer litigation, we prefer consultation, as the former is merely creating a gold rush for lawyers. Courtship is preferable to courtrooms – we are wooing not suing. But let’s be clear, in my view those who are repurposing our content without approval are stealing.”

But not all publishers want deals: Reach chief executive Jim Mullen told investors on 5 March that the UK’s largest commercial publisher is not in any “active discussions” with AI companies and suggested other publishers should hold off on deals so that the industry can approach the issue from a position of solidarity.

He said: “We would prefer that we don’t get into a situation where we did with the referrers ten years ago and gave them access and we became hooked on this referral traffic and we would like it to be more structured. We produce content, which is really valuable, and we would like to license or agree how they use our base intelligence to actually inform the AI and the open markets. The challenge we have as an industry is that we need to be unified.

“I used to be the chairman of the NMA and if we stay together and work with it, then that’s a really strong position that we have, particularly with the Government to help us get to there. So I’m using this as a bit of a campaign, [it] only takes one publisher to break away and start doing deals and then it sort of disintegrates.”

If you feel there is something missing that should be included, or you want to alert us to a new development, please contact charlotte.tobitt@pressgazette.co.uk.

Suing

The Intercept, Raw Story and AlterNet

Three US progressive news and politics digital outlets filed lawsuits against OpenAI on Wednesday 28 February.

The Intercept, Raw Story and AlterNet objected to the use of their articles to train ChatGPT. The Intercept also sued Microsoft, which has partnered with OpenAI to create a Bing chatbot.

Raw Story publisher Roxanne Cooper said: “Raw Story’s copyright-protected journalism is the result of significant efforts of human journalists who report the news. Rather than license that work, OpenAI taught ChatGPT to ignore journalists’ copyrights and hide its use of copyright-protected material.”

CEO and founder John Byrne added: “It is time that news organisations fight back against Big Tech’s continued attempts to monetise other people’s work.”

The New York Times

In the most high-profile case brought against OpenAI and Microsoft by a news publisher so far, The New York Times made a surprise announcement in the days after Christmas that it would seek damages, restitution and costs as well as the destruction of all large language models (LLMs) trained on its content.

OpenAI and NYT had been in negotiations for nine months but the news organisation felt no resolution was forthcoming and decided instead to share its concerns over the use of its intellectual property publicly. The success of the lawsuit will depend on the US court’s interpretation of “fair use” in copyright law – assuming the companies don’t find their way to a settlement first.

OpenAI previously said a “high-value partnership around real-time display with attribution in ChatGPT” was on the cards with the NYT before the news organisation surprised it by launching the lawsuit.

The NYT said the two tech companies, which have a partnership centred around ChatGPT and Bing, have “reaped substantial savings by taking and using – at no cost” its content to create their models without paying for a licence. It added that the use of its content in chatbots “threatens to divert readers, including current and potential subscribers, away from The Times, thereby reducing the subscription, advertising, licensing, and affiliate revenues that fund The Times’s ability to continue producing its current level of groundbreaking journalism”.

In its response, filed on Monday 26 February, OpenAI argued: “In the real world, people do not use ChatGPT or any other OpenAI product” to substitute for a NYT subscription. “Nor could they. In the ordinary course, one cannot use ChatGPT to serve up Times articles at will.”

OpenAI accused the NYT of paying someone to hack its products and taking “tens of thousands of attempts to generate the highly anomalous results” in which verbatim paragraphs from articles were spat out by ChatGPT. “They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI’s terms of use,” it said.

“And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites. Normal people do not use OpenAI’s products in this way.”

Getty Images

Getty Images began legal proceedings against Stability AI in the UK in January 2023, claiming that the AI image company “unlawfully copied and processed” millions of its copyrighted images without a licence through its text-to-image model Stable Diffusion.

In December, the High Court in London ruled that Getty’s case could go to trial after Stability AI failed to persuade a judge that two aspects of the claim – relating to training and development as well as copyright – should be struck out.

Mrs Justice Joanna Smith said Getty’s claim has a “real prospect of success” in relation to Stable Diffusion’s “image-to-image feature” which the photo agency claimed allows users to make “essentially identical copies of copyright works”.

Signing

Unknown independent publishers

A handful of unnamed independent publishers are taking part in a private programme with Google, according to Adweek, which will see them paid a five-figure annual sum to take part in a trial of a new AI platform.

The publishers are reportedly expected to produce a certain number of stories for a year and provide analytics and feedback in exchange.

Reddit

Social media platform Reddit has signed a deal allowing its content to be used by Google in the training of its AI tools. Reuters reported that the deal is worth around $60m per year.

Although Reddit is not a news organisation, this is still a content licensing deal. Reddit posts are also likely to contain news media content copied by users, which could therefore fall within the remit of the deal.

Semafor (sort of)

Ben Smith and Justin B Smith’s start-up Semafor has secured “substantial” Microsoft sponsorship for an AI-driven news feed, although this was not built by the tech giant but by the newsroom itself.

The deal, announced in February, will see Microsoft help Semafor refine the tool and makes the digital outlet one of the first newsrooms to heavily involve ChatGPT in its workflow.

Although not a content deal as such, the agreement indicates a level of co-operation rather than acrimony.

Axel Springer

In December Politico, Business Insider, Bild and Welt owner Axel Springer agreed a partnership with OpenAI that would see its content summarised within ChatGPT around the world, including otherwise paywalled content, with links and attribution. Axel Springer’s content is permitted to be used to train OpenAI products going forward.

Axel Springer can also use OpenAI technology to continue building its own AI products.

Axel Springer CEO Mathias Döpfner said: “We are excited to have shaped this global partnership between Axel Springer and OpenAI – the first of its kind. We want to explore the opportunities of AI empowered journalism – to bring quality, societal relevance and the business model of journalism to the next level.”

American Journalism Project

In July 2023 OpenAI committed $5m to the American Journalism Project, a philanthropic organisation working to support and rebuild local news organisations, to fund the expansion of its work. It also pledged up to $5m in OpenAI API credits to help participating organisations try out emerging AI technologies.

American Journalism Project chief executive Sarabeth Berman said: “To ensure local journalism remains an essential pillar of our democracy, we need to be smart about the potential powers and pitfalls of new technology. In these early days of generative AI, we have the opportunity to ensure that local news organisations, and their communities, are involved in shaping its implications. With this partnership, we aim to promote ways for AI to enhance—rather than imperil—journalism.”

Associated Press

OpenAI and Associated Press signed a deal in July 2023 that allows the AI company to license the news agency’s content archive going back to 1985 for training purposes.

The companies said they are also looking at “potential use cases for generative AI in news products and services” but did not share specifics.

Kristin Heitmann, AP senior vice president and chief revenue officer, said: “We are pleased that OpenAI recognises that fact-based, nonpartisan news content is essential to this evolving technology, and that they respect the value of our intellectual property. AP firmly supports a framework that will ensure intellectual property is protected and content creators are fairly compensated for their work.”

One professor told AP the deal could be particularly beneficial to OpenAI because it would mean the company could still use a wealth of trusted content even if it loses other lawsuits and is forced to delete training data, for example from The New York Times, as a result.

Shutterstock

In July 2023 Shutterstock expanded its partnership with OpenAI with a six-year agreement allowing access to a wealth of training data including images, videos, music and associated metadata.

For its part, Shutterstock gets “priority access” to new OpenAI technology and can offer DALL-E’s text-to-image capabilities directly within its platform.
