January 9, 2024

OpenAI says ‘high-value partnership’ was on cards with New York Times before lawsuit

OpenAI accuses publisher of “intentionally manipulating” prompts for its lawsuit.

By Charlotte Tobitt

OpenAI has revealed it proposed a “high-value partnership” with The New York Times, based around displaying its up-to-the-minute reporting with attribution in ChatGPT, before the publisher filed a lawsuit against it.

OpenAI claimed in a new statement that discussions with the news publisher “had appeared to be progressing constructively” right up until their last communication on 19 December.

The New York Times filed its copyright lawsuit against OpenAI and Microsoft on 27 December, 12 days later.

The publisher claimed its copyrighted content is “disproportionately” used in OpenAI and Microsoft’s generative AI products and that its subscription, advertising, licensing and affiliate revenues – as well as its reputation – are taking a hit as a result.

OpenAI has now revealed details of the negotiations, which it said “focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting.

“We had explained to The New York Times that, like any single source, their content didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training.

“Their lawsuit on December 27 – which we learned about by reading The New York Times – came as a surprise and disappointment to us.”

OpenAI still wants ‘constructive partnership’ with New York Times

The AI company said it considers the lawsuit to be “without merit” but that it still hopes for a “constructive partnership” with The New York Times.

As it did so, however, it accused the publisher of “intentionally” manipulating prompts to force ChatGPT into regurgitating chunks of its content and backing up its case against OpenAI.

Two major publishers signed deals with OpenAI in 2023: the Associated Press news agency and Business Insider, Politico and Die Welt publisher Axel Springer.

OpenAI also agreed two partnerships designed to bolster its relationship with the news industry in the US. It committed $5m to the American Journalism Project to support its work in local news, plus a further $5m in OpenAI API credits so publishers can try out and deploy new AI tools, and it gave a grant of $395,000 to New York University’s new journalism ethics initiative looking at issues including AI.

However, Mail, Metro and i publisher DMG Media said in a submission to the House of Lords Communications and Digital Committee in the autumn that it was “actively seeking advice on potential legal action” over the use of its content – especially Mail Online’s, “because of its uniform headline/bullet points/article text structure” – to test the effectiveness of AI training without any permission or payment being agreed.

OpenAI says it wants ‘healthy news ecosystem’

In its statement on Monday, OpenAI set out what it hopes to achieve when negotiating with news organisations.

“Our goals are to support a healthy news ecosystem, be a good partner, and create mutually beneficial opportunities,” it said. “With this in mind, we have pursued partnerships with news organizations to achieve these objectives:

  1. Deploy our products to benefit and support reporters and editors, by assisting with time-consuming tasks like analyzing voluminous public records and translating stories.
  2. Teach our AI models about the world by training on additional historical, non-publicly available content.
  3. Display real-time content with attribution in ChatGPT, providing new ways for news publishers to connect with readers.”

As well as the use of its content for training, The New York Times objected in its lawsuit to ChatGPT and Microsoft’s Bing search chat features memorising and reproducing large portions of its text.

Citing Wirecutter product reviews being fully reproduced in AI-generated chat, the lawsuit noted that the publisher “does not receive affiliate referral revenue if a user purchases the Wirecutter-recommended product through a link on Defendants’ platforms”.

On Monday, OpenAI insisted that this type of memorisation “is a rare bug that we are working to drive to zero” but it is “more common when particular content appears more than once in training data, like if pieces of it appear on lots of different public websites”.

Of The New York Times, it said: “Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We’ve demonstrated how seriously we treat this as a priority, such as in July when we took down a ChatGPT feature immediately after we learned it could reproduce real-time content in unintended ways.”

OpenAI claimed the examples of regurgitation in the New York Times lawsuit “appear to be from years-old articles that have proliferated on multiple third-party websites”.

It accused the publisher of fixing the results it received: “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

OpenAI also repeated its argument that training AI models on material available on the internet constitutes fair use under copyright law.

“That being said, legal right is less important to us than being good citizens,” it added. “We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.”

Reuters is believed to have been the first of the top 100 websites in the world to block OpenAI’s GPTBot crawler via its robots.txt file, while other publishers to join it and The New York Times in doing so include CNN, Bloomberg, Axios, New York Magazine, The Atlantic, France 24 and Vox.
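For illustration, the opt-out works through a site’s robots.txt file: OpenAI documents a crawler user agent called GPTBot, and a publisher that wants to keep it off every page adds a directive along the lines of the one below (the blanket block on the whole site is an illustrative choice; rules can also be limited to particular sections).

  User-agent: GPTBot
  Disallow: /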

Guardian: Free content is not a ‘free pass’ for scraping

Guardian Media Group argued in a submission to the Lords Communications and Digital Select Committee, published last month, that the fact its content is available for free to all internet users “should not… be regarded as a free pass for third parties to scrape or copy our journalism and wider IP for commercial purposes”. Its terms of service specifically prohibit this.

It also said this would threaten an open web and the ability of news publishers to keep operating: “The absorption of audiences away from the open web to closed gen AI environments would have the effect of hollowing out the open web, in terms of both a loss of advertising and other revenues, but also by reducing the incentive of individuals, businesses and institutions from investing time, energy and resources in new quality content for the open web.

“This would directly undermine the 300-year-old founding principles of copyright protections that reward creators for the fruits of their labour.

“Which news organisation is going to publish to the open web if they know it will be extracted and served in a closed, paid-for LLM chatbot environment, without any form of remuneration?”

AI-powered search could destroy quality news publishing – DMG Media

DMG Media similarly warned of the danger from AI-generated search results that show little or no attribution and provide so much information that many users will not click through: “Quality news publishing is already seriously threatened by traditional search and social media, AI-powered search could destroy it altogether.”

Referring to upcoming legislation in the UK, the Mail publisher said: “For AI search to be commercially successful it will have to be constantly updated with fresh information, for which news sites are the obvious source. At the moment this is being done without permission or payment.

“That must end – establishing proper terms for copyright consent and payment should be a first priority for the Digital Markets Unit (DMU) once the Digital Markets Competition and Consumer (DMCC) Bill is passed.”

Ian Crosby, partner at Susman Godfrey and lead counsel for The New York Times, said in response to OpenAI’s statement: “The blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT.

“As The Times’s complaint states: ‘Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.’ That’s not fair use by any measure.”

In its own submission to the committee, filed last month and published last week, OpenAI argued it would be “impossible” to train AI models without using copyrighted material because copyright law covers “virtually every sort of human expression”.

“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens,” it said.

OpenAI claimed it complies with “all applicable laws, including copyright laws” but had nonetheless been “industry leaders in allowing creators to express their preferences with respect to the use of their works for AI training”.

It added: “While we look forward to continuing to develop additional mechanisms to empower rightsholders to opt-out of training, we are actively engaged with them to find mutually beneficial arrangements to gain access to materials that are otherwise inaccessible, and also to display content in ways that go beyond what copyright law otherwise allows.”
