Date: 2025-12-04

ChatGPT. Picture: Shutterstock

A US judge has ordered OpenAI to share 20 million anonymised ChatGPT user logs with news publishers who are suing for breach of their copyright.

OpenAI is currently removing anything that can identify its users from 20 million output logs (out of the tens of billions it has stored in total) and has been ordered to hand these over to the news publishers within seven days of completing that process.

Lawyers for the news publishers will then be able to analyse the 20 million conversations looking for responses that reproduce their copyrighted work in whole or part.

The New York Times was the first major news publisher to sue OpenAI (and its partner Microsoft) over the alleged crawling of millions of its articles to train ChatGPT, which it has argued can repeat large amounts of that material almost verbatim.

Since then several other publisher lawsuits have been joined to the NYT case, including 17 publications owned or managed by Alden Global Capital subsidiaries MediaNews Group or Tribune Publishing such as The New York Daily News, Chicago Tribune, Boston Herald, Los Angeles Daily News and San Diego Union-Tribune.

Also part of the grouped case are The Intercept, The Center for Investigative Reporting, which produces Mother Jones and Reveal, and Mashable publisher Ziff Davis.

This year the two sides have been battling over disclosure of ChatGPT user queries, of which the news publishers originally asked for 120 million.

In November OpenAI filed a motion for reconsideration over an earlier order directing it to hand over the 20 million logs.

US Magistrate Judge Ona T. Wang found this week that producing the ChatGPT records would be “both relevant and proportional” to the case.

OpenAI contended that the news publishers had conceded “at least 99.99% of the conversation logs are irrelevant” but Magistrate Judge Wang said this was untrue.

She said that even if many of the ChatGPT logs do not reproduce the publisher content, they “may still be relevant to OpenAI’s fair use defence”.

Magistrate Judge Wang explained that the logs are “clearly relevant to News Plaintiffs’ output claims to the extent that they contain partial or whole reproductions of News Plaintiffs’ copyrighted works, and to OpenAI’s affirmative defences to the extent that they contain other user activity – and News Plaintiffs are entitled to discovery on both”.

OpenAI had also argued the time it would take to de-identify the sample of 20 million would not be proportional, but Judge Wang pointed out it has already almost completed this process and the burden is therefore “minimal”.

She also said: “The Court recognises that the privacy considerations of OpenAI’s users are sincere. However, such considerations are only one factor in the proportionality analysis, and cannot predominate where there is clear relevance and minimal burden.”

As another layer of protecting ChatGPT users’ privacy, the logs will be for “attorneys’ eyes only”.

Frank Pine, executive editor of MediaNews Group and Tribune Publishing, said in response to this week’s ruling: “OpenAI’s leadership was hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists, and we look forward to holding them accountable for their ongoing misappropriation of our work.

“They should pay for the copyright-protected work they use to build and maintain their apps and products, and they know it.”

Steven Lieberman, lawyer for MediaNews Group and Tribune Publishing, added: “In her ruling, Magistrate Judge Wang found that OpenAI has been withholding critically important evidence that was first requested by the News plaintiffs in May 2024 and rejected multiple arguments made by OpenAI as contrary to the record – including OpenAI’s arguments that its users’ privacy is at risk.

“The Court also raised the issue of whether OpenAI’s efforts to delay production of the ChatGPT logs was motivated by an improper purpose, saying of the two possible explanations for OpenAI’s behaviour: ‘[n]either bode well for OpenAI’.”

Lieberman was referring to the magistrate judge’s note that OpenAI had already begun de-identifying the 20 million sample, even though it opposed handing the information over.

“Either OpenAI initially intended to produce the 20 million logs to News Plaintiffs and changed its mind, for one reason or another; or OpenAI never intended to produce the logs and de-identified the entire 20 million either as a discovery tactic, or for some other reason that has not been identified. Neither bode well for OpenAI,” Magistrate Judge Wang said.

OpenAI chief strategy officer Jason Kwon said last month the attempt to access its chatbot conversations was an “overreach on user privacy” and called for “a new form of privilege – AI privilege – given some of the kinds of conversations people are having with these tools today”.

OpenAI maintains that the use of publicly available publisher content to train models like ChatGPT is fair use.

