OpenAI asks New York Times to disclose reporters’ notes in ‘vindictive’ legal move

New York Times warns of "chilling effect" on reporting if OpenAI forces disclosure of journalistic materials.

Webpages of the New York Times, Common Crawl, OpenAI, and Microsoft are seen on a computer. Picture: Shutterstock/Tada Images

The New York Times has described an attempt by OpenAI to see its journalists’ confidential notes as “harassment and retaliation” for its decision to sue the tech company.

OpenAI, the creator of ChatGPT, has asked a New York judge to force the NYT to hand over “underlying reporter’s notes, interview memos, records of materials cited, or other ‘files’” to prove that its articles qualify as original works of authorship under US copyright law.

The New York Times said in response: “Permitting OpenAI to investigate The Times’s privileged newsgathering process would have serious negative and far-reaching consequences.

“It would entail the disclosure of The Times’s confidential reporters’ files on investigative reporting into highly sensitive matters, including those related to the defendants themselves.”

The NYT filed a lawsuit against OpenAI and its partner Microsoft in December after months of negotiations on a deal fell short, arguing the use of its content for the training of large language models (LLMs) like ChatGPT was “free-riding” on its own investment in journalism.

A filing in the case published this month revealed OpenAI continues to seek “critical discovery regarding the creation, registration, and ownership of the copyrighted works” put at issue by the NYT.

“The Times can only assert infringement over those portions of the works that are (a) original to the author, and (b) owned or exclusively licensed to the Times,” it said.

In other words, according to OpenAI, the NYT should not be allowed to bring its case in relation to any of its reporting in which it “copied another’s work” or used “elements in the public domain”.

The tech company is therefore trying to get access to “documents sufficient to show each and every written work that informed the preparation of each of Your Asserted Works, regardless of its length, format, or medium”.

The NYT claims that OpenAI crawled millions of its articles, and as part of the case it filed many pages of documents linking to thousands of stories dating back to the 1950s.

OpenAI’s case for discovery also attempts to question whether the NYT does, as it says, invest “an enormous amount of time… expertise, and talent” in its journalism, including through “deep investigations – which usually take months and sometimes years to report and produce – into complex and important areas of public interest”.

OpenAI argued that reporters’ privilege under the First Amendment “does not justify withholding the materials at issue here because they (i) are of likely relevance to a significant issue in the case—whether the Times is asserting copyright protection over works or portions thereof in which it does not have a copyright—and (ii) are not reasonably obtainable from other available sources”.

But it emphasised that it is not seeking information that would identify confidential sources.

In response, the NYT described OpenAI’s request as “unprecedented” and “invasive” and said it is “far outside the scope of what’s allowed under the Federal Rules and serves no purpose other than harassment and retaliation for The Times’s decision to file this lawsuit”.

It said its “newsgathering process on a story-by-story basis has no relevance to whether it is entitled to enforce the millions of copyrights it has registered over the years. OpenAI claims that the reporters’ notes underlying the asserted works may shed light on whether The Times’s news articles are really original, expressive content—but that is not how copyright law works.”

The publisher said US copyright law protects the “manner of expression” in a work “including structure, word choice, and ‘the author’s analysis or interpretation of events’.

“Moreover, even in the improbable case that a reporter’s notes show that 90% of an article comprises verbatim quotes from the author’s original sources, that article would still be protected by copyright.”

The NYT called the request “overbroad and unduly burdensome” and added: “OpenAI is not entitled to unbounded discovery into nearly 100 years of underlying reporters’ files, on the off chance that such a frolic might conceivably raise a doubt about the validity of The Times’s registered copyrights.”

The publisher said it “makes little difference” that OpenAI claims not to be seeking the identities of confidential sources, adding: “The burden of separating information from which identities of confidential sources could be derived from the requested materials would be enormous, if it could even be done.

“In any event, reporters’ source files as a whole are protected by the reporters’ privilege, regardless of whether such files reveal the identities of confidential sources.”

The NYT suggested OpenAI is hoping to have a “chilling effect” on its reporting and “weaponise discovery as a means of targeting the confidential and irrelevant information that underpins The Times’s reporting”.

It said the tech company has not addressed “the chilling effect that such massive discovery requests would have on a news organization’s reporting—and its ability to bring lawsuits to defend its copyrighted works. Indeed, given the wildly improper scope of this request, one has to wonder if a chilling effect is exactly what OpenAI, who appears to have stolen from millions of content creators, is hoping for.”

The NYT has already agreed to produce the certificates of copyright registration for its work and financial documents proving its investment in journalism.