Two news outlets lose copyright claim against OpenAI over scraping of content

AlterNet and Raw Story have the opportunity to replead their case.

A close-up view of the ChatGPT app updating on an iPhone screen. Picture: PixieMe/Shutterstock

One of the first copyright cases to be brought by news publishers against OpenAI has been dismissed by a judge.

Left-leaning news outlets AlterNet and Raw Story objected to OpenAI’s use of “thousands” of pieces of their content to train ChatGPT.

They alleged that their copyrighted work was “caught in a ‘scrape of most of the internet’ to train ChatGPT and stripped of their author, title and copyright information” and sought damages from OpenAI.

On Thursday a judge in New York granted a request by OpenAI to dismiss the complaint in its entirety.

US District Judge Colleen McMahon said: “Plaintiffs allege that ChatGPT has been trained on ‘a scrape of most of the internet’, which includes massive amounts of information from innumerable sources on almost any given subject. Plaintiffs have nowhere alleged that the information in their articles is copyrighted, nor could they do so.

“When a user inputs a question into ChatGPT, ChatGPT synthesises the relevant information in its repository into an answer. Given the quantity of information contained in the repository, the likelihood that ChatGPT would output plagiarised content from one of Plaintiffs’ articles seems remote.

“And while Plaintiffs provide third-party statistics indicating that an earlier version of ChatGPT generated responses containing significant amounts of plagiarised content, Plaintiffs have not plausibly alleged that there is a ‘substantial risk’ that the current version of ChatGPT will generate a response plagiarising one of plaintiffs’ articles.”

The publishers, who have had the same owner since AlterNet was acquired by Raw Story Media in 2018, centred the case on the removal of the copyright management information (CMI) from their works, saying this meant ChatGPT would not have learned to communicate that information when fashioning responses to inquiries from users.

They claimed the removal of the CMI violated Section 1202(b)(1) of the Digital Millennium Copyright Act (DMCA) and that they were therefore entitled to damages.

However, the judge said they had not met the threshold for Article III standing, which under US law requires a concrete injury even where a statutory violation is alleged.

The publishers attempted to address this, saying “the unlawful removal of CMI from a copyrighted work is a concrete injury”.

Judge McMahon said: “Plaintiffs allege that their copyrighted works (absent CMI) were used to train an AI-software program and remain in ChatGPT’s repository of text. But Plaintiffs have not alleged any actual adverse effects stemming from this alleged DMCA violation.”

And she cited a previous case that concluded: “No concrete harm, no standing.”

The publishers also sought an injunction requiring OpenAI to remove all copies of their “copyrighted works from which author, title, copyright, and terms of use information w[ere] removed from their training sets and any other repositories”.

They argued that they are entitled to such an injunction because “whether ChatGPT has or has not already reproduced their copyrighted work without attaching the required CMI, there is a substantial risk that ChatGPT will do so in the future”.

OpenAI argued that the publishers had failed to “allege facts tending to show that the risk of ChatGPT reproducing Plaintiffs’ work, in whole or in part, absent the requisite CMI is ‘substantial’”, and the judge agreed.

“Let us be clear about what is really at stake here,” she wrote. “The alleged injury for which Plaintiffs truly seek redress is not the exclusion of CMI from Defendants’ training sets, but rather Defendants’ use of Plaintiffs’ articles to develop ChatGPT without compensation to Plaintiffs…

“Whether or not that type of injury satisfies the injury-in-fact requirement, it is not the type of harm that has been ‘elevated’ by Section 1202(b)(1) of the DMCA… Whether there is another statute or legal theory that does elevate this type of harm remains to be seen. But that question is not before the Court today.”

Judge McMahon said she was “sceptical” about the publishers’ ability to allege a perceptible injury caused by OpenAI, but said she was “prepared to consider an amended pleading”.

“We’re confident that we can address the court’s concerns in an amended complaint,” Matt Topic, a partner at Loevy & Loevy, the firm representing Raw Story Media, told Wired.

Explaining in February why the case had been launched, Raw Story publisher Roxanne Cooper said: “Raw Story’s copyright-protected journalism is the result of significant efforts of human journalists who report the news. Rather than license that work, OpenAI taught ChatGPT to ignore journalists’ copyrights and hide its use of copyright-protected material.”

The New York Times was the first news publisher to launch legal action against OpenAI and its partner Microsoft, in December last year. That case has since been combined with those brought by eight Alden Global Capital-owned publications, including The New York Daily News, and is currently at the stage where the publishers are searching the AI company’s training data under secure conditions.