
September 4, 2023 (updated 23 Nov 2023, 9:33am)

Generative AI and journalism updates: Guardian joins publishers blocking ChatGPT from trawling their content

Updates on publishers and their use of, and deals with, generative AI companies.

ChatGPT login and chatbot. Picture: Shutterstock

Since ChatGPT launched in November 2022, there has been a flurry of activity around generative AI and journalism.

First came a period of quiet experimentation and guideline-writing. Now many publishers are more vocal about their plans for the technology, whether they intend to incorporate it into editorial workflows, like Gizmodo, or not, like The Guardian.

And, most recently, news and picture publishers have begun making licensing deals with AI companies, while others are still weighing whether to litigate or negotiate over the use of their content to train generative AI models.

In August, ChatGPT owner OpenAI gave website publishers, for the first time, the ability to opt out of having their content used to train AI tools. Some jumped at the chance to block the tech, but others are wary of creating a barrier that could end any chance of being paid for the value their content creates for companies like OpenAI.

Here is a round-up of the latest deals, legal threats, guidelines and uses of generative AI in journalism.


Publishers mixed over blocking ChatGPT

At the start of September, The Guardian became one of the first major publishers to announce publicly that it had blocked ChatGPT from trawling its content.

OpenAI first made this option available in August, although publishers cannot remove material that has already been scraped to train ChatGPT.
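OpenAI's documented opt-out works through a site's robots.txt file: its web crawler identifies itself with the user-agent token GPTBot, and a publisher can shut it out with a Disallow rule. The sketch below uses Python's standard urllib.robotparser to check whether a given robots.txt blocks that crawler. The file contents and the example.com URL are illustrative assumptions, not any publisher's actual configuration, and, as noted above, a block only affects future crawling.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a real publisher's file):
# blocks OpenAI's GPTBot site-wide while leaving other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocks_crawler(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://www.example.com/news/story"
    print(blocks_crawler(ROBOTS_TXT, "GPTBot", story))     # True
    print(blocks_crawler(ROBOTS_TXT, "Googlebot", story))  # False
```

Because robots.txt is advisory, this only reflects what a well-behaved crawler should do; it does not technically prevent access.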

A spokesperson for Guardian News and Media said: “The scraping of intellectual property from the Guardian’s website for commercial purposes is, and has always been, contrary to our terms of service. The Guardian’s commercial licensing team has many mutually beneficial commercial relationships with developers around the world, and looks forward to building further such relationships in the future.”

Guardian reporter Rob Davies said he was “pleased about this, given that ChatGPT made up a fake article by me and put my byline on it. Always hungry for bylines but there are limits.”

ChatGPT has been found to have cited articles supposedly published by The Guardian, with bylines from named journalists, in response to prompts from members of the public, including researchers and students. The errors came to light after those users got in touch with the publisher, which searched its archives and found that the stories had never existed.

Originality.AI, an AI-content detection tool for publishers, has identified a number of other websites and publishers that have chosen to block OpenAI from using their content.

They include: The New York Times and sister title The Athletic, CNN, Bloomberg, Insider, The Verge, PC Mag, Vulture, Mashable, Times of India, New York Magazine, The Atlantic, Bustle, Vox, Lonely Planet, Hello!, Axios, France 24, and the New York Daily News.

However, Sajeeda Merali, chief executive of the Professional Publishers Association, which represents specialist publishers, told Press Gazette that blocking carries potential downsides for publishers.

“[If] ChatGPT is to continue to grow and become an entry point for digital information in the same way that Google search is at the moment then opt-out isn’t really a viable option.

“What we don’t want to do is create barriers to negotiating the right terms with ChatGPT and we certainly don’t want them to be able to say that ultimately publishers can choose to do what they want.”

New York Times legal threat against OpenAI

After weeks of contentious licensing discussions between The New York Times and OpenAI, the news publisher is now reportedly considering legal action.

The publisher believes it should be paid for the use of its original reporting in the training of OpenAI’s tools.

Of the potential outcome of a court case, NPR explained that “if a federal judge finds that OpenAI illegally copied the Times’ articles to train its AI model, the court could order the company to destroy ChatGPT’s dataset, forcing the company to recreate it using only work that it is authorized to use”.

At the same time, The New York Times has decided not to join a coalition of publishers, likely to include Axel Springer and News Corp, that want to negotiate jointly with AI companies over potential compensation for the use of their content.

Deals between generative AI companies and publishers

The Associated Press and Shutterstock are, to date, the only major news and picture companies to have agreed deals with OpenAI, the creator of ChatGPT.

The Associated Press has signed a two-year deal that will let OpenAI train its generative AI tools on the news agency’s historical content.

AP said the deal showed that OpenAI respected the value of its intellectual property. A number of other news publishers are said to be in discussions about receiving similar payments for the use of their content to train AI tools.

The deal, which follows a similar arrangement between OpenAI and picture agency Shutterstock (see below), allows OpenAI to license part of AP’s text archive while the agency can “leverage OpenAI’s technology and product expertise”.

The two companies said they would jointly explore potential uses for generative AI in news products and services. However, AP made clear it is not using generative AI in its news stories.

Kristin Heitmann, AP senior vice president and chief revenue officer, said: “Generative AI is a fast-moving space with tremendous implications for the news industry. We are pleased that OpenAI recognizes that fact-based, nonpartisan news content is essential to this evolving technology, and that they respect the value of our intellectual property.

“AP firmly supports a framework that will ensure intellectual property is protected and content creators are fairly compensated for their work. News organizations must have a seat at the table to ensure this happens, so that newsrooms large and small can leverage this technology to benefit journalism.”

Days earlier, photo agency Shutterstock signed a new six-year agreement with OpenAI, allowing the ChatGPT creator to use its data for training.

The deal gives OpenAI access to Shutterstock’s image, video and music libraries and associated metadata as training data. In return, Shutterstock gets priority access to the latest OpenAI technology and will continue to integrate the DALL-E text-to-image generator into its website, as well as adding the ability for customers to edit and enhance any image in its library.

Will other deals be struck between ChatGPT and publishers?

Some of the world’s biggest news publishers are in discussions over potential licensing payments for the use of their content by AI companies.

News Corp, Axel Springer, The New York Times and The Guardian have each met with at least one of OpenAI, Google, Microsoft and Adobe, the Financial Times reported in June.

Chris Moran, head of editorial innovation at The Guardian, noted in an interview with Press Gazette that the newsbrand has “operated an open platform for a number of years, an API, which people can absolutely pay for to get a stream of our content in a legitimate way to build a business on. But none of these companies have been to us to ask to use that and that in itself is interesting. So, yes, we are thinking about what the value of our content is in this environment.”

News Corp CEO Robert Thomson has also called for publishers to be compensated when their content is harvested and scraped to train AI engines, and for the information snippets in AI-powered search results that “contain all the effort and insights of great journalism but designed so the reader will never visit a journalism website”.

He said he wants as many media companies as possible to benefit, rather than compensation being a sweetheart deal solely for the biggest publishers.

Round-up of generative AI in journalism guidelines/principles

Many major publishers have released guidelines or principles setting out how they will approach potential uses of generative AI in their newsrooms.

Common themes, according to an analysis by two academics, include: oversight, making sure humans have the final say on all content; transparency, so that any use of AI is clearly labelled for readers; and a clear statement of the strategic purpose behind any use, such as boosting efficiency so that journalists can spend more time on original work.

Aftonbladet
Associated Press – although it has a licensing deal with OpenAI, it says that while “staff may experiment with ChatGPT with caution, they do not use it to create publishable content”.
BBC
CBC News
CNET – which clarified its policy in June to say it would not use its in-house tool, Responsible AI Machine Partner, to write full stories, after errors were discovered in more than half of its earlier, quietly published AI-written stories.
Financial Times
The Guardian
Insider
Mediahuis
Reuters
Swedish Radio
The Telegraph (via Press Gazette report)
William Reed
Wired

Latest generative AI experiments and launches in news publishing

Newsquest and Gannett

UK regional publisher Newsquest hired its first “AI-powered” reporter in June, with a remit to expand the use of AI tools, including for creating local content.

In the advert for another “AI-assisted” reporter role in August, the publisher said the journalist, whose work would focus on its newsrooms around Oxford, would be “at the forefront of a new era in journalism, utilising AI technology to create national, local, and hyper-local content for our news brands, while also applying their traditional journalism skills”.

In the US, Newsquest’s parent company Gannett paused its use of a generative AI tool named LedeAI to write high school sports reports after what CNN described as “major flubs” in articles published by several of its local titles.

The articles had been criticised for being repetitive, “lacking key details”, and sounding like they were written by a computer that had no knowledge of sports. CNN pointed to one example, which read: “The Worthington Christian [[WINNING_TEAM_MASCOT]] defeated the Westerville North [[LOSING_TEAM_MASCOT]] 2-1 in an Ohio boys soccer game on Saturday.” That article has since been amended.
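The leaked [[WINNING_TEAM_MASCOT]] and [[LOSING_TEAM_MASCOT]] tokens suggest a template whose slots were never filled in. How LedeAI actually builds its reports is not described here; the sketch below assumes only that a generated draft may contain tokens in the [[NAME]] form seen in CNN's example, and shows a hypothetical pre-publication check that would hold such a draft back. The regex and function names are illustrative, not part of any real tool.

```python
import re

# Hypothetical pattern for unfilled template slots such as [[WINNING_TEAM_MASCOT]].
PLACEHOLDER = re.compile(r"\[\[[A-Z0-9_]+\]\]")


def unfilled_placeholders(text: str) -> list[str]:
    """Return every template slot left unfilled in a generated draft, in order."""
    return PLACEHOLDER.findall(text)


def ready_to_publish(text: str) -> bool:
    """A draft passes this (deliberately narrow) check only if no slots remain."""
    return not unfilled_placeholders(text)


if __name__ == "__main__":
    draft = ("The Worthington Christian [[WINNING_TEAM_MASCOT]] defeated the "
             "Westerville North [[LOSING_TEAM_MASCOT]] 2-1 in an Ohio boys "
             "soccer game on Saturday.")
    print(unfilled_placeholders(draft))
    # -> ['[[WINNING_TEAM_MASCOT]]', '[[LOSING_TEAM_MASCOT]]']
    print(ready_to_publish(draft))  # False
```

A check like this catches only the most mechanical failure; it would not flag the repetitiveness or missing details the articles were also criticised for, which is why human review remains the safeguard most newsroom guidelines call for.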

A Gannett spokesperson said: “In addition to adding hundreds of reporting jobs across the country, we are experimenting with automation and AI to build tools for our journalists and add content for our readers. We are continually evaluating vendors as we refine processes to ensure all the news and information we provide meets the highest journalistic standards.”

Axel Springer

Axel Springer has created a global generative AI team, drawn from editorial, product, tech and business, which will lead its efforts to identify and exploit potential uses of AI.

It is one of several developments at the publisher since mid-June signalling its appetite for moving quickly in this area. It has also created news plugins for ChatGPT in the OpenAI store for its German news brands Bild and Welt, meaning ChatGPT Plus subscribers can ask the chatbot questions to find out the latest news from those titles.

Bild’s deputy editor-in-chief Timo Lokoschat said: “Artificial intelligence not only supports journalistic work, but also creates new access for our readers…”

Axel Springer also made headlines when it announced plans to make about 200 people redundant at Bild in a digital-only transition that would involve an “AI offensive”.

Staff were told in an email that this would mean “the functions of editorial managers, page editors, proofreaders, secretaries and photo editors will no longer exist as they do today”.

Chief information officer Samir Fadlallah subsequently told Reuters the changes would have positives for journalists: “For newsrooms, AI opens up new paths and freedoms. Journalists can outsource tedious work to AI and devote more time and energy to their core tasks… We see great potential in generative AI to provide our readers and users with even more attractive and individually tailored products.”

Gizmodo

Technology website Gizmodo published an inaccurate AI-generated article about Star Wars, and staff were vocal in their displeasure.

The article, bylined Gizmodo Bot, was headlined: “A Chronological List of Star Wars Movies & TV Shows.” However, the list was not in chronological order. It has since been amended.

Gizmodo deputy editor James Whitbrook explained on Twitter that the article had been published by someone outside of editorial, adding: “No one employed at [entertainment vertical] io9 or Gizmodo looked at or interacted with the piece at any point of its creation prior to or after its release.”

Gizmodo Media Group Union claimed that the AI rollout had been “pushed by” the company’s chief executive, editorial director and deputy editorial director.

Whitbrook wrote that the article “rejects the very standards this team holds itself to on a daily basis as critics and as reporters.

“It is shoddily written, it is riddled with basic errors: in closing the comments section off, it denies our readers, the lifeblood of this network, the chance to publicly hold us accountable, and to call this work exactly what it is: embarrassing, unpublishable, disrespectful of both the audience and the people who work here, and a blow to our authority and integrity.

“It is shameful that this work has been put to our audience and to our peers in the industry as a window to G/O’s future, and it is shameful that we as a team have had to spend an egregious amount of time away from our actual work to make it clear to you the unacceptable errors made in publishing this piece.”

Man of Many

Australian men’s lifestyle website Man of Many has launched an AI chatbot called Ask MoM, developed with Chatling.ai.

The publisher said the chatbot, which will only deliver information that has been published on the Man of Many website, was “designed to offer personalised, instant responses to user queries, reducing search fatigue and significantly improving reader engagement”.

Man of Many also claimed to be the first major publisher to launch a ChatGPT plugin in the OpenAI store, having gone live ahead of Axel Springer.

Users need a ChatGPT Plus subscription to access it, but can interact with Man of Many content in a conversational way, asking questions on topics like products, culture and style.

The publisher said the tool is also designed to drive traffic back to the site by offering previews of content and suggesting users click through for the full thing.

Reuters

International news agency Reuters has added AI-powered discoverability features to video on its content marketplace, Reuters Connect.

It said AI would generate automated transcripts and translations and identify public figures in Reuters video content, making it easier for clients to find what they need.

Reuters said its applied innovation team built the features by “finely tuning machine learning models to consistently produce the highest quality automated analysis of Reuters video”.

Shutterstock

Ahead of its deal with OpenAI, Shutterstock announced it would fully indemnify its Enterprise customers for the license and use of generative AI images on its platform.

It said: “Shutterstock will fulfill indemnification requests on demand via human review with the intent to protect its customers against potential claims related to their use of generative AI images created and licensed on shutterstock.com.”

General counsel John Lapham said: “We’re at an inflection point in the use of generative AI technology as business professionals are seeking more assurance around their rights to legally use AI-generated content, and creators of original content want to ensure their work is fairly licensed for use.

“We have always sought to manage risk for our customers and are uniquely positioned to bring a commercially viable image generator to market and indemnify its outputs, because of our relationship with artists and intimate understanding of the complexities of licensing.”

Taboola

Taboola has made its generative AI capabilities, which include creating copy and content for ads, available to all its advertisers who run campaigns in English.

The AI-powered content recommendation engine said it has carried out a successful beta test in which some brands “more than doubled the click through rate for their campaigns when measured against evergreen campaigns – driving more customers, improving efficiency and refining their long-term advertising strategy based on Taboola’s AI-driven suggestions”.

The platform said the technology would help advertisers produce original titles, headlines and images based on best practice, because it learns from previous successful campaigns.

Taboola’s chief executive Adam Singolda said it had become clear “generative AI is an important next step for every advertiser”.

BuzzFeed

In March, BuzzFeed used generative AI to create 44 travel guides, which were widely derided as bland and formulaic; Futurism described the articles as “comically bland”. They are published under the label “As told to Buzzy the Robot”. BuzzFeed also uses ChatGPT technology under licence to power personalised quizzes.

Future

Future is using OpenAI technology to power a chatbot which answers questions about content on the technology site Tom’s Hardware.

Reach

Reach is running a beta test of My News Assistant, an AI-powered website that aggregates the thousands of articles published daily across Reach’s more than 80 online brands.

Aftonbladet

Sweden’s largest daily newspaper Aftonbladet has tested news stories “rapped” by an AI service at the suggestion of a youth panel.

Despite what he called the “rather cringey” result, Aftonbladet’s deputy editor-in-chief Martin Schori said the paper hoped to “provoke a debate” about creating news content that appeals to younger audiences.