June 3, 2021, updated 30 Sep 2022 10:20am

Website duplication: Publishers need to wake up to growing ‘threat to journalism’

By Freddy Mayhew

Website duplication, in which content is lifted wholesale from a website and hosted elsewhere without permission, could be costing UK news publishers millions in lost revenue and resources.

It’s a problem that appears to be growing, but one which many news publishers may be unaware is affecting them.

While some news outfits complain about their stories being “ripped off” when they appear in rival titles without sufficient credit, website duplication takes copyright theft to a more serious level.

One national newsbrand, which Press Gazette has been asked not to name, had its entire website cloned and hosted at a web address that was just one letter different from the original website. The fake site, which Press Gazette has seen in an image, could easily have been mistaken for the real thing.
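As a rough illustration of how a publisher might flag look-alike addresses of this kind, the sketch below compares a candidate domain against the real one and reports whether they differ by a single character. It is only a sketch under stated assumptions: the domain names are invented placeholders, the function names are hypothetical, and nothing here describes the method actually used by the NLA or the affected publisher.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def looks_like_clone(real_domain, candidate, max_distance=1):
    """True if `candidate` differs from `real_domain` but only by up to
    `max_distance` single-character edits (hypothetical helper)."""
    real, cand = real_domain.lower(), candidate.lower()
    return real != cand and edit_distance(real, cand) <= max_distance


# Placeholder domains, invented for illustration only.
print(looks_like_clone("examplenews.co.uk", "examplenevs.co.uk"))
print(looks_like_clone("examplenews.co.uk", "othernews.co.uk"))
```

A check like this would only catch addresses someone already knows to test, for example from a list of newly registered domains; it does not discover clones on its own.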

It was taken down after an intervention by NLA Media Access (formerly the Newspaper Licensing Agency), which works on behalf of UK news publishers to manage their copyright. But it’s not only national titles that are affected.

The Grocer, a B2B title for the food and drink industry, found that its original content was appearing word for word on another website called UK Food & Drink, which again stopped after the NLA intervened.

Duplication of content on The Grocer. Picture: NLA Media Access

And local website Kent Online, part of the KM Group owned by regional publisher Iliffe Media, discovered its reports were being lifted in their entirety and published on the Kent Chronicle – a site that had popped up out of nowhere. After an investigation, Kent Online found the site appeared to have been the creation of a teen tech whizz and it was taken down.

These duplicate websites use web crawlers to scrape content on legitimate websites and publish it elsewhere. Some crawlers are entirely legitimate, however, and belong to partner organisations. The main offenders are fake news sites, clone sites or “aggregated content farms”.

Last year NLA Media Access took down about 20,000 articles from more than 700 sources on behalf of UK news publishers. Around half of all cases are the result of general ignorance about copyright law, while the other half involve people who “know what they’re doing”, according to Matt Aspinall, head of publisher services at NLA Media Access.

The NLA has been using a third-party online article tracking system since 2015 to actively search for instances of web duplication. It brought the technology in-house last year and built its own Text Tracker service.
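The article does not describe how Text Tracker works internally. As a hedged sketch of one common way to spot word-for-word copying, the example below breaks an article into overlapping five-word sequences ("shingles") and measures what share of them reappear on a suspect page. The function names, the shingle size and the sample texts are all assumptions made for illustration, not details of the NLA's system.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)


# Invented sample texts, for illustration only.
original = "The council has approved plans for a new bridge across the river"
copied = "BREAKING: " + original + " read more here"
unrelated = "Local football club wins the cup after a dramatic penalty shootout"

print(overlap(original, copied))     # every original shingle reappears
print(overlap(original, unrelated))  # no shingles in common
```

A score near 1 suggests the suspect page reproduces the article wholesale, while a low score suggests at most incidental overlap. Where to set the threshold is a judgment call that would need tuning against real cases.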

Aspinall said publishers “often don’t know the scale of [website duplication] themselves”, adding: “They don’t know that they’re losing those precious eyeballs on their articles.” He gave the example of one national title that was unaware its articles had been “ripped”.

Aspinall said the NLA started work with a big regional group in April and by the end of the month had found 500 articles that had been lifted from its websites and republished across 15 “illegal sites”.

While 90% of the content the NLA finds and takes down comes from non-paywalled news websites, Aspinall said: “I don’t think publishers that have subscription models should be under any illusions that their content is safe because they have a paywall.”

In 2019 the NLA found a website “as blatant as ‘behind the paywall’ dot something or other and it was entire articles from leading paywall brands, which obviously charge a lot of money for readers to access”, he said.

Aspinall said that when the NLA started out its main task was targeting content farms, which aggregate content without permission or payment, but it is increasingly seeing duplicate websites that the “unsuspecting reader… would believe is a legitimate news site, when in actual fact it has content that was ripped from a legitimate news site”.

He said he suspected publishers “don’t know the scale of how that’s happening, because they’re not proactively tracking it right now”, but only reacting to issues as they arise.

The News Media Association said: “Fake news sites which invest nothing in real journalism yet lift and seek to monetise the content of genuine news media outlets have no place in our democratic society. They cynically undermine the efforts of real journalists who are working to keep the public informed during a time of national crisis.

“Sadly, in recent months we have seen several examples of these websites targeting genuine local news websites in a systematic way. They pose a direct threat to journalism and should be stopped.”

Richard Reeves, managing director at the Association of Online Publishers, estimates the financial impact on the news industry is considerable.

He said: “I don’t suppose anyone’s ever done a real investigation… but if in terms of driving traffic off to another destination that isn’t the legitimate site, and that traffic not being able to be fulfilled in terms of monetization, and the impact that has, it’s not a conservative figure – it will be in the millions, potentially in the billions.”

Not only does website duplication cost a publisher advertising and subscription revenue, but “the fraudulent site is still putting strain on the original, the legitimate, host server”, said Reeves, because rather than building something new it is merely leeching off the original website. This leaves publishers footing additional resource costs to support the strain, often while they are unaware of the cause.

“If there’s 100,000 people at one given moment looking at a homepage, the publisher’s server says we need to optimise performance because we’ve got about 100,000 people on,” explains Reeves. But when a site is being duplicated he said the “performance is working more akin to having 400,000 people on the homepage”.

“Publishers are impacted definitely in resource, cost, traffic, legitimacy, speed, performance, rankings,” said Reeves, who has called for more awareness of the threat that website duplication poses to the news industry.

“But more than anything, there is a bigger motivation for all of us to address this because it’s what continues to fuel the ability for our data to be copied and cloned and to be reapplied in fraudulent ways.”
