March 24, 2021, updated 30 Sep 2022 10:03am

Reading news is not private: How news media websites help Big Tech to track you

By Nicu Calcea

Our investigation into website tracking software has found that news websites have, on average, 17 third-party reader tracking tools, and Google trackers are on nearly all of them.

“Big Tech companies insist spying on users, government is inadvertent,” reads an August 2020 headline on The Washington Times website, a conservative daily covering the DC area as well as US national politics.

It is just one of many similar articles published in recent years in which the Times takes a critical stance against the way Big Tech giants like Google, Facebook and Microsoft have reportedly mishandled their users’ data and invaded their privacy.

Despite the Times’s criticism of Big Tech, hidden in its privacy policy is an admission that it too may share data about its visitors with third parties, specifically mentioning Google and LiveRamp as its partners.

What it doesn’t say is that the number of companies with access to records of its visitors is much higher than readers might expect.

A Press Gazette analysis reveals that The Washington Times shared data with at least 57 advertising companies, including Facebook, Twitter, Google, Microsoft, Adobe and Oracle.

While the Times is one of the worst offenders in terms of how extensive its data sharing is, it’s hardly the only media institution to allow advertising trackers on its website.

We analysed 3,838 of the world’s most popular news websites and found that they include tracking cookies or code from an average of 17 companies.


The British local paper Ipswich Star tops our ranking, with trackers from 82 distinct companies found on its online pages, followed by the LGBT+ news website PinkNews with 76 and the Dominican newspaper El Nacional with 67.

We found that Press Gazette had trackers from 22 different companies, although further investigation found that most of these are defunct and we are in the process of removing them.

“It is standard practice to track cookies for advertising, e-commerce and other optimisation purposes,” said a spokesperson from PinkNews, one of the websites with the highest number of trackers in our analysis.

Besides that, the company claims that some third-party code snippets we found on their pages are “ad tags” that don’t necessarily collect personal data, although they did not respond to a request to provide examples of these.

On their privacy policy pages, most publishers provide some information about what data they collect, how it is processed and how to opt out. PinkNews also lists some of the trackers it uses.

Future Plc’s technology website TechRadar doesn’t name the services it uses, though it does link to a generic privacy policy page.

“Advertising is one of the revenue streams that helps us deliver content to our readers,” a representative of the company told us.

“We comply with industry standards, privacy regulations, and are fully transparent with readers around the use of cookies on our site.

“We always strive to optimise the partners we work with to ensure we balance our commercial needs whilst providing our visitors with an excellent user experience.”

We have also reached out to Ipswich Star parent Archant, but we have not received a comment from the company.

Why do news websites track you?

There are many legitimate reasons for websites to include third-party tracking codes. Services such as Google Analytics help website owners get a better understanding of who their audiences are, what content they’re reading and how well the website is performing. Facebook’s “Pixel” lets companies see how well their ads perform and allows them to target people who’ve visited the website before.

Other services help publishers increase their revenue by providing targeted advertising. Outbrain (present on 25.7% of news websites) and Taboola (17.9%), for example, provide the infamous chum-box ads at the end of articles that present themselves to readers as “relevant content”.

Some other services, like AddThis, allow website owners to easily add social media sharing buttons to their articles while Disqus lets them add commenting functionality without having to build it themselves.

But while many of these services are free to use, they do come at a price.

Both AddThis, owned by Oracle, and Disqus, owned by Zeta Global, secretly inject additional trackers from companies like Google, Adobe and Amazon, as well as trackers from potentially unsecured sources.

These companies usually claim that the data they collect is anonymous, but the virtual profiles they create with it can be used to identify real individuals. Companies can also suffer from security breaches, which may leave their users’ identities exposed. One such breach happened to Disqus in 2013, when journalists confronted commenters about the “anonymous” racist comments made via the platform.

Some of the trackers can even be used to directly collect data about individual behaviour.

Around 100 of the websites in our analysis (2.6%) used a technique called keylogging, which allows websites to capture all keystrokes on a particular page. A website could, for example, record everything that was typed into a form or comment box even if the user ultimately decides not to click the “Submit” button.

A similar technique is session recording, which allows website owners to replay a user’s interactions on a page, including their clicks, scrolls and mouse movements. These tools, usually used to analyse how people navigate websites and improve layouts, were found on 435 news websites in our analysis (11.3%).

Many modern browsers, such as Safari, Firefox and Brave, provide tools to block or limit trackers out of the box. Other browsers, like Chrome, can block trackers with the help of installable extensions. Chrome will also soon join other browsers in banning third-party cookies, which would pose a serious challenge to some of Google’s competitors (but not to Google itself).

But advertising companies are wising up. A more recent technique called canvas fingerprinting, which is much more difficult to block, works by drawing a hidden graphic on the page and identifying users by the way that graphic is rendered. It is now present on one-in-ten news websites.
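The real technique runs inside the browser, but its final step can be illustrated with a minimal Python sketch: the pixel data produced by drawing the hidden graphic is hashed into a short identifier. The pixel values and the function name `fingerprint` are made up for illustration; they are not taken from any actual tracker.

```python
import hashlib


def fingerprint(pixel_bytes: bytes) -> str:
    """Hash rendered pixel data into a short, stable identifier."""
    return hashlib.sha256(pixel_bytes).hexdigest()[:16]


# Hypothetical pixel output of the same hidden graphic on two devices.
# Small differences in fonts, graphics hardware or drivers change a few bytes.
device_a = bytes([255, 0, 0, 255, 254, 1, 0, 255])
device_b = bytes([255, 0, 0, 255, 253, 2, 0, 255])

same_device_again = fingerprint(device_a) == fingerprint(device_a)
different_devices = fingerprint(device_a) != fingerprint(device_b)
```

Because the identifier is derived from how the device renders the graphic rather than from anything stored in the browser, clearing cookies does not change it.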

Google is also planning on introducing a new system that will create a virtual profile for each user inside its Chrome browser, instead of relying on cookies scattered around the internet, potentially giving the company even more control over online advertising.

The tech and news media giants tracking users

Whatever Google decides to do, there’s no question that it already dominates the market. Nearly all websites in our analysis – 97.6% of them – loaded at least one tracker from one of Google’s properties, such as its advertising platform Google Ads or its web analytics service Google Analytics.

But Google is far from being alone. Our analysis identified trackers from 914 different companies, ranging from names you may know, like Twitter and Microsoft, to names you may not, like Bidtellect and Neustar. What is certain, however, is that these companies know you.

After Google, the second most commonly found trackers were those of AT&T’s WarnerMedia, present on more than half of all media websites (56.1%). This is largely due to news websites using AppNexus, an advertising platform AT&T purchased in 2018.

Facebook’s tracking codes were also present on more than half of all websites (50.4%), with most of them using Facebook’s “Pixel” tracking code.

Tracking codes from comScore (48.2%), Oracle (46.4%) and Adobe (45.6%) were also common on media websites.

News websites are far more likely to include trackers than the average website, and it’s not difficult to see why.

Some reports found that targeted advertising is more likely to make visitors click on ads and buy products, although other studies haven’t found significant differences.

Despite that, the main attraction for publishers is that ad networks connect them to where the money is, meaning they don’t have to negotiate with each advertiser independently.

Data collected through trackers can also help reporters and editors focus their coverage on topics that bring in more readers, for better or worse.

While some media organisations have been able to reduce their reliance on targeted advertising by pushing for more subscriptions or through crowdfunding, it’s likely that advertising is here to stay for the foreseeable future.

To preserve trust between the media and readers, publishers will need to cut out excessive tracking, explain why some tracking is necessary and who the information is shared with, and provide alternatives for those who would prefer to opt out.




How we worked out the numbers

Our analysis covers a total of 3,838 news and media websites that we collected from several sources, including Alexa, SimilarWeb, Comscore and an archive of the DMOZ link directory.

We then used the Tranco list of the most visited websites around the world, which is itself compiled from a combination of multiple sources, to filter for the most visited pages.
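A filtering step of this kind might look roughly like the sketch below. The site names, ranks and the `max_rank` cutoff are all hypothetical; the cutoff actually used in the analysis is not stated here.

```python
def filter_most_visited(candidates, tranco_rank, max_rank):
    """Keep candidate sites ranked within max_rank, most visited first."""
    ranked = [
        site for site in candidates
        if site in tranco_rank and tranco_rank[site] <= max_rank
    ]
    return sorted(ranked, key=lambda site: tranco_rank[site])


# Hypothetical candidate news sites and their positions in a ranking list.
candidates = ["news-a.example", "news-b.example", "news-c.example"]
tranco_rank = {
    "news-a.example": 120,
    "news-b.example": 950_000,
    "news-c.example": 45,
}

most_visited = filter_most_visited(candidates, tranco_rank, max_rank=100_000)
```

Sites missing from the ranking are dropped rather than guessed at, which is one way some news websites could fall out of a dataset like this.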

We cross-referenced these websites with a scan of cookies and third-party tracking codes performed with Blacklight, a piece of software produced by The Markup. The Markup publishes its methodology, and the tool can be used to scan any website.

Each tracker we found was matched with its parent company using DuckDuckGo’s Tracker Radar dataset.
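The matching and counting can be sketched as follows. This is only an illustration under stated assumptions: the lookup table `OWNER_BY_DOMAIN`, the domains and the company names are invented, and the real Tracker Radar dataset is far larger and structured differently.

```python
from statistics import mean

# Hypothetical lookup from tracker domain to parent company.
OWNER_BY_DOMAIN = {
    "analytics.example": "Company A",
    "ads.example": "Company A",
    "pixel.example": "Company B",
}


def companies_on_site(tracker_domains, owner_by_domain):
    """Return the distinct parent companies behind a site's tracker domains."""
    return {
        owner_by_domain[domain]
        for domain in tracker_domains
        if domain in owner_by_domain
    }


def average_companies(scans, owner_by_domain):
    """Mean number of distinct companies per site across all scanned sites."""
    return mean(
        len(companies_on_site(domains, owner_by_domain))
        for domains in scans.values()
    )


# Hypothetical scan results: site -> tracker domains found on its pages.
scans = {
    "news-a.example": ["analytics.example", "ads.example", "pixel.example"],
    "news-b.example": ["pixel.example", "unknown.example"],
}
```

Counting companies rather than domains means two trackers owned by the same parent count once, and a tracker domain missing from the lookup is not counted at all.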

Due to the variety of sources, our records may omit some websites or mistakenly include sites that were miscategorised as news sources, although such cases should be rare.

Advertising companies constantly change where they host their trackers, which may also lead to some trackers being missed by our analysis. Publishers may have removed or added trackers since we performed our scan.

Our analysis includes scans of both the homepage and a random page on each website. Performing the scans again might lead to different results due to another page being chosen.
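To see why a repeat scan can differ, consider this minimal sketch. The function names, page paths and tracker domains are hypothetical and do not reflect how Blacklight itself selects pages.

```python
import random


def pick_pages(homepage, other_pages, rng):
    """Return the homepage plus one randomly chosen other page, if any exist."""
    if not other_pages:
        return [homepage]
    return [homepage, rng.choice(other_pages)]


def trackers_for_site(pages, trackers_by_page):
    """Combine the trackers seen on every scanned page into one set."""
    found = set()
    for page in pages:
        found |= set(trackers_by_page.get(page, ()))
    return found


# Hypothetical site where one article page carries an extra tracker.
trackers_by_page = {
    "/": ["analytics.example"],
    "/article-1": ["analytics.example"],
    "/article-2": ["analytics.example", "ads.example"],
}

pages = pick_pages("/", ["/article-1", "/article-2"], random.Random())
found = trackers_for_site(pages, trackers_by_page)
```

Depending on which article page is drawn, `found` contains one tracker domain or two, so the same site can yield different counts on different runs.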

Websites can also theoretically detect when a visit is performed automatically, and they can change their surveillance behaviour as a result.

Additional data work by Josh Rayman.

Image by Best-Backgrounds/Shutterstock


