Our investigation into website tracking software has found that news websites have, on average, 17 third-party reader tracking tools – and Google trackers are on nearly all of them.
“Big Tech companies insist spying on users, government is inadvertent,” reads an August 2020 headline on The Washington Times website, a conservative daily covering the DC area as well as US national politics.
It is just one of many similar articles published in recent years in which the Times takes a critical stance against the way Big Tech giants like Google, Facebook and Microsoft have reportedly mishandled their users’ data and invaded their privacy.
What it doesn’t say is that the number of companies with access to records of its visitors is much higher than readers might expect.
A Press Gazette analysis reveals that The Washington Times shared data with at least 57 advertising companies, including Facebook, Twitter, Google, Microsoft, Adobe and Oracle.
While the Times is one of the worst offenders when it comes to the extent of its data sharing, it’s hardly the only media institution to allow advertising trackers on its website.
We analysed 3,838 of the world’s most popular news websites and found that they include tracking cookies or code from an average of 17 companies each.
The British local paper Ipswich Star tops our ranking, with trackers from 82 distinct companies found on its online pages, followed by the LGBT+ news website PinkNews with 76 and the Dominican newspaper El Nacional with 67.
We found that Press Gazette had trackers from 22 different companies, although further investigation found that most of these are defunct and we are in the process of removing them.
“It is standard practice to track cookies for advertising, e-commerce and other optimisation purposes,” said a spokesperson from PinkNews, one of the websites with the highest number of trackers in our analysis.
Besides that, the company claims that some third-party code snippets we found on their pages are “ad tags” that don’t necessarily collect personal data, although they did not respond to a request to provide examples of these.
“Advertising is one of the revenue streams that helps us deliver content to our readers,” a representative of the company told us.
“We always strive to optimise the partners we work with to ensure we balance our commercial needs whilst providing our visitors with an excellent user experience.”
We have also reached out to Ipswich Star parent Archant, but we have not received a comment from the company.
Why do news websites track you?
There are many legitimate reasons for websites to include third-party tracking codes. Services such as Google Analytics help website owners get a better understanding of who their audiences are, what content they’re reading and how well the website is performing. Facebook’s “Pixel” lets companies see how well their ads perform and allows them to target people who’ve visited the website before.
Other services help publishers increase their revenue by providing targeted advertising. Outbrain (present on 25.7% of news websites) and Taboola (17.9%), for example, provide the infamous chum-box ads at the end of articles that present themselves to readers as “relevant content”.
Some other services, like AddThis, allow website owners to easily add social media sharing buttons to their articles while Disqus lets them add commenting functionality without having to build it themselves.
But while many of these services are free to use, they do come at a price.
Both AddThis, owned by Oracle, and Disqus, owned by Zeta Global, secretly inject additional trackers from companies like Google, Adobe and Amazon, as well as trackers from potentially unsecured sources.
These companies usually claim that the data they collect is anonymous, but the virtual profiles they create with it can be used to identify real individuals. Companies can also suffer from security breaches, which may leave their users’ identities exposed. One such breach happened to Disqus in 2013, when journalists confronted commenters about the “anonymous” racist comments made via the platform.
Some of the trackers can even be used to directly collect data about individual behaviour.
Around 100 of the websites in our analysis (2.6%) used a technique called keylogging, which allows websites to capture all keystrokes on a particular page. A website could, for example, record everything that was typed into a form or comment box even if the user ultimately decides not to click the “Submit” button.
A similar technique is session recording, which allows website owners to replay a user’s interactions on a page, including their clicks, scrolls and mouse movements. These tools, usually used to analyse how people navigate websites and improve layouts, were found on 435 news websites in our analysis (11.3%).
Many modern browsers like Safari, Firefox or Brave provide tools to block or limit trackers out of the box. Other browsers, like Chrome, can block trackers with the help of installable extensions. Chrome will also soon join other browsers in banning third-party cookies, which would pose a serious challenge to some of Google’s competitors (but not Google themselves).
But advertising companies are wising up. A more recent technique called canvas fingerprinting, which is much more difficult to block, works by drawing a hidden graphic on the page and identifying users by the way that graphic is rendered. It is now present on one in ten news websites.
Google is also planning on introducing a new system that will create a virtual profile for each user inside its Chrome browser, instead of relying on cookies scattered around the internet, potentially giving the company even more control over online advertising.
The tech and news media giants tracking users
Whatever Google decides to do, there’s no question that it already dominates the market. Nearly all websites in our analysis – 97.6% of them – loaded at least one tracker from one of Google’s properties, such as its advertising platform Google Ads or its web analytics service Google Analytics.
But Google is far from being alone. Our analysis identified trackers from 914 different companies, ranging from names you may know, like Twitter and Microsoft, to names you may not, like Bidtellect and Neustar. What is certain, however, is that these companies know you.
After Google, the second most commonly found trackers were those of AT&T’s WarnerMedia, present on more than half of all media websites (56.1%). This is largely due to news websites using AppNexus, an advertising platform AT&T purchased in 2018.
Facebook’s tracking codes were also present in more than half of all websites (50.4%), with most of them using Facebook’s “Pixel” tracking code.
Tracking codes from comScore (48.2%), Oracle (46.4%) and Adobe (45.6%) were also common on media websites.
News websites are far more likely to include trackers than the average website, and it’s not difficult to see why.
The main attraction for publishers is that ad networks connect them to where the money is, meaning they don’t have to negotiate with each advertiser independently.
Data collected through trackers can also help reporters and editors focus their coverage on topics that bring in more readers, for better or worse.
While some media organisations have been able to reduce their reliance on targeted advertising by pushing for more subscriptions or through crowdfunding, it’s likely that advertising is here to stay for the foreseeable future.
To preserve trust between the media and readers, publishers will need to cut out excessive tracking, explain why some tracking is necessary and who the information is shared with, and provide alternatives for those who would prefer to opt out.
How we worked out the numbers
Our analysis covers a total of 3,838 news and media websites that we collected from several sources, including Alexa, SimilarWeb, Comscore and an archive of the DMOZ link directory.
We then used the Tranco list of the most visited websites around the world, which is itself compiled from multiple sources, to filter for the most visited sites.
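As a rough illustration of this filtering step, and not the code used for this analysis, the Python sketch below keeps only those candidate news domains that appear above a chosen position in a ranking file of "rank,domain" rows. The domains, the ranking data and the cut-off (`max_rank`) are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a ranking file made of "rank,domain" rows.
RANKING_CSV = """1,search-giant.example
2,example-news.com
3,example-shop.com
4,another-news.org
"""

def load_ranks(text):
    """Parse "rank,domain" rows into a {domain: rank} dictionary."""
    return {domain: int(rank) for rank, domain in csv.reader(io.StringIO(text))}

def filter_by_rank(candidates, ranks, max_rank):
    """Keep ranked candidates at or above the cut-off, most visited first."""
    kept = [d for d in candidates if d in ranks and ranks[d] <= max_rank]
    return sorted(kept, key=ranks.__getitem__)

# Hypothetical list of candidate news websites gathered from several sources.
candidates = {"example-news.com", "another-news.org", "tiny-local-paper.net"}
ranks = load_ranks(RANKING_CSV)
top_news = filter_by_rank(candidates, ranks, max_rank=3)
```

In this toy example only `example-news.com` survives: one candidate falls below the cut-off and another does not appear in the ranking at all.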
We cross-referenced these websites with a scan of cookies and third-party tracking codes performed with Blacklight, a piece of software produced by The Markup, which has published its methodology and offers a public tool for scanning individual websites.
Each tracker we found was matched with its parent company using DuckDuckGo’s Tracker Radar dataset.
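The matching step can be pictured with a short Python sketch. It is an assumption-laden illustration, not the actual pipeline: the lookup table below is a hypothetical stand-in for a domain-to-owner dataset, and the hostnames and company names are invented. Because trackers are often served from subdomains, the lookup tries the full hostname first and then each parent domain in turn.

```python
# Hypothetical domain-to-owner lookup; a real dataset would be far larger.
DOMAIN_OWNERS = {
    "tracker-a.example": "Company A",
    "cdn-a.example": "Company A",
    "tracker-b.example": "Company B",
}

def owner_of(hostname, owners=DOMAIN_OWNERS):
    """Return the parent company for a hostname, or None if it is unknown."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try "a.b.c.example", then "b.c.example", and so on, but never the bare
    # top-level label on its own.
    for start in range(len(labels) - 1):
        candidate = ".".join(labels[start:])
        if candidate in owners:
            return owners[candidate]
    return None

def companies_on_site(tracker_hosts):
    """Collapse a site's tracker hostnames into a set of distinct companies."""
    return {owner for host in tracker_hosts if (owner := owner_of(host))}

# Hypothetical third-party hostnames found on one website.
hosts = [
    "pixel.tracker-a.example",
    "static.cdn-a.example",
    "ads.tracker-b.example",
    "unknown-host.example",
]
companies = companies_on_site(hosts)
```

Counting companies rather than hostnames matters here: two of the four hostnames above belong to the same owner, so the site counts as having trackers from two companies, and the unrecognised hostname is left out.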
Due to the variety of sources, our records may omit some news websites or mistakenly include websites that were miscategorised as news sources, although such cases should be rare.
Advertising companies constantly change where they host their trackers, which may also lead to some trackers being missed by our analysis. Publishers may have removed or added trackers since we performed our scan.
Our analysis includes scans of both the homepage and a random page on each website. Performing the scans again might lead to different results due to another page being chosen.
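One plausible way to turn two page scans per website into the kind of figures quoted in this article is sketched below in Python. The scan results are invented, and merging the two pages into a single set per site is our assumption about how such data could be combined, not a description of the exact calculation.

```python
from statistics import mean

# Hypothetical scan results: companies seen on each scanned page of each site.
scans = {
    "example-news.com": {
        "homepage": {"Company A", "Company B"},
        "random_page": {"Company B", "Company C"},
    },
    "another-news.org": {
        "homepage": {"Company A"},
        "random_page": set(),
    },
}

def companies_per_site(scans):
    """Merge the pages scanned for each site into one set of companies."""
    return {site: set().union(*pages.values()) for site, pages in scans.items()}

def prevalence(per_site, company):
    """Percentage of sites on which a company's trackers were found."""
    hits = sum(company in companies for companies in per_site.values())
    return 100 * hits / len(per_site)

per_site = companies_per_site(scans)
average_companies = mean(len(companies) for companies in per_site.values())
share_company_a = prevalence(per_site, "Company A")
share_company_c = prevalence(per_site, "Company C")
```

Here "Company C" only shows up because of the randomly chosen page on one site, which is why a repeat scan that lands on a different page could produce a different count.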
Websites can also theoretically detect when a visit is performed automatically, and they can change their surveillance behaviour as a result.
Additional data work by Josh Rayman.
Image by Best-Backgrounds/Shutterstock