Panama Papers: How 400 journalists made sense of biggest ever data leak to media - Press Gazette

German newspaper Süddeutsche Zeitung has provided a fascinating insight into what may be the biggest journalistic investigation in history – the Panama Papers (subtitled The Secrets of Dirty Money).
The leak of 11.5 million documents, dating back 40 years, came from an anonymous source who never met journalists and only made contact via encrypted digital communications.
Coordinated by the US-based International Consortium of Investigative Journalists, some 400 journalists from 109 news organisations based in 80 countries have spent the last year sifting through the information.
The ICIJ says the documents from Panama-based law firm Mossack Fonseca reveal "the offshore holdings of world political leaders, links to global scandals, and details of the hidden financial dealings of fraudsters, drug traffickers, billionaires, celebrities, sports stars and more".
The UK partners are The Guardian and BBC Panorama.
The source reportedly told Süddeutsche Zeitung "my life is in danger" and said they were releasing the files because "I want to make these crimes public".
Several UK national newspapers today lead on the revelation by The Guardian that Prime Minister David Cameron's father "ran an offshore fund that avoided ever having to pay tax in Britain by hiring a small army of Bahamas residents – including a part-time bishop – to sign its paperwork".
The leaked data is structured as follows: Mossack Fonseca created a folder for each shell firm. Each folder contains e-mails, contracts, transcripts, and scanned documents. In some instances, there are several thousand pages of documentation. First, the data had to be systematically indexed to make searching through this sea of information possible.

To this end, the Süddeutsche Zeitung used Nuix, the same program that international investigators work with. Süddeutsche Zeitung and ICIJ uploaded millions of documents onto high-performance computers. They applied optical character recognition (OCR) to transform the data into machine-readable, easy-to-search files. The process turned images – such as scanned IDs and signed contracts – into searchable text. This was an important step: it enabled journalists to comb through as large a portion of the leak as possible using a simple search mask similar to Google.
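
To give a rough sense of what "indexing" means here, the sketch below builds a small inverted index over text that has already been extracted from documents, keeping track of which shell-firm folder each document sits in. It is only an illustration and not how Nuix works internally; the folder names, file names, document contents and the helper functions `tokenize`, `build_index` and `search` are all invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the leak's layout: one folder per shell firm,
# each holding several documents whose text has already been extracted.
# All names and contents below are invented for illustration.
FOLDERS = {
    "Example Holdings Ltd": {
        "email_001.txt": "Please forward the signed contract to the nominee director.",
        "contract.txt": "This contract appoints a nominee director for the company.",
    },
    "Sample Trading SA": {
        "email_002.txt": "The bank requests a certified passport copy.",
    },
}

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Split text into case-insensitive word tokens."""
    return [t.casefold() for t in TOKEN.findall(text)]


def build_index(folders):
    """Map each word to the set of (folder, filename) pairs that contain it."""
    index = defaultdict(set)
    for folder, docs in folders.items():
        for filename, text in docs.items():
            for token in tokenize(text):
                index[token].add((folder, filename))
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


INDEX = build_index(FOLDERS)
print(sorted(search(INDEX, "nominee director")))
```

Once the index exists, a query only has to look up a handful of words rather than reread every file, which is what makes a Google-style search box practical over millions of documents.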

The journalists compiled lists of important politicians, international criminals, and well-known professional athletes, among others. The digital processing made it possible to then search the leak for the names on these lists. The "party donations scandal" list contained 130 names, and the UN sanctions list more than 600. In just a few minutes, the powerful search algorithm compared the lists with the 11.5 million documents.
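
The list comparison described above can be pictured with a short sketch, assuming the documents are already available as plain text. It normalises names (case, accents, punctuation) and then checks each listed name against each document as a whole phrase. The watchlist, the documents and the functions `normalize` and `match_watchlist` are made up for this example; the actual matching logic used by the journalists' software is not described in the source.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and reduce punctuation/whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", no_accents.casefold()))


def match_watchlist(names, documents):
    """Return {name: [doc_id, ...]} for every listed name found in some document."""
    # Pad with spaces so a name only matches on whole-word boundaries.
    normalized_docs = {
        doc_id: f" {normalize(text)} " for doc_id, text in documents.items()
    }
    hits = {}
    for name in names:
        needle = normalize(name)
        if not needle:
            continue  # skip empty entries
        found = sorted(
            doc_id
            for doc_id, text in normalized_docs.items()
            if f" {needle} " in text
        )
        if found:
            hits[name] = found
    return hits


# Invented names and documents, purely for illustration.
WATCHLIST = ["José Pérez", "Anna Schmidt", "John Doe"]
DOCUMENTS = {
    "doc-1": "Power of attorney granted to JOSE PEREZ, director.",
    "doc-2": "Invoice addressed to Ms. Anna Schmidt.",
    "doc-3": "Letter from Johnson Doerr regarding the annual meeting.",
}

print(match_watchlist(WATCHLIST, DOCUMENTS))
```

A scan like this reads every document once per name, so at the scale of the leak it would normally be run against a prebuilt index instead. Exact phrase matching also misses spelling variants and transliterations, so any hit would still need to be checked by a journalist.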


