The leaked data is structured as follows: Mossack Fonseca created a folder for each shell firm. Each folder contains e-mails, contracts, transcripts, and scanned documents. In some instances, there are several thousand pages of documentation. First, the data had to be systematically indexed to make searching through this sea of information possible. To this end, the Süddeutsche Zeitung used Nuix, the same program that international investigators work with. Süddeutsche Zeitung and ICIJ uploaded millions of documents onto high-performance computers. They applied optical character recognition (OCR) to convert the documents into machine-readable, easily searchable files. The process turned images – such as scanned IDs and signed contracts – into searchable text. This was an important step: it enabled journalists to comb through as large a portion of the leak as possible using a simple search mask similar to Google.
The journalists compiled lists of important politicians, international criminals, and well-known professional athletes, among others. The digital processing made it possible to then search the leak for the names on these lists. The "party donations scandal" list contained 130 names, and the UN sanctions list more than 600. In just a few minutes, the powerful search algorithm compared the lists with the 11.5 million documents.
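The list comparison described above can be illustrated with a minimal sketch. It is not the Nuix implementation, whose internals the article does not describe. It assumes the OCR text of each document is already available as a plain string, and it uses a simple inverted index (a mapping from each word to the documents that contain it). The function names, file paths, and sample names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map every word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in set(tokenize(text)):
            index[token].add(doc_id)
    return index


def find_name(index, name):
    """Return the documents that contain every word of the name.

    Simplification: the words need not be adjacent in the document.
    """
    tokens = tokenize(name)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


def match_watchlist(index, names):
    """Compare a list of names with the index; keep only names with hits."""
    matches = {}
    for name in names:
        hits = find_name(index, name)
        if hits:
            matches[name] = sorted(hits)
    return matches


if __name__ == "__main__":
    # Hypothetical OCR output, keyed by an invented folder/file path.
    documents = {
        "folder_a/contract.txt": "Agreement signed by Jane Example for the firm.",
        "folder_b/email.txt": "Please forward the passport scan for John Placeholder.",
    }
    index = build_index(documents)
    print(match_watchlist(index, ["Jane Example", "Nobody Listed"]))
```

Building the index is the slow part and happens once. After that, each name lookup touches only the entries for its own words, which is one plausible reason a comparison against millions of documents can finish in minutes.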