Finding out who, what, where and when just got easier... - Press Gazette


Ever since Paul Julius Reuter’s news agency started using carrier pigeons 150 years ago, accelerating the spread of news by using new technologies has been a key selling point for its products.

And as the company continues to compete on speed, the ability to provide context and make sense of the glut of online information is becoming another important area for innovation.

‘Reuters has forever tried to use technology to get an advantage in the market of bringing news to people,’ says Gerry Campbell, president of search and content technologies at the agency, which last week became part of the merged Thomson Reuters.

Campbell oversees a project that extends to the wider internet an effort to organise the news agency’s own vast archives of stories and data. It does so by giving web developers outside the company free access to a powerful software tool that can automatically organise online content.

The project, which draws on the long-hypothesised idea of the Semantic Web, is called Open Calais – named after an important development in the agency’s history. In the mid-19th century, Reuters used the new underwater telegraph cable between the French port of Calais and Dover to transmit news directly between Paris and London.


Although transmission times are now measured in fractions of a second, speed remains an important point of competition in financial news, especially for those City firms which have started to use algorithmic trading systems that respond automatically to ‘machine-readable news’ coming off the wire.

For human readers, meanwhile, a growing source of the value of a news service is its ability to rapidly add context and extract relevant intelligence from the torrent of online information.

‘Reuters sits on a tonne of information,’ Campbell says. ‘The opportunity that I see – and that a lot of people see – is that through the use of some relatively sophisticated tools, that information can be all brought together.’

A year ago, Reuters acquired Clearforest, an American-Israeli software firm specialising in text analytics. Clearforest’s software takes text news stories and identifies people, companies, places, events and other ‘entities’ mentioned in the copy. Crucially, it can also discern the relationships between them.

To achieve this, the software uses a technique known as natural-language processing, which applies linguistic theory to analysing text. Reuters has been using it internally to connect related information in its disparate databases, which include everything from news stories to share-price histories and lists of corporate directorships.

The Calais web service gives external web developers the opportunity to use part of the Clearforest software to organise their own websites in the same way.

Different from search engines

Semantic indexing operates very differently from conventional search engines.

‘The old realm of search – the old realm of information gathering – was about phrase-matching,’ Campbell explains. ‘If I’m looking for “Bill Gates”, Google or any of the other search engines will look for that phrase “Bill Gates”, but they won’t know that he’s a person, and they won’t know anything other than that there’s a string of characters that occurs in multiple places and therefore will return those as search results.’

‘In the new world, the word “semantic” really means that there’s an understanding of the meaning of the concepts that are being presented.’

A mention of ‘Microsoft’ would be understood to be a company, for instance, and could therefore be connected automatically to the software company’s share price. Likewise, a mention of ‘Bill Gates’ would be connected to previous mentions of ‘William Gates’, with the software recognising that both phrases refer to the same person – and that he is associated with Microsoft.
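To make the difference between phrase-matching and semantic indexing concrete, here is a toy sketch in Python. It is not Calais or Clearforest’s software, whose internals are not described here: the entity table, the aliases and the Gates–Microsoft relationship are hand-written assumptions standing in for what a natural-language processing system would infer, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-built knowledge base (an assumption for illustration):
# canonical entity -> its type and the surface forms (aliases) it may appear under.
ENTITIES = {
    "Bill Gates": {"type": "Person", "aliases": ["Bill Gates", "William Gates"]},
    "Microsoft": {"type": "Company", "aliases": ["Microsoft"]},
}

# Hypothetical relationship table: person -> associated companies.
RELATIONS = {"Bill Gates": ["Microsoft"]}


def phrase_match(query, docs):
    """Old-style search: return ids of documents containing the literal string."""
    return [doc_id for doc_id, text in docs.items() if query in text]


def tag_entities(text):
    """Return canonical names of entities whose aliases appear in the text."""
    found = set()
    for canonical, info in ENTITIES.items():
        if any(alias in text for alias in info["aliases"]):
            found.add(canonical)
    return found


def build_index(docs):
    """Map each canonical entity to the documents that mention it under any alias."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for entity in sorted(tag_entities(text)):
            index[entity].append(doc_id)
    return index


def semantic_search(query, docs):
    """Resolve the query to an entity, then return its type, matching docs and relations."""
    index = build_index(docs)
    for canonical, info in ENTITIES.items():
        if query in info["aliases"]:
            return info["type"], index[canonical], RELATIONS.get(canonical, [])
    return None, [], []


docs = {
    1: "Bill Gates spoke at the conference.",
    2: "William Gates announced a new foundation grant.",
    3: "Microsoft shares rose on Tuesday.",
}

print(phrase_match("Bill Gates", docs))     # [1]
print(semantic_search("Bill Gates", docs))  # ('Person', [1, 2], ['Microsoft'])
```

The literal search finds only the story that uses the exact phrase, while the entity-aware lookup also returns the ‘William Gates’ story and knows the result is a person linked to Microsoft. Real systems replace the hand-written tables with statistical and linguistic analysis, but the indexing idea is the same.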

Benefits to journalists

Journalists at Reuters and elsewhere have long categorised and tagged news stories manually, but automating the process allows it to be done faster, more comprehensively and – crucially for linking related information – more consistently.

‘The person who writes the story and the editors who review the story – and ultimately the readers – are thinking about the story itself, and not so much the descriptive information,’ says Campbell. ‘Anything that can help that process is seen as very positive.’

‘We are in the process of revamping our tools for our editorial production, so that these things are increasingly factored in and easier to do.

‘And, as we bring Thomson and Reuters together, there’s going to be a lot of activity on bringing our systems together on the content-creation side.’

Since Reuters released the public version of Clearforest’s tool in late January, more than 1,500 developers have registered with the service, and around 250 projects are in the works. In one of the biggest applications so far, the Powerhouse Museum of Science and Design in Australia used it to tag its entire catalogue of exhibits.

This month, one developer released a plug-in for the WordPress blogging platform that allows bloggers to automatically tag posts using Open Calais. Another appears to be working on something similar for the open-source content-management system Drupal.

How Reuters benefits

Although it is effectively giving away access to expensive specialist software, Reuters benefits because everything tagged using the Open Calais web service helps the company’s customers discover relationships between the agency’s own information and other material on the wider internet.

‘We’ve got a way of understanding what our customers need,’ says Campbell. ‘They need Reuters’s content, which is the stuff that our journalists create and the things that we capture uniquely. They need stuff that we partner for to bring in, and we’re starting – all companies are starting – to realise that our customers can be better served by bringing in not just what they pay for as a part of our realm, but by actually adding value by connecting things in that are out on the open web – for example, a blog post that’s relevant or something else.

‘The general idea is that we gain so much by providing that value to our customers that we’re not necessarily risking anything, because if our customers are happy, our customers are happy.’

