Adventures in spreadsheet journalism: Lesson one - Press Gazette

Adventures in spreadsheet journalism: Lesson one

This blog is about spreadsheets on the net and what can be done with them once they’re found, in the hopes of learning something interesting, and publishable, about them.  

But there are times when we encounter data in some native, primeval state that have to be tweaked before they can be smoothed out into all those rows and columns – in short, made spreadsheet-ready. And one redoubtable way for navigating the data-to-spreadsheet route is via the Text-to-Columns option.  (One note before we proceed: I’m using Excel 2010.) 

Text to Columns takes undifferentiated rows of data each of which inhabit one column, and lets the user parse them into serviceable columns, in the plural. But OK – exactly what does that abstruse guarantee mean?

Let’s  look at some data and see. If you pull into the

site, a digital inventory of UK weather data assembled at various reporting stations across the country, and select Heathrow 1948 from the page’s drop down menu, you’ll see something like this:


That’s an excerpt of all the data,  of course, but for  demonstration purposes just mouse-select some of the data starting with the row beginning “yyyy” and execute a garden-variety copy-and-paste into a blank Excel workbook:

Now here’s the point: appearances notwithstanding, the data above are all piled into the A column, even as you want to chunk the numbers into usable year, month, temperature, etc. columns. Click in the A column in the first row containing the 1948 year reference, for example, and check out the formula bar. You’ll see:

All those numbers are enfolded into one cell, where they aren’t doing you any good.

So here comes Text to Columns. Select the entire A column (that is, just click the A column heading) and click Data tab >   Text to Columns in the Data Tools. You should see something like this:

In this case Excel automatically types the data as Fixed Width, as per the associated legend above – meaning that Excel will break the data into column on the basis of the spaces interpolating the numbers.  Click Next:

You’re starting to get the idea. Excel proposes to columnise the data at the break points above, which look good. In the interests of introductory simplicity, click Next again and Finish. You should see:

Voila. The numbers have been repatriated into columns all their own, yielding…a workable spreadsheet. (Note, by the way, that you can drag those arrow-pointed dividing bars to alternative locations, in order to engender different data column breaks if the data warrant.)

What Excel hasn’t done particularly well here, though, is to properly isolate the header row titles. You’d probably want to retype your own as a result, making sure in turn to delete one of the two header rows. You can’t serve two masters – and spreadsheet data can’t have two header rows.

Of course there’s a bit more to the Text to Columns feature, but these are the important basics. You can learn more in more in my current spreadsheetjournalism post.