Sunday 28 February 2016

Life Expectancy at Older Ages: Part 2 - Reorganising the data


Public Health England recently released 'Life Expectancy: Recent trends in Older Ages.'  As its title suggests, this publication looks at life expectancy at 65, 75, and 85. A main message from the publication is that life expectancy at older ages is continuing to increase.


Figure 1: Life expectancy by sex at ages 65, 75, and 85, England

Note: This bar chart uses the Google Visualisation API - mouse over for data values.

Comparisons with European Union (EU) countries suggest that as of 2013, life expectancy for males in England at 65 and 75 is a little higher than that for the EU overall; that for males at 85 is about the same as that across the EU. For females in England, life expectancy is below EU-wide life expectancy for all age bands.

In their Foreword, Public Health England noted that whilst life expectancy at older ages was increasing overall, at local authority level "there is a good deal of variation in the trend in life expectancy at older age for which there is no apparent explanation." Their advisory group on mortality surveillance (which includes prominent geographer Danny Dorling) are looking into this further. This seemed interesting, so I thought I'd make a data explorer to take a look.

The reference tables released with the publication include Excel spreadsheets containing local authority level time-series for life expectancy at 65, 75, and 85. My first step was to take these well-designed reference tables and turn them into simple csv files, so that I could import them into SQL as tables. The Excel files had to be prepared first: in practice, this meant stripping out blank columns, table titles, and any rows containing explanatory notes. Also, I needed to make sure that all column header information that I wanted to carry over into the SQL table was contained in a single header cell for each column.

Once this was done, I needed to choose an appropriate csv delimiter. Commas are the default separator for csv files (surprise!) - but that wasn't going to work for me, as some local authority names include commas - such as "Kingston upon Hull, City of". One option would have been to go into the data and enclose the cells containing commas with double quotes - but it's pretty straightforward to set a different delimiter, so I did that instead. There's an easy to follow tutorial at www.howtogeek.com which sets out how to do this.

Once happy with the csv files, I imported them into SQL (using the task wizard) and made sure that the data types were correctly set (i.e. that SQL was recognising decimal fields, rather than treating all columns as string data). Finally, I wrote two scripts - one creating a reference table linking local authority codes and names; the second pulling together the data from all of the csv files I'd uploaded as a very simple normal table. This can easily be pivoted to produce time series data in a format that's simple to chart: once I've done a last check that my figures still match those released by Public Health England, that's what I'll be doing.

As an aside, it's increasingly expected that data releases will include reuseable data - this will boost demand for two distinct product types from those publishing statistics:

  • polished data tables for end users - such as those published by Public Health England
  • reuse-ready files released as csv files, for people who will reanalyse or re-present the data.

The U.K. Statistics Authority have been promoting this two-pronged approach for some time: in November 2014 they released a briefing entitled "Releasing statistics in spreadsheets: Good practice guidance" which is well worth a look.

No comments:

Post a Comment