Monday 7 March 2016

Life expectancy at 65 in England (Local Authority highs and lows, with some asides about line charts)



Life expectancy at 65 has increased over time in England for both males and females: male life expectancy at 65 increased from 16.2 years for 2000-02 to 18.5 years for 2011-13, whilst female life expectancy over the same period rose from 19.2 years to 21.0 years. Male life expectancy at 65 increased faster than female life expectancy, reducing the gender gap from three years (for 2000-02) to 2.5 years (2011-13).  

Overall, this is an encouraging picture. Nevertheless, beneath the England level figures, substantial variation is found at local local authority (LA) level (and local authority district (LAD) level for two tier authorities).  Figures 1 and 2 show the time series data for female and male life expectancy at 65 for the LAs/LADs with the highest and lowest life expectancy as of 2011-13. 

Figure 1 shows that the highest female life expectancy at 65 were found in Chiltern (a rural LAD in Buckinghamshire) and Camden (in London): the lowest life expectancy of 18.8 years was found in Halton Borough Council (which covers Widnes and Runcorn).  Camden's life expectancy increased rapidly, by 2.9 years, between 2004-06 and 2008-10.  Female life expectancy at 65 for Halton has actually decreased in each of the last two available years: from 19.5 in 2009-11 to 18.8 in 2011-13. I'd expect that there are some public health people in Halton who are pretty unhappy about that.


Figure 1:  Female life expectancy at age 65: highest and lowest local authorities / local authority districts as of 2013

Female life expectancy at 65 for Chiltern, Camden, England and Halton
Notes: Upper and lower 95% confidence limits are shown in faint lines for local authority/local authority district life expectancy. Life expectancy is calculated using a rolling average over a three year period.

Figure 2 shows that for 2011-13, Harrow (in Greater London) had the highest male life expectancy at 65 (21.1 years). Manchester, the LA area covering central Manchester, had the lowest life expectancy (16.0 years).  This male life expectancy gap of 5.1 years is similar to that seen for women (5.2 years).


Figure 2: Male life expectancy at age 65: highest and lowest local authorities as of 2013
Time series chart showing male life expectancy at age 65 for Harrow, England and Manchester


Notes: Upper and lower 95% confidence limits are shown in faint lines for local authority/local authority district life expectancy. Life expectancy is calculated using a rolling average over a three year period.

Whilst these charts show that the life expectancy gap for these particular LAs / LADs has increased over time, it doesn't show anything about what the LA/LAD level life expectancy gap has done in general over this period (Harrow and Manchester are at the extremes of the range in 2011-13, but not necessarily for other years). I'll be pulling together some charts that look at how the life expectancy gap has changed over time for another post.

About the charts

These are fairly simple Excel-based time series charts (primarily made as mock-ups for a data explorer I'm making in SSRS). If you like this design, there are a few things that may be worth noting about how to build these.

Y axis scales

Y-axis scales for time series charts don't have to start at zero. The main purpose of these (and similar) charts is to examine change over time. Starting the y-axis at zero would make the changes too difficult to see clearly - and make the chart unhelpful to users. But - if you are going to ask your readers to compare different charts, make sure the charts use a common format:

  • Apply the same y-axis range to all the charts that will be compared. To do this in Excel 2016, select the y-axis and look under  'Format Axis - Axis options - Axis options.' The option needed is 'Bounds - Minimum and Maximum'  
  • Ensure that the charts to be compared are the same size.  This is straightforward in Excel 2016: under 'Format chart area - Size and properties' check under properties that 'Don't size or move with charts' is selected (this keeps the chart from re-sizing if you expand a column or row) and then  under 'Size', set the chart height and width to the desired dimensions
  • Ensure that colour use is as consistent as possible across charts. In Figures 1 and 2, for example, 'blue' denotes the LA with the lowest life expectancy; teal is used for England-level life expectancy and 'orange' is used for high life expectancy (gray is also used in Figure 1, where two local authorities had the same female life expectancy at 65 for 2011-13).

Replace legends with data labels
Using data labels instead of legends makes it easier for readers to interpret charts quickly.
Ordinarily for charts like these, I'd include both the series name and the final data value as a data label at the right of the chart. This won't work for Figure 1 as Camden and Chiltern share the same final data value. Whilst both series names and final values could still be displayed to the right (just offset up and down), it would be unclear which label related to which data series. I got around this in Figure 1 by adding a blank cell to the start of the x-axis series and to each data series. This provided space to include data series names at the start of the data series, without overlapping the y-axis.

Legends slow readers down, forcing them to look away from the actual chart for extra information. Keep them in reserve for cases where there are no viable alternatives (for example, where data series start and end on the same values).

Line weights combined with tints/shades can be used to emphasise or deemphasise data 

This is a pain in Excel (each series has to be set manually, and chart templates don't seem to consistently solve the problem, unless one is very particular about data set up), but pleasantly straightforward in SSRS.  I've used thin line weights and tints to show the upper and lower 95% confidence limits for the the LA/LAD results. I've also moved the confidence limit series to the top of the data series list (in the 'Select Data' pane), meaning that data series showing life expectancy will always cross in front of confidence limit series.


Tuesday 1 March 2016

Nobody wants to look at your chart



People don't want to carefully match the colours in the legend to the different data series, or take out their ruler and check that the chart is drawn to scale. They don't want to check that the proportions in the pie chart add to 100 (insert solemn ritual to ward off the evil pie here). People really, really, oh so really don't want to check that the y-axes used in that set of small multiples share a common scale.

Sometimes people don't want to look at charts because charts make them feel anxious and ignorant, reminding them of real or imagined mathematical skill deficits.  No-one likes feeling anxious or ignorant - though there's a real high to be had from transforming 'anxiety and ignorance' into 'temporary sense of relief and slightly-less ignorance').

Sometimes people don't want to look at charts because they're badly designed. One does not simply read these charts. There are bars with lengths difficult to compare, and chart text running in many directions. There are data labels obscured by bars, and legends in little boxes. The very page is riddled with garish colours, 3d effects and heavy gridlines...

And there are a lot of them. Open any annual report, or government publication: it's about even odds that you'll find some proper doozies.

Finally, people don't want to look at charts because, ideally, they want to already know the stuff that's in the chart, without having to look at the chart at all (much like Goran Peuc's product users).

Given that telepathy is not currently an option - and that charts can be a quick way to share findings - what can be done to get people to look at charts?

It probably helps to distinguish between some different kinds of charts.

Charts that analysts use to figure stuff out

Analysts generally do like to look at charts - but the way they use them is often different from other users. When I'm doing data analysis, I end up making lot of charts - they're a quick way to try and spot interesting things. In that exploratory context, it doesn't matter if the colours are hideous and the text is on its side (I already know what it says). Getting rid of chart cruft is nice - but a lot of the time it doesn't really matter because I'm going to just chuck the whole chart anyway. For analysts, charts are quick, cheap and disposable - and most of them get tossed.   

This means that when analysts have found the cool stuff, it's important that they switch gear from thinking about exploring the data (where charts are cheap disposable ephemera), to thinking about how to show off the data in presentations and reports (where charts live on forever in pdf - or at least until bitrot takes them). 

Presentation and report charts

The main audiences for report and presentation charts are people who make decisions that are supposed to be data-driven / evidence-based.  Many of these people (policy makers, service planners etc.) do not want to look at your charts - for all the reasons outlined above - but especially, because they're busy and they'd just like to know the stuff already.  

The risk is that they won't look at your chart, but will make policy or plan services anyway. 

What can you do to minimise this risk? 
  • Present your key findings with static charts: use interactive tools as supplementary enrichment for your more engaged users. Interactive tools - even really clear, aesthetically pleasing and well-designed tools - involve both some time and a learning curve.
  • Make your charts easy to read and interpret. A few things to do are listed below:
    • getting rid of all vertical text
    • showing data labels where possible (people go back and forth about including data labels: I've usually found that policy makers / service planners want them, and that they serve as helpful reference points in discussions).
    • labeling data series (and avoiding legends)
    • getting rid of boxes and borders
    • removing chart cruft and avoiding bad default settings
    • using white space to chunk related data series (for bar charts)
    • Using colour sparingly and with purpose
  • Talk to your users and be on hand to explain things that aren't clear 
  • Assess whether the potential gain from an innovative chart design exceeds the likely costs associated with unfamiliarity and the time needed to learn to interpret it

Charts for public education

From what I've seen, charts for public education split into two broad types: infographics and interactive visualisations. The former frequently suffer from what I'm sure I've seen described as 'loads of blokes waiting for the toilet' syndrome.  Here's an example from vis4net  (given as an illustration of a terrible infographic).

U.S. insurance reform infographic

There are good ones to be found though.

Interactive data visualisations - even those using directed stories - require users to invest time both to learn the interface and explore the data. Good design - both in terms of interface and aesthetics - strongly influences whether this is an enjoyable or frustrating process (and hence user attrition rates).
There are some great examples out there:  Nelson Davis's Visualizing 'A Problem from Hell' - The Effect of Genocide and War is one of the best I've seen recently.


Thanks to Goran Peuc's 'Nobody wants to use your product' which seems very applicable to chart design (and is a great read).



Sunday 28 February 2016

Life Expectancy at Older Ages: Part 2 - Reorganising the data


Public Health England recently released 'Life Expectancy: Recent trends in Older Ages.'  As its title suggests, this publication looks at life expectancy at 65, 75, and 85. A main message from the publication is that life expectancy at older ages is continuing to increase.


Figure 1: Life expectancy by sex at ages 65, 75, and 85, England

Note: This bar chart uses the Google Visualisation API - mouse over for data values.

Comparisons with European Union (EU) countries suggest that as of 2013, life expectancy for males in England at 65 and 75 is a little higher than that for the EU overall; that for males at 85 is about the same as that across the EU. For females in England, life expectancy is below EU-wide life expectancy for all age bands.

In their Foreword, Public Health England noted that whilst life expectancy at older ages was increasing overall, at local authority level "there is a good deal of variation in the trend in life expectancy at older age for which there is no apparent explanation." Their advisory group on mortality surveillance (which includes prominent geographer Danny Dorling) are looking into this further. This seemed interesting, so I thought I'd make a data explorer to take a look.

The reference tables released with the publication include Excel spreadsheets containing local authority level time-series for life expectancy at 65, 75, and 85. My first step was to take these well-designed reference tables and turn them into simple csv files, so that I could import them into SQL as tables. The Excel files had to be prepared first: in practice, this meant stripping out blank columns, table titles, and any rows containing explanatory notes. Also, I needed to make sure that all column header information that I wanted to carry over into the SQL table was contained in a single header cell for each column.

Once this was done, I needed to choose an appropriate csv delimiter. Commas are the default separator for csv files (surprise!) - but that wasn't going to work for me, as some local authority names include commas - such as "Kingston upon Hull, City of". One option would have been to go into the data and enclose the cells containing commas with double quotes - but it's pretty straightforward to set a different delimiter, so I did that instead. There's an easy to follow tutorial at www.howtogeek.com which sets out how to do this.

Once happy with the csv files, I imported them into SQL (using the task wizard) and made sure that the data types were correctly set (i.e. that SQL was recognising decimal fields, rather than treating all columns as string data). Finally, I wrote two scripts - one creating a reference table linking local authority codes and names; the second pulling together the data from all of the csv files I'd uploaded as a very simple normal table. This can easily be pivoted to produce time series data in a format that's simple to chart: once I've done a last check that my figures still match those released by Public Health England, that's what I'll be doing.

As an aside, it's increasingly expected that data releases will include reuseable data - this will boost demand for two distinct product types from those publishing statistics:

  • polished data tables for end users - such as those published by Public Health England
  • reuse-ready files released as csv files, for people who will reanalyse or re-present the data.

The U.K. Statistics Authority have been promoting this two-pronged approach for some time: in November 2014 they released a briefing entitled "Releasing statistics in spreadsheets: Good practice guidance" which is well worth a look.

Thursday 25 February 2016

Keeping chart text horizontal: Part 3 - Tweaking double-barrelled axis labels

What are double-barrelled axis labels when they're at home?

Individual categories are often grouped into convenient and/or analytically useful over-arching categories:

  • Some U.K. councils group electoral wards into 'area committees' or some synonymous term
  • Within census data, detailed ethnic group categories are aggregated into broader categories

Sometimes analysts chart individual categories - but also  show how they fit within overarching categories. If they use Excel defaults to do so, double-barreled axis labels will result. Here's an example of double-barreled axis labels in a bar chart.

Figure 1: Double-barreled axis labels

Double-barreled axis labels in a bar chart.
This is a reasonably simple, straightforward chart (although there are quite a lot of bars) - but what's that mess at the left side?

The same thing can also happen with bar charts: there it's worse because the x-axis labels for individual categories end up being vertical. Here's an example with vertical x-axis text and a bonus vertical y-axis title.

Figure 2: Double-barreled x-axis labels

Double-barreled column chart with vertical x-axis labels



The basic idea of chunking individual categories into overarching groups is great. People find it easier to make sense of and remember lots of bits of individual information if they are grouped together in meaningful ways.  So how can we keep the 'chunking' but lose the vertical text that is making these charts harder to read?


Turn column charts into bar charts

Transforming column charts into bar charts will immediately reduce the amount of vertical text that has to be removed. Right-click on chart area, select 'Change Chart Type' and swap the column chart out to a bar chart.

Add blank data series

Once you've got a bar chart, take a look at the data used in the chart. First off, if you have any merged cells in your 'overarching categories' series, unmerge these. Then insert a couple of empty rows between each category.  Finally, reselect your data so that you have an empty row at the top of the chart. You should now have a data selection that looks a bit like this...

Figure 3: Sample data with empty data series added
Chart data with blank data series added.

























and a chart that looks something like this:

Figure 4: Adding blank data series

Getting rid of vertical text - adding extra blank data series at end of category groups

Reorganise axis labels
Next, copy the overarching category labels into the same column as your individual data series, and delete the column they were previously in (or reselect the chart data without that column). Your data should look something like the sample data below.

Figure 5:  Data for chart without vertical axis text. 

Example of how to structure data to avoid vertical text in two-level axis labels

At this point, there is no more vertical text in the chart - but the overarching category headings look like any other category label on the axis. They need to be distinguished in some way. The simplest way to do this (there are others) is to add spaces after the category heading, such that all the headings appear left justified. If category header names are long, it may also be necessary to use Alt-Enter to split text over two lines.

The final chart should look something like this.

Figure 6: No more vertical text

Chart with two-level axis categories and no vertical text

This chart takes up a little more space than the original, but getting rid of the vertical text has the nice side effect of visually grouping related bars together. This makes it easier to compare values within categories.

Summing up

In this 'Keeping chart text horizontal' series, I've looked at how to get rid of three varieties of vertical text in charts:

Y-axis title
This is both the most common and (fortunately) the most innocuous type of vertical text. It's the sandfly of vertical chart text: mostly harmless and irritating, but enough of them will ruin a day at the beach.

But, like sandflies, they're easily swatted. If the y-axis title is painfully obvious, just remove it. If it's only very obvious (or not particularly obvious at all or actually quite obscure), then resize the chart and plot areas, so there's a little space at the top and change the text orientation to horizontal. Save the chart as a template - next time the y-axis will be in the right place.  Here's our video showing how to do it: https://youtu.be/zL6l3Rv2vqs and here's a link to 'Keeping chart text horizontal: Part 1, Reformatting y-axis titles'

Vertical x-axis text
These usually happen in crammed column charts. The solution is to make a bar chart instead, so that the category labels are set out vertically. There's a walk-through of how to do this in 'Keeping chart text horizontal: Part 2, Getting rid of vertical x-axis text in charts'.

Double-barrelled axis labels
These happen in column and bar charts with sub-categories grouped into larger categories. See above.

Tuesday 23 February 2016

Making a data explorer for Life Expectancy at Older Ages: Part 1


Public Health England (PHE) recently released 'Life Expectancy: recent trends in older ages', which found that "life expectancy at older ages in England has risen to its highest ever level." They also cautioned that this overall gain masks substantial variation at Local Authority (LA) level. Danny Dorling (the well-known geographer, in this case speaking in his capacity as a member of PHE's mortality surveillance group) stated that:

"Although national average life expectancy continues to rise, in many parts of England improvements have stalled in recent years. There is an urgent need to determine why this is happening. Beneath the headline figures of this report there is evidence of worsening health for many older people in some parts of the country."

My initial thought was that there might be a relationship between changes in life expectancy and LA-level ranks on the Indices of Multiple Deprivation 2015. But there wasn't - at least at LA level.

This seems interesting, so I'm going to have a go at building a data explorer to make it easier to view LA-level life expectancy data that PHE have published and link it up with Indices of Deprivation data, including some of the sub-domains, such as Income Deprivation Affecting Older People Index (IDAOPI).

The first task will be reformatting and reorganizing the life expectancy data tables that have been released along with the publication (available here). These are well-designed reference tables for end-users - but I'm going to be using them for further analysis. To that end, I will strip out a lot of the formatting, import them into SQL and restructure them so that they're easier to work with.

Once that's done, I'll do a bit of mapping and chart making, either with SSRS or the Google Visualisation API.  I'll be posting the odd progress report, as well as posts about how to avoid the pot-holes I'll inevitably find into along the way. There may be the odd interim product as well.

In the meantime, here are some sample charts showing change over time in male life expectancy at 65 years and over. I've built these in Excel to get a sense of the look and feel that I want - but obviously the final versions will be built using something else. Eventually they will need to accommodate multiple data series (so that users can compare results for different age bands, or by gender, for example).

Given that this is local authority level data, I think it's important to show the 95% confidence limits as well as the calculated life expectancy. The first approach I took was simply to chart the upper and lower confidence limits as fainter, lighter lines around the life expectancy value itself. I think this looks fairly clean (and would continue to do so with more data series) - but I'm not sure whether or not the confidence limits are emphasised sufficiently.

Figure 1: Life expectancy for males aged 65 and over in Middleborough, 2000/02 to 2011/13





















Note: Life expectancy is calculated using three-year rolling averages. Life expectancy data for 2013, for example, is a rolling average of data from 2011-2013. 
Sources: Office for National Statistics http://www.ons.gov.uk/ons/rel/subnational-health4/life-expectancy-at-birth-and-at-age-65-by-local-areas-in-england-and-wales/2011-13/stb-life-expectancy-at-birth-2011-13.html  Crown Copyright 2015

In this second approach, I've charted the range between the upper and lower 95% confidence intervals as an opaque block, adding a thin 'glow' effect to indicate that a smattering of possible values would fall outside the limits (here I'm indebted to Michael Correll and Michael Gleicher and their paper "Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error". A darker bar shows life expectancy. (Please note that my 'glow' effect in my example is currently indicative - not representative of the fall-off outside the 95% limits).

This chart doesn't smooth the data (it's actually a stack column chart in disguise): I think this may make it easier to assess which changes over time are striking.

Figure 2: Life expectancy for males aged 65 and over in Middleborough, 2000/02 to 2011/13















Note: Life expectancy is calculated using three-year rolling averages. Life expectancy data for 2013, for example, is a rolling average of data from 2011-2013. 
What do you prefer? Does the range within the confidence limits feel more prominent in the second chart? Or is it much of a muchness? 


Monday 22 February 2016

Keeping chart text horizontal: Part 2, Getting rid of vertical x-axis text in charts



Vertical text in charts is a pet peeve of mine. Actually, no. It's not. Vertical text is a substantial and entirely unnecessary barrier to the effective communication of key messages and data findings to readers, some of whom are supposed to use these messages to plan services and make policies that potentially affect all of our lives.    

Around half (52.0%) of adults surveyed in the OECD Skills Outlook 2013: First Results from the Survey of Adult Skills had numeracy skills assessed at Level 2 or below (a proportion which rises to 57.5% in England and to a whopping 61.3% in the U.S.). Around 28.0% of those with an undergraduate degree (OECD-wide) had numeracy skills assessed at Level 2 or below. 

This means that if you are displaying data to a general audience within OECD, you need to assume (bearing in mind country-level variations) that around half of them find it difficult to interpret complex charts and graphs. If you're sharing charts and graphs with policy makers and service planners, you need to assume that a fair chunk of them are in the same position. Within this context, it's flat-out foolish to make reading the chart text difficult by putting it on its side. 

Here's the thing: policy makers and service planners who look at an unnecessarily complex chart and get that panicky feeling of 'OMG I'm such an imposter! Why can't I understand this?' aren't thinking about the messages in the data. Depending on how much time they have available and how persistent they are, some of those policy makers and planners will get past their initial reaction and reason their way through: others will put it in the 'too hard' basket. They'll still be planning policies and making services - but now they'll be doing so without the evidence. And that's bad for everybody.    

So - now that I've got that rant out of the way - let's take a look at how to get rid of vertical x-axis text. 

Vertical x-axis text usually happens when a data analyst tries to cram a whole load of categories into a column chart. The (modified) stack column charts below are good examples of this.  They are taken from the OECD (2013) OECD Skills Outlook 2013: First Results from the Survey of Adult Skills. In general, this is a very useful and informative report, but some of the charts could be clearer.  

Example 1: Stack column charts with vertical x-axis labels


Example chart with vertical x axis labels


These charts show the percentages of respondents reporting changes in their workplace, broken down by country (on the x-axis) and work type/skill level (within each stack). Overall breakdowns by clerical and manual work are distinguished through their respective positioning above and below the '0' point on the y-axis. The 'total' line (which shouldn't be a line at all, as the x-axis displays categorical data) shows the overall percentage of respondents reporting changes in their workplace. There's a lot of information packed into this chart - which is great, but makes it important that the text elements are easy to read. 

The easiest way to get rid of vertical text from a stacked column chart is to turn it into a stack bar chart instead. This can be done very simply by changing the chart type from 'stack column' to 'stack bar' (or applying a custom stack bar template if you have already designed one).  If you want to use a mirrored axis in your chart (as used to chart manual workers in this example), ensure that the data series' to appear 'below' zero on the x-axis are assigned negative values.

If you're planning to use a mirrored x-axis, don't panic if you see a horror like this when you first create the chart.

Figure 1: Stack bar chart before formatting

Example of part-completed stack chart with axis text sitting on top of bars

This can quickly be turned into something more useful by taking the following steps:
  1. Reduce the white space between data series by using 'Format data series - Series options - Reduce gap width.'  I find a gap width of 50% to be about right for a bar chart (not the 150% that comes as an Excel default)
  2. Reverse the order of countries on the y-axis (so that the average appears at the top of the chart) by selecting the y-axis and, under 'Format axis - axis options - axis options,' checking the 'categories in reverse order' and selecting the 'horizontal axis crosses at maximum value' option.
  3. Whilst still in the same 'Format axis - axis options' menu, move the axis text clear of the data series by selecting the 'label' drop down and selecting label position: 'low' (not 'next to axis').
  4.  Finally, to display only positive values on the x-axis (and show that these are percentages), select the x-axis and under 'axis options - number' drop down, enter the following 'custom' format:  #,##0"%";#,##0"%"

Once those steps are taken you'll have something more like this: 

Figure 2: Stack bar chart after main formatting changes made

Depending on your Excel themes, you might still need to tweak some colours and reposition the legend - but at this point, the chart is pretty much ready to go. 

My final version looks like this: 

Figure 3: Organisational change and new technologies: Percentage of workers who reported change in their current workplace in the last three years that affected their immediate working environment

Note: Total percentages reporting change are reported in brackets by each country name. In both charts, countries are ranked in descending order of the percentage of workers with low and high clerical related skills who reported changes. This means that the order of countries varies between the two charts.

I've made a couple of other smaller tweaks along the way:
  • overall percentages of workers reporting change are given in brackets beside the country name (instead of as a separate line series on the chart). This uses a simple concatenation formula using '&', that combines the country name, a couple of little text strings for spaces and brackets, and the percentage of workers reporting change  (Here's an example:  =A8&" ("&G8&"%)"    )
  • The order of categories in the legend corresponds to the order in which data series appear on the chart (I did this as a simple cut and paste in MS Paint).
  • The OECD average is pulled out and displayed at the top of the chart, rather than within the sorted countries.
  • I've shared one legend between the two charts - again using MS Paint to stitch the different charts and the legend together. 
The rest of the chart is left largely unchanged: colours are used to distinguish manual and clerical workers, with manual workers charted to the left of the x-axis and clerical workers charted to the right.


This chart has its flaws, some of which are also found in the original. There's a risk that cursory readers will assume that the list of countries in the second chart duplicates the first, resulting in their attributing data in the second chart to the wrong country. I've reduced that risk by noting explicitly that the country labels are not duplicates - but it would be better to find a visual solution. Both bar and stack column graphs have the general disadvantage that it is hard to compare the relative size of individual data series that are not located next to the '0' on the x-axis (in this chart, for example, it is hard to see small country level differences between high skilled workers). 

Nevertheless, I think this chart is easier to read than the original. It's easier to see at a glance which countries have high proportions of clerical workers reporting change (and which countries have high overall proportions of workers reporting change).  


A caveat about the data
On looking more closely at the data table underlying these charts, I found that the data for the components within the stacks did not add precisely to the reported total (this isn't apparent in the chart itself). There were no huge differences, but it didn't look like a simple rounding issue. A bit of digging in the European Working Conditions Survey 2010 (EWCS 2010) which the data source (tables available here) suggests that PIAAC derived its breakdowns from respondent counts and 'yes'/'no' row percentages rounded to one decimal place for each occupation/skill category. They did so without taking their derived respondent counts back to the nearest whole person, meaning that the components don't quite agree with the total reported (which is taken directly from the EWCS 2010 data). Basically, they have accumulated some rounding error in recalculating the data. Among other things, that makes the second decimal place in the data tables a nice example of over-precision. 

If I were going to use this data for analysis, I'd recalculate the lot. For most countries the impact is fairly trivial, but for a few with smaller respondent counts (such as Turkey, which has a correctly reported total of 20.8%, but derived components totaling 23.12% for the proportion of workers reporting changes in processes or technology) it is more pronounced.  As my purpose here is just to demonstrate redesign options that are easier to interpret, I've just noted the discrepancy here.


Wednesday 17 February 2016

Keeping chart text horizontal: Part 1, Reformatting y-axis titles




We've all been there. Scanning through a report trying to figure out its main findings and suddenly . . .

PIAAC chart with vertically aligned text



it's off to the physiotherapist with a stiff neck . . .

Vertical text is often used in charts - but it doesn't make for easy reading. A paper copy can be flipped around, but people mostly read reports on screen. The reader might be able to rotate the screen image, but does the writer really want readers to stop thinking about the statistics and start thinking about how to fiddle about with their displays?

The chart above is taken from the OECD Skills Outlook 2013 'First Results from the Survey of Adult Skills.' It's actually a very useful, comprehensive report for people with an interest in numeracy levels within OECD countries. To my mind, that's pretty much everybody involved in displaying charts and statistics to the general public and non-expert readers in OECD countries.

A big 'take away' message from the report is that across the OECD, around 52% of adults had numeracy skills at Level 2 or below.  Those at Level 2 (around 33%) will be able to manage "basic analysis of data and statistics in texts, tables, and graphs" (p.76), but are likely to struggle with charts and tables that are more complex. This doesn't mean charts should be 'dumbed down': it does mean avoiding chart design choices that make data interpretation harder than it needs to be.

One of the best and kindest things you can do for all of your readers (surprisingly large proportions of whom may get antsy when confronted with charts and data tables) is to make sure that all the chart text goes in the same direction. Vertical chart text makes data interpretation feel like hard work, even if the actual messages in the chart are straightforward. It disrupts readers' flow and makes them more likely to skip over charts, or just get frustrated and stop reading altogether.

Euuwwgh!  . . . How did that vertical text get into my chart? How do I get rid? 


Excel defaults are a frequent culprit. There are three main types of vertical text I've come across that are encouraged by Excel defaults. Each needs to be dealt with differently.


Type 1: Vertical y-axis text

Example chart showing vertical axes chart elements
Default y-axis titles are the type of vertical text I see most often in charts. Whilst they're relatively harmless, I find them irritating precisely because they are so easy to remove. Here's how: 
  1. Decide whether or not any axis text is needed at all. Despite what's sometimes taught in school, axis titles are often redundant. A good rule of thumb is that you don't need an axis title if it makes you think 'Well duh!' and roll your eyes. 
  2. If the axis title adds information, then make it easy to read. Align it horizontally and place it at the top of the axis. In Excel 2016, this can be done by selecting 'Text direction: Horizontal' under  'Format Axis Title - Text Options - Textbox'. Then drag it where you want it to display, resize the chart and you're done. Save your chart as a template (Right-click on chart and save as template), and the next time you want to make a similar chart, the box for the axis title will already be correctly positioned.
Type 1: No more vertical text
Example chart with vertical text replaced by horizontal text


The other two types of vertical text that pop up in charts are vertical column labels (as in the example from the OECD Skills Outlook above), and vertical axis category labels (as in the bar chart below).

Type 3: Two-level axis labels in stack charts


I'll look at how to get rid of these types of vertical text in a future post. 


Update:  I've added a YouTube video demoing how to replace vertical y-axis text with horizontally aligned text here