Monday 22 February 2016

Keeping chart text horizontal: Part 2, Getting rid of vertical x-axis text in charts



Vertical text in charts is a pet peeve of mine. Actually, no. It's not. Vertical text is a substantial and entirely unnecessary barrier to the effective communication of key messages and data findings to readers, some of whom are supposed to use these messages to plan services and make policies that potentially affect all of our lives.    

Around half (52.0%) of adults surveyed in the OECD Skills Outlook 2013: First Results from the Survey of Adult Skills had numeracy skills assessed at Level 2 or below (a proportion which rises to 57.5% in England and to a whopping 61.3% in the U.S.). Around 28.0% of those with an undergraduate degree (OECD-wide) had numeracy skills assessed at Level 2 or below. 

This means that if you are displaying data to a general audience within OECD, you need to assume (bearing in mind country-level variations) that around half of them find it difficult to interpret complex charts and graphs. If you're sharing charts and graphs with policy makers and service planners, you need to assume that a fair chunk of them are in the same position. Within this context, it's flat-out foolish to make reading the chart text difficult by putting it on its side. 

Here's the thing: policy makers and service planners who look at an unnecessarily complex chart and get that panicky feeling of 'OMG I'm such an imposter! Why can't I understand this?' aren't thinking about the messages in the data. Depending on how much time they have available and how persistent they are, some of those policy makers and planners will get past their initial reaction and reason their way through: others will put it in the 'too hard' basket. They'll still be planning policies and making services - but now they'll be doing so without the evidence. And that's bad for everybody.    

So - now that I've got that rant out of the way - let's take a look at how to get rid of vertical x-axis text. 

Vertical x-axis text usually happens when a data analyst tries to cram a whole load of categories into a column chart. The (modified) stack column charts below are good examples of this.  They are taken from the OECD (2013) OECD Skills Outlook 2013: First Results from the Survey of Adult Skills. In general, this is a very useful and informative report, but some of the charts could be clearer.  

Example 1: Stack column charts with vertical x-axis labels


Example chart with vertical x axis labels


These charts show the percentages of respondents reporting changes in their workplace, broken down by country (on the x-axis) and work type/skill level (within each stack). Overall breakdowns by clerical and manual work are distinguished through their respective positioning above and below the '0' point on the y-axis. The 'total' line (which shouldn't be a line at all, as the x-axis displays categorical data) shows the overall percentage of respondents reporting changes in their workplace. There's a lot of information packed into this chart - which is great, but makes it important that the text elements are easy to read. 

The easiest way to get rid of vertical text from a stacked column chart is to turn it into a stack bar chart instead. This can be done very simply by changing the chart type from 'stack column' to 'stack bar' (or applying a custom stack bar template if you have already designed one).  If you want to use a mirrored axis in your chart (as used to chart manual workers in this example), ensure that the data series' to appear 'below' zero on the x-axis are assigned negative values.

If you're planning to use a mirrored x-axis, don't panic if you see a horror like this when you first create the chart.

Figure 1: Stack bar chart before formatting

Example of part-completed stack chart with axis text sitting on top of bars

This can quickly be turned into something more useful by taking the following steps:
  1. Reduce the white space between data series by using 'Format data series - Series options - Reduce gap width.'  I find a gap width of 50% to be about right for a bar chart (not the 150% that comes as an Excel default)
  2. Reverse the order of countries on the y-axis (so that the average appears at the top of the chart) by selecting the y-axis and, under 'Format axis - axis options - axis options,' checking the 'categories in reverse order' and selecting the 'horizontal axis crosses at maximum value' option.
  3. Whilst still in the same 'Format axis - axis options' menu, move the axis text clear of the data series by selecting the 'label' drop down and selecting label position: 'low' (not 'next to axis').
  4.  Finally, to display only positive values on the x-axis (and show that these are percentages), select the x-axis and under 'axis options - number' drop down, enter the following 'custom' format:  #,##0"%";#,##0"%"

Once those steps are taken you'll have something more like this: 

Figure 2: Stack bar chart after main formatting changes made

Depending on your Excel themes, you might still need to tweak some colours and reposition the legend - but at this point, the chart is pretty much ready to go. 

My final version looks like this: 

Figure 3: Organisational change and new technologies: Percentage of workers who reported change in their current workplace in the last three years that affected their immediate working environment

Note: Total percentages reporting change are reported in brackets by each country name. In both charts, countries are ranked in descending order of the percentage of workers with low and high clerical related skills who reported changes. This means that the order of countries varies between the two charts.

I've made a couple of other smaller tweaks along the way:
  • overall percentages of workers reporting change are given in brackets beside the country name (instead of as a separate line series on the chart). This uses a simple concatenation formula using '&', that combines the country name, a couple of little text strings for spaces and brackets, and the percentage of workers reporting change  (Here's an example:  =A8&" ("&G8&"%)"    )
  • The order of categories in the legend corresponds to the order in which data series appear on the chart (I did this as a simple cut and paste in MS Paint).
  • The OECD average is pulled out and displayed at the top of the chart, rather than within the sorted countries.
  • I've shared one legend between the two charts - again using MS Paint to stitch the different charts and the legend together. 
The rest of the chart is left largely unchanged: colours are used to distinguish manual and clerical workers, with manual workers charted to the left of the x-axis and clerical workers charted to the right.


This chart has its flaws, some of which are also found in the original. There's a risk that cursory readers will assume that the list of countries in the second chart duplicates the first, resulting in their attributing data in the second chart to the wrong country. I've reduced that risk by noting explicitly that the country labels are not duplicates - but it would be better to find a visual solution. Both bar and stack column graphs have the general disadvantage that it is hard to compare the relative size of individual data series that are not located next to the '0' on the x-axis (in this chart, for example, it is hard to see small country level differences between high skilled workers). 

Nevertheless, I think this chart is easier to read than the original. It's easier to see at a glance which countries have high proportions of clerical workers reporting change (and which countries have high overall proportions of workers reporting change).  


A caveat about the data
On looking more closely at the data table underlying these charts, I found that the data for the components within the stacks did not add precisely to the reported total (this isn't apparent in the chart itself). There were no huge differences, but it didn't look like a simple rounding issue. A bit of digging in the European Working Conditions Survey 2010 (EWCS 2010) which the data source (tables available here) suggests that PIAAC derived its breakdowns from respondent counts and 'yes'/'no' row percentages rounded to one decimal place for each occupation/skill category. They did so without taking their derived respondent counts back to the nearest whole person, meaning that the components don't quite agree with the total reported (which is taken directly from the EWCS 2010 data). Basically, they have accumulated some rounding error in recalculating the data. Among other things, that makes the second decimal place in the data tables a nice example of over-precision. 

If I were going to use this data for analysis, I'd recalculate the lot. For most countries the impact is fairly trivial, but for a few with smaller respondent counts (such as Turkey, which has a correctly reported total of 20.8%, but derived components totaling 23.12% for the proportion of workers reporting changes in processes or technology) it is more pronounced.  As my purpose here is just to demonstrate redesign options that are easier to interpret, I've just noted the discrepancy here.


No comments:

Post a Comment