I’m continuing with the politics and data visualisation theme established in my last post. However, I’ll state up front that this is not a political article. I have assiduously stayed silent [on this blog at least] on the topic of my country’s future direction, both in the lead-up to the 23rd June poll and in its aftermath. Instead, I’m going to restrict myself to making a point about data visualisation: how it can both inform and mislead.
The exhibit above is my version of one that has appeared in various publications since the referendum, both online and in print. As referenced, its two primary sources are the UK Electoral Commission and Lord Ashcroft’s polling organisation. The reason why there are two sources rather than one is explained in the notes section below.
With the caveats explained below, the chart above shows the generational divide apparent in the UK Referendum results. Those under 35 voted heavily for the UK to remain in the EU; those aged between 35 and 44 voted to stay in almost exactly the proportion in which the country as a whole voted to leave; and those aged 45 and over voted increasingly heavily to leave as their age increased.
One helpful thing about this exhibit is that it shows the proportion in which each cohort voted for each option. This means that the kind of inferences I made in the previous paragraph leap off the page. It is pretty clear (visually) that there is a massive difference between how those aged 18-24 and those aged 65+ thought about the question in front of them in the polling booth. However, while the percentage-based approach illuminates some things, it masks others. A cursory examination of the chart above might lead one to ask, based on the area covered by red rectangles, how the Leave camp prevailed. To pursue an answer to this question, let’s consider the data using a slightly tweaked version of the same visualisation, as below:
[Aside: The eagle-eyed amongst you may notice a discrepancy between the figures shown on the total bars above and the actual votes cast, which were Remain: 16,141k and Leave: 17,411k. Again, see the notes section for an explanation.]
A shift from percentages to actual votes recorded casts some light on the overall picture. It now becomes clear that, while a large majority of 18-24 year olds voted to Remain, not many people in this category actually voted. Indeed, while according to the 2011 UK Census the 18-24 age group makes up just under 12% of all people aged 18 or over (not all of whom would necessarily be eligible or registered to vote), the Ashcroft figures suggest that well under half of this group cast a ballot, compared with much higher turnouts among older voters (once more, see the notes section for caveats).
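For readers who like to see the arithmetic, the turnout comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the exact calculation behind my charts: it assumes that an age band’s share of the poll sample equals its share of everyone who voted, applies that share to the 33,551,983 Remain and Leave votes counted, and divides the result by the band’s population. The function name and the example figures are hypothetical, not the actual Ashcroft or Census numbers.

```python
# Minimal sketch: implied turnout for one age band.
# Assumption: the band's share of poll respondents equals its share of all voters.

TOTAL_VOTES = 33_551_983  # Remain + Leave votes counted (Electoral Commission)


def implied_turnout(band_share_of_voters: float, band_population: int,
                    total_votes: int = TOTAL_VOTES) -> float:
    """Return estimated votes cast by an age band divided by the band's population.

    band_share_of_voters: fraction of poll respondents falling in the band (0-1).
    band_population: number of people in the band, e.g. from a census.
    """
    if band_population <= 0:
        raise ValueError("band_population must be positive")
    estimated_votes = band_share_of_voters * total_votes
    return estimated_votes / band_population


# Hypothetical example: a band making up 7% of voters, with 5 million members.
turnout = implied_turnout(0.07, 5_000_000)
print(f"Implied turnout: {turnout:.1%}")
```

Because a census population includes people who were not registered, and the 2011 Census predates the vote by five years, this ratio is a rough indicator at best; an implied turnout close to or above 100% would be a sign that one of the inputs is suspect.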
This observation rather blunts the assertion that the old voted in ways that potentially disadvantaged the young; the young had every opportunity to make their voice heard more clearly, but didn’t take it. Reasons for this youthful disengagement from the political process are of course beyond the scope of this article.
However, it is still hard (at least for this author’s eyes) to get the full picture from the second chart. In order to get a more visceral feel for the dynamics of the vote, I have turned to the much-maligned pie chart and, more specifically, to its even less loved “exploded” variant.
Here the weight of both the 65+ and 55-64 Leave votes stands out, as does the paucity of the overall 18-24 contribution: its two slices are the only ones too small to accommodate an internal data label. This exhibit immediately shows where the referendum was won and lost in a way that is harder to glean from a bar chart.
While I selected an exploded pie chart primarily for reasons of clarity, the fact that the final exhibit brings to mind a shattered and reassembled Union Flag perhaps also reflects an artistic choice. Unfortunately, this resemblance seems all too likely to prove prophetic in the coming months and years.
I have used age group distributions from the Ashcroft Polling organisation to create these exhibits. Other sites, notably the BBC, have done the same, and my figures reconcile with those published elsewhere. However, based on further analysis, I have some reason to think that either there are issues with the Ashcroft data, or that I have used it in ways that its compilers did not intend. Either way, the Ashcroft numbers imply that close to 100% of 55-64 year olds voted in the UK Referendum, which seems very, very unlikely. I have contacted the Ashcroft Polling organisation about this and will post any reply that I receive.
– Peter James Thomas, 14th July 2016
Caveat: I am neither a professional political pollster nor a statistician. Instead, I’m a Pure Mathematician with a basic understanding of some elements of both these areas. For this reason, the following commentary may not be 100% rigorous; however, my hope is that it is nevertheless informative.
In the wake of the UK Referendum on EU membership, many attempts were made to explain the result. Several of these used splits of the vote by demographic attributes to buttress their arguments. All of the exhibits in this article use age bands, one type of demographic indicator. Analyses posted elsewhere looked at the influence on voting patterns of factors such as the UK’s social grade classifications (A, B, C1 etc.), the number of immigrants in a given part of the country, and the relative prosperity of different areas and how this has changed over time. Other typical demographic dimensions might include gender, educational achievement or ethnicity.
However, no demographic information was captured as part of the UK referendum process. There is no central system that takes a unique voting ID and attaches attributes to it, allowing demographic slicing and dicing. (To be sure, a partial and optional version of this, the exit poll, is carried out as people leave polling stations at a General Election, but this was not done during the recent referendum.)
So, how did so many demographic analyses suddenly appear? To offer some sort of answer, I’ll take you through how I built the data set behind the exhibits in this article. At the beginning I mentioned that I relied on two data sources: the actual results published by the UK Electoral Commission and the results of polling carried out by Lord Ashcroft’s organisation. The latter covered interviews with 12,369 people selected to match what was anticipated to be the demographic profile of those actually voting. As with most statistical work, properly selecting a sample with no inherent biases (e.g. one with the same proportion of people aged 65 or over as in the wider electorate) is generally the key to an accurate outcome.
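Where a raw sample does not quite match the expected profile, pollsters can correct for this by weighting. For the curious, here is a minimal Python sketch of simple post-stratification weighting, a common textbook approach: each respondent in a group is weighted by the group’s target share divided by its share of the sample. I am not claiming this is the method Lord Ashcroft’s organisation used; the function name, group labels and numbers are all hypothetical.

```python
# Minimal sketch of post-stratification weighting (a generic technique; not
# necessarily what any particular polling organisation does).


def poststrat_weights(sample_counts: dict[str, int],
                      target_shares: dict[str, float]) -> dict[str, float]:
    """Per-respondent weight for each group so weighted group shares match targets.

    sample_counts: respondents per group in the sample (each must be > 0).
    target_shares: each group's expected share of the population being
                   modelled; the shares should sum to 1.
    """
    n = sum(sample_counts.values())
    return {group: target_shares[group] / (count / n)
            for group, count in sample_counts.items()}


# Hypothetical example: 65+ voters are under-represented in the raw sample.
sample = {"18-64": 800, "65+": 200}
targets = {"18-64": 0.75, "65+": 0.25}
weights = poststrat_weights(sample, targets)
for group, w in weights.items():
    print(f"{group}: weight {w:.3f}, weighted count {w * sample[group]:.0f}")
```

After weighting, the 200 respondents aged 65+ count as 250 and the 800 younger respondents as 750, so the weighted sample has the target 25:75 age split.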
Importantly, demographic information is known about the sample (which may also be reweighted based on interview feedback), and my charts are created by assuming that what holds true for the sample also holds true for the electorate. So if X% of 18-24 year olds in the sample voted Remain, the assumption is that X% of all 18-24 year olds who voted did the same.
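The scaling step described above can be sketched in Python as follows. This is a minimal illustration under the stated assumption that sample proportions hold for everyone who voted: each band’s share of voters is applied to the total of Remain and Leave votes, and the band’s Remain share then splits those votes. The function name and the two-band split are hypothetical, not the Ashcroft figures.

```python
# Minimal sketch: scale sample proportions up to actual vote counts.
# Assumption: shares observed in the poll sample hold for everyone who voted.

TOTAL_VOTES = 33_551_983  # Remain + Leave votes counted


def estimate_votes(bands: dict[str, tuple[float, float]],
                   total_votes: int = TOTAL_VOTES) -> dict[str, tuple[float, float]]:
    """Map each age band to estimated (Remain, Leave) votes.

    bands: band label -> (band's share of all voters, share of band voting Remain),
           both as fractions between 0 and 1.
    """
    estimates = {}
    for band, (voter_share, remain_share) in bands.items():
        band_votes = voter_share * total_votes  # estimated votes cast by the band
        remain = remain_share * band_votes
        estimates[band] = (remain, band_votes - remain)
    return estimates


# Hypothetical split, purely for illustration.
example = {"younger band": (0.1, 0.7), "older band": (0.9, 0.45)}
for band, (remain, leave) in estimate_votes(example).items():
    print(f"{band}: Remain {remain:,.0f}, Leave {leave:,.0f}")
```

Even in this toy example, the band with the strong 70% Remain share contributes far fewer Remain votes than the much larger band with a 45% share, which is the effect that switching from percentages to vote counts brings out.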
A sample of more than 12,000 is a good size for this type of exercise, and I have no reason to believe that Lord Ashcroft’s people were anything other than professional in selecting the sample members and adjusting their models accordingly. However, this is not the same as having definitive information about everyone who voted. So every exhibit you see relating to the age of referendum voters, their gender or their social grade is based on estimates. This is a fact that news organisations seldom seem to emphasise.
The use of Lord Ashcroft’s sample also explains why the Leave and Remain totals on my second exhibit differ from the actual vote counts. This is because 5,949 / 12,369 = 48.096% (the sample figures for Remain), whereas 16,141,241 / 33,551,983 = 48.108% (the actual voting figures for Remain). Both figures round to 48.1%, but the difference of about 0.012 percentage points, when applied to nearly 33.6 million votes, amounts to roughly 4,000 votes.
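This arithmetic can be checked with a few lines of Python. All four inputs are the figures quoted in the paragraph above; the only step I have added is applying the sample’s Remain share to the total of Remain and Leave votes. This is a simplified stand-in for building chart totals from sample proportions (the chart itself works band by band, so its totals need not match this single figure exactly).

```python
# Check the rounding discrepancy using the figures quoted above.

SAMPLE_REMAIN = 5_949        # Remain voters in the Ashcroft sample
SAMPLE_TOTAL = 12_369        # total people in the sample
ACTUAL_REMAIN = 16_141_241   # Remain votes counted
ACTUAL_TOTAL = 33_551_983    # Remain + Leave votes counted

sample_pct = 100 * SAMPLE_REMAIN / SAMPLE_TOTAL
actual_pct = 100 * ACTUAL_REMAIN / ACTUAL_TOTAL

# Applying the sample's Remain share to every vote cast.
estimated_remain = ACTUAL_TOTAL * SAMPLE_REMAIN / SAMPLE_TOTAL
shortfall = ACTUAL_REMAIN - estimated_remain

print(f"Sample: {sample_pct:.3f}%, actual: {actual_pct:.3f}%")
print(f"Sample-based Remain estimate: {estimated_remain:,.0f} "
      f"({shortfall:,.0f} below the actual count)")
```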