Curiouser and Curiouser – The Limits of Brexit Voting Analysis

An original illustration from Charles Lutwidge Dodgson's seminal work would have been better, but sadly none such seems to be extant
 
Down the Rabbit-hole

When I posted my Brexit infographic reflecting the age of voters an obvious extension was to add an indication of the number of people in each age bracket who did not vote as well as those who did. This seemed a relatively straightforward task, but actually proved to be rather troublesome (this may be an example of British understatement). Maybe the caution I gave about statistical methods having a large impact on statistical outcomes in An Inconvenient Truth should have led me to expect such issues. In any case, I thought that it would be instructive to talk about the problems I stumbled across and to – once again – emphasise the perils of over-extending statistical models.

Brexit ages infographic
Click to download a larger PDF version in a new window.

Regular readers will recall that my Brexit Infographic (reproduced above) leveraged data from an earlier article, A Tale of two [Brexit] Data Visualisations. As cited in this article, the numbers used were from two sources:

  1. The UK Electoral Commission – I got the overall voting numbers from here.
  2. Lord Ashcroft’s Poling organisation – I got the estimated distribution of votes by age group from here.

In the notes section of A Tale of two [Brexit] Data Visualisations I [prophetically] stated that the breakdown of voting by age group was just an estimate. Based on what I have discovered since, I’m rather glad that I made this caveat explicit.
 
 
The Pool of Tears

In order to work out the number of people in each age bracket who did not vote, an obvious starting point would be the overall electorate, which the UK Electoral Commission stated as being 46,500,001. As we know that 33,551,983 people voted (an actual figure rather than an estimate), then this is where the turnout percentage of 72.2% (actually 72.1548%) came from (33,551,983 / 45,500,001).

A clarifying note, the electorate figures above refer to people who are eligible to vote. Specifically, in order to vote in the UK Referendum, people had to meet the following eligibility criteria (again drawn from the UK Electoral Commission):

To be eligible to vote in the EU Referendum, you must be:

  • A British or Irish citizen living in the UK, or
  • A Commonwealth citizen living in the UK who has leave to remain in the UK or who does not require leave to remain in the UK, or
  • A British citizen living overseas who has been registered to vote in the UK in the last 15 years, or
  • An Irish citizen living overseas who was born in Northern Ireland and who has been registered to vote in Northern Ireland in the last 15 years.

EU citizens are not eligible to vote in the EU Referendum unless they also meet the eligibility criteria above.

So far, so simple. The next thing I needed to know was how the electorate was split by age. This is where we begin to run into problems. One place to start is the actual population of the UK as at the last census (2011). This is as follows:
 

Ages (years) Population % of total
0–4 3,914,000 6.2
5–9 3,517,000 5.6
10–14 3,670,000 5.8
15–19 3,997,000 6.3
20–24 4,297,000 6.8
25–29 4,307,000 6.8
30–34 4,126,000 6.5
35–39 4,194,000 6.6
40–44 4,626,000 7.3
45–49 4,643,000 7.3
50–54 4,095,000 6.5
55–59 3,614,000 5.7
60–64 3,807,000 6.0
65–69 3,017,000 4.8
70–74 2,463,000 3.9
75–79 2,006,000 3.2
80–84 1,496,000 2.4
85–89 918,000 1.5
90+ 476,000 0.8
Total 63,183,000 100.0

 
If I roll up the above figures to create the same age groups as in the Ashcroft analysis (something that requires splitting the 15-19 range, which I have assumed can be done uniformly), I get:
 

Ages (years) Population % of total
0-17 13,499,200 21.4
18-24 5,895,800 9.3
25-34 8,433,000 13.3
35-44 8,820,000 14.0
45-54 8,738,000 13.8
55-64 7,421,000 11.7
65+ 10,376,000 16.4
Total 63,183,000 100.0

 
The UK Government isn’t interested in the views of people under 18[citation needed], so eliminating this row we get:
 

Ages (years) Population % of total
18-24 5,895,800 11.9
25-34 8,433,000 17.0
35-44 8,820,000 17.8
45-54 8,738,000 17.6
55-64 7,421,000 14.9
65+ 10,376,000 20.9
Total 49,683,800 100.0

 
As mentioned, the above figures are from 2011 and the UK population has grown since then. Web-site WorldOMeters offers an extrapolated population of 65,124,383 for the UK in 2016 (this is as at 12th July 2016; if extrapolation and estimates make you queasy, I’d suggest closing this article now!). I’m going to use a rounder figure of 65,125,000 people; there is no point pretending that precision exists where it clearly doesn’t. Making the assumption that such growth is uniform across all age groups (please refer to my previous bracketed comment!), then the above exhibit can also be extrapolated to give us:
 

Ages (years) Population % of total
18-24 6,077,014 11.9
25-34 8,692,198 17.0
35-44 9,091,093 17.8
45-54 9,006,572 17.6
55-64 7,649,093 14.9
65+ 10,694,918 20.9
Total 51,210,887 100.0

 
 
Looking Glass House

So our – somewhat fabricated – figure for the 18+ UK population in 2016 is 51,210,887, let’s just call this 51,200,000. As at the beginning of this article the electorate for the 2016 UK Referendum was 45,500,000 (dropping off the 1 person with apologies to him or her). The difference is explicable based on the eligibility criteria quoted above. I now have a rough age group break down of the 51.2 million population, how best to apply this to the 45.5 million electorate?

I’ll park this question for the moment and instead look to calculate a different figure. Based on the Ashcroft model, what percentage of the UK population (i.e. the 51.2 million) voted in each age group? We can work this one out without many complications as follows:
 

Ages (years)
 
Population
(A)
Voted
(B)
Turnout %
(B/A)
18-24 6,077,014 1,701,067 28.0
25-34 8,692,198 4,319,136 49.7
35-44 9,091,093 5,656,658 62.2
45-54 9,006,572 6,535,678 72.6
55-64 7,649,093 7,251,916 94.8
65+ 10,694,918 8,087,528 75.6
Total 51,210,887 33,551,983 65.5

(B) = Size of each age group in the Ashcroft sample as a percentage multiplied by the total number of people voting (see A Tale of two [Brexit] Data Visualisations).
 
Remember here that actual turnout figures have electorate as the denominator, not population. As the electorate is less than the population, this means that all of the turnout percentages should actually be higher than the ones calculated (e.g. the overall turnout with respect to electorate is 72.2% whereas my calculated turnout with respect to population is 65.5%). So given this, how to explain the 94.8% turnout of 55-64 year olds? To be sure this group does reliably turn out to vote, but did essentially all of them (remembering that the figures in the above table are too low) really vote in the referendum? This seems less than credible.

The turnout for 55-64 year olds in the 2015 General Election has been estimated at 77%, based on an overall turnout of 66.1% (web-site UK Political Info; once more these figures will have been created based on techniques similar to the ones I am using here). If we assume a uniform uplift across age ranges (that “assume” word again!) then one might deduce that an increase in overall turnout from 66.1% to 72.2%, might lead to the turnout in the 55-64 age bracket increasing from 77% to 84%. 84% turnout is still very high, but it is at least feasible; close to 100% turnout in from this age group seems beyond the realms of likelihood.

So what has gone wrong? Well so far the only culprit I can think of is the distribution of voting by age group in the Ashcroft poll. To be clear here, I’m not accusing Lord Ashcroft and his team of sloppy work. Instead I’m calling out that the way that I have extrapolated their figures may not be sustainable. Indeed, if my extrapolation is valid, this would imply that the Ashcroft model over estimated the proportion of 55-64 year olds voting. Thus it must have underestimated the proportion of voters in some other age group. Putting aside the likely fact that I have probably used their figures in an unintended manner, could it be that the much-maligned turnout of younger people has been misrepresented?

To test the validity of this hypothesis, I turned to a later poll by Omnium. To be sure this was based on a sample size of around 2,000 as opposed to Ashcroft’s 12,000, but it does paint a significantly different picture. Their distribution of voter turnout by age group was as follows:
 

Ages (years) Turnout %
18-24 64
25-39 65
40-54 66
55-64 74
65+ 90

 
I have to say that the Omnium age groups are a bit idiosyncratic, so I have taken advantage of the fact that the figures for 25-54 are essentially the same to create a schedule that matches the Ashcroft groups as follows:
 

Ages (years) Turnout %
18-24 64
25-34 65
35-44 65
45-54 65
55-64 74
65+ 90

 
The Omnium model suggests that younger voters may have turned out in greater numbers than might be thought based on the Ashcroft data. In turn this would suggest that a much greater percentage of 18-24 year olds turned out for the Referendum (64%) than for the last General Election (43%); contrast this with an estimated 18-24 turnout figure of 47% based on the just increase in turnout between the General Election and the Referendum. The Omnium estimates do still however recognise that turnout was still greater in the 55+ brackets, which supports the pattern seen in other elections.
 
 
Humpty Dumpty

While it may well be that the Leave / Remain splits based on the Ashcroft figures are reasonable, I’m less convinced that extrapolating these same figures to make claims about actual voting numbers by age group (as I have done) is tenable. Perhaps it would be better to view each age cohort as a mini sample to be treated independently. Based on the analysis above, I doubt that the turnout figures I have extrapolated from the Ashcroft breakdown by age group are robust. However, that is not the same as saying that the Ashcroft data is flawed, or that the Omnium figures are correct. Indeed the Omnium data (at least those elements published on their web-site) don’t include an analysis of whether the people in their sample voted Leave or Remain, so direct comparison is not going to be possible. Performing calculation gymnastics such as using the Omnium turnout for each age group in combination with the Ashcroft voting splits for Leave and Remain for the same age groups actually leads to a rather different Referendum result, so I’m not going to plunge further down this particular rabbit hole.

In summary, my supposedly simple trip to the destitution of an enhanced Brexit Infographic has proved unexpectedly arduous, winding and beset by troubles. These challenges have proved so great that I’ve abandoned the journey and will be instead heading for home.
 
 
Which dreamed it?

Based on my work so far, I have severe doubts about the accuracy of some of the age-based exhibits I have published (versions of which have also appeared on many web-sites, the BBC to offer just one example, scroll down to “How different age groups voted” and note that the percentages cited reconcile to mine). I believe that my logic and calculations are sound, but it seems that I am making too many assumptions about how I can leverage the Ashcroft data. After posting this article, I will accordingly go back and annotate each of my previous posts and link them to these later findings.

I think the broader lesson to be learnt is that estimates are just that, attempts (normally well-intentioned of course) to come up with figures where the actual numbers are not accessible. Sometimes this is a very useful – indeed indispensable – approach, sometimes it is less helpful. In either case estimation should always be approached with caution and the findings ideally sense-checked in the way that I have tried to do above.

Occam’s razor would suggest that when the stats tell you something that seems incredible, then 99 times out of 100 there is an error or inaccurate assumption buried somewhere in the model. This applies when you are creating the model yourself and doubly so where you are relying upon figures calculated by other people. In the latter case not only is there the risk of their figures being inaccurate, there is the incremental risk that you interpret them wrongly, or stretch their broader application to breaking point. I was probably guilty of one or more of the above sins in my earlier articles. I’d like my probable misstep to serve as a warning to other people when they too look to leverage statistics in new ways.

A further point is the most advanced concepts I have applied in my calculations above are addition, subtraction, multiplication and division. If these basic operations – even in the hands of someone like me who is relatively familiar with them – can lead to the issues described above, just imagine what could result from the more complex mathematical techniques (e.g. ambition, distraction, uglification and derision) used by even entry-level data scientists. This perhaps suggests an apt aphorism: Caveat calculator!

Beware the Jabberwock, my son! // The jaws that bite, the claws that catch! // Beware the Jubjub bird, and shun // The frumious Bandersnatch!

 

2 thoughts on “Curiouser and Curiouser – The Limits of Brexit Voting Analysis

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s