Curiouser and Curiouser – The Limits of Brexit Voting Analysis

An original illustration from Charles Lutwidge Dodgson's seminal work would have been better, but sadly none such seems to be extant
Down the Rabbit-hole

When I posted my Brexit infographic reflecting the age of voters, an obvious extension was to add an indication of the number of people in each age bracket who did not vote as well as those who did. This seemed a relatively straightforward task, but actually proved to be rather troublesome (this may be an example of British understatement). Maybe the caution I gave in An Inconvenient Truth, about statistical methods having a large impact on statistical outcomes, should have led me to expect such issues. In any case, I thought that it would be instructive to talk about the problems I stumbled across and to – once again – emphasise the perils of over-extending statistical models.

Brexit ages infographic

Regular readers will recall that my Brexit Infographic (reproduced above) leveraged data from an earlier article, A Tale of two [Brexit] Data Visualisations. As cited in this article, the numbers used were from two sources:

  1. The UK Electoral Commission – I got the overall voting numbers from here.
  2. Lord Ashcroft’s Polling organisation – I got the estimated distribution of votes by age group from here.

In the notes section of A Tale of two [Brexit] Data Visualisations I [prophetically] stated that the breakdown of voting by age group was just an estimate. Based on what I have discovered since, I’m rather glad that I made this caveat explicit.
The Pool of Tears

In order to work out the number of people in each age bracket who did not vote, an obvious starting point would be the overall electorate, which the UK Electoral Commission stated as being 46,500,001. As we know that 33,551,983 people voted (an actual figure rather than an estimate), this is where the turnout percentage of 72.2% (more precisely 72.1548%) came from (33,551,983 / 46,500,001).

A clarifying note: the electorate figures above refer to people who are eligible to vote. Specifically, in order to vote in the UK Referendum, people had to meet the following eligibility criteria (again drawn from the UK Electoral Commission):

To be eligible to vote in the EU Referendum, you must be:

  • A British or Irish citizen living in the UK, or
  • A Commonwealth citizen living in the UK who has leave to remain in the UK or who does not require leave to remain in the UK, or
  • A British citizen living overseas who has been registered to vote in the UK in the last 15 years, or
  • An Irish citizen living overseas who was born in Northern Ireland and who has been registered to vote in Northern Ireland in the last 15 years.

EU citizens are not eligible to vote in the EU Referendum unless they also meet the eligibility criteria above.

So far, so simple. The next thing I needed to know was how the electorate was split by age. This is where we begin to run into problems. One place to start is the actual population of the UK as at the last census (2011). This is as follows:

Ages (years) Population % of total
0–4 3,914,000 6.2
5–9 3,517,000 5.6
10–14 3,670,000 5.8
15–19 3,997,000 6.3
20–24 4,297,000 6.8
25–29 4,307,000 6.8
30–34 4,126,000 6.5
35–39 4,194,000 6.6
40–44 4,626,000 7.3
45–49 4,643,000 7.3
50–54 4,095,000 6.5
55–59 3,614,000 5.7
60–64 3,807,000 6.0
65–69 3,017,000 4.8
70–74 2,463,000 3.9
75–79 2,006,000 3.2
80–84 1,496,000 2.4
85–89 918,000 1.5
90+ 476,000 0.8
Total 63,183,000 100.0

If I roll up the above figures to create the same age groups as in the Ashcroft analysis (something that requires splitting the 15-19 range, which I have assumed can be done uniformly), I get:

Ages (years) Population % of total
0-17 13,499,200 21.4
18-24 5,895,800 9.3
25-34 8,433,000 13.3
35-44 8,820,000 14.0
45-54 8,738,000 13.8
55-64 7,421,000 11.7
65+ 10,376,000 16.4
Total 63,183,000 100.0

The UK Government isn’t interested in the views of people under 18[citation needed], so eliminating this row we get:

Ages (years) Population % of total
18-24 5,895,800 11.9
25-34 8,433,000 17.0
35-44 8,820,000 17.8
45-54 8,738,000 17.6
55-64 7,421,000 14.9
65+ 10,376,000 20.9
Total 49,683,800 100.0
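The re-banding and the removal of under-18s can be sketched in a few lines of Python. This is a minimal illustration, not necessarily how the tables above were actually built. It assumes, as the text does, that people are spread evenly across the years of each five-year census band. It also arbitrarily caps the open-ended 90+ band at age 120 so that every band has a width. The cap has no effect here, because 90+ falls wholly inside 65+. The helper names are illustrative only.

```python
# Minimal sketch: re-band 2011 census counts into the Ashcroft age groups.
# Assumption: each five-year band is spread uniformly across its years of age.

# 2011 census bands (inclusive ages) -> population, copied from the table above.
# The open-ended "90+" band is capped at 120 purely so that it has a width.
CENSUS_2011 = {
    (0, 4): 3_914_000, (5, 9): 3_517_000, (10, 14): 3_670_000,
    (15, 19): 3_997_000, (20, 24): 4_297_000, (25, 29): 4_307_000,
    (30, 34): 4_126_000, (35, 39): 4_194_000, (40, 44): 4_626_000,
    (45, 49): 4_643_000, (50, 54): 4_095_000, (55, 59): 3_614_000,
    (60, 64): 3_807_000, (65, 69): 3_017_000, (70, 74): 2_463_000,
    (75, 79): 2_006_000, (80, 84): 1_496_000, (85, 89): 918_000,
    (90, 120): 476_000,
}

# Target groups: 0-17 plus the six Ashcroft groups ("65+" capped the same way).
TARGET_GROUPS = [(0, 17), (18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 120)]


def overlap_years(a, b):
    """Whole years of age shared by two inclusive (low, high) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def roll_up(bands, groups):
    """Re-band counts, allocating each band pro rata to the years it shares with a group."""
    result = {}
    for group in groups:
        total = sum(pop * overlap_years(band, group) / (band[1] - band[0] + 1)
                    for band, pop in bands.items())
        result[group] = round(total)
    return result


def label(group):
    """Display label, e.g. '18-24' or '65+'."""
    return f"{group[0]}+" if group[1] >= 120 else f"{group[0]}-{group[1]}"


grouped = roll_up(CENSUS_2011, TARGET_GROUPS)
adults = {g: n for g, n in grouped.items() if g[0] >= 18}  # drop the 0-17 row
adult_total = sum(adults.values())

for g, n in adults.items():
    print(f"{label(g):>6} {n:>12,} {100 * n / adult_total:5.1f}")
print(f" Total {adult_total:>12,} 100.0")
```

The 15-19 band is the only one that straddles a boundary. Three fifths of it goes to 0-17 and two fifths to 18-24, which gives the 13,499,200 and 5,895,800 figures in the tables.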

As mentioned, the above figures are from 2011 and the UK population has grown since then. The website WorldOMeters offers an extrapolated population of 65,124,383 for the UK in 2016 (this is as at 12th July 2016; if extrapolation and estimates make you queasy, I’d suggest closing this article now!). I’m going to use a rounder figure of 65,125,000 people; there is no point pretending that precision exists where it clearly doesn’t. Making the assumption that such growth is uniform across all age groups (please refer to my previous bracketed comment!), the above exhibit can then be extrapolated to give us:

Ages (years) Population % of total
18-24 6,077,014 11.9
25-34 8,692,198 17.0
35-44 9,091,093 17.8
45-54 9,006,572 17.6
55-64 7,649,093 14.9
65+ 10,694,918 20.9
Total 51,210,887 100.0
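The uniform-growth extrapolation amounts to a single scaling factor, 65,125,000 / 63,183,000 (roughly 1.0307), applied to every adult age group. The sketch below makes that assumption explicit. It applies the whole-population growth ratio unchanged to each group, which is exactly the assumption the text flags as shaky.

```python
# Minimal sketch: extrapolate the 2011 18+ age groups to 2016 by uniform scaling.
# Assumption: every age group grew at the same rate as the whole UK population.

ADULTS_2011 = {  # 18+ population by Ashcroft age group, 2011 census (table above)
    "18-24": 5_895_800, "25-34": 8_433_000, "35-44": 8_820_000,
    "45-54": 8_738_000, "55-64": 7_421_000, "65+": 10_376_000,
}
TOTAL_2011 = 63_183_000  # whole UK population, all ages, 2011 census
TOTAL_2016 = 65_125_000  # rounded WorldOMeters extrapolation for July 2016

growth = TOTAL_2016 / TOTAL_2011  # one factor applied to every group

adults_2016 = {group: round(n * growth) for group, n in ADULTS_2011.items()}
# Round the total from the unrounded figures rather than summing rounded ones.
adult_total_2016 = round(sum(ADULTS_2011.values()) * growth)

for group, n in adults_2016.items():
    print(f"{group:>6} {n:>12,} {100 * n / adult_total_2016:5.1f}")
print(f" Total {adult_total_2016:>12,} 100.0")
```

Every group is multiplied by the same factor, so the percentage column cannot change from the previous table. Summing the six rounded group figures gives 51,210,888 rather than 51,210,887; the one-person gap is purely a rounding artefact.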

Looking Glass House

So our – somewhat fabricated – figure for the 18+ UK population in 2016 is 51,210,887; let’s just call this 51,200,000. As noted at the beginning of this article, the electorate for the 2016 UK Referendum was 46,500,000 (dropping off the 1 person with apologies to him or her). The difference is explicable based on the eligibility criteria quoted above. I now have a rough age-group breakdown of the 51.2 million population; how best to apply this to the 46.5 million electorate?

I’ll park this question for the moment and instead look to calculate a different figure. Based on the Ashcroft model, what percentage of the UK population (i.e. the 51.2 million) voted in each age group? We can work this one out without many complications as follows:

Ages (years) Population (A) Voters (B) Turnout % (B / A)
18-24 6,077,014 1,701,067 28.0
25-34 8,692,198 4,319,136 49.7
35-44 9,091,093 5,656,658 62.2
45-54 9,006,572 6,535,678 72.6
55-64 7,649,093 7,251,916 94.8
65+ 10,694,918 8,087,528 75.6
Total 51,210,887 33,551,983 65.5

(B) = Size of each age group in the Ashcroft sample as a percentage multiplied by the total number of people voting (see A Tale of two [Brexit] Data Visualisations).
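The turnout column is simply Voters (B) divided by Population (A). The sketch below takes both columns as given; B was already derived from the Ashcroft sample shares. It then recomputes the percentages. Note that the denominator is the extrapolated 18+ population, not the electorate.

```python
# Minimal sketch: turnout by age group as voters / population, using the
# 2016 extrapolated 18+ population (A) and estimated voters (B) from the table above.

POPULATION_2016 = {"18-24": 6_077_014, "25-34": 8_692_198, "35-44": 9_091_093,
                   "45-54": 9_006_572, "55-64": 7_649_093, "65+": 10_694_918}
VOTERS = {"18-24": 1_701_067, "25-34": 4_319_136, "35-44": 5_656_658,
          "45-54": 6_535_678, "55-64": 7_251_916, "65+": 8_087_528}
TOTAL_ADULTS_2016 = 51_210_887
TOTAL_VOTES = 33_551_983  # actual referendum votes cast


def turnout_pct(voters, population):
    """Turnout as a percentage of the given denominator, to one decimal place."""
    return round(100 * voters / population, 1)


for group, population in POPULATION_2016.items():
    print(f"{group:>6} {turnout_pct(VOTERS[group], population):5.1f}")
print(f" Total {turnout_pct(TOTAL_VOTES, TOTAL_ADULTS_2016):5.1f}")
```

As a sanity check, the six group figures in B sum exactly to the 33,551,983 votes actually cast.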
Remember here that actual turnout figures have electorate as the denominator, not population. As the electorate is less than the population, this means that all of the turnout percentages should actually be higher than the ones calculated (e.g. the overall turnout with respect to electorate is 72.2% whereas my calculated turnout with respect to population is 65.5%). Given this, how are we to explain the 94.8% turnout of 55-64 year olds? To be sure, this group does reliably turn out to vote, but did essentially all of them (remembering that the figures in the above table are too low) really vote in the referendum? This seems less than credible.

The turnout for 55-64 year olds in the 2015 General Election has been estimated at 77%, based on an overall turnout of 66.1% (source: the UK Political Info website; once more, these figures will have been created using techniques similar to the ones I am using here). If we assume a uniform uplift across age ranges (that “assume” word again!), then one might deduce that an increase in overall turnout from 66.1% to 72.2% might lead to the turnout in the 55-64 age bracket increasing from 77% to 84%. 84% turnout is still very high, but it is at least feasible; close to 100% turnout from this age group seems beyond the realms of likelihood.
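The uplift arithmetic can be made explicit as follows. This sketch assumes that "uniform uplift" means scaling each group's 2015 turnout by the same ratio of overall turnouts (72.2 / 66.1). That reading reproduces both the 84% figure here and the 47% figure for 18-24 year olds cited later. An additive uplift of 6.1 percentage points would instead give roughly 83% and 49%.

```python
# Minimal sketch: scale a group's 2015 General Election turnout by the ratio of
# overall turnouts. Assumption: "uniform uplift" means the same proportional uplift.

def uplifted_turnout(group_2015, overall_2015, overall_2016):
    """Estimate a group's 2016 turnout (%) by proportional scaling of its 2015 turnout."""
    return group_2015 * overall_2016 / overall_2015


OVERALL_GE_2015 = 66.1   # % overall turnout, 2015 General Election
OVERALL_REF_2016 = 72.2  # % overall turnout, 2016 Referendum

print(round(uplifted_turnout(77.0, OVERALL_GE_2015, OVERALL_REF_2016), 1))  # 55-64: 84.1
print(round(uplifted_turnout(43.0, OVERALL_GE_2015, OVERALL_REF_2016), 1))  # 18-24: 47.0
```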

So what has gone wrong? So far the only culprit I can think of is the distribution of voting by age group in the Ashcroft poll. To be clear, I’m not accusing Lord Ashcroft and his team of sloppy work. Instead I’m pointing out that the way I have extrapolated their figures may not be sustainable. Indeed, if my extrapolation is valid, it would imply that the Ashcroft model overestimated the proportion of voters who were aged 55-64, and thus that it must have underestimated the proportion of voters in some other age group. Putting aside the likelihood that I have used their figures in an unintended manner, could it be that the much-maligned turnout of younger people has been misrepresented?

To test the validity of this hypothesis, I turned to a later poll by Omnium. To be sure, this was based on a sample size of around 2,000 as opposed to Ashcroft’s 12,000, but it does paint a significantly different picture. Their distribution of voter turnout by age group was as follows:

Ages (years) Turnout %
18-24 64
25-39 65
40-54 66
55-64 74
65+ 90

I have to say that the Omnium age groups are a bit idiosyncratic, so I have taken advantage of the fact that the figures for 25-54 are essentially the same to create a schedule that matches the Ashcroft groups as follows:

Ages (years) Turnout %
18-24 64
25-34 65
35-44 65
45-54 65
55-64 74
65+ 90

The Omnium model suggests that younger voters may have turned out in greater numbers than might be thought based on the Ashcroft data. In turn this would suggest that a much greater percentage of 18-24 year olds turned out for the Referendum (64%) than for the last General Election (43%); contrast this with an estimated 18-24 turnout figure of 47% based purely on the increase in overall turnout between the General Election and the Referendum. The Omnium estimates do, however, still show turnout as greater in the 55+ brackets, which supports the pattern seen in other elections.
Humpty Dumpty

While it may well be that the Leave / Remain splits based on the Ashcroft figures are reasonable, I’m less convinced that extrapolating these same figures to make claims about actual voting numbers by age group (as I have done) is tenable. Perhaps it would be better to view each age cohort as a mini sample to be treated independently. Based on the analysis above, I doubt that the turnout figures I have extrapolated from the Ashcroft breakdown by age group are robust. However, that is not the same as saying that the Ashcroft data is flawed, or that the Omnium figures are correct. Indeed the Omnium data (at least those elements published on their web-site) don’t include an analysis of whether the people in their sample voted Leave or Remain, so direct comparison is not going to be possible. Performing calculation gymnastics such as using the Omnium turnout for each age group in combination with the Ashcroft voting splits for Leave and Remain for the same age groups actually leads to a rather different Referendum result, so I’m not going to plunge further down this particular rabbit hole.

In summary, my supposedly simple trip to the destination of an enhanced Brexit Infographic has proved unexpectedly arduous, winding and beset by troubles. These challenges have proved so great that I’ve abandoned the journey and will instead be heading for home.
Which dreamed it?

Based on my work so far, I have severe doubts about the accuracy of some of the age-based exhibits I have published (versions of which have also appeared on many websites; to offer just one example, the percentages cited by the BBC under “How different age groups voted” reconcile to mine). I believe that my logic and calculations are sound, but it seems that I am making too many assumptions about how I can leverage the Ashcroft data. After posting this article, I will accordingly go back, annotate each of my previous posts and link them to these later findings.

I think the broader lesson to be learnt is that estimates are just that, attempts (normally well-intentioned of course) to come up with figures where the actual numbers are not accessible. Sometimes this is a very useful – indeed indispensable – approach, sometimes it is less helpful. In either case estimation should always be approached with caution and the findings ideally sense-checked in the way that I have tried to do above.

Occam’s razor would suggest that when the stats tell you something that seems incredible, then 99 times out of 100 there is an error or inaccurate assumption buried somewhere in the model. This applies when you are creating the model yourself and doubly so where you are relying upon figures calculated by other people. In the latter case not only is there the risk of their figures being inaccurate, there is the incremental risk that you interpret them wrongly, or stretch their broader application to breaking point. I was probably guilty of one or more of the above sins in my earlier articles. I’d like my probable misstep to serve as a warning to other people when they too look to leverage statistics in new ways.

A further point is that the most advanced concepts I have applied in my calculations above are addition, subtraction, multiplication and division. If these basic operations – even in the hands of someone like me who is relatively familiar with them – can lead to the issues described above, just imagine what could result from the more complex mathematical techniques (e.g. ambition, distraction, uglification and derision) used by even entry-level data scientists. This perhaps suggests an apt aphorism: Caveat calculator!

Beware the Jabberwock, my son! // The jaws that bite, the claws that catch! // Beware the Jubjub bird, and shun // The frumious Bandersnatch!


Data Management as part of the Data to Action Journey

Data Information Insight Action

| Larger Version | Detailed and Annotated Version (as PDF) |

This brief article is actually the summation of considerable thought and reflects many elements that I covered in my last two pieces (5 Themes from a Chief Data Officer Forum and 5 More Themes from a Chief Data Officer Forum), in particular both the triangle I used as my previous Data Management visualisation and Peter Aiken’s original version, which he kindly allowed me to reproduce on this site (see here for more information about Peter).

What I began to think about was that both of these earlier exhibits (and indeed many that I have seen pertaining to Data Management and Data Governance) suggest that the discipline forms a solid foundation upon which other areas are built. While there is a lot of truth in this view, I have come round to thinking that Data Management may alternatively be thought of as actively taking part in a more dynamic process; specifically the same iterative journey from Data to Information to Insight to Action and back to Data again that I have referenced here several times before. I have looked to combine both the static, foundational elements of Data Management and the dynamic, process-centric ones in the diagram presented at the top of this article; a more detailed and annotated version of which is available to download as a PDF via the link above.

I have also introduced the alternative path from Data to Insight; the one that passes through Statistical Analysis. Data Management is equally critical to the success of this type of approach. I believe that the schematic suggests some of the fluidity that is a major part of effective Data Management in my experience. I also hope that the exhibit supports my assertion that Data Management is not an end in itself, but instead needs to be considered in terms of the outputs that it helps to generate. Pristine data is of little use to an organisation if it is not then exploited to form insights and drive actions. As ever, this need to drive action necessitates a focus on cultural transformation, an area that is covered in many other parts of this site.

This diagram also calls to mind the subject of where and how the roles of Chief Analytics Officer and Chief Data Officer intersect, and whether indeed these should be separate roles at all. These are questions to which – as promised on several previous occasions – I will return in future articles. For now, maybe my schematic can give some data and information practitioners a different way to view their craft and the contributions that it can make to organisational success.


Forming an Information Strategy: Part III – Completing the Strategy


Maybe we could do with some better information, but how to go about getting it? Hmm...

This article is the last of three which address how to formulate an Information Strategy. I have written a number of other articles which touch on this subject [1] and have also spoken about the topic [2]. However, I realised that I had never posted an in-depth review of this important area. This series of articles seeks to remedy this omission.

The first article, Part I – General Strategy, explored the nature of strategy, laid some foundations and presented a framework of questions which will need to be answered in order to formulate any general strategy. The second, Part II – Situational Analysis, explained how to adapt the first element of this general framework – The Situational Analysis – to creating an Information Strategy. In Part I, I likened formulating an Information Strategy to a journey; Part III – Completing the Strategy sees us reaching the destination by working through the rest of the general framework and showing how this can be used to produce a fully-formed Information Strategy.

As with all of my other articles, this essay is not intended as a recipe for success, a set of instructions which – if slavishly followed – will guarantee the desired outcome. Instead the reader is invited to view the following as a set of observations based on what I have learnt during a career in which the development of both Information Strategies and technology strategies in general has played a major role.
A Recap of the Strategic Framework

Forth Rail Bridge

I closed Part I of this series by presenting a set of questions, the answers to which will facilitate the formation of any strategy. These have a geographic / journey theme and are as follows:

  1. Where are we?
  2. Where do we want to be instead and why?
  3. How do we get there, how long will it take and what will it cost?
  4. Will the trip be worth it?
  5. What else can we do along the way?

Part II explained the process of answering question 1 through the medium of a Situational Analysis. It is worth pointing out at this juncture that the Situational Analysis will also naturally form the first phase of the more lengthy process of gathering and analysing business requirements. For the purposes of the rest of this article, when such requirements are mentioned, they are taken as being the embryonic ones captured as part of the Situational Analysis.

In this final article I will focus on how to approach obtaining answers to questions 2 to 5. Having spent quite some time considering question 1 in the previous article, the content here will be somewhat briefer for the remaining questions; not least as I have covered some of this territory in earlier articles [3].
2. Where do we want to be instead and why?

My thoughts here split into two sub-sections. The second, What does Good look like?, is (as will be obvious from the title) more forward looking than backward. It covers reasons why the destination may be worth the journey. The first is more to do with why staying in the current location may not be a great idea [4]. However, one motivation for not staying put is that somewhere else may well be better. For this reason, there is no definitive border between these two sub-sections and, as will be evident from the text, they instead bleed into each other.

2a. Drivers for Change

Change Next Exit

People often say that the gains that result from Information Programmes are intangible. Of course some may indeed be fairly intangible, but even the most ephemeral of these will not be entirely immune from some sort of valuation. Other benefits, when examined closely enough, can turn out to be surprisingly tangible [5]. In making a case for change (and of course the expenditure associated with this) it is good to try to have a balance of tangible and intangible factors. Here is a selection which may be applicable:

Internal IT drivers

  • These often centre around both the cost and confusion associated with a fragmented and inconsistent Information Landscape; something which, even as we head into 2015, is still not atypical.
  • Opportunity costs may arise from an inability to combine data from different repositories or to roll up data to cover an entire organisation.
  • There is also a case to be made here around things like the licensing costs that result from having too many information repositories and too many tools being used to access them.
  • However, the cost of such fragmentation often appears in the shape of additional IT headcount devoted to maintaining a complex landscape and additional business headcount devoted to remediating information shortcomings.

Productivity gains

  • Less number crunching, more business-focussed analysis. Often an organisation’s most highly qualified (and highly paid) staff can spend much of their time repeating quotidian tasks that computers could do far more reliably. Freeing up such able and creative people to add more business value should be an objective and should have benefits.
  • At one company I estimated that teams would spend 5-7 days assembling the information necessary to support a meeting with one of a number of key business partners or a major client; our goal became to provide the same information effectively instantaneously; these types of benefits can be costed and also tend to resonate with business stakeholders.
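The sketch below roughly illustrates how such a benefit might be costed. It multiplies the preparation days avoided by the number of people involved, a cost per person-day and the number of such meetings per year. Only the 5-7 day range comes from the text. The team size, day rate and meeting frequency are hypothetical placeholders, to be replaced with an organisation's own figures.

```python
# Minimal sketch: annual value of effort avoided when meeting preparation is automated.
# Only the 5-7 day range comes from the text; every other input is a hypothetical placeholder.

def annual_saving(prep_days, people, day_rate, meetings_per_year, residual_days=0.0):
    """Value of preparation effort avoided per year, in the currency of day_rate."""
    days_saved = max(0.0, prep_days - residual_days)  # "effectively instantaneous" -> 0 residual
    return days_saved * people * day_rate * meetings_per_year


PEOPLE = 3      # hypothetical: people involved in assembling the information
DAY_RATE = 800  # hypothetical: fully loaded cost per person-day
MEETINGS = 24   # hypothetical: key partner / client meetings per year

low = annual_saving(5, PEOPLE, DAY_RATE, MEETINGS)
high = annual_saving(7, PEOPLE, DAY_RATE, MEETINGS)
print(f"Estimated annual saving: {low:,.0f} to {high:,.0f}")
```

Expressing the result as a range rather than a point estimate keeps the guesstimation visible to business stakeholders.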

Increasing sales / improving profitability

  • All information programmes (indeed almost any business activity) should of course be dedicated to increasing profitability. In some industries the leverage of high-quality information is more readily associated with profitability than in others. However, with enough time spent understanding the dynamics of an organisation, I would suggest that it is possible to make this linkage in a credible manner in pretty much any industry sector.
  • With respect to sales, if you want to increase, say, cross-selling, a very effective way is sometimes simply to measure it, maybe by department and salesperson. If there is some reliable way to track this, improvements in cross-selling will inevitably follow.

Mitigating operational risk

  • More reliable, unbiased and transparent production of information can address a number of operational risks; what these are specifically will vary from organisation to organisation.
  • Most years see some organisation or other having to restate its results; there have been cases where adding two figures rather than subtracting them has led to a later restatement. Cases can often be built around the specific pain points in an organisation, or sometimes even near misses that were caught at the 11th hour.
  • Equally the cost of checking and re-checking figures before publication can be extremely high.

It is also generally worth asking business users what value they would ascribe to improved information, for example what things could they do under new arrangements that they cannot do now? It is important here that any benefits – and in particular any ones which prove to be intangible – are expressed in business language, not technical jargon.

2b. What does Good look like?

OK this dates me - I don't care!

Answering this question is predicated on both experience of successful information improvement programmes and a degree of knowledge about the general information market. There are two main elements here: what good looks like technically, and what it looks like from a process / people perspective.

To cover the technical first, this is the simpler area, not least as we have understood how to develop robust, flexible and highly-performing information architectures for at least 15 years.

Integrated Information Architecture

The basics are shown in the diagram above [6]. Questions to consider here include:

  • What would a new information architecture look like?
  • What are the characteristics of the new which would indicate that it is an improvement on the old, can these be articulated to non-technical people?
  • What are the required elements and how do they relate to the high-level needs captured in the Situational Analysis?
  • How does the proposed architecture relate to incumbent technologies and current staff skills?
  • Can any elements of existing information provision be leveraged, either temporarily or on an ongoing basis?
  • What has worked for other organisations and why would this be pertinent to the organisation in question?
  • Are any new developments in technology pertinent?

Arguably the more important area is the non-technical. Here there is a range of items to consider, some of which are captured in the following exhibit [7]:

Information Process

I could spend a separate set of articles commenting on the elements of the above diagram; indeed I already have, and interested readers are directed to the footnotes for links to some of these [8]. However, it is worth pointing out the critical role to be played by both user education (a more apt phrase than training) and formal Data Governance. Also, certain elements of information tend to work well when they sit within a regular business process, such as a monthly or quarterly review of specific aspects of results and future projections.
3. How do we get there, how long will it take and what will it cost?

Tube ticket machines

3a. Outline an Indicative Programme of Work

I am not going to offer Programme Planning 101 here, but briefly the first step in putting together an indicative programme of work is to decompose the overall journey into chunks, each of which can then be estimated. Each chunk should cover a group of reports / analyses and include activities from requirements gathering through to testing and finally deployment [9]. For the purposes of an indicative programme within a strategy document, the strategist can rely upon both information gathered in the Situational Analysis and their own experience of how to best decompose such work. Ultimately the size and number of the chunks should be dictated by business need, but at this stage estimates can be based upon experience and reasonable assumptions.

It is important that each chunk (or sub-chunk) delivers value and offers an opportunity for the approach and progress to be reviewed. A further factor to consider when estimating these chunks is that they should be delivered at a pace which allows them to be properly digested by users; resource allocations should reflect this. For each chunk the strategist should consider the type and quantum of resource required and the timing with which these are applied.

The indicative programme plan should also include a first phase which relates to reviewing the plan itself. Forming a strategy involves fewer people than running a programme. Even if initial estimation is carried out very diligently, it is likely that further issues will emerge once more detailed work commences. As the information programme team ramps up, it is important that time is allocated for new team members to kick the tyres on the plan and make recommendations for improvement.

3b. How much will it cost?

Coins on scales

A big element of cost estimates will be a by-product of the indicative programme plan, which will cover programme duration and the amount of resource required at different points. Some further questions to consider when looking to catalogue costs include the following:

  • What are baseline costs for current information provision?
  • To what degree do these need to be incurred in parallel to an information improvement programme? Are there ways to reduce these legacy costs to free up funds for the central programme?
  • What transitional costs are needed to execute the Information Strategy?
    • Hardware and software: is change necessary?
    • People: what is the best balance between internal, contract and outsourced resources, to what degree can existing staff be leveraged without compromising their current responsibilities?
    • How will costs vary by programme phase, will these taper as elements of older information systems are replaced by new facilities?
    • Can costs be reduced by having people play different roles at different points in the programme?
  • What costs will be ongoing once the strategy has been executed?
  • How do these compare to the current baseline?
  • Sometimes one aim of an Information Strategy will be to reduce the cost of ongoing support and maintenance; if so, how will this be achieved and how will any transition be managed?

A consideration here is whether the most important thing is to maximise speed of delivery or to minimise risk. Things that will reduce risk could include: initial exploratory phases; starting with a small number of programme resources and increasing these based only on success; and instigating appropriate governance processes. However, each of these will also increase duration and therefore cost. In some areas a trade-off will be necessary, and which side of these equations is more important will vary from organisation to organisation.
4. Will the trip be worth it?

Pros and cons

Answering parts of question 2 will help with getting a handle on potential benefits of executing an Information Strategy. Work on question 3 will get us an idea of the timeframes and costs involved. There is a need to combine the two into a cost / benefit analysis. This should be an honest and transparent assessment of the potential payback of adopting the Information Strategy. Given that most Information Strategies will take more than a year to implement and that benefits may equally be realised on an ongoing basis, it will generally make sense to look at figures over a 3-5 year period. It may be possible to draw up a quasi-P&L statement showing the impact of adopting the strategy; such an approach can resonate with senior stakeholders.
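The mechanics of such a quasi-P&L can be sketched as a year-by-year comparison of costs and benefits over the chosen horizon. The figures below are entirely hypothetical, in thousands of an arbitrary currency. Costs are front-loaded by the transitional programme, and benefits ramp up as chunks of work are delivered. The function reports the net benefit per year, the cumulative position, and the first year in which that position turns non-negative. It returns None if payback is not reached within the horizon, which is the case where the analysis does not add up.

```python
# Minimal sketch of a multi-year cost / benefit view; all figures are hypothetical.
from itertools import accumulate


def cost_benefit(costs, benefits):
    """Return yearly net benefit, cumulative net position and payback year (None if not reached)."""
    if len(costs) != len(benefits):
        raise ValueError("costs and benefits must cover the same years")
    net = [b - c for c, b in zip(costs, benefits)]
    cumulative = list(accumulate(net))
    payback = next((year for year, total in enumerate(cumulative, start=1) if total >= 0), None)
    return net, cumulative, payback


# Hypothetical figures (thousands) over five years. Costs are assumed to include
# transitional spend plus any parallel running of existing information provision.
COSTS = [900, 700, 300, 200, 200]
BENEFITS = [100, 400, 700, 800, 800]

net, cumulative, payback = cost_benefit(COSTS, BENEFITS)
for year, (c, b, n, cum) in enumerate(zip(COSTS, BENEFITS, net, cumulative), start=1):
    print(f"Year {year}: cost {c:>5} benefit {b:>5} net {n:>6} cumulative {cum:>6}")
print("Payback year:", payback if payback is not None else "not within horizon")
```

With these placeholder numbers the cumulative position only turns positive in year 5. This shows why a 3-year view and a 5-year view of the same programme can lead to different conclusions.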

Points to recall and questions to consider here include:

  • Costs will emerge from the Indicative Programme Plan, but remember the ongoing costs of maintaining existing information capabilities.
  • As with most initiatives, the benefits of information programmes split into tangible and intangible components:
    • Where possible make benefits tangible even if this requires a degree of guesstimation [10].
    • Remember that many supposed intangibles can be estimated with some thought.
  • What benefits have other companies seen from similar programmes, particularly ones in the same industry sector?
  • Is it possible to perform “what if?” scenarios with current and future capabilities; could better information have led to better outcomes? [11]
  • Ask business people to estimate the impact of better information.
  • Intangible benefits resonate where they are expressed in clear business language, not IT speak.

It should be borne in mind here that the cost / benefit analysis may not add up. If this is the case, then either a less expensive approach is more suitable for the company, or the potential benefits need to be looked at again. Where progress can genuinely not be made on either of these areas, the responsible strategist will acknowledge that doing nothing may well be the logical approach for the organisation in question.
5. What else can we do along the way?

Here be elephants

Finally, it is worth noting that short-term tactical deliveries can strongly support a strategy [12]. Interim work can meet urgent business needs in a timely manner. This is a substantial benefit in itself and also evidences progress in the area of improving information capabilities. It also demonstrates that the programme team understands commercial pressures. This type of work is also complementary in that it can be used to:

  • Validate some elements of the cost / benefit analysis.
  • Round out requirements gathering.
  • Highlight any areas which have been overlooked.
  • Provide invaluable deployment and training experience, which can be leveraged for the implementation of more strategic capabilities.

It can also be useful to make mistakes early and with small deliverables, not later with major ones. For these reasons, it is suggested that any Information Strategy should embrace “throw away” work. However, this should be reflected in the overall programme plan and resources should be specifically allocated to this area. If this is not done, then tactical work can easily overwhelm the team and prevent progress from being made on more strategic areas; this is generally a death knell for a programme.
A Recap of the Main Points

  1. Carry out a Situational Analysis.
  2. As part of this, start the process of capturing High-level Business Requirements.
  3. Establish Drivers for Change: what benefits can be realised by better information, or by producing information in a better way?
  4. Ask “What Does Good Look Like?”, from both a technical and a process / people point of view.
  5. Develop an Indicative Programme of Work with realistic resource estimates and durations.
  6. Estimate Current, Transitional and Ongoing Costs.
  7. Itemise some of the major Interim Deliverables.
  8. Create a Cost / Benefits Analysis.

Bringing everything together

Chickie in dee Basget! Ing vurn spuur dee Chickie, Uun yeh vurn spay dee Basget!

There is a need to take the detailed work described over the course of the last three articles, together with the documentation created as part of the process, and to distil these down into a format that is digestible by senior management. There is no silver bullet here; summarising screeds of detail so that the main points are preserved and presented in a way that resonates is not easy. It takes judgement, an understanding of how businesses operate, and strong analytical, writing and often diagrammatic skills. These will not be acquired by reading a blog article, but by honing experience and expertise over many years of work. To an extent, producing relevant and cogent summaries is where good IT professionals earn their money.

Unfortunately, at the time of writing, there is no book entitled Summarising Complex Issues for Dummies [13], [14].

This article and its two predecessors have been akin to listing the ingredients required to make a complex meal. While it is difficult to make great food without good ingredients or with some key spice missing, these things are not sufficient to ensure culinary excellence; what is also needed is a competent chef [15]. I cook a lot myself and, whenever I try a recipe for the first time, it can be a bit fraught. Sometimes I don’t get all of the elements of the meal ready at the same time; sometimes, while I’m paying attention to the instructions for one part, another part boils over or gets burnt. These problems with cooking tend to dissipate with repetition. In the same way, what is generally needed in developing a sound Information Strategy is the equivalent of great ingredients and a competent chef, and an experienced one at that.

Forming an Information Strategy
I – General Strategy II – Situational Analysis III – Completing the Strategy


These include (in chronological order):

IRM European Data Warehouse and Business Intelligence Conference – November 2012
Where this is the case, I will of course provide links back to my previous work.
Some of the factors here may, of course, come to light as a result of the previous Situational Analysis.
I grapple with estimating the potential payback of Information Programmes in a series of earlier articles:

This is an expanded version of the diagram I posted as part of Using multiple business intelligence tools in an implementation – Part I back in May 2009. I have elided details such as the fine structure of the warehouse (staging, relational, multidimensional etc.), master data sources and also which parts of it are accessed by different tools and different types of users. In a severe breach of the traditional IT approach, I have also left some arrows out.
This is an updated version of an exhibit I put together working with an actuarial colleague back in 2001, early in my journey into information improvement programmes.
These include my trilogy on the change management aspects of information programmes:

and a number of articles relating to Data Governance / Data Quality, notably:

Sometimes the first level of decomposition will need to be broken up into further, smaller chunks, with this process iterating until the strategist reaches tasks which they are happy to estimate with a degree of certainty.
It may make sense to have different versions of the cost / benefit analysis: more conservative ones including only the most tangible benefits, and more aggressive ones taking into account benefits which are somewhat less certain.
Again see the series of three articles starting with Using historical data to justify BI investments – Part I.
For further thoughts on the strategic benefits of tactical work see:

Given both the two interpretations of this phrase and the typical audience for summaries of strategies, perhaps this is a fortunate thing.
I did however find the following title:

I can't, however, seem to find either Quantum Chromodynamics or Brain Surgery for Dummies.

Contrary to the image above, a muppet (in the English sense of the word) won’t suffice.



Trouble at the top


Several weeks back now, I presented at IRM’s collocated European Master Data Management Summit and Data Governance Conference. This was my second IRM event, having also spoken at their European Data Warehouse and Business Intelligence Conference back in 2010. The conference was impeccably arranged and the range of speakers was both impressive and interesting. However, as always happens to me, my ability to attend meetings was curtailed by both work commitments and my own preparations. One of these years I will go to all the days of a seminar and listen to a wider variety of speakers.

Anyway, my talk – entitled Making Business Intelligence an Integral part of your Data Quality Programme – was based on themes I had introduced in Using BI to drive improvements in data quality and developed in Who should be accountable for data quality?. It centred on the four-pillar framework that I introduced in the latter article (yes I do have a fetish for four-pillar frameworks as per):

The four pillars of improved data quality

Given my lack of exposure to the event as a whole, I will restrict myself to writing about a comment that came up in the question section of my slot. As per my article on presenting in public, I try to always allow time at the end for questions as this can often be the most interesting part of the talk; for delegates and for me. My IRM slot was 45 minutes this time round, so I turned things over to the audience after speaking for half-an-hour.

There were a number of good questions and I did my best to answer them, based on past experience of both what had worked and what had been less successful. However, one comment stuck in my mind. For obvious reasons, I will not identify either the delegate or the organisation that she worked for, but I did have a brief follow-up conversation with her afterwards.

She explained that her organisation had in place a formal data governance process and that a lot of time and effort had been put into communicating with the people who actually entered data. In common with my first pillar, this had focused on educating people as to the importance of data quality and how this fed into the organisation’s objectives; a textbook example of how to do things, on which the lady in question should be congratulated. However, she also faced an issue; one that is probably more common than any of us information professionals would care to admit. Her problem was not at the bottom, or in the middle of her organisation, but at the top.

So how many miles per gallon do you get out of that?

In particular, though data governance and a thorough and consistent approach to both the entry of data and the transformation of this into information were all embedded in the organisation, this did not prevent the leaders of each division from having their own people take the resulting information, load it into Excel and “improve” it by “adjusting anomalies”, “smoothing out variations”, “allowing for the impact of exceptional items”, “better reflecting the opinions of field operatives” and the whole panoply of euphemisms for changing figures so that they tell a more convenient story.

In one sense this was rather depressing: someone had got so much right, but was still facing challenges. However, it also chimes with another theme that I have stressed many times under the banner of cultural transformation; it is crucially important that any information initiative either has, or works assiduously to establish, the active support of all echelons of the organisation. In some of my most successful BI/DW work, I have had the benefit of the direct support of the CEO. Equally, it is very important to ensure that the highest levels of your organisation buy in before commencing a step change in its information capabilities.

I am way overdue employing another sporting analogy; odd, however, how most of my rugby-related ones tend to be non-explicit

My experience is that enhanced information can have enormous payback. But it is risky to embark on an information programme without this being explicitly recognised by the senior management team. If you neglect to lay this important foundation, you are simply storing up trouble for the future. The best BI/DW projects are totally aligned with the strategic goals of the organisation. Given this, explaining their objectives and soliciting executive support should be all the easier. This is something that I would encourage my fellow information professionals to seek without exception.

How to use your BI Tool to Highlight Deficiencies in Data

My interview with Microsoft’s Bruno Aziza (@brunoaziza), which I trailed in Another social media-inspired meeting, was published today on his interesting and entertaining site.

You can take a look at the canonical version here and the YouTube version appears below:

The interview touches on themes that I have discussed in:


Thanks to Jim Harris’ OCDQ Blog

I would like to start 2011 by thanking Jim Harris for selecting one of my articles – Who should be accountable for data quality? – as a Best Data Quality Blog Post Of 2010 on his Obsessive Compulsive Data Quality blog.

I would recommend Jim’s excellent site as a great repository for current thinking and best practice in this crucial area.

The Business Intelligence / Data Quality symbiosis

The possible product of endosymbiosis of proteobacteria and eukaryotes

As well as sounding like the title of an episode of The Big Bang Theory, the above phrase is one I just used when commenting on an article from the Data and Process Advantage Blog.

I rather like it and think it encapsulates the points that I have tried to make in my earlier post, Using BI to drive improvements in data quality.

I’m not sure whether Google evidence would stand up in court, but I may have coined a new phrase here:

Search for “Business Intelligence Data Quality symbiosis”