Since its launch in August of this year, the peterjamesthomas.com Data and Analytics Dictionary has received a welcome amount of attention with various people on different social media platforms praising its usefulness, particularly as an introduction to the area. A number of people have made helpful suggestions for new entries or improvements to existing ones. I have also been rounding out the content with some more terms relating to each of Data Governance, Big Data and Data Warehousing. As a result, The Dictionary now has over 80 main entries (not including ones that simply refer the reader to another entry, such as Linear Regression, which redirects to Model).
“It is a truth universally acknowledged, that an organisation in possession of some data, must be in want of a Chief Data Officer”
— Growth and Governance, by Jane Austen (1813) 
I wrote about a theoretical job description for a Chief Data Officer back in November 2015. While I have been on “paternity leave” following the birth of our second daughter, a couple of genuine CDO job specs landed in my inbox. While unable to respond for the aforementioned reasons, I did leaf through the documents. Something immediately struck me: they were essentially wish-lists covering a number of data-related fields, rather than a description of what a CDO might actually do. Clearly I’m not going to cite the actual text here, but the following is representative of what appeared in both requirement lists:
Solid commercial understanding and 5 years spent in [insert industry sector here]
The above list may have descended into farce towards the end, but I would argue that the problems started to occur much earlier. The above is not a description of what is required to be a successful CDO; it’s a description of a Swiss Army Knife. There is also the minor practical point that, out of a world population of around 7.5 billion, there may well be no one who ticks all the boxes.
Let’s make the fallacy of this type of job description clearer by considering what a similar approach would look like if applied to what is generally the most senior role in an organisation, the CEO. Whoever drafted the above list of requirements would probably characterise a CEO as follows:
The best salesperson in the organisation
The best accountant in the organisation
The best M&A person in the organisation
The best customer service operative in the organisation
The best facilities manager in the organisation
The best janitor in the organisation
The best purchasing clerk in the organisation
The best lawyer in the organisation
The best programmer in the organisation
The best marketer in the organisation
The best product developer in the organisation
The best HR person in the organisation, etc., etc., …
Of course a CEO needs to be none of the above; they need to be a superlative leader who is expert at running an organisation (even then, they may focus on plotting the way forward and leave the day-to-day running to others). For the avoidance of doubt, I am not saying that a CEO requires no domain knowledge and has no expertise; they would need both. However, they don’t have to know every aspect of company operations better than the people who do it.
The same argument applies to CDOs. Domain knowledge probably should span most of what is in the job description (save for maybe the three items with footnotes), but knowledge is different to expertise. As CDOs don’t grow on trees, they will most likely be experts in one or a few of the areas cited, but not all of them. Successful CDOs will know enough to be able to talk to people in the areas where they are not experts. They will have to be competent at hiring experts in every area of a CDO’s purview. But they do not have to be able to do the job of every data-centric staff member better than the person could do themselves. Even if you could identify such a CDO, they would probably lose their best staff very quickly due to micromanagement.
A CDO has to be a conductor of both the data function orchestra and of the use of data in the wider organisation. This is a talent in itself. An internationally renowned conductor may have previously been a violinist, but it is unlikely they were also a flautist and a percussionist. They do however need to be able to tell whether or not the second trumpeter is any good; this is not the same as being able to play the trumpet yourself of course. The conductor’s key skill is in managing the efforts of a large group of people to create a cohesive – and harmonious – whole.
The CDO is of course still a relatively new role in mainstream organisations. Perhaps these job descriptions will become more realistic as the role becomes more familiar. It is to be hoped so, else many a search for a new CDO will end in disappointment.
Having twisted her text to my own purposes at the beginning of this article, I will leave the last words to Jane Austen:
“A scheme of which every part promises delight, can never be successful; and general disappointment is only warded off by the defence of some little peculiar vexation.”
Most readers will immediately spot the obvious mistake here. Of course all three of these requirements should be mandatory.
To take just one example, gaining a PhD in a numerical science, a track record of highly-cited papers and also obtaining an MBA would take most people at least a few weeks of effort. Is it likely that such a person would next focus on a PRINCE2 or TOGAF qualification?
I find myself frequently being asked questions around terminology in Data and Analytics and so thought that I would try to define some of the more commonly used phrases and words. My first attempt to do this can be viewed in a new page added to this site (this also appears in the site menu):
I plan to keep this up-to-date as the field continues to evolve.
I hope that my efforts to explain some concepts in my main area of specialism are both of interest and utility to readers. Any suggestions for new entries or comments on existing ones are more than welcome.
This article draws extensively on elements of the framework I use to both highlight and manage risks on data programmes. It has its genesis in work that I did early in 2012 (but draws on experience from the years before this). I have tried to refresh the content since then to reflect new thinking and new developments in the data arena.
What are my motivations in publishing this article? Well, I have both designed and implemented data and information programmes for over 17 years. In the majority of cases my programme work has been a case of executing a data strategy that I had developed myself. While I have generally been able to steer these programmes to a successful outcome, there have been both bumps in the road and the occasional blind alley, requiring a U-turn and another direction to be selected. I have also been able to observe data programmes that ran in parallel to mine in different parts of various organisations. Finally, I have often been asked to come in and address issues with an existing data programme; something that appears to happen all too often. In short, I have seen a lot of what works and what does not work. Having also run other types of programmes, I can also attest to data programmes being different. Failure to recognise this difference, and thus approaching a data programme just like any other piece of work, is one major cause of issues.
Before I get into my list proper, I wanted to pause to highlight a further couple of mistakes that I have seen made more than once; ones that are more generic in nature and thus don’t appear on my list of 20 risks. The first is to assume that the way that an organisation’s data is controlled and leveraged can be improved in a sustainable way by just kicking off a programme. What is more important in my experience is to establish a data function, which will then help with both the governance and exploitation of data. This data function, ideally sitting under a CDO, will of course want to initiate a range of projects, from improving data quality, to sprucing up reporting, to establishing better analytical capabilities. Best practice is to gather these activities into a programme, but things work best if the data function is established first, owns such a programme and actively partakes in its execution.
As well as the issue of ongoing versus transitory accountability for data and the undoubted damage that poorly coordinated change programmes can inflict on data assets, another driver for first establishing a data function is that data needs will always be there. On the governance side, new systems will be built, bought and integrated, bringing new data challenges. On the analytical side, there will always be new questions to be answered, or old ones to be reevaluated. While data-centric efforts will generate many projects with start and end dates, the broad stream of data work continues on in a way that, for example, the implementation of a new B2C capability does not.
The second is to believe that you will add lasting value by outsourcing anything but targeted elements of your data programme. This is not to say that there is no place for such arrangements, which I have used myself many times, just that one of the lasting benefits of gimlet-like focus on data is the IP that is built up in the data team; IP that in my experience can be leveraged in many different and beneficial ways, becoming a major asset to the organisation.
Having made these introductory comments, let’s get on to the main list, which is divided into broadly chronological sections, relating to stages of the programme. The 10 risks which I believe are either most likely to materialise, or which will probably have the greatest impact are highlighted in pale yellow.
Not appreciating the size of work for both business and technology resources.
Team is set up to fail – it is neither responsive enough to business needs (resulting in yet more “unofficial” repositories and additional fragmentation), nor is appropriate progress made on its central objective.
Not establishing a dedicated team.
The team never escapes from “the day job” or legacy / BAU issues; the past prevents the future from being built.
Not establishing a unified and collaborative team.
Team is plagued by people pursuing their own agendas and trashing other people’s approaches; this consumes management time on non-value-added activities, leads to infighting and dissipates energy.
Staff lack skills and prior experience of data programmes.
Time spent educating people rather than getting on with work. Sub-optimal functionality, slippages, later performance problems, higher ongoing support costs.
Not establishing an appropriate management / governance structure.
Programme is not aligned with business needs, is not able to get necessary time with business users and cannot negotiate the inevitable obstacles that block its way. As a result, the programme gets “stuck in the mud”.
Failing to recognise ongoing local needs when centralising.
Local business units do not have their pressing needs attended to and so lose confidence in the programme and instead go their own way. This leads to duplication of effort, increased costs and likely programme failure.
With risk 2 an analogy is trying to build a house in your spare time. If work can only be done in evenings or at the weekend, then this is going to take a long time. Nevertheless organisations too frequently expect data programmes to be absorbed in existing headcount and fitted in between people’s day jobs.
We can extend the building metaphor to cover risk 4. If you are going to build your own house, it would help if you understood carpentry, plumbing, electricals and brick-laying and also had a grasp of the design fundamentals of how to create a structure that will withstand wind, rain and snow. Too often companies embark on data programmes with staff who have a bit of a background in reporting or some related area and with managers who have never been involved in a data programme before. This is clearly a recipe for disaster.
Risk 5 reminds us that governance is also important – both to ensure that the programme stays focussed on business needs and also to help the team to negotiate the inevitable obstacles. This comes back to a successful data programme needing to be more than just a technology project.
Programme Execution Risks
Poor programme management.
The programme loses direction. Time is expended on non-core issues. Milestones are missed. Expenditure escalates beyond budget.
Poor programme communication.
Stakeholders have no idea what is happening. The programme is viewed as out of touch / not pertinent to business issues. Steering does not understand what is being done or why. Prospective users have no interest in the programme.
Big Bang approach.
Too much time goes by without any value being created. The eventual Big Bang is instead a damp squib. Large sums of money are spent without any benefits.
Endless search for the perfect solution / adherence to overly theoretical approaches.
Programme constantly polishes rocks rather than delivering. Data models reflect academic purity rather than real-world performance and maintenance needs.
Lack of focus on interim deliverables.
Business units become frustrated and seek alternative ways to meet their pressing needs. This leads to greater fragmentation and reputational damage to programme.
Insufficient time spent understanding source system data and how data is transformed as it flows between systems.
Data capabilities that do not reflect business transactions with fidelity. There is inconsistency with reports directly drawn from source systems. Reconciliation issues arise (see next point).
If analytical capabilities do not tell a consistent story, they will not be credible and will not be used.
Poor approach to data quality.
Data facilities are seen as inaccurate because of poor data going into them. Data facilities do not match actual business events due to either massaging of data or exclusion of transactions with invalid attributes.
Probably the single most common cause of failure with data programmes – and indeed of ERP projects and acquisitions and any other type of complex endeavour – is risk 7, poor programme management. Not only do programme managers have to be competent, they should also be steeped in data matters and have a good grasp of the factors that differentiate data programmes from more general work.
Relating to the other highlighted risks in this section, the programme could spend two years doing work without surfacing anything much and then, when it does make its first delivery, this is a dismal failure. In the same vein, exclusive focus on strategic capabilities could prevent attention being paid to pressing business needs. At the other end of the spectrum, interim deliveries could spiral out of control, consuming all of the data team’s time and meaning that the strategic objective is never reached. A better approach is that targeted and prioritised interims help to address pressing business needs, but also inform more strategic work. From the other perspective, progress on strategic work-streams should be leveraged whenever it can be, perhaps in less functional manners than the eventual solution, but good enough and also helping to make sure that the final deliveries are spot on.
User Requirement Risks
Not enough up-front focus on understanding key business decisions and the information necessary to take them.
Analytic capabilities do not focus on what people want or need, leading to poor adoption and benefits not being achieved.
In the absence of the above, the programme becoming a technology-driven one.
The business gets what IT or Change think that they need, not what is actually needed. There is more focus on shiny toys than on actionable information. The programme forgets the needs of its customers.
A focus on replicating what the organisation already has but in better tools, rather than creating what it wants.
Beautiful data visualisations that tell you close to nothing. Long lists of existing reports with their fields cross-referenced to each other and a new solution that is essentially the lowest common denominator of what is already in place; a step backwards.
The other most common reason for data programme failure is a lack of focus on user needs and insufficient time spent with business people to ensure that systems reflect their requirements.
Lack of leverage of new data capabilities in front-end / digital systems.
These systems are less effective. The data team is jealous about its capabilities being the only way that users should get information, rather than adopting a more pragmatic and value-added approach.
It is important for the data team to realise that their work, however important, is just one part of driving a business forward. Opportunities to improve other system facilities by the leverage of new data structures should be taken wherever possible.
Education is an afterthought, training is technology- rather than business-focused.
People neither understand the capabilities of new analytical tools, nor how to use them to derive business value. Again this leads to poor adoption and little return on investment.
Declaring success after initial implementation and training.
Without continuing to water the immature roots, the plant withers. Early adoption rates fall and people return to how they were getting information pre-launch. This means that the benefits of the programme are not realised.
Finally, excellent technical work needs to be complemented with equal attention to business-focussed education, training using real-life scenarios and assiduous follow up. These things will make or break the programme.
Of course I don’t claim that the above list is exhaustive. You could successfully mitigate all of the above risks on your data programme, but still get sunk by some other unforeseen problem arising. There is a need to be flexible and to adapt to both events and how your organisation operates; there are no guarantees and no foolproof recipes for success.
My recommendation to data professionals is to develop your own approach to risk management based on your own experience, your own style and the culture within which you are operating. If just a few of the items on my list of risks can be usefully amalgamated into this, then I will feel that this article has served its purpose. If you are embarking on a data programme, maybe your first one, then be warned that these are hard and your reserves of perseverance will be tested. I’d suggest leveraging whatever tools you can find in trying to forge ahead.
It is also maybe worth noting that, somewhat contrary to my point that data programmes are different, a few of the risks that I highlight above could be tweaked to apply to more general programmes as well. Hopefully the things that I have learnt over the last couple of decades of running data programmes will be something that can be of assistance to you in your own work.
For my thoughts on developing data (or, interchangeably, information) strategies see:
I verbally “scribbled” something quite like the exhibit above recently in conversation with a longstanding professional associate. This was while we were discussing where the CDO role currently sat in some organisations and his or her span of responsibilities. We agreed that – at least in some cases – the role was defined sub-optimally with reference to the axes in my virtual diagram.
This discussion reminded me that I was overdue a piece commenting on November’s IRM(UK) CDO Executive Forum; the third in a sequence that I have covered in these pages. In previous CDO Exec Forum articles, I have focussed mainly on the content of the day’s discussions. Here I’m going to be more general and bring in themes from the parent event; IRM(UK) Enterprise Data / Business Intelligence 2016. However I will later return to a theme central to the Exec Forum itself; the one that is captured in the graphic at the head of this article.
As well as attending the CDO Forum, I was speaking at the umbrella event. The title of my talk was Data Management, Analytics, People: An Eternal Golden Braid.
The real book, whose title I had plagiarised, is Gödel, Escher, Bach: An Eternal Golden Braid, by Pulitzer-winning American Author and doyen of 1970s pop-science books, Douglas R. Hofstadter. This book, which I read in my youth, explores concepts in consciousness, both organic and machine-based, and their relation to recursion and self-reference. The author argued that these themes were major elements of the work of each of Austrian Mathematician Kurt Gödel (best known for his two incompleteness theorems), Dutch graphic artist Maurits Cornelis Escher (whose almost plausible, but nevertheless impossible buildings and constantly metamorphosing shapes adorn both art galleries and college dorms alike) and German composer Johann Sebastian Bach (revered for both the beauty and mathematical elegance of his pieces, particularly those for keyboard instruments). In an age where Machine Learning and other Artificial Intelligence techniques are moving into the mainstream – or at least on to our Smartphones – I’d recommend this book to anyone who has not had the pleasure of reading it.
In my talk, I didn’t get into anything as metaphysical as Hofstadter’s essays that intertwine patterns in Mathematics, Art and Music, but maybe some of the spirit of his book rubbed off on my much lesser musings. In any case, I felt that my session was well-received and one particular piece of post-presentation validation had me feeling rather like these guys for the rest of the day:
What happened was that a longstanding internet contact sought me out and commended me on both my talk and the prescience of my July 2009 article, Is the time ripe for appointing a Chief Business Intelligence Officer? He argued convincingly that this foreshadowed the emergence of the Chief Data Officer. While it is an inconvenient truth that Visa International had a CDO eight years before my article appeared, on re-reading it, I was forced to acknowledge that there was some truth in his assertion.
To return to the matter in hand, one point that I made during my talk was that Analytics and Data Management are two sides of the same coin and that both benefit from being part of the same unitary management structure. By this I mean each area reporting into an Executive who has a strong grasp of what they do, rather than to a general manager. More specifically, I would see Data Compliance work and Data Synthesis work each being the responsibility of a CDO who has experience in both areas.
It may seem that crafting and implementing data policies is a million miles from data visualisation and machine learning, but to anyone with a background in the field, they are much more strongly related. Indeed, if managed well (which is often the main issue), they should be mutually reinforcing. Thus an insightful model can support business decision-making, but its authors would generally be well-advised to point out any areas in which their work could be improved by better data quality. Efforts to achieve the latter then both improve the usefulness of the model and help make the case for further work on data remediation; a virtuous circle.
Here we get back to the vertical axis in my initial diagram. In many organisations, the CDO can find him or herself at the extremities. Particularly in Financial Services, an industry which has been exposed to more new regulation than many in recent years, it is not unusual for CDOs to have a Risk or Compliance background. While this is very helpful in areas such as Governance, it is less of an asset when looking to leverage data to drive commercial advantage.
Symmetrically, if a rookie CDO was a Data Scientist who then progressed to running teams of Data Scientists, they will have a wealth of detailed knowledge to fall back on when looking to guide business decisions, but less familiarity with the – sometimes apparently thankless, and generally very arduous – task of sorting out problems in data landscapes.
Despite this, it is not uncommon to see CDOs who have a background in just one of these two complementary areas. If this is the case, then the analytics expert will have to learn bureaucratic and programme skills as quickly as they can and the governance guru will need to expand their horizons to understand the basics of statistical modelling and the presentation of information in easily digestible formats. It is probably fair to say that the journey to the centre is somewhat perilous when either extremity is the starting point.
Let’s now think about the second and horizontal axis. In some organisations, a newly appointed CDO will be freshly emerged from the ranks of IT (in some they may still report to the CIO, though this is becoming more of an anomaly with each passing year). As someone whose heritage is in IT (though also from very early on with a commercial dimension) I understand that there are benefits to such a career path, not least an in-depth understanding of at least some of the technologies employed, or that need to be employed. However a technology master who is also a business neophyte is unlikely to set the world alight as a newly-minted CDO. Such people will need to acquire new skills, but the learning curve is steep.
To consider the other extreme of this axis, it is undeniable that a CDO organisation will need to undertake both technical and technological work (or at least to guide this in other departments). Therefore, while an in-depth understanding of a business, its products, markets, customers and competitors will be of great advantage to a new CDO, without at least a reasonable degree of technical knowledge, they may struggle to connect with some members of their team; they may not be able to immediately grasp what technology tasks are essential and which are not; and they may not be able to paint an accurate picture of what good looks like in the data arena. Once more rapid assimilation of new information and equally rapid acquisition of new skills will be called for.
At this point it will be pretty obvious that my central point here is that the “sweet spot” for a CDO, the place where they can have greatest impact on an organisation and deliver the greatest value, is at the centre point of both of these axes. When I was talking to my friend about this, we agreed that one of the reasons why not many CDOs sit precisely at this nexus is because there are few people with equal (or at least balanced) expertise in the business and technology fields; few people who understand both data synthesis and data compliance equally well; and vanishingly few who sit in the centre of both of these ranges.
Perhaps these facts would also have been apparent from reviewing the CDO job description I posted back in November 2015 as part of Wanted – Chief Data Officer. However, as always, a picture paints a thousand words and I rather like the compass-like exhibit I have come up with. Hopefully it conveys a similar message more rapidly and more viscerally.
To bring things back to the IRM(UK) CDO Executive Forum, I felt that issues around where delegates sat on my CDO “sweet spot” diagram (or more pertinently where they felt that they should sit) were a sub-text to many of our discussions. It is worth recalling that the mainstream CDO is still an emergent role and a degree of confusion around what they do, how they do it and where they sit in organisations is inevitable. All CxO roles (with the possible exception of the CEO) have gone through similar journeys. It is probably instructive to contrast the duties of a Chief Risk Officer before 2008 with the nature and scope of their responsibilities now. It is my opinion that the CDO role (and individual CDOs) will travel an analogous path and eventually also settle down to a generally accepted set of accountabilities.
In the meantime, if your organisation is lucky enough to have hired one of the small band of people whose experience and expertise already place them in the CDO “sweet spot”, then you are indeed fortunate. If not, then not all is lost, but be prepared for your new CDO to do a lot of learning on the job before they too can join the rather exclusive club of fully rounded CDOs.
As an erstwhile Mathematician, I’ve never seen a framework that I didn’t want to generalise. It occurs to me and – I assume – will also occur to many readers that the North / South and East / West diagram I have created could be made even more compass-like by the addition of North East / South West and North West / South East axes, with our idealised CDO sitting in the middle of these spectra as well.
Readers can debate amongst themselves what the extremities of these other dimensions might be. I’ll suggest just a couple: “Change” and “Business as Usual”. Given how organisations seem to have evolved in recent years, it is often unfortunately a case of never the twain shall meet with these two areas. However a good CDO will need to be adept at both and, from personal experience, I would argue that mastery of one does not exclude mastery of the other.
The main reasons for delay were a house move and a succession of illnesses in my family – me included – so I’m going to give myself a pass.
The sub-title was A Metaphorical Fugue On The Data ⇨ Information ⇨ Insight ⇨ Action Journey in The Spirit Of Douglas R. Hofstadter, which points to the inspiration behind my talk rather more explicitly.
Douglas R. Hofstadter is the son of Nobel-winning physicist Robert Hofstadter. Prize-winning clearly runs in the Hofstadter family, much as with the Braggs, Bohrs, Curies, Euler-Chelpins, Kornbergs, Siegbahns, Tinbergens and Thomsons.
I am omitting any names or other references to save his blushes.
I could have gone for three or four dimensional Cartesian coordinates as well I realise, but sometimes (very rarely it has to be said) you can have too much Mathematics.
When I posted my Brexit infographic reflecting the age of voters an obvious extension was to add an indication of the number of people in each age bracket who did not vote as well as those who did. This seemed a relatively straightforward task, but actually proved to be rather troublesome (this may be an example of British understatement). Maybe the caution I gave about statistical methods having a large impact on statistical outcomes in An Inconvenient Truth should have led me to expect such issues. In any case, I thought that it would be instructive to talk about the problems I stumbled across and to – once again – emphasise the perils of over-extending statistical models.
Regular readers will recall that my Brexit Infographic (reproduced above) leveraged data from an earlier article, A Tale of two [Brexit] Data Visualisations. As cited in this article, the numbers used were from two sources:
In the notes section of A Tale of two [Brexit] Data Visualisations I [prophetically] stated that the breakdown of voting by age group was just an estimate. Based on what I have discovered since, I’m rather glad that I made this caveat explicit.
The Pool of Tears
In order to work out the number of people in each age bracket who did not vote, an obvious starting point would be the overall electorate, which the UK Electoral Commission stated as being 46,500,001. As we know that 33,551,983 people voted (an actual figure rather than an estimate), this is where the turnout percentage of 72.2% (actually 72.1548%) came from (33,551,983 / 46,500,001).
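As a quick check of the arithmetic above, the turnout and the overall number of non-voters can be recomputed from the two published totals. This is only a sketch; the variable names are purely illustrative.

```python
# Figures quoted in the text: Electoral Commission electorate and votes cast.
electorate = 46_500_001
votes_cast = 33_551_983

# Turnout is simply votes cast divided by the eligible electorate.
turnout = votes_cast / electorate

# Everyone in the electorate who did not cast a vote.
non_voters = electorate - votes_cast

print(f"Turnout: {turnout:.4%}")
print(f"Non-voters: {non_voters:,}")
```

The overall non-voter count is the easy part; the difficulty discussed in the rest of this article is splitting that total by age group.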
A clarifying note, the electorate figures above refer to people who are eligible to vote. Specifically, in order to vote in the UK Referendum, people had to meet the following eligibility criteria (again drawn from the UK Electoral Commission):
To be eligible to vote in the EU Referendum, you must be:
A British or Irish citizen living in the UK, or
A Commonwealth citizen living in the UK who has leave to remain in the UK or who does not require leave to remain in the UK, or
A British citizen living overseas who has been registered to vote in the UK in the last 15 years, or
An Irish citizen living overseas who was born in Northern Ireland and who has been registered to vote in Northern Ireland in the last 15 years.
EU citizens are not eligible to vote in the EU Referendum unless they also meet the eligibility criteria above.
So far, so simple. The next thing I needed to know was how the electorate was split by age. This is where we begin to run into problems. One place to start is the actual population of the UK as at the last census (2011). This is as follows:
% of total
If I roll up the above figures to create the same age groups as in the Ashcroft analysis (something that requires splitting the 15-19 range, which I have assumed can be done uniformly), I get:
% of total
The UK Government isn’t interested in the views of people under 18, so eliminating this row we get:
% of total
As mentioned, the above figures are from 2011 and the UK population has grown since then. Web-site WorldOMeters offers an extrapolated population of 65,124,383 for the UK in 2016 (this is as at 12th July 2016; if extrapolation and estimates make you queasy, I’d suggest closing this article now!). I’m going to use a rounder figure of 65,125,000 people; there is no point pretending that precision exists where it clearly doesn’t. Making the assumption that such growth is uniform across all age groups (please refer to my previous bracketed comment!), then the above exhibit can also be extrapolated to give us:
% of total
Looking Glass House
So our – somewhat fabricated – figure for the 18+ UK population in 2016 is 51,210,887; let’s just call this 51,200,000. As at the beginning of this article, the electorate for the 2016 UK Referendum was 46,500,000 (dropping off the 1 person with apologies to him or her). The difference is explicable based on the eligibility criteria quoted above. I now have a rough age group breakdown of the 51.2 million population; how best to apply this to the 46.5 million electorate?
I’ll park this question for the moment and instead look to calculate a different figure. Based on the Ashcroft model, what percentage of the UK population (i.e. the 51.2 million) voted in each age group? We can work this one out without many complications as follows:
Turnout % (B/A)
(B) = Size of each age group in the Ashcroft sample as a percentage multiplied by the total number of people voting (see A Tale of two [Brexit] Data Visualisations).
Remember here that actual turnout figures have electorate as the denominator, not population. As the electorate is less than the population, this means that all of the turnout percentages should actually be higher than the ones calculated (e.g. the overall turnout with respect to electorate is 72.2% whereas my calculated turnout with respect to population is 65.5%). So given this, how to explain the 94.8% turnout of 55-64 year olds? To be sure this group does reliably turn out to vote, but did essentially all of them (remembering that the figures in the above table are too low) really vote in the referendum? This seems less than credible.
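The sense-check above can be sketched in a few lines. The population, electorate and 94.8% figures are the ones quoted in this article. The assumption that the ratio of population to electorate is the same in every age group is mine, and it is certainly too crude, so treat the last figure as an indication only.

```python
VOTES_CAST = 33_551_983
POPULATION_18_PLUS = 51_200_000  # rounded, extrapolated figure from the text
ELECTORATE = 46_500_001          # Electoral Commission figure quoted earlier


def pct(numerator: float, denominator: float) -> float:
    """Express numerator as a percentage of denominator."""
    return 100.0 * numerator / denominator


turnout_vs_population = pct(VOTES_CAST, POPULATION_18_PLUS)
turnout_vs_electorate = pct(VOTES_CAST, ELECTORATE)

# ASSUMPTION (illustrative only): the population / electorate ratio is
# uniform across age groups, so one scale factor applies to each group.
scale = POPULATION_18_PLUS / ELECTORATE
implied_55_64_turnout = 94.8 * scale

print(f"Turnout vs population: {turnout_vs_population:.1f}%")
print(f"Turnout vs electorate: {turnout_vs_electorate:.1f}%")
print(f"Implied 55-64 turnout vs electorate: {implied_55_64_turnout:.1f}%")
```

Under that assumption the implied electorate-based turnout for 55-64 year olds exceeds 100%, which is another way of seeing that the extrapolation cannot be right as it stands.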
The turnout for 55-64 year olds in the 2015 General Election has been estimated at 77%, based on an overall turnout of 66.1% (web-site UK Political Info; once more these figures will have been created based on techniques similar to the ones I am using here). If we assume a uniform uplift across age ranges (that “assume” word again!) then one might deduce that an increase in overall turnout from 66.1% to 72.2% might lead to the turnout in the 55-64 age bracket increasing from 77% to 84%. 84% turnout is still very high, but it is at least feasible; close to 100% turnout from this age group seems beyond the realms of likelihood.
So what has gone wrong? Well so far the only culprit I can think of is the distribution of voting by age group in the Ashcroft poll. To be clear here, I’m not accusing Lord Ashcroft and his team of sloppy work. Instead I’m calling out that the way that I have extrapolated their figures may not be sustainable. Indeed, if my extrapolation is valid, this would imply that the Ashcroft model overestimated the proportion of 55-64 year olds voting. Thus it must have underestimated the proportion of voters in some other age group. Putting aside the likely fact that I have probably used their figures in an unintended manner, could it be that the much-maligned turnout of younger people has been misrepresented?
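The uniform uplift is simple proportional scaling. A minimal sketch follows, using the percentages quoted above; the function name is hypothetical, and the uniformity itself is the assumption already flagged in the text.

```python
def uniform_uplift(group_turnout_pct: float,
                   old_overall_pct: float,
                   new_overall_pct: float) -> float:
    """Scale an age group's turnout by the change in overall turnout.

    ASSUMPTION: every age group's turnout rises by the same proportion
    as the overall figure, which is unlikely to hold exactly.
    """
    return group_turnout_pct * new_overall_pct / old_overall_pct


# 55-64 bracket: 77% at the General Election (overall 66.1%),
# scaled to the Referendum's overall turnout of 72.2%.
estimate_55_64 = uniform_uplift(77.0, 66.1, 72.2)
print(f"Estimated 55-64 Referendum turnout: {estimate_55_64:.0f}%")
```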
To test the validity of this hypothesis, I turned to a later poll by Omnium. To be sure this was based on a sample size of around 2,000 as opposed to Ashcroft’s 12,000, but it does paint a significantly different picture. Their distribution of voter turnout by age group was as follows:
I have to say that the Omnium age groups are a bit idiosyncratic, so I have taken advantage of the fact that the figures for 25-54 are essentially the same to create a schedule that matches the Ashcroft groups as follows:
The Omnium model suggests that younger voters may have turned out in greater numbers than might be thought based on the Ashcroft data. In turn this would suggest that a much greater percentage of 18-24 year olds turned out for the Referendum (64%) than for the last General Election (43%); contrast this with an estimated 18-24 turnout figure of 47% based just on the increase in turnout between the General Election and the Referendum. The Omnium estimates do however recognise that turnout was still greater in the 55+ brackets, which supports the pattern seen in other elections.
While it may well be that the Leave / Remain splits based on the Ashcroft figures are reasonable, I’m less convinced that extrapolating these same figures to make claims about actual voting numbers by age group (as I have done) is tenable. Perhaps it would be better to view each age cohort as a mini sample to be treated independently. Based on the analysis above, I doubt that the turnout figures I have extrapolated from the Ashcroft breakdown by age group are robust. However, that is not the same as saying that the Ashcroft data is flawed, or that the Omnium figures are correct. Indeed the Omnium data (at least those elements published on their web-site) don’t include an analysis of whether the people in their sample voted Leave or Remain, so direct comparison is not going to be possible. Performing calculation gymnastics such as using the Omnium turnout for each age group in combination with the Ashcroft voting splits for Leave and Remain for the same age groups actually leads to a rather different Referendum result, so I’m not going to plunge further down this particular rabbit hole.
In summary, my supposedly simple trip to the destination of an enhanced Brexit Infographic has proved unexpectedly arduous, winding and beset by troubles. These challenges have proved so great that I’ve abandoned the journey and will be instead heading for home.
Which dreamed it?
Based on my work so far, I have severe doubts about the accuracy of some of the age-based exhibits I have published (versions of which have also appeared on many web-sites, the BBC to offer just one example, scroll down to “How different age groups voted” and note that the percentages cited reconcile to mine). I believe that my logic and calculations are sound, but it seems that I am making too many assumptions about how I can leverage the Ashcroft data. After posting this article, I will accordingly go back and annotate each of my previous posts and link them to these later findings.
I think the broader lesson to be learnt is that estimates are just that, attempts (normally well-intentioned of course) to come up with figures where the actual numbers are not accessible. Sometimes this is a very useful – indeed indispensable – approach, sometimes it is less helpful. In either case estimation should always be approached with caution and the findings ideally sense-checked in the way that I have tried to do above.
Occam’s razor would suggest that when the stats tell you something that seems incredible, then 99 times out of 100 there is an error or inaccurate assumption buried somewhere in the model. This applies when you are creating the model yourself and doubly so where you are relying upon figures calculated by other people. In the latter case not only is there the risk of their figures being inaccurate, there is the incremental risk that you interpret them wrongly, or stretch their broader application to breaking point. I was probably guilty of one or more of the above sins in my earlier articles. I’d like my probable misstep to serve as a warning to other people when they too look to leverage statistics in new ways.
A further point is the most advanced concepts I have applied in my calculations above are addition, subtraction, multiplication and division. If these basic operations – even in the hands of someone like me who is relatively familiar with them – can lead to the issues described above, just imagine what could result from the more complex mathematical techniques (e.g. ambition, distraction, uglification and derision) used by even entry-level data scientists. This perhaps suggests an apt aphorism: Caveat calculator!
What I began to think about was that both of these earlier exhibits (and indeed many that I have seen pertaining to Data Management and Data Governance) suggest that the discipline forms a solid foundation upon which other areas are built. While there is a lot of truth in this view, I have come round to thinking that Data Management may alternatively be thought of as actively taking part in a more dynamic process; specifically the same iterative journey from Data to Information to Insight to Action and back to Data again that I have referenced here several times before. I have looked to combine both the static, foundational elements of Data Management and the dynamic, process-centric ones in the diagram presented at the top of this article; a more detailed and annotated version of which is available to download as a PDF via the link above.
I have also introduced the alternative path from Data to Insight; the one that passes through Statistical Analysis. Data Management is equally critical to the success of this type of approach. I believe that the schematic suggests some of the fluidity that is a major part of effective Data Management in my experience. I also hope that the exhibit supports my assertion that Data Management is not an end in itself, but instead needs to be considered in terms of the outputs that it helps to generate. Pristine data is of little use to an organisation if it is not then exploited to form insights and drive actions. As ever, this need to drive action necessitates a focus on cultural transformation, an area that is covered in many other parts of this site.
This diagram also calls to mind the subject of where and how the roles of Chief Analytics Officer and Chief Data Officer intersect and whether indeed these should be separate roles at all. These are questions to which – as promised on several previous occasions – I will return in future articles. For now, maybe my schematic can give some data and information practitioners a different way to view their craft and the contributions that it can make to organisational success.
This article is the final of three which address how to formulate an Information Strategy. I have written a number of other articles which touch on this subject and have also spoken about the topic. However I realised that I had never posted an in-depth review of this important area. This series of articles seeks to remedy this omission.
The first article, Part I – General Strategy, explored the nature of strategy, laid some foundations and presented a framework of questions which will need to be answered in order to formulate any general strategy. The second, Part II – Situational Analysis, explained how to adapt the first element of this general framework – The Situational Analysis – to creating an Information Strategy. In Part I, I likened formulating an Information Strategy to a journey; Part III – Completing the Strategy sees us reaching the destination by working through the rest of the general framework and showing how this can be used to produce a fully-formed Information Strategy.
As with all of my other articles, this essay is not intended as a recipe for success, a set of instructions which – if slavishly followed – will guarantee the desired outcome. Instead the reader is invited to view the following as a set of observations based on what I have learnt during a career in which the development of both Information Strategies and technology strategies in general has played a major role.
A Recap of the Strategic Framework
I closed Part I of this series by presenting a set of questions, the answers to which will facilitate the formation of any strategy. These have a geographic / journey theme and are as follows:
1. Where are we?
2. Where do we want to be instead and why?
3. How do we get there, how long will it take and what will it cost?
4. Will the trip be worth it?
5. What else can we do along the way?
Part II explained the process of answering question 1 through the medium of a Situational Analysis. It is worth pointing out at this juncture that the Situational Analysis will also naturally form the first phase of the more lengthy process of gathering and analysing business requirements. For the purposes of the rest of this article, when such requirements are mentioned, they are taken as being the embryonic ones captured as part of the Situational Analysis.
In this final article I will focus on how to approach obtaining answers to questions 2 to 5. Having spent quite some time considering question 1 in the previous chapter, the content here will be somewhat briefer for the remaining questions; not least as I have covered some of this territory in earlier articles.
2. Where do we want to be instead and why?
My thoughts here split into two sub-sections. The second, What does Good look like?, is (as will be obvious from the title) more forward looking than backward. It covers reasons why the destination may be worth the journey. The first is more to do with why staying in the current location may not be a great idea. However, one motivation for not staying put is that somewhere else may well be better. For this reason, there is no definitive border between these two sub-sections and it will be evident from the text that they instead bleed into each other.
2a. Drivers for Change
People often say that the gains that result from Information Programmes are intangible. Of course some may indeed be fairly intangible, but even the most ephemeral of these will not be entirely immune from some sort of valuation. Other benefits, when examined closely enough, can turn out to be surprisingly tangible. In making a case for change (and of course the expenditure associated with this) it is good to try to have a balance of tangible and intangible factors. Here is a selection which may be applicable:
Internal IT drivers
These often centre around both the cost and confusion associated with a fragmented and inconsistent Information Landscape; something which, even as we head into 2015, is still not atypical.
Opportunity costs may arise from an inability to combine data from different repositories or to roll up data to cover an entire organisation.
There is also a case to be made here around things like the licensing costs that result from having too many information repositories and too many tools being used to access them.
However, the cost of such fragmentation can often appear in the shape of additional IT headcount devoted to maintaining a complex landscape and additional business headcount devoted to remediating information shortcomings.
Less number crunching, more business-focussed analysis. Often an organisation’s most highly qualified (and highly paid) staff can spend much of their time repeating quotidian tasks that computers could do far more reliably. Freeing up such able and creative people to add more business value should be an objective and should have benefits.
At one company I estimated that teams would spend 5-7 days assembling the information necessary to support a meeting with one of a number of key business partners or a major client; our goal became to provide the same information effectively instantaneously; these types of benefits can be costed and also tend to resonate with business stakeholders.
Increasing sales / improving profitability
All information programmes (indeed most any business activity) should be dedicated to increasing profitability of course. In some specific industries the leverage of high-quality information is more readily associated with profitability than others. However, with enough time spent understanding the dynamics of an organisation, I would suggest that it is possible to make this linkage in a credible manner in pretty much any industry sector.
With respect to sales, sometimes if you want to increase say cross-selling, a very effective way is simply to measure it, maybe by department and salesperson. If there is some reliable way to track this, improvements in cross-selling will inevitably follow.
Mitigating operational risk
More reliable, unbiased and transparent production of information can address a number of operational risks; what these are specifically will vary from organisation to organisation.
However, most years see some organisation or another have to restate their results – there have been cases where adding two figures rather than subtracting them has led to a later restatement. Cases can often be built around the specific pain points in an organisation, or sometimes even near misses that were caught at the 11th hour.
Equally the cost of checking and re-checking figures before publication can be extremely high.
It is also generally worth asking business users what value they would ascribe to improved information, for example what things could they do under new arrangements that they cannot do now? It is important here that any benefits – and in particular any ones which prove to be intangible – are expressed in business language, not technical jargon.
2b. What does Good look like?
Answering this question is predicated on both experience of successful information improvement programmes and a degree of knowledge about the general information market. There are two main elements here, what does good look like technically and what does it look like from a process / people perspective.
To cover the technical first, this is the simpler area, not least as we have understood how to develop robust, flexible and highly-performing information architectures for at least 15 years.
The basics are shown in the diagram above. Questions to consider here include:
What would a new information architecture look like?
What are the characteristics of the new which would indicate that it is an improvement on the old, and can these be articulated to non-technical people?
What are required elements and how do they relate to the high-level needs captured in the Situational Analysis?
How does the proposed architecture relate to incumbent technologies and current staff skills?
Can any elements of existing information provision be leveraged, either temporarily or on an ongoing basis?
What has worked for other organisations and why would this be pertinent to the organisation in question?
Are any new developments in technology pertinent?
Arguably the more important area is the non-technical. Here there is a range of items to consider, some of which are captured in the following exhibit:
I could spend a separate set of articles commenting on the elements of the above diagram; indeed I already have and interested readers are directed to the footnotes for links to some of these. However it is worth pointing out the critical role to be played by both user education (a more apt phrase than training) and formal Data Governance. Also certain elements of information tend to work well when they sit within a regular business process; such as a monthly or quarterly review of specific aspects of results and future projections.
3. How do we get there, how long will it take and what will it cost?
3a. Outline an Indicative Programme of Work
I am not going to offer Programme Planning 101 here, but briefly the first step in putting together an indicative programme of work is to decompose the overall journey into chunks, each of which can then be estimated. Each chunk should cover a group of reports / analyses and include activities from requirements gathering through to testing and finally deployment. For the purposes of an indicative programme within a strategy document, the strategist can rely upon both information gathered in the Situational Analysis and their own experience of how to best decompose such work. Ultimately the size and number of the chunks should be dictated by business need, but at this stage estimates can be based upon experience and reasonable assumptions.
It is important that each chunk (or sub-chunk) delivers value and offers an opportunity for the approach and progress to be reviewed. A further factor to consider when estimating these chunks is that they should be delivered at a pace which allows them to be properly digested by users; resource allocations should reflect this. For each chunk the strategist should consider the type and quantum of resource required and the timing with which these are applied.
The indicative programme plan should also include a first phase which relates to reviewing the plan itself. Forming a strategy involves fewer people than running a programme. Even if initial estimation is carried out very diligently, it is likely that further issues will emerge once more detailed work commences. As the information programme team ramps up, it is important that time is allocated for new team members to kick the tyres on the plan and make recommendations for improvement.
3b. How much will it cost?
A big element of cost estimates will be a by-product of the indicative programme plan, which will cover programme duration and the amount of resource required at different points. Some further questions to consider when looking to catalogue costs include the following:
What are baseline costs for current information provision?
To what degree do these need to be incurred in parallel to an information improvement programme, and are there ways to reduce these legacy costs to free up funds for the central programme?
What transitional costs are needed to execute the Information Strategy?
Hardware and software: is change necessary?
People: what is the best balance between internal, contract and outsourced resources, to what degree can existing staff be leveraged without compromising their current responsibilities?
How will costs vary by programme phase, will these taper as elements of older information systems are replaced by new facilities?
Can costs be reduced by having people play different roles at different points in the programme?
What costs will be ongoing once the strategy has been executed?
How do these compare to the current baseline?
Sometimes one aim of an Information Strategy will be to reduce the cost of ongoing support and maintenance; if so, how will this be achieved and how will any transition be managed?
A consideration here is whether the most important thing is to maximise speed of delivery or minimise risk. Things that will reduce risk could include: initial exploratory phases; starting with a small number of programme resources and increasing these based only on success; and instigating appropriate governance processes. However each of these will also increase duration and therefore cost. In some areas a trade-off will be necessary and which side of these equations is more important will vary from organisation to organisation.
4. Will the trip be worth it?
Answering parts of question 2 will help with getting a handle on potential benefits of executing an Information Strategy. Work on question 3 will get us an idea of the timeframes and costs involved. There is a need to combine the two of these into a cost / benefit analysis. This should be an honest and transparent assessment of the potential payback of adopting the Information Strategy. Given that most Information Strategies will take more than a year to implement and that benefits may equally be realised on an ongoing basis, it will generally make sense to look at figures over a 3-5 year period. It may be possible to draw up a quasi-P&L statement showing the impact of adopting the strategy, such an approach can resonate with senior stakeholders.
Points to recall and questions to consider here include:
Costs will emerge from the Indicative Programme Plan, but remember the ongoing costs of maintaining existing information capabilities.
As with most initiatives, the benefits of information programmes split into tangible and intangible components:
Where possible make benefits tangible even if this requires a degree of guesstimation.
Remember that many supposed intangibles can be estimated with some thought.
What benefits have other companies seen from similar programmes, particularly ones in the same industry sector?
Is it possible to perform “what if?” scenarios with current and future capabilities; could better information have led to better outcomes?
Ask business people to estimate the impact of better information.
Intangible benefits resonate where they are expressed in clear business language, not IT speak.
It should be borne in mind here that the cost / benefit analysis may not add up. If this is the case, then either a less expensive approach is more suitable for the company, or the potential benefits need to be looked at again. Where progress can genuinely not be made on either of these areas, the responsible strategist will acknowledge that doing nothing may well be the logical approach for the organisation in question.
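To make the idea of a multi-year cost / benefit view slightly more concrete, here is a minimal sketch of a cumulative net position over five years, in both a conservative version (tangible benefits only) and a more aggressive one (adding less certain benefits). Every figure below is hypothetical and chosen purely for illustration, as are the function and variable names; real numbers would come from the Indicative Programme Plan and from discussions with business stakeholders.

```python
from itertools import accumulate

# HYPOTHETICAL figures (in thousands) for years 1 to 5; not real data.
COSTS = [500, 300, 150, 150, 150]
TANGIBLE_BENEFITS = [0, 200, 400, 500, 500]
LESS_CERTAIN_BENEFITS = [0, 50, 100, 150, 150]


def cumulative_net(costs, benefits):
    """Running total of (benefit - cost) for each year."""
    return list(accumulate(b - c for b, c in zip(benefits, costs)))


def payback_year(costs, benefits):
    """First year (1-based) in which the running total is non-negative."""
    for year, total in enumerate(cumulative_net(costs, benefits), start=1):
        if total >= 0:
            return year
    return None  # no payback within the period considered


all_benefits = [t + u for t, u in zip(TANGIBLE_BENEFITS, LESS_CERTAIN_BENEFITS)]
conservative = cumulative_net(COSTS, TANGIBLE_BENEFITS)
aggressive = cumulative_net(COSTS, all_benefits)

print("Conservative:", conservative, "payback year:",
      payback_year(COSTS, TANGIBLE_BENEFITS))
print("Aggressive:  ", aggressive, "payback year:",
      payback_year(COSTS, all_benefits))
```

If payback_year returns None for even the aggressive version, that is the situation described above in which the analysis does not add up.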
5. What else can we do along the way?
Finally, it is worth noting that short-term tactical deliveries can strongly support a strategy. Interim work can meet urgent business needs in a timely manner. This is a substantial benefit in itself and also evidences progress in the area of improving information capabilities. It also demonstrates that the programme team understands commercial pressures. This type of work is also complementary in that it can be used to:
Validate some elements of the cost / benefit analysis.
Round out requirements gathering.
Highlight any areas which have been overlooked.
Provide invaluable deployment and training experience, which can be leveraged for the implementation of more strategic capabilities.
It can also be useful to make mistakes early and with small deliverables, not later with major ones. For these reasons, it is suggested that any Information Strategy should embrace “throw away” work. However this should be reflected in the overall programme plan and resources should be specifically allocated to this area. If this is not done, then tactical work can easily overwhelm the team and prevent progress on more strategic areas from being made; generally a death knell for a programme.
A Recap of the Main Points
Carry out a Situational Analysis.
As part of this, start the process of capturing High-level Business Requirements.
Establish Drivers for Change, what benefits can be realised by better information, or by producing information in a better way?
Ask “What Does Good Look Like?”, from both a technical and a process / people point of view.
Develop an Indicative Programme of Work with realistic resource estimates and durations.
Estimate Current, Transitional and Ongoing Costs.
Itemise some of the major Interim Deliverables.
Create a Cost / Benefits Analysis.
Bringing everything together
There is a need to take the detailed work described over the course of the last three articles and the documentation which has been created as part of the process and to distill these down into a format that is digestible by senior management. There is no silver bullet here; summarising screeds of detail in a way that preserves the main points and presents them in a way that resonates is not easy. It takes judgement, an understanding of how businesses operate and strong analytical, writing and often diagrammatic skills. These will not be acquired by reading a blog article, but by honing experience and expertise over many years of work. To an extent, producing relevant and cogent summaries is where good IT professionals earn their money.
Unfortunately, at the time of writing, there is no book entitled Summarising Complex Issues for Dummies.
This article and its two predecessors have been akin to listing the ingredients required to make a complex meal. While it is difficult to make great food without good ingredients or with some key spice missing, these things are not sufficient to ensure culinary excellence; what is also needed is a competent chef. I cook a lot myself and, whenever I try a recipe for the first time, it can be a bit fraught. Sometimes I don’t get all of the elements of the meal ready at the same time, sometimes while I’m paying attention to reading the instructions for one part, another part boils over, or gets burnt. These problems with cooking tend to dissipate with repetition. In the same way, what is generally needed in developing a sound Information Strategy is the equivalent of great ingredients, a competent chef and an experienced one as well.
– this series of articles presents a specific example drawn from Insurance, but the general approach can be adapted to fit other industry sectors
This is an expanded version of the diagram I posted as part of Using multiple business intelligence tools in an implementation – Part I back in May 2009. I have elided details such as the fine structure of the warehouse (staging, relational, multidimensional etc.), master data sources and also which parts of it are accessed by different tools and different types of users. In a severe breach with the traditional IT approach, I have also left some arrows out.
This is an updated version of an exhibit I put together working with an actuarial colleague back in 2001, early in my journey into information improvement programmes.
These include my trilogy on the change management aspects of information programmes:
Sometimes the first level of decomposition will need to be broken up into further and smaller chunks with this process iterating until the strategist reaches tasks which they are happy to estimate with a degree of certainty.
It may make sense to have different versions of the cost / benefit analysis, more conservative ones including only the most tangible benefits and more aggressive ones taking into account benefits which are somewhat less certain.
Given my lack of exposure to the event as a whole, I will restrict myself to writing about a comment that came up in the question section of my slot. As per my article on presenting in public, I try to always allow time at the end for questions as this can often be the most interesting part of the talk; for delegates and for me. My IRM slot was 45 minutes this time round, so I turned things over to the audience after speaking for half-an-hour.
There were a number of good questions and I did my best to answer them, based on past experience of both what had worked and what had been less successful. However, one comment stuck in my mind. For obvious reasons, I will not identify either the delegate, or the organisation that she worked for; but I also had a brief follow-up conversation with her afterwards.
She explained that her organisation had in place a formal data governance process and that a lot of time and effort had been put into communicating with the people who actually entered data. In common with my first pillar, this had focused on educating people as to the importance of data quality and how this fed into the organisation’s objectives; a textbook example of how to do things, on which the lady in question should be congratulated. However, she also faced an issue; one that is probably more common than any of us information professionals would care to admit. Her problem was not at the bottom, or in the middle of her organisation, but at the top.
In particular, though data governance and a thorough and consistent approach to both the entry of data and transformation of this to information were all embedded into the organisation; this did not prevent the leaders of each division having their own people take the resulting information, load it into Excel and “improve” it by “adjusting anomalies”, “smoothing out variations”, “allowing for the impact of exceptional items”, “better reflecting the opinions of field operatives” and the whole panoply of euphemisms for changing figures so that they tell a more convenient story.
In one sense this was rather depressing: someone having got so much right, but still facing challenges. However, it also chimes with another theme that I have stressed many times under the banner of cultural transformation; it is crucially important that any information initiative either has, or works assiduously to establish, the active support of all echelons of the organisation. In some of my most successful BI/DW work, I have had the benefit of the direct support of the CEO. Equally, it is very important to ensure that the highest levels of your organisation buy in before commencing on a step-change to its information capabilities.
My experience is that enhanced information can have enormous payback. But it is risky to embark on an information programme without this being explicitly recognised by the senior management team. If you avoid laying this important foundation, then this is simply storing up trouble for the future. The best BI/DW projects are totally aligned with the strategic goals of the organisation. Given this, explaining their objectives and soliciting executive support should be all the easier. This is something that I would encourage my fellow information professionals to seek without exception.