Building Momentum – How to begin becoming a Data-driven Organisation

Introduction

It is hard to find an organisation that does not aspire to being data-driven these days. While there is undoubtedly an element of me-tooism about some of these statements (or a fear of competitors / new entrants who may use their data better, gaining a competitive advantage), often there is a clear case for the better leverage of data assets. This may be to do with the stand-alone benefits of such an approach (enhanced understanding of customers, competitors, products / services etc. [1]), or as a keystone supporting a broader digital transformation.

However, in my experience, many organisations have much less mature ideas about how to achieve their data goals than they do about setting them. Given the lack of executive experience in data matters [2], it is not atypical that one of the large strategy consultants is engaged to shape a data strategy; one of the large management consultants is engaged to turn this into something executable and maybe to select some suitable technologies; and one of the large systems integrators (or increasingly off-shore organisations migrating up the food chain) is engaged to do the work, which by this stage normally relates to building technology capabilities, implementing a new architecture or some other technology-focussed programme.

Even if each of these partners does a great job – which one would hope they do at their price points – a few things invariably get lost along the way. These include:

1. A data strategy that is closely coupled to the organisation’s actual needs rather than something more general.

While there are undoubtedly benefits in adopting best practice for an industry, there is also something to be said for a more tailored approach, tied to business imperatives and which may have the possibility to define the new best practice. In some areas of business, it makes sense to take the tried and tested approach, to be a part of the herd. In others – and data is in my opinion one of these – taking a more innovative and distinctive path is more likely to lead to success.

2. Connective tissue between strategy and execution.

The distinctions between the three types of organisations I cite above are becoming more blurry (not least as each seeks to develop new revenue streams). This can lead to the strategy consultants developing plans, which get ripped up by the management consultants; the management consultants revisiting the initial strategy; the systems integrators / off-shorers replanning, or opening up technical and architecture discussions again. Of course this means the client paying at least twice for this type of work. What also disappears is the type of accountability that comes when the same people are responsible for developing a strategy, turning this into a practical plan and then executing this [3].

3. Focus on the cultural aspects of becoming more data-driven.

This is both one of the most important factors that determines success or failure [4] and something that – frankly because it is not easy to do – often falls by the wayside. By the time that the third external firm has been on-boarded, the name of the game is generally building something (e.g. a Data Lake, or an analytics platform) rather than the more human questions of who will use this, in what way, to achieve which business objectives.

Of course a way to address the above is to allocate some experienced people (internal or external, ideally probably a blend) who stay the course from development of data strategy through fleshing this out to execution and who – importantly – can also take a lead role in driving the necessary cultural change. It also makes sense to think about engaging organisations who are small enough to tailor their approach to your needs and who will not force a “cookie cutter” approach. I have written extensively about how – with the benefit of such people on board – to run such a data transformation programme [5]. Here I am going to focus on just one phase of such a programme and often the most important one: getting going and building momentum.

A Third Way

There are a couple of schools of thought here:

1. Focus on laying solid data foundations and thus build data capabilities that are robust and will stand the test of time.

2. Focus on delivering something ASAP in the data arena, which will build the case for further investment.

There are points in favour of both approaches and criticisms that can be made of each as well. For example, while the first approach will be necessary at some point (and indeed at a relatively early one) in order to sustain a transformation to a data-driven organisation, it obviously takes time and effort. Exclusive focus on this area can use up money, political capital and try the patience of sponsors. Few business initiatives will be funded for years if they do not begin to have at least some return relatively soon. This remains the case even if the benefits down the line are potentially great.

Equally, the second approach can seem very productive at first, but will generally end up trying to make a silk purse out of a sow’s ear [6]. Inevitably, without improvements to the underlying data landscape, limitations in the type of useful analytics that can be carried out will be reached; sometimes sooner than might be thought. While I don’t generally refer to religious topics on this blog [7], the Parable of the Sower is apposite here. Focussing on delivering analytics without attending to the broader data landscape is indeed like the seed that fell on stony ground. The practice yields results that spring up, only to wilt when the sun gets hot, given that they have no real roots [8].

So what to do? Well, there is a Third Way. This involves blending both approaches. I tend to think of this in the following way:

First of all, this is a cartoon; it is not intended to indicate actual percentages, just to illustrate a general trend. In real life, it is likely that you will cycle round multiple times and indeed have different parallel work-streams at different stages. The general points I am trying to convey with this diagram are:

1. At the beginning of a data transformation programme, there should probably be more emphasis on interim delivery and tactical changes. However, importantly, there is never zero strategic work. As things progress, the emphasis should swing more to strategic, long-term work. But again, even in a mature programme, there is never zero tactical work. There can also of course be several iterations of such shifts in approach.

2. Interim and tactical steps should relate to not just analytics, but also to making point fixes to the data landscape where possible. It is also important to kick off diagnostic work, which will establish how bad things are and also suggest areas which could be attacked sooner rather than later; this too can initially be done on a tactical basis and then made more robust later. In general, if you consider the span of strategic data work, it makes sense to kick off cut-down (and maybe drastically cut-down) versions of many activities early on.

3. Importantly, the tactical and strategic work-streams should not be hermetically sealed. What you actually want is healthy interplay. Building some early, “quick and dirty” analytics may highlight areas that should be covered by a data audit, or where there are obvious weaknesses in a data architecture. Any data assets that are built on a more strategic basis should also be leveraged by tactical work, improving its utility and probably increasing its lifespan.

Interconnected Activities

At the beginning of this article, I present a diagram (repeated below) which covers three types of initial data activities, the sort of work that – if executed competently – can begin to generate momentum for a data programme. The exhibit also references Data Strategy.

Let’s look at each of these four things in some more detail:

1. Analytic Point Solutions

Where data has historically been locked up in either hard-to-use repositories or in source systems themselves, liberating even a bit of it can be very helpful. This does not have to be with snazzy tools (unless you want to showcase the art of the possible). An anecdote might help to explain.

At one organisation, they had existing reporting that was actually not horrendous, but it was hard to access, hard to parameterise and hard to do follow-on analysis on. I took it upon myself to run 30 plus reports on a weekly and monthly basis, download the contents to Excel, front these with some basic graphs and make these all available on an intranet. This meant that people from Country A or Department B could go straight to their figures rather than having to run fiddly reports. It also meant that they had an immediate visual overview – including some comparisons to prior periods and trends over time (which were not available in the original reports). Importantly, they also got a basic pivot table, which they could use to further examine what was going on. These simple steps (if a bit laborious for me) had a massive impact. I later replaced the Excel with pages I wrote in a new web-reporting tool we built in house. Ultimately, my team moved these to our strategic Analytics platform.

This shows how point solutions can be very valuable and also morph into more strategic facilities over time.
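
For readers who would rather script this kind of thing than build it by hand in Excel, here is a minimal sketch of the same idea: pivot a report extract by country and period, then add the comparison to the prior period that the original reports lacked. It is not the approach I actually used (that was Excel on an intranet); the column layout, the figures and the function names `pivot_by_country` and `change_vs_prior` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical report extract: (country, period, sales). All values are invented.
ROWS = [
    ("Country A", "2018-01", 120.0),
    ("Country A", "2018-02", 150.0),
    ("Country B", "2018-01", 80.0),
    ("Country B", "2018-02", 60.0),
]


def pivot_by_country(rows):
    """Sum the measure into a {country: {period: total}} pivot."""
    pivot = defaultdict(lambda: defaultdict(float))
    for country, period, value in rows:
        pivot[country][period] += value
    return {country: dict(periods) for country, periods in pivot.items()}


def change_vs_prior(pivot, current, prior):
    """Percentage change from the prior period to the current one, per country."""
    changes = {}
    for country, periods in pivot.items():
        base = periods.get(prior)
        if base:  # skip countries with no (or a zero) prior-period figure
            delta = periods.get(current, 0.0) - base
            changes[country] = round(100.0 * delta / base, 1)
    return changes


pivot = pivot_by_country(ROWS)
changes = change_vs_prior(pivot, "2018-02", "2018-01")
for country, pct in sorted(changes.items()):
    print(f"{country}: {pct:+.1f}% vs prior period")
```

The point is less the code than the shape of the output: people go straight to their own figures and immediately see how they compare with last period.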

2. Data Process Improvements

Data issues may be to do with a range of problems from poor validation in systems, to bad data integration, but immature data processes and insufficient education for data entry staff are often key contributors to overall problems. Identifying such issues and quantifying their impact should be the province of a Data Audit, which is something I would recommend considering early on in a data programme. Once more this can be basic at first, considering just superficial issues, and then expand over time.

While fixing some data process problems and making a stepped change in data quality will both probably take time and effort, it may be possible to identify and target some narrower areas in which progress can be made quite quickly. It may be that one key attribute necessary for analysis is poorly entered and validated. Some good communications around this problem can help, better guidance for people entering it is also useful and some “quick and dirty” reporting highlighting problems and – hopefully – tracking improvement can make a difference quicker than you might expect [9].
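
As an illustration of what such “quick and dirty” reporting might look like, the sketch below takes one attribute, applies a simple validity rule and reports the percentage of invalid entries per month, so that any improvement can be tracked. Everything specific here is an assumption: the attribute (a product code), the rule (two capital letters followed by four digits), the sample data and the function name `invalid_rate_by_month`.

```python
import re
from collections import defaultdict

# Hypothetical rule: a valid product code is two capital letters then four digits.
VALID_CODE = re.compile(r"[A-Z]{2}\d{4}")

# Hypothetical extract of (entry month, product code as keyed). Values are invented.
ENTRIES = [
    ("2018-01", "AB1234"),
    ("2018-01", ""),
    ("2018-01", "ab1234"),
    ("2018-01", "CD5678"),
    ("2018-02", "EF0001"),
    ("2018-02", "GH0002"),
    ("2018-02", "unknown"),
    ("2018-02", "IJ0003"),
]


def invalid_rate_by_month(entries):
    """Return {month: percentage of entries failing the validity rule}."""
    counts = defaultdict(lambda: [0, 0])  # month -> [invalid, total]
    for month, code in entries:
        counts[month][1] += 1
        if not VALID_CODE.fullmatch(code):
            counts[month][0] += 1
    return {
        month: round(100.0 * invalid / total, 1)
        for month, (invalid, total) in sorted(counts.items())
    }


rates = invalid_rate_by_month(ENTRIES)
for month, rate in rates.items():
    print(f"{month}: {rate}% of product codes invalid")
```

Published regularly, and broken down by team or location where that is appropriate, even a report this crude gives people entering data something to aim at.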

3. Data Architecture Enhancements

Improving a Data Architecture sounds like a multi-year task and indeed it can often be just that. However, it may be that there are some areas where judicious application of limited resource and funds can make a difference early on. A team engaged in a data programme should seek out such opportunities and expect to devote time and attention to them in parallel with other work. Architectural improvements would be best coordinated with data process improvements where feasible.

An example might be providing a web-based tool to look up valid codes for entry into a system. Of course it would be a lot better to embed this functionality in the system itself, but it may take many months to include this in a change schedule whereas the tool could be made available quickly. I have had some success with extending such a tool to allow users to build their own hierarchies, which can then be reflected in either point analytics solutions or more strategic offerings. It may be possible to later offer the tool’s functionality via web-services allowing it to be integrated into more than one system.
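
The core of such a tool is small. The sketch below is one possible shape for it, not a description of the tool I built: a search over a list of valid codes, plus a check that a user-defined hierarchy only groups codes that actually exist and assigns each to a single group. The codes, descriptions, group names and the functions `search_codes` and `build_hierarchy` are all hypothetical.

```python
# Hypothetical reference data: valid codes and their descriptions (invented values).
VALID_CODES = {
    "MOT": "Motor",
    "PRP": "Property",
    "MAR": "Marine",
    "AVI": "Aviation",
}


def search_codes(term):
    """Return codes whose code or description contains the term (case-insensitive)."""
    needle = term.strip().lower()
    return sorted(
        code
        for code, description in VALID_CODES.items()
        if needle in code.lower() or needle in description.lower()
    )


def build_hierarchy(groups):
    """Validate a user-defined {group: [codes]} hierarchy; return a code -> group map."""
    code_to_group = {}
    for group, codes in groups.items():
        for code in codes:
            if code not in VALID_CODES:
                raise ValueError(f"Unknown code {code!r} in group {group!r}")
            if code in code_to_group:
                raise ValueError(f"Code {code!r} appears in more than one group")
            code_to_group[code] = group
    return code_to_group


matches = search_codes("ma")
hierarchy = build_hierarchy({"Specialty": ["MAR", "AVI"], "Retail": ["MOT", "PRP"]})
```

The code-to-group map is the piece that analytics can reuse: joining it to transactional data rolls figures up to the user's own groupings. Functions like these could sit behind a web page at first and be exposed as web-services later.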

4. Data Strategy

I have written extensively about Data Strategy on this site [10]. What I wanted to cover here is the interplay between Data Strategy and some of the other areas I have just covered. It might be thought that Data Strategy is both carved on tablets of stone [11] and stands in splendid and theoretical isolation, but this should not ever be the case. The development of a Data Strategy should of course be informed by a situational analysis and a vision of “what good looks like” for an organisation. However, both of these things can be shaped by early tactical work. Taking cues from initial tactical work should lead to a more pragmatic strategy, more aligned to business realities.

Work in each of the three areas itemised above can play an important role in shaping a Data Strategy and – as the Data Strategy matures – it can obviously guide interim work as well. This should be an iterative process with lots of feedback.

Closing Thoughts

I have captured the essence of these thoughts in the diagram above. The important things to take away are that in order to generate momentum, you need to start to do some stuff; to extend the physical metaphor, you have to start pushing. However, momentum is a vector quantity (it has a direction as well as a magnitude [12]) and building momentum is not a lot of use unless it is in the general direction in which you want to move; so push with some care and judgement. It is also useful to realise that – so long as your broad direction is OK – you can make refinements to your direction as you pick up speed.

The above thoughts are based on my experience in a range of organisations and I am confident that they can be applied anywhere, making allowance for local cultures of course. Once momentum is established, it still needs to be maintained (or indeed increased), but I find that getting the ball moving in the first place often presents the greatest challenge. My hope is that the framework I present here can help data practitioners to get over this initial hurdle and begin to really make a difference in their organisations.

Notes

[1] Way back in 2009, I wrote about the benefits of leveraging data to provide enhanced information. The article in question was titled Measuring the benefits of Business Intelligence. Everything I mention remains valid today in 2018.
[2] See also:
[3] If I may be allowed to blow my own trumpet for a moment, I have developed data / information strategies for eight organisations, turned seven of these into a costed / planned programme and executed at least the first few phases of six of these. I have always found being a consistent presence through these phases has been beneficial to the organisations I was helping, as well as helping to reduce duplication of work.
[4] See my, now rather venerable, trilogy about cultural change in data / information programmes: together with the rather more recent:
[5] See for example:
[6] Dictionary.com offers a nice explanation of this phrase.
[7] I was raised a Catholic, but have been areligious for many years.
[8] Much like $x^2+x+1=0$. For anyone interested, the two roots of this polynomial are clearly: $-\dfrac{1}{2}+\dfrac{\sqrt{3}}{2}\hspace{1mm}i\hspace{5mm}\text{and}\hspace{5mm}-\dfrac{1}{2}-\dfrac{\sqrt{3}}{2}\hspace{1mm}i$ neither of which is Real.
[9] See my rather venerable article, Using BI to drive improvements in data quality, for a fuller treatment of this area.
[10] For starters see: and also the Data Strategy segment of The Anatomy of a Data Function – Part I.
[11]
[12] See Glimpses of Symmetry, Chapter 15 – It’s Space Jim….

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

The Anatomy of a Data Function – Part III


This is the third and final part of my review of the anatomy of a Data Function; Part I may be viewed here and Part II here.

In the first article, I introduced the following Data Function organogram:

and went on to cover each of Data Strategy, Analytics & Insight and Data Operations & Technology. In Part II, I discussed the two remaining Data Function areas of Data Architecture and Data Management. In this final article, I wanted to cover the Related Areas that appear on the right of the above diagram. This naturally segues into talking about the practicalities of establishing a Data Function and highlighting some problems to be avoided or managed.

As in Parts I and II, unless otherwise stated, text indented as a quotation is excerpted from the Data and Analytics Dictionary.

Related Areas

I have outlined some of the key areas with which the Data Function will work. This is not intended to be a comprehensive list and indeed the boxes may be different in different organisations. Regardless of the departments that appear here, the general approach will however be similar. I won’t go through each function in great detail here. There are some obvious points to make however. The first is an overall one that clearly a collaborative approach is mandatory. While there are undeniably some police-like attributes of any Data Function, it would be best if these were carried out by friendly community policemen or women, not paramilitaries.

So rather more:

and rather less:

Data Privacy and Information Security

Though strongly related, these areas do not generally fall under the Data Function. Indeed some legislation requires that they are separate functions. Data Privacy and Information Security are related, but also distinct from each other. Definitions are as follows:

[Data Privacy] pertains to data held by organisations about individuals (customers, counterparties etc.) and specifically to data that can be used to identify people (personally identifiable data), or is sensitive in nature, such as medical records, financial transactions and so on. There is a legal obligation to safeguard such information and many regulations around how it can be used and how long it can be retained. Often the storage and use of such data requires explicit consent from the person involved.

Data and Analytics Dictionary entry: Data Privacy

Information Security consists of the steps that are necessary to make sure that any data or information, particularly sensitive information (trade secrets, financial information, intellectual property, employee details, customer and supplier details and so on), is protected from unauthorised access or use. Threats to be guarded against would include everything from intentional industrial espionage, to ad hoc hacking, to employees releasing or selling company information. The practice of Information Security also applies to the (nowadays typical) situation where some elements of internal information are made available via the internet. There is a need here to ensure that only those people who are authenticated to access such information can do so.

Data and Analytics Dictionary entry: Information Security

Digital

Digital is not a box that would necessarily have appeared on this chart 15, or even 10, years ago. However, nowadays this is often an important (and large) department in many organisations. Digital departments leverage data heavily; both what they gather themselves and data drawn from other parts of the organisation. This can be to show customers their transactions, to guide next best actions, or to suggest potentially useful products or services. Given this, collaboration with the Data Function should be particularly strong.

Change Management

There are some specific points to make with respect to Change collaboration. One dimension of this was covered in Part II. Looking at things the other way round, as well as being a regular department, with what are laughingly referred to as “business as usual” responsibilities [1], the Data Function will also drive a number of projects and programmes. Depending on how this is approached in an organisation, this means either that the Data Function will need its own Project Managers etc., or to have such allocated from Change. This means that interactions with Change are bidirectional, which may be particularly challenging.

For some reason, Change departments have often ended up holding the purse strings for all projects and programmes (perhaps a less than ideal outcome), so a Data Function looking to get its own work done may run counter to this (see also the second section of this article).

IT

While the role of IT is perhaps narrower nowadays than historically [2], they are deeply involved in the world of data and the infrastructure that supports its movement around the organisation. This means that the Data Function needs to pay particular attention to its relationship with IT.

Embedded Analytics Teams

A wholly centralised approach to delivering Analytics is neither feasible, nor desirable. I generally recommend hybrid arrangements with a strong centralised group and affiliated analytical resource embedded in business teams. In some organisations such people may be part of the Data Function, or have a dotted line into it. In others the connection may be less formal. Whatever the arrangements, the best result would be if embedded analytical staff viewed themselves as part of a broader analytical and data community, which can share tips, work to standards and leverage each other’s work.

Data Stewards

Data Stewards are a concept that arises from a requirement to embed Data Governance policies and processes. Data Function Governance staff and Data Architects both need to work closely with Data Stewards. A definition is as follows:

This is a concept that arises out of Data Governance. It recognises that accountability for things like data quality, metadata and the implementation of data policies needs to be devolved to business departments and often locations. A Data Steward is the person within a particular part of an organisation who is responsible for ensuring that their data is fit for purpose and that their area adheres to data policies and guidelines.

Data and Analytics Dictionary entry: Data Steward

End User Computing

There are several good reasons for engaging with this area. First, the various EUCs that have been developed will embody some element (unsatisfied elsewhere) of requirements for the processing and / or distribution of data; these needs probably have to be met. Second, EUCs can present significant risks to organisations (as well as delivering significant benefits) and ameliorating these (while hopefully retaining the benefits) should be on the list of any Data Function. Third, the people who have built EUCs tend to be knowledgeable about an organisation’s data, the sort of people who can be useful sources of information and also potential allies.

[End User Computing] is a term used to cover systems developed by people other than an organisation’s IT department or an approved commercial software vendor. It may be that such software is developed and maintained by a small group of people within a department, but more typically a single person will have created and cares for the code. EUCs may be written in mainstream languages such as Java, C++ or Python, but are frequently instead Excel- or Access-based, leveraging their shared macro/scripting language, VBA (for Visual Basic for Applications). While related to Microsoft Visual Basic (the precursor to .NET), VBA is not a stand-alone language and can only run within a Microsoft Office application, such as Excel.

Data and Analytics Dictionary entry: End User Computing (EUC)

Third Party Providers

Often such organisations may be contracted through the IT function; however the Data Function may also hire its own consultants / service providers. In either case, the Data Function will need to pay similar attention to external groups as it does to internal service providers.

Building a Data Function for the Practical Man [3]

Starting Small

It is a truth universally acknowledged, that a Leader newly in possession of a Data Function, must be in want of some staff [5]. However seldom will such a person be furnished with a budget and headcount commensurate with the task at hand; at least in the early days. Often instead, the mission, should you choose to accept it, is to begin to make a difference in the Data World with a skeleton crew at best [6]. Well no one can work miracles and so it is a question of judgement where to apply scarce resource.

My view is that this is best applied in shining a light on the existing data landscape, but in two ways. First, at the Analytics end of the spectrum, looking to unearth novel findings from an organisation’s data; the sort of task you give to a capable Data Scientist with some background in the industry sector they are operating in. Second, at the Governance end of the spectrum, documenting failures in existing data processing and reporting; in particular any that could expose the organisation to specific and tangible risks. In B2C organisations, an obvious place to look is in customer data. In B2B ones instead you can look at transactions with counterparties, or in the preparation of data for external reports, either Financial or Regulatory. Here the ideal person is a competent Data Analyst with some knowledge of the existing data landscape, in particular the compromises that have to be made to work with it.

In both cases, the objective is to tell the organisation things it does not know. Positively, a glimmer of what nuggets its data holds and the impact this could have. Negatively, examples of where a poor data landscape leads to legal, regulatory, or reputational risks.

These activities can add value early on and increase demand for more of this type of work. The first investigation can lead to the creation of a Data Science team, the second to the establishment of regular Data Audits and people to run these.

A corollary here is a point that I ceaselessly make: data exploitation and data control are two sides of the same coin. By making progress in areas that are at least superficially at antipodal locations within a Data Function, the connective tissue between them becomes more apparent.

BAU or Project?

There is a pernicious opinion held by an awful lot of people which goes as follows.

1. We have issues with our data, its quality, completeness and fitness for purpose.
2. We do not do a good enough job of leveraging our data to guide decision making.
3. Therefore we need a data project / programme to sort this out once and for all.
4. Where is the telephone number of the Change Director?

Well there is some logic to the above and setting up a data project (more likely programme) is a helpful thing to do. However, this is necessary, but not sufficient [7]. Let’s think of a comparison.

1. We need to ensure that our Financial and Management accounts are sound.
3. Therefore we need a Finance project / programme to sort this out once and for all.
4. Where is the telephone number of the Change Director?

Most CFOs would view the above as their responsibility. They have an entire function focussed on such matters. Of course they may want to run some Finance projects and Change will help with this, but a Finance Department is an ongoing necessity.

To pick another example, one that illustrates just how quickly the make-up of organisations can change, just replace the word “Finance” with “Risk” in the above and “CFO” with “CRO”. While programmes may be helpful to improve either Risk or Finance, they do not run the Risk or Finance functions; the designated officers do and they have a complement of staff to assist them. It is exactly the same with data. Data programmes will enhance your use of data or control of it, but they will not ensure the day-to-day management and leverage of data in your organisation. Running “data” is the responsibility of the designated officer [8] and they should have a complement of staff to assist them as well.

The Data Function is a “business as usual” [9] function. Conveying this fact to a range of stakeholders is going to be one of the first challenges. It may be that the couple of examples I cite above can provide some ammunition for this task.

Demolishing Demoralising Demarcations

With Data Functions and their leaders both being relatively emergent phenomena [10], the separation of duties between them and other areas of a business that also deal with data can be less than clear. Scanning down the Related Areas column of the overall Data Function chart, three entities stand out who may feel that they have a strong role to play in data matters: Digital, Change Management and IT.

Of course each is correct and collaboration is the best way forward. However, human nature is not always so benign and I have several times seen jockeying for position between Data, Digital, Change and IT. Route A to resolving this is of course having clarity as to everyone’s roles and a lead Executive (normally a CEO or COO) who ensures that people play nicely with each other. Back in the real world, it will be down to the leaders in each of these areas to forge some sort of consensus about who does what and why. It is probably best to realise this upfront, rather than wasting time and effort lobbying Executives to rule on things they probably have no intention of ruling on.

Nascent Data Function leaders should be aware that there will be a tendency for other teams to carve out what might be seen as the sexier elements of Data work; this can almost seem logical when – for example – a Digital team already has a full complement of web analytics staff; surely it is just a matter of pointing these at other internal data sets, right?

If we assume that the Data Function is the last of the above mentioned departments to form, then “zero sum game” thinking would dictate that whatever is accretive to the Data Function is deleterious to existing data staff in other departments. Perhaps a good place to start in combatting this mind-set is to first acknowledge it and second to take steps to allay people’s fears. It may well make sense for some staff to gravitate to the Data Function, but only if there is a compelling logic and only if all parties agree. Offering the leaders of other departments joint decision-making on such sensitive issues can be a good confidence-building step.

Setting out explicitly to help colleagues in other departments, where feasible to do so, can make very good sense and begin the necessary work of building bridges. As with most areas of human endeavour, forging good relationships and working towards the common good are both the right thing to do and put the Data Function leader in a good place as and when more contentious discussions arise.

To make this concrete, when people in another function appear to be stepping on the toes of the Data Function, instead of reacting with outrage, it may be preferable to embrace and fully understand the work that is being done. It may even make sense to support such work, even if the ultimate view is to do things a bit differently. Insisting on organisational purity and a “my way, or the highway” attitude to data matters are both steps towards a failed Data Function. Instead, engage, listen, support and – maybe over time – seek to nudge things towards your desired state.

Closing Thoughts

So we have reached the end of our anatomical journey. While maybe the information contained in these three articles would pale into insignificance compared to an actual course in human anatomy, we have nevertheless covered five main work-areas within a Data Function, splitting these down into nineteen sub-areas and cataloguing eight functions with which collaboration will be key in driving success. I have also typed over 8,000 words to convey my ideas. For those who have read all of them, thank you for your perseverance; I hope that the effort has been worthwhile and that you found some of my opinions thought-provoking.

I would also like to thank the various people who have provided positive feedback on this series via LinkedIn and Facebook. Your comments were particularly influential in shaping this final chapter.

So what are the main takeaways? Well first the word collaboration has cropped up a lot and – because data is so pervasive in organisations – the need to collaborate with a wide variety of people and departments is strong. Second, extending the human anatomy analogy, while each human shares a certain basic layout (upright, bipedal, two arms, etc.), there is considerable variation within the basic parameters. The same goes for the organogram of a Data Function that I have presented at the beginning of each of these articles. The boxes may be rearranged in some organisations, some may not sit in the Data Function in others, and the number of people allocated to each work-area will vary enormously. As with human anatomy, grasping the overall shape is more important than focussing on the inevitable variations between different people.

Third, a central concept is of course that a Data Function is necessary, not just a series of data-centric projects. Even if it starts small, some dedicated resource will be necessary and it would probably be foolish to embark on a data journey without at least a skeleton crew. Fourth, in such straitened circumstances, it is important to point early and clearly to the value of data, both in reducing potentially expensive risks and in driving insights that can save money, boost market share or improve products or services. If the budget is limited, attend to these two things first.

A fifth and final thought is how little these three articles have focussed on technology. Hadoop clusters, data visualisation suites and data governance tools all have their place, but the success or failure of data-centric work tends to pivot on more human and process considerations. This theme of technology being the least important part of data work is one I have come back to time and time again over the nine years that this blog has been published. This observation remains as true today as back in 2008.

Notes

[1] BAU should in general be filed along with other mythical creatures such as Unicorns, Bigfoot, The Kraken and The Loch Ness Monster.
[2] Not least because of the rise of Data Functions, Digital Teams and stand-alone Change Organisations.
[3] A title borrowed from J E Thompson’s Calculus for the Practical Man; a tome read by the young Richard Feynman in childhood. Today “Calculus for the Practical Person” might be a more inclusive title.
[4] Also known as “pulling yourself up by your bootstraps”.
[5] I seem to be channelling JA a lot at present – see A truth universally acknowledged….
[6] Indeed I have started on this particular journey with just myself for company on no fewer than four occasions (these three 1, 2, 3, plus at Bupa).
[7] Once a Mathematician, always a Mathematician.
[8] See Alphabet Soup for some ideas about what he or she might be called.
[9] See note 1.
[10] Despite early high-profile CDOs beginning to appear at the turn of the millennium – Joe Bugajski was appointed VP and Chief Data Officer at Visa International in 2001 (Wikipedia).

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

The Anatomy of a Data Function – Part II

This is the second part of my review of the anatomy of a Data Function; the artfully named Part I may be viewed here. As seems to happen all too often to me, this series will now extend to having a Part III, which may be viewed here.

In the first article, I introduced the following Data Function organogram:

and went on to cover each of Data Strategy, Analytics & Insight and Data Operations & Technology. In Part II, I will consider the two remaining Data Function areas of Data Architecture and Data Management. Covering Related Areas, and presenting some thoughts on how to go about setting up a Data Function and the pitfalls to be faced along the way, together form the third and final part of this trilogy.

As in Part I, unless otherwise stated, text indented as a quotation is excerpted from the Data and Analytics Dictionary.

Data Architecture

To be somewhat self-referential, this area acts as a cornerstone for the rest of the Data Function. While sometimes non-Data architects can seem to inhabit a loftier plane than most mere mortals, Data Architects (who definitively must be part of the Data Function and none of the Business, Enterprise or Solutions Architecture groups) tend to be more practical sorts with actual hands-on technical skills. Perhaps instead of the title “Architect”, “Structural Engineer” would be more appropriate. When a Data Architect draws a diagram with connected boxes, he or she generally understands how the connections work and could probably take a fair stab at implementing the linkages themselves. The other denizens of this area, such as Data Business Analysts, are also essentially pragmatic people, focused on real business outcomes. Data Architecture is a non-theoretical discipline and here I present some of the real-world activities that its members are often engaged in.

Change Portfolio Engagement

One of the most important services that a good Data Function can perform is to act as a moderator for the otherwise deleterious impact that uncontrolled (and uncoordinated) Change portfolios can have on even the best of data landscapes [1]. As I mention in another article:

Over the last decade or so, the delivery of technological change has evolved to the point where many streams of parallel work are run independently of each other with each receiving very close management scrutiny in order to ensure delivery on-time and on-budget. It should be recognised that some of this shift in modus operandi has been as a result of IT departments running projects that have spiralled out of control, or where delivery has been significantly delayed or compromised. The gimlet-like focus of Change on delivery “come Hell or High-water” represents the pendulum swinging to the other extreme.

What this shift in approach means in practice is that – as is often the case – when things go wrong or take longer than anticipated, areas of work are de-scoped to secure delivery dates. In my experience, 9 times out of 10 one of the things that gets thrown out is data-related work; be that not bothering to develop reporting on top of new systems, not integrating new data into existing repositories, not complying with data standards, or not implementing master data management.

As well as the danger of skipping necessary data-related work, if some data-related work is actually undertaken, then corners may be cut to meet deadlines and budgets. It is not atypical for instance that a Change Programme, while adding its new capabilities to interfaces or ETL, compromises or overwrites existing functionality. This can mean that data-centric code is in a worse state after a Change Programme than before. My roadworks anecdote begins to feel all too apt a metaphor to employ.

Looking more broadly at Change Programmes, even without the curse of de-scopes, their focus is seldom data and the expertise of Change staff is not often in data matters. Because of this, such work can indeed seem to be analogous to continually digging up the same stretch of road for different purposes, combined with patching things up again in a manner that can sometimes be barely adequate. Extending our metaphor, the result of Change that is not controlled from a data point of view can be a landscape with lumps, bumps and pot-holes. Maybe the sewer was re-laid on time and to budget, but the road has been trashed in the process. Perhaps a new system was shoe-horned into production, but rendered elements of an Analytical Repository useless in the process.

Excerpted from: Bumps in the Road

A primary responsibility of a properly constituted Data Function is to lean hard against the prevailing winds of Change in order to protect existing data capabilities that would otherwise likely be blown away [2]. Given the gargantuan size of most current Change teams, it makes sense to have at least a reasonable amount of Data Function resource applied to this area. Hopefully early interventions in projects and programmes can mitigate any potentially adverse impacts and perhaps even lead to Change being accretive to data landscapes, as it really ought to be.

The best approach, as with most human endeavours, is a collaborative one, with Data Function staff (probably Data Architects) getting involved in new Change projects and programmes at an early stage and shaping them to be positive from a Data dimension. However, there also need to be teeth in the process; on occasion the Data Function must be able to prevent work that would cause true damage from going ahead. These are hopefully powers that are used more in breach than observance.

Data Modelling

It is in this area that the practical bent of Data Architects and Data Business Analysts is seen very clearly. Data modelling mirrors the realities of systems and databases the way that Theoretical Physicists use Mathematics to model the Natural World [3]. In both cases, while there may be a degree of abstraction, the end purpose is to achieve something more concrete. A definition is as follows:

[Data Modelling is] the process of examining data sets (e.g. the database underpinning a system) in order to understand how they are structured, the relationships between their various parts and the business entities and transactions they represent. While system data will have a specific Physical Data Model (the tables it contains and their linkages), Data Modelling may instead look to create a higher-level and more abstract set of pseudo-tables, which would be easier to relate to for non-technical staff and would more closely map to business terms and activities; this is known as a Conceptual Data Model. Sitting somewhere between the two may be found Logical Data Models. There are several specific documents produced by such work, one of the most common being an Entity-Relationship diagram, e.g. a sales order has a customer and one or more line items, each of which has a product.

Data and Analytics Dictionary entry: Data Modelling
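To make the Entity-Relationship example at the end of this definition a little more tangible, here is a minimal Python sketch of the sales order, customer, line item and product relationship. The class names, attribute names and sample values are my own illustrative assumptions; they are not taken from the Dictionary or from any particular system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Product:
    product_id: int
    description: str
    unit_price: float


@dataclass
class LineItem:
    product: Product  # each line item has exactly one product
    quantity: int

    def total(self) -> float:
        return self.product.unit_price * self.quantity


@dataclass
class SalesOrder:
    order_id: int
    customer: Customer  # a sales order has one customer
    line_items: List[LineItem] = field(default_factory=list)  # one or more

    def total(self) -> float:
        return sum(item.total() for item in self.line_items)


# Hypothetical sample data, purely for illustration.
order = SalesOrder(
    order_id=1,
    customer=Customer(customer_id=10, name="Example Ltd"),
    line_items=[
        LineItem(Product(100, "Widget", 2.50), quantity=4),
        LineItem(Product(101, "Gadget", 10.00), quantity=1),
    ],
)
```

A Physical Data Model would express the same relationships as tables and foreign keys; the sketch sits closer to the Conceptual end of the spectrum, using names a business person would recognise.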

Data Business Analysts fill another critical role. In my long experience of both setting up Data Functions and running Data Programmes, having good Data Business Analysts on board is often the difference between success and failure. I cannot stress enough how important this role is.

Data Business Analysts are neither regular Business Analysts, nor just Data Analysts, but rather a combination of the best of both. They do have all the requirement gathering skills of the best BAs, but complement these with Data Modelling abilities, always seeking to translate new requirements into expanded or refined Data Models. Also the way that they approach business requirements will be very specific. The optimal way to do this is by teasing out (and then collating and categorising) business questions and then determining the information needed to answer these. A good Data Business Analyst will also have strong Data Analysis skills, being able to work with unfamiliar and lightly-documented datasets to discern meaning and link this to business concepts. A definition is as follows:

A person who has extensive understanding of both business processes and the data necessary to support these. A Business Analyst is expert at discerning what people need to do. A Data Analyst is adept at working with datasets and extracting meaning from them. A Data Business Analyst can work equally happily in both worlds at the same time. When they talk to people about their requirements for information, they are simultaneously updating mental models of the data necessary to meet these needs. When they are considering how lightly-documented datasets hang together, they constantly have in mind the business purpose to which such resources may be bent.

Data and Analytics Dictionary entry: Data Business Analyst

Data Management

Again, it is worth noting that I have probably defined this area more narrowly than many. It could be argued that it should encompass the work I have under Data Architecture and maybe much of what is under Data Operations & Technology. The actual hierarchy is likely to be driven by factors like the nature of the organisation and the seniority of Managers in the Data Function. For good or ill, I have focussed Data Management more on the care and feeding of Data Assets in my recommended set-up. A definition is as follows:

The day-to-day management of data within an organisation, which encompasses areas such as Data Architecture, Data Quality, Data Governance (normally on behalf of a Data Governance Committee) and often some elements of data provision and / or regular reporting. The objective is to appropriately manage the lifecycle of data throughout the entire organisation, which both ensures the reliability of data and enables it to become a valuable and strategic asset.

In some organisations, Data Management and Analytics are part of the same organisation, in others they are separate but work closely together to achieve shared objectives.

Data and Analytics Dictionary entry: Data Management

Data Governance

There is a clear link here with some of the Data Architecture activities, particularly the Change Portfolio Engagement work-area. Governance should represent the strategic management of the data component of Change (i.e. most of Change), while day-to-day collaboration would sit more in the Data Architecture area.

The management processes and policies necessary to ensure that data captured or generated within a company is of an appropriate standard to use, represents actual business facts and has its integrity preserved when transferred to repositories (e.g. Data Lakes and / or Data Warehouses, General Ledgers etc.), especially when this transfer involves aggregation or merging of different data sets. The activities that Data Governance has oversight of include the operation of and changes to Systems of Record and the activities of Data Management and Analytics departments (which may be merged into one unit, or discrete but with close collaboration).

Data Governance has a strategic role, often involving senior management. Day-to-day tasks supporting Data Governance are often carried out by a Data Management team.

Data and Analytics Dictionary entry: Data Governance

Data Definitions & Metadata

This is a relatively straightforward area to conceptualise. Rigorous and consistent definitions of master data and calculated data are indispensable in all aspects of how a Data Function operates and how an organisation both leverages and protects its data. Focusing on Metadata, a definition would be as follows:

[Metadata is] data about data. So descriptions of what appears in fields, how these relate to other fields and what concepts bigger constructs like Tables embody. This helps people unfamiliar with a dataset to understand how it hangs together and is good practice in the same way that documentation of any other type of code is good practice. Metadata can be used to support some elements of Data Discovery by less technical people. It is also invaluable when there is a need for Data Migration.

Data and Analytics Dictionary entry: Metadata
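As a small illustration of the definition above, the following Python sketch holds a description of each field, its type and the field it relates to in a simple catalogue. The table names, field names and the `describe` helper are hypothetical choices of mine; real metadata tools will be considerably richer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class FieldMetadata:
    table: str
    name: str
    data_type: str
    description: str
    related_to: Optional[str] = None  # "table.field" that this field links to


def build_catalogue(entries: Iterable[FieldMetadata]) -> Dict[str, FieldMetadata]:
    """Index metadata entries by their qualified "table.field" name."""
    return {f"{e.table}.{e.name}": e for e in entries}


def describe(catalogue: Dict[str, FieldMetadata], qualified_name: str) -> str:
    """Return a one-line, human-readable description of a field."""
    meta = catalogue[qualified_name]
    text = f"{meta.table}.{meta.name} ({meta.data_type}): {meta.description}"
    if meta.related_to:
        text += f" [relates to {meta.related_to}]"
    return text


# Hypothetical entries, for illustration only.
catalogue = build_catalogue([
    FieldMetadata("customer", "customer_id", "integer", "Unique customer identifier"),
    FieldMetadata("sales_order", "customer_id", "integer",
                  "Customer who placed the order", related_to="customer.customer_id"),
])
```

Even something this basic would let a person unfamiliar with a dataset look up what a field means and how it hangs together with others.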

Data Audit

One of the challenges in driving Data Quality improvements in organisations is actually highlighting the problems and their impacts. Often poor Data Quality is a hidden cost, spread across many people taking longer to do their jobs than is necessary, or specific instances where interactions with business counterparties (including customers) are compromised. Organisations obviously cope – at least in general – with these issues, but they are a drag on efficiency and, in extremis, can lead to incidents which can cause significant financial loss and/or reputational damage. A way to make such problems more explicit is via a regular Data Audit, a review of data in source systems and as it travels through various data repositories. This would include some assessment of the completeness and overall quality of data, highlighting areas of particular concern. So one component might include the percentage of active records which suffer from a significant data quality issue.
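By way of illustration, the measure mentioned at the end of the previous paragraph could be calculated along the following lines. This is only a sketch: the record layout, the `active` flag and the rule for what counts as a "significant" issue (a missing postcode or a non-positive premium) are assumptions made up for the example, and any real audit would define its own rules.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def has_significant_issue(record: Record) -> bool:
    """Hypothetical rule: a missing postcode or a non-positive premium."""
    postcode = record.get("postcode")
    premium = record.get("premium")
    missing_postcode = not postcode
    bad_premium = not isinstance(premium, (int, float)) or premium <= 0
    return missing_postcode or bad_premium


def pct_active_with_issues(
    records: List[Record],
    rule: Callable[[Record], bool] = has_significant_issue,
) -> float:
    """Percentage of active records that fail the supplied quality rule."""
    active = [r for r in records if r.get("active")]
    if not active:
        return 0.0
    flagged = sum(1 for r in active if rule(r))
    return 100.0 * flagged / len(active)


# Made-up sample records; record 4 is inactive and so is ignored.
records = [
    {"id": 1, "active": True, "postcode": "AB1 2CD", "premium": 120.0},
    {"id": 2, "active": True, "postcode": "", "premium": 80.0},
    {"id": 3, "active": True, "postcode": "EF3 4GH", "premium": -5.0},
    {"id": 4, "active": False, "postcode": None, "premium": None},
    {"id": 5, "active": True, "postcode": "IJ5 6KL", "premium": 45.0},
]
audit_score = pct_active_with_issues(records)
```

Passing the rule in as a function means that different components of an audit can reuse the same calculation with their own tests.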

It is important that any such issues are categorised. Are they the result of less than perfect data entry procedures, which could be tightened up? Are they due to deficient validation in transactional systems, where this could be improved and there may be a role for Master Data Management? Are data interfaces between systems to blame, where these need to be reengineered or potentially replaced? Are there architectural issues with systems or repositories, which will require remedial work to address?

This information needs to be rolled up and presented in an accessible manner so that those responsible for systems and processes can understand where issues lie. Data Audits, even if partially automated, take time and effort, so it may be appropriate to carry them out quarterly. In this case, it is valuable to understand how the situation is changing over time and also to track the – hopefully positive – impact of any remedial action. Experienced Data Analysts with a good appreciation of how business is conducted in the organisation are the type of resource best suited to Data Audit work.
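The categorisation and roll-up described above might be sketched as follows. The four causes mirror the questions posed earlier in this section; the period labels, the sample findings and the function names are hypothetical and are only there to show the shape of the calculation.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Tuple


class IssueCause(Enum):
    DATA_ENTRY = "data entry procedures"
    VALIDATION = "validation in transactional systems"
    INTERFACE = "interfaces between systems"
    ARCHITECTURE = "system or repository architecture"


def roll_up(findings: List[Tuple[str, IssueCause]]) -> Dict[str, Counter]:
    """Count findings per cause within each audit period."""
    summary: Dict[str, Counter] = {}
    for period, cause in findings:
        summary.setdefault(period, Counter())[cause] += 1
    return summary


def change_between(
    summary: Dict[str, Counter], earlier: str, later: str
) -> Dict[IssueCause, int]:
    """Change in issue counts per cause; negative means fewer issues later."""
    return {
        cause: summary[later][cause] - summary[earlier][cause]
        for cause in IssueCause
    }


# Made-up findings from two quarterly audits.
findings = [
    ("Q1", IssueCause.DATA_ENTRY),
    ("Q1", IssueCause.DATA_ENTRY),
    ("Q1", IssueCause.DATA_ENTRY),
    ("Q1", IssueCause.INTERFACE),
    ("Q1", IssueCause.VALIDATION),
    ("Q2", IssueCause.DATA_ENTRY),
    ("Q2", IssueCause.INTERFACE),
    ("Q2", IssueCause.INTERFACE),
]
summary = roll_up(findings)
trend = change_between(summary, "Q1", "Q2")
```

In this made-up case, data entry issues have fallen between audits while interface issues have risen, which is exactly the sort of movement that those responsible for systems and processes would want to see.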

Data Quality

Much that needs to be said here is covered in the previous section about Data Audit. Data Quality can be defined as follows:

The characteristics of data that cover how accurately and completely it mirrors real world events and thereby how much reliance can be placed on it for the purpose of generating information and insight. Enhancing Data Quality should be a primary objective of Data Management teams.

Data and Analytics Dictionary entry: Data Quality

A Data Quality team, which would work closely with Data Audit colleagues, would be focussed on helping to drive improvements. The details of such work are covered in an earlier article, from which the following is excerpted:

There are a number of elements that combine to improve the quality of data:

As with any strategy, it is ideal to have the support of all four pillars. However, I have seen greater and quicker improvements through the fourth element than with any of the others.

Excerpted from: Using BI to drive improvements in data quality

Master Data Management

There is some overlap here with Data Definitions & Metadata as mentioned above. Master Data Management has also been mentioned here in the context of Data Quality initiatives. However this specialist area tends to demand dedicated staff. A definition is as follows:

Master Data Management is the term used to both describe the set of processes by which Master Data is created, changed and deleted in an organisation and also the technological tools that can facilitate these processes. There is a strong relation here to Data Governance, an area which also encompasses broader objectives. The aim of MDM is to ensure that the creation of business transactions results in valid data, which can then be leveraged confidently to create Information.

Many of the difficulties in MDM arise from items of Master Data that can change over time; for example when one counterparty is acquired by another, or an organisational structure is changed (maybe creating new departments and consolidating old ones). The challenges here include how to report historical transactions that are tagged with Master Data that has now changed.

Data and Analytics Dictionary entry: Master Data Management
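One common way of coping with Master Data that changes over time is to keep dated versions of each item, so that a historical transaction can be reported against either the Master Data in force at the time or the current view. The Dictionary entry does not prescribe this technique; the sketch below is simply one possible approach, and the counterparty identifiers, names and dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CounterpartyVersion:
    counterparty_id: str
    parent_name: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


def version_as_of(
    history: List[CounterpartyVersion], counterparty_id: str, as_of: date
) -> Optional[CounterpartyVersion]:
    """Return the version in force on a given date, or None if there is none."""
    for version in history:
        if version.counterparty_id != counterparty_id:
            continue
        started = version.valid_from <= as_of
        not_ended = version.valid_to is None or as_of < version.valid_to
        if started and not_ended:
            return version
    return None


def current_version(
    history: List[CounterpartyVersion], counterparty_id: str
) -> Optional[CounterpartyVersion]:
    """Return the open-ended (current) version, or None if there is none."""
    for version in history:
        if version.counterparty_id == counterparty_id and version.valid_to is None:
            return version
    return None


# Hypothetical history: counterparty CP1 is acquired on 1 July 2017.
history = [
    CounterpartyVersion("CP1", "Alpha Group", date(2015, 1, 1), date(2017, 7, 1)),
    CounterpartyVersion("CP1", "Beta Holdings", date(2017, 7, 1)),
]
as_transacted = version_as_of(history, "CP1", date(2016, 3, 15))
as_now = current_version(history, "CP1")
```

A transaction dated March 2016 can then be reported under the owner at the time or under the present owner, depending on the business question being asked.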

At this point, we have covered all of the work-areas within our idealised Data Function. In the third and final piece, we will consider the right-hand column of Related Areas, ones that a Data Function must collaborate with. Having covered these, the trilogy will close by offering some thoughts on the challenges of setting up a Data Function and how these may be overcome.

Notes

[1] I am old enough to recall a time before Change portfolios. I can recall no organisation in which I have worked over the last 20 years in which Change portfolios have had a positive impact on data assets; maybe I have just been unlucky, but it begins to feel more like a fundamental Physical Law.
[2] I have clearly been writing about hurricanes too much recently!
[3] As is seen, for example, in the Introduction to my [as yet unfinished] book on the role of Group Theory in Theoretical Physics, Glimpses of Symmetry.

The Anatomy of a Data Function – Part I

Back in Alphabet Soup, I presented a diagram covering what I think are good and bad approaches to organising Analytics and Data Management. I wanted to offer an expanded view [1] of the good organisation chart and to talk a bit about each of its components. Originally, I planned to address these objectives across two articles. As happens to me all too frequently, the piece has now expanded to become three parts. The second may be read here, and the third here.

Let’s leap right in and look at my suggested chart:

I appreciate that the above is a lot of boxes! I can feel Finance and HR staff reaching for their FTE calculators as I write. A few things to note:

1. I have avoided the temptation to add the titles of executives, managers or team leaders. Alphabet Soup itself pointed out how tough it can be to wrestle with the nomenclature. Instead I have just focussed on areas of work.

2. The term “work areas” is intentional. In larger organisations, there may be teams or individuals corresponding to each box. In smaller ones Data Function staff will wear many hats and several work areas may be covered by one person.

3. In some places, a number of work areas that I have tagged as Data Function ones may be performed in other parts of the organisation, though it is to be hoped with collaboration and coordination.

Having dealt with these caveats, let’s provide some colour on each of these, progressing from top to bottom and left to right. In this first article we will consider the Data Strategy, Analytics & Insight and Data Operations & Technology areas. The second part will cover the remaining elements of Data Architecture and Data Management. The final article considers Related Areas before also covering some of the challenges that may be faced in setting up a Data Function.

In what follows, unless otherwise stated, text indented as a quotation is excerpted from the Data and Analytics Dictionary.

Data Strategy

A clear strategy is obviously most important to establish in the early days of a Data Function. Indeed a Data Strategy may well call for the creation of a Data Function where none currently exists. For anyone interested in this process, I recommend my series of three articles on this subject [2]. However a Data Strategy is not something carved in stone; it will need to be revisited and adapted (maybe significantly) as circumstances change (e.g. after an acquisition, a change in market conditions or potentially due to the emergence of some new technology). There is thus a need for ongoing work in this area. However, as demand for strategic work will tend to be lumpy, I suggest amalgamating Data Strategy with the following two sub-areas.

Data Comms & Education

Elsewhere on this site, I have highlighted the need for effective communication, education and assiduous follow-up in data programmes [3]. Education on data matters does not stop when a data quality drive is successfully completed, or when a new set of analytical capabilities is introduced; there is a need for an ongoing commitment here. Activities falling into this work area include: publishing regular data newsletters and infographics, designing and helping to deliver training programmes, providing follow-up and support to aid the embedded use of new capabilities or to ingrain new behaviours.

Relationship Management

There is a need for all Data Function staff to establish and maintain good working relations with any colleagues they come into contact with, regardless of their level or influence. However, the nature of, generally hierarchical, organisations is that it is often prudent to pay special attention to more senior staff, or to the type of person (common in many companies) who may not be that senior, but whose opinion is influential. In aggregate these two groups of people are often described as stakeholders. Providing regular updates to stakeholders and ensuring both that they are comfortable with Data Function work and that this is aligned with their priorities can be invaluable [4]. Having senior, business-savvy Data Function people available to do this work is the most likely path to success.

Analytics & Insight

Broadly speaking the Analytics area and its sub-areas are focussed more on one-off analyses rather than the recurrent production of information [5], the latter being more the preserve of the Data Operations & Technology area. There is also more of a statistical flavour to the work carried out here.

[Analytics relates to] deriving insights from data which are generally beyond the purpose for which the data was originally captured – to be contrasted with Information which relates to the meaning inherent in data (i.e. the reason that it was captured in the first place). Analytics often employ advanced statistical techniques (logistic regression, multivariate regression, time series analysis etc.) to derive meaning from data.

Data and Analytics Dictionary entry: Analytics

Data Science

I have Data Science as a sub-area of analytics. As with most terminology used in the data arena and most organisational units that exist in Data Functions, some people might argue that I have this the wrong way round and that Data Science should be preeminent. Reconciling different points of view is not my objective here; I think most people will agree that both work areas should be covered. This comment pertains to many other parts of this article. Here is a definition of the area (or rather the people who populate it):

[Data Scientists are people who are] au fait with exploiting data in many formats from Flat Files to Data Warehouses to Data Lakes. Such individuals possess equal abilities in the data technologies (such as Big Data) and how to derive benefit from these via statistical modelling. Data Scientists are often lapsed actual scientists.

Data and Analytics Dictionary entry: Data Scientist

Data Visualisation

There is an overlap here with both the Data Science team within the Analytics & Insight area and the Business Intelligence team in the Data Operations & Technology area. Many of the outputs of a good Data Function will include graphs, charts and other such exhibits. However, here would be located the real specialists, the people who would set standards for the presentation of visual data across the Data Function and be the most able in leveraging visualisation tools. A definition of Data Visualisation is as follows:

Techniques – such as graphs – for presenting complex information in a manner in which it can be more easily digested by human observers. Based on the concept that a picture paints a thousand words (or a dozen Excel sheets).

Data and Analytics Dictionary entry: Data Visualisation

Predictive Analytics

Gartner refer to four types of Analytics: descriptive, diagnostic, predictive and prescriptive analytics. In an article I referred to these as:

• What happened?
• Why did it happen?
• What is going to happen next?
• What should we be doing?

Data and Analytics Dictionary entry: Analytics

Predictive analytics is that element of the Analytics function that aims to predict the future, “What is going to happen next?” in the above list. This can be as simple as extrapolating data based on a trend line, or can involve more sophisticated techniques such as Time Series Analysis. As with most elements of the Data Function, there is overlap between Predictive Analytics and both Data Science and Business Intelligence.
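The simplest case mentioned above, extrapolating along a trend line, can be sketched in a few lines of Python. This is an ordinary least-squares fit of a straight line against the period number; the function names and the monthly sales figures are made up for illustration, and anything resembling real forecasting would need rather more care (seasonality, confidence intervals and so on).

```python
from typing import List, Sequence, Tuple


def fit_trend_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("at least two observations are needed to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def extrapolate(ys: Sequence[float], periods_ahead: int) -> List[float]:
    """Project the fitted line forward for the requested number of periods."""
    intercept, slope = fit_trend_line(ys)
    n = len(ys)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


# Made-up monthly figures that happen to lie exactly on a straight line.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
forecast = extrapolate(monthly_sales, periods_ahead=2)
```

With these figures the fitted line rises by 10 per period, so the projection simply continues that pattern for the next two periods.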

“Skunkworks”

As with Data Strategy, state-of-the-art in Analytics & Insight will continue to evolve. This part of the Data Function will aim to keep current with the latest developments and to try out new techniques and new technologies that may later be adopted more widely by Data Function colleagues. The “skunkworks” team would be staffed by capable programmers / data scientists / statisticians.

Data Operations & Technology

It could be reasonably argued that this area is part of Data Management; I probably would not object too strongly to this suggestion. However, there are some benefits to considering it separately. This is the most IT-like of the areas considered here. It recognises that data technology (be it the Hadoop suite, Data Warehouse technology, or combinations of both) is different to many other forms of technology and needs its own specialists to focus on it. It is likely that the staff in this area will also collaborate closely with IT (see the final work area in Part III) or, in some cases, supervise work carried out by IT. As well as directly creating data capabilities, Data Operations & Technology staff would be active in the day-to-day running of these; again in collaboration with colleagues from both inside and outside of the Data Function.

Business Intelligence

There is no ISO definition, but I use this term as a catch-all to describe the transformation of raw data into information that can be disseminated to business people to support decision-making.

Data and Analytics Dictionary entry: Business Intelligence

This sub-area focusses on the relatively mature task of providing Business Intelligence solutions to organisations and working with IT to support and maintain these. Good BI tools work best on a sound underlying information architecture and so there would need to also be close collaboration with Data Infrastructure staff within Data Operations & Technology as well as colleagues from Data Architecture and also Analytics & Insight.

Regular Reporting

If BI provides interactive capabilities to support decision making, Regular Reporting is about the provision of specific key reports to relevant parties on a periodic basis; daily, weekly, monthly etc. These may be burst out to people’s e-mail accounts, provided at some central location, or both. While this is an area that is ideally automated, there will still be significant need for human monitoring and to support the inevitable changes.

Data Service

One of the things that any part of a Data Function will find itself doing on a very regular basis is crafting ad hoc data extracts for other departments, e.g. Marketing, Risk & Compliance etc. Sometimes such a need will be on an ongoing basis and a web-service or some other Data Integration mechanism will need to be set up. Rather than having this be something that is supported out of the general running costs of the Data Function, it makes sense to have a specific unit whose role is to fulfil these needs. Even so, there may be a need for queuing and prioritisation of requests.
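The queuing and prioritisation of extract requests could be handled in many ways; one minimal sketch, using Python's standard `heapq` module, is shown below. The class name, the priority scheme (a lower number is more urgent, with first come, first served within a priority) and the sample requests are all assumptions of mine.

```python
import heapq
import itertools
from typing import List, Tuple


class ExtractRequestQueue:
    """Priority queue of ad hoc extract requests (lower number = more urgent)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        # The counter preserves submission order among equal priorities.
        self._counter = itertools.count()

    def submit(self, department: str, description: str, priority: int) -> None:
        entry = (priority, next(self._counter), department, description)
        heapq.heappush(self._heap, entry)

    def next_request(self) -> Tuple[str, str]:
        """Remove and return the most urgent (department, description) pair."""
        _priority, _order, department, description = heapq.heappop(self._heap)
        return department, description

    def __len__(self) -> int:
        return len(self._heap)


# Hypothetical requests, for illustration only.
queue = ExtractRequestQueue()
queue.submit("Marketing", "Campaign response extract", priority=2)
queue.submit("Risk & Compliance", "Regulatory exposure extract", priority=1)
queue.submit("Marketing", "Customer segment extract", priority=2)
first = queue.next_request()
```

Here the Risk & Compliance request jumps the queue because of its higher urgency, while the two Marketing requests are served in the order in which they arrived.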

Data Infrastructure

This relates to the physical architecture of the data landscape (for various flavours of logical architectures, see Data Architecture in Part II). While some of the tasks here may be carried out by (or in collaboration with) IT, the Data Infrastructure team will be expert at the care and feeding of Hadoop and related technologies and have experience in the fine-tuning of Data Warehouses and Data Marts.

SWAT Team

While (as both mentioned above and also covered in Part III of this article) some of the heavy lifting in data matters will be carried out by an organisation’s IT team and / or its external partners, the process for getting things done in this way can be slow, tortuous and expensive [6]. It is important that a Data Function has its own capability to make at least minor technological changes, or to build and deploy helpful data facilities without having to engage with the overall bureaucracy. The SWAT Team will have a small number of very capable and business-knowledgeable programmers, capable of quickly generating robust and functional code.

The second part of this piece picks up where I have left off here and first considers Data Architecture.

Notes

[1] I have added some functions that were absent in the previous one, mostly as they were not central to the points I was making in the previous article.
[2] My trilogy on Formatting a Data / Information Strategy has the following parts:
[3] While this theme runs through most of my writing, it is most explicitly referenced in the following three articles:
[4] It should be noted that the relationship management described here is not the same as a Project Manager covering progress against plan. This is more of a two way conversation to ensure that the Data Function remains cognisant of stakeholder needs.
[5] Though of course sometimes one-off analyses have value on an ongoing basis and so need to be productionised. In such cases the Analytics & Insight team would work with the Data Operations & Technology team to achieve this.
[6] No citation needed.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

The revised and expanded Data and Analytics Dictionary

Since its launch in August of this year, the peterjamesthomas.com Data and Analytics Dictionary has received a welcome amount of attention with various people on different social media platforms praising its usefulness, particularly as an introduction to the area. A number of people have made helpful suggestions for new entries or improvements to existing ones. I have also been rounding out the content with some more terms relating to each of Data Governance, Big Data and Data Warehousing. As a result, The Dictionary now has over 80 main entries (not including ones that simply refer the reader to another entry, such as Linear Regression, which redirects to Model).

The most recently added entries are as follows:

It is my intention to continue to revise this resource. Adding some more detail about Machine Learning and related areas is probably the next focus.

As ever, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).


A truth universally acknowledged…

 “It is a truth universally acknowledged, that an organisation in possession of some data, must be in want of a Chief Data Officer” — Growth and Governance, by Jane Austen (1813) [1]

I wrote about a theoretical job description for a Chief Data Officer back in November 2015 [2]. While I have been on “paternity leave” following the birth of our second daughter, a couple of genuine CDO job specs landed in my inbox. While unable to respond for the aforementioned reasons, I did leaf through the documents. Something immediately struck me; they were essentially wish-lists covering a number of data-related fields, rather than a description of what a CDO might actually do. Clearly I’m not going to cite the actual text here, but the following is representative of what appeared in both requirement lists:

Mandatory Requirements:

Highly Desirable Requirements:

• PhD in Mathematics or a numerical science (with a strong record of highly-cited publications)
• MBA from a top-tier Business School
• TOGAF certification
• PRINCE2 and Agile Practitioner
• Invulnerability and X-ray vision [3]
• Mastery of the lesser incantations and a cloak of invisibility [3]
• Full, clean driving licence

The above list may have descended into farce towards the end, but I would argue that the problems started to occur much earlier. The above is not a description of what is required to be a successful CDO, it’s a description of a Swiss Army Knife. There is also the minor practical point that, out of a World population of around 7.5 billion, there may well be no one who ticks all the boxes [4].

Let’s make the fallacy of this type of job description clearer by considering what a similar approach would look like if applied to what is generally the most senior role in an organisation, the CEO. Whoever drafted the above list of requirements would probably characterise a CEO as follows:

• The best salesperson in the organisation
• The best accountant in the organisation
• The best M&A person in the organisation
• The best customer service operative in the organisation
• The best facilities manager in the organisation
• The best janitor in the organisation
• The best purchasing clerk in the organisation
• The best lawyer in the organisation
• The best programmer in the organisation
• The best marketer in the organisation
• The best product developer in the organisation
• The best HR person in the organisation, etc., etc., …

Of course a CEO needs to be none of the above; they need to be a superlative leader who is expert at running an organisation (even then, they may focus on plotting the way forward and leave the day to day running to others). For the avoidance of doubt, I am not saying that a CEO requires no domain knowledge and has no expertise; they need both. However, they don’t have to know every aspect of company operations better than the people who do it.

The same argument applies to CDOs. Domain knowledge probably should span most of what is in the job description (save for maybe the three items with footnotes), but knowledge is different to expertise. As CDOs don’t grow on trees, they will most likely be experts in one or a few of the areas cited, but not all of them. Successful CDOs will know enough to be able to talk to people in the areas where they are not experts. They will have to be competent at hiring experts in every area of a CDO’s purview. But they do not have to be able to do the job of every data-centric staff member better than the person could do themselves. Even if you could identify such a CDO, they would probably lose their best staff very quickly due to micromanagement.

A CDO has to be a conductor of both the data function orchestra and of the use of data in the wider organisation. This is a talent in itself. An internationally renowned conductor may have previously been a violinist, but it is unlikely they were also a flautist and a percussionist. They do however need to be able to tell whether or not the second trumpeter is any good; this is not the same as being able to play the trumpet yourself of course. The conductor’s key skill is in managing the efforts of a large group of people to create a cohesive – and harmonious – whole.

The CDO is of course still a relatively new role in mainstream organisations [5]. Perhaps these job descriptions will become more realistic as the role becomes more familiar. It is to be hoped so, else many a search for a new CDO will end in disappointment.

Having twisted her text to my own purposes at the beginning of this article, I will leave the last words to Jane Austen:

 “A scheme of which every part promises delight, can never be successful; and general disappointment is only warded off by the defence of some little peculiar vexation.” — Pride and Prejudice, by Jane Austen (1813)

Notes

[1] Well if a production company can get away with Pride and Prejudice and Zombies, then I feel I am on reasonably solid ground here with this title. I also seem to be riffing on JA rather a lot at present, I used Rationality and Reality as the title of one of the chapters in my [as yet unfinished] Mathematical book, Glimpses of Symmetry.
[2] Wanted – Chief Data Officer.
[3] Most readers will immediately spot the obvious mistake here. Of course all three of these requirements should be mandatory.
[4] To take just one example, gaining a PhD in a numerical science, a track record of highly-cited papers and also obtaining an MBA would take most people at least a few weeks of effort. Is it likely that such a person would next focus on a PRINCE2 or TOGAF qualification?
[5] I discuss some elements of the emerging consensus on what a CDO should do in: 5 Themes from a Chief Data Officer Forum and 5 More Themes from a Chief Data Officer Forum.


The peterjamesthomas.com Data and Analytics Dictionary

I find myself frequently being asked questions around terminology in Data and Analytics and so thought that I would try to define some of the more commonly used phrases and words. My first attempt to do this can be viewed in a new page added to this site (this also appears in the site menu):

The Data and Analytics Dictionary

I plan to keep this up-to-date as the field continues to evolve.

I hope that my efforts to explain some concepts in my main area of specialism are both of interest and utility to readers. Any suggestions for new entries or comments on existing ones are more than welcome.

20 Risks that Beset Data Programmes

This article draws extensively on elements of the framework I use to both highlight and manage risks on data programmes. It has its genesis in work that I did early in 2012 (but draws on experience from the years before this). I have tried to refresh the content since then to reflect new thinking and new developments in the data arena.

Introduction

What are my motivations in publishing this article? Well I have both designed and implemented data and information programmes for over 17 years. In the majority of cases my programme work has been a case of executing a data strategy that I had developed myself [1]. While I have generally been able to steer these programmes to a successful outcome [2], there have been both bumps in the road and the occasional blind alley, requiring a U-turn and another direction to be selected. I have also been able to observe data programmes that ran in parallel to mine in different parts of various organisations. Finally, I have often been asked to come in and address issues with an existing data programme; something that appears to happen all too often. In short I have seen a lot of what works and what does not work. Having also run other types of programmes [3], I can also attest to data programmes being different. Failure to recognise this difference and thus approaching a data programme just like any other piece of work is one major cause of issues [4].

Before I get into my list proper, I wanted to pause to highlight a further couple of mistakes that I have seen made more than once; ones that are more generic in nature and thus don’t appear on my list of 20 risks. The first is to assume that the way that an organisation’s data is controlled and leveraged can be improved in a sustainable way by just kicking off a programme. What is more important in my experience is to establish a data function, which will then help with both the governance and exploitation of data. This data function, ideally sitting under a CDO, will of course want to initiate a range of projects, from improving data quality, to sprucing up reporting, to establishing better analytical capabilities. Best practice is to gather these activities into a programme, but things work best if the data function is established first, owns such a programme and actively partakes in its execution.

As well as the issue of ongoing versus transitory accountability for data and the undoubted damage that poorly coordinated change programmes can inflict on data assets, another driver for first establishing a data function is that data needs will always be there. On the governance side, new systems will be built, bought and integrated, bringing new data challenges. On the analytical side, there will always be new questions to be answered, or old ones to be reevaluated. While data-centric efforts will generate many projects with start and end dates, the broad stream of data work continues on in a way that, for example, the implementation of a new B2C capability does not.

The second is to believe that you will add lasting value by outsourcing anything but targeted elements of your data programme. This is not to say that there is no place for such arrangements, which I have used myself many times, just that one of the lasting benefits of gimlet-like focus on data is the IP that is built up in the data team; IP that in my experience can be leveraged in many different and beneficial ways, becoming a major asset to the organisation [5].

Having made these introductory comments, let’s get on to the main list, which is divided into broadly chronological sections, relating to stages of the programme. The 10 risks which I believe are either most likely to materialise, or which will probably have the greatest impact are highlighted in pale yellow.

Up-front Risks

With risk 2 an analogy is trying to build a house in your spare time. If work can only be done in evenings or at the weekend, then this is going to take a long time. Nevertheless organisations too frequently expect data programmes to be absorbed in existing headcount and fitted in between people’s day jobs.

We can extend the building metaphor to cover risk 4. If you are going to build your own house, it would help that you understand carpentry, plumbing, electricals and brick-laying and also have a grasp on the design fundamentals of how to create a structure that will withstand wind, rain and snow. Too often companies embark on data programmes with staff who have a bit of a background in reporting or some related area and with managers who have never been involved in a data programme before. This is clearly a recipe for disaster.

Risk 5 reminds us that governance is also important – both to ensure that the programme stays focussed on business needs and also to help the team to negotiate the inevitable obstacles. This comes back to a successful data programme needing to be more than just a technology project.

Programme Execution Risks

Risk | Potential Impact
7. Poor programme management. | The programme loses direction. Time is expended on non-core issues. Milestones are missed. Expenditure escalates beyond budget.
8. Poor programme communication. | Stakeholders have no idea what is happening [6]. The programme is viewed as out of touch / not pertinent to business issues. Steering does not understand what is being done or why. Prospective users have no interest in the programme.
9. Big Bang approach. | Too much time goes by without any value being created. The eventual Big Bang is instead a damp squib. Large sums of money are spent without any benefits.
10. Endless search for the perfect solution / adherence to overly theoretical approaches. | Programme constantly polishes rocks rather than delivering. Data models reflect academic purity rather than real-world performance and maintenance needs.
11. Lack of focus on interim deliverables. | Business units become frustrated and seek alternative ways to meet their pressing needs. This leads to greater fragmentation and reputational damage to programme.
12. Insufficient time spent understanding source system data and how data is transformed as it flows between systems. | Data capabilities that do not reflect business transactions with fidelity. There is inconsistency with reports directly drawn from source systems. Reconciliation issues arise (see next point).
13. Poor reconciliation. | If analytical capabilities do not tell a consistent story, they will not be credible and will not be used.
14. Poor approach to data quality. | Data facilities are seen as inaccurate because of poor data going into them. Data facilities do not match actual business events due to either massaging of data or exclusion of transactions with invalid attributes.

Probably the single most common cause of failure with data programmes – and indeed of ERP projects and acquisitions and any other type of complex endeavour – is risk 7, poor programme management. Not only do programme managers have to be competent, they should also be steeped in data matters and have a good grasp of the factors that differentiate data programmes from more general work.

Relating to the other highlighted risks in this section, the programme could spend two years doing work without surfacing anything much and then, when it does make its first delivery, this is a dismal failure. In the same vein, exclusive focus on strategic capabilities could prevent attention being paid to pressing business needs. At the other end of the spectrum, interim deliveries could spiral out of control, consuming all of the data team’s time and meaning that the strategic objective is never reached. A better approach is that targeted and prioritised interims help to address pressing business needs, but also inform more strategic work. From the other perspective, progress on strategic work-streams should be leveraged whenever it can be, perhaps in less functional manners than the eventual solution, but good enough and also helping to make sure that the final deliveries are spot on [7].

User Requirement Risks

Risk | Potential Impact
15. Not enough up-front focus on understanding key business decisions and the information necessary to take them. | Analytic capabilities do not focus on what people want or need, leading to poor adoption and benefits not being achieved.
16. In the absence of the above, the programme becoming a technology-driven one. | The business gets what IT or Change think that they need, not what is actually needed. There is more focus on shiny toys than on actionable information. The programme forgets the needs of its customers.
17. A focus on replicating what the organisation already has but in better tools, rather than creating what it wants. | Beautiful data visualisations that tell you close to nothing. Long lists of existing reports with their fields cross-referenced to each other and a new solution that is essentially the lowest common denominator of what is already in place; a step backwards.

The other most common reason for data programme failure is a lack of focus on user needs and insufficient time spent with business people to ensure that systems reflect their requirements [8].

Integration Risk

Risk | Potential Impact
18. Lack of leverage of new data capabilities in front-end / digital systems. | These systems are less effective. The data team is jealous about its capabilities being the only way that users should get information, rather than adopting a more pragmatic and value-added approach.

It is important for the data team to realise that their work, however important, is just one part of driving a business forward. Opportunities to improve other system facilities by the leverage of new data structures should be taken wherever possible.

Deployment Risks

Risk | Potential Impact
19. Education is an afterthought, training is technology- rather than business-focused. | People neither understand the capabilities of new analytical tools, nor how to use them to derive business value. Again this leads to poor adoption and little return on investment.
20. Declaring success after initial implementation and training. | Without continuing to water the immature roots, the plant withers. Early adoption rates fall and people return to how they were getting information pre-launch. This means that the benefits of the programme are not realised.

Finally, excellent technical work needs to be complemented with equal attention to business-focussed education, training using real-life scenarios and assiduous follow up. These things will make or break the programme [9].

Summary

Of course I don’t claim that the above list is exhaustive. You could successfully mitigate all of the above risks on your data programme, but still get sunk by some other unforeseen problem arising. There is a need to be flexible and to adapt to both events and how your organisation operates; there are no guarantees and no foolproof recipes for success [10].

My recommendation to data professionals is to develop your own approach to risk management based on your own experience, your own style and the culture within which you are operating. If just a few of the items on my list of risks can be usefully amalgamated into this, then I will feel that this article has served its purpose. If you are embarking on a data programme, maybe your first one, then be warned that these are hard and your reserves of perseverance will be tested. I’d suggest leveraging whatever tools you can find in trying to forge ahead.

It is also maybe worth noting that, somewhat contrary to my point that data programmes are different, a few of the risks that I highlight above could be tweaked to apply to more general programmes as well. Hopefully the things that I have learnt over the last couple of decades of running data programmes will be something that can be of assistance to you in your own work.

Notes

The Chief Data Officer “Sweet Spot”

I verbally “scribbled” something quite like the exhibit above recently in conversation with a longstanding professional associate. This was while we were discussing where the CDO role currently sat in some organisations and his or her span of responsibilities. We agreed that – at least in some cases – the role was defined sub-optimally with reference to the axes in my virtual diagram.

This discussion reminded me that I was overdue a piece commenting on November’s IRM(UK) CDO Executive Forum; the third in a sequence that I have covered in these pages [1], [2]. In previous CDO Exec Forum articles, I have focussed mainly on the content of the day’s discussions. Here I’m going to be more general and bring in themes from the parent event; IRM(UK) Enterprise Data / Business Intelligence 2016. However I will later return to a theme central to the Exec Forum itself; the one that is captured in the graphic at the head of this article.

As well as attending the CDO Forum, I was speaking at the umbrella event. The title of my talk was Data Management, Analytics, People: An Eternal Golden Braid [3].

The real book, whose title I had plagiarised, is Gödel, Escher, Bach: an Eternal Golden Braid, by Pulitzer-winning American Author and doyen of 1970s pop-science books, Douglas R. Hofstadter [4]. This book, which I read in my youth, explores concepts in consciousness, both organic and machine-based, and their relation to recursion and self-reference. The author argued that these themes were major elements of the work of each of Austrian Mathematician Kurt Gödel (best known for his two incompleteness theorems), Dutch graphic artist Maurits Cornelis Escher (whose almost plausible, but nevertheless impossible buildings and constantly metamorphosing shapes adorn both art galleries and college dorms alike) and German composer Johann Sebastian Bach (revered for both the beauty and mathematical elegance of his pieces, particularly those for keyboard instruments). In an age where Machine Learning and other Artificial Intelligence techniques are moving into the mainstream – or at least on to our Smartphones – I’d recommend this book to anyone who has not had the pleasure of reading it.

In my talk, I didn’t get into anything as metaphysical as Hofstadter’s essays that intertwine patterns in Mathematics, Art and Music, but maybe some of the spirit of his book rubbed off on my much lesser musings. In any case, I felt that my session was well-received and one particular piece of post-presentation validation had me feeling rather like these guys for the rest of the day:

What happened was that a longstanding internet contact [5] sought me out and commended me on both my talk and the prescience of my July 2009 article, Is the time ripe for appointing a Chief Business Intelligence Officer? He argued convincingly that this foreshadowed the emergence of the Chief Data Officer. While it is an inconvenient truth that Visa International had a CDO eight years earlier than my article appeared, on re-reading it, I was forced to acknowledge that there was some truth in his assertion.

To return to the matter in hand, one point that I made during my talk was that Analytics and Data Management are two sides of the same coin and that both benefit from being part of the same unitary management structure. By this I mean each area reporting into an Executive who has a strong grasp of what they do, rather than to a general manager. More specifically, I would see Data Compliance work and Data Synthesis work each being the responsibility of a CDO who has experience in both areas.

It may seem that crafting and implementing data policies is a million miles from data visualisation and machine learning, but to anyone with a background in the field, they are much more strongly related. Indeed, if managed well (which is often the main issue), they should be mutually reinforcing. Thus an insightful model can support business decision-making, but its authors would generally be well-advised to point out any areas in which their work could be improved by better data quality. Efforts to achieve the latter then both improve the usefulness of the model and help make the case for further work on data remediation; a virtuous circle.

Here we get back to the vertical axis in my initial diagram. In many organisations, the CDO can find him or herself at the extremities. Particularly in Financial Services, an industry which has been exposed to more new regulation than many in recent years, it is not unusual for CDOs to have a Risk or Compliance background. While this is very helpful in areas such as Governance, it is less of an asset when looking to leverage data to drive commercial advantage.

Symmetrically, if a rookie CDO was a Data Scientist who then progressed to running teams of Data Scientists, they will have a wealth of detailed knowledge to fall back on when looking to guide business decisions, but less familiarity with the – sometimes apparently thankless, and generally very arduous – task of sorting out problems in data landscapes.

Despite this, it is not uncommon to see CDOs who have a background in just one of these two complementary areas. If this is the case, then the analytics expert will have to learn bureaucratic and programme skills as quickly as they can and the governance guru will need to expand their horizons to understand the basics of statistical modelling and the presentation of information in easily digestible formats. It is probably fair to say that the journey to the centre is somewhat perilous when either extremity is the starting point.

Let’s now think about the second and horizontal axis. In some organisations, a newly appointed CDO will be freshly emerged from the ranks of IT (in some they may still report to the CIO, though this is becoming more of an anomaly with each passing year). As someone whose heritage is in IT (though also from very early on with a commercial dimension) I understand that there are benefits to such a career path, not least an in-depth understanding of at least some of the technologies employed, or that need to be employed. However a technology master who is also a business neophyte is unlikely to set the world alight as a newly-minted CDO. Such people will need to acquire new skills, but the learning curve is steep.

To consider the other extreme of this axis, it is undeniable that a CDO organisation will need to undertake both technical and technological work (or at least to guide this in other departments). Therefore, while an in-depth understanding of a business, its products, markets, customers and competitors will be of great advantage to a new CDO, without at least a reasonable degree of technical knowledge, they may struggle to connect with some members of their team; they may not be able to immediately grasp what technology tasks are essential and which are not; and they may not be able to paint an accurate picture of what good looks like in the data arena. Once more rapid assimilation of new information and equally rapid acquisition of new skills will be called for.

At this point it will be pretty obvious that my central point here is that the “sweet spot” for a CDO, the place where they can have greatest impact on an organisation and deliver the greatest value, is at the centre point of both of these axes. When I was talking to my friend about this, we agreed that one of the reasons why not many CDOs sit precisely at this nexus is because there are few people with equal (or at least balanced) expertise in the business and technology fields; few people who understand both data synthesis and data compliance equally well; and vanishingly few who sit in the centre of both of these ranges.

Perhaps these facts would also have been apparent from reviewing the CDO job description I posted back in November 2015 as part of Wanted – Chief Data Officer. However, as always, a picture paints a thousand words and I rather like the compass-like exhibit I have come up with. Hopefully it conveys a similar message more rapidly and more viscerally.

To bring things back to the IRM(UK) CDO Executive Forum, I felt that issues around where delegates sat on my CDO “sweet spot” diagram (or more pertinently where they felt that they should sit) were a sub-text to many of our discussions. It is worth recalling that the mainstream CDO is still an emergent role and a degree of confusion around what they do, how they do it and where they sit in organisations is inevitable. All CxO roles (with the possible exception of the CEO) have gone through similar journeys. It is probably instructive to contrast the duties of a Chief Risk Officer before 2008 with the nature and scope of their responsibilities now. It is my opinion that the CDO role (and individual CDOs) will travel an analogous path and eventually also settle down to a generally accepted set of accountabilities.

In the meantime, if your organisation is lucky enough to have hired one of the small band of people whose experience and expertise already place them in the CDO “sweet spot”, then you are indeed fortunate. If not, then not all is lost, but be prepared for your new CDO to do a lot of learning on the job before they too can join the rather exclusive club of fully rounded CDOs.

Epilogue

As an erstwhile Mathematician, I’ve never seen a framework that I didn’t want to generalise. It occurs to me and – I assume – will also occur to many readers that the North / South and East / West diagram I have created could be made even more compass-like by the addition of North East / South West and North West / South East axes, with our idealised CDO sitting in the middle of these spectra as well [6].

Readers can debate amongst themselves what the extremities of these other dimensions might be. I’ll suggest just a couple: “Change” and “Business as Usual”. Given how organisations seem to have evolved in recent years, it is often unfortunately a case of never the twain shall meet with these two areas. However a good CDO will need to be adept at both and, from personal experience, I would argue that mastery of one does not exclude mastery of the other.

Notes

[1] See each of:
[2] The main reasons for delay were a house move and a succession of illnesses in my family – me included – so I’m going to give myself a pass.
[3] The sub-title was A Metaphorical Fugue On The Data ⇨ Information ⇨ Insight ⇨ Action Journey in The Spirit Of Douglas R. Hofstadter, which points to the inspiration behind my talk rather more explicitly.
[4] Douglas R. Hofstadter is the son of Nobel-winning physicist Robert Hofstadter. Prize-winning clearly runs in the Hofstadter family, much as with the Braggs, Bohrs, Curies, Euler-Chelpins, Kornbergs, Siegbahns, Tinbergens and Thomsons.
[5] I am omitting any names or other references to save his blushes.
[6] I could have gone for three or four dimensional Cartesian coordinates as well I realise, but sometimes (very rarely it has to be said) you can have too much Mathematics.

Curiouser and Curiouser – The Limits of Brexit Voting Analysis

Down the Rabbit-hole

When I posted my Brexit infographic reflecting the age of voters an obvious extension was to add an indication of the number of people in each age bracket who did not vote as well as those who did. This seemed a relatively straightforward task, but actually proved to be rather troublesome (this may be an example of British understatement). Maybe the caution I gave about statistical methods having a large impact on statistical outcomes in An Inconvenient Truth should have led me to expect such issues. In any case, I thought that it would be instructive to talk about the problems I stumbled across and to – once again – emphasise the perils of over-extending statistical models.

Regular readers will recall that my Brexit Infographic (reproduced above) leveraged data from an earlier article, A Tale of two [Brexit] Data Visualisations. As cited in this article, the numbers used were from two sources:

1. The UK Electoral Commission – I got the overall voting numbers from here.
2. Lord Ashcroft’s Polling organisation – I got the estimated distribution of votes by age group from here.

In the notes section of A Tale of two [Brexit] Data Visualisations I [prophetically] stated that the breakdown of voting by age group was just an estimate. Based on what I have discovered since, I’m rather glad that I made this caveat explicit.

The Pool of Tears

In order to work out the number of people in each age bracket who did not vote, an obvious starting point would be the overall electorate, which the UK Electoral Commission stated as being 46,500,001. As we know that 33,551,983 people voted (an actual figure rather than an estimate), this is where the turnout percentage of 72.2% (actually 72.1548%) comes from (33,551,983 / 46,500,001).
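As a quick check on the arithmetic, the headline turnout can be reproduced in a couple of lines. This is only a sketch; the variable names are mine, and the two inputs are the Electoral Commission figures quoted above.

```python
# Figures quoted above from the UK Electoral Commission.
electorate = 46_500_001   # people eligible to vote
votes_cast = 33_551_983   # people who actually voted

# Turnout is votes cast as a percentage of the electorate.
turnout_pct = 100 * votes_cast / electorate

print(f"Turnout: {turnout_pct:.4f}% (rounds to {turnout_pct:.1f}%)")
```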

A clarifying note: the electorate figures above refer to people who are eligible to vote. Specifically, in order to vote in the UK Referendum, people had to meet the following eligibility criteria (again drawn from the UK Electoral Commission):

To be eligible to vote in the EU Referendum, you must be:

• A British or Irish citizen living in the UK, or
• A Commonwealth citizen living in the UK who has leave to remain in the UK or who does not require leave to remain in the UK, or
• A British citizen living overseas who has been registered to vote in the UK in the last 15 years, or
• An Irish citizen living overseas who was born in Northern Ireland and who has been registered to vote in Northern Ireland in the last 15 years.

EU citizens are not eligible to vote in the EU Referendum unless they also meet the eligibility criteria above.

So far, so simple. The next thing I needed to know was how the electorate was split by age. This is where we begin to run into problems. One place to start is the actual population of the UK as at the last census (2011). This is as follows:

| Ages (years) | Population | % of total |
| --- | --- | --- |
| 0–4 | 3,914,000 | 6.2 |
| 5–9 | 3,517,000 | 5.6 |
| 10–14 | 3,670,000 | 5.8 |
| 15–19 | 3,997,000 | 6.3 |
| 20–24 | 4,297,000 | 6.8 |
| 25–29 | 4,307,000 | 6.8 |
| 30–34 | 4,126,000 | 6.5 |
| 35–39 | 4,194,000 | 6.6 |
| 40–44 | 4,626,000 | 7.3 |
| 45–49 | 4,643,000 | 7.3 |
| 50–54 | 4,095,000 | 6.5 |
| 55–59 | 3,614,000 | 5.7 |
| 60–64 | 3,807,000 | 6.0 |
| 65–69 | 3,017,000 | 4.8 |
| 70–74 | 2,463,000 | 3.9 |
| 75–79 | 2,006,000 | 3.2 |
| 80–84 | 1,496,000 | 2.4 |
| 85–89 | 918,000 | 1.5 |
| 90+ | 476,000 | 0.8 |
| Total | 63,183,000 | 100.0 |

If I roll up the above figures to create the same age groups as in the Ashcroft analysis (something that requires splitting the 15-19 range, which I have assumed can be done uniformly), I get:

| Ages (years) | Population | % of total |
| --- | --- | --- |
| 0-17 | 13,499,200 | 21.4 |
| 18-24 | 5,895,800 | 9.3 |
| 25-34 | 8,433,000 | 13.3 |
| 35-44 | 8,820,000 | 14.0 |
| 45-54 | 8,738,000 | 13.8 |
| 55-64 | 7,421,000 | 11.7 |
| 65+ | 10,376,000 | 16.4 |
| Total | 63,183,000 | 100.0 |
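For anyone wanting to reproduce the roll-up, the following is a minimal sketch of one way to do it, assuming (as I did) that a census band straddling two target groups can be split uniformly by year of age. The function and variable names are illustrative only, and the upper bound of 120 for the open-ended bands is an arbitrary placeholder.

```python
# 2011 census population in thousands, keyed by (youngest, oldest) age in band.
# The open-ended 90+ band gets a nominal upper bound of 120 (an assumption
# made only so that every band has two ends).
CENSUS_2011 = {
    (0, 4): 3914, (5, 9): 3517, (10, 14): 3670, (15, 19): 3997,
    (20, 24): 4297, (25, 29): 4307, (30, 34): 4126, (35, 39): 4194,
    (40, 44): 4626, (45, 49): 4643, (50, 54): 4095, (55, 59): 3614,
    (60, 64): 3807, (65, 69): 3017, (70, 74): 2463, (75, 79): 2006,
    (80, 84): 1496, (85, 89): 918, (90, 120): 476,
}

# The Ashcroft age groups, plus the under-18s who could not vote.
TARGET_GROUPS = {
    "0-17": (0, 17), "18-24": (18, 24), "25-34": (25, 34), "35-44": (35, 44),
    "45-54": (45, 54), "55-64": (55, 64), "65+": (65, 120),
}


def roll_up(census, groups):
    """Re-bucket census bands, splitting any straddling band uniformly by age."""
    rolled = {}
    for label, (g_lo, g_hi) in groups.items():
        total = 0.0
        for (lo, hi), population in census.items():
            # Number of single years of age shared by the band and the group.
            years_shared = min(hi, g_hi) - max(lo, g_lo) + 1
            if years_shared > 0:
                total += population * years_shared / (hi - lo + 1)
        rolled[label] = total
    return rolled


rolled = roll_up(CENSUS_2011, TARGET_GROUPS)
grand_total = sum(rolled.values())
for label, thousands in rolled.items():
    share = 100 * thousands / grand_total
    print(f"{label:>5}  {thousands * 1000:>12,.0f}  {share:5.1f}")
```

Only the 15-19 band is actually split here: three of its five years go to 0-17 and the remaining two to 18-24.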

The UK Government isn’t interested in the views of people under 18[citation needed], so eliminating this row we get:

| Ages (years) | Population | % of total |
| --- | --- | --- |
| 18-24 | 5,895,800 | 11.9 |
| 25-34 | 8,433,000 | 17.0 |
| 35-44 | 8,820,000 | 17.8 |
| 45-54 | 8,738,000 | 17.6 |
| 55-64 | 7,421,000 | 14.9 |
| 65+ | 10,376,000 | 20.9 |
| Total | 49,683,800 | 100.0 |

As mentioned, the above figures are from 2011 and the UK population has grown since then. Web-site WorldOMeters offers an extrapolated population of 65,124,383 for the UK in 2016 (this is as at 12th July 2016; if extrapolation and estimates make you queasy, I’d suggest closing this article now!). I’m going to use a rounder figure of 65,125,000 people; there is no point pretending that precision exists where it clearly doesn’t. Making the assumption that such growth is uniform across all age groups (please refer to my previous bracketed comment!), then the above exhibit can also be extrapolated to give us:

| Ages (years) | Population | % of total |
| --- | --- | --- |
| 18-24 | 6,077,014 | 11.9 |
| 25-34 | 8,692,198 | 17.0 |
| 35-44 | 9,091,093 | 17.8 |
| 45-54 | 9,006,572 | 17.6 |
| 55-64 | 7,649,093 | 14.9 |
| 65+ | 10,694,918 | 20.9 |
| Total | 51,210,887 | 100.0 |
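The extrapolation itself is just a single scaling factor applied to every row. A rough sketch follows, assuming uniform growth across age groups as described above; the names are mine and the results may differ from the exhibit by a person or so because of rounding.

```python
# 18+ population by age group from the 2011 census roll-up above.
adults_2011 = {
    "18-24": 5_895_800, "25-34": 8_433_000, "35-44": 8_820_000,
    "45-54": 8_738_000, "55-64": 7_421_000, "65+": 10_376_000,
}

UK_POPULATION_2011 = 63_183_000   # census total, all ages
UK_POPULATION_2016 = 65_125_000   # rounded extrapolated estimate

# Assumption: every age group grew by the same factor between 2011 and 2016.
growth = UK_POPULATION_2016 / UK_POPULATION_2011

adults_2016 = {group: count * growth for group, count in adults_2011.items()}
total_2016 = sum(adults_2016.values())

for group, count in adults_2016.items():
    print(f"{group:>5}  {count:>12,.0f}  {100 * count / total_2016:5.1f}")
print(f"Total  {total_2016:>12,.0f}")
```

Because every row is multiplied by the same factor, the percentage column is unchanged from the 2011 exhibit, which is exactly why the uniform growth assumption deserves suspicion.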

Looking Glass House

So our – somewhat fabricated – figure for the 18+ UK population in 2016 is 51,210,887; let’s just call this 51,200,000. As stated at the beginning of this article, the electorate for the 2016 UK Referendum was 46,500,000 (dropping off the 1 person with apologies to him or her). The difference is explicable based on the eligibility criteria quoted above. I now have a rough age group breakdown of the 51.2 million population; how best to apply this to the 46.5 million electorate?

I’ll park this question for the moment and instead look to calculate a different figure. Based on the Ashcroft model, what percentage of the UK population (i.e. the 51.2 million) voted in each age group? We can work this one out without many complications as follows:

| Ages (years) | Population (A) | Voted (B) | Turnout % (B/A) |
| --- | --- | --- | --- |
| 18-24 | 6,077,014 | 1,701,067 | 28.0 |
| 25-34 | 8,692,198 | 4,319,136 | 49.7 |
| 35-44 | 9,091,093 | 5,656,658 | 62.2 |
| 45-54 | 9,006,572 | 6,535,678 | 72.6 |
| 55-64 | 7,649,093 | 7,251,916 | 94.8 |
| 65+ | 10,694,918 | 8,087,528 | 75.6 |
| Total | 51,210,887 | 33,551,983 | 65.5 |

(B) = Size of each age group in the Ashcroft sample as a percentage multiplied by the total number of people voting (see A Tale of two [Brexit] Data Visualisations).
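The turnout column is simply (B) divided by (A) for each row. The sketch below recomputes it from the two columns and flags any age group whose implied turnout looks too good to be true. The 90% threshold is an arbitrary assumption of mine, chosen purely for illustration.

```python
# Column (A): extrapolated 2016 population by age group.
population = {
    "18-24": 6_077_014, "25-34": 8_692_198, "35-44": 9_091_093,
    "45-54": 9_006_572, "55-64": 7_649_093, "65+": 10_694_918,
}

# Column (B): votes attributed to each group via the Ashcroft sample shares.
voted = {
    "18-24": 1_701_067, "25-34": 4_319_136, "35-44": 5_656_658,
    "45-54": 6_535_678, "55-64": 7_251_916, "65+": 8_087_528,
}

# Turnout with respect to population (not electorate) for each group.
turnout = {group: 100 * voted[group] / population[group] for group in population}
overall = 100 * sum(voted.values()) / sum(population.values())

# Hypothetical sanity threshold: anything above this deserves a second look.
SUSPICIOUS_ABOVE = 90.0
suspicious = [group for group, pct in turnout.items() if pct > SUSPICIOUS_ABOVE]

for group, pct in turnout.items():
    flag = "  <-- implausible?" if group in suspicious else ""
    print(f"{group:>5}  {pct:5.1f}%{flag}")
print(f"Total  {overall:5.1f}%")
```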

Remember here that actual turnout figures have electorate as the denominator, not population. As the electorate is less than the population, this means that all of the turnout percentages should actually be higher than the ones calculated (e.g. the overall turnout with respect to electorate is 72.2% whereas my calculated turnout with respect to population is 65.5%). So given this, how to explain the 94.8% turnout of 55-64 year olds? To be sure this group does reliably turn out to vote, but did essentially all of them (remembering that the figures in the above table are too low) really vote in the referendum? This seems less than credible.

The turnout for 55-64 year olds in the 2015 General Election has been estimated at 77%, based on an overall turnout of 66.1% (web-site UK Political Info; once more these figures will have been created based on techniques similar to the ones I am using here). If we assume a uniform uplift across age ranges (that “assume” word again!) then one might deduce that an increase in overall turnout from 66.1% to 72.2% might lead to the turnout in the 55-64 age bracket increasing from 77% to 84%. 84% turnout is still very high, but it is at least feasible; close to 100% turnout from this age group seems beyond the realms of likelihood.
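The uniform uplift is a one-line proportional scaling. A small sketch, with a function name of my own invention and the two overall turnout figures quoted above as defaults:

```python
def uplift(turnout_2015_pct, overall_2015_pct=66.1, overall_2016_pct=72.2):
    """Scale an age group's 2015 turnout by the change in overall turnout.

    Assumes the increase in turnout was spread uniformly across age groups,
    which is a convenient simplification rather than an established fact.
    """
    return turnout_2015_pct * overall_2016_pct / overall_2015_pct


estimate_55_64 = uplift(77)
print(f"Implied 55-64 referendum turnout: {estimate_55_64:.1f}%")
```

The same function can be applied to the 2015 estimate for any other age group to get a comparable baseline.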

So what has gone wrong? Well, so far the only culprit I can think of is the distribution of voting by age group in the Ashcroft poll. To be clear here, I’m not accusing Lord Ashcroft and his team of sloppy work. Instead I’m calling out that the way that I have extrapolated their figures may not be sustainable. Indeed, if my extrapolation is valid, this would imply that the Ashcroft model overestimated the proportion of 55-64 year olds voting. Thus it must have underestimated the proportion of voters in some other age group. Putting aside the likelihood that I have used their figures in an unintended manner, could it be that the much-maligned turnout of younger people has been misrepresented?

To test the validity of this hypothesis, I turned to a later poll by Omnium. To be sure this was based on a sample size of around 2,000 as opposed to Ashcroft’s 12,000, but it does paint a significantly different picture. Their distribution of voter turnout by age group was as follows:

| Ages (years) | Turnout % |
| --- | --- |
| 18-24 | 64 |
| 25-39 | 65 |
| 40-54 | 66 |
| 55-64 | 74 |
| 65+ | 90 |

I have to say that the Omnium age groups are a bit idiosyncratic, so I have taken advantage of the fact that the figures for 25-54 are essentially the same to create a schedule that matches the Ashcroft groups as follows:

| Ages (years) | Turnout % |
| --- | --- |
| 18-24 | 64 |
| 25-34 | 65 |
| 35-44 | 65 |
| 45-54 | 65 |
| 55-64 | 74 |
| 65+ | 90 |

The Omnium model suggests that younger voters may have turned out in greater numbers than might be thought based on the Ashcroft data. In turn this would suggest that a much greater percentage of 18-24 year olds turned out for the Referendum (64%) than for the last General Election (43%); contrast this with an estimated 18-24 turnout figure of 47% based just on the increase in overall turnout between the General Election and the Referendum. The Omnium estimates do however recognise that turnout was still greater in the 55+ brackets, which supports the pattern seen in other elections.

Humpty Dumpty

While it may well be that the Leave / Remain splits based on the Ashcroft figures are reasonable, I’m less convinced that extrapolating these same figures to make claims about actual voting numbers by age group (as I have done) is tenable. Perhaps it would be better to view each age cohort as a mini sample to be treated independently. Based on the analysis above, I doubt that the turnout figures I have extrapolated from the Ashcroft breakdown by age group are robust. However, that is not the same as saying that the Ashcroft data is flawed, or that the Omnium figures are correct. Indeed the Omnium data (at least those elements published on their web-site) don’t include an analysis of whether the people in their sample voted Leave or Remain, so direct comparison is not going to be possible. Performing calculation gymnastics such as using the Omnium turnout for each age group in combination with the Ashcroft voting splits for Leave and Remain for the same age groups actually leads to a rather different Referendum result, so I’m not going to plunge further down this particular rabbit hole.

In summary, my supposedly simple trip to the destination of an enhanced Brexit Infographic has proved unexpectedly arduous, winding and beset by troubles. These challenges have proved so great that I’ve abandoned the journey and will instead be heading for home.

Which dreamed it?

Based on my work so far, I have severe doubts about the accuracy of some of the age-based exhibits I have published (versions of which have also appeared on many web-sites, the BBC to offer just one example, scroll down to “How different age groups voted” and note that the percentages cited reconcile to mine). I believe that my logic and calculations are sound, but it seems that I am making too many assumptions about how I can leverage the Ashcroft data. After posting this article, I will accordingly go back and annotate each of my previous posts and link them to these later findings.

I think the broader lesson to be learnt is that estimates are just that, attempts (normally well-intentioned of course) to come up with figures where the actual numbers are not accessible. Sometimes this is a very useful – indeed indispensable – approach, sometimes it is less helpful. In either case estimation should always be approached with caution and the findings ideally sense-checked in the way that I have tried to do above.

Occam’s razor would suggest that when the stats tell you something that seems incredible, then 99 times out of 100 there is an error or inaccurate assumption buried somewhere in the model. This applies when you are creating the model yourself and doubly so where you are relying upon figures calculated by other people. In the latter case not only is there the risk of their figures being inaccurate, there is the incremental risk that you interpret them wrongly, or stretch their broader application to breaking point. I was probably guilty of one or more of the above sins in my earlier articles. I’d like my probable misstep to serve as a warning to other people when they too look to leverage statistics in new ways.

A further point is that the most advanced concepts I have applied in my calculations above are addition, subtraction, multiplication and division. If these basic operations – even in the hands of someone like me who is relatively familiar with them – can lead to the issues described above, just imagine what could result from the more complex mathematical techniques (e.g. ambition, distraction, uglification and derision) used by even entry-level data scientists. This perhaps suggests an apt aphorism: Caveat calculator!