The Chief Data Officer “Sweet Spot”

22 Dec 2016 / 6 Feb 2017 Peter James Thomas business analytics, chief data officer, data governance, data management, data quality Cast of Serenity, Firefly, IRM UK

I verbally “scribbled” something quite like the exhibit above recently in conversation with a longstanding professional associate. This was while we were discussing where the CDO role currently sat in some organisations and his or her span of responsibilities. We agreed that – at least in some cases – the role was defined sub-optimally with reference to the axes in my virtual diagram.

This discussion reminded me that I was overdue a piece commenting on November’s IRM(UK) CDO Executive Forum; the third in a sequence that I have covered in these pages ^{[1], [2]}. In previous CDO Exec Forum articles, I have focussed mainly on the content of the day’s discussions. Here I’m going to be more general and bring in themes from the parent event; IRM(UK) Enterprise Data / Business Intelligence 2016. However I will later return to a theme central to the Exec Forum itself; the one that is captured in the graphic at the head of this article.

As well as attending the CDO Forum, I was speaking at the umbrella event. The title of my talk was Data Management, Analytics, People: An Eternal Golden Braid ^[3].

The real book, whose title I had plagiarised, is Gödel, Escher, Bach: an Eternal Golden Braid, by Pulitzer-winning American author and doyen of 1970s pop-science books, Douglas R. Hofstadter ^[4]. This book, which I read in my youth, explores concepts in consciousness, both organic and machine-based, and their relation to recursion and self-reference. The author argued that these themes were major elements of the work of each of Austrian Mathematician Kurt Gödel (best known for his two incompleteness theorems), Dutch graphic artist Maurits Cornelis Escher (whose almost plausible, but nevertheless impossible buildings and constantly metamorphosing shapes adorn art galleries and college dorms alike) and German composer Johann Sebastian Bach (revered for both the beauty and mathematical elegance of his pieces, particularly those for keyboard instruments). In an age where Machine Learning and other Artificial Intelligence techniques are moving into the mainstream – or at least on to our Smartphones – I’d recommend this book to anyone who has not had the pleasure of reading it.

In my talk, I didn’t get into anything as metaphysical as Hofstadter’s essays that intertwine patterns in Mathematics, Art and Music, but maybe some of the spirit of his book rubbed off on my much lesser musings. In any case, I felt that my session was well-received and one particular piece of post-presentation validation had me feeling rather like these guys for the rest of the day:

What happened was that a longstanding internet contact ^[5] sought me out and commended me on both my talk and the prescience of my July 2009 article, Is the time ripe for appointing a Chief Business Intelligence Officer? He argued convincingly that this foreshadowed the emergence of the Chief Data Officer. While it is an inconvenient truth that Visa International had a CDO eight years earlier than my article appeared, on re-reading it, I was forced to acknowledge that there was some truth in his assertion.

To return to the matter in hand, one point that I made during my talk was that Analytics and Data Management are two sides of the same coin and that both benefit from being part of the same unitary management structure. By this I mean each area reporting into an Executive who has a strong grasp of what they do, rather than to a general manager. More specifically, I would see Data Compliance work and Data Synthesis work each being the responsibility of a CDO who has experience in both areas.

It may seem that crafting and implementing data policies is a million miles from data visualisation and machine learning, but to anyone with a background in the field, they are much more strongly related. Indeed, if managed well (which is often the main issue), they should be mutually reinforcing. Thus an insightful model can support business decision-making, but its authors would generally be well-advised to point out any areas in which their work could be improved by better data quality. Efforts to achieve the latter then both improve the usefulness of the model and help make the case for further work on data remediation; a virtuous circle.

Here we get back to the vertical axis in my initial diagram. In many organisations, the CDO can find him or herself at the extremities. Particularly in Financial Services, an industry which has been exposed to more new regulation than many in recent years, it is not unusual for CDOs to have a Risk or Compliance background. While this is very helpful in areas such as Governance, it is less of an asset when looking to leverage data to drive commercial advantage.

Symmetrically, if a rookie CDO was a Data Scientist who then progressed to running teams of Data Scientists, they will have a wealth of detailed knowledge to fall back on when looking to guide business decisions, but less familiarity with the – sometimes apparently thankless, and generally very arduous – task of sorting out problems in data landscapes.

Despite this, it is not uncommon to see CDOs who have a background in just one of these two complementary areas. If this is the case, then the analytics expert will have to learn bureaucratic and programme skills as quickly as they can and the governance guru will need to expand their horizons to understand the basics of statistical modelling and the presentation of information in easily digestible formats. It is probably fair to say that the journey to the centre is somewhat perilous when either extremity is the starting point.

Let’s now think about the second and horizontal axis. In some organisations, a newly appointed CDO will be freshly emerged from the ranks of IT (in some they may still report to the CIO, though this is becoming more of an anomaly with each passing year). As someone whose heritage is in IT (though also from very early on with a commercial dimension) I understand that there are benefits to such a career path, not least an in-depth understanding of at least some of the technologies employed, or that need to be employed. However a technology master who is also a business neophyte is unlikely to set the world alight as a newly-minted CDO. Such people will need to acquire new skills, but the learning curve is steep.

To consider the other extreme of this axis, it is undeniable that a CDO organisation will need to undertake both technical and technological work (or at least to guide this in other departments). Therefore, while an in-depth understanding of a business, its products, markets, customers and competitors will be of great advantage to a new CDO, without at least a reasonable degree of technical knowledge, they may struggle to connect with some members of their team; they may not be able to immediately grasp which technology tasks are essential and which are not; and they may not be able to paint an accurate picture of what good looks like in the data arena. Once more, rapid assimilation of new information and equally rapid acquisition of new skills will be called for.

At this point it will be pretty obvious that my central point here is that the “sweet spot” for a CDO, the place where they can have greatest impact on an organisation and deliver the greatest value, is at the centre point of both of these axes. When I was talking to my friend about this, we agreed that one of the reasons why not many CDOs sit precisely at this nexus is because there are few people with equal (or at least balanced) expertise in the business and technology fields; few people who understand both data synthesis and data compliance equally well; and vanishingly few who sit in the centre of both of these ranges.

Perhaps these facts would also have been apparent from reviewing the CDO job description I posted back in November 2015 as part of Wanted – Chief Data Officer. However, as always, a picture paints a thousand words and I rather like the compass-like exhibit I have come up with. Hopefully it conveys a similar message more rapidly and more viscerally.

To bring things back to the IRM(UK) CDO Executive Forum, I felt that issues around where delegates sat on my CDO “sweet spot” diagram (or more pertinently where they felt that they should sit) were a sub-text to many of our discussions. It is worth recalling that the mainstream CDO is still an emergent role and a degree of confusion around what they do, how they do it and where they sit in organisations is inevitable. All CxO roles (with the possible exception of the CEO) have gone through similar journeys. It is probably instructive to contrast the duties of a Chief Risk Officer before 2008 with the nature and scope of their responsibilities now. It is my opinion that the CDO role (and individual CDOs) will travel an analogous path and eventually also settle down to a generally accepted set of accountabilities.

In the meantime, if your organisation is lucky enough to have hired one of the small band of people whose experience and expertise already place them in the CDO “sweet spot”, then you are indeed fortunate. If not, then not all is lost, but be prepared for your new CDO to do a lot of learning on the job before they too can join the rather exclusive club of fully rounded CDOs.

Epilogue

As an erstwhile Mathematician, I’ve never seen a framework that I didn’t want to generalise. It occurs to me and – I assume – will also occur to many readers that the North / South and East / West diagram I have created could be made even more compass-like by the addition of North East / South West and North West / South East axes, with our idealised CDO sitting in the middle of these spectra as well ^[6].

Readers can debate amongst themselves what the extremities of these other dimensions might be. I’ll suggest just a couple: “Change” and “Business as Usual”. Given how organisations seem to have evolved in recent years, it is often unfortunately a case of never the twain shall meet with these two areas. However a good CDO will need to be adept at both and, from personal experience, I would argue that mastery of one does not exclude mastery of the other.

Notes

^[1]	See each of: 5 Themes from a Chief Data Officer Forum, 5 More Themes from a Chief Data Officer Forum and Themes from a Chief Data Officer Forum – the 180 day perspective
^[2]	The main reasons for delay were a house move and a succession of illnesses in my family – me included – so I’m going to give myself a pass.
^[3]	The sub-title was A Metaphorical Fugue On The Data ⇨ Information ⇨ Insight ⇨ Action Journey in The Spirit Of Douglas R. Hofstadter, which points to the inspiration behind my talk rather more explicitly.
^[4]	Douglas R. Hofstadter is the son of Nobel-winning physicist Robert Hofstadter. Prize-winning clearly runs in the Hofstadter family, much as with the Braggs, Bohrs, Curies, Euler-Chelpins, Kornbergs, Siegbahns, Tinbergens and Thomsons.
^[5]	I am omitting any names or other references to save his blushes.
^[6]	I could have gone for three or four dimensional Cartesian coordinates as well I realise, but sometimes (very rarely it has to be said) you can have too much Mathematics.

Follow @peterjthomas

More Statistics and Medicine

8 Nov 2016 / 1 Jan 2017 Peter James Thomas Mathematics & Science, Statistics diagnostic tests, false positives, medical profession, risk assessment

Weighing Medicine in the balance

I wrote last on the intersection of these two disciplines back in March 2011 (Medical Malpractice). What has prompted me to return to the subject is some medical tests that I was offered recently. If the reader will forgive me, I won’t go into the medical details – and indeed have also obfuscated some of the figures I was quoted – but neither are that relevant to the point that I wanted to make. This point relates to how statistics are sometimes presented in medical situations and – more pertinently – the disconnect between how these may be interpreted by the man or woman in the street, as opposed to what is actually going on.

Rather than tie myself in knots, let’s assume that the test is for a horrible disease called PJT Syndrome ^[1]. Let’s further assume that I am told that the test on offer has an accuracy of 80% ^[2]. This in and of itself is a potentially confusing figure. Does the test fail to detect the presence of PJT Syndrome 20% of the time, or does it instead erroneously detect PJT Syndrome, when the patient is actually perfectly healthy, 20% of the time? In this case, after an enquiry, I was told that a negative result was a negative result, but that a positive one did not always mean that the subject suffered from PJT Syndrome; so the issue is confined to false positives, not false negatives. This definition of 80% accuracy is at least a little clearer.

So what is a reasonable person to deduce from the 80% figure? Probably that if they test positive, there is an 80% certainty that they have PJT Syndrome. I think that my visceral reaction would probably be along those lines. However, such a conclusion can be incorrect, particularly where the incidence of PJT Syndrome is low in a population. I’ll try to explain why.

If we know that PJT Syndrome occurs in 1 in every 100 people on average, what does this mean for the relevance of our test results? Let’s take a graphical look at a wholly representative population of exactly 100 people. The PJT Syndrome sufferer appears in red at the bottom right.

1 in 100

Now what is the result of the 80% accuracy of our test, remembering that this means that 20% of people taking it will be falsely diagnosed as having PJT Syndrome? Well 20% of 100 is – applying a complex algorithm – approximately 20 people. Let’s flag these up on our population schematic in grey.

20 in 100

So 20 people have the wrong diagnosis. One is correctly identified as having PJT Syndrome and 79 are correctly identified as not having PJT Syndrome; so a total of 80 have the right diagnosis.

What does this mean for those 21 people who have been unfortunate enough to test positive for PJT Syndrome (the one person coloured red and the 20 coloured grey)? Well only one of them actually has the malady. So, if I test positive, my chances of actually having PJT Syndrome are not 80% as we originally thought, but instead 1 in 21 or 4.76%. So my risk is still low having tested positive. It is higher than the risk in the general population, which is 1 in 100, or 1%, but not much more so.
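The head-count above can be reproduced in a few lines of Python. This is only a sketch of the reasoning in the text: the function name and its parameters are my own, and it assumes (as the text does) that every sufferer tests positive and that the 20% false-positive rate is applied to the whole population of 100.

```python
from fractions import Fraction

def chance_given_positive(population, sufferers, false_positive_rate):
    """Chance that a positive result is genuine, using simple head-counts.

    Assumptions (for illustration only): every sufferer tests positive, and
    the false-positive rate is applied to the whole population, as in the
    text, rather than to healthy people only.
    """
    false_positives = false_positive_rate * population
    return Fraction(sufferers) / (sufferers + false_positives)

odds = chance_given_positive(100, 1, Fraction(20, 100))
print(odds, f"= {float(odds):.2%}")  # 1/21 = 4.76%
```

Using exact fractions keeps the "1 in 21" form visible alongside the percentage.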

The problem arises if having a condition is rare (here 1 in 100) and the accuracy of a test is low (here it is wrong for 20% of people taking it). If you consider that the condition that I was being offered a test for actually has an incidence of around 1 in 20,000 people, then with an 80% accurate test we would get the following:

In a population of 20,000, one person has the condition
In the same population a test with our 80% accuracy means that 20% of people will test positive for it when they are perfectly healthy; this amounts to 4,000 people
So in total, 4,001 people will test positive, 1 correctly, 4,000 erroneously
Which means that a positive test tells me my odds of having the condition being tested for are 1 in 4,001, or 0.025%; still a pretty unlikely event
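For readers who prefer a formula to a head-count, the same estimate can be sketched with Bayes' theorem. The function and parameter names below are hypothetical; the sketch assumes a test that never misses a genuine case (sensitivity of 1) and applies the 20% false-positive rate to healthy people only, which is marginally stricter than the rounding used in the list above.

```python
from fractions import Fraction

def posterior(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Incidence of 1 in 20,000, no false negatives, 20% false positives.
p = posterior(Fraction(1, 20_000), Fraction(1), Fraction(1, 5))
print(p, f"= {float(p):.4%}")
```

The exact answer is very slightly different from 1 in 4,001 because only the 19,999 healthy people can be false positives, but it rounds to the same 0.025%.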

Low accuracy tests and rare conditions are a very bad combination. As well as causing people unnecessary distress, the real problem is where the diagnosis leads potential sufferers to take actions (e.g. undergoing further diagnosis, which could be invasive, or even embarking on a course of treatment) which may themselves have the potential to cause injury to the patient.

I am not of course suggesting that people ignore medical advice, but Doctors are experts in medicine and not statistics. When deciding what course of action to take in a situation similar to the one I recently experienced, taking the time to more accurately assess risks and benefits is extremely important. Humans are well known to overestimate some risks (and underestimate others); there are circumstances when crunching the numbers and seeing what they tell you is not only a good idea, it can help to safeguard your health.

For what it’s worth, I opted out of these particular tests.

Notes

^[1]	A terrible condition which renders sufferers unable to express any thought in under 1,000 words.
^[2]	Not the actual figure quoted, but close to it.


Curiouser and Curiouser – The Limits of Brexit Voting Analysis

12 Jul 2016 / 14 Jan 2017 Peter James Thomas data quality, data science, infographics, Statistics alice in wonderland, ashcroft polling, omnium polling, through the looking glass and what alice found there

An original illustration from Charles Lutwidge Dodgson's seminal work would have been better, but sadly none such seems to be extant

Down the Rabbit-hole

When I posted my Brexit infographic reflecting the age of voters an obvious extension was to add an indication of the number of people in each age bracket who did not vote as well as those who did. This seemed a relatively straightforward task, but actually proved to be rather troublesome (this may be an example of British understatement). Maybe the caution I gave about statistical methods having a large impact on statistical outcomes in An Inconvenient Truth should have led me to expect such issues. In any case, I thought that it would be instructive to talk about the problems I stumbled across and to – once again – emphasise the perils of over-extending statistical models.

Brexit ages infographic

Regular readers will recall that my Brexit Infographic (reproduced above) leveraged data from an earlier article, A Tale of two [Brexit] Data Visualisations. As cited in this article, the numbers used were from two sources:

The UK Electoral Commission – I got the overall voting numbers from here.
Lord Ashcroft’s Polling organisation – I got the estimated distribution of votes by age group from here.

In the notes section of A Tale of two [Brexit] Data Visualisations I [prophetically] stated that the breakdown of voting by age group was just an estimate. Based on what I have discovered since, I’m rather glad that I made this caveat explicit.

The Pool of Tears

In order to work out the number of people in each age bracket who did not vote, an obvious starting point would be the overall electorate, which the UK Electoral Commission stated as being 46,500,001. As we know that 33,551,983 people voted (an actual figure rather than an estimate), this is where the turnout percentage of 72.2% (actually 72.1548%) came from (33,551,983 / 46,500,001).
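As a quick sanity check, the turnout figure can be recomputed directly. The variable names are mine; the two inputs are the votes and the Electoral Commission electorate quoted above.

```python
votes_cast = 33_551_983   # people who voted, as quoted above
electorate = 46_500_001   # electorate stated by the UK Electoral Commission

turnout = votes_cast / electorate
print(f"{turnout:.4%}")  # 72.1548%
```

Only the 46,500,001 denominator reproduces 72.1548%; an electorate one million smaller would give roughly 73.7% instead.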

A clarifying note, the electorate figures above refer to people who are eligible to vote. Specifically, in order to vote in the UK Referendum, people had to meet the following eligibility criteria (again drawn from the UK Electoral Commission):

To be eligible to vote in the EU Referendum, you must be:

A British or Irish citizen living in the UK, or

A Commonwealth citizen living in the UK who has leave to remain in the UK or who does not require leave to remain in the UK, or

A British citizen living overseas who has been registered to vote in the UK in the last 15 years, or

An Irish citizen living overseas who was born in Northern Ireland and who has been registered to vote in Northern Ireland in the last 15 years.

EU citizens are not eligible to vote in the EU Referendum unless they also meet the eligibility criteria above.

So far, so simple. The next thing I needed to know was how the electorate was split by age. This is where we begin to run into problems. One place to start is the actual population of the UK as at the last census (2011). This is as follows:

Ages (years)	Population	% of total
0–4	3,914,000	6.2
5–9	3,517,000	5.6
10–14	3,670,000	5.8
15–19	3,997,000	6.3
20–24	4,297,000	6.8
25–29	4,307,000	6.8
30–34	4,126,000	6.5
35–39	4,194,000	6.6
40–44	4,626,000	7.3
45–49	4,643,000	7.3
50–54	4,095,000	6.5
55–59	3,614,000	5.7
60–64	3,807,000	6.0
65–69	3,017,000	4.8
70–74	2,463,000	3.9
75–79	2,006,000	3.2
80–84	1,496,000	2.4
85–89	918,000	1.5
90+	476,000	0.8
Total	63,183,000	100.0

If I roll up the above figures to create the same age groups as in the Ashcroft analysis (something that requires splitting the 15-19 range, which I have assumed can be done uniformly), I get:

Ages (years)	Population	% of total
0-17	13,499,200	21.4
18-24	5,895,800	9.3
25-34	8,433,000	13.3
35-44	8,820,000	14.0
45-54	8,738,000	13.8
55-64	7,421,000	11.7
65+	10,376,000	16.4
Total	63,183,000	100.0
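The roll-up can be sketched as follows. The dictionary simply restates the census table; the helper name `bands` is hypothetical, and the only modelling assumption is the one stated in the text, namely that the 15-19 band splits uniformly so that three fifths of it are under 18.

```python
# 2011 census populations keyed by the lower bound of each five-year band
# (90 stands for "90+"), copied from the census table above.
census = {
    0: 3_914_000, 5: 3_517_000, 10: 3_670_000, 15: 3_997_000,
    20: 4_297_000, 25: 4_307_000, 30: 4_126_000, 35: 4_194_000,
    40: 4_626_000, 45: 4_643_000, 50: 4_095_000, 55: 3_614_000,
    60: 3_807_000, 65: 3_017_000, 70: 2_463_000, 75: 2_006_000,
    80: 1_496_000, 85: 918_000, 90: 476_000,
}

def bands(first, last):
    """Sum the five-year bands whose lower bounds run from first to last."""
    return sum(census[start] for start in range(first, last + 1, 5))

# Uniform split of the 15-19 band: three of its five years fall under 18.
aged_15_17 = census[15] * 3 // 5
aged_18_19 = census[15] - aged_15_17

groups = {
    "0-17": bands(0, 10) + aged_15_17,
    "18-24": aged_18_19 + census[20],
    "25-34": bands(25, 30),
    "35-44": bands(35, 40),
    "45-54": bands(45, 50),
    "55-64": bands(55, 60),
    "65+": bands(65, 90),
}

total = sum(groups.values())
for label, people in groups.items():
    print(f"{label:>5}  {people:>10,}  {people / total:5.1%}")
```

A different assumption about the 15-19 split would move people between the first two rows only; every other group is a straight sum of census bands.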

The UK Government isn’t interested in the views of people under 18^{[citation needed]}, so eliminating this row we get:

Ages (years)	Population	% of total
18-24	5,895,800	11.9
25-34	8,433,000	17.0
35-44	8,820,000	17.8
45-54	8,738,000	17.6
55-64	7,421,000	14.9
65+	10,376,000	20.9
Total	49,683,800	100.0

As mentioned, the above figures are from 2011 and the UK population has grown since then. Web-site WorldOMeters offers an extrapolated population of 65,124,383 for the UK in 2016 (this is as at 12th July 2016; if extrapolation and estimates make you queasy, I’d suggest closing this article now!). I’m going to use a rounder figure of 65,125,000 people; there is no point pretending that precision exists where it clearly doesn’t. Making the assumption that such growth is uniform across all age groups (please refer to my previous bracketed comment!), then the above exhibit can also be extrapolated to give us:

Ages (years)	Population	% of total
18-24	6,077,014	11.9
25-34	8,692,198	17.0
35-44	9,091,093	17.8
45-54	9,006,572	17.6
55-64	7,649,093	14.9
65+	10,694,918	20.9
Total	51,210,887	100.0
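The extrapolation amounts to multiplying every 2011 figure by the same growth factor. A minimal sketch, assuming (as the text does) uniform growth across age groups and the rounded 2016 population of 65,125,000; the variable names are my own:

```python
# 18+ population by age group from the 2011-based table above.
adults_2011 = {
    "18-24": 5_895_800, "25-34": 8_433_000, "35-44": 8_820_000,
    "45-54": 8_738_000, "55-64": 7_421_000, "65+": 10_376_000,
}

uk_2011 = 63_183_000   # census total, all ages
uk_2016 = 65_125_000   # rounded extrapolated total, all ages

growth = uk_2016 / uk_2011  # the same factor is applied to every group
adults_2016 = {label: round(n * growth) for label, n in adults_2011.items()}

for label, people in adults_2016.items():
    print(f"{label:>5}  {people:>10,}")
print(f"Total  {sum(adults_2016.values()):>10,}")
```

Because each row is rounded separately, the rows may sum to a figure a person or so away from the total obtained by scaling the 2011 total directly.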

Looking Glass House

So our – somewhat fabricated – figure for the 18+ UK population in 2016 is 51,210,887; let’s just call this 51,200,000. As noted earlier in this article, the electorate for the 2016 UK Referendum was 46,500,000 (dropping off the 1 person with apologies to him or her). The difference is explicable based on the eligibility criteria quoted above. I now have a rough age group breakdown of the 51.2 million population; how best to apply this to the 46.5 million electorate?

I’ll park this question for the moment and instead look to calculate a different figure. Based on the Ashcroft model, what percentage of the UK population (i.e. the 51.2 million) voted in each age group? We can work this one out without many complications as follows:

Ages (years)	Population (A)	Voted (B)	Turnout % (B/A)
18-24	6,077,014	1,701,067	28.0
25-34	8,692,198	4,319,136	49.7
35-44	9,091,093	5,656,658	62.2
45-54	9,006,572	6,535,678	72.6
55-64	7,649,093	7,251,916	94.8
65+	10,694,918	8,087,528	75.6
Total	51,210,887	33,551,983	65.5

(B) = Size of each age group in the Ashcroft sample as a percentage multiplied by the total number of people voting (see A Tale of two [Brexit] Data Visualisations).
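The table can be recomputed with a short script, which also makes the outlier easy to spot. Both dictionaries restate columns (A) and (B) above; the names are hypothetical and the 90% threshold used to flag rows is an arbitrary choice of mine.

```python
population = {  # column (A): extrapolated 2016 population aged 18+
    "18-24": 6_077_014, "25-34": 8_692_198, "35-44": 9_091_093,
    "45-54": 9_006_572, "55-64": 7_649_093, "65+": 10_694_918,
}
voted = {  # column (B): Ashcroft age shares applied to total votes
    "18-24": 1_701_067, "25-34": 4_319_136, "35-44": 5_656_658,
    "45-54": 6_535_678, "55-64": 7_251_916, "65+": 8_087_528,
}

turnout = {label: voted[label] / population[label] for label in population}
for label, share in turnout.items():
    flag = "  <-- implausibly high?" if share > 0.9 else ""
    print(f"{label:>5}  {share:5.1%}{flag}")

overall = sum(voted.values()) / sum(population.values())
print(f"Total  {overall:5.1%}")
```

Only the 55-64 row is flagged, which is the figure questioned in the paragraph that follows.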

Remember here that actual turnout figures have electorate as the denominator, not population. As the electorate is less than the population, this means that all of the turnout percentages should actually be higher than the ones calculated (e.g. the overall turnout with respect to electorate is 72.2% whereas my calculated turnout with respect to population is 65.5%). So given this, how to explain the 94.8% turnout of 55-64 year olds? To be sure this group does reliably turn out to vote, but did essentially all of them (remembering that the figures in the above table are too low) really vote in the referendum? This seems less than credible.

The turnout for 55-64 year olds in the 2015 General Election has been estimated at 77%, based on an overall turnout of 66.1% (web-site UK Political Info; once more these figures will have been created based on techniques similar to the ones I am using here). If we assume a uniform uplift across age ranges (that “assume” word again!) then one might deduce that an increase in overall turnout from 66.1% to 72.2% might lead to the turnout in the 55-64 age bracket increasing from 77% to 84%. 84% turnout is still very high, but it is at least feasible; close to 100% turnout from this age group seems beyond the realms of likelihood.
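The "uniform uplift" is simple proportional scaling, sketched below. The function name is hypothetical, and the calculation rests entirely on the assumption flagged in the text, that every age group's turnout rose by the same ratio as the overall figure.

```python
def uplifted_turnout(group_turnout, overall_before, overall_after):
    """Scale one group's turnout by the ratio of overall turnouts."""
    return group_turnout * overall_after / overall_before

# 55-64 year olds: 77% in 2015, overall turnout 66.1% then and 72.2% now.
estimate = uplifted_turnout(77, 66.1, 72.2)
print(f"{estimate:.1f}%")  # 84.1%
```

Nothing in this scaling caps the result at 100%, so it is a rough guide rather than a model of voter behaviour.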

So what has gone wrong? Well so far the only culprit I can think of is the distribution of voting by age group in the Ashcroft poll. To be clear here, I’m not accusing Lord Ashcroft and his team of sloppy work. Instead I’m calling out that the way that I have extrapolated their figures may not be sustainable. Indeed, if my extrapolation is valid, this would imply that the Ashcroft model overestimated the proportion of 55-64 year olds voting. Thus it must have underestimated the proportion of voters in some other age group. Putting aside the likelihood that I have used their figures in an unintended manner, could it be that the much-maligned turnout of younger people has been misrepresented?

To test the validity of this hypothesis, I turned to a later poll by Omnium. To be sure this was based on a sample size of around 2,000 as opposed to Ashcroft’s 12,000, but it does paint a significantly different picture. Their distribution of voter turnout by age group was as follows:

Ages (years)	Turnout %
18-24	64
25-39	65
40-54	66
55-64	74
65+	90

I have to say that the Omnium age groups are a bit idiosyncratic, so I have taken advantage of the fact that the figures for 25-54 are essentially the same to create a schedule that matches the Ashcroft groups as follows:

Ages (years)	Turnout %
18-24	64
25-34	65
35-44	65
45-54	65
55-64	74
65+	90

The Omnium model suggests that younger voters may have turned out in greater numbers than might be thought based on the Ashcroft data. In turn this would suggest that a much greater percentage of 18-24 year olds turned out for the Referendum (64%) than for the last General Election (43%); contrast this with an estimated 18-24 turnout figure of 47% based just on the increase in turnout between the General Election and the Referendum. The Omnium estimates do however recognise that turnout was still greater in the 55+ brackets, which supports the pattern seen in other elections.

Humpty Dumpty

While it may well be that the Leave / Remain splits based on the Ashcroft figures are reasonable, I’m less convinced that extrapolating these same figures to make claims about actual voting numbers by age group (as I have done) is tenable. Perhaps it would be better to view each age cohort as a mini sample to be treated independently. Based on the analysis above, I doubt that the turnout figures I have extrapolated from the Ashcroft breakdown by age group are robust. However, that is not the same as saying that the Ashcroft data is flawed, or that the Omnium figures are correct. Indeed the Omnium data (at least those elements published on their web-site) don’t include an analysis of whether the people in their sample voted Leave or Remain, so direct comparison is not going to be possible. Performing calculation gymnastics such as using the Omnium turnout for each age group in combination with the Ashcroft voting splits for Leave and Remain for the same age groups actually leads to a rather different Referendum result, so I’m not going to plunge further down this particular rabbit hole.

In summary, my supposedly simple trip to the destination of an enhanced Brexit Infographic has proved unexpectedly arduous, winding and beset by troubles. These challenges have proved so great that I’ve abandoned the journey and will be instead heading for home.

Which dreamed it?

Based on my work so far, I have severe doubts about the accuracy of some of the age-based exhibits I have published (versions of which have also appeared on many web-sites, the BBC to offer just one example, scroll down to “How different age groups voted” and note that the percentages cited reconcile to mine). I believe that my logic and calculations are sound, but it seems that I am making too many assumptions about how I can leverage the Ashcroft data. After posting this article, I will accordingly go back and annotate each of my previous posts and link them to these later findings.

I think the broader lesson to be learnt is that estimates are just that, attempts (normally well-intentioned of course) to come up with figures where the actual numbers are not accessible. Sometimes this is a very useful – indeed indispensable – approach, sometimes it is less helpful. In either case estimation should always be approached with caution and the findings ideally sense-checked in the way that I have tried to do above.

Occam’s razor would suggest that when the stats tell you something that seems incredible, then 99 times out of 100 there is an error or inaccurate assumption buried somewhere in the model. This applies when you are creating the model yourself and doubly so where you are relying upon figures calculated by other people. In the latter case not only is there the risk of their figures being inaccurate, there is the incremental risk that you interpret them wrongly, or stretch their broader application to breaking point. I was probably guilty of one or more of the above sins in my earlier articles. I’d like my probable misstep to serve as a warning to other people when they too look to leverage statistics in new ways.

A further point is the most advanced concepts I have applied in my calculations above are addition, subtraction, multiplication and division. If these basic operations – even in the hands of someone like me who is relatively familiar with them – can lead to the issues described above, just imagine what could result from the more complex mathematical techniques (e.g. ambition, distraction, uglification and derision) used by even entry-level data scientists. This perhaps suggests an apt aphorism: Caveat calculator!

Beware the Jabberwock, my son! // The jaws that bite, the claws that catch! // Beware the Jubjub bird, and shun // The frumious Bandersnatch!


How Age was a Critical Factor in Brexit

7 Jul 2016 / 14 Jan 2017 Peter James Thomas data visualisation, infographics, Statistics Brexit

In my last article, I looked at a couple of ways to visualise the outcome of the recent UK Referendum on European Union membership. There I was looking at how different visual representations highlight different attributes of data.

I’ve had a lot of positive feedback about my previous Brexit exhibits and I thought that I’d capture the zeitgeist by offering a further visual perspective, perhaps one more youthful than the venerable pie chart; namely an infographic. My attempt to produce one of these appears above and a full-size PDF version is also just a click away.

For caveats on the provenance of the data, please also see the previous article’s notes section.

Addendum

I have leveraged age group distributions from the Ashcroft Polling organisation to create this exhibit. Other sites – notably the BBC – have done the same and my figures reconcile to the interpretations in other places. However, based on further analysis, I have some reason to think that either there are issues with the Ashcroft data, or that I have leveraged it in ways that the people who compiled it did not intend. Either way, the Ashcroft numbers lead to the conclusion that close to 100% of 55-64 year olds voted in the UK Referendum, which seems very, very unlikely. I have contacted the Ashcroft Polling organisation about this and will post any reply that I receive.

– Peter James Thomas, 14^th July 2016


A Tale of Two [Brexit] Data Visualisations

6 Jul 2016 / 14 Jul 2016 Peter James Thomas data visualisation, Statistics bar chart, Brexit, pie chart

I’m continuing with the politics and data visualisation theme established in my last post. However, I’ll state up front that this is not a political article. I have assiduously stayed silent [on this blog at least] on the topic of my country’s future direction, both in the lead up to the 23rd June poll and in its aftermath. Instead, I’m going to restrict myself to making a point about data visualisation; both how it can inform and how it can mislead.

Brexit Bar — UK Referendum on EU Membership – Percentage voting by age bracket (see notes)

The exhibit above is my version of one that has appeared in various publications post referendum, both on-line and print. As is referenced, its two primary sources are the UK Electoral Commission and Lord Ashcroft’s polling organisation. The reason why there are two sources rather than one is explained in the notes section below.

With the caveats explained below, the above chart shows the generational divide apparent in the UK Referendum results. Those under 35 years old voted heavily for the UK to remain in the EU; those with ages between 35 and 44 voted to stay in pretty much exactly the proportion that the country as a whole voted to leave; and those over 45 years old voted increasingly heavily to leave as their years advanced.

One thing which is helpful about this exhibit is that it shows in what proportion each cohort voted. This means that the type of inferences I made in the previous paragraph leap off the page. It is pretty clear (visually) that there is a massive difference between how those aged 18-24 and those aged 65+ thought about the question in front of them in the polling booth. However, while the percentage-based approach illuminates some things, it masks others. A cursory examination of the chart above might lead one to ask – based on the area covered by red rectangles – how it was that the Leave camp prevailed. To pursue an answer to this question, let’s consider the data with a slightly tweaked version of the same visualisation, as below:

Brexit Bar 2 — UK Referendum on EU Membership – Numbers voting by age bracket (see notes)

[Aside: The eagle-eyed amongst you may notice a discrepancy between the figures shown on the total bars above and the actual votes cast, which were respectively: Remain: 16,141k and Leave: 17,411k. Again see the notes section for an explanation of this.]

A shift from percentages to actual votes recorded casts some light on the overall picture. It now becomes clear that, while a large majority of 18-24 year olds voted to Remain, not many people in this category actually voted. Indeed while, according to the 2011 UK Census, the 18-24 year category makes up just under 12% of all people over 18 years old (not all of whom would necessarily be either eligible or registered to vote) the Ashcroft figures suggest that well under half of this group cast their ballot, compared to much higher turnouts for older voters (once more see the notes section for caveats).

This observation rather blunts the assertion that the old voted in ways that potentially disadvantaged the young; the young had every opportunity to make their voice heard more clearly, but didn’t take it. Reasons for this youthful disengagement from the political process are of course beyond the scope of this article.

However it is still hard (at least for the author’s eyes) to get the full picture from the second chart. In order to get a more visceral feeling for the dynamics of the vote, I have turned to the much maligned pie chart. I also chose to use the even less loved “exploded” version of this.

Brexit Flag — UK Referendum on EU Membership – Number voting by age bracket (see notes)

Here the weight of both the 65+ and 55+ Leave vote stands out as does the paucity of the overall 18-24 contribution; the only two pie slices too small to accommodate an internal data label. This exhibit immediately shows where the referendum was won and lost in a way that is not as easy to glean from a bar chart.

While I selected an exploded pie chart primarily for reasons of clarity, perhaps the fact that the resulting final exhibit brings to mind a shattered and reassembled Union Flag was also an artistic choice. Unfortunately, it seems that this resemblance has a high likelihood of proving all too prophetic in the coming months and years.

Addendum

I have leveraged age group distributions from the Ashcroft Polling organisation to create these exhibits. Other sites – notably the BBC – have done the same and my figures reconcile to the interpretations in other places. However, based on further analysis, I have some reason to think that either there are issues with the Ashcroft data, or that I have leveraged it in ways that the people who compiled it did not intend. Either way, the Ashcroft numbers lead to the conclusion that close to 100% of 55-64 year olds voted in the UK Referendum, which seems very, very unlikely. I have contacted the Ashcroft Polling organisation about this and will post any reply that I receive.

– Peter James Thomas, 14th July 2016

Notes

Caveat: I am neither a professional political pollster, nor a statistician. Instead I’m a Pure Mathematician, with a basic understanding of some elements of both these areas. For this reason, the following commentary may not be 100% rigorous; however my hope is that it is nevertheless informative.

In the wake of the UK Referendum on EU membership, a lot of attempts were made to explain the result. Several of these used splits of the vote by demographic attributes to buttress the arguments that they were making. All of the exhibits in this article use age bands, one type of demographic indicator. Analyses posted elsewhere looked at things like the influence of the UK’s social grade classifications (A, B, C1 etc.) on voting patterns, the number of immigrants in a given part of the country, the relative prosperity of different areas and how this has changed over time. Other typical demographic dimensions might include gender, educational achievement or ethnicity.

However, no demographic information was captured as part of the UK referendum process. There is no central system which takes a unique voting ID and allocates attributes to it, allowing demographic dicing and slicing (to be sure a partial and optional version of this is carried out when people leave polling stations after a General Election, but this was not done during the recent referendum).

So, how do so many demographic analyses suddenly appear? To offer some sort of answer here, I’ll take you through how I built the data set behind the exhibits in this article. At the beginning I mentioned that I relied on two data sources, the actual election results published by the UK Electoral Commission and the results of polling carried out by Lord Ashcroft’s organisation. The latter covered interviews with 12,369 people selected to match what was anticipated to be the demographic characteristics of the actual people voting. As with most statistical work, properly selecting a sample with no inherent biases (e.g. one with the same proportion of people who are 65 years or older as in the wider electorate) is generally the key to accuracy of outcome.

Importantly demographic information is known about the sample (which may also be reweighted based on interview feedback) and it is by assuming that what holds true for the sample also holds true for the electorate that my charts are created. So if X% of 18-24 year olds in the sample voted Remain, the assumption is that X% of the total number of 18-24 year olds that voted will have done the same.
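The extrapolation described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the workings behind the exhibits: the function name `extrapolate_votes` is made up for this example, and the sample counts are invented placeholders rather than the Ashcroft figures. The one real number is the total of valid votes cast (16,141,241 Remain plus the Leave votes, giving 33,551,983).

```python
def extrapolate_votes(sample_counts, total_votes_cast):
    """Scale every (age band, choice) cell of a sample up to the electorate.

    Assumes the sample is representative, so each cell keeps the same
    share of the total that it has in the sample.
    """
    sample_size = sum(sum(cell.values()) for cell in sample_counts.values())
    return {
        age_band: {
            choice: total_votes_cast * count / sample_size
            for choice, count in cell.items()
        }
        for age_band, cell in sample_counts.items()
    }


# Purely hypothetical sample of 1,000 respondents (NOT real polling data)
hypothetical_sample = {
    "18-24": {"Remain": 70, "Leave": 30},
    "25-64": {"Remain": 310, "Leave": 340},
    "65+": {"Remain": 100, "Leave": 150},
}

estimates = extrapolate_votes(hypothetical_sample, total_votes_cast=33_551_983)

for age_band, cell in estimates.items():
    band_total = sum(cell.values())
    print(f"{age_band}: {cell['Remain'] / band_total:.0%} Remain "
          f"of an estimated {band_total:,.0f} votes")
```

Because every cell is multiplied by the same factor, the Remain and Leave split within each age band is exactly the one seen in the sample; all of the uncertainty sits in whether the sample really mirrors the people who voted.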

12,000 plus is a good sample size for this type of exercise and I have no reason to believe that Lord Ashcroft’s people were anything other than professional in selecting the sample members and adjusting their models accordingly. However this is not the same as having definitive information about everyone who voted. So every exhibit you see relating to the age of referendum voters, or their gender, or social classification is based on estimates. This is a fact that seldom seems to be emphasised by news organisations.

The size of Lord Ashcroft’s sample also explains why the total figures for Leave and Remain on my second exhibit are different to the voting numbers. This is because 5,949 / 12,369 = 48.096% (looking at the sample figures for Remain) whereas 16,141,241 / 33,551,983 = 48.108% (looking at the actual voting figures for Remain). Both figures round to 48.1%, but the small difference in the decimal expansions, when applied to 33 million people, yields a slightly different result.
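The arithmetic above is easy to check. The short Python sketch below uses only the four figures quoted in the text; the variable names are my own, and applying the sample share to the whole electorate in one step is a simplifying assumption rather than a reconstruction of how the chart totals were built.

```python
# Figures quoted in the text
sample_remain, sample_total = 5_949, 12_369
actual_remain, actual_total = 16_141_241, 33_551_983

sample_share = sample_remain / sample_total
actual_share = actual_remain / actual_total

# Apply the sample's Remain share to the real number of votes cast
estimated_remain = sample_share * actual_total
gap = actual_remain - estimated_remain

print(f"Sample share of Remain: {sample_share:.3%}")
print(f"Actual share of Remain: {actual_share:.3%}")
print(f"Estimated Remain votes: {estimated_remain:,.0f}")
print(f"Gap versus actual:      {gap:,.0f}")
```

The two shares agree to one decimal place, yet the gap comes out at a few thousand votes, which is the kind of discrepancy visible on the total bars.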


Showing uncertainty in a Data Visualisation

1 Jul 2016 · 1 Jul 2016 · Peter James Thomas · data visualisation · Fivethirtyeight, US Election

My attention was drawn to the above exhibit by a colleague. It is from the FiveThirtyEight web-site and one of several exhibits included in an analysis of the standing of the two US Presidential hopefuls.

In my earlier piece, Data Visualisation – A Scientific Treatment, I argued for more transparency in showing the inherent variability associated with the numbers spat out by statistical models. My specific call back then was for the use of error bars.

The FiveThirtyEight exhibit deals with this same challenge in a manner which I find elegant, clean and eminently digestible. It contains many different elements of information, but remains an exhibit whose meaning is easy to absorb. It’s an approach I will probably look to leverage myself next time I have a similar need.


Themes from a Chief Data Officer Forum – the 180 day perspective

20 May 2016 · 4 Feb 2017 · Peter James Thomas · big data, business, change management, chief data officer, cultural transformation, data governance, data management, education, strategy · dama, IRM UK

The author would like to acknowledge the input and assistance of his fellow delegates, both initially at the IRM(UK) CDO Executive Forum itself and later in reviewing earlier drafts of this article. As ever, responsibility for any errors or omissions remains mine alone.

Introduction

Time flies, as Virgil observed some 2,045 years ago. A rather shorter six months back I attended the inaugural IRM(UK) Chief Data Officer Executive Forum and recently I returned for the second of what looks like becoming biannual meetings. Last time the umbrella event was the IRM(UK) Enterprise Data and Business Intelligence Conference 2015 ^[1]; this time the session was part of the companion conference: IRM(UK) Master Data Management Summit / Data Governance Conference 2016.

This article looks to highlight some of the areas that were covered in the forum, but does not attempt to be exhaustive, instead offering an impressionistic view of the meeting. One reason for this (as well as the author’s temperament) is that – as previously – in order to allow free exchange of ideas, the details of the meeting are intended to stay within the confines of the room.

Last November, ten themes emerged from the discussions and I attempted to capture these over two articles. The headlines appear in the box below:

Themes from the previous Forum:

One area of interest for me was how things had moved on in the intervening months and I’ll look to comment on this later.

By way of background, some of the attendees were shared with the November 2015 meeting, but there was also a smattering of new faces, including the moderator, Peter Campbell, President of DAMA’s Belgium and Luxembourg chapter. Sectors represented included: Distribution, Extractives, Financial Services, and Governmental.

The discussions were wide ranging and perhaps less structured than in November’s meeting, maybe a facet of the familiarity established between some delegates at the previous session. However, there were four broad topics which the attendees spent time on: Management of Change (Theme 5); Data Privacy / Trust; Innovation; and Value / Business Outcomes.

While clearly the second item on this list has its genesis in the European Union’s recently adopted General Data Protection Regulation (GDPR ^[2]), it is interesting to note that the other topics suggest that some elements of the CDO agenda appear to have shifted in the last six months. At the time of the last meeting, much of what the group talked about was foundational or even theoretical. This time round there was both more of a practical slant to the conversation, “how do we get things done?” and a focus on the future, “how do we innovate in this space?”

Perhaps this also reflects that while CDO 1.0s focussed on remedying issues with data landscapes and thus had a strong risk mitigation flavour to their work, CDO 2.0s are starting to look more at value-add and delivering insight (Theme 6). Of course some organisations are yet to embark on any sort of data-related journey (CDO 0.0 maybe), but in the more enlightened ones at least, the CDO’s focus is maybe changing, or has already changed (Theme 3).

Some flavour of the discussions around each of the above topics is provided below, but as mentioned above, these observations are both brief and impressionistic:

Management of Change

The title of Management of Change has been chosen (by the author) to avoid any connotations of Change Management. It was recognised by the group that there are two related issues here. The first is the organisational and behavioural change needed to both ensure that data is fit-for-purpose and that people embrace a more numerical approach to decision-making; perhaps this area is better described as Cultural Transformation. The second is the fact (also alluded to at the previous forum) that Change Programmes tend to have the effect of degrading data assets over time, especially where monetary or time factors lead data-centric aspects of projects to be de-scoped.

On Cultural Transformation, amongst a number of issues discussed, the need to answer the question “What’s in it for me?” stood out. This encapsulates the human aspect of driving change, the need to engage with stakeholders ^[3] (at all levels) and the importance of sound communication of what is being done in the data space and – more importantly – why. These are questions to which an entire sub-section of this blog is devoted.

On the potentially deleterious impact of Change ^[4] on data landscapes, it was noted that whatever CDOs build, be these technological artefacts or data-centric processes, they must be designed to be resilient in the face of both change and Change.

Data Privacy / Trust

As referenced above, the genesis of this topic was GDPR. However, it was interesting that the debate extended from this admittedly important area into more positive territory. This related to the observation that the care with which an organisation treats its customers’ or business partners’ data (and the level of trust which this generates) can potentially become a differentiator or even a source of competitive advantage. It is good to report an essentially regulatory requirement possibly morphing into a more value-added set of activities.

Innovation

It might be expected that discussions around this topic would focus on perennials such as Big Data or Advanced Analytics. Instead the conversation was around other areas, such as distributed / virtualised data and the potential impact of Block Chain technology ^[5] on Data Management work. Inevitably The Internet of Things ^[6] also featured, together with the ethical issues that this can raise. Other areas discussed were as diverse as the gamification of Data Governance and Social Physics, so we cast the net widely.

Value / Business Outcomes

Here we have the strongest link back into the original ten themes (specifically Theme 6). Of course the acme of data strategies is of little use if it does not deliver positive business outcomes. In many organisations, focus on just remediating issues with the current data landscape could consume a massive chunk of overall Change / IT expenditure. This is because data issues generally emanate from a wide variety of often linked and frequently long-standing organisational weaknesses. These can be architectural, integrational, procedural, operational or educational in nature. One of the challenges for CDOs everywhere is how to parcel up their work in a way that adds value, gets things done and is accretive to both the overall Business and Data strategies (which are of course intimately linked as per Theme 10). There is also the need to balance foundational work with more tactical efforts; the former is necessary for lasting benefits to be secured, but the latter can showcase the value of Data Management and thus support further focus on the area.

While the risk aspect of data issues gets a foot in the door of the Executive Suite, it is only by demonstrating commercial awareness and linking Data Management work to increased business value that any CDO is ever going to get traction (Theme 6).

The next IRM(UK) CDO Executive Forum will take place on 9th November 2016 in London – if you would like to apply for a place please e-mail jeremy.hall@irmuk.co.uk.

Notes

^[1]	I’ll be speaking at IRM(UK) ED&BI 2016 in November. Book early to avoid disappointment!
^[2]	Wikipedia offers a digestible summary of the regulation here. Anyone tempted to think this is either a parochial or arcane area is encouraged to calculate what the greater of €20 million and 4% of their organisation’s worldwide turnover might be and then to consider that the scope of the Regulation covers any company (regardless of its domicile) that processes the data of EU residents.
^[3]	I’ve been itching to use this classic example of stakeholder management for some time:
^[4]	The capital “c” is intentional.
^[5]	Harvard Business Review has an interesting and provocative article on the subject of Block Chain technology.
^[6]	GIYF
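The calculation that note [2] invites can be sketched in Python. The function name `gdpr_fine_ceiling` and the turnover figures are hypothetical and exist only for illustration; the rule itself (the greater of €20 million and 4% of worldwide turnover) is taken from the note, and nothing here should be read as legal guidance.

```python
def gdpr_fine_ceiling(worldwide_turnover_eur):
    """Return the greater of EUR 20 million and 4% of worldwide turnover."""
    flat_ceiling = 20_000_000
    turnover_based = worldwide_turnover_eur * 4 / 100
    return max(flat_ceiling, turnover_based)


# Hypothetical organisations, chosen only to show both branches of the rule
for turnover in (100_000_000, 500_000_000, 1_000_000_000):
    print(f"Turnover EUR {turnover:>13,}: "
          f"ceiling EUR {gdpr_fine_ceiling(turnover):,.0f}")
```

The flat €20 million figure applies up to a turnover of €500 million; beyond that point the 4% term takes over and the ceiling grows in line with turnover.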


Data Management as part of the Data to Action Journey

24 Dec 2015 · 6 Feb 2017 · Peter James Thomas · business intelligence, chief data officer, data governance, data management, data quality, infographics · peter aiken

Data Information Insight Action

| Larger Version | Detailed and Annotated Version (as PDF) |

This brief article is actually the summation of considerable thought and reflects many elements that I covered in my last two pieces (5 Themes from a Chief Data Officer Forum and 5 More Themes from a Chief Data Officer Forum), in particular both the triangle I used as my previous Data Management visualisation and Peter Aiken’s original version, which he kindly allowed me to reproduce on this site (see here for more information about Peter).

What I began to think about was that both of these earlier exhibits (and indeed many that I have seen pertaining to Data Management and Data Governance) suggest that the discipline forms a solid foundation upon which other areas are built. While there is a lot of truth in this view, I have come round to thinking that Data Management may alternatively be thought of as actively taking part in a more dynamic process; specifically the same iterative journey from Data to Information to Insight to Action and back to Data again that I have referenced here several times before. I have looked to combine both the static, foundational elements of Data Management and the dynamic, process-centric ones in the diagram presented at the top of this article; a more detailed and annotated version of which is available to download as a PDF via the link above.

I have also introduced the alternative path from Data to Insight; the one that passes through Statistical Analysis. Data Management is equally critical to the success of this type of approach. I believe that the schematic suggests some of the fluidity that is a major part of effective Data Management in my experience. I also hope that the exhibit supports my assertion that Data Management is not an end in itself, but instead needs to be considered in terms of the outputs that it helps to generate. Pristine data is of little use to an organisation if it is not then exploited to form insights and drive actions. As ever, this need to drive action necessitates a focus on cultural transformation, an area that is covered in many other parts of this site.

This diagram also calls to mind the subject of where and how the roles of Chief Analytics Officer and Chief Data Officer intersect and whether indeed these should be separate roles at all. These are questions to which – as promised on several previous occasions – I will return in future articles. For now, maybe my schematic can give some data and information practitioners a different way to view their craft and the contributions that it can make to organisational success.


5 More Themes from a Chief Data Officer Forum

17 Nov 2015 · 4 Feb 2017 · Peter James Thomas · big data, business analytics, business intelligence, chief data officer, data governance, data management, data science · big data, dama, IRM UK, peter aiken

A rather famous theme

This article is the second of two pieces reflecting on the emerging role of the Chief Data Officer. Each article covers 5 themes. You can read the first five themes here.

As with the first article, I would like to thank both Peter Aiken, who reviewed a first draft of this piece and provided useful clarifications and additional insights, and several of my fellow delegates, who also made helpful suggestions around the text. Again any errors of course remain my responsibility.

Introduction Redux

After reviewing a draft of the first article in this series and also scanning an outline of this piece, one of the other attendees at the inaugural IRM(UK) / DAMA CDO Executive Forum rightly highlighted that I had not really emphasised the strategic aspects of the CDO’s work; both data / information strategy and the close linkage to business strategy. I think the reason for this is that I spend so much of my time on strategic work that I’ve internalised the area. However, I’ve come to the not unreasonable conclusion that internalisation doesn’t work so well on a blog, so I will call out this area up-front (as well as touching on it again in Theme 10 below).

For more of my views on strategy formation in the data / information space please see my trilogy of articles starting with: Forming an Information Strategy: Part I – General Strategy.

With that said, I’ll pick up where we left off with the themes that arose in the meeting:

Theme 6 – While some CDO roles have their genesis in risk mitigation, most are focussed on growth

Epidermal growth factor receptor

This theme gets to the CDO / CAO debate (which I will be writing about soon). It is true that the often poor state of data governance in organisations is one reason why the CDO role has emerged and also that a lot of CDO focus is inevitably on this area. The regulatory hurdles faced by many industries (e.g. Solvency II in my current area of Insurance) also bring a significant focus on compliance to the CDO role. However, in the unanimous view of the delegates, while cleaning the Augean Stables is important and equally organisations which fail to comply with regulatory requirements tend to have poor prospects, most CDOs have a growth-focussed agenda. Their primary objective is to leverage data (or to facilitate its leverage) to drive growth and open up new opportunities. Of course good data management is a prerequisite for achieving this objective in a sustainable manner, but it is not an end in itself. Any CDO who allows themself to be overwhelmed by what should just be part of their role is probably heading in the same direction as a non-compliant company.

Theme 7 – New paradigms are data / analytics-centric not application-centric

Applications & Data

Historically, technology landscapes used to be application-centric. Often there would be a cluster of systems in the centre (ideally integrated with each other in some way), each with its own analytics capabilities; a CRM system with customer analytics “out-of-the-box” (whatever that really means in practice), an ERP system with finance analytics and maybe supply-chain analytics, digital estates with web analytics and so on. Even if there was a single central system (those of us old enough will still remember the ERP vision), then this would tend to have various analytical repositories around it used by different parts of the organisation for different purposes. Equally some of the enterprise data warehouses I have built have included specialist analytical repositories, e.g. to support pricing, or risk, or other areas.

Today a new paradigm is emerging. Under this, rather than being at the periphery, data and analytics are in the centre, operating in a more joined-up manner. Many companies have already banked the automation and standardisation benefits of technology and are now looking instead to exploit the (often considerably larger) information and insight benefits ^[1]. This places information and insight assets at the centre of the landscape. It also means that finally information needs can start to drive system design and selection, not the other way round.

Theme 8 – Data and Information need to be managed together

Data and Information in harness

We see a further parallel with the CAO vs CDO debate here ^[2]. After 27 years with at least one foot in IT (though often in hybrid roles with dual business / IT reporting) and 15 explicitly in the data and information space, I really fail to see how data and information are anything other than two sides of the same coin.

To people who say that the CAO is the one who really understands the business and the CDO worries instead about back-end data governance, I would reply that an engine is only as good as the fuel that you put into it. I’d over-extend the analogy (as is my wont ^[3]) by saying that the best engineers will have a thorough understanding of:

what purpose the engine will be applied to – racing car, or lorry (truck)
the parameters within which it is required to perform
the actual performance requirements
what that means in terms of designing the engine
what inputs the engine will have: petrol/diesel/bio-fuel/electricity
what outputs it will produce (with no reference to poor old Volkswagen intended)

It may be that the engineering team has experts in various areas from metallurgy, to electronics, to chemistry, to machining, to quality control, to noise and vibration suppression, to safety, to general materials science and that these are required to work together. But whoever is in charge of overall design, and indeed overall production, would need to have knowledge spanning all these areas and would in addition need to ensure that specialists under their supervision worked harmoniously together to get the best result.

Data is the basic building block of information. Information is the embodiment of things that people want or need to know. You cannot generate information (let alone insight) without a very strong understanding of data. You can neither govern, nor exploit, data in any useful way without knowledge of the uses to which it will be put. Like the chief product engineer, there is a need for someone who understands all of the elements, all of the experts working on these and can bring them together just as harmoniously ^[4].

Theme 9 – Data Science is not enough

If you don't understand the notation, you've failed in your application to be a Data Scientist

In Part One of this article I repeated an assertion about the typical productivity of data scientists:

“Data Scientists are only 10-20% productive; if you start a week-long piece of work on Monday, the actual statistical analysis will commence on Friday afternoon; the rest of the time is battling with the data”

While the many data scientists I know would attest to the truth of this, there is a broader point to be made. That is the need for what can be described as Data Interpreters. This role is complementary to the data science community, acting as an interface between those with PhDs in statistics and the rest of the world. At IRM(UK) ED&BI one speaker even went so far as to present a photograph of two ladies who filled these yin and yang roles at a European organisation.

More broadly, the advent of data science, while welcome, has not obviated the need to pass from data through information to get to insight for most of an organisation’s normal measurements. Of course an ability to go straight from data to insight is also a valuable tool, but it is not suitable for all situations. There are also a number of things to be aware of before uncritically placing full reliance on statistical models ^[5].

Theme 10 – Information is often a missing link between Business and IT strategies

Business => Information => IT

This was one of the most interesting topics of discussion at the forum and we devoted substantial time to exploring issues and opportunities in this area. The general sense was that – as all agreed – IT strategy needs to be aligned with business strategy ^[6]. However, there was also agreement that this can be hard and in many ways is getting harder. With IT leaders nowadays often consumed by the need to stay abreast of both technology opportunities (e.g. cloud computing) and technology threats (e.g. cyber crime) as well as inevitably having both extensive business as usual responsibilities and significant technology transformation programmes to run, it could be argued that some IT departments are drifting away from their business partners; not through any desire to do so, but just because of the nature (and volume) of current work. Equally with the increasing pace of business change, few non-IT executives can spend as much time understanding the role of technology as was once perhaps the case.

Given that successful information work must have a foot in both the business and technology camps (“what do we want to do with our data?” and “what data do we have available to work with?” being just two pertinent questions), the argument here was that an information strategy can help to build a bridge between these two increasingly different worlds. Of course this chimes with the feedback on the primacy of strategy that I got on my earlier article from another delegate; and which I reference at the beginning of this piece. It also is consistent with my own view that the data → information → insight → action journey is becoming an increasingly business-focused one.

A couple of CDO Forum delegates had already been thinking about this area and went so far as to present models pertaining to a potential linkage, which they had either created or adapted from academic journals. These placed information between business and IT pillars not just with respect to strategy but also architecture and implementation. This is a very interesting area and one which I hope to return to in coming weeks.

Concluding thoughts

As I mentioned in Part One, the CDO Forum was an extremely useful and thought-provoking event. One thing which was of note is that – despite the delegates coming from many different backgrounds, something which one might assume would be a barrier to effective communication – they shared a common language, many values and comparable views on how to take the areas of data management and data exploitation forward. While of course delegates at such an eponymous Forum might be expected to emphasise the importance of their position, it was illuminating to learn just how seriously a variety of organisations were taking the CDO role and that CDOs were increasingly becoming agents of growth rather than just risk and compliance tsars.

Amongst the many other themes captured in this piece and its predecessor, perhaps a stand-out was how many organisations view the CDO as a firmly commercial / strategic role. This can only be a positive development and my hope is that CDOs can begin to help organisations to better understand the asset that their data represents and then start the process of leveraging this to unlock its substantial, but often latent, business value.

Notes

^[1]	See Measuring the benefits of Business Intelligence
^[2]	Someone really ought to write an article about that! UPDATE: They now have in: The Chief Data Officer “Sweet Spot” and Alphabet Soup
^[3]	See Analogies for some further examples as well as some of the pitfalls inherent in such an approach.
^[4]	I cover this duality in many places in this blog; for the reader who would like to learn more about my perspectives on the area, A bad workman blames his [Business Intelligence] tools is probably a good place to start; this links to various other resources on this site.
^[5]	I cover some of these here, including (in reverse chronological order): An Inconvenient Truth Patterns patterns everywhere – The Sequel – c/o xkcd.com Patterns patterns everywhere
^[6]	I tend to be allergic to the IT / Business schism as per: Business is from Mars and IT is from Venus (incidentally the first substantive article I wrote for this site), but at least it serves some purpose in this discussion, rather than leading to unproductive “them and us” syndrome, that is sadly all too often the outcome.


5 Themes from a Chief Data Officer Forum

11 Nov 2015 / 4 Feb 2017 Peter James Thomas big data, business analytics, business intelligence, chief data officer, data governance, data management big data, dama, IRM UK, peter aiken

A rather famous theme

This article is the first of two pieces reflecting on the emerging role of the Chief Data Officer. Each article will cover 5 themes and the concluding chapter may be viewed here.

I would like to thank both Peter Aiken, who reviewed a first draft of this piece and provided useful clarifications and additional insights, and several of my fellow delegates, who also made helpful suggestions around the text. Any errors of course remain my responsibility.

Introduction

As previously trailed, I attended the IRM(UK) Enterprise Data & Business Intelligence seminar on 3rd and 4th November. On the first of these days I sat on a panel talking about approaches to leveraging data “beyond the Big Data hype”. This involved fielding some interesting questions, both from the Moderator – Mike Simons – and the audience; I’ll look to pen something around a few of these in coming days. It was also salutary that each one of the panellists cast themselves as sceptics with respect to Big Data (the word “Luddite” was first discussed as an appropriate description, only to then be discarded); feeling that it was a very promising technology but a long way from the universal panacea it is often touted to be.

However it is on the second day of the event that I wanted to focus in this article. During this I was asked to attend the inaugural Chief Data Officer Executive Forum, sponsored by long-term IRM partner DAMA, the international data management association. This day-long event was chaired by data management luminary Peter Aiken, Associate Professor of Information Systems at Virginia Commonwealth University and Founding Director of data management consultancy Data Blueprint.

The forum consisted of a small group of people working in the strongly-related arenas of data management, data governance, analytics, warehousing and information architecture. Some attendees formally held the title of CDO; others carried out functions overlapping with or analogous to those of a CDO. This is probably not surprising given the emergent nature of the CDO role in many industries.

There was a fair mix of delegate backgrounds, including people who previously held commercial roles, or ones in each of finance, risk and technology (a spread that I referred to in my pre-conference article). The sectors attendees worked in ranged from banking, to manufacturing, to extractives, to government, to insurance. A handful of DAMA officers made up the final baker’s dozen of “wise men” ^[1].

Discussions were both wide-ranging and very open, so I am not going to go into specifics of what people said, or indeed catalogue the delegates or their organisations. However, I did want to touch on some of the themes which arose from our interchanges and I will leaven these with points made in Peter Aiken’s excellent keynote address, which started the day in the best possible way.

Theme 1 – Chief Data Officer is a full-time job

Not a part-time activity

In my experience in business, things happen when an Executive is accountable for them and things languish when either a committee looks at an area (= no accountability), or the work receives only middle-management attention (= no authority). If both being a guardian of an organisation’s data (governance) and caring about how this is leveraged to deliver value (exploitation) are important things, then they merit Executive ownership.

Equally it can be tempting to throw the data and information agenda to an existing Executive, maybe one who already plays in the information arena such as the CFO. The problem with this is that I don’t know many CFOs who have a lot of spare time. They tend to have many priorities already. Let’s say that your average CFO has 20 main things that they worry about. When they add data and information to this mix, then let’s be optimistic and say this slots in at number 15. Is this really going to lead to paradigm-shifting work on data exploitation or data governance?

For most organisations the combination of Data Governance and Data Exploitation is a huge responsibility in terms of both scope and complexity. It is not work to be approached lightly and definitely not territory where a part-timer will thrive.

Peter Aiken also emphasises that a newly appointed CDO may well find him or herself looking to remediate years of neglect of areas such as data management. The need to address such issues suggests that focus is required.

To turn things round, how many organisations of at least a reasonable size have one of their executives act as CFO on a part-time basis?

Theme 2 – The CDO most logically reports into a commercial area (CEO or COO)

Where does the CDO fit?

I’d echo Peter Aiken’s comments that IT departments and the CIOs who lead them have achieved great things in the past decades (I’ve often been part of the teams doing just this). However today (often as a result of just such successes) the CIO’s remit is vast. Even just care and feeding of the average organisation’s IT estate is a massive responsibility. If you add in typical transformation programmes as well, it is easy to see why most CIOs are extremely busy.

Another interesting observation is that the IT project mindset – while wholly suitable for the development, purchase and integration of transaction processing systems – is less aligned with data-centric work. This is because data evolves. Peter Aiken also talks about data operating at a different cadence, by which he means the flow or rhythm of events, especially the pattern in which something is experienced.

More prosaically, anyone who has seen the impact of a set of parallel and uncoordinated projects on a previously well-designed data warehouse will be able to attest to the project and asset mindsets not mingling too well in the information arena. Also, unlike much IT work, data-centric activities are not always ones that can be characterised by having a beginning, middle and end; they tend to be somewhat more open-ended, as an organisation’s data is seldom static and its information needs have similar dynamism.

Instead, the exploitation of an organisation’s data is essentially a commercial exercise which is 100% targeted at better business decision making. This work should be focussed on adding value (see also Theme 5 below). Both of these facts argue for the responsible function reporting outside of IT (but obviously with a very strong technical flavour). Logical reporting lines are thus into either the CEO or COO, assuming that the latter is charged with the day-to-day operations of the business ^[2].

Theme 3 – The span of CDO responsibilities is still evolving

Answers on a postcard...

While there are examples of CDOs being appointed in the early 2000s, the role has really only recently impinged on the collective corporate consciousness. To an extent, many organisations have struggled with the data → information → insight → action journey, so it is unsurprising that the precise role of the CDO is at present not entirely clear. Is CDO a governance-focussed role, or an information-generating role, or both? How does a CDO relate to a Chief Analytics Officer, or are they the same thing? ^[3]

It is evident that there is some confusion here. On the assumption (see Theme 2 above) that the CDO sits outside IT, then how does it relate to IT and where should data-centric development resource be deployed? How does the CDO relate to compliance and risk? ^[4]

The other way of looking at this is that there is a massive opportunity for embryonic CDOs to define their function and span of control. We have had CFOs and their equivalents for centuries (longer if you go back to early Babylonian Accounting); how exciting would it be to frame the role and responsibilities of an entirely new C-level executive?

Theme 4 – Data Management is an indispensable foundation for Analytics, Visualisation and Statistical Modelling

Look out for vases containing scorpions...

Having been somewhat discursive on the previous themes, here I will be brief. I’ve previously argued that a picture paints a thousand words ^[5] and here I’ll simply include my poor attempt at replicating an exhibit that I have borrowed from Peter Aiken’s deck. I think it speaks for itself:

Data Governance Triangle

You can view Peter’s original, which I now realise diverges rather a lot from my attempt to reproduce it, here.

I’ll close this section by quoting a statistic from the plenary sessions of the seminar: “Data Scientists are only 10-20% productive; if you start a week-long piece of work on Monday, the actual statistical analysis will commence on Friday afternoon; the rest of the time is battling with the data” ^[6].

CDOs should be focussed on increasing the productivity of all staff (Data Scientists included) by attending to necessary foundational work in the various areas highlighted in the exhibit above.

Theme 5 – The CDO is in the business of driving cultural change, not delivering shiny toys

When there's something weird on your board of dash / When there's something weird and it's kinda crass / Who you gonna call?

While all delegates agreed that a CDO needs to deliver business value, a distinction was made between style and substance. As an example, Big Data is a technology – an exciting one which allows us to do things we have not done before, but still a technology. It needs to be supported and rounded out by attention to process and people. The CDO should be concerned about all three of these dimensions (see also Theme 4 above).

I mentioned at the beginning of this article that some of the attendees at the CDO forum hailed from the extractive industries. We had some excellent discussions about how safety has been embedded in the culture of such organisations. But we also spoke about just how long this has taken and how much effort was required to bring about the shift in mindset. As always, changing human behaviour is not a simple or quick thing. If one goal of a CDO is to embed reliance on credible information (including robust statistical models) into an organisation’s DNA, then early progress is not to be anticipated; instead the CDO should be dug in for the long-term and have vast reserves of perseverance.

As regular readers will be unsurprised to learn, I’m delighted with this perspective. Indeed tranches of this blog are devoted precisely to this important area ^[7]. I am also somewhat allergic to a focus on fripperies at the expense of substance, something I discussed most directly in “All that glisters is not gold” – some thoughts on dashboards. These perspectives seem to be well-aligned with the stances being adopted by many CDOs.

As with any form of change, the group unanimously felt that good communication lay at the heart of success. A good CDO needs to be a consummate communicator.

Tune in next time…

I have hopefully already given some sense of the span of topics the CDO Executive Forum discussed. The final article in this short series covers a further 5 themes and then looks to link these together with some more general conclusions about what a CDO should do and how they should do it.

Notes

^[1]	Somewhat encouragingly three of these were actually wise women, but then maybe I am setting the bar too low!
^[2]	Though if reporting to a COO, the CDO will need to make sure that they stay close to wherever business strategy is developed; perhaps the CEO, perhaps a senior strategy or marketing executive.
^[3]	I plan to write on the CDO / CAO dichotomy in coming weeks. UPDATE: I guess it took more than a few weeks, but now see: The Chief Data Officer “Sweet Spot” and Alphabet Soup
^[4]	I will expand on this area in Theme 6, which will be part of the second article in this series.
^[5]	I actually have the cardinality wrong here as per my earlier article.
^[6]	I will return to this point in Theme 9, which again will be part of the second article in the series.
^[7]	A list of articles about cultural change in the context of information programmes may be viewed here.