Interviews and Podcasts

The interviews that I conduct with leaders in their fields as part of my “In-depth” series have hopefully brought a new and interesting aspect to this site. However, often the boot is on the other foot and I am the person being interviewed about my experience and expertise in the data field and related matters [1]. Maybe interviewing other people helps me when I am in turn interviewed, maybe it’s the other way round. Whatever the case, I enjoyed recording the two conversations appearing below (thanks to the interviewers in both cases) and hope that the content is of interest to readers.

In both instances a link to the site originally publishing the interview is followed by a locally hosted version of the audio track and then a download option. I’d encourage readers to explore the other excellent interviews contained on both sites.



 
Enterprise Management 360° Podcast – 31st July 2018

 



 
Venturi Voice 3650° Podcast – 22nd April 2018

 

Downloadable link: Conducting a Data Orchestra

 
If you would like to interview me for your site or periodical, or if you are just interested in further exploring some of the themes I discuss in these two interviews, then please feel free to get in contact.
 


 
Notes

 
[1]
 
A list of other video interviews and podcasts I have taken part in can be viewed in the Media section of this site.

 


From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

As Nice as Pie

If you can't get your graphing tool to do the shading, just add some clip art of cosmologists discussing the unusual curvature of space in the area.

© Randall Munroe of xkcd.com – Image adjusted to fit dimensions of this page

Work by the inimitable Randall Munroe, author of long-running web-comic, xkcd.com, has been featured (with permission) multiple times on these pages [1]. The above image got me thinking that I had not penned a data visualisation article since the series starting with Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity nearly a year ago. Randall’s perspective led me to consider that staple of PowerPoint presentations, the humble and much-maligned Pie Chart.


 
While the history is not certain, most authorities credit the pioneer of graphical statistics, William Playfair, with creating this icon, which appeared in his Statistical Breviary, first published in 1801 [2]. Later Florence Nightingale (a statistician in case you were unaware) popularised Pie Charts. Indeed a Pie Chart variant (called a Polar Chart) that Nightingale compiled appears at the beginning of my article Data Visualisation – A Scientific Treatment.

I can’t imagine any reader has managed to avoid seeing a Pie Chart before reading this article. But, just in case, here is one (Since writing Rainbow’s Gravity – see above for a link – I have tried to avoid a rainbow palette in visualisations, hence the monochromatic exhibit):

Basic Pie Chart

The above image is a representation of the following dataset:

 
Label Count
A 4,500
B 3,000
C 3,000
D 3,000
E 4,500
Total 18,000
 

The Pie Chart consists of a circle divided into five sectors, labelled A through E. The basic idea is of course that the amount of the circle taken up by each sector is proportional to the count of items associated with each category, A through E. What is meant by the innocent “amount of the circle” here? The easiest way to look at this is that going all the way round a circle consumes 360°. If we consider our data set, the total count is 18,000, which will equate to 360°. The count for A is 4,500 and we need to consider what fraction of 18,000 this represents and then apply this to 360°:

\dfrac{4{,}500}{18{,}000}\times 360^\circ=\dfrac{1}{4}\times 360^\circ=90^\circ

So A must take up 90°, or equivalently one quarter of the total circle. Similarly for B:

\dfrac{3{,}000}{18{,}000}\times 360^\circ=\dfrac{1}{6}\times 360^\circ=60^\circ

Or one sixth of the circle.

If we take this approach then – of course – the sum of all of the sectors must equal the whole circle and neither more nor less than this (pace Randall). In our example:

 
Label Degrees
A 90°
B 60°
C 60°
D 60°
E 90°
Total 360°
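
As a minimal sketch of the calculation above (the function name `sector_angles` is purely illustrative), the conversion from counts to angles could be written in Python along these lines:

```python
from fractions import Fraction


def sector_angles(counts):
    """Return the angle in degrees of each sector, proportional to its count."""
    total = sum(counts.values())
    # Fraction keeps the arithmetic exact, so the angles sum to exactly 360.
    return {label: Fraction(count, total) * 360 for label, count in counts.items()}


# The first data set from the text.
first_dataset = {"A": 4500, "B": 3000, "C": 3000, "D": 3000, "E": 4500}
angles = sector_angles(first_dataset)

for label, angle in angles.items():
    print(f"{label}: {float(angle):.0f} degrees")
print(f"Total: {float(sum(angles.values())):.0f} degrees")
```

Using exact fractions rather than floating point is a design choice made here so that the sectors add up to the whole circle with no rounding residue.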
 

So far, so simple. Now let’s consider a second data-set as follows:

 
Label Count
A 9,480,301
B 6,320,201
C 6,320,200
D 6,320,201
E 9,480,301
Total 37,921,204
 

What does its Pie Chart look like? Well it’s actually rather familiar, it looks like this:

Basic Pie Chart

This observation stresses something important about Pie Charts. They show how a number of categories contribute to a whole figure, but they only show relative figures (percentages of the whole if you like) and not the absolute figures. The totals in our two data-sets differ by a factor of over 2,100, but their Pie Charts are identical. We will come back to this point again later on.
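
The point can be checked with a short, illustrative Python sketch (the helper `shares` is a made-up name). It compares each category's share of the whole in the two data-sets and also computes the ratio of the two totals, which is the information a Pie Chart does not show:

```python
def shares(counts):
    """Return each category's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


first_dataset = {"A": 4_500, "B": 3_000, "C": 3_000, "D": 3_000, "E": 4_500}
second_dataset = {
    "A": 9_480_301,
    "B": 6_320_201,
    "C": 6_320_200,
    "D": 6_320_201,
    "E": 9_480_301,
}

first_shares = shares(first_dataset)
second_shares = shares(second_dataset)

# Largest difference in share between the two data sets, across all categories.
largest_gap = max(abs(first_shares[k] - second_shares[k]) for k in first_shares)

# Ratio of the two totals, i.e. the absolute difference that the chart hides.
scale = sum(second_dataset.values()) / sum(first_dataset.values())

print(f"Largest difference in share: {largest_gap:.2e}")
print(f"Ratio of totals: {scale:,.1f}")
```

The shares differ only by amounts far too small to be visible in any drawn chart, while the totals differ by more than three orders of magnitude.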


 
Pie Charts have somewhat fallen into disrepute over the years. Some of this is to do with their ubiquity, but there is also at least one more substantial criticism. This is that the human eye is bad at comparing angles, particularly if they are not aligned to some reference point, e.g. a vertical. To see this consider the two Pie Charts below (please note that these represent a different data set from above – for starters, there are only four categories plotted as opposed to five earlier on):

Comparative Pie Charts

The details of the underlying numbers don’t actually matter that much, but let’s say that the left-hand Pie Chart represents annual sales in 2016, broken down by four product lines. The right-hand chart has the same breakdown, but for 2017. This provides some context to our discussions.

Suppose what is of interest is how the sales for each product line in the 2016 chart compare to their counterparts in the right-hand one; e.g. A and A’, B and B’ and so on. Well for the As, we have the helpful fact that they both start from a vertical line and then swing down and round, initially rightwards. This can be used to gauge that A’ is a bit bigger than A. What about B and B’? Well they start in different places and end in different places; looking carefully, we can see that B’ is bigger than B. C and C’ are pretty easy: C is a lot bigger. Then we come to D and D’. I find this one a bit tricky, but we can eventually hazard a guess that they are pretty much the same.

So we can compare Pie Charts and talk about how sales change between two years, what’s the problem? The issue is that it takes some time and effort to reach even these basic conclusions. How about instead of working out which is bigger, A or A’, I ask the reader to guess by what percentage A’ is bigger. This is not trivial to do based on just the charts.

If we really want to look at year-on-year growth, we would prefer that the answer leaps off the page; after all, isn’t that the whole point of visualisations rather than tables of numbers? What if we focus on just the right-hand diagram? Can you say with certainty which is bigger, A or C, B or D? You can work to an answer, but it takes longer than should really be the case for a graphical exhibit.

Aside:

There is a further point to be made here and it relates to what we said Pie Charts show earlier in this piece. What we have in our two Pie Charts above is the make-up of a whole number (in the example we have been working through, this is total annual sales) by categories (product lines). These are percentages and what we have been doing above is to compare the fact that A made up 30% of the total sales in 2016 and 33% in 2017. What we cannot say based on just the above exhibits is how actual sales changed. The total sales may have gone up or down; the Pie Chart does not tell us this, it just deals in how the make-up of total sales has shifted.

Some people try to address this shortcoming, which can result in exhibits such as:

Comparative Pie Charts - with Growth

Here some attempt has been made to show the growth in the absolute value of sales year on year. The left-hand Pie Chart is smaller and so we assume that annual sales have increased between 2016 and 2017. The most logical thing to do would be to have the change in total area of the two Pie Charts to be in proportion to the change in sales between the two years (in this case – based on the underlying data – 2017 sales are 69% bigger than 2016 sales). However, such an approach, while adding information, makes the task of comparing sectors from year to year even harder.
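
If area is indeed to be kept in proportion to sales, the radius of the larger chart grows more slowly than sales do, because area scales with the square of the radius. A minimal sketch, assuming area-proportional scaling (the function name `radius_scale` is illustrative only):

```python
import math


def radius_scale(sales_ratio):
    """Factor by which to scale a pie's radius so its area scales by sales_ratio."""
    # Area is proportional to radius squared, so the radius scales with the square root.
    return math.sqrt(sales_ratio)


# From the text: 2017 sales are 69% bigger than 2016 sales.
ratio_2017_to_2016 = 1.69
print(f"Radius scale factor: {radius_scale(ratio_2017_to_2016):.2f}")
```

So a 69% increase in sales corresponds to a radius that is only about 30% larger, which is one more reason why readers may struggle to judge growth from the relative size of two circles.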


 
The general argument is that Nested Bar Charts are better for the type of scenario I have presented and the types of questions I asked above. Looking at the same annual sales data this way we could generate the following graph:

Comparative Bar Charts

Aside:

While Bar Charts are often used to show absolute values, what we have above is the same “percentage of the whole” data that was shown in the Pie Charts. We have already covered the relative / absolute issue inherent in Pie Charts, from now on, each new chart will be like a Pie Chart inasmuch as it will contain relative (percentage of the whole) data, not absolute. Indeed you could think about generating the bar graph above by moving the Pie Chart sectors around and squishing them into new shapes, while preserving their area.

The Bar Chart makes the yearly comparisons a breeze and it is also pretty easy to take a stab at percentage differences. For example B’ looks about a fifth bigger than B (it’s actually 17.5% bigger) [3]. However, what I think gets lost here is a sense of the make-up of the elements of the two sets. We can see that A is the biggest value in the first year and A’ in the second, but it is harder to gauge what percentage of the overall both A and A’ represent.
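
The distinction between a change in percentage points and a relative increase is easy to blur, so here is a small illustrative sketch using the 40% and 50% figures from note [3] (the function name `relative_increase` is a made-up one):

```python
def relative_increase(old_share, new_share):
    """Return how much bigger new_share is than old_share, as a percentage of old_share."""
    return (new_share - old_share) / old_share * 100


# A move from 40% to 50% of the whole is 10 percentage points...
print(f"Percentage points: {50 - 40}")
# ...but it is a larger relative increase, because the base is only 40.
print(f"Relative increase: {relative_increase(40, 50):.1f}%")
```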

To do this better, we could move to a Stacked Bar Chart as follows (again with the same sales data):

Stacked Bar Chart

Aside:

Once more, we are dealing with how proportions have changed – to put it simply the height of both “skyscrapers” is the same. If we instead shifted to absolute values, then our exhibit might look more like:

Stacked Bar Chart (Absolute Values)

The observant reader will note that I have also added dashed lines linking the same category for each year. These help to show growth. Regardless of what angle to the horizontal the lower line for a category makes, if it and the upper category line diverge (as for B and B’), then the category is growing; if they converge (as for C and C’), the category is shrinking [4]. Parallel lines indicate a steady state. Using this approach, we can get a better sense of the relative size of categories in the two years.
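
The diverge / converge rule can be sketched in code. The shares below are hypothetical: only A's 30% and 33% appear in the text, and the remaining figures have been invented to be consistent with the observations made about the charts. The helper names `boundaries` and `trend` are likewise illustrative.

```python
from itertools import accumulate

# Hypothetical shares (% of total sales); only A's 30% and 33% come from the text.
shares_2016 = {"A": 30.0, "B": 20.0, "C": 35.0, "D": 15.0}
shares_2017 = {"A": 33.0, "B": 23.5, "C": 28.5, "D": 15.0}


def boundaries(shares):
    """Return (lower, upper) stack positions for each category, bottom to top."""
    uppers = list(accumulate(shares.values()))
    lowers = [0.0] + uppers[:-1]
    return dict(zip(shares, zip(lowers, uppers)))


def trend(label):
    """Classify a category by whether its two dashed lines diverge or converge."""
    low_16, up_16 = boundaries(shares_2016)[label]
    low_17, up_17 = boundaries(shares_2017)[label]
    # The gap between the lines in each year is simply the category's share.
    change = (up_17 - low_17) - (up_16 - low_16)
    if change > 0:
        return "diverging (growing)"
    if change < 0:
        return "converging (shrinking)"
    return "parallel (steady)"


for label in shares_2016:
    print(label, trend(label))
```

Note that the classification depends only on the gap between a category's two lines, not on their slope, which is why the angle that the lower line makes to the horizontal does not matter.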


 
However, here – despite the dashed lines – we lose at least some of the year-on-year comparative power of the Nested Bar Chart above. In turn the Nested Bar Chart loses some of the attributes of the original Pie Chart. In truth, there is no single chart which fits all purposes. Trying to find one is analogous to trying to find a planar projection of a sphere that preserves angles, distances and areas [5].

Rather than finding the Philosopher’s Stone [6] of an all-purpose chart, the challenge for those engaged in data visualisation is to anticipate the central purpose of an exhibit and to choose a chart type that best resonates with this. Sometimes, the Pie Chart can be just what is required, as I found myself in my article, A Tale of Two [Brexit] Data Visualisations, which closed with the following image:

Brexit Flag
UK Referendum on EU Membership – Number voting by age bracket (see caveats in original article)

Or, to put it another way:

You may very well be well bred
Chart aesthetics filling your head
But there’s always some special case, time or place
To replace perfect taste

For instance…

Never cry ’bout a Chart of Pie
You can still do fine with a Chart of Pie
People may well laugh at this humble graph
But it can be just the thing you need to help the staff

Never cry ’bout a Chart of Pie
Though without due care things can go awry
Bars are fine, Columns shine
Lines are ace, Radars race
Boxes fly, but never cry about a Chart of Pie

With apologies to the Disney Corporation!


 
Addendum:

It was pointed out to me by Adam Carless that I had omitted the following thing of beauty from my Pie Chart menagerie. How could I have forgotten?

3D Pie Chart

It is claimed that some Theoretical Physicists (and most Higher Dimensional Geometers) can visualise in four dimensions. Perhaps this facility would be of some use in discerning meaning from the above exhibit.
 


 
Notes

 
[1]
 
Including:

 
[2]
 
Playfair also most likely was the first to introduce line, area and bar charts.
 
[3]
 
Recall again we are comparing percentages, so 50% is 25% bigger than 40%.
 
[4]
 
This assertion would not hold for absolute values, or rather parallel lines would indicate that the absolute value of sales (not the relative one) had stayed constant across the two years.
 
[5]
 
A little-known Mathematician, going by the name of Gauss, had something to say about this back in 1828 – Disquisitiones generales circa superficies curvas. I hope you read Latin.
 
[6]
 
The Philosopher's Stone

No, not that one!

 



 

Convergent Evolution

Ichthyosaur and Dolphin

No, this article has not escaped from my Maths & Science section, it is actually about data matters. But first of all, channeling Jennifer Aniston [1], “here comes the Science bit – concentrate”.


 
Shared Shapes

The Theory of Common Descent holds that any two organisms, extant or extinct, will have a common ancestor if you roll the clock back far enough. For example, each of fish, amphibians, reptiles and mammals had a common ancestor over 500 million years ago. As shown below, the current organism which is most like this common ancestor is the Lancelet [2].

Chordate Common Ancestor

To bring things closer to home, each of the Great Apes (Orangutans, Gorillas, Chimpanzees, Bonobos and Humans) had a common ancestor around 13 million years ago.

Great Apes Common Ancestor

So far so simple. As one would expect, animals sharing a recent common ancestor would share many attributes with both it and each other.

Convergent Evolution refers to something else. It describes where two organisms independently evolve very similar attributes that were not features of their most recent common ancestor. Thus these features are not inherited, instead evolutionary pressure has led to the same attributes developing twice. An example is probably simpler to understand.

The image at the start of this article is of an Ichthyosaur (top) and Dolphin. It is striking how similar their body shapes are. They also share other characteristics such as live birth of young, tail first. The last Ichthyosaur died around 66 million years ago alongside many other archosaurs, notably the Dinosaurs [3]. Dolphins are happily still with us, but the first toothed whale (not a Dolphin, but probably an ancestor of them) appeared around 30 million years ago. The ancestors of the modern Bottlenose Dolphins appeared a mere 5 million years ago. Thus there is a tremendous gap of time between the last Ichthyosaur and the proto-Dolphins. Ichthyosaurs were reptiles, covered in small scales [4]. Dolphins are mammals and covered in skin not massively different to our own. The most recent common ancestor of Ichthyosaurs and Dolphins probably lived around a quarter of a billion years ago and looked like neither of them. So the shape and other attributes shared by Ichthyosaurs and Dolphins do not come from a common ancestor; they have developed independently (and millions of years apart) as adaptations to similar lifestyles as marine hunters. This is the essence of Convergent Evolution.

That was the Science, here comes the Technology…


 
A Brief Hydrology of Data Lakes

From 2000 to 2015, I had some success [5] with designing and implementing Data Warehouse architectures much like the following:

Data Warehouse Architecture

As a lot of my work then was in Insurance or related fields, the Analytical Repositories tended to be Actuarial Databases and / or Exposure Management Databases, developed in collaboration with such teams. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation.

Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises. Of course some architectures featured both paradigms as well.

One of the early promises of a Data Lake approach was that – once all relevant data had been ingested – this would be directly leveraged by Data Scientists to derive insight.

Over time, it became clear that it would be useful to also have some merged / conformed and cleansed data structures in the Data Lake. Once the output of Data Science began to be used to support business decisions, a need arose to consider how it could be audited and both data privacy and information security considerations also came to the fore.

Next, rather than just being the province of Data Scientists, there were moves to use Data Lakes to support general Data Discovery and even business Reporting and Analytics as well. This required additional investments in metadata.

The types of issues with Data Lake adoption that I highlighted in Draining the Swamp earlier this year also led to the advent of techniques such as Data Curation [6]. In parallel, concerns about expensive Data Science resource spending 80% of their time in Data Wrangling [7] led to the creation of a new role, that of Data Engineer. These people take on much of the heavy lifting of consolidating, fixing and enriching datasets, allowing the Data Scientists to focus on Statistical Analysis, Data Mining and Machine Learning.

Big Data Architecture

All of which leads to a modified Big Data / Data Lake architecture, embodying people and processes as well as technology and looking something like the exhibit above.

This is where the observant reader will see the concept of Convergent Evolution playing out in the data arena as well as the Natural World.


 
In Closing

Convergent Evolution of Data Architectures

Lest it be thought that I am saying that Data Warehouses belong to a bygone era, it is probably worth noting that the archosaurs, Ichthyosaurs included, dominated the Earth for orders of magnitude longer than the mammals and were only dethroned by an asymmetric external shock, not any flaw in their own finely honed characteristics.

Also, to be crystal clear, much as while there are similarities between Ichthyosaurs and Dolphins there are also clear differences, the same applies to Data Warehouse and Data Lake architectures. When you get into the details, differences between Data Lakes and Data Warehouses do emerge; there are capabilities that each has that are not features of the other. What is undoubtedly true however is that the same procedural and operational considerations that played a part in making some Warehouses seem unwieldy and unresponsive are also beginning to have the same impact on Data Lakes.

If you are in the business of turning raw data into actionable information, then there are inevitably considerations that will apply to any technological solution. The key lesson is that the shape of your architecture is going to be pretty similar, regardless of the technical underpinnings.


 
Notes

 
[1]
 
The two of us are constantly mistaken for one another.
 
[2]
 
To be clear the common ancestor was not a Lancelet, rather Lancelets sit on the branch closest to this common ancestor.
 
[3]
 
Ichthyosaurs are not Dinosaurs, but a different branch of ancient reptiles.
 
[4]
 
This is actually a matter of debate in paleontological circles, but recent evidence suggests small scales.
 
[5]
 
See:

 
[6]
 
A term that is unaccountably missing from The Data & Analytics Dictionary – something to add to the next release. UPDATE: Now remedied here.
 
[7]
 
Ditto. UPDATE: Now remedied here

 



 

Version 2 of The Anatomy of a Data Function

Between November and December 2017, I published the three parts of my Anatomy of a Data Function. These were cunningly called Part I, Part II and Part III. Eight months is a long time in the data arena and I have now issued an update.

The Anatomy of a Data Function

Larger PDF version

The changes in Version 2 are confined to the above organogram and Part I of the text. They consist of the following:

  1. Split Artificial Intelligence out of Data Science in order to better reflect the ascendancy of this area (and also its use outside of Data Science).
     
  2. Change Data Science to Data Science / Engineering in order to better reflect the continuing evolution of this area.

My aim will be to keep this trilogy up-to-date as best practice Data Functions change their shapes and contents.


 
If you would like help building or running your Data Function, or would just like to have an informal chat about the area, please get in touch.
 



 

Fact-based Decision-making

All we need is fact-based decision-making ma'am
This article is about facts. Facts are sometimes less solid than we would like to think; sometimes they are downright malleable. To illustrate, consider the fact that in 98 episodes of Dragnet, Sergeant Joe Friday never uttered the words “Just the facts Ma’am”, though he did often employ the variant alluded to in the image above [1]. Equally, Rick never said “Play it again Sam” in Casablanca [2] and St. Paul never suggested that “money is the root of all evil” [3]. As Michael Caine never said in any film, “not a lot of people know that” [4].

 
Up-front Acknowledgements

These normally appear at the end of an article, but it seemed to make sense to start with them in this case:

Recently I published Building Momentum – How to begin becoming a Data-driven Organisation. In response to this, one of my associates, Olaf Penne, asked me about my thoughts on fact-based decision-making. This piece was prompted by both Olaf’s question and a recent article by my friend Neil Raden on his Silicon Angle blog, Performance management: Can you really manage what you measure? Thanks to both Olaf and Neil for the inspiration.

Fact-based decision-making. It sounds good doesn’t it? Especially if you consider the alternatives: going on gut feel, doing what you did last time, guessing, not taking a decision at all. However – as is often the case with issues I deal with on this blog – fact-based decision-making is easier to say than it is to achieve. Here I will look to cover some of the obstacles and suggest a potential way to navigate round them. Let’s start however with some definitions.

Fact NOUN A thing that is known or proved to be true.
(Oxford Dictionaries)
Decision NOUN A conclusion or resolution reached after consideration.
(Oxford Dictionaries)

So one can infer that fact-based decision-making is the process of reaching a conclusion based on consideration of things that are known to be true. Again, it sounds great doesn’t it? It seems that all you have to do is to find things that are true. How hard can that be? Well actually quite hard as it happens. Let’s cover what can go wrong (note: this section is not intended to be exhaustive, links are provided to more in-depth articles where appropriate):


 
Accuracy of Data that is captured

Data Accuracy

A number of factors can play into the accuracy of data capture. Some systems (even in 2018) can still make it harder to capture good data than to ram in bad. Often an issue may also be a lack of master data definitions, so that similar data is labelled differently in different systems.

A more pernicious problem is combinatorial data accuracy: two data items are both valid, but not in combination with each other. However, often the biggest stumbling block is a human one: getting people to buy in to the idea that the care and attention they pay to data capture will pay dividends later in the process.
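
To make the combinatorial case concrete, here is a minimal sketch of a cross-field check. The record layout and field names (`date_of_birth`, `policy_start`) are hypothetical and chosen only for illustration; each value is a perfectly valid date on its own, but the pair can still be impossible.

```python
from datetime import date


def validate(record):
    """Return a list of problems found in a record (empty if there are none)."""
    problems = []
    # First check each item in isolation.
    for field in ("date_of_birth", "policy_start"):
        if not isinstance(record.get(field), date):
            problems.append(f"{field} is missing or not a date")
    # Then check the combination, but only if both items are individually valid.
    if not problems and record["policy_start"] < record["date_of_birth"]:
        problems.append("policy_start precedes date_of_birth")
    return problems


good = {"date_of_birth": date(1980, 5, 1), "policy_start": date(2018, 1, 1)}
bad = {"date_of_birth": date(1980, 5, 1), "policy_start": date(1975, 1, 1)}

print(validate(good))
print(validate(bad))
```

Field-by-field validation alone would pass both records; it is only the combined check that catches the second one.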

These and other areas are covered in greater detail in an older article, Using BI to drive improvements in data quality.
 
 
Honesty of Data that is captured

Honesty of Data

Data may be perfectly valid, but still not represent reality. Here I’ll let Neil Raden point out the central issue in his customary style:

People find the most ingenious ways to distort measurement systems to generate the numbers that are desired, not only NOT providing the desired behaviors, but often becoming more dysfunctional through the effort.

[…] voluntary compliance to the [US] tax code encourages a national obsession with “loopholes”, and what salesman hasn’t “sandbagged” a few deals for next quarter after she has met her quota for the current one?

Where there is a reward to be gained or a punishment to be avoided, by hitting certain numbers in a certain way, the creativeness of humans often comes to the fore. It is hard to account for such tweaking in measurement systems.
 
 
Timing issues with Data

Timing Issues

Timing is often problematic. For example, a transaction completed near the end of a period gets recorded in the next period instead; one early in a new period goes into the prior period, which is still open. There is also (as referenced by Neil in his comments above) the delayed booking of transactions in order to – with the nicest possible description – smooth revenues. It is not just hypothetical salespeople who do this of course. Entire organisations can make smoothing adjustments to their figures before publishing and deferral or expedition of obligations and earnings has become something of an art form in accounting circles. While no doubt most of this tweaking is done with the best intentions, it can compromise the fact-based approach that we are aiming for.
 
 
Reliability with which Data is moved around and consolidated

Data Transcription

In our modern architectures, replete with web-services, APIs, cloud-based components and the quasi-instantaneous transmission of new transactions, it is perhaps not surprising that occasionally some data gets lost in translation [5] along the way. That is before data starts to be Sqooped up into Data Lakes, or other such Data Repositories, and then otherwise manipulated in order to derive insight or provide regular information. All of these are processes which can introduce their own errors. Suffice it to say that transmission, collation and manipulation of data can all reduce its accuracy.

Again see Using BI to drive improvements in data quality for further details.
 
 
Pertinence and fidelity of metrics developed from Data

Data Metric

Here we get past issues with data itself (or how it is handled and moved around) and instead consider how it is used. Metrics are seldom reliant on just one data element, but are often rather combinations. The different elements might come in because a given metric is arithmetical in nature, e.g.

\text{Metric X} = \dfrac{\text{Data Item A}+\text{Data Item B}}{\text{Data Item C}}

Choices are made as to how to construct such compound metrics and how to relate them to actual business outcomes. For example:

\text{New Biz Growth} = \dfrac{(\text{Sales CYTD}-\text{Repeat CYTD})-(\text{Sales PYTD}-\text{Repeat PYTD})}{(\text{Sales PYTD}-\text{Repeat PYTD})}

Is this a good way to define New Business Growth? Are there any weaknesses in this definition, for example is it sensitive to any glitches in – say – the tagging of Repeat Business? Do we need to take account of pricing changes between Repeat Business this year and last year? Is New Business Growth something that is even worth tracking; what will we do as a result of understanding this?
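
One way to probe the sensitivity question is to code the definition and perturb an input. The sketch below implements the formula above; the function name and all of the figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def new_business_growth(sales_cytd, repeat_cytd, sales_pytd, repeat_pytd):
    """Growth in new business (sales less repeat business), year to date on year to date."""
    new_current = sales_cytd - repeat_cytd
    new_prior = sales_pytd - repeat_pytd
    if new_prior == 0:
        raise ValueError("prior-year new business is zero; growth is undefined")
    return (new_current - new_prior) / new_prior


# Hypothetical figures, chosen only to illustrate the arithmetic.
baseline = new_business_growth(1_200, 700, 1_000, 600)

# Same sales, but 50 of this year's repeat business has been mis-tagged as new.
mistagged = new_business_growth(1_200, 650, 1_000, 600)

print(f"Baseline growth: {baseline:.1%}")
print(f"With mis-tagging: {mistagged:.1%}")
```

With these invented numbers, mis-tagging around 7% of repeat business moves the metric from 25% to 37.5%, which suggests the definition can be quite sensitive to tagging glitches whenever new business is small relative to repeat business.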

The above is a somewhat simple metric, in a section of Using historical data to justify BI investments – Part I, I cover some actual Insurance industry metrics that build on each other and are a little more convoluted. The same article also considers how to – amongst other things – match revenue and outgoings when the latter are spread over time. There are often compromises to be made in defining metrics. Some of these are based on the data available. Some relate to inherent issues with what is being measured. In other cases, a metric may be a best approximation to some indication of business health; a proxy used because that indication is not directly measurable itself. In the last case, staff turnover may be a proxy for staff morale, but it does not directly measure how employees are feeling (a competitor might be poaching otherwise happy staff for example).
 
 
Robustness of extrapolations made from Data

By the third trimester, there will be hundreds of babies inside you...

© Randall Munroe, xkcd.com

I have used the above image before in these pages [6]. The situation it describes may seem farcical, but it is actually not too far away from some extrapolations I have seen in a business context. For example, a prediction of full-year sales may consist of this year’s figures for the first three quarters supplemented by prior year sales for the final quarter. While our metric may be better than nothing, there are some potential distortions related to such an approach:

  1. Repeat business may have fallen into Q4 last year, but was processed in Q3 this year. This shift in timing would lead to such business being double-counted in our year end estimate.
     
  2. Taking point 1 to one side, sales may be growing or contracting compared to the previous year. Using Q4 prior year as is would not reflect this.
     
  3. It is entirely feasible that some market event occurs this year (for example the entrance or exit of a competitor, or the launch of a new competitor product) which would render prior year figures a poor guide.

Of course all of the above can be adjusted for, but such adjustments would be reliant on human judgement, making any projections similarly reliant on people’s opinions (which as Neil points out may be influenced, consciously or unconsciously, by self-interest). Where sales are based on conversions of prospects, the quantum of prospects might be a more useful predictor of Q4 sales. However here a historical conversion rate would need to be calculated (or conversion probabilities allocated by the salespeople involved) and we are back into essentially the same issues as catalogued above.
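
The second distortion in the list above can be illustrated with a short sketch. The quarterly figures are hypothetical, as are the function names, and the growth adjustment shown is just one possible judgement call rather than a recommended method.

```python
def naive_full_year(q1_to_q3_current, q4_prior):
    """Full-year estimate: this year's first three quarters plus last year's Q4."""
    return sum(q1_to_q3_current) + q4_prior


def growth_adjusted_full_year(q1_to_q3_current, q1_to_q3_prior, q4_prior):
    """As above, but scale prior-year Q4 by the year-to-date growth rate."""
    growth = sum(q1_to_q3_current) / sum(q1_to_q3_prior)
    return sum(q1_to_q3_current) + q4_prior * growth


# Hypothetical quarterly sales figures.
current = [110, 120, 130]
prior = [100, 100, 100]
prior_q4 = 100

naive = naive_full_year(current, prior_q4)
adjusted = growth_adjusted_full_year(current, prior, prior_q4)

print(f"Naive estimate: {naive}")
print(f"Growth-adjusted estimate: {adjusted:.0f}")
```

The two estimates differ, and choosing between them (or some third option) is precisely the kind of human judgement referred to above.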

I explore some similar themes in a section of Data Visualisation – A Scientific Treatment.
 
 
Integrity of statistical estimates based on Data

Statistical Data

Having spent 18 years working in various parts of the Insurance industry, I am pretty familiar with statistical estimates being part of the standard set of metrics [7]. However such estimates appear in a number of industries, sometimes explicitly, sometimes implicitly. A clear parallel would be credit risk in Retail Banking, but something as simple as an estimate of potentially delinquent debtors is an inherently statistical figure (albeit one that may not depend on the output of a statistical model).

The thing with statistical estimates is that they are never a single figure but a range. A model may for example spit out a figure like £12.4 million ± £0.5 million. Let’s unpack this.

Example distribution

Well the output of the model will probably be something analogous to the above image. Here a distribution has been fitted to the business event being modelled. The central point of this (the one most likely to occur according to the model) is £12.4 million. The model is not saying that £12.4 million is the answer, it is saying it is the central point of a range of potential figures. We typically next select a symmetrical range above and below the central figure such that we cover a high proportion of the possible outcomes for the figure being modelled; 95% of them is typical [8]. In the above example, the range extends £0.5 million above £12.4 million and £0.5 million below it (hence the ± sign).

Of course the problem is then that Financial Reports (or indeed most Management Reports) are not set up to cope with plus or minus figures, so typically one of £12.4 million (the central prediction) or £11.9 million (the most conservative estimate [9]) is used. The fact that the number itself is uncertain can get lost along the way. By the time that people who need to take decisions based on such information are in the loop, the inherent uncertainty of the prediction may have disappeared. This can be problematic. Suppose a real result of £12.4 million sees an organisation breaking even, but one of £11.9 million sees a small loss being recorded. This could have quite an influence on what course of action managers adopt [10]; are they relaxed, or concerned?
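A rough sketch in Python shows how much information is lost when only one figure is reported. It assumes, for illustration only, a Normal distribution centred on £12.4 million whose 95% range is ± £0.5 million, and a break-even point of exactly £12.4 million as in the example above.

```python
from statistics import NormalDist

# Assumption: Normal distribution whose symmetric 95% range is 12.4 ± 0.5 (£ million).
z_95 = NormalDist().inv_cdf(0.975)
model = NormalDist(mu=12.4, sigma=0.5 / z_95)

break_even = 12.4     # result at which the organisation breaks even (from the example)
conservative = 11.9   # lower end of the 95% range

# Probability that the real result falls below each threshold.
p_loss = model.cdf(break_even)
p_below_conservative = model.cdf(conservative)

print(f"Chance of falling short of break-even: {p_loss:.1%}")
print(f"Chance of falling below the conservative figure: {p_below_conservative:.1%}")
```

Under these assumptions, a report showing only the central £12.4 million hides the fact that a loss is as likely as not, while one showing only £11.9 million presents an outcome that the model regards as unlikely.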

Beyond the above, it is not exactly unheard of for statistical models to have glitches, sometimes quite big glitches [11].

This segment could easily expand into a series of articles itself. Hopefully I have covered enough to highlight that there may be some challenges in this area.
 
 
And so what?

The dashboard has been updated, how thrilling...

Even if we somehow avoid all of the above pitfalls, there remains one booby-trap that is likely to snare us, absent the necessary diligence. This was alluded to in the section about the definition of metrics:

Is New Business Growth something that is even worth tracking; what will we do as a result of understanding this?

Unless a reported figure, or output of a model, leads to action being taken, it is essentially useless. Facts that never lead to anyone doing anything are like lists learnt by rote at school and regurgitated on demand parrot-fashion; they demonstrate the mechanism of memory, but not that of understanding. As Neil puts it in his article:

[…] technology is never a solution to social problems, and interactions between human beings are inherently social. This is why performance management is a very complex discipline, not just the implementation of dashboard or scorecard technology.


 
How to Measure the Unmeasurable

Measuring the Unmeasurable

Our dream of fact-based decision-making seems to be crumbling to dust. Regular facts are subject to data quality issues, or manipulation by creative humans. As data is moved from system to system and repository to repository, the facts can sometimes acquire an “alt-” prefix. Timing issues and the design of metrics can also erode accuracy. Then there are many perils and pitfalls associated with simple extrapolation and less simple statistical models. Finally, any fact that manages to emerge from this gantlet [12] unscathed may then be totally ignored by those whose actions it is meant to guide. What can be done?

As happens elsewhere on this site, let me turn to another field for inspiration. Not for the first time, let’s consider what Science can teach us about dealing with such issues with facts. In a recent article [13] in my Maths & Science section, I examined the nature of Scientific Theory and – in particular – explored the imprecision inherent in the Scientific Method. Here is some of what I wrote:

It is part of the nature of scientific theories that (unlike their Mathematical namesakes) they are not “true” and indeed do not seek to be “true”. They are models that seek to describe reality, but which often fall short of this aim in certain circumstances. General Relativity matches observed facts to a greater degree than Newtonian Gravity, but this does not mean that General Relativity is “true”, there may be some other, more refined, theory that explains everything that General Relativity does, but which goes on to explain things that it does not. This new theory may match reality in cases where General Relativity does not. This is the essence of the Scientific Method, never satisfied, always seeking to expand or improve existing thought.

I think that the Scientific Method that has served humanity so well over the centuries is applicable to our business dilemma. In the same way that a Scientific Theory is never “true”, but instead useful for explaining observations and predicting the unobserved, business metrics should be judged less on their veracity (though it would be nice if they bore some relation to reality) and instead on how often they lead to the right action being taken and the wrong action being avoided. This is an argument for metrics to be simple to understand and tied to how decision-makers actually think, rather than some other more abstruse and theoretical definition.

A proxy metric is fine, so long as it yields the right result (and the right behaviour) more often than not. A metric with dubious data quality is still useful if it points in the right direction; if the compass needle is no more than a few degrees out. While of course steps that improve the accuracy of metrics are valuable and should be undertaken where cost-effective, at least equal attention should be paid to ensuring that – when the metric has been accessed and digested – something happens as a result. This latter goal is a long way from the arcana of data lineage and metric definition; it is instead the province of human psychology, something that the accomplished data professional should be adept at influencing.

I have touched on how to positively modify human behaviour in these pages a number of times before [14]. It is a subject that I will be coming back to again in coming months, so please watch this space.
 


Further reading on this subject:


 
Notes

 
[1]
 
According to Snopes, the phrase arose from a spoof of the series.
 
[2]
 
The two pertinent exchanges were instead:

Ilsa: Play it once, Sam. For old times’ sake.
Sam: I don’t know what you mean, Miss Ilsa.
Ilsa: Play it, Sam. Play “As Time Goes By”
Sam: Oh, I can’t remember it, Miss Ilsa. I’m a little rusty on it.
Ilsa: I’ll hum it for you. Da-dy-da-dy-da-dum, da-dy-da-dee-da-dum…
Ilsa: Sing it, Sam.

and

Rick: You know what I want to hear.
Sam: No, I don’t.
Rick: You played it for her, you can play it for me!
Sam: Well, I don’t think I can remember…
Rick: If she can stand it, I can! Play it!
 
[3]
 
Though he, or whoever may have written the first epistle to Timothy, might have condemned the “love of money”.
 
[4]
 
The origin of this was a Peter Sellers interview in which he impersonated Caine.
 
[5]
 
One of my Top Ten films.
 
[6]
 
Especially for all Business Analytics professionals out there (2009).
 
[7]
 
See in particular my trilogy:

  1. Using historical data to justify BI investments – Part I (2011)
  2. Using historical data to justify BI investments – Part II (2011)
  3. Using historical data to justify BI investments – Part III (2011)
 
[8]
 
Without getting into too many details, what you are typically doing is stating that there is a less than 5% chance that the measurements forming model input match the distribution due to a fluke; but this is not meant to be a primer on null hypotheses.
 
[9]
 
Of course, depending on context, £12.9 million could instead be the most conservative estimate.
 
[10]
 
This happens a lot in election polling. Candidate A may be estimated to be 3 points ahead of Candidate B, but with an error margin of 5 points, it should be no real surprise when Candidate B wins the ballot.
 
[11]
 
Try googling Nobel Laureates Myron Scholes and Robert Merton and then look for references to Long-term Capital Management.
 
[12]
 
Yes, I meant “gantlet”; that is the word in the original phrase, not “gauntlet”, and so connections with gloves are wide of the mark.
 
[13]
 
Finches, Feathers and Apples (2018).
 
[14]
 
For example:

 



 

In-depth with CDO Jo Coutuer

In-depth with Jo Coutuer


Part of the In-depth series of interviews

PJT Today’s guest on In-depth is Jo Coutuer, Chief Data Officer and Member of the Executive Committee of BNP Paribas Fortis, a leading Belgian bank. Given the importance of the CDO role in Financial Services, I am very happy that Jo has managed to spare us some of his valuable time to talk.
PJT Jo, you have had an interesting career in a variety of organisations from consultancies to start-ups, from government to major companies. Can you give readers a pen-picture of the journey that has taken you to your current role?
JC For me, the variety of contexts has been the most rewarding. I started in an industry that has now sharply declined in Europe (Telco Manufacturing), continued in the consulting world of ERP tools, switched into a very interesting job for the government, became an entrepreneur and co-created a data company for 13 years, merged that data company into a big 4 consultancy and finally decided to apply my life’s learnings to the fascinating industry of banking. The most remarkable aspect of my career is the fact that my current role and the attention to data that goes with it, did not exist when I started my career. It illustrates how young people today can also build a future, without really knowing what lies ahead. All it takes is the mental flexibility to switch contexts when it is needed.
PJT At present – at least in Europe, and maybe further afield – there is no standard definition of a CDO’s role. Can you tell me a bit about the scope of your work at BNP Paribas Fortis? Are you most focussed on compliance, leverage of data, or a balance of both activities?
JC At BNP Paribas Fortis, the CEO and his executive committee made a courageous decision back in 2016 to create a specific department dedicated to Data. The move was courageous, not only because it defined a new leadership role and a budget, but also because it settled a debate between the businesses and the IT function. At the time of creation of the department, it was decided to carve out of IT the traditional function of “business intelligence and data warehousing” and to establish a central competence centre for “analytics and artificial intelligence”, which before was mostly scattered or non-existent. On top of that, the new department was tasked to assume the regulatory duties that relate to data. More and more, banking regulation focusses on reliable reporting, traceable data flows, systematic data quality measurement and well documented metadata, all embedded in a solid organisational governance. So yes, I would say our Data department is both “defensive” as well as “offensive”. As a CDO, I am privileged to be able to work with experts and leaders in the fields of regulation, data warehousing expertise and data science innovation. Without them, the breadth of the scope and the required depth would not be manageable.
PJT Do you collaborate with other Executives in the data arena, or is the CDO primus inter pares when it comes to data matters?
JC I would not speak of a hierarchical order when it comes to data. It helps to distinguish three identities of a Data department.

The first one is the identity of the “Governor”. In that identity, peers accept that the CDO translates external duties into internal best practices, as long as this happens in a co-creation mode. We have established a “College of Data Managers”, who are 13 senior managers, representing each a specific “data perimeter”, which in its turn rather well maps to our fields of business or our internal functions. These senior managers intimately link the Data activities to the day-to-day business functions and their respective executives.

A second identity is that of the “Expert”. In that identity, we offer expertise in fields of data integration, data warehousing, reporting, visualisation, data science, … It means that I see my fellow executives as clients and partners and the Data department helps them achieve their business objectives. Mentally (and sometimes practically), we measure up to external professional services or IT companies.

A third identity is that of the “Integrator”. As an integrator, we actively make the link between the business of today, the technological and data potential of today and the business of tomorrow. We actively try to question existing practices and we introduce new concepts for a variety of business applications. And although we are more driving in this role than we are in the role of the “Expert”, we still are fully at the service of our clients.

PJT More generally, how do you see the CDO role changing in coming years, what would 2020’s CDO be doing? Will we even need CDOs in 2020?
JC Ahah! One of the most frequently asked questions on CDO related social media! If the previous two years are any predictor of the future, I would say that the CDO of 2020 is one who has solidly matured the governance aspects of Data, just like the CFO and CRO have done for financial management or risk management. Let’s say that Data has become “routine”.

At the same time, the 2020 CDO will need to offer to his peers, the technical and expert capabilities that are data centric and essential to running a digital business.

And on top of that, I believe that 2020 will be the timeframe in which data valorisation will become an active topic. I explicitly do not use the word “monetisation” because we currently associate data too often with “selling data for advertising purposes”. In our industry, PSD2 [1] will define our duties to be able to exchange data with third party service providers, at the explicit request of our clients. From that new reality, an API-driven ecosystem will surface in which data will be actively valorised, to the direct service of our clients, not to the indirect service of our marketing departments. The 2020 CDO will be instrumental in shaping his or her company’s ecosystem to make sure this happens in a well governed, trusted and safe way. Clients will seek that reassurance and will reward companies who take data management seriously.

PJT Of course, senior roles tend to exist because they add value to their organisations, what do you feel is the value that a CDO brings to the table?
JC I have already mentioned the CDO’s challenge to be schizophrenically split between his or her various identities. But it is exactly that breadth of scope that can add value. The CDO should be an “executive integrator”. He can employ “governors” and “experts”, but his or her role in the peer team of executives is to represent the transversality of data’s nature. Data “flows”, data “unites”. More than it is “oil”, data is “water”. It flows through the company’s ecosystem and it nourishes the business and the future business potential. As such, the CDO needs to keep the water clean and make sure it gets pumped across the organisation, so that others can benefit from the nutrients in it. And while doing so, the CDO has a duty to add nutrients to the water, in the form of analytical or artificial intelligence induced insights.
PJT Focussing on Analytics, I know you have written about how to build the ideal Analytics team and have mentioned that “purple people” are the key. Can you explain more about this?
JC Purple people are people that integrate the skills of “red” people and “blue” people. Red people bring the scientific data methodologies to the table. Blue people bring the solid frameworks of the business. Data people as individuals and a Data department as an entity, must have as a mission to be “purple” and to actively bridge the gap between the fast growing set of data technologies and methodologies on the one hand and the rapidly evolving and transforming business challenges on the other hand. And of course, if you like Prince [2] as a musician, that can be an asset too!
PJT In my discussions with other CDOs [3] and indeed in my own experience, it seems that teamwork is crucial for a CDO. Of course, this is important for many senior roles, but it does seem central to what a CDO does. My perspective is that both a CDO’s own team and the virtual teams that he or she forms with colleagues are going to have a big say in whether things go well or not. What are your views on this topic?
JC You are absolutely right. A CDO or data function cannot exist in isolation. At times, transversality feels like a burden because it imposes a daily attention to stakeholders. However, in reality, it’s exactly the transversal effect that can generate the added value to an organisation. At the end of the day, the integration aspects between departments and people will generate positive side effects, above and beyond the techniques of data management.
PJT Artificial Intelligence in its various guises has been the topic of conversation recently. This is something with strong linkage to the data field. Obviously without divulging any commercial secrets, what role do you see AI playing in banking going forwards? What about in our lives in general?
JC It’s funny that AI is being discovered as a new topic. I remember writing my Master thesis on the topic a long time ago. Of course, things have evolved since the 90s, with a storage and computing capacity that is approximately 50,000 times stronger for the same price point. This capacity explosion, combined with the connectivity of the internet and the cloud, combined with the increased awareness that data and algorithms have become central elements in many business strategies, has fundamentally re-calibrated the potential of AI.

In banking, AI and Analytics will soon help clients understand their finances better, will help them to take better and faster decisions, will generate a better (less friction) client experience for “the easy stuff” and it will allow the banks to put humans on “the hard stuff” or on those interactions with their clients that require true human interaction. Behind the scenes, Analytics and AI are already helping to prevent fraud, monitoring suspicious transactions to detect crime, money laundering and fraud. And even deeper inside the mechanics of a bank, Analytics and AI are helping prevent cyber-crimes and are monitoring the stability of the technological platforms onto which our modern financial and societal system is built.

I am convinced that the societal role of banks will continue to exist, despite innovative peer-to-peer or blockchain driven schemes. As such, Analytics and AI will contribute to society as a whole, through their contribution to a reliable and stable financial services system.

PJT With GDPR [4] coming into force only a couple of months ago, the subject of customer data and how it is used is a topical one. Taking BNP Paribas Fortis to one side, what are your thoughts on the balance between data privacy and the “free” services that we all pay for by allowing our data to be sold?
JC I believe that GDPR is both important legislation and brings benefits to customers. First of all, we have good historical reasons to care about our privacy. In times of societal crises or wars, it is the first weapon that is used against society and its citizens. So we should care for it deeply. Second, being in an industry for which “trust” is the most essential element of identity, protecting and respecting the data and the privacy of clients is a natural reflex. And putting the banking question aside for a moment, we should continue to educate aggressively about the fact that services never come for free. As long as consumers are well informed that they pay for their convenience with their data, there is no fundamental concern. But because there is still no real “paid” economy surfacing, the consumer does not really have a choice between “pay-for-service” or “give-data-for-service”. I believe that the market potential for paid services, that guarantee non-exploitation of personal data, is quietly growing. And when it finally appears, consumers will start making choices. Personally, I admit to having moved from being on all possible digital channels and tools, towards being much more selective. And I must admit that digital life with a privacy aware mind is still possible and still fun.
PJT It seems to me that a key capability of a CDO is as an influencer. Influence can take many shapes, from being an acknowledged expert in an area, to the softer skills of being someone that others can talk to openly. Do you agree about this observation? If so, how do you seek to be an influencer?
JC It’s a thin line to walk and it depends on the type of CDO that you are and the mandate that you have. If you have a mandate to do “governance only”, then you should have the confidence of delivering on your mandate, just like a CRO or a CFO does. For that I always revert to the phrase: “we agreed that data is a valuable asset, just like money or people or buildings, … so let’s then act like it.” If you have a mandate to “change”, to “create value”, then you have to be an integrator and influencer because you can never change an organisation and its people on your own.
PJT Before letting you go, a quick personal question. I know you spent some time at the University of Cambridge. I lived in this town while my wife was working on her PhD. Like Cambridge, Leuven [5] is a historic town just outside of a major capital city. What parallels do you see between the two and what did you think of the locals?
JC Cambridge is famous for its “punts”, Leuven for its Stella Artois “pints”. And both central churches (or chapels) are home to iconic paintings by Flemish masters, Rubens in Cambridge and Bouts in Leuven. Visit both!
PJT Jo, thank you so much for talking to me and giving readers the benefit of your ideas and experience.

Jo Coutuer can be reached via his LinkedIn profile.


Disclosure: At the time of publication, neither peterjamesthomas.com Ltd. nor any of its Directors had any shared commercial interests with Jo Coutuer, BNP Paribas Fortis or any entities associated with either of these.


If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.

 
Notes

 
[1]
 
Payment Services Directive 2.
 
[2]
 
Prince Rogers Nelson.
 
[3]
 
Two recent examples include:

 
[4]
 
General Data Protection Regulation.
 
[5]
 
Leuven.


 

Offence, Defence and the Top Data Job

Offence and Defence - 2018 World Cup

Football [1] has been in the news rather a lot of late; apparently there is some competition or other going on in Russia [2]. Presumably it was this that brought to my mind the analogy sometimes applied to the data arena of offence and defence [3]. Defence brings to mind Data Governance, Master Data Management and Data Quality. Offence suggests Data Science, Machine Learning and Analytics. This is an analogy I have briefly touched on in these pages before [4]; here I want to expand on it.

Rather than Association Football, it was however the American version that first crossed my mind. In Gridiron, there are of course wholly separate teams for each of offence, defence, kicking and receiving, each filled with specialists. I would be happy to learn from readers about any counterexamples, but I struggle to think of any other sport that is like this [5]. In each of Association Football, both types of Rugby, Australian Rules Football and indeed Basketball, Baseball (see previous note [5]), Volleyball, Hockey, Ice Hockey, Lacrosse, Polo, Water Polo and Handball, the same players form both the offence and defence. Of course this is probably due to them being a bit less stop-start than American Football; offence can turn into defence in a split-second in some of them.

To stick with Football (I’m going to drop “Association” from here on in), while players may be designated as goalkeepers, defenders, mid-fielders, wingers and attackers (strikers), any player may be called on to defend or attack at any time [6]. Star strikers may need to make desperate tackles. Defenders (who tend to be taller) will be called up to try to turn corner kicks into goals. Even at the most basic level, the ball needs to be transferred from one end of the field to the other, which requires (absent the Goalkeeper simply taking what is known as route one – i.e. kicking it as far as they can towards the other goal) several players to pass the ball, control it and pass again. The whole team contributes.

I have written before about the nomenclature maze that often surrounds the Top Data Job [7] (see Further Reading at the end of the article). In some organisations the offence and defence aspects of the data arena are separate, in the sense that both are headed by someone who then reports into a non-data-specialist. For example a Chief Data Officer and a Chief Analytics Officer might both report to a Chief Operating Officer. This feels a bit like the American Football approach; separate teams to do separate things. I’m probably stretching the metaphor [8], but a problem that occurs to me is that – in business – the data offence and data defence teams will need to be on the field of play at the same time. Aren’t they going to get in each other’s way and end up duplicating activities? At the very least, they are going to need some robust rules about who does what and for these to be made very clear to the players. Also, ultimately, while both offence and defence teams in Gridiron will have their own coaches, these will report to a Head Coach; someone who presumably knows just a bit about American Football. I can’t think of any instances where an NFL team has no Head Coach and instead the next tier of staff all report to the owner.

Of course having multiple senior data roles reporting into different parts of the Executive may be fine and many organisations operate this way. However, again coming back to my sporting analogy, I prefer the approach adopted by Football, Rugby, Basketball and the rest. I like the idea of a single, cohesive Data Function, led by someone who is a data specialist, no matter what their job title might be. In most sports what seems to work well is a team in which people have roles, but in which there is cross-over and a need to just get things done. I think this works for people involved in data work as well.

You wouldn’t have the Head of Tax and the Head of Financial Reporting both reporting to the CEO, that’s what CFOs are for (among other things). It should be the same in the data arena with the Top Data Job being just that, the one person ultimately accountable for both the control and leverage of data. I have made no secret of my opinion that this is the optimum approach. I think my view is supported by the overwhelming number of sports where offence and defence are functions of the same, cohesive team.
 


Further reading on this subject:


 
Notes

 
[1]
 
Association of course.
 
[2]
 
My winter team sport was always Rugby Football, of the Union variety. But – as is evident from quite a few articles on this site – for many years my spare time was mostly occupied by rock climbing and bouldering.

The day after England’s defeat at the hands of Croatia, the Polish guy I regularly buy my skinny flat white from offered his commiserations about yesterday. I was at a loss as to what he had done to me yesterday and he had to explain that he was referring to the World Cup. Not all Brits are Football fanatics.

 
[3]
 
Offense and defense for my wife and any other Americans reading.
 
[4]
 
This was as part of Alphabet Soup.
 
[5]
 
The only thing I could think of that was even in the same ballpark (pun intended) was the use of a designated hitter in some baseball leagues. Even then, the majority of the team have to field as well as bat.
 
[6]
 
There are indeed examples of Goalkeepers, the quintessential defensive player, scoring in International Football.
 
[7]
 
With acknowledgement to Peter Aiken.
 
[8]
 
For neither the first time, nor the last: e.g. A bad workman blames his [Business Intelligence] tools and Analogies.

 



 

How to Spot a Flawed Data Strategy

Data Strategy Alarm Bell

I was recently preparing for a data-centric interview to be published as a podcast [watch this space]. A chat with the interviewer had prompted me to think about the question of how you can tell that there are issues with your Data Strategy. During the actual interview, we had so many things to talk about that we never got to this question. I thought that it was interesting enough to merit a mini-article, which is the genesis of this piece.


 
I have often had my services retained by organisations to develop a Data Strategy from scratch [1]. However, I have also gone into organisations who have an established Data Strategy, but are concerned about whether it is the right one and how it is being executed. In this latter case, my thought processes include the following.

The initial question to consider is, “are there any obvious alarm bells ringing?” Some alarm bells are ones that would apply to any strategy.

First of all, you need to be clear which problem you are addressing or which opportunity you want to seize (sometimes both). I have been brought into organisations where the Data Strategy consists of something like “build a Data Lake”. While I have nothing against data lakes myself, and indeed have helped to create them, the obvious question is “why does this organisation need a Data Lake?” If the answer is not something core to the operations of the organisation, it may well not need one.

Next, implementing a technology is not a strategy. The data arena is unfortunately plagued by technology fan-boyism [2]. The latest and greatest visualisation tool is not going to sort out your data quality problems all by itself. Moving your back-end data platform from Oracle to Hadoop is not going to suddenly increase adoption of Analytics. All of these technologies have valuable parts to play, but the important thing to remember is that a Data Strategy must first and foremost be a business strategy. As such it must do at least one of: increase sales, optimise pricing, decrease costs, reduce risks or open new markets. A Data Lake will not in and of itself do any of these; what you choose to do with it may well contribute to many of these areas.

A further consideration is “what else is going on in the organisation?” This is important both in a business and a technological sense. If the organisation has just acquired another one, does the Data Strategy reflect this? If there is an ongoing Digital Transformation programme, then how does the Data Strategy align itself with this; is it an enabler, a Digital programme work-stream, or a stand-alone programme? In the same vein, it may well make sense to initially design the Data Strategy along purist lines (failing to do so at least initially may be a missed opportunity for radical change [3]), however there will then need to be an adjustment to take into account what else is going on in the organisation, its current situation and its culture.

Having introduced the word “culture”, the final observation is in this area. If the Data Strategy does not envisage impacting corporate culture (e.g. to shift it to focus more on the importance and potential value of data), then one must ask what are its chances of achieving anything tangible? All organisations are comprised of individuals and the best strategies both take this into account and were developed as a result of spending time thinking how best to influence people’s behaviour in a positive manner [4]. Absence of cultural and education / communication elements from a Data Strategy is more a 200 decibel klaxon than a regular alarm bell.


 
Given I am generally brought in when organisations want to address a data problem or seize a data opportunity, I have to admit that I probably have a biassed set of experiences. Nevertheless one or more of the above issues has been present whenever I have started to examine an existing Data Strategy. In the (for me) hypothetical case where things are in better shape, then the next steps in evaluating a Data Strategy would be to get into the details of each of: the Data Strategy itself; the organisation and what makes it tick; and the people and personalities involved. However, if a Data Strategy does not suffer from any of the above flaws, it is already more sound than the majority of Data Strategies and the people who drew it up are to be congratulated.


 
If you would like help with your existing Data Strategy, or to kick-off the process of developing one from scratch, then please feel free to get in contact.
 


Further reading on this subject:


 
Notes

 
[1]
 
A matrix of the data-centric (and other) areas I have been accountable for at various organisations appears here. Just scroll down to Data Strategy, which is the second row in the Data-centric Work section.
 
[2]
 
And fan-girlism, though this seems to be less of a thing TBH.
 
[3]
 
See:

 
[4]
 
I cover the cultural aspects of Data-centric work in many places on this site, perhaps start with 20 Risks that Beset Data Programmes and Ever tried? Ever failed?, both of which also link back to my earlier (and still relevant) writing on this subject.

 



 

An in-depth interview with CDO Caroline Carruthers



Part of the In-depth series of interviews

PJT Today I am talking to Caroline Carruthers, experienced data professional and famous as co-author (with Peter Jackson) of The Chief Data Officer’s Playbook. Caroline is currently Group Director of Data Management at Lowell Group. I am very pleased that she has found the time to talk to me about some of her experiences and ideas about the data space.
PJT Caroline, I mentioned your experience in the data field, can you paint a picture of this for readers?
CC Hi Peter, of course. I often describe myself as a data cheerleader or data evangelist. I love all the incredible technologies that are coming around such as AI. However, the foundation we have to build these on is a data one. Without that solid data foundation we are just building houses of cards. My experience started off in IT as a graduate for the TSB, moving into consulting for IBM and then ATOS. I quickly recognised that whilst I love technology (I will always be a geek!) the root cause of a lot of the issues we are facing came down to data and our treatment of it. Whether that meant we didn’t understand the risks or the value associated with it, these are just different sides of the same pendulum. So my career has been a bit eclectic through CTO and Programme Director roles but the focus for me has always been on treating data as a valuable asset.
PJT The Chief Data Officer’s Playbook has been very well-received. Equally I can imagine that it was a lot of work to pull this together with Peter. Can you tell me a bit about what motivated you to write this book?
CC The book came about as Peter and I were presenting at a conference in London and we both gave the same answer to a question about the role of a CDO; there was no manual or rule book, it was an evolving role and, until we did have something that clarified what it was, we would struggle. Very luckily for me Peter came up with the idea of writing it together. We never pretended we had all the answers, it was a way of getting our experiences down on paper so we (the data community) could have a starting point to professionalise what we all do. We both love being part of the data community and feel really passionate about helping everyone understand it a little better.
PJT As an aside, what was the experience of co-authoring like? What do you feel this approach brought to the book and were there any challenges?
CC It was a gift, writing with Peter. We’ve both been honest with each other and said that if either of us had tried to do it on their own we probably wouldn’t have finished it. We both have different and complementary strengths so we just made sure to use that when we wrote the book. Having an idea of what we wanted it to look like from the beginning helped massively and having two of us meant that when one of us had had enough the other one brought them back round. The challenges were more around time together than anything else, we both were and are full time CDOs so this was holidays and weekends. Luckily for us we didn’t know what we didn’t know; on the day of the book launch was when our editor told us it wasn’t normal to write a book as fast as we did!
PJT There is a lot of very sound and practical advice contained in The Chief Data Officer’s Playbook, is there any particular section, or a particular theme that is close to your heart, or which you feel is central to driving success in the data arena?
CC For me personally it’s the chapter about data hoarding because it came about from a Sunday morning tradition that my son and I have, where we veg in front of the tv and spend a lazy Sunday morning together. The idea is that data hoarders keep all data, which means that organisations become so crammed full of data that they don’t value it anymore. This chapter of the book is about understanding the value of data and treating it accordingly. If we truly understood the value of what we had, people would change their behaviour to look after it better.
PJT I have been speaking to other CDOs about the nature of the role and how – in many ways – this is still ill-defined and emergent [1]. How do you define the scope of the CDO role and do you see this changing in coming years?
CC In the book, we talk about different generations of CDOs, the first being risk focused, the second being value-add focused but by the third generation we will have a clearly defined, professionalised role that is clearly accepted as a key member of the C suite.
PJT I find that something which most successful data leaders have in common is a focus on the people aspects of embracing the opportunities afforded by leveraging data [2]. What are your feelings on this subject?
CC I totally agree with that, I often talk about hearts and minds being the most important aspect of data. You can have the best processes, tools and tech in the world but if you don’t convince people to come out of their comfort zone and try something different you will fail.
PJT What practical advice can you offer to data professionals seeking to up their game in influencing organisations at all levels from the Executive Suite to those engaged in day-to-day activities? How exactly do you go about driving cultural change?
CC Focus on outcomes, keep your head up and be aware of the detail but make sure you are solving problems – just have fun while you do it.
PJT Some CDOs have a focus on the risk and governance agenda, some are more involved in using data to drive growth and open new opportunities, some have blended responsibilities. Where do you sit in this spectrum and where do you feel that CDOs can add greatest value?
CC I’d say I started from the risk-averse side but with a background in tech and strategy, I do love the value-add side of data and think as a CDO you need to understand it all.
PJT The Chief Data Officer’s Playbook is a great resource to help both experienced CDOs and those new to the field. Are there other ways in which data leaders can benefit from the ideas and insights that you and Peter have?
CC Funny you should mention this… On the back of the really great feedback and reception the book got we are running a CDO summer school this summer sponsored by Collibra. We thought it would be an opportunity to engage with people more directly and help form a community that can help and learn from each other.
PJT I also hear that you are working on a sequel to your successful book, can you give readers a sneak preview of what this will be covering?
CC Of course, it’s obviously still about data but is more focused on the transformation an organisation needs to go through in order to get the best from it. It’s due out spring next year so watch this space.
PJT As well as the activities we have covered, I know that you are engaged in some other interesting and important areas. Can you first of all tell me a bit about your work to get children, and in particular girls, involved in Science, Technology, Engineering and Mathematics (STEM)?
CC I would love to. I’m really lucky that I get the chance to talk to girls in school about STEM subjects and to give them an insight into some of the many different careers that might interest them that they may not have been aware of. I don’t remember my careers counsellor at school telling me I could be a CDO one day! There are two key messages that I really try to get across to them. First, I genuinely believe that everyone has a talent, something that excites them and they are good at but if you don’t try different things you may never know what that is. Second, I don’t care if they do go into a STEM subject. What I care passionately about is that they don’t limit themselves based on other people’s preconceptions.
PJT Finally, I know that you are also a trustee of CILIP the information association and are working with them to develop data-specific professional qualifications. Why do you think that this is important?
CC I don’t think that data professionals necessarily get the credit they deserve and it can also be really hard to move into our field without some pretty weighty qualifications. I want to open the subject out so we can have access courses to get into data as well as recognised qualifications to continue to professionalise and value the discipline of data.
PJT Caroline, it has been a pleasure to speak. Thank you for sharing your ideas with us today.

Caroline Carruthers can be reached at caroline.carruthers@carruthersandjackson.com.


Disclosure: At the time of publication, neither peterjamesthomas.com Ltd. nor any of its Directors had any shared commercial interests with Caroline Carruthers or any entities associated with her.


If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.

 
Notes

 
[1]
 
See An in-depth interview with experienced Chief Data Officer Roberto Maranca.
 
[2]
 
See:


 

Building Momentum – How to begin becoming a Data-driven Organisation

Building Momentum - Becoming a Data Driven Organisation


Introduction

It is hard to find an organisation that does not aspire to being data-driven these days. While there is undoubtedly an element of me-tooism about some of these statements (or a fear of competitors / new entrants who may use their data better, gaining a competitive advantage), often there is a clear case for the better leverage of data assets. This may be to do with the stand-alone benefits of such an approach (enhanced understanding of customers, competitors, products / services etc. [1]), or as a keystone supporting a broader digital transformation.

However, in my experience, many organisations have much less mature ideas about how to achieve their data goals than they do about setting them. Given the lack of executive experience in data matters [2], it is not atypical that one of the large strategy consultants is engaged to shape a data strategy; one of the large management consultants is engaged to turn this into something executable and maybe to select some suitable technologies; and one of the large systems integrators (or increasingly off-shore organisations migrating up the food chain) is engaged to do the work, which by this stage normally relates to building technology capabilities, implementing a new architecture or some other technology-focussed programme.

Juggling Third Parties

Even if each of these partners does a great job – which one would hope they do at their price points – a few things invariably get lost along the way. These include:

  1. A data strategy that is closely coupled to the organisation’s actual needs rather than something more general.

    While there are undoubtedly benefits in adopting best practice for an industry, there is also something to be said for a more tailored approach, tied to business imperatives and which may have the possibility to define the new best practice. In some areas of business, it makes sense to take the tried and tested approach, to be a part of the herd. In others – and data is in my opinion one of these – taking a more innovative and distinctive path is more likely to lead to success.
     

  2. Connective tissue between strategy and execution.

    The distinctions between the three types of organisations I cite above are becoming more blurry (not least as each seeks to develop new revenue streams). This can lead to the strategy consultants developing plans, which get ripped up by the management consultants; the management consultants revisiting the initial strategy; the systems integrators / off-shorers replanning, or opening up technical and architecture discussions again. Of course this means the client paying at least twice for this type of work. What also disappears is the type of accountability that comes when the same people are responsible for developing a strategy, turning this into a practical plan and then executing this [3].
     

  3. Focus on the cultural aspects of becoming more data-driven.

    This is both one of the most important factors that determines success or failure [4] and something that – frankly because it is not easy to do – often falls by the wayside. By the time that the third external firm has been on-boarded, the name of the game is generally building something (e.g. a Data Lake, or an analytics platform) rather than the more human questions of who will use this, in what way, to achieve which business objectives.

Of course a way to address the above is to allocate some experienced people (internal or external, ideally probably a blend) who stay the course from development of data strategy through fleshing this out to execution and who – importantly – can also take a lead role in driving the necessary cultural change. It also makes sense to think about engaging organisations who are small enough to tailor their approach to your needs and who will not force a “cookie cutter” approach. I have written extensively about how – with the benefit of such people on board – to run such a data transformation programme [5]. Here I am going to focus on just one phase of such a programme and often the most important one: getting going and building momentum.


 
A Third Way

There are a couple of schools of thought here:

  1. Focus on laying solid data foundations and thus build data capabilities that are robust and will stand the test of time.
     
  2. Focus on delivering something ASAP in the data arena, which will build the case for further investment.

There are points in favour of both approaches and criticisms that can be made of each as well. For example, while the first approach will be necessary at some point (and indeed at a relatively early one) in order to sustain a transformation to a data-driven organisation, it obviously takes time and effort. Exclusive focus on this area can use up money, political capital and try the patience of sponsors. Few business initiatives will be funded for years if they do not begin to have at least some return relatively soon. This remains the case even if the benefits down the line are potentially great.

Equally, the second approach can seem very productive at first, but will generally end up trying to make a silk purse out of a sow’s ear [6]. Inevitably, without improvements to the underlying data landscape, limitations in the type of useful analytics that can be carried out will be reached; sometimes sooner than might be thought. While I don’t generally refer to religious topics on this blog [7], the Parable of the Sower is apposite here. Focussing on delivering analytics without attending to the broader data landscape is indeed like the seed that fell on stony ground. The practice yields results that spring up, only to wilt when the sun gets hot, given that they have no real roots [8].

So what to do? Well, there is a Third Way. This involves blending both approaches. I tend to think of this in the following way:

Proportion of Point and Strategic Data Activities over Time

First of all, this is a cartoon, it is not intended to indicate actual percentages, just to illustrate a general trend. In real life, it is likely that you will cycle round multiple times and indeed have different parallel work-streams at different stages. The general points I am trying to convey with this diagram are:

  1. At the beginning of a data transformation programme, there should probably be more emphasis on interim delivery and tactical changes. However, importantly, there is never zero strategic work. As things progress, the emphasis should swing more to strategic, long-term work. But again, even in a mature programme, there is never zero tactical work. There can also of course be several iterations of such shifts in approach.
     
  2. Interim and tactical steps should relate to not just analytics, but also to making point fixes to the data landscape where possible. It is also important to kick off diagnostic work, which will establish how bad things are and also suggest areas which could be attacked sooner rather than later; this too can initially be done on a tactical basis and then made more robust later. In general, if you consider the span of strategic data work, it makes sense to kick off cut-down (and maybe drastically cut-down) versions of many activities early on.
     
  3. Importantly, the tactical and strategic work-streams should not be hermetically sealed. What you actually want is healthy interplay. Building some early, “quick and dirty” analytics may highlight areas that should be covered by a data audit, or where there are obvious weaknesses in a data architecture. Any data assets that are built on a more strategic basis should also be leveraged by tactical work, improving its utility and probably increasing its lifespan.

 
Interconnected Activities

At the beginning of this article, I present a diagram (repeated below) which covers three types of initial data activities, the sort of work that – if executed competently – can begin to generate momentum for a data programme. The exhibit also references Data Strategy.

Building Momentum - Becoming a Data Driven Organisation


Let’s look at each of these four things in some more detail:

  1. Analytic Point Solutions

    Where data has historically been locked up in either hard-to-use repositories or in source systems themselves, liberating even a bit of it can be very helpful. This does not have to be with snazzy tools (unless you want to showcase the art of the possible). An anecdote might help to explain.

    At one organisation, they had existing reporting that was actually not horrendous, but it was hard to access, hard to parameterise and hard to do follow-on analysis on. I took it upon myself to run 30 plus reports on a weekly and monthly basis, download the contents to Excel, front these with some basic graphs and make these all available on an intranet. This meant that people from Country A or Department B could go straight to their figures rather than having to run fiddly reports. It also meant that they had an immediate visual overview – including some comparisons to prior periods and trends over time (which were not available in the original reports). Importantly, they also got a basic pivot table, which they could use to further examine what was going on. These simple steps (if a bit laborious for me) had a massive impact. I later replaced the Excel with pages I wrote in a new web-reporting tool we built in house. Ultimately, my team moved these to our strategic Analytics platform.

    This shows how point solutions can be very valuable and also morph into more strategic facilities over time.
     

  2. Data Process Improvements

    Data issues may be to do with a range of problems from poor validation in systems, to bad data integration, but immature data processes and insufficient education for data entry staff are often key contributors to overall problems. Identifying such issues and quantifying their impact should be the province of a Data Audit, which is something I would recommend considering early on in a data programme. Once more this can be basic at first, considering just superficial issues, and then expand over time.

    While fixing some data process problems and making a stepped change in data quality will both probably take time and effort, it may be possible to identify and target some narrower areas in which progress can be made quite quickly. It may be that one key attribute necessary for analysis is poorly entered and validated. Some good communications around this problem can help, better guidance for people entering it is also useful and some “quick and dirty” reporting highlighting problems and – hopefully – tracking improvement can make a difference quicker than you might expect [9].
     

  3. Data Architecture Enhancements

    Improving a Data Architecture sounds like a multi-year task and indeed it can often be just that. However, it may be that there are some areas where judicious application of limited resource and funds can make a difference early on. A team engaged in a data programme should seek out such opportunities and expect to devote time and attention to them in parallel with other work. Architectural improvements would be best coordinated with data process improvements where feasible.

    An example might be providing a web-based tool to look up valid codes for entry into a system. Of course it would be a lot better to embed this functionality in the system itself, but it may take many months to include this in a change schedule whereas the tool could be made available quickly. I have had some success with extending such a tool to allow users to build their own hierarchies, which can then be reflected in either point analytics solutions or more strategic offerings. It may be possible to later offer the tool’s functionality via web-services allowing it to be integrated into more than one system.
     

  4. Data Strategy

    I have written extensively about Data Strategy on this site [10]. What I wanted to cover here is the interplay between Data Strategy and some of the other areas I have just covered. It might be thought that Data Strategy is both carved on tablets of stone [11] and stands in splendid and theoretical isolation, but this should not ever be the case. The development of a Data Strategy should of course be informed by a situational analysis and a vision of “what good looks like” for an organisation. However, both of these things can be shaped by early tactical work. Taking cues from initial tactical work should lead to a more pragmatic strategy, more aligned to business realities.

    Work in each of the three areas itemised above can play an important role in shaping a Data Strategy and – as the Data Strategy matures – it can obviously guide interim work as well. This should be an iterative process with lots of feedback.
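
To make the “quick and dirty” data quality reporting mentioned under Data Process Improvements a little more concrete, here is a minimal Python sketch. It is an illustration only: the attribute name (region), the list of valid codes, the record layout and the sample figures are all made up for the purpose, and a real report would read from the relevant source system rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical list of valid codes for one key attribute (an assumption).
VALID_REGION_CODES = {"UK", "FR", "DE"}


def invalid_rate_by_period(records, valid_codes=VALID_REGION_CODES):
    """Return {period: share of records whose 'region' is missing or invalid}."""
    totals = defaultdict(int)
    invalid = defaultdict(int)
    for rec in records:
        period = rec["period"]
        totals[period] += 1
        # Treat None and blank strings as missing; compare case-insensitively,
        # so "fr" counts as a valid entry of "FR".
        value = (rec.get("region") or "").strip().upper()
        if value not in valid_codes:
            invalid[period] += 1
    # Sort by period so that the trend over time is easy to read.
    return {p: invalid[p] / totals[p] for p in sorted(totals)}


# Made-up sample data, not real figures.
sample = [
    {"period": "2018-01", "region": "UK"},
    {"period": "2018-01", "region": ""},
    {"period": "2018-01", "region": "XX"},
    {"period": "2018-01", "region": "fr"},
    {"period": "2018-02", "region": "DE"},
    {"period": "2018-02", "region": None},
    {"period": "2018-02", "region": "UK"},
    {"period": "2018-02", "region": "FR"},
]

for period, rate in invalid_rate_by_period(sample).items():
    print(f"{period}: {rate:.0%} of entries missing or invalid")
```

Publishing a figure like this period by period is the tactical version of the idea; the same measure could later be rebuilt on a more strategic platform without changing what people are asked to improve.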


 
Closing Thoughts

I have captured the essence of these thoughts in the diagram above. The important things to take away are that in order to generate momentum, you need to start to do some stuff; to extend the physical metaphor, you have to start pushing. However, momentum is a vector quantity (it has a direction as well as a magnitude [12]) and building momentum is not a lot of use unless it is in the general direction in which you want to move; so push with some care and judgement. It is also useful to realise that – so long as your broad direction is OK – you can make refinements to your direction as you pick up speed.

The above thoughts are based on my experience in a range of organisations and I am confident that they can be applied anywhere, making allowance for local cultures of course. Once momentum is established, it still needs to be maintained (or indeed increased), but I find that getting the ball moving in the first place often presents the greatest challenge. My hope is that the framework I present here can help data practitioners to get over this initial hurdle and begin to really make a difference in their organisations.
 


Further reading on this subject:


 
Notes

 
[1]
 
Way back in 2009, I wrote about the benefits of leveraging data to provide enhanced information. The article in question was titled Measuring the benefits of Business Intelligence. Everything I mention remains valid today in 2018.
 
[2]
 
See also:

 
[3]
 
If I may be allowed to blow my own trumpet for a moment, I have developed data / information strategies for eight organisations, turned seven of these into a costed / planned programme and executed at least the first few phases of six of these. I have always found being a consistent presence through these phases has been beneficial to the organisations I was helping, as well as helping to reduce duplication of work.
 
[4]
 
See my, now rather venerable, trilogy about cultural change in data / information programmes:

  1. Marketing Change
  2. Education and cultural transformation and
  3. Sustaining Cultural Change

together with the rather more recent:

  1. 20 Risks that Beset Data Programmes and
  2. Ever tried? Ever failed?
 
[5]
 
See for example:

  1. Draining the Swamp
  2. Bumps in the Road and
  3. Ideas for avoiding Big Data failures and for dealing with them if they happen
 
[6]
 
Dictionary.com offers a nice explanation of this phrase.
 
[7]
 
I was raised a Catholic, but have been areligious for many years.
 
[8]
 
Much like x^2+x+1=0.

For anyone interested, the two roots of this polynomial are clearly:

-\dfrac{1}{2}+\dfrac{\sqrt{3}}{2}\hspace{1mm}i\hspace{5mm}\text{and}\hspace{5mm}-\dfrac{1}{2}-\dfrac{\sqrt{3}}{2}\hspace{1mm}i

neither of which is Real.
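
For readers who would like the intermediate step, the roots follow from the standard quadratic formula with all three coefficients equal to 1; it is the negative value under the square root that rules out Real solutions:

```latex
x=\dfrac{-1\pm\sqrt{1^2-4\cdot 1\cdot 1}}{2\cdot 1}=\dfrac{-1\pm\sqrt{-3}}{2}=-\dfrac{1}{2}\pm\dfrac{\sqrt{3}}{2}\hspace{1mm}i
```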

 
[9]
 
See my rather venerable article, Using BI to drive improvements in data quality, for a fuller treatment of this area.
 
[10]
 
For starters see:

  1. Forming an Information Strategy: Part I – General Strategy
  2. Forming an Information Strategy: Part II – Situational Analysis
  3. Forming an Information Strategy: Part III – Completing the Strategy

and also the Data Strategy segment of The Anatomy of a Data Function – Part I.

 
[11]
 
Tablet of Stone
 
[12]
 
See Glimpses of Symmetry, Chapter 15 – It’s Space Jim….

 

