Version 2 of The Anatomy of a Data Function

Between November and December 2017, I published the three parts of my Anatomy of a Data Function. These were cunningly called Part I, Part II and Part III. Eight months is a long time in the data arena and I have now issued an update.

The Anatomy of a Data Function


The changes in Version 2 are confined to the above organogram and Part I of the text. They consist of the following:

  1. Split Artificial Intelligence out of Data Science in order to better reflect the ascendancy of this area (and also its use outside of Data Science).
     
  2. Change Data Science to Data Science / Engineering in order to better reflect the continuing evolution of this area.

My aim will be to keep this trilogy up-to-date as best practice Data Functions change their shapes and contents.


 
 


From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

Fact-based Decision-making

All we need is fact-based decision-making ma'am
This article is about facts. Facts are sometimes less solid than we would like to think; sometimes they are downright malleable. To illustrate, consider the fact that in 98 episodes of Dragnet, Sergeant Joe Friday never uttered the words “Just the facts Ma’am”, though he did often employ the variant alluded to in the image above [1]. Equally, Rick never said “Play it again Sam” in Casablanca [2] and St. Paul never suggested that “money is the root of all evil” [3]. As Michael Caine never said in any film, “not a lot of people know that” [4].

 
Up-front Acknowledgements

These normally appear at the end of an article, but it seemed to make sense to start with them in this case:

Recently I published Building Momentum – How to begin becoming a Data-driven Organisation. In response to this, one of my associates, Olaf Penne, asked me about my thoughts on fact-based decision-making. This piece was prompted by both Olaf’s question and a recent article by my friend Neil Raden on his Silicon Angle blog, Performance management: Can you really manage what you measure? Thanks to both Olaf and Neil for the inspiration.

Fact-based decision-making. It sounds good, doesn’t it? Especially if you consider the alternatives: going on gut feel, doing what you did last time, guessing, not taking a decision at all. However – as is often the case with issues I deal with on this blog – fact-based decision-making is easier to say than it is to achieve. Here I will look to cover some of the obstacles and suggest a potential way to navigate round them. Let’s start, however, with some definitions.

Fact NOUN A thing that is known or proved to be true.
(Oxford Dictionaries)
Decision NOUN A conclusion or resolution reached after consideration.
(Oxford Dictionaries)

So one can infer that fact-based decision-making is the process of reaching a conclusion based on consideration of things that are known to be true. Again, it sounds great doesn’t it? It seems that all you have to do is to find things that are true. How hard can that be? Well actually quite hard as it happens. Let’s cover what can go wrong (note: this section is not intended to be exhaustive, links are provided to more in-depth articles where appropriate):


 
Accuracy of Data that is captured

Data Accuracy

A number of factors can play into the accuracy of data capture. Some systems (even in 2018) can still make it harder to capture good data than to ram in bad. Often an issue may also be a lack of master data definitions, so that similar data is labelled differently in different systems.

A more pernicious problem is combinatorial data accuracy: two data items are each valid, but not in combination with each other. However, often the biggest stumbling block is a human one: getting people to buy in to the idea that the care and attention they pay to data capture will pay dividends later in the process.
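
To make the combinatorial point concrete, here is a minimal sketch in Python. The record type, its field names and the rule applied are all hypothetical, chosen purely for illustration; real systems will have their own cross-field rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PolicyRecord:
    """Hypothetical record; the field names are illustrative assumptions."""
    inception: date  # date on which cover starts
    expiry: date     # date on which cover ends


def combination_errors(record: PolicyRecord) -> list[str]:
    """Return problems that only show up when fields are compared."""
    errors = []
    if record.expiry <= record.inception:
        errors.append("expiry is not after inception")
    return errors


# Both dates are perfectly valid on their own, but not as a pair.
suspect = PolicyRecord(inception=date(2018, 6, 1), expiry=date(2018, 1, 1))
sound = PolicyRecord(inception=date(2018, 1, 1), expiry=date(2018, 12, 31))

print(combination_errors(suspect))  # one error reported
print(combination_errors(sound))    # empty list
```

A check of each field in isolation would pass both records; only the comparison between fields exposes the first one.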

These and other areas are covered in greater detail in an older article, Using BI to drive improvements in data quality.
 
 
Honesty of Data that is captured

Honesty of Data

Data may be perfectly valid, but still not represent reality. Here I’ll let Neil Raden point out the central issue in his customary style:

People find the most ingenious ways to distort measurement systems to generate the numbers that are desired, not only NOT providing the desired behaviors, but often becoming more dysfunctional through the effort.

[…] voluntary compliance to the [US] tax code encourages a national obsession with “loopholes”, and what salesman hasn’t “sandbagged” a few deals for next quarter after she has met her quota for the current one?

Where there is a reward to be gained or a punishment to be avoided, by hitting certain numbers in a certain way, the creativeness of humans often comes to the fore. It is hard to account for such tweaking in measurement systems.
 
 
Timing issues with Data

Timing Issues

Timing is often problematic. For example, a transaction completed near the end of a period gets recorded in the next period instead, or one completed early in a new period goes into the prior period, which is still open. There is also (as referenced by Neil in his comments above) the delayed booking of transactions in order to – with the nicest possible description – smooth revenues. It is not just hypothetical salespeople who do this of course. Entire organisations can make smoothing adjustments to their figures before publishing, and deferral or expedition of obligations and earnings has become something of an art form in accounting circles. While no doubt most of this tweaking is done with the best intentions, it can compromise the fact-based approach that we are aiming for.
 
 
Reliability with which Data is moved around and consolidated

Data Transcription

In our modern architectures, replete with web-services, APIs, cloud-based components and the quasi-instantaneous transmission of new transactions, it is perhaps not surprising that occasionally some data gets lost in translation [5] along the way. That is before data starts to be Sqooped up into Data Lakes, or other such Data Repositories, and then otherwise manipulated in order to derive insight or provide regular information. All of these are processes which can introduce their own errors. Suffice it to say that transmission, collation and manipulation of data can all reduce its accuracy.

Again see Using BI to drive improvements in data quality for further details.
 
 
Pertinence and fidelity of metrics developed from Data

Data Metric

Here we get past issues with data itself (or how it is handled and moved around) and instead consider how it is used. Metrics are seldom reliant on just one data element, but are often rather combinations. The different elements might come in because a given metric is arithmetical in nature, e.g.

\text{Metric X} = \dfrac{\text{Data Item A}+\text{Data Item B}}{\text{Data Item C}}

Choices are made as to how to construct such compound metrics and how to relate them to actual business outcomes. For example:

\text{New Biz Growth} = \dfrac{(\text{Sales CYTD}-\text{Repeat CYTD})-(\text{Sales PYTD}-\text{Repeat PYTD})}{(\text{Sales PYTD}-\text{Repeat PYTD})}

Is this a good way to define New Business Growth? Are there any weaknesses in this definition, for example is it sensitive to any glitches in – say – the tagging of Repeat Business? Do we need to take account of pricing changes between Repeat Business this year and last year? Is New Business Growth something that is even worth tracking; what will we do as a result of understanding this?
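
One way to probe the question about tagging is to compute the metric twice, once with clean inputs and once with some Repeat Business mis-tagged. The sketch below does this in Python; the function names and all of the figures are made up for illustration.

```python
def new_business(sales: float, repeat: float) -> float:
    """New business is taken to be total sales less repeat business."""
    return sales - repeat


def new_biz_growth(sales_cytd: float, repeat_cytd: float,
                   sales_pytd: float, repeat_pytd: float) -> float:
    """New Biz Growth as defined above: change in new business over prior year."""
    current = new_business(sales_cytd, repeat_cytd)
    prior = new_business(sales_pytd, repeat_pytd)
    if prior == 0:
        raise ValueError("prior-year new business is zero; growth is undefined")
    return (current - prior) / prior


# Illustrative figures only (any currency unit).
baseline = new_biz_growth(sales_cytd=1200.0, repeat_cytd=700.0,
                          sales_pytd=1000.0, repeat_pytd=600.0)

# Same sales, but 50 of this year's repeat business was not tagged as repeat.
mistagged = new_biz_growth(sales_cytd=1200.0, repeat_cytd=650.0,
                           sales_pytd=1000.0, repeat_pytd=600.0)

print(f"{baseline:.1%}")   # 25.0%
print(f"{mistagged:.1%}")  # 37.5%
```

In this made-up case, a tagging error of around 7% of Repeat Business moves the reported growth from 25.0% to 37.5%, which suggests the definition is indeed sensitive to such glitches.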

The above is a somewhat simple metric. In a section of Using historical data to justify BI investments – Part I, I cover some actual Insurance industry metrics that build on each other and are a little more convoluted. The same article also considers how to – amongst other things – match revenue and outgoings when the latter are spread over time. There are often compromises to be made in defining metrics. Some of these are based on the data available. Some relate to inherent issues with what is being measured. In other cases, a metric may be a best approximation to some indication of business health; a proxy used because that indication is not directly measurable itself. In the last case, staff turnover may be a proxy for staff morale, but it does not directly measure how employees are feeling (a competitor might be poaching otherwise happy staff for example).
 
 
Robustness of extrapolations made from Data

By the third trimester, there will be hundreds of babies inside you...

© Randall Munroe, xkcd.com

I have used the above image before in these pages [6]. The situation it describes may seem farcical, but it is actually not too far away from some extrapolations I have seen in a business context. For example, a prediction of full-year sales may consist of this year’s figures for the first three quarters supplemented by prior year sales for the final quarter. While our metric may be better than nothing, there are some potential distortions related to such an approach:

  1. Repeat business may have fallen into Q4 last year, but was processed in Q3 this year. This shift in timing would lead to such business being double-counted in our year end estimate.
     
  2. Taking point 1 to one side, sales may be growing or contracting compared to the previous year. Using Q4 prior year as is would not reflect this.
     
  3. It is entirely feasible that some market event occurs this year (for example the entrance or exit of a competitor, or the launch of a new competitor product) which would render prior year figures a poor guide.

Of course all of the above can be adjusted for, but such adjustments would be reliant on human judgement, making any projections similarly reliant on people’s opinions (which as Neil points out may be influenced, consciously or unconsciously, by self-interest). Where sales are based on conversions of prospects, the quantum of prospects might be a more useful predictor of Q4 sales. However, here a historical conversion rate would need to be calculated (or conversion probabilities allocated by the salespeople involved) and we are back into essentially the same issues as catalogued above.
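
The gap between the simple estimate and an adjusted one can be sketched as follows. The growth adjustment used here (scaling prior-year Q4 by the year-to-date growth ratio) is just one possible assumption, not a recommended method, and the figures are invented.

```python
def naive_full_year(current_q1_to_q3: list[float], prior_q4: float) -> float:
    """This year's first three quarters plus last year's Q4, used as is."""
    return sum(current_q1_to_q3) + prior_q4


def growth_adjusted_full_year(current_q1_to_q3: list[float],
                              prior_q1_to_q3: list[float],
                              prior_q4: float) -> float:
    """Scale last year's Q4 by year-to-date growth (an assumed adjustment)."""
    prior_ytd = sum(prior_q1_to_q3)
    if prior_ytd == 0:
        raise ValueError("prior-year Q1-Q3 sales are zero; cannot scale")
    growth_ratio = sum(current_q1_to_q3) / prior_ytd
    return sum(current_q1_to_q3) + prior_q4 * growth_ratio


# Illustrative figures only: sales running 25% ahead of the prior year.
prior_quarters = [100.0, 100.0, 100.0]
prior_q4 = 100.0
current_quarters = [125.0, 125.0, 125.0]

naive = naive_full_year(current_quarters, prior_q4)
adjusted = growth_adjusted_full_year(current_quarters, prior_quarters, prior_q4)

print(f"naive: {naive:.1f}")        # naive: 475.0
print(f"adjusted: {adjusted:.1f}")  # adjusted: 500.0
```

Neither figure is "the" answer; the choice of adjustment is itself a judgement, which is precisely the point made above.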

I explore some similar themes in a section of Data Visualisation – A Scientific Treatment.
 
 
Integrity of statistical estimates based on Data

Statistical Data

Having spent 18 years working in various parts of the Insurance industry, I am pretty familiar with statistical estimates forming part of the standard set of metrics [7]. However, such estimates appear in a number of industries, sometimes explicitly, sometimes implicitly. A clear parallel would be credit risk in Retail Banking, but something as simple as an estimate of potentially delinquent debtors is an inherently statistical figure (albeit one that may not depend on the output of a statistical model).

The thing with statistical estimates is that they are never a single figure but a range. A model may for example spit out a figure like £12.4 million ± £0.5 million. Let’s unpack this.

Example distribution

Well, the output of the model will probably be something analogous to the above image. Here a distribution has been fitted to the business event being modelled. The central point of this (the one most likely to occur according to the model) is £12.4 million. The model is not saying that £12.4 million is the answer; it is saying it is the central point of a range of potential figures. We typically next select a symmetrical range above and below the central figure such that we cover a high proportion of the possible outcomes for the figure being modelled; 95% of them is typical [8]. In the above example, the range extends £0.5 million above £12.4 million and £0.5 million below it (hence the ± sign).

Of course the problem is then that Financial Reports (or indeed most Management Reports) are not set up to cope with plus or minus figures, so typically one of £12.4 million (the central prediction) or £11.9 million (the most conservative estimate [9]) is used. The fact that the number itself is uncertain can get lost along the way. By the time that people who need to take decisions based on such information are in the loop, the inherent uncertainty of the prediction may have disappeared. This can be problematic. Suppose a real result of £12.4 million sees an organisation breaking even, but one of £11.9 million sees a small loss being recorded. This could have quite an influence on what course of action managers adopt [10]; are they relaxed, or concerned?
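
To show how much is lost when the range is collapsed to a single figure, here is a small sketch using Python’s standard library. It assumes, purely for illustration, that the fitted distribution is Normal and that the ± £0.5 million range covers 95% of outcomes; a real model may well use a different distribution.

```python
from statistics import NormalDist

central = 12.4     # £ million, the model's central figure
half_width = 0.5   # £ million, the ± range quoted above
coverage = 0.95    # assumed proportion of outcomes inside the range

# For a Normal distribution, 95% coverage is about 1.96 standard deviations
# either side of the centre, so the standard deviation can be backed out.
z = NormalDist().inv_cdf(0.5 + coverage / 2)
sigma = half_width / z
model = NormalDist(mu=central, sigma=sigma)

lower = central - half_width
upper = central + half_width

p_in_range = model.cdf(upper) - model.cdf(lower)  # should recover the coverage
p_below_central = model.cdf(central)              # chance of missing £12.4m
p_below_lower = model.cdf(lower)                  # chance of missing £11.9m

print(f"P(within range)     = {p_in_range:.3f}")
print(f"P(below £12.4m)     = {p_below_central:.3f}")
print(f"P(below £11.9m)     = {p_below_lower:.3f}")
```

Under these assumptions, if £12.4 million is the break-even point then the model is saying there is an even chance of falling short of it, something a report showing only the central figure would not convey.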

Beyond the above, it is not exactly unheard of for statistical models to have glitches, sometimes quite big glitches [11].

This segment could easily expand into a series of articles itself. Hopefully I have covered enough to highlight that there may be some challenges in this area.
 
 
And so what?

The dashboard has been updated, how thrilling...

Even if we somehow avoid all of the above pitfalls, there remains one booby-trap that is likely to snare us, absent the necessary diligence. This was alluded to in the section about the definition of metrics:

Is New Business Growth something that is even worth tracking; what will we do as a result of understanding this?

Unless a reported figure, or output of a model, leads to action being taken, it is essentially useless. Facts that never lead to anyone doing anything are like lists learnt by rote at school and regurgitated on demand parrot-fashion; they demonstrate the mechanism of memory, but not that of understanding. As Neil puts it in his article:

[…] technology is never a solution to social problems, and interactions between human beings are inherently social. This is why performance management is a very complex discipline, not just the implementation of dashboard or scorecard technology.


 
How to Measure the Unmeasurable

Measuring the Unmeasurable

Our dream of fact-based decision-making seems to be crumbling to dust. Regular facts are subject to data quality issues, or manipulation by creative humans. As data is moved from system to system and repository to repository, the facts can sometimes acquire an “alt-” prefix. Timing issues and the design of metrics can also erode accuracy. Then there are many perils and pitfalls associated with simple extrapolation and less simple statistical models. Finally, any fact that manages to emerge from this gantlet [12] unscathed may then be totally ignored by those whose actions it is meant to guide. What can be done?

As happens elsewhere on this site, let me turn to another field for inspiration. Not for the first time, let’s consider what Science can teach us about dealing with such issues with facts. In a recent article [13] in my Maths & Science section, I examined the nature of Scientific Theory and – in particular – explored the imprecision inherent in the Scientific Method. Here is some of what I wrote:

It is part of the nature of scientific theories that (unlike their Mathematical namesakes) they are not “true” and indeed do not seek to be “true”. They are models that seek to describe reality, but which often fall short of this aim in certain circumstances. General Relativity matches observed facts to a greater degree than Newtonian Gravity, but this does not mean that General Relativity is “true”, there may be some other, more refined, theory that explains everything that General Relativity does, but which goes on to explain things that it does not. This new theory may match reality in cases where General Relativity does not. This is the essence of the Scientific Method, never satisfied, always seeking to expand or improve existing thought.

I think that the Scientific Method that has served humanity so well over the centuries is applicable to our business dilemma. In the same way that a Scientific Theory is never “true”, but instead useful for explaining observations and predicting the unobserved, business metrics should be judged less on their veracity (though it would be nice if they bore some relation to reality) and instead on how often they lead to the right action being taken and the wrong action being avoided. This is an argument for metrics to be simple to understand and tied to how decision-makers actually think, rather than some other more abstruse and theoretical definition.

A proxy metric is fine, so long as it yields the right result (and the right behaviour) more often than not. A metric with dubious data quality is still useful if it points in the right direction; if the compass needle is no more than a few degrees out. While of course steps that improve the accuracy of metrics are valuable and should be undertaken where cost-effective, at least equal attention should be paid to ensuring that – when the metric has been accessed and digested – something happens as a result. This latter goal is a long way from the arcana of data lineage and metric definition; it is instead the province of human psychology, something that the accomplished data professional should be adept at influencing.

I have touched on how to positively modify human behaviour in these pages a number of times before [14]. It is a subject that I will be coming back to again in coming months, so please watch this space.
 




 
Notes

 
[1]
 
According to Snopes, the phrase arose from a spoof of the series.
 
[2]
 
The two pertinent exchanges were instead:

Ilsa: Play it once, Sam. For old times’ sake.
Sam: I don’t know what you mean, Miss Ilsa.
Ilsa: Play it, Sam. Play “As Time Goes By”
Sam: Oh, I can’t remember it, Miss Ilsa. I’m a little rusty on it.
Ilsa: I’ll hum it for you. Da-dy-da-dy-da-dum, da-dy-da-dee-da-dum…
Ilsa: Sing it, Sam.

and

Rick: You know what I want to hear.
Sam: No, I don’t.
Rick: You played it for her, you can play it for me!
Sam: Well, I don’t think I can remember…
Rick: If she can stand it, I can! Play it!
 
[3]
 
Though he, or whoever may have written the first epistle to Timothy, might have condemned the “love of money”.
 
[4]
 
The origin of this was a Peter Sellers interview in which he impersonated Caine.
 
[5]
 
One of my Top Ten films.
 
[6]
 
Especially for all Business Analytics professionals out there (2009).
 
[7]
 
See in particular my trilogy:

  1. Using historical data to justify BI investments – Part I (2011)
  2. Using historical data to justify BI investments – Part II (2011)
  3. Using historical data to justify BI investments – Part III (2011)
 
[8]
 
Without getting into too many details, what you are typically doing is stating that there is a less than 5% chance that the measurements forming model input match the distribution due to a fluke; but this is not meant to be a primer on null hypotheses.
 
[9]
 
Of course, depending on context, £12.9 million could instead be the most conservative estimate.
 
[10]
 
This happens a lot in election polling. Candidate A may be estimated to be 3 points ahead of Candidate B, but with an error margin of 5 points, it should be no real surprise when Candidate B wins the ballot.
 
[11]
 
Try googling Nobel Laureates Myron Scholes and Robert Merton and then look for references to Long-term Capital Management.
 
[12]
 
Yes, I meant “gantlet”; that is the word in the original phrase, not “gauntlet”, and so connections with gloves are wide of the mark.
 
[13]
 
Finches, Feathers and Apples (2018).
 
[14]
 
For example:

 



 

In-depth with CDO Jo Coutuer

In-depth with Jo Coutuer


Part of the In-depth series of interviews

PJT Today’s guest on In-depth is Jo Coutuer, Chief Data Officer and Member of the Executive Committee of BNP Paribas Fortis, a leading Belgian bank. Given the importance of the CDO role in Financial Services, I am very happy that Jo has managed to spare us some of his valuable time to talk.
PJT Jo, you have had an interesting career in a variety of organisations from consultancies to start-ups, from government to major companies. Can you give readers a pen-picture of the journey that has taken you to your current role?
JC For me, the variety of contexts has been the most rewarding. I started in an industry that has now sharply declined in Europe (Telco Manufacturing), continued in the consulting world of ERP tools, switched into a very interesting job for the government, became an entrepreneur and co-created a data company for 13 years, merged that data company into a big 4 consultancy and finally decided to apply my life’s learnings to the fascinating industry of banking. The most remarkable aspect of my career is the fact that my current role and the attention to data that goes with it, did not exist when I started my career. It illustrates how young people today can also build a future, without really knowing what lies ahead. All it takes is the mental flexibility to switch contexts when it is needed.
PJT At present – at least in Europe, and maybe further afield – there is no standard definition of a CDO’s role. Can you tell me a bit about the scope of your work at BNP Paribas Fortis? Are you most focussed on compliance, leverage of data, or a balance of both activities?
JC At BNP Paribas Fortis, the CEO and his executive committee made a courageous decision back in 2016 to create a specific department dedicated to Data. The move was courageous, not only because it defined a new leadership role and a budget, but also because it settled a debate between the businesses and the IT function. At the time of creation of the department, it was decided to carve out of IT the traditional function of “business intelligence and data warehousing” and to establish a central competence centre for “analytics and artificial intelligence“, which before was mostly scattered or non-existing. On top of that, the new department was tasked to assume the regulatory duties that relate to data. More and more, banking regulation focusses on reliable reporting, traceable data flows, systematic data quality measurement and well documented metadata, all embedded in a solid organisational governance. So yes, I would say our Data department is both “defensive” as well as “offensive”. As a CDO, I am privileged to be able to work with experts and leaders in the fields of regulation, data warehousing expertise and data science innovation. Without them, the breadth of the scope and the required depth, would not be manageable.
PJT Do you collaborate with other Executives in the data arena, or is the CDO primus inter pares when it comes to data matters?
JC I would not speak of a hierarchical order when it comes to data. It helps to distinguish three identities of a Data department.

The first one is the identity of the “Governor”. In that identity, peers accept that the CDO translates external duties into internal best practices, as long as this happens in a co-creation mode. We have established a “College of Data Managers”, who are 13 senior managers, representing each a specific “data perimeter”, which in its turn rather well maps to our fields of business or our internal functions. These senior managers intimately link the Data activities to the day-to-day business functions and their respective executives.

A second identity is that of the “Expert”. In that identity, we offer expertise in fields of data integration, data warehousing, reporting, visualisation, data science, … It means that I see my fellow executives as clients and partners and the Data department helps them achieve their business objectives. Mentally (and sometimes practically), we measure up to external professional services or IT companies.

A third identity is that of the “Integrator”. As an integrator, we actively make the link between the business of today, the technological and data potential of today and the business of tomorrow. We actively try to question existing practices and we introduce new concepts for a variety of business applications. And although we are more driving in this role than we are in the role of the “Expert”, we still are fully at the service of our clients.

PJT More generally, how do you see the CDO role changing in coming years, what would 2020’s CDO be doing? Will we even need CDOs in 2020?
JC Ahah! One of the most frequently asked questions on CDO-related social media! If the previous two years are any predictor of the future, I would say that the CDO of 2020 is one who has solidly matured the governance aspects of Data, just as the CFO and CRO have done for financial management and risk management. Let’s say that Data has become “routine”.

At the same time, the 2020 CDO will need to offer to his peers, the technical and expert capabilities that are data centric and essential to running a digital business.

And on top of that, I believe that 2020 will be the timeframe in which data valorisation will become an active topic. I explicitly do not use the word “monetisation” because we currently associate data too often with “selling data for advertising purposes”. In our industry, PSD2 [1] will define our duties to be able to exchange data with third party service providers, at the explicit request of our clients. From that new reality, an API-driven ecosystem will surface in which data will be actively valorised, to the direct service of our clients, not to the indirect service of our marketing departments. The 2020 CDO will be instrumental in shaping his or her company’s ecosystem to make sure this happens in a well governed, trusted and safe way. Clients will seek that reassurance and will reward companies who take data management seriously.

PJT Of course, senior roles tend to exist because they add value to their organisations, what do you feel is the value that a CDO brings to the table?
JC I have already mentioned the CDO’s challenge to be schizophrenically split between his or her various identities. But it is exactly that breadth of scope that can add value. The CDO should be an “executive integrator”. He or she can employ “governors” and “experts”, but his or her role in the peer team of executives is to represent the transversality of data’s nature. Data “flows”, data “unites”. More than it is “oil”, data is “water”. It flows through the company’s ecosystem and it nourishes the business and the future business potential. As such, the CDO needs to keep the water clean and make sure it gets pumped across the organisation, so that others can benefit from the nutrients in it. And while doing so, the CDO has a duty to add nutrients to the water, in the form of analytical or artificial intelligence induced insights.
PJT Focussing on Analytics, I know you have written about how to build the ideal Analytics team and have mentioned that “purple people” are the key. Can you explain more about this?
JC Purple people are people that integrate the skills of “red” people and “blue” people. Red people bring the scientific data methodologies to the table. Blue people bring the solid frameworks of the business. Data people as individuals and a Data department as an entity, must have as a mission to be “purple” and to actively bridge the gap between the fast growing set of data technologies and methodologies on the one hand and the rapidly evolving and transforming business challenges on the other hand. And of course, if you like Prince [2] as a musician, that can be an asset too!
PJT In my discussions with other CDOs [3] and indeed in my own experience, it seems that teamwork is crucial for a CDO. Of course, this is important for many senior roles, but it does seem central to what a CDO does. My perspective is that both a CDO’s own team and the virtual teams that he or she forms with colleagues are going to have a big say in whether things go well or not. What are your views on this topic?
JC You are absolutely right. A CDO or data function cannot exist in isolation. At times, transversality feels like a burden because it imposes a daily attention to stakeholders. However, in reality, it’s exactly the transversal effect that can generate the added value to an organisation. At the end of the day, the integration aspects between departments and people will generate positive side effects, above and beyond the techniques of data management.
PJT Artificial Intelligence in its various guises has been the topic of conversation recently. This is something with strong linkage to the data field. Obviously without divulging any commercial secrets, what role do you see AI playing in banking going forwards? What about in our lives in general?
JC It’s funny that AI is being discovered as a new topic. I remember writing my Master thesis on the topic a long time ago. Of course, things have evolved since the 90s, with a storage and computing capacity that is approximately 50,000 times stronger for the same price point. This capacity explosion, combined with the connectivity of the internet and the cloud, combined with the increased awareness that data and algorithms have become central elements in many business strategies, has fundamentally re-calibrated the potential of AI.

In banking, AI and Analytics will soon help clients understand their finances better, will help them to take better and faster decisions, will generate a better (less friction) client experience for “the easy stuff” and it will allow the banks to put humans on “the hard stuff” or on those interactions with their clients that require true human interaction. Behind the scenes, Analytics and AI are already helping to prevent fraud, monitoring suspicious transactions to detect crime, money laundering and fraud. And even deeper inside the mechanics of a bank, Analytics and AI are helping prevent cyber-crimes and are monitoring the stability of the technological platforms onto which our modern financial and societal system is built.

I am convinced that the societal role of banks will continue to exist, despite innovative peer-to-peer or blockchain driven schemes. As such, Analytics and AI will contribute to society as a whole, through their contribution to a reliable and stable financial services system.

PJT With GDPR [4] coming into force only a couple of months ago, the subject of customer data and how it is used is a topical one. Taking BNP Paribas Fortis to one side, what are your thoughts on the balance between data privacy and the “free” services that we all pay for by allowing our data to be sold?
JC I believe that GDPR is both important legislation and brings benefits to customers. First of all, we have good historical reasons to care about our privacy. In times of societal crises or wars, it is the first weapon that is used against society and its citizens. So we should care for it deeply. Second, being in an industry for which “trust” is the most essential element of identity, protecting and respecting the data and the privacy of clients is a natural reflex. And putting the banking question aside for a moment, we should continue to educate aggressively about the fact that services never come for free. As long as consumers are well informed that they pay for their convenience with their data, there is no fundamental concern. But because there is still no real “paid” economy surfacing, the consumer does not really have a choice between “pay-for-service” or “give-data-for-service”. I believe that the market potential for paid services, that guarantee non-exploitation of personal data, is quietly growing. And when it finally appears, consumers will start making choices. Personally, I admit to having moved from being on all possible digital channels and tools, towards being much more selective. And I must admit that digital life with a privacy aware mind is still possible and still fun.
PJT It seems to me that a key capability of a CDO is as an influencer. Influence can take many shapes, from being an acknowledged expert in an area, to the softer skills of being someone that others can talk to openly. Do you agree about this observation? If so, how do you seek to be an influencer?
JC It’s a thin line to walk and it depends on the type of CDO that you are and the mandate that you have. If you have a mandate to do “governance only”, then you should have the confidence of delivering on your mandate, just like a CRO or a CFO does. For that I always revert to the phrase: “we agreed that data is a valuable asset, just like money or people or buildings, … so let’s then act like it.” If you have a mandate to “change”, to “create value”, then you have to be an integrator and influencer because you can never change an organisation and its people on your own.
PJT Before letting you go, a quick personal question. I know you spent some time at the University of Cambridge. I lived in this town while my wife was working on her PhD. Like Cambridge, Leuven [5] is a historic town just outside of a major capital city. What parallels do you see between the two and what did you think of the locals?
JC Cambridge is famous for its “punts”, Leuven for its Stella Artois “pints”. And both central churches (or chapels) are home to iconic paintings by Flemish masters, Rubens in Cambridge and Bouts in Leuven. Visit both!
PJT Jo, thank you so much for talking to me and giving readers the benefit of your ideas and experience.

Jo Coutuer can be reached via his LinkedIn profile.


Disclosure: At the time of publication, neither peterjamesthomas.com Ltd. nor any of its Directors had any shared commercial interests with Jo Coutuer, BNP Paribas Fortis or any entities associated with either of these.


If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.

 
Notes

 
[1]
 
Payment Services Directive 2.
 
[2]
 
Prince Rogers Nelson.
 
[3]
 
Two recent examples include:

 
[4]
 
General Data Protection Regulation.
 
[5]
 
Leuven.


 

Offence, Defence and the Top Data Job

Offence and Defence - 2018 World Cup

Football [1] has been in the news rather a lot of late; apparently there is some competition or other going on in Russia [2]. Presumably it was this that brought to my mind the analogy sometimes applied to the data arena of offence and defence [3]. Defence brings to mind Data Governance, Master Data Management and Data Quality. Offence suggests Data Science, Machine Learning and Analytics. This is an analogy I have briefly touched on in these pages before [4]; here I want to expand on it.

Rather than Association Football, it was however the American version that first crossed my mind. In Gridiron, there are of course wholly separate teams for each of offence, defence, kicking and receiving, each filled with specialists. I would be happy to learn from readers about any counterexamples, but I struggle to think of any other sport that is like this [5]. In each of Association Football, both types of Rugby, Australian Rules Football and indeed Basketball, Baseball (see previous note [5]), Volleyball, Hockey, Ice Hockey, Lacrosse, Polo, Water Polo and Handball, the same players form both the offence and defence. Of course this is probably due to them being a bit less stop-start than American Football; offence can turn into defence in a split-second in some of them.

To stick with Football (I’m going to drop “Association” from here on in), while players may be designated as goalkeepers, defenders, mid-fielders, wingers and attackers (strikers), any player may be called on to defend or attack at any time [6]. Star strikers may need to make desperate tackles. Defenders (who tend to be taller) will be called up to try to turn corner kicks into goals. Even at the most basic level, the ball needs to be transferred from one end of the field to the other, which requires (absent the Goalkeeper simply taking what is known as route one – i.e. kicking it as far as they can towards the other goal) several players to pass the ball, control it and pass again. The whole team contributes.

I have written before about the nomenclature maze that often surrounds the Top Data Job [7] (see Further Reading at the end of the article). In some organisations the offence and defence aspects of the data arena are separate, in the sense that both are headed by someone who then reports into a non-data-specialist. For example a Chief Data Officer and a Chief Analytics Officer might both report to a Chief Operating Officer. This feels a bit like the American Football approach; separate teams to do separate things. I’m probably stretching the metaphor [8], but a problem that occurs to me is that – in business – the data offence and data defence teams will need to be on the field of play at the same time. Aren’t they going to get in each other’s way and end up duplicating activities? At the very least, they are going to need some robust rules about who does what and for these to be made very clear to the players. Also, ultimately, while both offence and defence teams in Gridiron will have their own coaches, these will report to a Head Coach; someone who presumably knows just a bit about American Football. I can’t think of any instances where an NFL team has no Head Coach and instead the next tier of staff all report to the owner.

Of course having multiple senior data roles reporting into different parts of the Executive may be fine and many organisations operate this way. However, again coming back to my sporting analogy, I prefer the approach adopted by Football, Rugby, Basketball and the rest. I like the idea of a single, cohesive Data Function, led by someone who is a data specialist, no matter what their job title might be. In most sports what seems to work well is a team in which people have roles, but in which there is cross-over and a need to just get things done. I think this works for people involved in data work as well.

You wouldn’t have the Head of Tax and the Head of Financial Reporting both reporting to the CEO; that’s what CFOs are for (among other things). It should be the same in the data arena with the Top Data Job being just that, the one person ultimately accountable for both the control and leverage of data. I have made no secret of my opinion that this is the optimum approach. I think my view is supported by the overwhelming number of sports where offence and defence are functions of the same, cohesive team.
 


Further reading on this subject:


 
Notes

 
[1]
 
Association of course.
 
[2]
 
My winter team sport was always Rugby Football, of the Union variety. But – as is evident from quite a few articles on this site – for many years my spare time was mostly occupied by rock climbing and bouldering.

The day after England’s defeat at the hands of Croatia, the Polish guy I regularly buy my skinny flat white from offered his commiserations about yesterday. I was at a loss as to what he had done to me yesterday and he had to explain that he was referring to the World Cup. Not all Brits are Football fanatics.

 
[3]
 
Offense and defense for my wife and any other Americans reading.
 
[4]
 
This was as part of Alphabet Soup.
 
[5]
 
The only thing I could think of that was even in the same ballpark (pun intended) was the use of a designated hitter in some baseball leagues. Even then, the majority of the team have to field as well as bat.
 
[6]
 
There are indeed examples of Goalkeepers, the quintessential defensive player, scoring in International Football.
 
[7]
 
With acknowledgement to Peter Aiken.
 
[8]
 
For neither the first time, nor the last: e.g. A bad workman blames his [Business Intelligence] tools and Analogies.

 



 

Building Momentum – How to begin becoming a Data-driven Organisation

Building Momentum - Becoming a Data Driven Organisation

Larger, annotated PDF version (opens in a new tab)

Introduction

It is hard to find an organisation that does not aspire to being data-driven these days. While there is undoubtedly an element of me-tooism about some of these statements (or a fear of competitors / new entrants who may use their data better, gaining a competitive advantage), often there is a clear case for the better leverage of data assets. This may be to do with the stand-alone benefits of such an approach (enhanced understanding of customers, competitors, products / services etc. [1]), or as a keystone supporting a broader digital transformation.

However, in my experience, many organisations have much less mature ideas about how to achieve their data goals than they do about setting them. Given the lack of executive experience in data matters [2], it is not atypical that one of the large strategy consultants is engaged to shape a data strategy; one of the large management consultants is engaged to turn this into something executable and maybe to select some suitable technologies; and one of the large systems integrators (or increasingly off-shore organisations migrating up the food chain) is engaged to do the work, which by this stage normally relates to building technology capabilities, implementing a new architecture or some other technology-focussed programme.

Juggling Third Parties

Even if each of these partners does a great job – which one would hope they do at their price points – a few things invariably get lost along the way. These include:

  1. A data strategy that is closely coupled to the organisation’s actual needs rather than something more general.

    While there are undoubtedly benefits in adopting best practice for an industry, there is also something to be said for a more tailored approach, tied to business imperatives and which may have the possibility to define the new best practice. In some areas of business, it makes sense to take the tried and tested approach, to be a part of the herd. In others – and data is in my opinion one of these – taking a more innovative and distinctive path is more likely to lead to success.
     

  2. Connective tissue between strategy and execution.

    The distinctions between the three types of organisations I cite above are becoming more blurry (not least as each seeks to develop new revenue streams). This can lead to the strategy consultants developing plans, which get ripped up by the management consultants; the management consultants revisiting the initial strategy; the systems integrators / off-shorers replanning, or opening up technical and architecture discussions again. Of course this means the client paying at least twice for this type of work. What also disappears is the type of accountability that comes when the same people are responsible for developing a strategy, turning this into a practical plan and then executing this [3].
     

  3. Focus on the cultural aspects of becoming more data-driven.

    This is both one of the most important factors that determines success or failure [4] and something that – frankly because it is not easy to do – often falls by the wayside. By the time that the third external firm has been on-boarded, the name of the game is generally building something (e.g. a Data Lake, or an analytics platform) rather than the more human questions of who will use this, in what way, to achieve which business objectives.

Of course a way to address the above is to allocate some experienced people (internal or external, ideally probably a blend) who stay the course from development of data strategy through fleshing this out to execution and who – importantly – can also take a lead role in driving the necessary cultural change. It also makes sense to think about engaging organisations who are small enough to tailor their approach to your needs and who will not force a “cookie cutter” approach. I have written extensively about how – with the benefit of such people on board – to run such a data transformation programme [5]. Here I am going to focus on just one phase of such a programme and often the most important one; getting going and building momentum.


 
A Third Way

There are a couple of schools of thought here:

  1. Focus on laying solid data foundations and thus build data capabilities that are robust and will stand the test of time.
     
  2. Focus on delivering something ASAP in the data arena, which will build the case for further investment.

There are points in favour of both approaches and criticisms that can be made of each as well. For example, while the first approach will be necessary at some point (and indeed at a relatively early one) in order to sustain a transformation to a data-driven organisation, it obviously takes time and effort. Exclusive focus on this area can use up money, political capital and try the patience of sponsors. Few business initiatives will be funded for years if they do not begin to have at least some return relatively soon. This remains the case even if the benefits down the line are potentially great.

Equally, the second approach can seem very productive at first, but will generally end up trying to make a silk purse out of a sow’s ear [6]. Inevitably, without improvements to the underlying data landscape, limitations in the type of useful analytics that can be carried out will be reached; sometimes sooner than might be thought. While I don’t generally refer to religious topics on this blog [7], the Parable of the Sower is apposite here. Focussing on delivering analytics without attending to the broader data landscape is indeed like the seed that fell on stony ground. The practice yields results that spring up, only to wilt when the sun gets hot, given that they have no real roots [8].

So what to do? Well, there is a Third Way. This involves blending both approaches. I tend to think of this in the following way:

Proportion of Point and Strategic Data Activities over Time

First of all, this is a cartoon; it is not intended to indicate actual percentages, just to illustrate a general trend. In real life, it is likely that you will cycle round multiple times and indeed have different parallel work-streams at different stages. The general points I am trying to convey with this diagram are:

  1. At the beginning of a data transformation programme, there should probably be more emphasis on interim delivery and tactical changes. However, importantly, there is never zero strategic work. As things progress, the emphasis should swing more to strategic, long-term work. But again, even in a mature programme, there is never zero tactical work. There can also of course be several iterations of such shifts in approach.
     
  2. Interim and tactical steps should relate to not just analytics, but also to making point fixes to the data landscape where possible. It is also important to kick off diagnostic work, which will establish how bad things are and also suggest areas which could be attacked sooner rather than later; this too can initially be done on a tactical basis and then made more robust later. In general, if you consider the span of strategic data work, it makes sense to kick off cut-down (and maybe drastically cut-down) versions of many activities early on.
     
  3. Importantly, the tactical and strategic work-streams should not be hermetically sealed. What you actually want is healthy interplay. Building some early, “quick and dirty” analytics may highlight areas that should be covered by a data audit, or where there are obvious weaknesses in a data architecture. Any data assets that are built on a more strategic basis should also be leveraged by tactical work, improving its utility and probably increasing its lifespan.

 
Interconnected Activities

At the beginning of this article, I present a diagram (repeated below) which covers three types of initial data activities, the sort of work that – if executed competently – can begin to generate momentum for a data programme. The exhibit also references Data Strategy.

Building Momentum - Becoming a Data Driven Organisation

Larger, annotated PDF version (opens in a new tab)

Let’s look at each of these four things in some more detail:

  1. Analytic Point Solutions

    Where data has historically been locked up in either hard-to-use repositories or in source systems themselves, liberating even a bit of it can be very helpful. This does not have to be with snazzy tools (unless you want to showcase the art of the possible). An anecdote might help to explain.

    At one organisation, they had existing reporting that was actually not horrendous, but it was hard to access, hard to parameterise and hard to do follow-on analysis on. I took it upon myself to run 30 plus reports on a weekly and monthly basis, download the contents to Excel, front these with some basic graphs and make these all available on an intranet. This meant that people from Country A or Department B could go straight to their figures rather than having to run fiddly reports. It also meant that they had an immediate visual overview – including some comparisons to prior periods and trends over time (which were not available in the original reports). Importantly, they also got a basic pivot table, which they could use to further examine what was going on. These simple steps (if a bit laborious for me) had a massive impact. I later replaced the Excel with pages I wrote in a new web-reporting tool we built in house. Ultimately, my team moved these to our strategic Analytics platform.

    This shows how point solutions can be very valuable and also morph into more strategic facilities over time.
     

  2. Data Process Improvements

    Data issues may be to do with a range of problems from poor validation in systems, to bad data integration, but immature data processes and insufficient education for data entry staff are often key contributors to overall problems. Identifying such issues and quantifying their impact should be the province of a Data Audit, which is something I would recommend considering early on in a data programme. Once more this can be basic at first, considering just superficial issues, and then expand over time.

    While fixing some data process problems and making a stepped change in data quality will both probably take time and effort, it may be possible to identify and target some narrower areas in which progress can be made quite quickly. It may be that one key attribute necessary for analysis is poorly entered and validated. Some good communications around this problem can help, better guidance for people entering it is also useful and some “quick and dirty” reporting highlighting problems and – hopefully – tracking improvement can make a difference quicker than you might expect [9].
     

  3. Data Architecture Enhancements

    Improving a Data Architecture sounds like a multi-year task and indeed it can often be just that. However, it may be that there are some areas where judicious application of limited resource and funds can make a difference early on. A team engaged in a data programme should seek out such opportunities and expect to devote time and attention to them in parallel with other work. Architectural improvements would be best coordinated with data process improvements where feasible.

    An example might be providing a web-based tool to look up valid codes for entry into a system. Of course it would be a lot better to embed this functionality in the system itself, but it may take many months to include this in a change schedule whereas the tool could be made available quickly. I have had some success with extending such a tool to allow users to build their own hierarchies, which can then be reflected in either point analytics solutions or more strategic offerings. It may be possible to later offer the tool’s functionality via web-services allowing it to be integrated into more than one system.
     

  4. Data Strategy

    I have written extensively about Data Strategy on this site [10]. What I wanted to cover here is the interplay between Data Strategy and some of the other areas I have just covered. It might be thought that Data Strategy is both carved on tablets of stone [11] and stands in splendid and theoretical isolation, but this should not ever be the case. The development of a Data Strategy should of course be informed by a situational analysis and a vision of “what good looks like” for an organisation. However, both of these things can be shaped by early tactical work. Taking cues from initial tactical work should lead to a more pragmatic strategy, more aligned to business realities.

    Work in each of the three areas itemised above can play an important role in shaping a Data Strategy and – as the Data Strategy matures – it can obviously guide interim work as well. This should be an iterative process with lots of feedback.
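
As a rough illustration of the Analytic Point Solutions anecdote above (figures taken from existing reports, pivoted by country and compared with the prior period), a minimal Python sketch might look like the following. The rows, country names, period labels and function names are all hypothetical and chosen purely for illustration; the work described in the anecdote was done in Excel, not in Python.

```python
from collections import defaultdict

# Hypothetical extract from an existing report: (country, period, amount).
ROWS = [
    ("Country A", "2018-05", 120.0),
    ("Country A", "2018-06", 150.0),
    ("Country B", "2018-05", 80.0),
    ("Country B", "2018-06", 60.0),
    ("Country A", "2018-06", 30.0),
]


def pivot_by_country(rows):
    """Sum amounts into a {country: {period: total}} pivot."""
    pivot = defaultdict(lambda: defaultdict(float))
    for country, period, amount in rows:
        pivot[country][period] += amount
    return {country: dict(periods) for country, periods in pivot.items()}


def change_vs_prior(pivot, current, prior):
    """Percentage change from the prior period to the current one, per country."""
    changes = {}
    for country, periods in pivot.items():
        base = periods.get(prior)
        if not base:
            # No prior figure (or a zero one): a percentage change is undefined.
            changes[country] = None
        else:
            delta = periods.get(current, 0.0) - base
            changes[country] = round(100.0 * delta / base, 1)
    return changes


pivot = pivot_by_country(ROWS)
changes = change_vs_prior(pivot, "2018-06", "2018-05")
for country in sorted(pivot):
    print(country, pivot[country], changes[country])
```

Even something this small gives each country its own figures plus a comparison to the prior period, which is the essence of the point solution described; a more strategic platform can replace it later.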

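The “quick and dirty” reporting on a poorly entered attribute, mentioned under Data Process Improvements, could be sketched along the following lines. This is only an assumption about what such a report might contain: the list of valid codes, the sample records and the weekly grouping are all hypothetical.

```python
from collections import defaultdict

# Hypothetical set of valid codes for the attribute being tracked.
VALID_CODES = {"UK", "FR", "DE"}

# Hypothetical extract: (entry week, code as typed by data entry staff).
RECORDS = [
    ("2018-W01", "UK"), ("2018-W01", ""), ("2018-W01", "XX"), ("2018-W01", "FR"),
    ("2018-W02", "DE"), ("2018-W02", "uk"), ("2018-W02", "FR"), ("2018-W02", "UK"),
]


def invalid_rate_by_week(records, valid_codes):
    """Return {week: percentage of records whose code is missing or invalid}."""
    totals = defaultdict(int)
    invalid = defaultdict(int)
    for week, code in records:
        totals[week] += 1
        # The check is deliberately strict: blank and wrongly cased codes count as invalid.
        if code.strip() not in valid_codes:
            invalid[week] += 1
    return {
        week: round(100.0 * invalid[week] / totals[week], 1)
        for week in sorted(totals)
    }


rates = invalid_rate_by_week(RECORDS, VALID_CODES)
for week, pct in rates.items():
    print(f"{week}: {pct}% invalid")
```

Publishing a figure like this week by week is one way of both highlighting the problem and tracking whether better guidance and communications are having an effect.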

 
Closing Thoughts

I have captured the essence of these thoughts in the diagram above. The important things to take away are that in order to generate momentum, you need to start to do some stuff; to extend the physical metaphor, you have to start pushing. However, momentum is a vector quantity (it has a direction as well as a magnitude [12]) and building momentum is not a lot of use unless it is in the general direction in which you want to move; so push with some care and judgement. It is also useful to realise that – so long as your broad direction is OK – you can make refinements to your direction as you pick up speed.

The above thoughts are based on my experience in a range of organisations and I am confident that they can be applied anywhere, making allowance for local cultures of course. Once momentum is established, it still needs to be maintained (or indeed increased), but I find that getting the ball moving in the first place often presents the greatest challenge. My hope is that the framework I present here can help data practitioners to get over this initial hurdle and begin to really make a difference in their organisations.
 


Further reading on this subject:


 
Notes

 
[1]
 
Way back in 2009, I wrote about the benefits of leveraging data to provide enhanced information. The article in question was titled Measuring the benefits of Business Intelligence. Everything I mention remains valid today in 2018.
 
[2]
 
See also:

 
[3]
 
If I may be allowed to blow my own trumpet for a moment, I have developed data / information strategies for eight organisations, turned seven of these into a costed / planned programme and executed at least the first few phases of six of these. I have always found being a consistent presence through these phases has been beneficial to the organisations I was helping, as well as helping to reduce duplication of work.
 
[4]
 
See my, now rather venerable, trilogy about cultural change in data / information programmes:

  1. Marketing Change
  2. Education and cultural transformation and
  3. Sustaining Cultural Change

together with the rather more recent:

  1. 20 Risks that Beset Data Programmes and
  2. Ever tried? Ever failed?
 
[5]
 
See for example:

  1. Draining the Swamp
  2. Bumps in the Road and
  3. Ideas for avoiding Big Data failures and for dealing with them if they happen
 
[6]
 
Dictionary.com offers a nice explanation of this phrase.
 
[7]
 
I was raised a Catholic, but have been areligious for many years.
 
[8]
 
Much like x^2+x+1=0.

For anyone interested, the two roots of this polynomial are clearly:

-\dfrac{1}{2}+\dfrac{\sqrt{3}}{2}\hspace{1mm}i\hspace{5mm}\text{and}\hspace{5mm}-\dfrac{1}{2}-\dfrac{\sqrt{3}}{2}\hspace{1mm}i

neither of which is Real.
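
For completeness, a sketch of where these come from: applying the standard quadratic formula with a = b = c = 1 gives

x=\dfrac{-1\pm\sqrt{1^2-4\cdot 1\cdot 1}}{2\cdot 1}=\dfrac{-1\pm\sqrt{-3}}{2}=-\dfrac{1}{2}\pm\dfrac{\sqrt{3}}{2}\hspace{1mm}i

and it is the negative discriminant (1 - 4 = -3) that rules out Real roots.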

 
[9]
 
See my rather venerable article, Using BI to drive improvements in data quality, for a fuller treatment of this area.
 
[10]
 
For starters see:

  1. Forming an Information Strategy: Part I – General Strategy
  2. Forming an Information Strategy: Part II – Situational Analysis
  3. Forming an Information Strategy: Part III – Completing the Strategy

and also the Data Strategy segment of The Anatomy of a Data Function – Part I.

 
[11]
 
Tablet of Stone
 
[12]
 
See Glimpses of Symmetry, Chapter 15 – It’s Space Jim….

 



 

Link directly to entries in the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary has always had internal tags (anchors for those old enough to recall their HTML) which allowed me, as its author, to link to individual entries from other web-pages I write. An example of the use of these is my article, A Brief History of Databases.

I have now made these tags public. Each entry in the Dictionary is followed by the full tag address in a box. This is accompanied by a link icon as follows:

Data Dictionary excerpt

Clicking on the link icon will copy the tag address to your clipboard. Alternatively the tag URL may just be copied from the box containing it directly. You can then use this address in your own article to link back to the D&AD entry.

As with the vast majority of my work, the contents of the Data and Analytics Dictionary are covered by a Creative Commons Attribution 4.0 International Licence. This means you can include my text or images in your own web-pages, presentations, Word documents etc. You can even modify my work, so long as you point out that you have done this.

If you would like to link back to the Data and Analytics Dictionary to provide definitions of terms that you are using, this should now be very easy. For example:

Lorem ipsum dolor sit amet, consectetur adipiscing Big Data elit. Duis tempus nisi sit amet libero vehicula Data Lake, sed tempor leo consectetur. Pellentesque suscipit sed felis Data Governance ac mattis. Fusce mattis luctus posuere. Duis a Spark mattis velit. In scelerisque massa ac turpis viverra, ac Logistic Regression pretium neque condimentum.

Equally, I’d be delighted if you wanted to include part or all of the text of an entry in the Data and Analytics Dictionary in your own work, commercial or personal; a link back using this new functionality would be very much appreciated.

I hope that this new functionality will be useful. An update to the Dictionary’s contents will be published in the next couple of months.
 



 

Sic Transit Gloria Magnorum Datorum

Sic transit gloria mundi

It happens to all of us eventually I suppose.

Just the other day, I heard someone referring to “traditional Big Data”. Since when did Big Data become “traditional”? I didn’t get the e-mail. Of course, in the technology field, the epithet “traditional” is code for “broken”, “no longer of any use” and – most damningly of all – “deeply uncool”. The term is widely used; whether – with this connotation – it is either helpful or accurate is perhaps a matter for debate. This usage makes me recall the rather silly debate about Analytics versus “traditional” Business Intelligence that occurred around 2009 [1].

By way of context, the person talking about “traditional Big Data” was referring to the difference between some of the original denizens of the Hadoop ecosystem and more recent offerings like Databricks or Beam. They also had in mind the various quasi-proprietary flavours of Big Data and/or Big Data plug-ins offered by (that word again) “traditional” vendors. In this sense, the usage is probably appropriate, albeit somewhat jarring. In the more pejorative sense I refer to above, “traditional” is somewhat misleading when applied to either Big Data or – in the author’s opinion – several of its precursors.

Shiny!

While we inhabit a world which places a premium on innovation, favouring the new and the shiny [2], traditional methods have much to offer. If something – a technique or technology – has achieved “traditional” status, it means that it has become part of how things are done. While shaking up the status quo can be beneficial, “traditional” approaches have the not insignificant benefit of having been tried and tested. “Traditional” data tools are ones that have survived some time and are still used. While not guaranteeing success, it should at least be possible to be successful with such tools because other people have done this before.

Maybe, several years after its move into the mainstream, Big Data has become “traditional”. However I would take this as meaning “fit for purpose”, “useful” and “still pretty cool”. Then I think the same about many of the technologies that were described as “traditional” in contrast to Big Data. As ever, the main things that lead to either success or failure in data-centric work [3] have very little to do with technology, be that traditional or à la mode.
 


 
Notes

 
[1]
 
If you have the stomach for it, see Business Analytics vs Business Intelligence and succeeding articles.
 
[2]
 
See also 2009’s The latest and greatest versus the valuable.
 
[3]
 
I itemise a few of these in last year’s 20 Risks that Beset Data Programmes.

 

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

 

A Brief History of Databases

A Brief History of Databases

Larger PDF version (opens in a new tab)

The pace of change in the field of database technology seems to be constantly accelerating. No doubt in five years’ time [1], Big Data and the Hadoop suite [2] will seem to be as old-fashioned as earlier technologies can appear to some people nowadays. Today there is a great variety of database technologies that are in use in different organisations for different purposes. There are also a lot of vendors, some of whom have more than one type of database product. I think that it is worthwhile considering both the genesis of databases and some of the major developments that have occurred between then and now.

The infographic appearing at the start of this article seeks to provide just such a perspective. It presents an abridged and simplified perspective on the history of databases from the 1960s to the late 2010s. It is hard to make out the text in the above diagram, so I would recommend that readers click on the link provided in order to view a much larger version with bigger and more legible text.

The infographic references a number of terms. Below I provide links to definitions of several of these, which are taken from The Data and Analytics Dictionary. The list progresses from the top of the diagram downwards, but starts with a definition of “database” itself:

To my mind, it is interesting to see just how long we have been grappling with the best way to set up databases. Also of note is that some of the Big Data technologies are actually relatively venerable, dating to the mid-to-late 2000s (some elements are even older, consisting of techniques for handling flat files on UNIX or Mainframe computers back in the day).

I hope that both the infographic and the definitions provided above contribute to the understanding of the history of databases and also that they help to elucidate the different types of database that are available to organisations today.
 


 
Acknowledgements

The following people’s input is acknowledged on the document itself, but my thanks are also repeated here:

Of course any errors and omissions remain the responsibility of the author.


 
Notes

 
[1]
 
If not significantly before then.
 
[2]
 
One of J K Rowling’s lesser-known works.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary and The Anatomy of a Data Function

 

A further extension of the Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. A larger update is in the works, but for now here are a dozen new definitions:

  1. Binary
  2. Business Analyst
  3. Chief Analytics Officer (CAO)
  4. Data
  5. Data Analyst
  6. Data Business Analyst
  7. Data Marketplace
  8. Data Steward
  9. Digital
  10. End User Computing (EUC)
  11. Information
  12. Web Analytics

As previously stated, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).
 


 

 

A Retrospective of 2017’s Articles

A Review of 2017

This article was originally intended for publication late in the year it reviews, but, as they [1] say, the best-laid schemes o’ mice an’ men gang aft agley…

In 2017 I wrote more articles [2] than in any year since 2009, which was the first full year of this site’s existence. Some were viewed by thousands of people; others received less attention. Here I am going to ignore the metric of popular acclaim and instead highlight a few of the articles that I most enjoyed writing, or sometimes re-reading a few months later [3]. Given the breadth of subject matter that appears on peterjamesthomas.com, I have split this retrospective into six areas, which are presented in decreasing order of the number of 2017 articles I wrote in each. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO Perspectives
  5. Programme Advice
  6. Analytics & Big Data

In each category, I will pick out two or three pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
August
The Data and Analytics Dictionary
My attempt to navigate the maze of data and analytics terminology. Everything from Algorithm to Web Analytics.
 
November & December
The Anatomy of a Data Function: Part I, Part II and Part III
Three articles focussed on the structure and components of a modern Data Function and how its components interact with both each other and the wider organisation in order to support business goals.
 
 
Data Visualisation
 
January
Nucleosynthesis and Data Visualisation
How one of the most famous scientific data visualisations, the Periodic Table, has been repurposed to explain where the atoms we are all made of come from via the processes of nucleosynthesis.
 
September & October
Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity and Part II – Map Reading
Two articles on how Data Visualisation is used in Meteorology. Part I provides a worked example illustrating some of the problems that can arise when adopting a rainbow colour palette in data visualisation. Part II grapples with hurricane prediction and covers some issues with data visualisations that are intended to convey safety information to the public.
 
 
Statistics & Data Science
 
February
Toast
What links Climate Change, the Manhattan Project, Brexit and Toast? How do these relate to the public’s trust in Science? What does this mean for Data Scientists?
Answers provided by Nature, The University of Cambridge and the author.
 
February
How to be Surprisingly Popular
The wisdom of the crowd relies upon essentially democratic polling of a large number of respondents; an approach that has several shortcomings, not least the lack of weight attached to people with specialist knowledge. The Surprisingly Popular algorithm addresses these shortcomings and so far has out-performed existing techniques in a range of studies.
 
October
A Nobel Laureate’s views on creating Meaning from Data
The 2017 Nobel Prize for Chemistry was awarded to Structural Biologist Richard Henderson and two co-recipients. What can Machine Learning practitioners learn from Richard’s observations about how to generate images from Cryo-Electron Microscopy data?
 
 
CDO Perspectives
 
January
Alphabet Soup
Musings on the overlapping roles of Chief Analytics Officer and Chief Data Officer and thoughts on whether there should be just one Top Data Job in an organisation.
 
February
A Sweeter Spot for the CDO?
An extension of my concept of the Chief Data Officer sweet spot, inspired by Bruno Aziza of AtScale.
 
September
A truth universally acknowledged…
Many Chief Data Officer job descriptions have a list of requirements that resembles a Swiss Army Knife. This article argues that the CDO must be the conductor of an orchestra, not someone who is a virtuoso on every single instrument.
 
 
Programme Advice
 
January
Bumps in the Road
What the aftermath of repeated roadworks can tell us about the potentially deleterious impact of Change Programmes on Data Landscapes.
 
February
20 Risks that Beset Data Programmes
A review of 20 risks that can plague data programmes. How effectively these are managed / mitigated can make or break your programme.
 
March
Ideas for avoiding Big Data failures and for dealing with them if they happen
Paul Barsch (EY & Teradata) provides some insight into why Big Data projects fail, what you can do about this and how best to treat any such projects that head off the rails. With additional contributions from Big Data gurus Albert Einstein, Thomas Edison and Samuel Beckett.
 
 
Analytics & Big Data
 
February
Bigger and Better (Data)?
Some examples of where bigger data is not necessarily better data. Provided by Bill Vorhies and Larry Greenemeier.
 
March
Elephants’ Graveyard?
Thoughts on trends in interest in Hadoop and Spark, featuring George Hill, James Kobielus, Kashif Saiyed and Martyn Richard Jones, together with the author’s perspective on the importance of technology in data-centric work.
 
 
and Finally…

I would like to close this review of 2017 with a final article, one that somehow defies classification:

 
April
25 Indispensable Business Terms
An illustrated Buffyverse take on Business gobbledygook – What would Buffy do about thinking outside the box? To celebrate 20 years of Buffy the Vampire Slayer and 1st April 2017.

 
Notes

 
[1]
 
“They” here obviously standing for Robert Burns.
 
[2]
 
Thirty-four articles and one new page.
 
[3]
 
Of course some of these may also have been popular; I’m not being masochistic here!

 
