Version 2 of The Anatomy of a Data Function

Between November and December 2017, I published the three parts of my Anatomy of a Data Function. These were cunningly called Part I, Part II and Part III. Eight months is a long time in the data arena and I have now issued an update.

The changes in Version 2 are confined to the above organogram and Part I of the text. They consist of the following:

1. Split Artificial Intelligence out of Data Science in order to better reflect the ascendancy of this area (and also its use outside of Data Science).

2. Change Data Science to Data Science / Engineering in order to better reflect the continuing evolution of this area.

My aim will be to keep this trilogy up-to-date as best practice Data Functions change their shapes and contents.

If you would like help building or running your Data Function, or would just like to have an informal chat about the area, please get in touch

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

Fact-based Decision-making

This article is about facts. Facts are sometimes less solid than we would like to think; sometimes they are downright malleable. To illustrate, consider the fact that in 98 episodes of Dragnet, Sergeant Joe Friday never uttered the words “Just the facts Ma’am”, though he did often employ the variant alluded to in the image above [1]. Equally, Rick never said “Play it again Sam” in Casablanca [2] and St. Paul never suggested that “money is the root of all evil” [3]. As Michael Caine never said in any film, “not a lot of people know that” [4].

 Up-front Acknowledgements These normally appear at the end of an article, but it seemed to make sense to start with them in this case: Recently I published Building Momentum – How to begin becoming a Data-driven Organisation. In response to this, one of my associates, Olaf Penne, asked me about my thoughts on fact-base decision-making. This piece was prompted by both Olaf’s question and a recent article by my friend Neil Raden on his Silicon Angle blog, Performance management: Can you really manage what you measure? Thanks to both Olaf and Neil for the inspiration.

Fact-based decision making. It sounds good doesn’t it? Especially if you consider the alternatives: going on gut feel, doing what you did last time, guessing, not taking a decision at all. However – as is often the case with issues I deal with on this blog – fact-based decision-making is easier to say than it is to achieve. Here I will look to cover some of the obstacles and suggest a potential way to navigate round them. Let’s start however with some definitions.

 Fact NOUN A thing that is known or proved to be true. (Oxford Dictionaries) Decision NOUN A conclusion or resolution reached after consideration. (Oxford Dictionaries)

So one can infer that fact-based decision-making is the process of reaching a conclusion based on consideration of things that are known to be true. Again, it sounds great doesn’t it? It seems that all you have to do is to find things that are true. How hard can that be? Well actually quite hard as it happens. Let’s cover what can go wrong (note: this section is not intended to be exhaustive, links are provided to more in-depth articles where appropriate):

Accuracy of Data that is captured

A number of factors can play into the accuracy of data capture. Some systems (even in 2018) can still make it harder to capture good data than to ram in bad. Often an issue may also be a lack of master data definitions, so that similar data is labelled differently in different systems.

A more pernicious problem is combinatorial data accuracy, two data items are both valid, but not in combination with each other. However, often the biggest stumbling block is a human one, getting people to buy in to the idea that the care and attention they pay to data capture will pay dividends later in the process.

These and other areas are covered in greater detail in an older article, Using BI to drive improvements in data quality.

Honesty of Data that is captured

Data may be perfectly valid, but still not represent reality. Here I’ll let Neil Raden point out the central issue in his customary style:

People find the most ingenious ways to distort measurement systems to generate the numbers that are desired, not only NOT providing the desired behaviors, but often becoming more dysfunctional through the effort.

[…] voluntary compliance to the [US] tax code encourages a national obsession with “loopholes”, and what salesman hasn’t “sandbagged” a few deals for next quarter after she has met her quota for the current one?

Where there is a reward to be gained or a punishment to be avoided, by hitting certain numbers in a certain way, the creativeness of humans often comes to the fore. It is hard to account for such tweaking in measurement systems.

Timing issues with Data

Timing is often problematic. For example, a transaction completed near the end of a period gets recorded in the next period instead, one early in a new period goes into the prior period, which is still open. There is also (as referenced by Neil in his comments above) the delayed booking of transactions in order to – with the nicest possible description – smooth revenues. It is not just hypothetical salespeople who do this of course. Entire organisations can make smoothing adjustments to their figures before publishing and deferral or expedition of obligations and earnings has become something of an art form in accounting circles. While no doubt most of this tweaking is done with the best intentions, it can compromise the fact-based approach that we are aiming for.

Reliability with which Data is moved around and consolidated

In our modern architectures, replete with web-services, APIs, cloud-based components and the quasi-instantaneous transmission of new transactions, it is perhaps not surprising that occasionally some data gets lost in translation [5] along the way. That is before data starts to be Sqooped up into Data Lakes, or other such Data Repositories, and then otherwise manipulated in order to derive insight or provide regular information. All of these are processes which can introduce their own errors. Suffice it to say that transmission, collation and manipulation of data can all reduce its accuracy.

Again see Using BI to drive improvements in data quality for further details.

Pertinence and fidelity of metrics developed from Data

Here we get past issues with data itself (or how it is handled and moved around) and instead consider how it is used. Metrics are seldom reliant on just one data element, but are often rather combinations. The different elements might come in because a given metric is arithmetical in nature, e.g.

$\text{Metric X} = \dfrac{\text{Data Item A}+\text{Data Item B}}{\text{Data Item C}}$

Choices are made as to how to construct such compound metrics and how to relate them to actual business outcomes. For example:

$\text{New Biz Growth} = \dfrac{(\text{Sales CYTD}-\text{Repeat CYTD})-(\text{Sales PYTD}-\text{Repeat PYTD})}{(\text{Sales PYTD}-\text{Repeat PYTD})}$

Is this a good way to define New Business Growth? Are there any weaknesses in this definition, for example is it sensitive to any glitches in – say – the tagging of Repeat Business? Do we need to take account of pricing changes between Repeat Business this year and last year? Is New Business Growth something that is even worth tracking; what will we do as a result of understanding this?

The above is a somewhat simple metric, in a section of Using historical data to justify BI investments – Part I, I cover some actual Insurance industry metrics that build on each other and are a little more convoluted. The same article also considers how to – amongst other things – match revenue and outgoings when the latter are spread over time. There are often compromises to be made in defining metrics. Some of these are based on the data available. Some relate to inherent issues with what is being measured. In other cases, a metric may be a best approximation to some indication of business health; a proxy used because that indication is not directly measurable itself. In the last case, staff turnover may be a proxy for staff morale, but it does not directly measure how employees are feeling (a competitor might be poaching otherwise happy staff for example).

Robustness of extrapolations made from Data

I have used the above image before in these pages [6]. The situation it describes may seem farcical, but it is actually not too far away from some extrapolations I have seen in a business context. For example, a prediction of full-year sales may consist of this year’s figures for the first three quarters supplemented by prior year sales for the final quarter. While our metric may be better than nothing, there are some potential distortions related to such an approach:

1. Repeat business may have fallen into Q4 last year, but was processed in Q3 this year. This shift in timing would lead to such business being double-counted in our year end estimate.

2. Taking point 1 to one side, sales may be growing or contracting compared to the previous year. Using Q4 prior year as is would not reflect this.

3. It is entirely feasible that some market event occurs this year ( for example the entrance or exit of a competitor, or the launch of a new competitor product) which would render prior year figures a poor guide.

Of course all of the above can be adjusted for, but such adjustments would be reliant on human judgement, making any projections similarly reliant on people’s opinions (which as Neil points out may be influenced, conciously or unconsciously, by self-interest). Where sales are based on conversions of prospects, the quantum of prospects might be a more useful predictor of Q4 sales. However here a historical conversion rate would need to be calculated (or conversion probabilities allocated by the salespeople involved) and we are back into essentially the same issues as catalogued above.

I explore some similar themes in a section of Data Visualisation – A Scientific Treatment

Integrity of statistical estimates based on Data

Having spent 18 years working in various parts of the Insurance industry, statistical estimates being part of the standard set of metrics is pretty familiar to me [7]. However such estimates appear in a number of industries, sometimes explicitly, sometimes implicitly. A clear parallel would be credit risk in Retail Banking, but something as simple as an estimate of potentially delinquent debtors is an inherently statistical figure (albeit one that may not depend on the output of a statistical model).

The thing with statistical estimates is that they are never a single figure but a range. A model may for example spit out a figure like £12.4 million ± £0.5 million. Let’s unpack this.

Well the output of the model will probably be something analogous to the above image. Here a distribution has been fitted to the business event being modelled. The central point of this (the one most likely to occur according to the model) is £12.4 million. The model is not saying that £12.4 million is the answer, it is saying it is the central point of a range of potential figures. We typically next select a symmetrical range above and below the central figure such that we cover a high proportion of the possible outcomes for the figure being modelled; 95% of them is typical [8]. In the above example, the range extends plus £0. 5 million above £12.4 million and £0.5 million below it (hence the ± sign).

Of course the problem is then that Financial Reports (or indeed most Management Reports) are not set up to cope with plus or minus figures, so typically one of £12.4 million (the central prediction) or £11.9 million (the most conservative estimate [9]) is used. The fact that the number itself is uncertain can get lost along the way. By the time that people who need to take decisions based on such information are in the loop, the inherent uncertainty of the prediction may have disappeared. This can be problematic. Suppose a real result of £12.4 million sees an organisation breaking even, but one of £11.9 million sees a small loss being recorded. This could have quite an influence on what course of action managers adopt [10]; are they relaxed, or concerned?

Beyond the above, it is not exactly unheard of for statistical models to have glitches, sometimes quite big glitches [11].

This segment could easily expand into a series of articles itself. Hopefully I have covered enough to highlight that there may be some challenges in this area.

And so what?

Even if we somehow avoid all of the above pitfalls, there remains one booby-trap that is likely to snare us, absent the necessary diligence. This was alluded to in the section about the definition of metrics:

Is New Business Growth something that is even worth tracking; what will we do as a result of understanding this?

Unless a reported figure, or output of a model, leads to action being taken, it is essentially useless. Facts that never lead to anyone doing anything are like lists learnt by rote at school and regurgitated on demand parrot-fashion; they demonstrate the mechanism of memory, but not that of understanding. As Neil puts it in his article:

[…] technology is never a solution to social problems, and interactions between human beings are inherently social. This is why performance management is a very complex discipline, not just the implementation of dashboard or scorecard technology.

How to Measure the Unmeasurable

Our dream of fact-based decision-making seems to be crumbling to dust. Regular facts are subject to data quality issues, or manipulation by creative humans. As data is moved from system to system and repository to repository, the facts can sometimes acquire an “alt-” prefix. Timing issues and the design of metrics can also erode accuracy. Then there are many perils and pitfalls associated with simple extrapolation and less simple statistical models. Finally, any fact that manages to emerge from this gantlet [12] unscathed may then be totally ignored by those whose actions it is meant to guide. What can be done?

As happens elsewhere on this site, let me turn to another field for inspiration. Not for the first time, let’s consider what Science can teach us about dealing with such issues with facts. In a recent article [13] in my Maths & Science section, I examined the nature of Scientific Theory and – in particular – explored the imprecision inherent in the Scientific Method. Here is some of what I wrote:

It is part of the nature of scientific theories that (unlike their Mathematical namesakes) they are not “true” and indeed do not seek to be “true”. They are models that seek to describe reality, but which often fall short of this aim in certain circumstances. General Relativity matches observed facts to a greater degree than Newtonian Gravity, but this does not mean that General Relativity is “true”, there may be some other, more refined, theory that explains everything that General Relativity does, but which goes on to explain things that it does not. This new theory may match reality in cases where General Relativity does not. This is the essence of the Scientific Method, never satisfied, always seeking to expand or improve existing thought.

I think that the Scientific Method that has served humanity so well over the centuries is applicable to our business dilemma. In the same way that a Scientific Theory is never “true”, but instead useful for explaining observations and predicting the unobserved, business metrics should be judged less on their veracity (though it would be nice if they bore some relation to reality) and instead on how often they lead to the right action being taken and the wrong action being avoided. This is an argument for metrics to be simple to understand and tied to how decision-makers actually think, rather than some other more abstruse and theoretical definition.

A proxy metric is fine, so long as it yields the right result (and the right behaviour) more often than not. A metric with dubious data quality is still useful if it points in the right direction; if the compass needle is no more than a few degrees out. While of course steps that improve the accuracy of metrics are valuable and should be undertaken where cost-effective, at least equal attention should be paid to ensuring that – when the metric has been accessed and digested – something happens as a result. This latter goal is a long way from the arcana of data lineage and metric definition, it is instead the province of human psychology; something that the accomploished data professional should be adept at influencing.

I have touched on how to positively modify human behaviour in these pages a number of times before [14]. It is a subject that I will be coming back to again in coming months, so please watch this space.

Notes

[1]

According to Snopes, the phrase arose from a spoof of the series.

[2]

The two pertinent exchanges were instead:

 Ilsa: Play it once, Sam. For old times’ sake. Sam: I don’t know what you mean, Miss Ilsa. Ilsa: Play it, Sam. Play “As Time Goes By” Sam: Oh, I can’t remember it, Miss Ilsa. I’m a little rusty on it. Ilsa: I’ll hum it for you. Da-dy-da-dy-da-dum, da-dy-da-dee-da-dum… Ilsa: Sing it, Sam.

and

 Rick: You know what I want to hear. Sam: No, I don’t. Rick: You played it for her, you can play it for me! Sam: Well, I don’t think I can remember… Rick: If she can stand it, I can! Play it!

[3]

Though he, or whoever may have written the first epistle to Timothy, might have condemned the “love of money”.

[4]

The origin of this was a Peter Sellers interview in which he impersonated Caine.

[5]

One of my Top Ten films.

[6]

Especially for all Business Analytics professionals out there (2009).

[7]

See in particular my trilogy:

[8]

Without getting into too many details, what you are typically doing is stating that there is a less than 5% chance that the measurements forming model input match the distribution due to a fluke; but this is not meant to be a primer on null hypotheses.

[9]

Of course, depending on context, £12.9 million could instead be the most conservative estimate.

[10]

This happens a lot in election polling. Candidate A may be estimated to be 3 points ahead of Candidate B, but with an error margin of 5 points, it should be no real surprise when Candidate B wins the ballot.

[11]

Try googling Nobel Laureates Myron Scholes and Robert Merton and then look for references to Long-term Capital Management.

[12]

Yes I meant “gantlet” that is the word in the original phrase, not “gauntlet” and so connections with gloves are wide of the mark.

[13]

Finches, Feathers and Apples (2018).

[14]

For example:

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

In-depth with CDO Jo Coutuer

Part of the In-depth series of interviews

 Today’s guest on In-depth is Jo Coutuer, Chief Data Officer and Member of the Executive Committee of BNP Paribas Fortis, a leading Belgian bank. Given the importance of the CDO role in Financial Services, I am very happy that Jo has managed to spare us some of his valuable time to talk.
 Jo, you have had an interesting career in a variety of organisations from consultancies to start-ups, from government to major companies. Can you give readers a pen-picture of the journey that has taken you to your current role? For me, the variety of contexts has been the most rewarding. I started in an industry that has now sharply declined in Europe (Telco Manufacturing), continued in the consulting world of ERP tools, switched into a very interesting job for the government, became an entrepreneur and co-created a data company for 13 years, merged that data company into a big 4 consultancy and finally decided to apply my life’s learnings to the fascinating industry of banking. The most remarkable aspect of my career is the fact that my current role and the attention to data that goes with it, did not exist when I started my career. It illustrates how young people today can also build a future, without really knowing what lies ahead. All it takes is the mental flexibility to switch contexts when it is needed.
 Do you collaborate with other Executives in the data arena, or is the CDO primus inter pares when it comes to data matters? I would not speak of a hierarchical order when it comes to data. It helps to distinguish three identities of a Data department. The first one is the identity of the “Governor”. In that identity, peers accept that the CDO translates external duties into internal best practices, as long as this happens in a co-creation mode. We have established a “College of Data Managers”, who are 13 senior managers, representing each a specific “data perimeter”, which in its turn rather well maps to our fields of business or our internal functions. These senior managers intimately link the Data activities to the day-to-day business functions and their respective executives. A second identity is that of the “Expert”. In that identity, we offer expertise in fields of data integration, data warehousing, reporting, visualisation, data science, … It means that I see my fellow executives as clients and partners and the Data department helps them achieve their business objectives. Mentally (and sometimes practically), we measure up to external professional services or IT companies. A third identity is that of the “Integrator”. As an integrator, we actively make the link between the business of today, the technological and data potential of today and the business of tomorrow. We actively try to question existing practices and we introduce new concepts for a variety of business applications. And although we are more driving in this role than we are in the role of the “Expert”, we still are fully at the service of our clients.
 More generally, how do you see the CDO role changing in coming years, what would 2020’s CDO be doing? Will we even need CDOs in 2020? Ahah! One of the most frequently asked questions on CDO related social media! If previous two years are any predictor of the future, I would say that the CDO of 2020 is one who has solidly matured the governance aspects of Data, just like the CFO and CRO have done that for financial management or risk management. Let’s say that Data has become “routine”. At the same time, the 2020 CDO will need to offer to his peers, the technical and expert capabilities that are data centric and essential to running a digital business. And on top of that, I believe that 2020 will be the timeframe in which data valorisation will become an active topic. I explicitly do not use the word “monetisation” because we currently associate data to often with “selling data for advertising purposes”. In our industry, PSD2 [1] will define our duties to be able to exchange data with third party service providers, at the explicit request of our clients. From that new reality, an API-driven ecosystem will surface in which data will be actively valorised, to the direct service of our clients, not to the indirect service of our marketing departments. The 2020 CDO will be instrumental in shaping his or her company’s ecosystem to make sure this happens in a well governed, trusted and safe way. Clients will seek that reassurance and will reward companies who take data management seriously.
 Of course, senior roles tend to exist because they add value to their organisations, what do you feel is the value that a CDO brings to the table? I have already mentioned the CDO’s challenge to be schizophrenic ally split between his or her various identities. But it is exactly that breadth of scope that can add value. The CDO should be an “executive integrator”. He can employ “governors” and “experts”, but his or her role in the peer team of executives is to represent the transversality of data’s nature. Data “flows”, data “unites”. More than it is “oil”, data is “water”. It flows through the company’s ecosystem and it nourishes the business and the future business potential. As such, the CDO needs to keep the water clean and make sure it gets pumped across the organisation, so that others can benefit from the nutrients it. And while doing so, the CDO has a duty to add nutrients to the water, in the form of analytical or artificial intelligence induced insights.
 Focussing on Analytics, I know you have written about how to build the ideal Analytics team and have mentioned that “purple people” are the key. Can you explain more about this? Purple people are people that integrate the skills of “red” people and “blue” people. Red people bring the scientific data methodologies to the table. Blue people bring the solid frameworks of the business. Data people as individuals and a Data department as an entity, must have as a mission to be “purple” and to actively bridge the gap between the fast growing set of data technologies and methodologies on the one hand and the rapidly evolving and transforming business challenges on the other hand. And of course, if you like Prince [2] as a musician, that can be an asset too!
 In my discussions with other CDOs [3] and indeed in my own experience, it seems that teamwork is crucial for a CDO. Of course, this is important for many senior roles, but it does seem central to what a CDO does. My perspective is that both a CDO’s own team and the virtual teams that he or she forms with colleagues are going to have a big say in whether things go well or not. What are your views on this topic? You are absolutely right. A CDO or data function cannot exist in isolation. At some times, transversality feels a burden because it imposes a daily attention to stakeholders. However, in reality, it’s exactly the transversal effect that can generate the added value to an organisation. At the end of the day, the integration aspects between departments and people will generate positive side effects, above and beyond the techniques of data management.
 Artificial Intelligence in its various guises has been the topic of conversation recently. This is something with strong linkage to the data field. Obviously without divulging any commercial secrets, what role do you see AI playing in banking going forwards? What about in our lives in general? It’s funny that AI is being discovered as a new topic. I remember writing my Master thesis on the topic a long time ago. Of course, things have evolved since the 90s, with a storage and computing capacity that is approximately 50,000 times stronger for the same price point. This capacity explosion, combined with the connectivity of the internet and the cloud, combined with the increased awareness that data and algorithms have become central elements in a many business strategies, has fundamentally re-calibrated the potential of AI. In banking, AI and Analytics will soon help clients understand their finances better, will help them to take better and faster decisions, will generate a better (less friction) client experience for “the easy stuff” and it will allow the banks to put humans on “the hard stuff” or on those interactions with their clients that require true human interaction. Behind the scenes, Analytics and AI are already helping to prevent fraud, monitoring suspicious transactions to detect crime, money laundering and fraud. And even deeper inside the mechanics of a bank, Analytics and AI are helping prevent cyber-crimes and are monitoring the stability of the technological platforms onto which our modern financial and societal system is built. I am convinced that the societal role of banks will continue to exists, despite innovative peer-to-peer or blockchain driven schemes. As such, Analytics and AI will contribute to society as a whole, through their contribution to a reliable and stable financial services system.
 With GDPR [4] coming into force only a couple of months ago, the subject of customer data and how it is used is a topical one. Taking BNP Paribas Fortis to one side, what are your thoughts on the balance between data privacy and the “free” services that we all pay for by allowing our data to be sold? I believe that GDPR is both important legislation and brings benefits to customers. First of all, we have good historical reasons to care about our privacy. In times of societal crises or wars, it is the first weapon that is used against society and its citizens. So we should care for it deeply. Second, being in an industry for which “trust” is the most essential element of identity, protecting and respecting the data and the privacy of clients is a natural reflex. And putting the banking question aside for a moment, we should continue to educate aggressively about the fact that services never come for free. As long as consumers are well informed that they pay for their convenience with their data, there is no fundamental concern. But because there is still no real “paid” economy surfacing, the consumer does not really have a choice between “pay-for-service” or “give-data-for-service”. I believe that the market potential for paid services, that guarantee non-exploitation of personal data, is quietly growing. And when it finally appears, consumers will start making choices. Personally, I admit to having moved from being on all possible digital channels and tools, towards being much more selective. And I must admit that digital life with a privacy aware mind is still possible and still fun.
 It seems to me that a key capability of a CDO is as an influencer. Influence can take many shapes, from being an acknowledged expert in an area, to the softer skills of being someone that others can talk to openly. Do you agree about this observation? If so, how do you seek to be an influencer? It’s a thin line to walk and it depends on the type of CDO that you are and the mandate that you have. If you have a mandate to do “governance only”, then you should have the confidence of delivering on your mandate, just like a CRO or a CFO does. For that I always revert to the phrase: “we agreed that data is a valuable asset, just like money or people or buildings, … so let’s then act like it.” If you have mandate to “change”, to “create value”, then you have to be an integrator and influencer because you can never change an organisation and its people on your own.
 Before letting you go, a quick personal question. I know you spent some time at the University of Cambridge. I lived in this town while my wife was working on her PhD. Like Cambridge, Leuven [5] is a historic town just outside of a major capital city. What parallels do you see between the two and what did you think of the locals? Cambridge is famous for its “punts”, Leuven for its Stella Artois “pints”. And both central churches (or chapels) are home to iconic paintings by Flemish masters, Rubens in Cambridge and Bouts in Leuven. Visit both!
 Jo, thank you so much for talking to me and giving readers the benefit of your ideas and experience.

Jo Coutuer can be reached at via his LinkedIn profile.

Disclosure: At the time of publication, neither peterjamesthomas.com Ltd. nor any of its Directors had any shared commercial interests with Jo Coutuer, BNP Paribas Fortis or any entities associated with either of these.

 If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.

Notes

 [1] Payment Services Directive 2. [2] Prince Rogers Nelson. [3] Two recent examples include: [4] General Data Protection Regulation. [5] Leuven.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

Offence, Defence and the Top Data Job

Football [1] has been in the news rather a lot of late; apparently there is some competition or other going on in Russia [2]. Presumably it was this that brought to my mind the analogy sometimes applied to the data arena of offence and defence [3]. Defence brings to mind Data Governance, Master Data Management and Data Quality. Offence suggests Data Science, Machine Learning and Analytics. This is an analogy I have briefly touched on in these pages before [4]; here I want to expand on it.

Rather than Association Football, it was however the American version that first crossed my mind. In Gridiron, there are of course wholly separate teams for each of offence, defence, kicking and receiving, each filled with specialists. I would be happy to learn from readers about any counterexamples, but I struggle to think of any other sport that is like this [5]. In each of Association Football, both types of Rugby, Australian Rules Football and indeed Basketball, Baseball (see previous note [5]) Volleyball, Hockey, Ice Hockey, Lacrosse, Polo, Water Polo and Handball, the same players form both the offence and defence. Of course this is probably due to them being a bit less stop-start than American Football, offence can turn into defence in a split-second in some of them.

To stick with Football (I’m going to drop “Association” from here on in), while players may be designated as goalkeepers, defenders, mid-fielders, wingers and attackers (strikers), any player may be called on to defend or attack at any time [6]. Star strikers may need to make desperate tackles. Defenders (who tend to be taller) will be called up to try to turn corner kicks into goals. Even at the most basic level, the ball needs to be transferred from one end of the field to the other, which requires (absent the Goalkeeper simply taking what is known as route one – i.e. kicking it as far as they can towards the other goal) several players to pass the ball, control it and pass again. The whole team contributes.

I have written before about the nomenclature maze that often surrounds the Top Data Job [7] (see Further Reading at the end of the article). In some organisations the offence and defence aspects of the data arena are separate, in the sense that both are headed by someone who then reports into a non-data-specialist. For example a Chief Data Officer and a Chief Analytics Officer might both report to a Chief Operating Officer. This feels a bit like the American Football approach; separate teams to do separate things. I’m probably stretching the metaphor [8], but a problem that occurs to me is that – in business – the data offence and data defence teams will need to be on the field of play at the same time. Aren’t they going to get in each other’s way and end up duplicating activities? At the very least, they are going to need some robust rules about who does what and for these to be made very clear to the players. Also, ultimately, while both offence and defence teams in Gridiron will have their own coaches, these will report to a Head Coach; someone who presumably knows just a bit about American Football. I can’t think of any instances where an NFL team has no Head Coach and instead the next tier of staff all report to the owner.

Of course having multiple senior data roles reporting into different parts of the Executive may be fine and many organisations operate this way. However, again coming back to my sporting analogy, I prefer the approach adopted by Football, Rugby, Basketball and the rest. I like the idea of a single, cohesive Data Function, led by someone who is a data specialist, no matter what their job title might me. In most sports what seems to work well is a team in which people have roles, but in which there is cross-over and a need to just get done. I think this works for people involved in data work as well.

You wouldn’t have the Head of Tax and the Head of Financial Reporting both reporting to the CEO, that’s what CFOs are for (among other things). It should be the same in the data arena with the Top Data Job being just that, the one person ultimately accountable for both the control and leverage of data. I have made no secret of my opinion that this is the optimum approach. I think my view is supported by the overwhelming number of sports where offence and defence are functions of the same, cohesive team.

Notes

 [1] Association of course. [2] My winter team sport was always Rugby Football, of the Union variety. But – as is evident from quite a few articles on this site – for many years my spare time was mostly occupied by rock climbing and bouldering. The day after England’s defeat at the hands of Croatia, the Polish guy I regularly buy my skinny flat white from offered his commiserations about yesterday. I was at a loss as to what he had done to me yesterday and he had to explain that he was referring to the World Cup. Not all Brit’s are Football fanatics. [3] Offense and defense for my wife and any other Americans reading. [4] This was as part of Alphabet Soup. [5] The only thing I could think of that was even in the same ballpark (pun intended) was the use of a designated hitter in some baseball leagues. Even then, the majority of the team have to field as well as bat. [6] There are indeed examples of Goalkeepers, the quintessential defensive player, scoring in International Football. [7] With acknowledgement to Peter Aiken. [8] For neither the first time, nor the last: e.g. A bad workman blames his [Business Intelligence] tools and Analogies.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

Building Momentum – How to begin becoming a Data-driven Organisation

Introduction

It is hard to find an organisation that does not aspire to being data-driven these days. While there is undoubtedly an element of me-tooism about some of these statements (or a fear of competitors / new entrants who may use their data better, gaining a competitive advantage), often there is a clear case for the better leverage of data assets. This may be to do with the stand-alone benefits of such an approach (enhanced understanding of customers, competitors, products / services etc. [1]), or as a keystone supporting a broader digital transformation.

However, in my experience, many organisations have much less mature ideas about how to achieve their data goals than they do about setting them. Given the lack of executive experience in data matters [2], it is not atypical that one of the large strategy consultants is engaged to shape a data strategy; one of the large management consultants is engaged to turn this into something executable and maybe to select some suitable technologies; and one of the large systems integrators (or increasingly off-shore organisations migrating up the food chain) is engaged to do the work, which by this stage normally relates to building technology capabilities, implementing a new architecture or some other technology-focussed programme.

Even if each of these partners does a great job – which one would hope they do at their price points – a few things invariably get lost along the way. These include:

1. A data strategy that is closely coupled to the organisation’s actual needs rather than something more general.

While there are undoubtedly benefits in adopting best practice for an industry, there is also something to be said for a more tailored approach, tied to business imperatives and which may have the possibility to define the new best practice. In some areas of business, it makes sense to take the tried and tested approach, to be a part of the herd. In others – and data is in my opinion one of these – taking a more innovative and distinctive path is more likely to lead to success.

2. Connective tissue between strategy and execution.

The distinctions between the three types of organisations I cite above are becoming more blurry (not least as each seeks to develop new revenue streams). This can lead to the strategy consultants developing plans, which get ripped up by the management consultants; the management consultants revisiting the initial strategy; the systems integrators / off-shorers replanning, or opening up technical and architecture discussions again. Of course this means the client paying at least twice for this type of work. What also disappears is the type of accountability that comes when the same people are responsible for developing a strategy, turning this into a practical plan and then executing this [3].

3. Focus on the cultural aspects of becoming more data-driven.

This is both one of the most important factors that determines success or failure [4] and something that – frankly because it is not easy to do – often falls by the wayside. By the time that the third external firm has been on-boarded, the name of the game is generally building something (e.g. a Data Lake, or an analytics platform) rather than the more human questions of who will use this, in what way, to achieve which business objectives.

Of course a way to address the above is to allocate some experienced people (internal or external, ideally probably a blend) who stay the course from development of data strategy through fleshing this out to execution and who – importantly – can also take a lead role in driving the necessary cultural change. It also makes sense to think about engaging organisations who are small enough to tailor their approach to your needs and who will not force a “cookie cutter” approach. I have written extensively about how – with the benefit of such people on board – to run such a data transformation programme [5]. Here I am going to focus on just one phase of such a programme and often the most important one; getting going and building momentum.

A Third Way

There are a couple of schools of thought here:

1. Focus on laying solid data foundations and thus build data capabilities that are robust and will stand the test of time.

2. Focus on delivering something ASAP in the data arena, which will build the case for further investment.

There are points in favour of both approaches and criticisms that can be made of each as well. For example, while the first approach will be necessary at some point (and indeed at a relatively early one) in order to sustain a transformation to a data-driven organisation, it obviously takes time and effort. Exclusive focus on this area can use up money, political capital and try the patience of sponsors. Few business initiatives will be funded for years if they do not begin to have at least some return relatively soon. This remains the case even if the benefits down the line are potentially great.

Equally, the second approach can seem very productive at first, but will generally end up trying to make a silk purse out of a sow’s ear [6]. Inevitably, without improvements to the underlying data landscape, limitations in the type of useful analytics that be carried out will be reached; sometimes sooner that might be thought. While I don’t generally refer to religious topics on this blog [7], the Parable of the Sower is apposite here. Focussing on delivering analytics without attending to the broader data landscape is indeed like the seed that fell on stony ground. The practice yields results that spring up, only to wilt when the sun gets hot, given that they have no real roots [8].

So what to do? Well, there is a Third Way. This involves blending both approaches. I tend to think of this in the following way:

First of all, this is a cartoon, it is not intended to indicate actual percentages, just to illustrate a general trend. In real life, it is likely that you will cycle round multiple times and indeed have different parallel work-streams at different stages. The general points I am trying to convey with this diagram are:

1. At the beginning of a data transformation programme, there should probably be more emphasis on interim delivery and tactical changes. However, imoportantly, there is never zero strategic work. As things progress, the emphasis should swing more to strategic, long-term work. But again, even in a mature programme, there is never zero tactical work. There can also of course be several iterations of such shifts in approach.

2. Interim and tactical steps should relate to not just analytics, but also to making point fixes to the data landscape where possible. It is also important to kick off diagnostic work, which will establish how bad things are and also suggest areas which could be attacked sooner rather than later; this too can initially be done on a tactical basis and then made more robust later. In general, if you consider the span of strategic data work, it makes sense to kick off cut-down (and maybe drastically cut-down) versions of many activities early on.

3. Importantly, the tactical and strategic work-streams should not be hermetically sealed. What you actually want is healthy interplay. Building some early, “quick and dirty” analytics may highlight areas that should be covered by a data audit, or where there are obvious weaknesses in a data architecture. Any data assets that are built on a more strategic basis should also be leveraged by tactical work, improving its utility and probably increasing its lifespan.

Interconnected Activities

At the beginning of this article, I present a diagram (repeated below) which covers three types of initial data activities, the sort of work that – if executed competently – can begin to generate momentum for a data programme. The exhibit also references Data Strategy.

Let’s look at each of these four things in some more detail:

1. Analytic Point Solutions

Where data has historically been locked up in either hard-to-use repositories or in source systems themselves, liberating even a bit of it can be very helpful. This does not have to be with snazzy tools (unless you want to showcase the art of the possible). An anecdote might help to explain.

At one organisation, they had existing reporting that was actually not horrendous, but it was hard to access, hard to parameterise and hard to do follow-on analysis on. I took it upon myself to run 30 plus reports on a weekly and monthly basis, download the contents to Excel, front these with some basic graphs and make these all available on an intranet. This meant that people from Country A or Department B could go straight to their figures rather than having to run fiddly reports. It also meant that they had an immediate visual overview – including some comparisons to prior periods and trends over time (which were not available in the original reports). Importantly, they also got a basic pivot table, which they could use to further examine what was going on. These simple steps (if a bit laborious for me) had a massive impact. I later replaced the Excel with pages I wrote in a new web-reporting tool we built in house. Ultimately, my team moved these to our strategic Analytics platform.

This shows how point solutions can be very valuable and also morph into more strategic facilities over time.

2. Data Process Improvements

Data issues may be to do with a range of problems from poor validation in systems, to bad data integration, but immature data processes and insufficient education for data entry staff are often key conributors to overall problems. Identifying such issues and quantifying their impact should be the province of a Data Audit, which is something I would recommend considering early on in a data programme. Once more this can be basic at first, considering just superficial issues, and then expand over time.

While fixing some data process problems and making a stepped change in data quality will both probably take time an effort, it may be possible to identify and target some narrower areas in which progress can be made quite quickly. It may be that one key attribute necessary for analysis is poorly entered and validated. Some good communications around this problem can help, better guidance for people entering it is also useful and some “quick and dirty” reporting highlighting problems and – hopefully – tracking improvement can make a difference quicker than you might expect [9].

3. Data Architecture Enhancements

Improving a Data Architecture sounds like a multi-year task and indeed it can often be just that. However, it may be that there are some areas where judicious application of limited resource and funds can make a difference early on. A team engaged in a data programme should seek out such opportunities and expect to devote time and attention to them in parallel with other work. Architectural improvements would be best coordinated with data process improvements where feasible.

An example might be providing a web-based tool to look up valid codes for entry into a system. Of course it would be a lot better to embed this functionality in the system itself, but it may take many months to include this in a change schedule whereas the tool could be made available quickly. I have had some success with extending such a tool to allow users to build their own hierarchies, which can then be reflected in either point analytics solutions or more strategic offerings. It may be possible to later offer the tool’s functionality via web-services allowing it to be integrated into more than one system.

4. Data Strategy

I have written extensively about Data Strategy on this site [10]. What I wanted to cover here is the interplay between Data Strategy and some of the other areas I have just covered. It might be thought that Data Strategy is both carved on tablets of stone [11] and stands in splendid and theoretical isolation, but this should not ever be the case. The development of a Data Strategy should of course be informed by a situational analysis and a vision of “what good looks like” for an organisation. However, both of these things can be shaped by early tactical work. Taking cues from initial tactical work should lead to a more pragmatic strategy, more aligned to business realities.

Work in each of the three areas itemised above can play an important role in shaping a Data Strategy and – as the Data Strategy matures – it can obviously guide interim work as well. This should be an iterative process with lots of feedback.

Closing Thoughts

I have captured the essence of these thoughts in the diagram above. The important things to take away are that in order to generate momentum, you need to start to do some stuff; to extend the physical metaphor, you have to start pushing. However, momentum is a vector quantity (it has a direction as well as a magnitude [12]) and building momentum is not a lot of use unless it is in the general direction in which you want to move; so push with some care and judgement. It is also useful to realise that – so long as your broad direction is OK – you can make refinements to your direction as you pick up speed.

The above thoughts are based on my experience in a range of organisations and I am confident that they can be applied anywhere, making allowance for local cultures of course. Once momentum is established, it still needs to be maintained (or indeed increased), but I find that getting the ball moving in the first place often presents the greatest challenge. My hope is that the framework I present here can help data practitioners to get over this initial hurdle and begin to really make a difference in their organisations.

Notes

 [1] Way back in 2009, I wrote about the benefits of leveraging data to provide enhanced information. The article in question was tited Measuring the benefits of Business Intelligence. Everything I mention remains valid today in 2018. [2] See also: [3] If I many be allowed to blow my own trumpet for a moment, I have developed data / information strategies for eight organisations, turned seven of these into a costed / planned programme and executed at least the first few phases of six of these. I have always found being a consistent presence through these phases has been beneficial to the organisations I was helping, as well as helping to reduce duplication of work. [4] See my, now rather venerable, trilogy about cultural change in data / information programmes: together with the rather more recent: [5] See for example: [6] Dictionary.com offers a nice explanation of this phrase.. [7] I was raised a Catholic, but have been areligious for many years. [8] Much like $x^2+x+1=0$. For anyone interested, the two roots of this polynomial are clearly: $-\dfrac{1}{2}+\dfrac{\sqrt{3}}{2}\hspace{1mm}i\hspace{5mm}\text{and}\hspace{5mm}-\dfrac{1}{2}-\dfrac{\sqrt{3}}{2}\hspace{1mm}i$ neither of which is Real. [9] See my rather venerable article, Using BI to drive improvements in data quality, for a fuller treatment of this area. [10] For starters see: and also the Data Strategy segment of The Anatomy of a Data Function – Part I. [11] [12] See Glimpses of Symmetry, Chapter 15 – It’s Space Jim….

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

Link directly to entries in the Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary has always had internal tags (anchors for those old enough to recall their HTML) which allowed me, as its author, to link to individual entries from other web-pages I write. An example of the use of these is my article, A Brief History of Databases.

I have now made these tags public. Each entry in the Dictionary is followed by the full tag address in a box. This is accompanied by a link icon as follows:

Clicking on the link icon will copy the tag address to your clipboard. Alternatively the tag URL may just be copied from the box containing it directly. You can then use this address in your own article to link back to the D&AD entry.

As with the vast majority of my work, the contents of the Data and Analytics Dictionary is covered by a Creative Commons Attribution 4.0 International Licence. This means you can include my text or images in your own web-pages, presentations, Word documents etc. You can even modify my work, so long as you point out that you have done this.

If you would like to link back to the Data and Analytics Dictionary to provide definitions of terms that you are using, this should now be very easy. For example:

Lorem ipsum dolor sit amet, consectetur adipiscing Big Data elit. Duis tempus nisi sit amet libero vehicula Data Lake, sed tempor leo consectetur. Pellentesque suscipit sed felisData Governance ac mattis. Fusce mattis luctus posuere. Duis a Spark mattis velit. In scelerisque massa ac turpis viverra, acLogistic Regression pretium neque condimentum.

Equally, I’d be delighted if you wanted to include part of all of the text of an entry in the Data and Analytics Dictionary in your own work, commercial or personal; a link back using this new functionality would be very much appreciated.

I hope that this new functionality will be useful. An update to the Dictionary’s contents will be published in the next couple of months.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

Sic Transit Gloria Magnorum Datorum

It happens to all of us eventually I suppose.

Just the other day, I heard someone referring to “traditional Big Data”. Since when did Big Data become “traditional”, I didn’t get the e-mail? Of course, in the technology field, the epithet “traditional” is code for “broken”, “no longer of any use” and – most damningly of all – “deeply uncool”. The term is widely used, whether – with this connotation – it is either helpful or accurate is perhaps a matter for debate. This usage makes me recall the rather silly debate about Analytics versus “traditional” Business Intelligence that occurred around 2009 [1].

By way of context, the person talking about “traditional Big Data” was referring to the difference between some of the original denizens of the Hadoop ecosystem and more recent offerings like Databricks or Beam. They also had in mind the various quasi-proprietary flavours of Big Data and/or Big Data plug-ins offered by (that word again) “traditional” vendors. In this sense, the usage is probably appropriate, albeit somewhat jarring. In the more pejorative sense I refer to above, “traditional” is somewhat misleading when applied to either Big Data or – in the author’s opinion – several of its precursors.

While we inhabit a world which places a premium on innovation, favouring the new and the shiny [2], traditional methods have much to offer. If something – a technique or technology – has achieved “traditional” status, it means that it has become part of how things are done. While shaking up the status quo can be beneficial, “traditional” approaches have the not insignificant benefit of having been tried and tested. “Traditional” data tools are ones that have survived some time and are still used. While not guaranteeing success, it should at least be possible to be successful with such tools because other people have done this before.

Maybe, several years after its move into the mainstream, Big Data has become “traditional”. However I would take this as meaning “fit for purpose”, “useful” and “still pretty cool”. Then I think the same about many of the technologies that were described as “traditional” in contrast to Big Data. As ever, the main things that lead to either success or failure in data-centric work [3] have very little to do with technology, be that traditional or à la mode.

Notes

 [1] If you have the stomach for it, see Business Analytics vs Business Intelligence and succeeding articles. [2] See also 2009’s The latest and greatest versus the valuable. [3] I itemise a few of these in last year’s 20 Risks that Beset Data Programmes.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

A further extension of the Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. A larger update is in the works, but for now here are a dozen new definitions:

As previously stated, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

A Retrospective of 2017’s Articles

This article was originally intended for publication late in the year it reviews, but, as they [1] say, the best-laid schemes o’ mice an’ men gang aft agley…

In 2017 I wrote more articles [2] than in any year since 2009, which was the first full year of this site’s existence. Some were viewed by thousands of people, others received less attention. Here I am going to ignore the metric of popular acclaim and instead highlight a few of the articles that I enjoyed writing most, or sometimes re-reading a few months later [3]. Given the breadth of subject matter that appears on peterjamesthomas.com, I have split this retrospective into six areas, which are presented in decreasing order of the number of 2017 articles I wrote in each. These are as follows:

In each category, I will pick out two or three of pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

I would like to close this review of 2017 with a final article, one that somehow defies classification:

 April 25 Indispensable Business Terms An illustrated Buffyverse take on Business gobbledygook – What would Buffy do about thinking outside the box? To celebrate 20 years of Buffy the Vampire Slayer and 1st April 2017.

Notes

 [1] “They” here obviously standing for Robert Burns. [2] Thirty-four articles and one new page. [3] Of course some of these may also have been popular, I’m not being masochistic here!

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

The Anatomy of a Data Function – Part III

 Part I Part II Part III

This is the third and final part of my review of the anatomy of a Data Function, Part I may be viewed here and Part II here.

 Update: The data arena is a fluid one. The original set of Anatomy of a Data Function articles dates back to November 2017. As of August 2018, the data function schematic has been updated to separate out Artificial Intelligence from Data Science and to change the latter to Data Science / Engineering. No doubt further changes will be made from time to time.

In the first article, I introduced the following Data Function organogram:

and went on to cover each of Data Strategy, Analytics & Insight and Data Operations & Technology. In Part II, I discussed the two remaining Data Function areas of Data Architecture and Data Management. In this final article, I wanted to cover the Related Areas that appear on the right of the above diagram. This naturally segues into talking about the practicalities of establishing a Data Function and highlighting some problems to be avoided or managed.

As in Parts I and II, unless otherwise stated, text indented as a quotation is excerpted from the Data and Analytics Dictionary.

Related Areas

I have outlined some of the key areas with which the Data Function will work. This is not intended to be a comprehensive list and indeed the boxes may be different in different organisations. Regardless of the departments that appear here, the general approach will however be similar. I won’t go through each function in great detail here. There are some obvious points to make however. The first is an overall one that clearly a collaborative approach is mandatory. While there are undeniably some police-like attributes of any Data Function, it would be best if these were carried out by friendly community policemen or women, not paramilitaries.

So rather more:

and rather less:

Data Privacy and Information Security

Though strongly related, these areas do not generally fall under the Data Function. Indeed some legislation requires that they are separate functions. Data Privacy and Information Security are related, but also distinct from each other. Definitions are as follows:

[Data Privacy] pertains to data held by organisations about individuals (customers, counterparties etc.) and specifically to data that can be used to identify people (personally identifiable data), or is sensitive in nature, such as medical records, financial transactions and so on. There is a legal obligation to safeguard such information and many regulations around how it can be used and how long it can be retained. Often the storage and use of such data requires explicit consent from the person involved.

Data and Analytics Dictionary entry: Data Privacy

Information Security consists of the steps that are necessary to make sure that any data or information, particularly sensitive information (trade secrets, financial information, intellectual property, employee details, customer and supplier details and so on), is protected from unauthorised access or use. Threats to be guarded against would include everything from intentional industrial espionage, to ad hoc hacking, to employees releasing or selling company information. The practice of Information Security also applies to the (nowadays typical) situation where some elements of internal information is made available via the internet. There is a need here to ensure that only those people who are authenticated to access such information can do so.

Data and Analytics Dictionary entry: Information Security

Digital

Digital is not a box that would have necessarily have appeared on this chart 15, or even 10, years ago. However, nowadays this is often an important (and large) department in many organisations. Digital departments leverage data heavily; both what they gather themselves and and data drawn from other parts of the organisation. This can be to show customers their transactions, to guide next best actions, or to suggest potentially useful products or services. Given this, collaboration with the Data Function should be particularly strong.

Change Management

There are some specific points to make with respect to Change collaboration. One dimension of this was covered in Part II. Looking at things the other way round, as well as being a regular department, with what are laughingly referred to as “business as usual” responsibilities [1], the Data Function will also drive a number of projects and programmes. Depending on how this is approached in an organisation, this means either that the Data Function will need its own Project Managers etc., or to have such allocated from Change. This means that interactions with Change are bidirectional, which may be particularly challenging.

For some reason, Change departments have often ended up holding the purse strings for all projects and programmes (perhaps a less than ideal outcome), so a Data Function looking to get its own work done may run counter to this (see also the second section of this article).

IT

While the role of IT is perhaps narrower nowadays than historically [2], they are deeply involved in the world of data and the infrastructure that supports its movement around the organisation. This means that the Data Function needs to pay particular attention to its relationship with IT.

Embedded Analytics Teams

A wholly centralised approach to delivering Analytics is neither feasible, nor desirable. I generally recommend hybrid arrangements with a strong centralised group and affiliated analytical resource embedded in business teams. In some organisations such people may be part of the Data Function, or have a dotted line into it. In others the connection may be less formal. Whatever the arrangements, the best result would be if embedded analytical staff viewed themselves as part of a broader analytical and data community, which can share tips, work to standards and leverage each other’s work.

Data Stewards

Data Stewards are a concept that arises from a requirement to embed Data Governance policies and processes. Data Function Governance staff and Data Architects both need to work closely with Data Stewards. A definition is as follows:

This is a concept that arises out of Data Governance. It recognises that accountability for things like data quality, metadata and the implementation of data policies needs to be devolved to business departments and often locations. A Data Steward is the person within a particular part of an organisation who is responsible for ensuring that their data is fit for purpose and that their area adheres to data policies and guidelines.

Data and Analytics Dictionary entry: Data Steward

End User Computing

There are several good reasons for engaging with this area. First, the various EUCs that have been developed will embody some element (unsatisfied elsewhere) of requirements for the processing and or distribution of data; these needs probably need to be met. Second, EUCs can present significant risks to organisations (as well as delivering significant benefits) and ameliorating these (while hopefully retaining the benefits) should be on the list of any Data Function. Third, the people who have built EUCs tend to be knowledgeable about an organisation’s data, the sort of people who can be useful sources of information and also potential allies.

[End User Computing] is a term used to cover systems developed by people other than an organisation’s IT department or an approved commercial software vendor. It may be that such software is developed and maintained by a small group of people within a department, but more typically a single person will have created and cares for the code. EUCs may be written in mainstream languages such as Java, C++ or Python, but are frequently instead Excel- or Access-based, leveraging their shared macro/scripting language, VBA (for Visual Basic for Applications). While related to Microsoft Visual Basic (the precursor to .NET), VBA is not a stand-alone language and can only run within a Microsoft Office application, such as Excel.

Data and Analytics Dictionary entry: End User Computing (EUC)

Third Party Providers

Often such organisations may be contracted through the IT function; however the Data Function may also hire its own consultants / service providers. In either case, the Data Function will need to pay similar attention to external groups as it does to internal service providers.

Building a Data Function for the Practical Man [3]

Starting Small

It is a truth universally acknowledged, that a Leader newly in possession of a Data Function, must be in want of some staff [5]. However seldom will such a person be furnished with a budget and headcount commensurate with the task at hand; at least in the early days. Often instead, the mission, should you choose to accept it, is to begin to make a difference in the Data World with a skeleton crew at best [6]. Well no one can work miracles and so it is a question of judgement where to apply scarce resource.

My view is that this is best applied in shining a light on the existing data landscape, but in two ways. First, at the Analytics end of the spectrum, looking to unearth novel findings from an organisation’s data; the sort of task you give to a capable Data Scientist with some background in the industry sector they are operating in. Second, at the Governance end of the spectrum, documenting failures in existing data processing and reporting; in particular any that could expose the organisation to specific and tangible risks. In B2C organisations, an obvious place to look is in customer data. In B2B ones instead you can look at transactions with counterparties, or in the preparation of data for external reports, either Financial or Regulatory. Here the ideal person is a competent Data Analyst with some knowledge of the existing data landscape, in particular the compromises that have to be made to work with it.

In both cases, the objective is to tell the organisation things it does not know. Positively, a glimmer of what nuggets its data holds and the impact this could have. Negatively, examples of where a poor data landscape leads to legal, regulatory, or reputational risks.

These activities can add value early on and increase demand for more of this type of work. The first investigation can lead to the creation of a Data Science team, the second to the establishment of regular Data Audits and people to run these.

A corollary here is a point that I ceaselessly make, data exploitation and data control are two sides of the same coin. By making progress in areas that are at least superficially at antipodal locations within a Data Function, the connective tissue between them becomes more apparent.

BAU or Project?

There is a pernicious opinion held by an awful lot of people which goes as follows.

1. We have issues with our data, its quality, completeness and fitness for purpose.
2. We do not do a good enough job of leveraging our data to guide decision making.
3. Therefore we need a data project / programme to sort this out once and for all.
4. Where is the telephone number of the Change Director?

Well there is some logic to the above and setting up a data project (more likely programme) is a helpful thing to do. However, this is necessary, but not sufficient [7]. Let’s think of a comparison?

1. We need to ensure that our Financial and Management accounts are sound.
3. Therefore we need a Finance project / programme to sort this out once and for all.
4. Where is the telephone number of the Change Director?

Most CFOs would view the above as their responsibility. They have an entire function focussed on such matters. Of course they may want to run some Finance projects and Change will help with this, but a Finance Department is an ongoing necessity.

To pick another example one that illustrates just how quickly the make-up of organisations can change, just replace the word “Finance” with “Risk” in the above and “CFO” with “CRO”. While programmes may be helpful to improve either Risk or Finance, they do not run the Risk or Finance functions, the designated officers do and they have a complement of staff to assist them. It is exactly the same with data. Data programmes will enhance your use of data or control of it, but they will not ensure the day-to-day management and leverage of data in your organisation. Running “data” is the responsibility of the designated officer [8] and they should have a complement of staff to assist them as well.

The Data Function is a “business as usual” [9] function. Conveying this fact to a range of stakeholders is going to be one of the first challenges. It may be that the couple of examples I cite above can provide some ammunition for this task.

Demolishing Demoralising Demarcations

With Data Functions and their leaders both being relative emergent phenomena [10], the separation of duties between them and other areas of a business that also deal with data can be less than clear. Scanning down the Related Areas column of the overall Data Function chart, three entities stand out who may feel that they have a strong role to play in data matters: Digital, Change Management and IT.

Of course each is correct and collaboration is the best way forward. However, human nature is not always do benign and I have several times seen jockeying for position between Data, Digital, Change and IT. Route A to resolving this is of course having clarity as to everyone’s roles and a lead Executive (normally a CEO or COO) who ensures that people play nicely with each other. Back in the real world, it will be down to the leaders in each of these areas to forge some sort of consensus about who does what and why. It is probably best to realise this upfront, rather than wasting time and effort lobbying Executives to rule on things they probably have no intention of ruling on.

Nascent Data Function leaders should be aware that there will be a tendency for other teams to carve out what might be seen as the sexier elements of Data work; this can almost seem logical when – for example – a Digital team already has a full complement of web analytics staff; surely it is just a matter of pointing these at other internal data sets, right?

If we assume that the Data Function is the last of the above mentioned departments to form, then “zero sum game” thinking would dictate that whatever is accretive to the Data Function is deleterious to existing data staff in other departments. Perhaps a good place to start in combatting this mind-set is to first acknowledge it and second to take steps to allay people’s fears. It may well make sense for some staff to gravitate to the Data Function, but only if there is a compelling logic and only if all parties agree. Offering the leaders of other departments joint decision-making on such sensitive issues can be a good confidence-building step.

Setting out explicitly to help colleagues in other departments, where feasible to do so, can make very good sense and begin the necessary work of building bridges. As with most areas of human endeavour, forging good relationships and working towards the common good are both the right thing to do and put the Data Function leader in a good place as and when more contentious discussions arise.

To make this concrete, when people in another function appear to be stepping on the toes of the Data Function, instead of reacting with outrage, it may be preferable to embrace and fully understand the work that is being done. It may even make sense to support such work, even if the ultimate view is to do things a bit differently. Insisting on organisational purity and a “my way, or the highway” attitude to data matters are both steps towards a failed Data Function. Instead, engage, listen, support and – maybe over time – seek to nudge things towards your desired state.

Closing Thoughts

So we have reached the end of our anatomical journey. While maybe the information contained in these three articles would pale into insignificance compared to an actual course in human anatomy, we have nevertheless covered five main work-areas within a Data Function, splitting these down into nineteen sub-areas and cataloguing eight functions with which collaboration will be key in driving success. I have also typed over 8,000 words to convey my ideas. For those who have read all of them, thank you for your perseverance; I hope that the effort has been worthwhile and that you found some of my opinions thought-provoking.

I would also like to thank the various people who have provided positive feedback on this series via LinkedIn and Facebook. Your comments were particularly influential in shaping this final chapter.

So what are the main takeaways? Well first the word collaboration has cropped up a lot and – because data is so pervasive in organisations – the need to collaborate with a wide variety of people and departments is strong. Second, extending the human anatomy analogy, while each human shares a certain basic layout (upright, bipedal, two arms, etc.), there is considerable variation within the basic parameters. The same goes for the organogram of a Data Function that I have presented at the beginning of each of these articles. The boxes may be rearranged in some organisations, some may not sit in the Data Function in others, the amount of people allocated to each work-area will vary enormously. As with human anatomy, grasping the overall shape is more important than focussing on the inevitable variations between different people.

Third, a central concept is of course that a Data Function is necessary, not just a series of data-centric projects. Even if it starts small, some dedicated resource will be necessary and it would probably be foolish to embark on a data journey without at least a skeleton crew. Fourth, in such straitened circumstances, it is important to point early and clearly to the value of data, both in reducing potentially expensive risks and in driving insights that can save money, boost market share or improve products or services. If the budget is limited, attend to these two things first.

A fifth and final thought is how little these three articles have focussed on technology. Hadoop clusters, data visualisation suites and data governance tools all have their place, but the success or failure of data-centric work tends to pivot on more human and process considerations. This theme of technology being the least important part of data work is one I have come back to time and time again over the nine years that this blog has been published. This observation remains as true today as back in 2008.

 Part I Part II Part III

Notes

 [1] BAU should in general be filed along with other mythical creatures such as Unicorns, Bigfoot, The Kraken and The Loch Ness Monster. [2] Not least because of the rise of Data Functions, Digital Teams and stand-alone Change Organisations. [3] A title borrowed from J E Thompson’s Calculus for the Practical Man; a tome read by the young Richard Feynman in childhood. Today “Calculus for the Practical Person” might be a more inclusive title. [4] Also known as “pulling yourself up by your bootstraps”. [5] I seem to be channelling JA a lot at present – see A truth universally acknowledged…. [6] Indeed I have stated on this particular journey with just myself for company on no fewer than for occasions (these three 1, 2, 3, plus at Bupa). [7] Once a Mathematician, always a Mathematician. [8] See Alphabet Soup for some ideas about what he or she might be called. [9] See note 1. [10] Despite early high-profile CDOs beginning to appear at the turn of the millennium – Joe Bugajski was appointed VP and Chief Data Officer at Visa International in 2001 (Wikipedia).

From: peterjamesthomas.com, home of The Data and Analytics Dictionary