The title of this article is borrowed from a piece published by recruitment consultants La Fosse Associates earlier in the year. As its content consisted of me being interviewed by their Senior Managing Consultant, Liam Grier, I trust that I won’t get accused of plagiarism. Liam and I have known each other for years and so I was happy to work with him on this interview.
As part of my consulting business, I end up thinking about Data Capability Frameworks quite a bit. Sometimes this is when I am assessing current Data Capabilities, sometimes it is when I am thinking about how to transition to future Data Capabilities. Regular readers will also recall my tripartite series on The Anatomy of a Data Function, which really focussed more on capabilities than purely organisation structure.
Detailed frameworks like the one contained in Anatomy are not appropriate for all audiences. Often I need to provide a more easily-absorbed view of what a Data Function is and what it does. The exhibit above is one that I have developed and refined over the last three or so years and which seems to have resonated with a number of clients. It has – I believe – the merit of simplicity. I have tried to distil things down to the essentials. Here I will aim to walk the reader through its contents, much of which I hope is actually self-explanatory.
The overall arrangement has been chosen intentionally: the top three areas are visible activities, while the bottom three are more foundational areas, ones that are necessary for the top three boxes to be discharged well. I will start at the top left and work across and then down.
Collation of Data to provide Information
This area includes what is often described as “traditional” reporting, Dashboards and analysis facilities. The Information created here is invaluable for both determining what has happened and discerning trends / turning points. It is typically what is used to run an organisation on a day-to-day basis. Absence of such Information has been the cause of underperformance (or indeed major losses) in many an organisation, including a few that I have been brought in to help. The flip side is that making the necessary investments to provide even basic information has been at the heart of the successful business turnarounds that I have been involved in.
The bulk of Business Intelligence efforts would also fall into this area, but there is some overlap with the area I next describe as well.
Leverage of Data to generate Insight
In this second area we have disciplines such as Analytics and Data Science. The objective here is to use a variety of techniques to tease out findings from available data (both internal and external) that go beyond the explicit purpose for which it was captured. Thus data to do with bank transactions might be combined with publicly available demographic and location data to build an attribute model for both existing and potential clients, which can in turn be used to make targeted offers or product suggestions to them on Digital platforms.
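As a minimal sketch of the kind of combination described above, the Python below joins a handful of made-up bank transactions to made-up demographic data by postcode district and derives a few customer attributes. All field names, figures and the offer rule are hypothetical and purely illustrative; a real attribute model would be far richer.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical internal data: (customer_id, postcode_district, amount, category)
transactions = [
    ("c1", "SW1", 120.0, "travel"),
    ("c1", "SW1", 80.0, "groceries"),
    ("c2", "M1", 40.0, "groceries"),
    ("c2", "M1", 30.0, "travel"),
    ("c2", "M1", 20.0, "groceries"),
]

# Hypothetical external data, keyed by postcode district
demographics = {
    "SW1": {"median_age": 38, "urban": True},
    "M1": {"median_age": 29, "urban": True},
}


def build_attributes(transactions, demographics):
    """Combine internal transactions with external demographics per customer."""
    spend = defaultdict(list)
    travel = defaultdict(float)
    district = {}
    for customer, postcode, amount, category in transactions:
        spend[customer].append(amount)
        district[customer] = postcode
        if category == "travel":
            travel[customer] += amount

    attributes = {}
    for customer, amounts in spend.items():
        total = sum(amounts)
        attributes[customer] = {
            "total_spend": total,
            "average_transaction": mean(amounts),
            "travel_share": travel[customer] / total,
            # External attributes are joined on postcode district
            **demographics.get(district[customer], {}),
        }
    return attributes


def suggest_offer(profile):
    """Illustrative rule only: offer a travel product to heavy travel spenders."""
    return "travel_card" if profile["travel_share"] >= 0.5 else "cashback_card"


attributes = build_attributes(transactions, demographics)
offers = {customer: suggest_offer(profile) for customer, profile in attributes.items()}
```

The point of the sketch is simply that neither data source alone yields the combined profile: the spending pattern comes from internal transactions, while the demographic attributes come from the external data.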
It is my experience that work in this area can have a massive and rapid commercial impact. There are few activities in an organisation where a week’s work can equate to a percentage point increase in profitability, but I have seen insight-focussed teams deliver just that type of ground-shifting result.
Control of Data to ensure it is Fit-for-Purpose
This refers to a wide range of activities from Data Governance to Data Management to Data Quality improvement and indeed related concepts such as Master Data Management. Here, as well as the obvious policies, processes and procedures, together with help from tools and technology, we see the need for the human angle to be embraced via strong communications, education programmes and aligning personal incentives with desired data quality outcomes.
The primary purpose of this important work is to ensure that the information an organisation collates and the insight it generates are reliable. A helpful by-product of doing the right things in these areas is that the vast majority of what is required for regulatory compliance is achieved simply by doing things that add business value anyway.
Data Architecture / Infrastructure
Best practice has evolved in this area. When I first started focussing on the data arena, Data Warehouses were state of the art. More recently Big Data architectures, including things like Data Lakes, have appeared and – at least in some cases – begun to add significant value. However, I am on public record multiple times stating that technology choices are generally the least important in the journey towards becoming a data-centric organisation. This is not to say such choices are unimportant, but rather that other choices are more important, for example how best to engage your potential users and begin to build momentum.
Having said this, the model that seems to have emerged of late is somewhat different to the single version of the truth aspired to for many years by organisations. Instead best practice now encompasses two repositories: the first Operational, the second Analytical. At a high level, arrangements would be something like this:
The Operational Repository would contain a subset of corporate data. It would be highly controlled, highly reconciled and used to support both regular reporting and a large chunk of dashboard content. It would be designed to also feed data to other areas, notably Finance systems. This would be complemented by the Analytical Repository, into which most corporate data (augmented by external data) would be poured. This would be accessed by a smaller number of highly skilled staff, Data Scientists and Analytics experts, who would use it to build models, produce one-off analyses and to support areas such as Data Visualisation and Machine Learning.
It is not atypical for Operational Repositories to be SQL-based and Analytical Repositories to be Big Data-based, but you could use SQL for both or indeed Big Data for both according to the circumstances of an organisation and its technical expertise.
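One way to picture the split is as a routing rule over data sets. The sketch below is an illustration under stated assumptions, not a reference architecture: the `DataSet` fields, the example data sets and the simplified rule (everything lands in the Analytical Repository, while only reconciled corporate data also lands in the Operational Repository) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    external: bool    # sourced from outside the organisation
    reconciled: bool  # controlled and reconciled, e.g. to Finance figures


def route(data_set: DataSet) -> list[str]:
    """Return the repositories a data set lands in (illustrative rule only)."""
    # Simplification: corporate and external data alike go to the Analytical Repository.
    targets = ["analytical"]
    # Only a highly controlled, reconciled subset of corporate data is also
    # held in the Operational Repository to support regular reporting.
    if data_set.reconciled and not data_set.external:
        targets.insert(0, "operational")
    return targets


# Hypothetical catalogue entries
catalogue = [
    DataSet("general_ledger", external=False, reconciled=True),
    DataSet("web_clickstream", external=False, reconciled=False),
    DataSet("census_demographics", external=True, reconciled=False),
]

routing = {data_set.name: route(data_set) for data_set in catalogue}
```

Nothing in the rule depends on the underlying technology, which is consistent with the observation that either repository could be SQL-based or Big Data-based.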
Data Operating Model / Organisation Design
Here I will direct readers to my (soon to be updated) earlier work on The Anatomy of a Data Function. However, it is worth mentioning a couple of additional points. First, an Operating Model for data must encompass the whole organisation, not just the Data Function. Such a model should cover how data is captured, sourced and used across all departments.
Second, I think that the concept of a Data Community is important here: a web of like-minded Data Scientists and Analytics people, sitting in various business areas and support functions, but linked to the central hub of the Data Function by common tooling, shared data sets (ideally Curated) and aligned methodologies. Such a virtual data team is of course predicated on an organisation hiring collaborative people who want to be part of and contribute to the Data Community, but those are the types of people that organisations should be hiring anyway.
Data Strategy
Our final area is that of Data Strategy, something I have written about extensively in these pages and a major part of the work that I do for organisations.
It is an oft-repeated truism that a Data Strategy must reflect an overarching Business Strategy. While this is clearly the case, often things are less straightforward. For example, the Business Strategy may be in flux; this is particularly the case where a turn-around effort is required. Also, how the organisation uses data for competitive advantage may itself become a central pillar of its overall Business Strategy. Either way, rather than waiting for a Business Strategy to be finalised, there are a number of things that will need to be part of any Data Strategy: the establishment of a Data Function; a focus on making data fit-for-purpose to better support both information and insight; creation of consistent and business-focussed reporting and analysis; and the introduction or augmentation of Data Science capabilities. Many of these activities can help to shape a Business Strategy based on facts, not gut feel.
More broadly, any Data Strategy will include: a description of where the organisation is now (threats and opportunities); a vision for commercially advantageous future data capabilities; and a path for moving between the current and the future states. Rather than being PowerPoint-ware, such a strategy needs to be communicated assiduously and in a variety of ways so that it can be both widely understood and form a guide for data-centric activities across the organisation.
As per my other articles, the data capabilities that a modern organisation needs are broader and more detailed than those I have presented here. However, I have found this simple approach a useful place to start. It covers all the basic areas and provides a scaffold from which more detailed capabilities may be hung.
The framework has been informed by what I have seen and done in a wide range of organisations, but of course it is not necessarily the final word. As always I would be interested in any general feedback and in any suggestions for improvement.
In passing, Anatomy is due for its second refresh, which will put greater emphasis on Data Science and its role as an indispensable part of a modern Data Function. Watch this space.
Though nowadays you hear “traditional” Analytics and “traditional” Big Data as well (on the latter see Sic Transit Gloria Magnorum Datorum), no doubt “traditional” Machine Learning will be with us at some point, if it isn’t here already.
This is the second year in which I have produced a retrospective of my blogging activity. As in 2017, I have failed miserably in my original objective of posting this early in January. Despite starting to write this piece on 18th December 2018, I have somehow sneaked into the second quarter before getting round to completing it. Maybe I will do better with 2019’s highlights!
Anyway, 2018 was a record-breaking year for peterjamesthomas.com. The site saw more traffic than in any other year since its inception; indeed hits were over a third higher than in any previous year. This increase was driven in part by the launch of my new Maths & Science section, articles from which claimed no fewer than 6 slots in the 2018 top 10 articles, when measured by hits. Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009, our first year in business.
As with every year, some of my work was viewed by tens of thousands of people, while other pieces received less attention. This is my selection of the articles that I enjoyed writing most, which does not always overlap with the most popular ones. Given the advent of the Maths & Science section, there are now seven categories into which I have split articles. These are as follows:
In each category, I will pick out one or two pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.
Two Forbes articles argue different perspectives about the role of Chief Data Officer. The first (by Lauren deLisa Coleman) stresses its importance, the second (by Randy Bean) highlights some of the challenges that CDOs face.
Many companies want to become data driven, but getting started on the journey towards this goal can be tough. This article offers a framework for building momentum in the early stages of a Data Programme.
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
One of the more recent chapters in my forthcoming book on Group Theory and Particle Physics. This focuses on the seminal contributions of Mathematician Emmy Noether to the fundamentals of Physics and the connection between Symmetry and Conservation Laws.
This Fox has a longing for grapes:
He jumps, but the bunch still escapes.
So he goes away sour;
And, ’tis said, to this hour
Declares that he’s no taste for grapes.
— W.J.Linton (after Aesop)
Not all of the organisations I have worked with or for have had a C-level Executive accountable primarily for Marketing. Where they have, I have normally found the people holding these roles to be better informed about data matters than their peers. I have always found it easy and enjoyable to collaborate with such people. The same goes in general for Marketing Managers. This article is not about Marketing professionals; it is about poorly researched journalism.
Chief data officers (ditto) are becoming increasingly common, but for a data strategy to work their appointments can only ever be a temporary fix.
Intrigued, I felt I had to avail myself of the wisdom and domain expertise contained in the article (the clickbait worked of course). The first few paragraphs reveal the actual motivation. The piece is a reaction to the most senior Marketing person at easyJet being moved out of his role, which is being abolished, and – as part of the same reorganisation – a Chief Data Officer (CDO) being appointed. Now the first thing to say, based on the article’s introductory comments, is that easyJet did not have a Chief Marketing Officer. The role that was abolished was instead Chief Commercial Officer, so there was no one charged full-time with Marketing anyway. The Marketing responsibilities previously supported part-time by the CCO have now been spread among other executives.
The next part of the article covers the views of a Marketing Week columnist (pause for irony) before moving on to arrangements for the management of data matters in three UK-based organisations:
Flubit – a growing on-line marketplace aiming to compete with Amazon
The first two of these have CDOs (albeit with one doing the role alongside other responsibilities). Both of these people:
[…] come at data as people with backgrounds in its use in marketing
Flubit does not have a CDO, which is used as supporting evidence for the superfluous nature of the role.
Suffice it to say that a straw poll consisting of the handful of organisations that the journalist was able to get a comment from is not the most robust of approaches. Most of the time, the article does nothing more than reflect the continuing confusion about whether or not organisations need CDOs and – assuming that they do – what their remit should be and who they should report to.
But then, without (it has to be said) much supporting evidence, the piece goes on to add that:
Most [CDOs – they would probably style it “Cdos”] are brought in to instill a data strategy across the business; once that is done their role should no longer be needed.
Now as a Group Theoretician, I am a great fan of symmetry. Symmetry relates to properties that remain invariant when something else is changed. Archetypally, an equilateral triangle is still an equilateral triangle when rotated by 120°. More concretely, the laws of motion work just fine if we wind the clock forward 10 seconds (which incidentally leads to the principle of conservation of energy).
Let’s assume that the Marketing Week assertion is true. I claim therefore that it must still be true under the symmetry of changing the C-level role. This would mean that the following also has to be true:
Most [Chief marketing officers] are brought in to instill a marketing strategy across the business; once that is done their role should no longer be needed.
Now maybe this statement is indeed true. However, I can’t really see the guys and gals at Marketing Week agreeing with this. So maybe it’s false instead. Then – employing reductio ad absurdum – the initial statement is also false.
If you don’t work in Marketing, then maybe a further transformation will convince you:
Most [Chief financial officers] are brought in to instill a finance strategy across the business; once that is done their role should no longer be needed.
I could go on, but this is already becoming as tedious to write as it was to read the original Marketing Week claim. The closing sentence of the article is probably its most revealing and informative:
[…] marketers must make sure they are leading [the data] agenda, or someone else will do it for them.
I will leave readers to draw their own conclusions on the merits of this piece and move on to other thoughts that reading it spurred in me.
Sometimes buried in the strangest of places you can find something of value, even if the value is different to the intentions of the person who buried it. Around some of the CDO forums that I attend there is occasionally talk about just the type of issue that Marketing Week raises. An historical role that often comes up in these discussions is that of Chief Electrification Officer. This supposedly was an Executive role in organisations as the 19th Century turned into the 20th and electricity grids began to be created. The person ostensibly filling this role would be responsible for shepherding the organisation’s transition from earlier forms of power (e.g. steam) to the new-fangled streams of electrons. Of course this role would be very important until the transition was completed; after that, redundancy surely beckoned.
Well to my way of thinking, there are a couple of problems here. The first one of these is alluded to by my choice of the words “supposedly” and “ostensibly” above. I am not entirely sure, based on my initial research, that this role ever actually existed. All the references I can find to it are modern pieces comparing it to the CDO role, so perhaps it is apocryphal.
The second is somewhat related. Electrification was an engineering problem; indeed the [US] National Academy of Engineering called it “the greatest engineering achievement of the 20th Century”. Surely the people tackling this would be engineers, potentially led by a Chief Engineer. Did the completion of electrification mean that there was no longer a need for engineers, or did they simply move on to the next engineering problem?
Extending this analogy, I think that Chief Data Officers are more like Chief Engineers than Chief Electrification Officers, assuming that the latter even existed. Why the confusion? Well I think part of it is because, over the last decade and a bit, organisations have been conditioned to believe the one-dimensional perspective that everything is a programme or a project. I am less sure that this applies 100% to the CDO role.
It may well be that one thing that a CDO needs to get going is a data transformation programme. This may purely be focused on cultural aspects of how an organisation records, shares and otherwise uses data. It may be to build a new (or a first) Data Architecture. It may be to remediate issues with an existing Data Architecture. It may be to introduce or expand Data Governance. It may be to improve Data Quality. Or (and, in my experience, this is often the most likely) a combination of all these five, plus other work, such as rapid tactical or interim deliveries. However, there is also a large element of data-centric work which is not project-based and instead falls into the category often described as “business as usual” (I loathe this term – I think that Data Operations & Technology is preferable). A handful of examples are as follows (this is not meant to be an exhaustive list):
Addressing architectural debt that results from neglect of Data Assets or the frequently deleterious impact of improperly governed change portfolios. This is often a series of small to medium-sized changes, rather than a project with a discrete scope and start and end dates.
More positively, engaging proactively in the change process in an attempt to act as a steward of Data Assets.
Testing and re-testing of Data facilities subject to change or change in source Data.
Providing training in the use of Data facilities or the importance of getting Data right-first-time.
The above all point to the need for an ongoing Data Function to meet these needs (and to form the core resources of any data programme / project work). I describe such a function in my series about The Anatomy of a Data Function.
There are of course many other such examples, but instead of cataloguing each of them, let’s return to what Marketing Week describe as the central responsibility of a CDO, to formulate a Data Strategy. Surely this is a one-off activity, right?
Well is the Marketing strategy set once and then never changed? If there is some material shift in the overall Business strategy, might the Marketing strategy change as a result? What would be the impact on an existing Marketing strategy of insight showing that this was being less than effective; might this lead to the development of a new Marketing strategy? Would the Marketing strategy need to be revised to cater for new products and services, or new segments and territories? What would be the impact on the Marketing strategy of an acquisition or divestment?
As anyone who has spent significant time in the strategy arena will tell you, it is a fluid area. Things are never set in stone and strategies may need to be significantly revised or indeed abandoned and replaced with something entirely new as dictated by events. Strategy is not a fire and forget exercise, not if you want it to be relevant to your business today, as opposed to a year ago. Specifically with Data Strategy (as I explain in Building Momentum – How to begin becoming a Data-driven Organisation), I would recommend keeping it rather broad brush at the beginning of its development, allowing it to be adapted based on feedback from initial interim work and thus ensuring it better meets business needs.
So expecting a Data Strategy (or any other type of strategy) to be done and dusted, with the key strategist dispensed with, is probably rather naive.
It would be really nice to think that sorting out their Data problems and seizing their Data opportunities are things that organisations can do once and then forget about. With twenty years’ experience of helping organisations to become more Data-centric, often with technical matters firmly in the background, I have to disabuse people of this all too frequent misconception. To adapt the National Canine Defence League’s long-lived slogan from 1978:
A Chief Data Officer is for life, not just for Christmas.
With that out of the way, I’m off to write a well-informed and insightful article about how Marketing Departments should go about their business. Wish me luck!
I first wrote “knee-jerk reaction” and then thought that maybe I was being unkind. “When they go low, we go high” is a better maxim. Note: link opens a YouTube video.
I am sure that I read somewhere about the importance of the number of data points in any analysis, maybe I should ask a Data Scientist to remind me about this.
For a more balanced view of what real CDOs do, please take a look at my ongoing series of in-depth interviews.
Or Chief Electrical Officer, or Chief Electricity Officer.
I am doing some more digging and will of course update this piece should I find the evidence that has so far been elusive.
Self-driving electric cars come to mind of course. That or running a Starship.
As an aside, where do Programme Managers go when (or should that be if) their Programmes finish?
It might be argued that some of these operational functions could be handed to IT. However, given that some elements of data functions have probably been carved out of IT in the past, this might be a retrograde step.
The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. Here are 20 new definitions, including the first from other contributors (thanks Tenny!):
People are now also welcome to contribute their own definitions. You can use the comments section here, or the dedicated form. Submissions will be subject to editorial review and are not guaranteed to be accepted.
Recent trends have begun to drive a confluence between Mathematics and aspects of the data arena, notably to do with Data Science and Artificial Intelligence approaches like Machine Learning. For this reason, I will periodically share selected articles from the Maths & Science Section here on the main site. I hope that they are of interest to at least some regular readers. I’ll kick this off with three interconnecting articles about one of the most important numbers in Mathematics, e, one of the most misunderstood, π, and finally their beautiful relationship to each other.
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
The rather famous tautology, “It’s déjà vu all over again”, has of course been ascribed to that darling of malapropisms, baseball catcher Yogi Berra. The phrase came to mind for me today when coming across the following exhibit:
The text in the above exhibit is not that clear, so here are the 20 top challenges faced by those running Data Science teams in human-readable form:
Lack of Data Science talent in the organization
Company politics / Lack of management/financial support for a Data Science team
The lack of a clear question to be answering or a clear direction to go with available data
Unavailability of/difficult access to data
Data Science results not used by business decision makers
Explaining Data Science to others
Lack of significant domain expert input
Organization is small and cannot afford a Data Science team
Team using multiple ad hoc development environments such as Python/R/Java etc.
Limitations of tools
Need to coordinate with IT
Maintaining responsible expectations about the potential impact of Data Science projects
Inability to integrate findings into organization’s decision-making process
Lack of funds to buy useful datasets from external sources
Difficulties in deployment/scoring
Scaling Data Science solution up to full database
Limitations in the state of the art in machine learning
Did not instrument data useful for scientific analysis and decision-making
I prefer not to say
The table above is a transcription of a transcription, so it would be remarkable if no Data Quality issues had crept in; however, let’s assume that the figures are robust enough for our purposes. Of course the people surveyed will have reported multiple issues, so the percentages above are not additive. Nevertheless there are some very obvious comments to be made (some of the above items are pertinent to more than one of the points I would like to make):
Data Quality / Availability remain major issues – (1, 5, and 8)
It is indeed true that Machine Learning can be quite good at dealing with some types of bad or missing data. But no technology or approach is going to be able to paper over all of the cracks if your data is essentially incomplete and of poor quality. This point (together with some others below) speaks to the need to not approach Data Science on a stand-alone basis, but as part of a more holistic approach to data matters.
The Human angle and a focus on Culture are imperative – (3, 6, 7, 14 and 15)
Findings are one thing; using these to take action is quite another. At the end of the day, most ventures are successful or fail because of people; the people conducting the venture, the people receiving its intended benefits and so on. Ignore this dimension of data work (or any type of work) at your peril.
Business Questions and Business Involvement matter – (4, 6, 9 and 15)
While in some circumstances the data can indeed “speak for itself”, it makes a lot more sense for Data Scientists to partner with business colleagues to both get direction and to help ensure that their findings lead to action.
Tools & Technology typically Trumped – (11, 12 and 18)
These first appear outside of the Top 10 (and 11 is a bit dubious to include here – it relates more to a proliferation of tools than to issues with any of them). I would never say that tools and technology are unimportant, but they are typically much less important than other considerations.
The overriding point is of course that – much as I pointed out recently in Convergent Evolution – there is little new under the Sun. A survey of Business Intelligence / Data Warehousing professionals back in 2010 would have generated something very like the list above. A survey of EIS professionals back in 2000 would have done the same.
The important things to do – regardless of the technologies and approaches employed – are to:
Understand what questions are key to the running of an organisation
Determine what data is available to support decisions in these key areas
Ensure that the data is in a “good enough” state, appropriately consolidated / made consistent, augmented / corrected by any useful external data and made available to the right people in a timely manner
Focus on the human aspects of acting on what data is telling us and how to use data outputs to drive positive actions
Here too, little is new under the Sun. I have been referring to essentially these same four pillars of good practice since the mid 2000s. Some of our technological advances since then have been amazing. The prospect of leveraging the power of both Data Science and Artificial Intelligence in a business context is very exciting. But to truly succeed with these newer approaches, it helps to recall the eternal verities that have always underpinned good data-centric work. The survey above makes this point crystal clear.
A final corollary to this observation is something I covered in A truth universally acknowledged…. The replies to the Kaggle survey highlight the fact that, much like the conductor of an orchestra does not need to be able to play the violin to a virtuoso level, people leading Data Science teams (and broader Data Functions) need a set of rounded skills, ones honed to address the types of issues appearing in the exhibits above. The skill-set that makes for an excellent Data Scientist does not necessarily help so much with many of the less technical issues that will determine the success or failure of Data Science teams.
Other Yogi-isms included, “Always go to other people’s funerals; otherwise they won’t go to yours”, “You can observe a lot by watching” and “If you can’t imitate him, don’t copy him”.
A Data Visualisation challenge to include that much text, I realise. I think I might have been tempted to come up with pithier categories to aid legibility.
Today I am talking to Christopher Bannocks, who is Group Chief Data Officer at ING. ING is a leading global financial institution, headquartered in the Netherlands. As stressed in other recent In-depth interviews, data is a critical asset in banking and related activities, so Christopher’s role is a pivotal one. I’m very glad that he has been able to find time in his busy calendar to speak to us.
Hello Christopher, can you start by providing readers with a flavour of your career to date and perhaps also explain why you came to focus on the data arena.
Sure, it’s probably right to say I didn’t start out here, data was not my original choice, and for anyone of a similar age to me, data wasn’t a choice, when I started out, in that respect it’s a “new segment”. I started out on a management development programme in a retail bank in the UK, after which I moved to be an operations manager in investment banking. As part of that time in my career, post Euro migration and Y2K (yes I am genuinely that old, I also remember Vinyl records and Betamax video!) I was asked to help solve the data problem. What I recognised very quickly was this was an area with under-investment, that was totally central to the focus of that time – STP (Straight Through Processing). Equally it provided me with much broader perspectives, connections to all parts of the organisation that I previously didn’t have and it was at that point, some 20 years ago, that I decided this was the thing for me! I have since run and driven transformation in Reference Data, Master Data, KYC, Customer Data, Data Warehousing and more recently Data Lakes and Analytics, constantly building experience and capability in the Data Governance, Quality and data services domains, both inside banks, as a consultant and as a vendor.
I am trying to get a picture of the role and responsibilities of the typical CDO (not that there appears to be such a thing), so would you mind touching on the span of your work at ING? I know you have a strong background in Enterprise Data Management, how does the CDO role differ from this area?
I guess that depends on how you determine the scope of Enterprise Data Management. However, in reality, the CDO role encompasses Enterprise Data Management, although generally speaking the EDM role includes responsibility for the day to day operations of the collection processes, which in my current role I don’t have. I have accountability for the governance and quality through those processes and for making the data available for downstream consumers, like Analytics, Risk, Finance and HR.
My role encompasses being the business driver for the data platform that we are rolling out across the organisation and its success in terms of the data going onto the platform and the curation of that data in a governed state, depending on the consumer requirements.
My role today boils down to 4 key objectives – data availability, data transparency, data quality and data control.
I know that ING consists of many operating areas and has adopted a federated structure with respect to data matters. What are the strengths of this approach and how does it work on a day-to-day basis?
This approach ensures that the CDO role (I have a number of CDOs functionally reporting to me) remains close to the business and the local entity it supports, it ensures that my management team is directly connected to the needs of the business locally, and that the local businesses have a direct connection to the global strategy. What I would say is that there is no “one size fits all” approach to the CDO organisation model. It depends on the company culture and structure and it needs to fit with the stated objectives of the role as designed.
On a day to day basis, we are aligned with the business units and the functional units so we have CDOs in all of these areas. Additionally I have a direct set of reports who drive the standard solutions around tooling, governance, quality, data protection, Data Ethics, Metadata and data glossary and models.
Helping organisations become “data-centric” is a key part of what you do. I often use this phrase myself; but was recently challenged to elucidate its meaning. What does a “data-centric” organisation look like to you? What sort of value does data-centricity release in your experience?
Data centric is a cultural shift, in the structures of the past where we have technology, people and process, we now have data that touches all three. You know if you have reached the right place when data becomes part of the decision making process across the organisation, when decisions are only made when data is presented to support them and this is of the requisite quality. This doesn’t mean all decisions require data, some decisions don’t have data and that’s where leadership decisions can be made, but for those decisions that have good data to support them, these can be made easily and at a lower level in the organisation. Hence becoming data centric supports an agile organisation and servant / leadership principles, utilising data makes decisions faster and outcomes better.
I am on record multiple times stating that technology choices are much less important than other aspects of data work. However, it is hard to ignore the impact that Big Data and related technologies have had. A few years into the cycle of Big Data adoption, do you see the tools and approaches yielding the expected benefits? Should I revisit my technology-agnostic stance?
I have also been on record multiple times saying that every data problem is a people problem in disguise. I still hold that this is true today although potentially this is changing. The problems of the past and still to this day originate with poor data stewardship, I saw it happening in front of my eyes last week in Heathrow when I purchased something in a well known electronics store. Because I have an overseas postcode the guy at the checkout put dummy data into all the fields to get through the process quickly and not impact my customer experience, I desperately wanted to stop him but also wanted to catch my plane. This is where the process efficiency impacts good data collection. If the software that supports the process isn’t flexible, the issue won’t be fixed without technology intervention, this is often true in data quality problems which have knock on effects to customers, which at the end of the day are why we are all here. This is a people problem (because who is taking responsibility here for fixing it, or educating that guy at the checkout) AND it’s a technology problem, caused by inflexible or badly implemented systems.
However, in the future, with more focus on customer driven checkout, digital channels and better customer experience, better interface driven data controls and robotics and AI, it may become further nuanced. People are still involved, communication remains critical but we cannot ignore technology in the digital age. For a long time, data groups have struggled with getting access to good tools and technology, now this technology domain is growing daily, and the tools are improving all the time. What we can do now with data at a significantly lower cost than ever before is amazing, and continues to improve all the time. Hence ignoring technology can be costly when extending capabilities to your stakeholders and could be a serious mistake, however focusing only on technology and ignoring people, process, communication etc is also a serious mistake. Data Leaders have to be multi-disciplinary today, and be able to keep up with the pace of change.
I have heard you talk about “data platforms”, what do you mean by this and how do these contrast with another perennial theme, that of data democratisation? How does a “data platform” relate to – say – Data Science teams?
Data democratisation is enabled by the data platform. The data platform is the technology enablement of the four pillars I mentioned before, availability, transparency, quality and control. The platform is a collection of technologies that standardise the approach and access to well governed data across the organisation. Data Democratisation is simply making data available and abstracting away from siloed storage mechanisms, but the platform wraps the implementation of quality, controls and structure to the way that happens. Data Science teams then get the data they need, including data curation services to find the data they need quickly, for governed and structured data, Data Science teams can utilise the glossary to identify what they need and understand the level of quality based on consumer views, they also have access to metadata in standard forms. This empowers the analytics capability to move faster, spend less time on data discovery and curation, structure and quality and more time on building analytics.
I mentioned the federated CDO team at ING above and assume this is reflected in the rest of the organisation structure. ING also has customers in 40 countries and I know first-hand that a global footprint adds complexity. What are the challenges in being a CDO in such an environment? Does this put a higher premium on influencing skills for example?
I am not sure it puts a higher premium on influencing skills, these have a high premium in any CDO role, even if you don’t have a federated structure, the reality is if you are in a data role you have more stakeholders than anyone else in the company, so influencing skills remain premium.
A global footprint means complexity for sure, it means differences in a world where you are trying to standardise and it means you have to be tuned in to cultural differences and boundaries. It also means a great deal of variety, opportunities to learn new cultures and approaches, it means you have to listen and understand and flex your style and it means pragmatism plays an important part in your decision making process.
At ING we have an amazing team of people who collaborate in a way I have never experienced before, supported by a strong attachment and commitment to the success of the business and our customers. This makes dealing with the complexity a team effort, with great energy and a fantastic working environment. In an organisation without the drive and passion we have here it would present challenges, with the support of the board and being a core part of the overall strategy, it ensures broad alignment to the goal, which makes the challenge easier for the organisation to solve, not easy, but easier and more fun.
Building on the last point, every CDO I have interviewed has stressed the importance of relationships; something that chimes with my own experience. How do you go about building strong relationships and maintaining them when inevitable differences of opinion or clashes in interests arise?
I touched on this a little earlier. Pragmatism over purism. I see purists everywhere in data, with views that are so rigid that the execution of them is doomed because purism doesn’t build relationships. Relationships are built based on what you bring and give up, on what you can give, not on what you can get. I try every day to achieve this, but I am human too, so I don’t always get it right, I hope I get it right more than I get it wrong and where I get it wrong I hope I can be forgiven for my intention is pure. We owe it to our customers to work together for their benefit, where we have differences the customer outcomes should drive our decisions, in that we have a common goal. Disagreements can be helped and supported by identifying a common goal, this starts to align people behind a common outcome. Individual interests can be put aside in preference of the customer interest.
I know that you are very interested in data ethics and feel that this is an important area for CDOs to consider. Can you tell the readers a bit more about data ethics and why they should be central to an organisation’s approach to data?
In an increasingly digital world, the use of data is becoming widespread and the pace at which it is used is increasing daily, our compute power grows exponentially as does the availability of data. Given this, we need an ethical framework to help us make good decisions with our customers and stakeholders in mind. How do you ensure that decisions in your organisation about how you use data are ethical? What are ethical decisions in your organisation and what are the guiding principles? If this isn’t clear and communicated to help all staff make good decisions, or have good discussions there is a real danger that decisions may not be properly socialised before all angles are considered.
Just meeting the bar of privacy regulation may not be enough, you can still meet that bar and do things that your customers may disagree with or find “creepy” so the correct thought needs to be applied and the organisation engaged to ensure the correct conversations take place, and there is a place to go to discuss ethics.
I am not saying that there is a silver bullet to solve this problem, but the conversation and the ability to have the conversation in a structured way helps the organisation understand its approach and make good decisions in this respect. That’s why CDOs should consider this an important part of the role and a critical engagement with users of data across the organisation.
Finally, I have worked for businesses with a presence in the Netherlands on a number of occasions. As a Brit living abroad, how have you found Amsterdam? What – if any – adaptations have you had to make to your style to thrive in a somewhat different culture?
Having lived in India, I thought my move to the Netherlands could only be easy. I arrived thinking that a 45 minute flight could not possibly provide as many challenges as an 11 hour flight, especially from a cultural perspective. Of course I was wrong because any move to a different culture provides challenges you could never have expected and it’s the small adjustments that take you by surprise the most. It’s always a hugely enjoyable learning experience though. London is a more top down culture whereas in the Netherlands it’s a much flatter approach, my experience here is positive although it does require an adjustment. I work in Amsterdam but live in a small village, chosen deliberately to integrate faster. It’s harder, more of a challenge but helps you understand the culture as you make friends with local people and get closer to the culture. My wife and I have never been a fan of the expat scene, we prefer to integrate, however more difficult this feels at first, it’s worth it in the long run. I must admit though that I haven’t conquered the language yet, it’s a real work in progress!
Christopher, I really enjoyed our chat, which I believe will also be of great interest to readers. Thank you.
Disclosure: At the time of publication, neither peterjamesthomas.com Ltd. nor any of its Directors had any shared commercial interests with Christopher Bannocks, ING or any entities associated with either of these.
If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.
The interviews that I conduct with leaders in their fields as part of my “In-depth” series have hopefully brought a new and interesting aspect to this site. However, often the boot is on the other foot and I am the person being interviewed about my experience and expertise in the data field and related matters. Maybe interviewing other people helps me when I am in turn interviewed, maybe it’s the other way round. Whatever the case, I enjoyed recording the two conversations appearing below (thanks to the interviewers in both cases) and hope that the content is of interest to readers.
In both instances a link to the site originally publishing the interview is followed by a locally hosted version of the audio track and then a download option. I’d encourage readers to explore the other excellent interviews contained on both sites.
Enterprise Management 360° Podcast – 31st July 2018
If you would like to interview me for your site or periodical, or if you are just interested in further exploring some of the themes I discuss in these two interviews, then please feel free to get in contact.
Work by the inimitable Randall Munroe, author of long-running web-comic, xkcd.com, has been featured (with permission) multiple times on these pages. The above image got me thinking that I had not penned a data visualisation article since the series starting with Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity nearly a year ago. Randall’s perspective led me to consider that staple of PowerPoint presentations, the humble and much-maligned Pie Chart.
While the history is not certain, most authorities credit the pioneer of graphical statistics, William Playfair, with creating this icon, which appeared in his Statistical Breviary, first published in 1801. Later Florence Nightingale (a statistician in case you were unaware) popularised Pie Charts. Indeed a Pie Chart variant (called a Polar Chart) that Nightingale compiled appears at the beginning of my article Data Visualisation – A Scientific Treatment.
I can’t imagine any reader has managed to avoid seeing a Pie Chart before reading this article. But, just in case, here is one (Since writing Rainbow’s Gravity – see above for a link – I have tried to avoid a rainbow palette in visualisations, hence the monochromatic exhibit):
The above image is a representation of the following dataset:
The Pie Chart consists of a circle divided into five sectors, labelled A through E. The basic idea is of course that the amount of the circle taken up by each sector is proportional to the count of items associated with each category, A through E. What is meant by the innocent “amount of the circle” here? The easiest way to look at this is that going all the way round a circle consumes 360°. If we consider our data set, the total count is 18,000, which will equate to 360°. The count for A is 4,500 and we need to consider what fraction of 18,000 this represents and then apply this to 360°:
(4,500 ÷ 18,000) × 360° = 90°
So A must take up 90°, or equivalently one quarter of the total circle. Similarly, B takes up 60°, or one sixth of the circle.
If we take this approach then – of course – the sum of all of the sectors must equal the whole circle and neither more nor less than this (pace Randall). In our example:
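The sector calculation can be sketched in a few lines of Python. This is only an illustration: the count for A (4,500) and the total (18,000) come from the text, B’s 3,000 follows from it being one sixth of the circle, while the counts used for C, D and E are assumed values chosen purely so that the total is 18,000. The function name sector_angles is likewise just a label for this sketch.

```python
def sector_angles(counts):
    """Return the angle in degrees of each Pie Chart sector."""
    total = sum(counts.values())
    # Each sector gets the same fraction of 360 degrees as its
    # count represents of the total.
    return {label: 360 * count / total for label, count in counts.items()}


# A and the total of 18,000 are from the text; B follows from "one sixth".
# C, D and E are ASSUMED values for illustration only.
counts = {"A": 4500, "B": 3000, "C": 6000, "D": 2500, "E": 2000}

angles = sector_angles(counts)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")

# The sectors must add up to the whole circle, neither more nor less.
print("Sum of sectors:", sum(angles.values()))
```

Whatever counts are supplied, the sum of the sectors comes back to 360°, which is the property the paragraph above relies on.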
So far, so simple. Now let’s consider a second data-set as follows:
What does its Pie Chart look like? Well it’s actually rather familiar, it looks like this:
This observation stresses something important about Pie Charts. They show how a number of categories contribute to a whole figure, but they only show relative figures (percentages of the whole if you like) and not the absolute figures. The totals in our two data-sets differ by a factor of over 2,100 times, but their Pie Charts are identical. We will come back to this point again later on.
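A short Python sketch makes the same point. The counts and the scale factor below are assumptions for illustration and are not the second data-set from the exhibit; the idea is simply that multiplying every count by the same number leaves each category’s share of the whole, and hence the Pie Chart, unchanged.

```python
from fractions import Fraction


def shares(counts):
    """Return each count as an exact fraction of the total."""
    total = sum(counts)
    return [Fraction(count, total) for count in counts]


# ASSUMED counts for categories A to E (only A and the total appear in the text).
small = [4500, 3000, 6000, 2500, 2000]

# HYPOTHETICAL scale factor; any positive multiplier behaves the same way.
scale = 2500
large = [count * scale for count in small]

print("Totals:", sum(small), "and", sum(large))
print("Same shares, so same Pie Chart:", shares(small) == shares(large))
```

Fractions are used rather than floating point numbers so that the comparison of the two sets of shares is exact.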
Pie Charts have somewhat fallen into disrepute over the years. Some of this is to do with their ubiquity, but there is also at least one more substantial criticism. This is that the human eye is bad at comparing angles, particularly if they are not aligned to some reference point, e.g. a vertical. To see this consider the two Pie Charts below (please note that these represent a different data set from above – for starters, there are only four categories plotted as opposed to five earlier on):
The details of the underlying numbers don’t actually matter that much, but let’s say that the left-hand Pie Chart represents annual sales in 2016, broken down by four product lines. The right-hand chart has the same breakdown, but for 2017. This provides some context to our discussions.
Suppose what is of interest is how the sales for each product line in the 2016 chart compare to their counterparts in the right-hand one; e.g. A and A’, B and B’ and so on. Well for the As, we have the helpful fact that they both start from a vertical line and then swing down and round, initially rightwards. This can be used to gauge that A’ is a bit bigger than A. What about B and B’? Well they start in different places and end in different places, looking carefully, we can see that B’ is bigger than B. C and C’ are pretty easy, C is a lot bigger. Then we come to D and D’, I find this one a bit tricky, but we can eventually hazard a guess that they are pretty much the same.
So we can compare Pie Charts and talk about how sales change between two years, what’s the problem? The issue is that it takes some time and effort to reach even these basic conclusions. How about instead of working out which is bigger, A or A’, I ask the reader to guess by what percentage A’ is bigger. This is not trivial to do based on just the charts.
If we really want to look at year-on-year growth, we would prefer that the answer leaps off the page; after all, isn’t that the whole point of visualisations rather than tables of numbers? What if we focus on just the right-hand diagram? Can you say with certainty which is bigger, A or C, B or D? You can work to an answer, but it takes longer than should really be the case for a graphical exhibit.
There is a further point to be made here and it relates to what we said Pie Charts show earlier in this piece. What we have in our two Pie Charts above is the make-up of a whole number (in the example we have been working through, this is total annual sales) by categories (product lines). These are percentages and what we have been doing above is to compare the fact that A made up 30% of the total sales in 2016 and 33% in 2017. What we cannot say based on just the above exhibits is how actual sales changed. The total sales may have gone up or down, the Pie Chart does not tell us this, it just deals in how the make-up of total sales has shifted.
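To make this concrete, here is a minimal Python sketch. The 30% and 33% shares for A are the ones quoted above; the total sales figures in the two scenarios are hypothetical and have been picked only to show that the same shift in share is compatible with A’s actual sales rising or falling.

```python
# Share of total sales taken by product line A, in percent (from the text).
SHARE_A = {2016: 30, 2017: 33}

# HYPOTHETICAL total sales figures, in arbitrary units, for two scenarios.
scenarios = {
    "total sales grow": {2016: 1000, 2017: 1690},
    "total sales fall": {2016: 1000, 2017: 800},
}


def absolute_value(total, share_pct):
    """Convert a percentage share of a total into an absolute figure."""
    return total * share_pct / 100


changes = {}
for name, totals in scenarios.items():
    before = absolute_value(totals[2016], SHARE_A[2016])
    after = absolute_value(totals[2017], SHARE_A[2017])
    changes[name] = after - before
    print(f"{name}: sales of A go from {before:.0f} to {after:.0f}")
```

In both scenarios the two Pie Charts would look exactly the same, even though A’s sales rise in one case and fall in the other.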
Some people try to address this shortcoming, which can result in exhibits such as:
Here some attempt has been made to show the growth in the absolute value of sales year on year. The left-hand Pie Chart is smaller and so we assume that annual sales have increased between 2016 and 2017. The most logical thing to do would be to make the total areas of the two Pie Charts proportional to total sales in the two years (in this case – based on the underlying data – 2017 sales are 69% bigger than 2016 sales). However, such an approach, while adding information, makes the task of comparing sectors from year to year even harder.
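If area is to be proportional to sales, then it is the square of the radius that must scale with sales, so the radius itself grows with the square root. The Python sketch below works this through for the 69% increase mentioned above; the starting radius of 1.0 and the function name scaled_radius are arbitrary choices for illustration.

```python
import math


def scaled_radius(base_radius, base_total, new_total):
    """Radius of a circle whose area is proportional to new_total,
    given that a circle of base_radius represents base_total."""
    return base_radius * math.sqrt(new_total / base_total)


# ARBITRARY radius for the 2016 Pie Chart.
radius_2016 = 1.0

# 2017 sales are 69% bigger than 2016 sales (from the text): 100 -> 169.
radius_2017 = scaled_radius(radius_2016, 100, 169)

area_ratio = (math.pi * radius_2017 ** 2) / (math.pi * radius_2016 ** 2)

print("Radius multiplier:", round(radius_2017, 3))
print("Area multiplier:", round(area_ratio, 3))
```

A 69% increase in sales therefore calls for a radius only 30% larger. Scaling the radius by 1.69 instead would increase the area by a factor of about 2.86 and so overstate the growth.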
The general argument is that Nested Bar Charts are better for the type of scenario I have presented and the types of questions I asked above. Looking at the same annual sales data this way we could generate the following graph:
While Bar Charts are often used to show absolute values, what we have above is the same “percentage of the whole” data that was shown in the Pie Charts. We have already covered the relative / absolute issue inherent in Pie Charts, from now on, each new chart will be like a Pie Chart inasmuch as it will contain relative (percentage of the whole) data, not absolute. Indeed you could think about generating the bar graph above by moving the Pie Chart sectors around and squishing them into new shapes, while preserving their area.
The Bar Chart makes the yearly comparisons a breeze and it is also pretty easy to take a stab at percentage differences. For example B’ looks about a fifth bigger than B (it’s actually 17.5% bigger). However, what I think gets lost here is a sense of the make-up of the elements of the two sets. We can see that A is the biggest value in the first year and A’ in the second, but it is harder to gauge what percentage of the overall both A and A’ represent.
To do this better, we could move to a Stacked Bar Chart as follows (again with the same sales data):
Once more, we are dealing with how proportions have changed – to put it simply the height of both “skyscrapers” is the same. If we instead shifted to absolute values, then our exhibit might look more like:
The observant reader will note that I have also added dashed lines linking the same category for each year. These help to show growth. Regardless of what angle to the horizontal the lower line for a category makes, if it and the upper category line diverge (as for B and B’), then the category is growing; if they converge (as for C and C’), the category is shrinking. Parallel lines indicate a steady state. Using this approach, we can get a better sense of the relative size of categories in the two years.
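The diverge / converge rule can be sketched in Python as follows. The figures are hypothetical percentages of the whole, chosen to be consistent with the numbers quoted in this article (A moving from 30% to 33% and B’ being 17.5% bigger than B) but otherwise assumed; the same logic applies unchanged if absolute values are supplied instead.

```python
from itertools import accumulate


def boundaries(values):
    """Return the (lower, upper) edge of each segment in a stacked bar."""
    uppers = list(accumulate(values))
    lowers = [0] + uppers[:-1]
    return list(zip(lowers, uppers))


def trend(segment_before, segment_after, tolerance=1e-9):
    """Classify a category by whether its two dashed lines diverge,
    converge or stay parallel between the two bars."""
    gap_before = segment_before[1] - segment_before[0]
    gap_after = segment_after[1] - segment_after[0]
    if gap_after > gap_before + tolerance:
        return "diverge (growing)"
    if gap_after < gap_before - tolerance:
        return "converge (shrinking)"
    return "parallel (steady)"


labels = ["A", "B", "C", "D"]
# HYPOTHETICAL shares in percent; only A's 30 and 33 are quoted in the text.
year_1 = [30, 20, 30, 20]
year_2 = [33, 23.5, 23.5, 20]

results = {
    label: trend(before, after)
    for label, before, after in zip(labels, boundaries(year_1), boundaries(year_2))
}
for label, outcome in results.items():
    print(label, outcome)
```

The angle of any single dashed line plays no part in the classification; only the change in the gap between a category’s lower and upper lines matters.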
However, here – despite the dashed lines – we lose at least some of the year-on-year comparative power of the Nested Bar Chart above. In turn the Nested Bar Chart loses some of the attributes of the original Pie Chart. In truth, there is no single chart which fits all purposes. Trying to find one is analogous to trying to find a planar projection of a sphere that preserves angles, distances and areas.
Rather than finding the Philosopher’s Stone of an all-purpose chart, the challenge for those engaged in data visualisation is to anticipate the central purpose of an exhibit and to choose a chart type that best resonates with this. Sometimes, the Pie Chart can be just what is required, as I found myself in my article, A Tale of Two [Brexit] Data Visualisations, which closed with the following image:
Or, to put it another way:
You may very well be well bred
Chart aesthetics filling your head
But there’s always some special case, time or place
To replace perfect taste
Never cry ’bout a Chart of Pie
You can still do fine with a Chart of Pie
People may well laugh at this humble graph
But it can be just the thing you need to help the staff
Never cry ’bout a Chart of Pie
Though without due care things can go awry
Bars are fine, Columns shine
Lines are ace, Radars race
Boxes fly, but never cry about a Chart of Pie
With apologies to the Disney Corporation!
It was pointed out to me by Adam Carless that I had omitted the following thing of beauty from my Pie Chart menagerie. How could I have forgotten?
It is claimed that some Theoretical Physicists (and most Higher Dimensional Geometers) can visualise in four dimensions. Perhaps this facility would be of some use in discerning meaning from the above exhibit.