The peterjamesthomas.com Data Strategy Hub

Today we launch a new on-line resource, The Data Strategy Hub. This presents some of the most popular Data Strategy articles on this site and will expand in coming weeks to also include links to articles and other resources pertaining to Data Strategy from around the Internet.

If you have an article you have written, or one that you read and found helpful, please post a link in a comment here or in the actual Data Strategy Hub and I will consider adding it to the list.
 


peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

The latest edition of The Data & Analytics Dictionary is now out

The Data and Analytics Dictionary

After a hiatus of a few months, the latest version of the peterjamesthomas.com Data and Analytics Dictionary is now available. It includes 30 new definitions, some of which have been contributed by people like Tenny Thomas Soman, George Firican, Scott Taylor and Taru Väre. Thanks to all of these for their help.

  1. Analysis
  2. Application Programming Interface (API)
  3. Business Glossary (contributor: Tenny Thomas Soman)
  4. Chart (Graph)
  5. Data Architecture – Definition (2)
  6. Data Catalogue
  7. Data Community
  8. Data Domain (contributor: Taru Väre)
  9. Data Enrichment
  10. Data Federation
  11. Data Function
  12. Data Model
  13. Data Operating Model
  14. Data Scrubbing
  15. Data Service
  16. Data Sourcing
  17. Decision Model
  18. Embedded BI / Analytics
  19. Genetic Algorithm
  20. Geospatial Data
  21. Infographic
  22. Insight
  23. Management Information (MI)
  24. Master Data – additional definition (contributor: Scott Taylor)
  25. Optimisation
  26. Reference Data (contributor: George Firican)
  27. Report
  28. Robotic Process Automation
  29. Statistics
  30. Self-service (BI or Analytics)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form. We look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.


 

In praise of Jam Doughnuts or: How I learned to stop worrying and love Hybrid Data Organisations

The above infographic is the work of Management Consultants Oxbow Partners [1] and employs a novel taxonomy to categorise data teams. First up, I would of course agree with Oxbow Partners’ statement that:

Organisation of data teams is a critical component of a successful Data Strategy

Indeed I cover elements of this in two articles [2]. So the structure of data organisations is a subject which, in my opinion, merits some consideration.

Oxbow Partners draw distinctions between organisations where the Data Team is separate from the broader business, ones where data capabilities are entirely federated with no discernible “centre” and hybrids between the two. The imaginative names for these are respectively The Burger, The Smoothie and The Jam Doughnut. In this article, I review Oxbow Partners’ model and offer some of my own observations.



The Burger – Centralised

The Burger

Having historically recommended something along the lines of The Burger, not least when an organisation’s data capabilities are initially somewhere between non-existent and very immature, my views have changed over time, much as the characteristics of the data arena have also altered. I think that The Burger still has a role, in particular, in a first phase where data capabilities need to be constructed from scratch, but it has some weaknesses. These include:

  1. The pace of change in organisations has increased in recent years. Also, many organisations have separate divisions or product lines and / or separate geographic territories. Change can be happening in sometimes radically different ways in each of these as market conditions may vary considerably between Division A’s operations in Switzerland and Division B’s operations in Miami. It is hard for a wholly centralised team to react with speed in such a scenario. Even if they are aware of the shifting needs, capacity may not be available to work on multiple areas in parallel.
     
  2. Again in the above scenario, it is also hard for a central team to develop deep expertise in a range of diverse businesses spread across different locations (even if within just one country). A central team member who has to understand the needs of 12 different business units will necessarily be at a disadvantage when considering any single unit compared to a colleague who focuses on that unit and nothing else.
     
  3. A further challenge presented here is maintaining the relationships with colleagues in different business units that are typically a prerequisite for – for example – driving adoption of new data capabilities.


The Smoothie – Federated

The Smoothie

So – to address these shortcomings – maybe The Smoothie is a better organisational design. Well maybe, but also maybe not. Problems with these arrangements include:

  1. Probably biggest of all, it is an extremely high-cost approach. The smearing out of work on data capabilities inevitably leads to duplication of effort with – for example – the same data sourced or combined by different people in parallel. The pace of change in organisations may have increased, but I know few that are happy to bake large costs into their structures as a way to cope with this.
     
  2. The same duplication referred to above creates another problem: the way that data is processed can vary (maybe substantially) between different people and different teams. This leads to the nightmare scenario where people spend all their time arguing about whose figures are right, rather than focussing on what the figures say is happening in the business [3]. Such arrangements can generate business risk as well. In particular, in highly regulated industries, heterogeneous treatment of the same data tends to be frowned upon in external reviews.
     
  3. The wholly federated approach also limits both opportunities for economies of scale and identification of areas where data capabilities can meet the needs of more than one business unit.
     
  4. Finally, data resources who are fully embedded in different parts of a business may become isolated and may not benefit from the exchange of ideas that happens when other similar people are part of the immediate team.

So to summarise we have:

Burger vs Smoothie



The Jam Doughnut – Hybrid

The Jam Doughnut

Which leaves us with The Jam Doughnut. In my opinion, this is a Goldilocks approach that captures as much as possible of the advantages of the other two set-ups, while mitigating their drawbacks. It is such an approach that tends to be my recommendation for most organisations nowadays. Let me spend a little more time describing its attributes.

I see a hub-and-spoke model as the best way of implementing a Jam Doughnut approach. The hub is a central Data Team; the spokes are data-centric staff in different parts of the business (Divisions, Functions, Geographic Territories etc.).

Data Hub and Spoke

It is important to stress that each spoke satellite is not a smaller copy of the central Data Team. Some roles will be more federated, some more centralised according to what makes sense. Let’s consider a few different roles to illustrate this:

  • Data Scientist – I would see a strong central group of these, developing methodologies and tools, but also that many business units would have their own dedicated people; “spoke”-based people could also develop new tools and new approaches, which could be brought into the “hub” for wider dissemination
     
  • Analytics Expert – Similar to the Data Scientists, centralised “hub” staff might work more on standards (e.g. for Data Visualisation), developing frameworks to be leveraged by others (e.g. a generic harness for dashboards that can be leveraged by “spoke” staff), or selecting tools and technologies; “spoke”-based staff would be more into the details of meeting specific business needs
     
  • Data Engineer – Some “spoke” people may be hybrid Data Scientists / Data Engineers and some larger “spoke” teams may have dedicated Data Engineers, but the needle moves more towards centralisation with this role
     
  • Data Architect – Probably wholly centralised, but some “spoke” staff may have an architecture string to their bow, which would of course be helpful
     
  • Data Governance Analyst – Also probably wholly centralised. This is not to downplay the need for people in the “spokes” to take accountability for Data Governance and Data Quality improvement, but these are likely to be part-time roles in the “spokes”, whereas the “hub” will need full-time Data Governance people

It is also important to stress that the various spokes should also be in contact with each other, swapping successful approaches, sharing ideas and so on. Indeed, you could almost see the spokes beginning to merge together somewhat to form a continuum around the Data Team. Maybe the merged spokes could form the “dough”, with the Data Team being the “jam”, something like this:

Data Hub and Spoke

I label these types of arrangements a Data Community and this is something that I have looked to establish and foster in a few recent assignments. Broadly a Data Community is something that all data-centric staff would feel part of; they are obviously part of their own segment of the organisation, but the Data Community is also part of their corporate identity. The Data Community facilitates best practice approaches, sharing of ideas, helping with specific problems and general discourse between its members. I will be revisiting the concept of a Data Community in coming weeks. For now I would say that one thing that can help it to function as envisaged is sharing common tooling. Again this is a subject that I will return to shortly.

I’ll close by thanking Oxbow Partners for some good mental stimulation – I will look forward to their next data-centric publication.
 


 

Disclosure:

It is peterjamesthomas.com’s policy to disclose any connections with organisations or individuals mentioned in articles.

Oxbow Partners are an advisory firm for the insurance industry covering Strategy, Digital and M&A. Oxbow Partners and peterjamesthomas.com Ltd. have a commercial association and peterjamesthomas.com Ltd. was also engaged by one of Oxbow Partners’ principals, Christopher Hess, when he was at a former organisation.

 
Notes

 
[1]
 
Though the author might have had a minor role in developing some elements of it as well.
 
[2]
 
The Anatomy of a Data Function and A Simple Data Capability Framework.
 
[3]
 
See also The impact of bad information on organisations.


 
 

A Simple Data Capability Framework

Data Capability Framework

Introduction

As part of my consulting business, I end up thinking about Data Capability Frameworks quite a bit. Sometimes this is when I am assessing current Data Capabilities, sometimes it is when I am thinking about how to transition to future Data Capabilities. Regular readers will also recall my tripartite series on The Anatomy of a Data Function, which really focussed more on capabilities than purely organisation structure [1].

Detailed frameworks like the one contained in Anatomy are not appropriate for all audiences. Often I need to provide a more easily-absorbed view of what a Data Function is and what it does. The exhibit above is one that I have developed and refined over the last three or so years and which seems to have resonated with a number of clients. It has – I believe – the merit of simplicity. I have tried to distil things down to the essentials. Here I will aim to walk the reader through its contents, much of which I hope is actually self-explanatory.

The overall arrangement has been chosen intentionally: the top three areas are visible activities, while the bottom three are more foundational areas [2], ones that are necessary for the top three boxes to be discharged well. I will start at the top left and work across and then down.
 
 
Collation of Data to provide Information

Dashboard

This area includes what is often described as “traditional” reporting [3], Dashboards and analysis facilities. The Information created here is invaluable for both determining what has happened and discerning trends / turning points. It is typically what is used to run an organisation on a day-to-day basis. Absence of such Information has been the cause of underperformance (or indeed major losses) in many an organisation, including a few that I have been brought in to help. The flip side is that making the necessary investments to provide even basic information has been at the heart of the successful business turnarounds that I have been involved in.

The bulk of Business Intelligence efforts would also fall into this area, but there is some overlap with the area I next describe as well.
 
 
Leverage of Data to generate Insight

Voronoi diagram

In this second area we have disciplines such as Analytics and Data Science. The objective here is to use a variety of techniques to tease out findings from available data (both internal and external) that go beyond the explicit purpose for which it was captured. Thus data to do with bank transactions might be combined with publicly available demographic and location data to build an attribute model for both existing and potential clients, which can in turn be used to make targeted offers or product suggestions to them on Digital platforms.

It is my experience that work in this area can have a massive and rapid commercial impact. There are few activities in an organisation where a week’s work can equate to a percentage point increase in profitability, but I have seen insight-focussed teams deliver just that type of ground-shifting result.
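
To make the bank transaction example earlier in this section a little more concrete, here is a minimal Python sketch of combining internal transaction data with external demographic data to build a simple attribute record per customer. Everything in it is an assumption made for illustration: the `Transaction` fields, the postcode areas, the `median_age` attribute and the `build_attributes` function are hypothetical and are not taken from any real system or data source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    """A hypothetical internal record of a single card transaction."""
    customer_id: str
    amount: float
    category: str


def build_attributes(transactions, customer_postcodes, area_demographics):
    """Combine internal spend data with external area data, per customer."""
    # Total spend per customer, broken down by category (internal data).
    spend = defaultdict(lambda: defaultdict(float))
    for t in transactions:
        spend[t.customer_id][t.category] += t.amount

    attributes = {}
    for customer_id, by_category in spend.items():
        # Look up external demographics for the customer's postcode area;
        # fall back to an empty record if the area is unknown.
        area = customer_postcodes.get(customer_id)
        demographics = area_demographics.get(area, {})
        attributes[customer_id] = {
            "total_spend": sum(by_category.values()),
            "top_category": max(by_category, key=by_category.get),
            "area_median_age": demographics.get("median_age"),
        }
    return attributes


# Illustrative (made-up) inputs.
transactions = [
    Transaction("c1", 120.0, "travel"),
    Transaction("c1", 80.0, "groceries"),
    Transaction("c1", 50.0, "travel"),
    Transaction("c2", 40.0, "groceries"),
]
customer_postcodes = {"c1": "AB1", "c2": "ZZ9"}
area_demographics = {"AB1": {"median_age": 34}}

profiles = build_attributes(transactions, customer_postcodes, area_demographics)
print(profiles["c1"])
```

A real attribute model would of course draw on far richer data and proper statistical techniques; the point of the sketch is only that the value comes from joining data captured for one purpose with data from elsewhere.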
 
 
Control of Data to ensure it is Fit-for-Purpose

Data controls

This refers to a wide range of activities from Data Governance to Data Management to Data Quality improvement and indeed related concepts such as Master Data Management. Here as well as the obvious policies, processes and procedures, together with help from tools and technology, we see the need for the human angle to be embraced via strong communications, education programmes and aligning personal incentives with desired data quality outcomes.

The primary purpose of this important work is to ensure that the information an organisation collates and the insight it generates are reliable. A helpful by-product of doing the right things in these areas is that the vast majority of what is required for regulatory compliance is achieved simply by doing things that add business value anyway.
 
 
Data Architecture / Infrastructure

Data architecture

Best practice has evolved in this area. When I first started focussing on the data arena, Data Warehouses were state of the art. More recently Big Data architectures, including things like Data Lakes, have appeared and – at least in some cases – begun to add significant value. However, I am on public record multiple times stating that technology choices are generally the least important in the journey towards becoming a data-centric organisation. This is not to say such choices are unimportant, but rather that other choices are more important, for example how best to engage your potential users and begin to build momentum [4].

Having said this, the model that seems to have emerged of late is somewhat different to the single version of the truth aspired to for many years by organisations. Instead best practice now encompasses two repositories: the first Operational, the second Analytical. At a high-level, arrangements would be something like this:

Data architecture

The Operational Repository would contain a subset of corporate data. It would be highly controlled, highly reconciled and used to support both regular reporting and a large chunk of dashboard content. It would be designed to also feed data to other areas, notably Finance systems. This would be complemented by the Analytical Repository, into which most corporate data (augmented by external data) would be poured. This would be accessed by a smaller number of highly skilled staff, Data Scientists and Analytics experts, who would use it to build models, produce one off analyses and to support areas such as Data Visualisation and Machine Learning.

It is not atypical for Operational Repositories to be SQL-based and Analytical Repositories to be Big Data-based, but you could use SQL for both or indeed Big Data for both according to the circumstances of an organisation and its technical expertise.
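
The split between the two repositories can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than a description of any particular implementation: the `DataSet` flags and the `target_repositories` rule are hypothetical, and for simplicity the sketch sends every data set to the Analytical Repository, whereas the text above says only that most corporate data ends up there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """A hypothetical description of a data set to be loaded."""
    name: str
    external: bool = False         # sourced from outside the organisation
    reconciled: bool = False       # has passed control and reconciliation checks
    feeds_reporting: bool = False  # needed for regular reports or dashboards


def target_repositories(data_set):
    """Return the repositories that a data set should be loaded into."""
    # Assumption: all data (corporate and external) lands in the
    # Analytical Repository for use by Data Scientists and Analytics experts.
    targets = ["analytical"]
    # Only controlled, reconciled corporate data that supports regular
    # reporting is promoted to the Operational Repository.
    if not data_set.external and data_set.reconciled and data_set.feeds_reporting:
        targets.append("operational")
    return targets


# Illustrative (made-up) data sets.
premiums = DataSet("premiums", reconciled=True, feeds_reporting=True)
weather_feed = DataSet("weather_feed", external=True)
web_logs = DataSet("web_logs")

for data_set in (premiums, weather_feed, web_logs):
    print(data_set.name, target_repositories(data_set))
```

The rule itself is deliberately crude; in practice the decision about what belongs in the Operational Repository would be a governance matter rather than a three-flag test.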
 
 
Data Operating Model / Organisation Design

Organisational design

Here I will direct readers to my (soon to be updated) earlier work on The Anatomy of a Data Function. However, it is worth mentioning a couple of additional points. First an Operating Model for data must encompass the whole organisation, not just the Data Function. Such a model should cover how data is captured, sourced and used across all departments.

Second I think that the concept of a Data Community is important here: a web of like-minded Data Scientists and Analytics people, sitting in various business areas and support functions, but linked to the central hub of the Data Function by common tooling, shared data sets (ideally Curated) and aligned methodologies. Such a virtual data team is of course predicated on an organisation hiring collaborative people who want to be part of and contribute to the Data Community, but those are the types of people that organisations should be hiring anyway [5].
 
 
Data Strategy

Data strategy

Our final area is that of Data Strategy, something I have written about extensively in these pages [6] and a major part of the work that I do for organisations.

It is an oft-repeated truism that a Data Strategy must reflect an overarching Business Strategy. While this is clearly the case, often things are less straightforward. For example, the Business Strategy may be in flux; this is particularly the case where a turn-around effort is required. Also, how the organisation uses data for competitive advantage may itself become a central pillar of its overall Business Strategy. Either way, rather than waiting for a Business Strategy to be finalised, there are a number of things that will need to be part of any Data Strategy: the establishment of a Data Function; a focus on making data fit-for-purpose to better support both information and insight; creation of consistent and business-focussed reporting and analysis; and the introduction or augmentation of Data Science capabilities. Many of these activities can help to shape a Business Strategy based on facts, not gut feel.

More broadly, any Data Strategy will include: a description of where the organisation is now (threats and opportunities); a vision for commercially advantageous future data capabilities; and a path for moving between the current and the future states. Rather than being PowerPoint-ware, such a strategy needs to be communicated assiduously and in a variety of ways so that it can be both widely understood and form a guide for data-centric activities across the organisation.
 
 
Summary
 
As per my other articles, the data capabilities that a modern organisation needs are broader and more detailed than those I have presented here. However, I have found this simple approach a useful place to start. It covers all the basic areas and provides a scaffold off of which more detailed capabilities may be hung.

The framework has been informed by what I have seen and done in a wide range of organisations, but of course it is not necessarily the final word. As always I would be interested in any general feedback and in any suggestions for improvement.
 


 
Notes

 
[1]
 
In passing, Anatomy is due for its second refresh, which will put greater emphasis on Data Science and its role as an indispensable part of a modern Data Function. Watch this space.
 
[2]
 
Though one would hope that a Data Strategy is also visible!
 
[3]
 
Though nowadays you hear “traditional” Analytics and “traditional” Big Data as well (on the latter see Sic Transit Gloria Magnorum Datorum), no doubt “traditional” Machine Learning will be with us at some point, if it isn’t here already.
 
[4]
 
See also Building Momentum – How to begin becoming a Data-driven Organisation.
 
[5]
 
I will be revisiting the idea of a Data Community in coming months, so again watch this space.
 
[6]
 
Most explicitly in my three-part series:

  1. Forming an Information Strategy: Part I – General Strategy
  2. Forming an Information Strategy: Part II – Situational Analysis
  3. Forming an Information Strategy: Part III – Completing the Strategy

 

 
 

A Retrospective of 2018’s Articles

A Review of 2018

This is the second year in which I have produced a retrospective of my blogging activity. As in 2017, I have failed miserably in my original objective of posting this early in January. Despite starting to write this piece on 18th December 2018, I have somehow sneaked into the second quarter before getting round to completing it. Maybe I will do better with 2019’s highlights!

Anyway, 2018 was a record-breaking year for peterjamesthomas.com. The site saw more traffic than in any other year since its inception; indeed hits were over a third higher than in any previous year. This increase was driven in part by the launch of my new Maths & Science section, articles from which claimed no fewer than 6 slots in the 2018 top 10 articles, when measured by hits [1]. Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009, our first year in business.

As with every year, some of my work was viewed by tens of thousands of people, while other pieces received less attention. This is my selection of the articles that I enjoyed writing most, which does not always overlap with the most popular ones. Given the advent of the Maths & Science section, there are now seven categories into which I have split articles. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO Perspectives
  5. Programme Advice
  6. Analytics & Big Data
  7. Maths & Science

In each category, I will pick out one or two pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
A Brief History of Databases
 
February
A Brief History of Databases
An infographic spanning the history of Database technology from its early days in the 1960s to the landscape in the late 2010s.
 
Data Strategy Alarm Bell
 
July
How to Spot a Flawed Data Strategy
What alarm bells might alert you to problems with your Data Strategy; based on the author’s extensive experience of both developing Data Strategies and vetting existing ones.
 
Just the facts...
 
August
Fact-based Decision-making
Fact-based decision-making sounds like a no brainer, but just how hard is it to generate accurate facts?
 
 
Data Visualisation
 
Comparative Pie Charts
 
August
As Nice as Pie
A review of the humble Pie Chart, what it is good at, where it presents problems and some alternatives.
 
 
Statistics & Data Science
 
Data Science Challenges – It’s Deja Vu all over again!
 
August
Data Science Challenges – It’s Deja Vu all over again!
A survey of more than 10,000 Data Scientists highlights a set of problems that will seem very, very familiar to anyone working in the data space for a few years.
 
 
CDO Perspectives
 
The CDO Dilemma
 
February
The CDO – A Dilemma or The Next Big Thing?
Two Forbes articles argue different perspectives about the role of Chief Data Officer. The first (by Lauren deLisa Coleman) stresses its importance, the second (by Randy Bean) highlights some of the challenges that CDOs face.
 
2018 CDO Interviews
 
May onwards
The “In-depth” series of CDO interviews
Rather than a single article, this is a series of four talks with prominent CDOs, reflecting on the role and its challenges.
 
The Chief Marketing Officer and the CDO – A Modern Fable
 
October
The Chief Marketing Officer and the CDO – A Modern Fable
Discussing an alt-facts / “fake” news perspective on the Chief Data Officer role.
 
 
Programme Advice
 
Building Momentum
 
June
Building Momentum – How to begin becoming a Data-driven Organisation
Many companies want to become data driven, but getting started on the journey towards this goal can be tough. This article offers a framework for building momentum in the early stages of a Data Programme.
 
 
Analytics & Big Data
 
Enterprise Data Marketplace
 
January
Draining the Swamp
A review of some of the problems that can beset Data Lakes, together with some ideas about what to do to fix these from Dan Woods (Forbes), Paul Barth (Podium Data) and Dave Wells (Eckerson Group).
 
Sic Transit Gloria Mundi
 
February
Sic Transit Gloria Magnorum Datorum
In a world where the word has developed a very negative connotation, what’s so bad about being traditional?
 
Convergent Evolution of Data Architectures
 
August
Convergent Evolution
What the similarities (and differences) between Ichthyosaurs and Dolphins can tell us about different types of Data Architectures.
 
 
Maths & Science
 
Euler's Number
 
March
Euler’s Number
A long and winding road with the destination being what is probably the most important number in Mathematics.
 
The Irrational Ratio
 
August
The Irrational Ratio
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
 
Emmy Noether
 
October
Glimpses of Symmetry, Chapter 24 – Emmy
One of the more recent chapters in my forthcoming book on Group Theory and Particle Physics. This focuses on the seminal contributions of Mathematician Emmy Noether to the fundamentals of Physics and the connection between Symmetry and Conservation Laws.

 
Notes

 
[1]
 

The 2018 Top Ten by Hits
1. The Irrational Ratio
2. A Brief History of Databases
3. Euler’s Number
4. The Data and Analytics Dictionary
5. The Equation
6. A Brief Taxonomy of Numbers
7. When I’m 65
8. How to Spot a Flawed Data Strategy
9. Building Momentum – How to begin becoming a Data-driven Organisation
10. The Anatomy of a Data Function – Part I

 

 
 

More Definitions in the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. Here are 20 new definitions, including the first from other contributors (thanks Tenny!):

  1. Artificial Intelligence Platform
  2. Data Asset
  3. Data Audit
  4. Data Classification
  5. Data Consistency
  6. Data Controls
  7. Data Curation (contributor: Tenny Thomas Soman)
  8. Data Democratisation
  9. Data Dictionary
  10. Data Engineering
  11. Data Ethics
  12. Data Integrity
  13. Data Lineage
  14. Data Platform
  15. Data Strategy
  16. Data Wrangling (contributor: Tenny Thomas Soman)
  17. Explainable AI (contributor: Tenny Thomas Soman)
  18. Information Governance
  19. Referential Integrity
  20. Testing Data (Training Data)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

People are now also welcome to contribute their own definitions. You can use the comments section here, or the dedicated form. Submissions will be subject to editorial review and are not guaranteed to be accepted.
 


 

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

In-depth with CDO Christopher Bannocks

In-depth with Christopher Bannocks


Part of the In-depth series of interviews


PJT Today I am talking to Christopher Bannocks, who is Group Chief Data Officer at ING. ING is a leading global financial institution, headquartered in the Netherlands. As stressed in other recent In-depth interviews [1], data is a critical asset in banking and related activities, so Christopher’s role is a pivotal one. I’m very glad that he has been able to find time in his busy calendar to speak to us.
PJT Hello Christopher, can you start by providing readers with a flavour of your career to date and perhaps also explain why you came to focus on the data arena.
CB Sure, it’s probably right to say I didn’t start out here, data was not my original choice, and for anyone of a similar age to me, data wasn’t a choice when I started out; in that respect it’s a “new segment”. I started out on a management development programme in a retail bank in the UK, after which I moved to be an operations manager in investment banking. As part of that time in my career, post Euro migration and Y2K (yes I am genuinely that old, I also remember Vinyl records and Betamax video!) [2] I was asked to help solve the data problem. What I recognised very quickly was this was an area with under-investment, that was totally central to the focus of that time – STP (Straight Through Processing). Equally it provided me with much broader perspectives, connections to all parts of the organisation that I previously didn’t have and it was at that point, some 20 years ago, that I decided this was the thing for me! I have since run and driven transformation in Reference Data, Master Data, KYC [3], Customer Data, Data Warehousing and more recently Data Lakes and Analytics, constantly building experience and capability in the Data Governance, Quality and data services domains, both inside banks, as a consultant and as a vendor.
PJT I am trying to get a picture of the role and responsibilities of the typical CDO (not that there appears to be such a thing), so would you mind touching on the span of your work at ING? I know you have a strong background in Enterprise Data Management, how does the CDO role differ from this area?
CB I guess that depends on how you determine the scope of Enterprise Data Management. However, in reality, the CDO role encompasses Enterprise Data Management, although generally speaking the EDM role includes responsibility for the day to day operations of the collection processes, which in my current role I don’t have. I have accountability for the governance and quality through those processes and for making the data available for downstream consumers, like Analytics, Risk, Finance and HR.

My role encompasses being the business driver for the data platform that we are rolling out across the organisation and its success in terms of the data going onto the platform and the curation of that data in a governed state, depending on the consumer requirements.

My role today boils down to 4 key objectives – data availability, data transparency, data quality and data control.

PJT I know that ING consists of many operating areas and has adopted a federated structure with respect to data matters. What are the strengths of this approach and how does it work on a day-to-day basis?
CB This approach ensures that the CDO role (I have a number of CDOs functionally reporting to me) remains close to the business and the local entity it supports, it ensures that my management team is directly connected to the needs of the business locally, and that the local businesses have a direct connection to the global strategy. What I would say is that there is no “one size fits all” approach to the CDO organisation model. It depends on the company culture and structure and it needs to fit with the stated objectives of the role as designed.

On a day to day basis, we are aligned with the business units and the functional units so we have CDOs in all of these areas. Additionally I have a direct set of reports who drive the standard solutions around tooling, governance, quality, data protection, Data Ethics, Metadata and data glossary and models.

PJT Helping organisations become “data-centric” is a key part of what you do. I often use this phrase myself; but was recently challenged to elucidate its meaning. What does a “data-centric” organisation look like to you? What sort of value does data-centricity release in your experience?
CB Data centric is a cultural shift, in the structures of the past where we have technology, people and process, we now have data that touches all three. You know if you have reached the right place when data becomes part of the decision making process across the organisation, when decisions are only made when data is presented to support them and this is of the requisite quality. This doesn’t mean all decisions require data, some decisions don’t have data and that’s where leadership decisions can be made, but for those decisions that have good data to support them, these can be made easily and at a lower level in the organisation. Hence becoming data centric supports an agile organisation and servant / leadership principles, utilising data makes decisions faster and outcomes better.
PJT I am on record multiple times [4] stating that technology choices are much less important than other aspects of data work. However, it is hard to ignore the impact that Big Data and related technologies have had. A few years into the cycle of Big Data adoption, do you see the tools and approaches yielding the expected benefits? Should I revisit my technology-agnostic stance?
CB I have also been on record multiple times saying that every data problem is a people problem in disguise. I still hold that this is true today although potentially this is changing. The problems of the past and still to this day originate with poor data stewardship, I saw it happening in front of my eyes last week in Heathrow when I purchased something in a well known electronics store. Because I have an overseas postcode the guy at the checkout put dummy data into all the fields to get through the process quickly and not impact my customer experience, I desperately wanted to stop him but also wanted to catch my plane. This is where the process efficiency impacts good data collection. If the software that supports the process isn’t flexible, the issue won’t be fixed without technology intervention, this is often true in data quality problems which have knock on effects to customers, which at the end of the day are why we are all here. This is a people problem (because who is taking responsibility here for fixing it, or educating that guy at the checkout) AND it’s a technology problem, caused by inflexible or badly implemented systems.

However, in the future, with more focus on customer driven checkout, digital channels and better customer experience, better interface driven data controls and robotics and AI, it may become further nuanced. People are still involved, communication remains critical but we cannot ignore technology in the digital age. For a long time, data groups have struggled with getting access to good tools and technology, now this technology domain is growing daily, and the tools are improving all the time. What we can do now with data at a significantly lower cost than ever before is amazing, and continues to improve all the time. Hence ignoring technology can be costly when extending capabilities to your stakeholders and could be a serious mistake, however focusing only on technology and ignoring people, process, communication etc is also a serious mistake. Data Leaders have to be multi-disciplinary today, and be able to keep up with the pace of change.

PJT I have heard you talk about “data platforms”, what do you mean by this and how do these contrast with another perennial theme, that of data democratisation? How does a “data platform” relate to – say – Data Science teams?
CB Data democratisation is enabled by the data platform. The data platform is the technology enablement of the four pillars I mentioned before, availability, transparency, quality and control. The platform is a collection of technologies that standardise the approach and access to well governed data across the organisation. Data Democratisation is simply making data available and abstracting away from siloed storage mechanisms, but the platform wraps the implementation of quality, controls and structure to the way that happens. Data Science teams then get the data they need, including data curation services to find the data they need quickly, for governed and structured data, Data Science teams can utilise the glossary to identify what they need and understand the level of quality based on consumer views, they also have access to metadata in standard forms. This empowers the analytics capability to move faster, spend less time on data discovery and curation, structure and quality and more time on building analytics.
PJT I mentioned the federated CDO team at ING above and assume this is reflected in the rest of the organisation structure. ING also has customers in 40 countries and I know first-hand that a global footprint adds complexity. What are the challenges in being a CDO in such an environment? Does this put a higher premium on influencing skills for example?
CB I am not sure it puts a higher premium on influencing skills, these have a high premium in any CDO role, even if you don’t have a federated structure, the reality is if you are in a data role you have more stakeholders than anyone else in the company, so influencing skills remain premium.

A global footprint means complexity for sure, it means differences in a world where you are trying to standardise and it means you have to be tuned in to cultural differences and boundaries. It also means a great deal of variety, opportunities to learn new cultures and approaches, it means you have to listen and understand and flex your style and it means pragmatism plays an important part in your decision making process.

At ING we have an amazing team of people who collaborate in a way I have never experienced before, supported by a strong attachment and commitment to the success of the business and our customers. This makes dealing with the complexity a team effort, with great energy and a fantastic working environment. In an organisation without the drive and passion we have here it would present challenges, with the support of the board and being a core part of the overall strategy, it ensures broad alignment to the goal, which makes the challenge easier for the organisation to solve, not easy, but easier and more fun.

PJT Building on the last point, every CDO I have interviewed has stressed the importance of relationships; something that chimes with my own experience. How do you go about building strong relationships and maintaining them when inevitable differences of opinion or clashes in interests arise?
CB I touched on this a little earlier. Pragmatism over purism. I see purists everywhere in data, with views that are so rigid that the execution of them is doomed because purism doesn’t build relationships. Relationships are built based on what you bring and give up, on what you can give, not on what you can get. I try every day to achieve this, but I am human too, so I don’t always get it right, I hope I get it right more than I get it wrong and where I get it wrong I hope I can be forgiven for my intention is pure. We owe it to our customers to work together for their benefit, where we have differences the customer outcomes should drive our decisions, in that we have a common goal. Disagreements can be helped and supported by identifying a common goal, this starts to align people behind a common outcome. Individual interests can be put aside in preference of the customer interest.
PJT I know that you are very interested in data ethics and feel that this is an important area for CDOs to consider. Can you tell the readers a bit more about data ethics and why they should be central to an organisation’s approach to data?
CB In an increasingly digital world, the use of data is becoming widespread and the pace at which it is used is increasing daily, our compute power grows exponentially as does the availability of data. Given this, we need an ethical framework to help us make good decisions with our customers and stakeholders in mind. How do you ensure that decisions in your organisation about how you use data are ethical? What are ethical decisions in your organisation and what are the guiding principles? If this isn’t clear and communicated to help all staff make good decisions, or have good discussions there is a real danger that decisions may not be properly socialised before all angles are considered.

Just meeting the bar of privacy regulation may not be enough, you can still meet that bar and do things that your customers may disagree with or find “creepy” so the correct thought needs to be applied and the organisation engaged to ensure the correct conversations take place, and there is a place to go to discuss ethics.

I am not saying that there is a silver bullet to solve this problem, but the conversation and the ability to have the conversation in a structured way helps the organisation understand its approach and make good decisions in this respect. That’s why CDOs should consider this an important part of the role and a critical engagement with users of data across the organisation.

PJT Finally, I have worked for businesses with a presence in the Netherlands on a number of occasions. As a Brit living abroad, how have you found Amsterdam? What – if any – adaptations have you had to make to your style to thrive in a somewhat different culture?
CB Having lived in India, I thought my move to the Netherlands could only be easy. I arrived thinking that a 45 minute flight could not possibly provide as many challenges as an 11 hour flight, especially from a cultural perspective. Of course I was wrong because any move to a different culture provides challenges you could never have expected and it’s the small adjustments that take you by surprise the most. It’s always a hugely enjoyable learning experience though. London is a more top down culture whereas in the Netherlands it’s a much flatter approach, my experience here is positive although it does require an adjustment. I work in Amsterdam but live in a small village, chosen deliberately to integrate faster. It’s harder, more of a challenge but helps you understand the culture as you make friends with local people and get closer to the culture. My wife and I have never been fans of the expat scene, we prefer to integrate, however more difficult this feels at first, it’s worth it in the long run. I must admit though that I haven’t conquered the language yet, it’s a real work in progress!
PJT Christopher, I really enjoyed our chat, which I believe will also be of great interest to readers. Thank you.

Christopher Bannocks can be reached via his LinkedIn profile.


Disclosure: At the time of publication, neither peterjamesthomas.com Ltd. nor any of its Directors had any shared commercial interests with Christopher Bannocks, ING or any entities associated with either of these.


If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.

 
Notes

 
[1]
 
Specifically:

 
[2]
 
So does the interviewer.
 
[3]
 
Know your customer.
 
[4]
 
Most directly in: A bad workman blames his [Business Intelligence] tools

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

Latest Interviews / Podcasts

Interviews and Podcasts

The interviews that I conduct with leaders in their fields as part of my “In-depth” series have hopefully brought a new and interesting aspect to this site. However, often the boot is on the other foot and I am the person being interviewed about my experience and expertise in the data field and related matters [1]. Maybe interviewing other people helps me when I am in turn interviewed, maybe it’s the other way round. Whatever the case, I enjoyed recording the two conversations appearing below (thanks to the interviewers in both cases) and hope that the content is of interest to readers.

In both instances a link to the site originally publishing the interview is followed by a locally hosted version of the audio track and then a download option. I’d encourage readers to explore the other excellent interviews contained on both sites.



 
Enterprise Management 360° Podcast – 31st July 2018

 



 
Venturi Voice 3650° Podcast – 22nd April 2018

 

Downloadable link: Conducting a Data Orchestra

 
If you would like to interview me for your site or periodical, or if you are just interested in further exploring some of the themes I discuss in these two interviews, then please feel free to get in contact.
 


 
Notes

 
[1]
 
A list of other video interviews and podcasts I have taken part in can be viewed in the Media section of this site.

 



 

Convergent Evolution

Ichthyosaur and Dolphin

No, this article has not escaped from my Maths & Science section; it is actually about data matters. But first of all, channeling Jennifer Aniston [1], “here comes the Science bit – concentrate”.


 
Shared Shapes

The Theory of Common Descent holds that any two organisms, extant or extinct, will have a common ancestor if you roll the clock back far enough. For example, each of fish, amphibians, reptiles and mammals had a common ancestor over 500 million years ago. As shown below, the current organism which is most like this common ancestor is the Lancelet [2].

Chordate Common Ancestor

To bring things closer to home, each of the Great Apes (Orangutans, Gorillas, Chimpanzees, Bonobos and Humans) had a common ancestor around 13 million years ago.

Great Apes Common Ancestor

So far so simple. As one would expect, animals sharing a recent common ancestor would share many attributes with both it and each other.

Convergent Evolution refers to something else. It describes where two organisms independently evolve very similar attributes that were not features of their most recent common ancestor. Thus these features are not inherited, instead evolutionary pressure has led to the same attributes developing twice. An example is probably simpler to understand.

The image at the start of this article is of an Ichthyosaur (top) and Dolphin. It is striking how similar their body shapes are. They also share other characteristics such as live birth of young, tail first. The last Ichthyosaur died around 66 million years ago alongside many other archosaurs, notably the Dinosaurs [3]. Dolphins are happily still with us, but the first toothed whale (not a Dolphin, but probably an ancestor of them) appeared around 30 million years ago. The ancestors of the modern Bottlenose Dolphins appeared a mere 5 million years ago. Thus there is a tremendous gap of time between the last Ichthyosaur and the proto-Dolphins. Ichthyosaurs are reptiles; they were covered in small scales [4]. Dolphins are mammals and covered in skin not massively different to our own. The most recent common ancestor of Ichthyosaurs and Dolphins probably lived around a quarter of a billion years ago and looked like neither of them. So the shape and other attributes shared by Ichthyosaurs and Dolphins do not come from a common ancestor; they have developed independently (and millions of years apart) as adaptations to similar lifestyles as marine hunters. This is the essence of Convergent Evolution.

That was the Science, here comes the Technology…


 
A Brief Hydrology of Data Lakes

From 2000 to 2015, I had some success [5] with designing and implementing Data Warehouse architectures much like the following:

Data Warehouse Architecture

As a lot of my work then was in Insurance or related fields, the Analytical Repositories tended to be Actuarial Databases and / or Exposure Management Databases, developed in collaboration with such teams. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation.

Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises. Of course some architectures featured both paradigms as well.

One of the early promises of a Data Lake approach was that – once all relevant data had been ingested – this would be directly leveraged by Data Scientists to derive insight.

Over time, it became clear that it would be useful to also have some merged / conformed and cleansed data structures in the Data Lake. Once the output of Data Science began to be used to support business decisions, a need arose to consider how it could be audited and both data privacy and information security considerations also came to the fore.

Next, rather than just being the province of Data Scientists, there were moves to use Data Lakes to support general Data Discovery and even business Reporting and Analytics as well. This required additional investments in metadata.

The types of issues with Data Lake adoption that I highlighted in Draining the Swamp earlier this year also led to the advent of techniques such as Data Curation [6]. In parallel, concerns about expensive Data Science resource spending 80% of their time in Data Wrangling [7] led to the creation of a new role, that of Data Engineer. These people take on much of the heavy lifting of consolidating, fixing and enriching datasets, allowing the Data Scientists to focus on Statistical Analysis, Data Mining and Machine Learning.

Big Data Architecture

All of which leads to a modified Big Data / Data Lake architecture, embodying people and processes as well as technology and looking something like the exhibit above.

This is where the observant reader will see the concept of Convergent Evolution playing out in the data arena as well as the Natural World.


 
In Closing

Convergent Evolution of Data Architectures

Lest it be thought that I am saying that Data Warehouses belong to a bygone era, it is probably worth noting that the archosaurs, Ichthyosaurs included, dominated the Earth for orders of magnitude longer than the mammals and were only dethroned by an asymmetric external shock, not any flaw in their own finely honed characteristics.

Also, to be crystal clear, much as while there are similarities between Ichthyosaurs and Dolphins there are also clear differences, the same applies to Data Warehouse and Data Lake architectures. When you get into the details, differences between Data Lakes and Data Warehouses do emerge; there are capabilities that each has that are not features of the other. What is undoubtedly true however is that the same procedural and operational considerations that played a part in making some Warehouses seem unwieldy and unresponsive are also beginning to have the same impact on Data Lakes.

If you are in the business of turning raw data into actionable information, then there are inevitably considerations that will apply to any technological solution. The key lesson is that the shape of your architecture is going to be pretty similar, regardless of the technical underpinnings.


 
Notes

 
[1]
 
The two of us are constantly mistaken for one another.
 
[2]
 
To be clear the common ancestor was not a Lancelet, rather Lancelets sit on the branch closest to this common ancestor.
 
[3]
 
Ichthyosaurs are not Dinosaurs, but a different branch of ancient reptiles.
 
[4]
 
This is actually a matter of debate in paleontological circles, but recent evidence suggests small scales.
 
[5]
 
See:

 
[6]
 
A term that is unaccountably missing from The Data & Analytics Dictionary – something to add to the next release. UPDATE: Now remedied here.
 
[7]
 
Ditto. UPDATE: Now remedied here

 



 

Version 2 of The Anatomy of a Data Function

Between November and December 2017, I published the three parts of my Anatomy of a Data Function. These were cunningly called Part I, Part II and Part III. Eight months is a long time in the data arena and I have now issued an update.

The Anatomy of a Data Function


The changes in Version 2 are confined to the above organogram and Part I of the text. They consist of the following:

  1. Split Artificial Intelligence out of Data Science in order to better reflect the ascendancy of this area (and also its use outside of Data Science).
     
  2. Change Data Science to Data Science / Engineering in order to better reflect the continuing evolution of this area.

My aim will be to keep this trilogy up-to-date as best practice Data Functions change their shapes and contents.


 
If you would like help building or running your Data Function, or would just like to have an informal chat about the area, please get in touch.
 

