Did GDPR highlight the robustness of your Data Architecture, the strength of your Data Governance and the fitness of your Data Strategy?

GDPR

So GDPR Day is upon us – the sun still came up and the Earth is still spinning (these facts may be related of course). I hope that most GDPR teams and the Executives who have relied upon their work were able to go to bed last night secure in the knowledge that a good job had been done and that their organisations and customers were protected. Undoubtedly, in coming days, there will be some stories of breaches of the regulations, maybe some will be high-profile and the fines salutary, but it seems that most people have got over the line, albeit often by Herculean efforts and sometimes by the skins of their teeth.

Does it have to be like this?

A well-thought-out Data Architecture embodying a business-focussed Data Strategy and intertwined with the right Data Governance, should combine to make responding to things like GDPR relatively straightforward. Were they in your organisation?

If instead GDPR compliance was achieved in spite of your Data Architectures, Governance and Strategies, then I suspect you are in the majority. Indeed, years of an essentially narrow focus on GDPR will have consumed resources that might otherwise have gone towards embedding the control and leverage of data into the organisation’s DNA.

Maybe now is a time for reflection. Will your Data Strategy, Data Governance and Data Architecture help you to comply with the next set of data-related regulations (and it is inevitable that there will be more), or will they hinder you, as will have been the case for many with GDPR?

If you feel that the answer to this question is that there are significant problems with how your organisation approaches data, then maybe now is the time to grasp the nettle. I have helped many companies to both develop and execute successful Data Strategies, and you could start by reading my trilogy on creating an Information / Data Strategy:

  1. General Strategy
  2. Situational Analysis
  3. Completing the Strategy

I’m also more than happy to discuss your data problems and opportunities either formally or informally, so feel free to get in touch.
 
 


From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

Sic Transit Gloria Magnorum Datorum

Sic transit gloria mundi

It happens to all of us eventually I suppose.

Just the other day, I heard someone referring to “traditional Big Data”. Since when did Big Data become “traditional”? I didn’t get the e-mail. Of course, in the technology field, the epithet “traditional” is code for “broken”, “no longer of any use” and – most damningly of all – “deeply uncool”. The term is widely used; whether – with this connotation – it is either helpful or accurate is perhaps a matter for debate. This usage makes me recall the rather silly debate about Analytics versus “traditional” Business Intelligence that occurred around 2009 [1].

By way of context, the person talking about “traditional Big Data” was referring to the difference between some of the original denizens of the Hadoop ecosystem and more recent offerings like Databricks or Beam. They also had in mind the various quasi-proprietary flavours of Big Data and/or Big Data plug-ins offered by (that word again) “traditional” vendors. In this sense, the usage is probably appropriate, albeit somewhat jarring. In the more pejorative sense I refer to above, “traditional” is somewhat misleading when applied to either Big Data or – in the author’s opinion – several of its precursors.

Shiny!

While we inhabit a world which places a premium on innovation, favouring the new and the shiny [2], traditional methods have much to offer. If something – a technique or technology – has achieved “traditional” status, it means that it has become part of how things are done. While shaking up the status quo can be beneficial, “traditional” approaches have the not insignificant benefit of having been tried and tested. “Traditional” data tools are ones that have survived some time and are still used. While not guaranteeing success, it should at least be possible to be successful with such tools because other people have done this before.

Maybe, several years after its move into the mainstream, Big Data has become “traditional”. However, I would take this as meaning “fit for purpose”, “useful” and “still pretty cool”. Then again, I think the same about many of the technologies that were described as “traditional” in contrast to Big Data. As ever, the main things that lead to either success or failure in data-centric work [3] have very little to do with technology, be that traditional or à la mode.
 


 
Notes

 
[1]
 
If you have the stomach for it, see Business Analytics vs Business Intelligence and succeeding articles.
 
[2]
 
See also 2009’s The latest and greatest versus the valuable.
 
[3]
 
I itemise a few of these in last year’s 20 Risks that Beset Data Programmes.

 


 

A Brief History of Databases


Larger PDF version (opens in a new tab)

The pace of change in the field of database technology seems to be constantly accelerating. No doubt in five years’ time [1], Big Data and the Hadoop suite [2] will seem to be as old-fashioned as earlier technologies can appear to some people nowadays. Today there is a great variety of database technologies that are in use in different organisations for different purposes. There are also a lot of vendors, some of whom have more than one type of database product. I think that it is worthwhile considering both the genesis of databases and some of the major developments that have occurred between then and now.

The infographic appearing at the start of this article seeks to provide just such a perspective. It presents an abridged and simplified view of the history of databases from the 1960s to the late 2010s. It is hard to make out the text in the diagram as shown above, so I would recommend that readers click on the link provided in order to view a much larger version with bigger and more legible text.

The infographic references a number of terms. Below I provide links to definitions of several of these, which are taken from The Data and Analytics Dictionary. The list progresses from the top of the diagram downwards, but starts with a definition of “database” itself:

To my mind, it is interesting to see just how long we have been grappling with the best way to set up databases. Also of note is that some of the Big Data technologies are actually relatively venerable, dating to the mid-to-late 2000s (some elements are even older, consisting of techniques for handling flat files on UNIX or Mainframe computers back in the day).

I hope that both the infographic and the definitions provided above contribute to the understanding of the history of databases and also that they help to elucidate the different types of database that are available to organisations today.
 


 
Acknowledgements

The following people’s input is acknowledged on the document itself, but my thanks are also repeated here:

Of course any errors and omissions remain the responsibility of the author.


 
Notes

 
[1]
 
If not significantly before then.
 
[2]
 
One of J K Rowling’s lesser-known works.


 

A further extension of the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. A larger update is in the works, but for now here are a dozen new definitions:

  1. Binary
  2. Business Analyst
  3. Chief Analytics Officer (CAO)
  4. Data
  5. Data Analyst
  6. Data Business Analyst
  7. Data Marketplace
  8. Data Steward
  9. Digital
  10. End User Computing (EUC)
  11. Information
  12. Web Analytics

As previously stated, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).
 


 


 

Draining the Swamp


The title phrase of this article has entered the collective consciousness from political circles in recent months and years. Readers will be glad to hear that the political commentary content of this piece is precisely zero. Instead I am going to talk about Data Lakes, also referred to pejoratively by those who are not fans as Data Swamps.

Having started my relationship with Data matters back in the early days of Relational Databases and having driven corporate success through Data Warehouses and Business Intelligence, I have also done work in the Big Data arena since around 2013. A central concept in the Big Data paradigm is that of a Data Lake: a large Hadoop repository into which all data that an organisation might want to use is poured, often essentially as is. The thinking is that – in a Big Data implementation – storage is cheap [1] and you never fully know what data you might need in advance, so why not save it all?

It is probably fair to say that – much like many other major programmes of work over the years [2] – the creation of Data Lakes, or perhaps more accurately the leverage of their contents, has resulted in at best mixed results for the organisations that undertake such an endeavour. The thing with mixed results is that it is not all doom and gloom; some people are successful, others are not. The important thing is to determine which factors lead to good and bad outcomes.

Well first of all, I would suggest that – like any other data programme – the formation of a Data Lake is subject to the types of potential issues that I review in my 2017 article, 20 Risks that Beset Data Programmes. Of these, Data Lakes are particularly susceptible to risk 16:

In the absence of [understanding key business decisions], the programme becoming a technology-driven one.

The business gets what IT or Change think that they need, not what is actually needed. There is more focus on shiny toys than on actionable information. The programme forgets the needs of its customers.

The issue here is that some people buy into the misconception that all you have to do is fill the Data Lake and sit back and wait for precious Data gems to flow from it. Understanding a business and its key decisions is tough and perhaps it is not surprising that people would like to skip this step and instead focus on easier activities. Sadly, this approach is not going to work for Data Lakes or anything else.
 


 
Dan Woods

However, Data Lakes also face some specific risks and, in search of a better understanding of these, I turned to a recent Forbes article, Can Failed Data Lakes Succeed As Data Marketplaces? penned by Dan Woods (@danwoodsearly) [3]. Dan does not mince words in his introduction:

All over the world, data lake projects are foundering, not because they are not a step in the right direction, but because they are essentially uncompleted experiments.

He adds:

The main roadblock has been that once companies store their data in the data lake, they struggle to find a way to operationalize it. The data lake has never become a product like a data warehouse. Proof of concepts are tweaked to keep a desultory flow of signals going.

and finally states:

[…] for certain use cases, Hadoop and purpose-built data lake-like infrastructure are solving complex and high-value problems. But in most other businesses, the data lake got stuck at the proof of concept stage.

This chimes with my experience – the ability to synthesise and analyse vast troves of data is indispensable in addressing some business problems, but a sledge-hammer to crack a walnut for others. Data Lakes are no more universal panaceas than anything else we have invented to date. As always, the main issues are not technology, but good processes, consistent definitions, improved data quality and matching available data to real business questions.
 


 
Paul Barth

In seeking salvation (his word) for Data Lakes, Dan sought the opinion of one of my LinkedIn contacts, Paul Barth (@BarthPS), CEO of Podium Data. Paul analyses the root causes of Data Lake issues, splitting these into three main ones [4]:

  1. Polluted data lakes

    Too many projects targeted at filling or exploiting the Data Lake kick off in parallel. This leads to an incoherent landscape and inaccessible / difficult to understand data.
     

  2. Bottlenecked data lakes

    Essentially treating the Data Lake as if it were a Data Warehouse, when the technology is designed for different and less structured purposes. This leads to a quasi-warehouse that is less performant than actual warehouses.
     

  3. Risky data lakes

    Where there is a desire to quickly populate the Data Lake, not least to provide grist to the Data Science mill, appropriate controls on access to data can be neglected. This is particularly an issue where personally identifiable data is involved and can lead to regulatory, legal and reputational peril.

Barth’s solution to these problems is the establishment of a Data Marketplace. This is a concept previously referenced on these pages in Predictions about Prediction, a review of consultancy Eckerson Group’s views on Data and Analytics in 2017 [5]. Back then, Eckerson Group had the following to say about the area:

[An Enterprise Data Marketplace (EDM) is] an Amazon-like data marketplace where analysts can seek datasets, see reviews of others, and select the best-fit datasets for their needs helps to encourage dataset reuse, minimize redundancy, and prevent flawed analysis that results from working with less than ideal data. Data cataloging tools, data curation practices, data preparation technologies, and data services will be combined to create a marketplace for data seekers. Enterprise Data Marketplaces return us to the single-source vision that was once touted as the real benefit of Enterprise Data Warehouses.

Enterprise Data Marketplace

So, as illustrated above, a Data Marketplace is essentially a collection of tagged data sets, which have in some cases been treated to increase consistency and utility, combined with information about their contents and usages. These are overlaid by what is essentially a “social media” layer where “shoppers” can search for data and provide feedback on its utility (e.g. a rating mechanism) and also add their own documentation. This means that useful data sets get highly rated and have more explanatory material attached to them.
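To make the description above a little more concrete, here is a minimal Python sketch of such a marketplace. It is purely illustrative and is not taken from Podium Data, Eckerson Group or any real product: the `DataSet` and `Marketplace` names, the 1 to 5 rating scale and the choice to sort search results by average rating are all my own assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class DataSet:
    """One entry in a hypothetical marketplace catalogue."""
    name: str
    description: str
    tags: set = field(default_factory=set)      # labels used for searching
    ratings: list = field(default_factory=list) # shopper scores (assumed 1 to 5)
    notes: list = field(default_factory=list)   # documentation added by shoppers

    def average_rating(self) -> float:
        # Unrated data sets score 0.0 so that they sort below rated ones.
        return float(mean(self.ratings)) if self.ratings else 0.0


class Marketplace:
    """A collection of tagged data sets plus a simple feedback layer."""

    def __init__(self) -> None:
        self._catalogue = {}

    def publish(self, data_set: DataSet) -> None:
        self._catalogue[data_set.name] = data_set

    def search(self, tag: str) -> list:
        """Return the data sets carrying this tag, best rated first."""
        hits = [d for d in self._catalogue.values() if tag in d.tags]
        return sorted(hits, key=lambda d: d.average_rating(), reverse=True)

    def rate(self, name: str, score: int, note: Optional[str] = None) -> None:
        """Record a shopper's score and, optionally, some documentation."""
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        data_set = self._catalogue[name]
        data_set.ratings.append(score)
        if note:
            data_set.notes.append(note)


# Usage: two data sets share a tag; the better-rated one is listed first.
market = Marketplace()
market.publish(DataSet("claims_raw", "Raw claims feed", {"claims"}))
market.publish(DataSet("claims_2017", "Cleansed claims", {"claims", "finance"}))
market.rate("claims_raw", 2)
market.rate("claims_2017", 5, "Reconciles to the ledger")
print([d.name for d in market.search("claims")])
```

A real marketplace would of course sit on top of a data catalogue and proper search technology; the point here is only that tags, ratings and shopper-added notes are simple ideas to model.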
 


 
Dave Wells

Eckerson Group build on this concept in their white paper The Rise of the Data Marketplace (opens a PDF document), work commissioned in part by Podium Data. In this, Eckerson’s Dave Wells (@_DaveWells_) characterises an Enterprise Data Marketplace as having the following attributes [6]:

  • Categorization organises the marketplace to simplify browsing. For example a shopper seeking budget data doesn’t need to browse through unrelated data sets about customers, employees or other data subjects. Categories complement tagging and smart search algorithms, offering a variety of ways to find data sets.
     
  • Curation is active management of the data sets that are available in the EDM. Curation selects and qualifies data sets, describes each data set, and collects and manages metadata about the collection and each individual data set.
     
  • Cataloging exposes data sets for data shoppers, including descriptions and metadata. The catalog is a view into the inventory of curated data sets. Rich metadata and powerful search are important catalog features.
     
  • Crowdsourcing is the equivalent of a social network for data. Data shoppers actively participate in cataloging, curating and categorizing data. This virtuous cycle (a chain of events that reinforces outcomes through a feedback loop) continuously improves the quality and value of data in the marketplace.

Back in the Forbes article, Barth focuses on using the Data Marketplace’s interactive elements to identify the most valuable data (that which is searched for most frequently and has the best shopper rating). This data can then be the subject of focussed investment. Such investment is of the sort familiar in Data Warehouse activities, but it is directed by shoppers’ “social media” preferences rather than more formal requirements gathering exercises.
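As a rough illustration of how the most valuable data might be picked out, the Python sketch below ranks data sets by combining how often they are searched for with their average shopper rating. The function name `rank_for_investment`, the 1 to 5 rating scale and the decision to multiply normalised popularity by normalised rating are my own assumptions, not anything described in the Forbes article.

```python
from collections import Counter


def rank_for_investment(search_log, ratings, top_n=3):
    """Rank data sets by search popularity multiplied by average rating.

    search_log: iterable of data set names, one entry per search hit.
    ratings:    dict mapping data set name to a list of scores (assumed 1 to 5).
    Both inputs, and the scoring formula, are hypothetical.
    """
    searches = Counter(search_log)
    max_searches = max(searches.values(), default=0)

    scored = []
    for name in set(searches) | set(ratings):
        # Popularity is scaled to 0..1 relative to the most-searched data set.
        popularity = searches[name] / max_searches if max_searches else 0.0
        scores = ratings.get(name, [])
        # Quality is the average rating scaled to 0..1; unrated sets score 0.
        quality = (sum(scores) / len(scores)) / 5 if scores else 0.0
        scored.append((popularity * quality, name))

    # Highest combined score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Usage with made-up figures.
search_log = ["policies", "policies", "claims", "policies", "claims", "brokers"]
ratings = {"policies": [5, 4], "claims": [5], "brokers": [2, 3]}
print(rank_for_investment(search_log, ratings, top_n=2))
```

Multiplying the two measures means that a data set must be both sought after and well regarded to rise to the top; other weightings would be equally defensible.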
 


 
Dan Woods makes the pertinent observation that:

So, as the challenge now is not one of technology, but of setting a vision, companies have to decide how to incorporate a new set of requirements to get the most out of their data. […] Even within one company, there may be the need for multiple requirements to be met. Marketing may not need the precision that the accounting department requires. Groups with regulatory mandates may have strong compliance requirements that drive the need for data that is 100% accurate, while those doing exploration for product development purposes may prefer to have larger datasets to work with, and 90% accuracy is all that they require. The data lake must be able to employ multiple approaches as needed by different applications and groups of users.

His article finishes with the following clarion call to implement the Data Marketplace vision:

Companies achieve data transparency with data warehouses because of the use of canonical data models. Yet data in data warehouses was trapped in slow processes that lacked agility. The data warehouse data was well understood but couldn’t evolve at the speed of business. The data lake wasn’t able to correct this problem because companies didn’t implement lakes with a sufficiently comprehensive vision. That’s what they need to do now.


 
"Grimpen Mire"

When I hear about Data Warehouses that take months to change, poor design and a lack of automation both come to mind. Even so, it is unarguable that some Data Warehouses can be plagued by long turn-around times [7]. Equally, I have seen enough Data Lakes turn into Grimpen Mire to perceive that there are some major issues inherent in an unmodified approach to this area [8]. The Data Marketplace idea is an intriguing one, a mash-up [9] of different approaches that may just yield some tangible results.

I also think that the inherent focus on users’ needs as opposed to technological considerations is the right way to go. I have been making this point for many years now [10] and have full confidence that I will still be doing so in ten years’ time. As with most aspects of life, it is with people, and how a programme interacts with them, that success and failure factors are most readily found. It seems to me that the Data Marketplace approach seeks to embrace this verity, which can only be a point in its favour.
 


 
Acknowledgements

I would like to thank each of Forbes / Dan Woods, Podium Data / Paul Barth and Eckerson Group / Dave Wells for both reviewing this article and allowing me to quote their work. Such generous behaviour is not as typical as one might like to think and always merits recognition.
 


 
Notes

 
[1]
 
Though the total cost of saving such data extends beyond just disk costs and can become significant.
 
[2]
 
See my earlier article Ever tried? Ever failed? for a treatment of what is clearly a fundamental physical constant – that 60-70% of all types of major programmes don’t fully achieve their objectives (aka fail). Data Lakes appear to also be governed by this Law of Nature.
 
[3]
 
You may need to navigate past a Forbes banner screen before you can access the actual article.
 
[4]
 
The following is my take on Paul’s analysis; for his actual words, see the Forbes article.
 
[5]
 
Watch this space for a review of Eckerson Group’s predictions for 2018.
 
[6]
 
Which I reproduce with permission.
 
[7]
 
By way of contrast, warehouses that my teams have built have been able to digest acquisitions and meet new and onerous regulatory requirements in a matter of weeks, not months.
 
[8]
 
I should stress here a difference between Data Lakes, which seek to be all-embracing, and more focussed Big Data activities, e.g. the building of complex seismological or meteorological models to assess catastrophic insurance risk (see Hurricanes and Data Visualisation: Part II – Map Reading). I have helped the latter to be very successful myself and seen good results in other organisations.
 
[9]
 
Do people still say “mash-up”?
 
[10]
 
For example in my 2008 trilogy:

  1. Marketing Change
  2. Education and cultural transformation
  3. Sustaining Cultural Change

 


 

A Retrospective of 2017’s Articles

A Review of 2017

This article was originally intended for publication late in the year it reviews, but, as they [1] say, the best-laid schemes o’ mice an’ men gang aft agley…

In 2017 I wrote more articles [2] than in any year since 2009, which was the first full year of this site’s existence. Some were viewed by thousands of people, others received less attention. Here I am going to ignore the metric of popular acclaim and instead highlight a few of the articles that I enjoyed writing most, or sometimes re-reading a few months later [3]. Given the breadth of subject matter that appears on peterjamesthomas.com, I have split this retrospective into six areas, which are presented in decreasing order of the number of 2017 articles I wrote in each. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO Perspectives
  5. Programme Advice
  6. Analytics & Big Data

In each category, I will pick out two or three pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
 
August
The Data and Analytics Dictionary
My attempt to navigate the maze of data and analytics terminology. Everything from Algorithm to Web Analytics.
 
 
November & December
The Anatomy of a Data Function: Part I, Part II and Part III
Three articles focussed on the structure and components of a modern Data Function and how its components interact with both each other and the wider organisation in order to support business goals.
 
 
Data Visualisation
 
 
January
Nucleosynthesis and Data Visualisation
How one of the most famous scientific data visualisations, the Periodic Table, has been repurposed to explain where the atoms we are all made of come from via the processes of nucleosynthesis.
 
 
September & October
Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity and Part II – Map Reading
Two articles on how Data Visualisation is used in Meteorology. Part I provides a worked example illustrating some of the problems that can arise when adopting a rainbow colour palette in data visualisation. Part II grapples with hurricane prediction and covers some issues with data visualisations that are intended to convey safety information to the public.
 
 
Statistics & Data Science
 
 
February
Toast
What links Climate Change, the Manhattan Project, Brexit and Toast? How do these relate to the public’s trust in Science? What does this mean for Data Scientists?
Answers provided by Nature, The University of Cambridge and the author.
 
 
February
How to be Surprisingly Popular
The wisdom of the crowd relies upon essentially democratic polling of a large number of respondents; an approach that has several shortcomings, not least the lack of weight attached to people with specialist knowledge. The Surprisingly Popular algorithm addresses these shortcomings and so far has out-performed existing techniques in a range of studies.
 
 
October
A Nobel Laureate’s views on creating Meaning from Data
The 2017 Nobel Prize for Chemistry was awarded to Structural Biologist Richard Henderson and two other co-recipients. What can Machine Learning practitioners learn from Richard’s observations about how to generate images from Cryo-Electron Microscopy data?
 
 
CDO Perspectives
 
 
January
Alphabet Soup
Musings on the overlapping roles of Chief Analytics Officer and Chief Data Officer and thoughts on whether there should be just one Top Data Job in an organisation.
 
 
February
A Sweeter Spot for the CDO?
An extension of my concept of the Chief Data Officer sweet spot, inspired by Bruno Aziza of AtScale.
 
 
September
A truth universally acknowledged…
Many Chief Data Officer job descriptions have a list of requirements that resembles a Swiss Army Knife. This article argues that the CDO must be the conductor of an orchestra, not someone who is a virtuoso on every single instrument.
 
 
Programme Advice
 
 
January
Bumps in the Road
What the aftermath of repeated roadworks can tell us about the potentially deleterious impact of Change Programmes on Data Landscapes.
 
 
February
20 Risks that Beset Data Programmes
A review of 20 risks that can plague data programmes. How effectively these are managed / mitigated can make or break your programme.
 
 
March
Ideas for avoiding Big Data failures and for dealing with them if they happen
Paul Barsch (EY & Teradata) provides some insight into why Big Data projects fail, what you can do about this and how best to treat any such projects that head off the rails. With additional contributions from Big Data gurus Albert Einstein, Thomas Edison and Samuel Beckett.
 
 
Analytics & Big Data
 
 
February
Bigger and Better (Data)?
Some examples of where bigger data is not necessarily better data. Provided by Bill Vorhies and Larry Greenemeier.
 
 
March
Elephants’ Graveyard?
Thoughts on trends in interest in Hadoop and Spark, featuring George Hill, James Kobielus, Kashif Saiyed and Martyn Richard Jones, together with the author’s perspective on the importance of technology in data-centric work.
 
 
and Finally…

I would like to close this review of 2017 with a final article, one that somehow defies classification:

 
 
April
25 Indispensable Business Terms
An illustrated Buffyverse take on Business gobbledygook – What would Buffy do about thinking outside the box? To celebrate 20 years of Buffy the Vampire Slayer and 1st April 2017.

 
Notes

 
[1]
 
“They” here obviously standing for Robert Burns.
 
[2]
 
Thirty-four articles and one new page.
 
[3]
 
Of course, some of these may also have been popular; I’m not being masochistic here!

 


 

Data and Information

Season’s Greetings for 2017

After having just published three rather lengthy articles in a series [1], here is a piece whose size is at the opposite end of the spectrum.

I am often asked to distinguish between data and information. Indeed this happened just the other day as part of LinkedIn discussions relating to some of my recent articles [2]. In the Data and Analytics Dictionary, I offer the following definition of Information:

Information is the first stop in the journey from Data to Information to Insight to Action. Data may be viewed as raw material, which needs to be refined in order to be useful. Information can be thought of as data enhanced with both relationships and understanding of context.

Here, I will look to be more visual in my definitions, hopefully also embracing the spirit of the time of year. In my opinion, the following image provides a good way to think about the difference between these two related concepts:

Data and Information

Consistent with my Dictionary definition, Information is something you get by organising data based on some knowledge of how it is meant to fit together.

As with most analogies, there are both some interesting ways to extend this and some areas in which it breaks down. In the first column, sometimes not all of the bricks you need are available or the right size (a data quality problem). In the second, you can clearly build a set of Lego bricks [3] into several different forms. It is to be hoped that data, particularly Financial data, is not massaged to provide more than one meaning.

However, I think the up-side of this simple analogy outweighs its fairly obvious limitations. I offer it to readers as a final thought before the 2017 holiday season commences.
 


 
Notes

 
[1]
 
The Anatomy of a Data Function, Parts I, II and III.
 
[2]
 
The discussions may be viewed here (you need to be a member of LinkedIn to view these).
 
[3]
 
Actually Duplo in this case.

 


 

The revised and expanded Data and Analytics Dictionary

The Data and Analytics Dictionary

Since its launch in August of this year, the peterjamesthomas.com Data and Analytics Dictionary has received a welcome amount of attention with various people on different social media platforms praising its usefulness, particularly as an introduction to the area. A number of people have made helpful suggestions for new entries or improvements to existing ones. I have also been rounding out the content with some more terms relating to each of Data Governance, Big Data and Data Warehousing. As a result, The Dictionary now has over 80 main entries (not including ones that simply refer the reader to another entry, such as Linear Regression, which redirects to Model).

The most recently added entries are as follows:

  1. Anomaly Detection
  2. Behavioural Analytics
  3. Complex Event Processing
  4. Data Discovery
  5. Data Ingestion
  6. Data Integration
  7. Data Migration
  8. Data Modelling
  9. Data Privacy
  10. Data Repository
  11. Data Virtualisation
  12. Deep Learning
  13. Flink
  14. Hive
  15. Information Security
  16. Metadata
  17. Multidimensional Approach
  18. Natural Language Processing (NLP)
  19. On-line Transaction Processing
  20. Operational Data Store (ODS)
  21. Pig
  22. Table
  23. Sentiment Analysis
  24. Text Analytics
  25. View

It is my intention to continue to revise this resource. Adding some more detail about Machine Learning and related areas is probably the next focus.

As ever, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).
 


 


 

The peterjamesthomas.com Data and Analytics Dictionary

The Data and Analytics Dictionary

I find myself frequently being asked questions around terminology in Data and Analytics and so thought that I would try to define some of the more commonly used phrases and words. My first attempt to do this can be viewed in a new page added to this site (this also appears in the site menu):

The Data and Analytics Dictionary

I plan to keep this up-to-date as the field continues to evolve.

I hope that my efforts to explain some concepts in my main area of specialism are both of interest and utility to readers. Any suggestions for new entries or comments on existing ones are more than welcome.
 

 

An in-depth Interview with Allan Engelhardt about Analytics

In-depth with Allan Engelhardt


Part of the In-depth series of interviews

PJT Today’s interview is with Allan Engelhardt, co-founder and principal of insights and analytics consultancy Cybaea. Allan and I know each other from when we both worked at Bupa. I was interested to understand the directions that he has been pursuing in recent years.
PJT Allan, we know each other well, but could you provide a pen picture of your career to date and the types of work that you have been engaged in?
AE I started out in experimental physics working on (very) big data from CERN, the large research lab near Geneva, and worked there after getting my degree. Then, like many other physicists, I was recruited into financial services, in my case to do risk management. From there I moved to a consultancy helping businesses make use of bleeding-edge technology and then on to CRM and customer loyalty. This last move was important for me, allowing me to move beyond the technology and focus as much on commercial business strategy and operations.

In 2002 a couple of us left the consultancy to help customers move beyond transactional infrastructure, which is really what ‘CRM’ was about at the time, to create high-value solutions on top, and to create the organizational and commercial ownership of the customer needed to consistently drive value from data. In doing so we invented the concept of Customer Value Management, which is now universally implemented by telcos across the world and increasingly adopted by other industries.

PJT There is no ISO definition of either insight or analytics. As an expert in these fields, can I ask you to offer your take on the meaning of these terms?
AE To me analytics is about finding meaning from information and data, while insights is about understanding the business opportunities in that meaning. But different people use the terms differently.
PJT I must give you an opportunity to both explain what Cybaea does and how the name came about.
AE At Cybaea we are passionate about value creation and commercial results. We have been called ‘Management consultants with a black belt in data’ and we help organizations identify and act upon data-driven opportunities in the areas of:

  1. Customer Value Management (CVM), including acquisition, churn, cross-sell, segmentation, and more, across online and offline channels and industries, both B2C and B2B.
  2. Customer Experience and Advocacy, including Net Promoter System and Net Promoter Economics, customer journey optimization, and customer experience.
  3. Innovation and Growth, including data-driven product and proposition development, data monetisation, and distribution and sales strategy.

For our customers, CVM projects typically deliver an additional 5% EBITDA growth annually, which you can measure very robustly because much of it is direct marketing. Experience and Advocacy projects typically deliver in the region of 20% EBITDA improvement to our clients, but it is harder to measure accurately because you must go above the line for this level of impact. And for Innovation and Growth, the sky is the limit.

As for the name, we founded the company in 2002 and wanted a short domain name that was a real word. It turned out to be difficult to find an available, short ‘.com’ at the peak of the dot-bomb era! We settled on ‘cybaea’ which my Latin dictionary translated as ‘trading vessel’; historically, it was a type of merchant ship of Greek origin, common in the Mediterranean, which Cicero describes as “most beautiful and richly adorned”. We always say we want to change the name, but it never happens; I guess if it was good enough for Cicero, then it is good enough for us.

PJT While at Bupa you led work that was very beneficial to the organisation and which is now the subject of a public Cybaea case study. Can you tell readers a bit more about this?
AE Certainly, and the case study is available at for anyone who wants to read more.

This was working with Bupa Global, a Bupa business unit that primarily provides international private medical insurance for 2 million customers living in over 195 different countries. Towards the end of 2013, Bupa Global set out on a strategic journey to deliver sustained growth. A key element of this was the design and launch of a completely new set of products and propositions, replacing the existing portfolio, with the objective of attracting and servicing new customer segments, complying with changing regulation and meeting customer expectations.

The strategic driver was therefore very much in the Innovation and Growth space we outlined above, and I joined Bupa’s global Leadership Team to create and lead the commercial insights function that would support this change with a deep understanding of the target customers and the markets in which they live. Additionally, Bupa had very high ambitions for its Net Promoter programme (Experience and Advocacy), where we delivered the most advanced installation across the global business, and for Customer Value Management we demonstrated a nearly 2% reduction in the Claims line (EBITDA) from one single project.

For the new propositions, we initially interviewed over 3,000 individuals on five continents to understand value and purchase drivers, researched 195 markets to size demand across all customer segments, and further deep-dived into key markets to understand the competitors, with their products, features, and prices, as well as the regulatory environment and distribution options. This was supported by a very practical Customer Lifetime Value model, which we developed.

Suffice it to say that in two years we had designed and implemented a completely new set of propositions and taken them live in more than twenty priority markets, where they replaced the old products.

The strategic and commercial results were clearly delivered. But when I asked our CEO what he thought was the main contribution of the team and the new insights function, he focused on trust: “Every major strategic decision we made was backed by robust data and deep insights in which the executive team had full confidence.”

In a period of change, trust is perhaps the key currency. Trust that you are doing the right things for the right reasons, and the ability to explain why that is. This is key to get everybody behind the changes that need to happen. This is what the scientific method applied to data, analytics, and insights can bring to a commercial organization, and it inspires me to continue what we are doing.

PJT We have both been engaged in what is now generally called the Data arena for many years; some aspects of the technology employed have changed a lot during this time. What do you think modern technology enables today that was harder to achieve in the past, and are there any areas where things are much the same as they were a decade or more ago?
AE Ever since the launch of the Amazon EC2 cloud computing service in late 2006 [1], data storage and processing infrastructure has been easily and cheaply available to everybody for most practical workloads. So, for ten years you have not had any excuse for not getting your data in order and doing serious analysis.

The main trend that excites me now is the breakthroughs happening in Deep Learning and Natural Language Processing, expanding the impact of data into completely new areas. This is great for consumers and for those companies that are at the leading edge of analytics and insights. For other organizations, however, who are struggling to deliver value from data, it means that the gap between where they are versus best practice is widening exponentially, which is a big worry.

PJT Taking technology to one side, what do you think are the main factors in successfully generating insight and developing analytical capabilities that are tightly coupled with value generation?
AE Two things are always at the forefront of my mind. The first is kind of obvious, namely to start with the business value you are trying to create and work backwards from that. Too often we see people start with the data (‘I got to clean all the data in my warehouse first!’), the technology (‘We need some Big Data infrastructure!’), or the analytics (‘We need a predictive churn model!’). That is putting the cart before the horse. Not that these things are not important; rather, that there are almost certainly a lot of opportunities you could execute right now to generate real and measurable business value and drive a faster return on your investments.

The second is to not under-estimate the business change that is needed to exploit the insights. Analytical leaders have an appetite for change and they plan and resource accordingly. Data and models are only part of the project to deliver the value, and these leaders are really clear on this.

PJT Looking at the other side of the coin, what are the pitfalls to look out for and do you have any recommendations for avoiding them?
AE The flip-sides of the two points previously mentioned are obvious pitfalls: not starting from the business value you are trying to create, and under-estimating the business change needed. And it is not easy: great data scientists are not always great commercially-minded business people and so you need the right kind of skills to bridge that gap. McKinsey talks of ‘business translators who combine data savvy with industry and functional expertise’, which is a helpful summary [2]. Less helpfully they also note that these people are nearly impossible to find, so you may need to find or grow them internally.

Which gets to a second pitfall. When thinking about generating value from data, many want to do it all themselves. And I understand why: after all, data may well be a strategic asset for your organization.

But when you recruit, you should be clear in your mind whether you are recruiting to deliver the change of creating the first models and changed business processes, or whether you are recruiting to sustain the change by keeping the models current and incrementally improving the insights and processes. These two outcomes require people with quite different skills and vastly different temperaments.

We call them Explorers versus Farmers.

For the first, you want commercially-focused business people who can drive change in the organization; who can make things work quickly, whether that is data, analytics, or business processes, to demonstrate value; and who are supremely comfortable with uncertainties and unknowns.

For the second, you want people who are technically skilled to deliver and maintain the optimal stable platform and who love doing incremental improvements to technology, data, and business processes.

Explorers versus Farmers. Call them what you will, but note that they are different.

PJT Many companies are struggling with how to build analytical teams. Do they grow their own talent, do they hire numerate graduates or post graduates, do they seek to employ highly skilled and experienced individuals, do they form partnerships with external parties, or is a mixture of all of these approaches sensible? What approaches do you see Cybaea clients adopting?
AE We are mostly seeing one of two approaches: one is to do nothing and soldier on as always, relying on traditional business intelligence, while the other is to hire people, usually highly technical ones, to build an internal team. Neither is optimal in getting to the value.

The do-nothing approach can make sense. Not, however, when it is adopted because management fears change (change will happen, regardless) or because they feel they don’t understand data (everybody understands data if it is communicated well). Those companies are just leaving money on the table: every organization has quick wins that can deliver value in weeks.

But it may be that you have no capacity for change and have made the informed decision that data and analytics must wait, reflecting the commercial reality. The key here is ‘informed’ and the follow-on question is whether there are other ways that the company can realise some of the value from data right now.

The second approach at least recognises the value potential of data and aims to move the organization towards realising that value. But it is back to those ‘business translator’ roles we discussed before and making sure you have them, as well as making sure the business is aligned around the change that will be needed. Making money from data is a business function, not a technical one, and the function that drives the change must sit within the commercial business, not in IT or some other department that is still an arm’s-length support function.

We see the best organizations, the analytical leaders, employing flexible approaches. They focus on the outcomes and they have a sense of urgency driven from the top. They make it work.

PJT I know that a concept you are very interested in is Analytics as a Service (AaaS). Can you tell readers some more about what this means and also the work that Cybaea is doing in this area?
AE There is a war on analytical talent and a ‘winner takes it all’ dynamic is emerging, with medium-sized enterprises especially losing out. Good people want to work with good people, which generates a strong network effect, giving an advantage to large organizations with larger analytical teams and more variety of applications. Leading firms have depth of analytical talent and can recruit, trial, and filter more candidates, leaving them with the best talent.

Our analytics-as-a-service offering is for organizations of any size who want to realise value from data and insights right now, but who are not yet ready to build their own internal teams. We partner with the commercial teams to be their (commercial) insights function and deliver not just reports but real business change. Customers can pay monthly, pay for results, or we can do a build-operate-transfer model.

One of our first projects was with a small telco. They were too small to maintain a strong analytical team in-house, purely because of scale. We set up a monthly workshop with the commercial Marketing team. We analysed their data offline and used the time for a structured conversation about the new campaigns and the new changes to the web site they should implement this month. We would point them to our reports and dashboards which had models, graphs, t-tests, and p-values in abundance, but would focus the conversation on moving the business forward.

The following month we would repeat and identify new campaigns and new changes. After six months, they had more than 20 highly effective and precisely targeted campaigns running, and we handed over the maintenance (‘farming’) of the models to their IT teams. It is a model that works well across industries.

PJT Do you have a view on how the insights and analytics field is likely to change in coming years? Are there any emerging areas which you think readers should keep an eye on?
AE Many people are focused on the data explosion that is often called the ‘Internet of Things’ but more broadly means that more data gets generated and we consume more data for our analytics. I do think this opens tremendous opportunities for many businesses and technically I am excited to get back to processing live event streams as they happen.

But practically, we are seeing more success from deep learning. We have found that once an organization successfully implements one solution, whether artificial intelligence or complex natural language processing, then they want more. It is that powerful and that transformational, and breakthroughs in these fields are further expanding the impact into completely new areas. My advice is that most organizations should at least trial what these approaches can do for them, and we have set up a sister-organization to develop and deliver solutions here.

PJT What are your plans for Cybaea in coming months?
AE I have two main priorities. First, I have our long-standing partner from India in London for a couple of months to figure out how we scale in the UK. This is for the analytics as a service but also for fast projects to deliver insights or analytical tools and applications.

Second, I am looking to identify the right partners and associates for Cybaea here in the UK to allow us to grow the business. We have great assets in our methodologies, clients, and people, and a tremendous opportunity for delivering commercial value from data, so I am very excited for the future.

PJT Allan, I would like to thank you for sharing with us the benefit of your experience and expertise in data matters, both of which have been very illuminating.

Allan Engelhardt can be reached at Allan.Engelhardt@cybaea.net. Cybaea’s website is www.cybaea.net and they have social media presence on LinkedIn and Google+.


Disclosure: Neither peterjamesthomas.com Ltd. nor any of its directors have any direct financial interest in either Cybaea or any of the other organisations mentioned in this article.


If you are a Chief Data Officer, a Chief Analytics Officer, a Director of Data, or hold some other “Top Data Job” and would like to share your thoughts with the readers of this site in an interview like this one, please get in contact.

 
Notes

[1] https://aws.amazon.com/about-aws/whats-new/2006/08/24/announcing-amazon-elastic-compute-cloud-amazon-ec2---beta/

[2] McKinsey report The Age of Analytics, dated December 2016, http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
