An expanded and more mobile-friendly version of the Data & Analytics Dictionary

The Data and Analytics Dictionary

A revised and expanded version of the peterjamesthomas.com Data and Analytics Dictionary has been published.

Mobile version of The Data & Analytics Dictionary (yes I have an iPhone 6s in 2020, please don't judge me!)

The previous Dictionary was not the easiest to read on mobile devices. Because of this, the layout has been amended in this release and the mobile experience should now be greatly enhanced. Any feedback on usability would be welcome.

The new Dictionary includes 22 additional definitions, bringing the total number of entries to 220, totalling well over twenty thousand words. As usual, the new definitions range across the data arena: from Data Science and Machine Learning; to Information and Reporting; to Data Governance and Controls. They are as follows:

  1. Analysis Facility
  2. Analytical Repository
  3. Boosting [Machine Learning]
  4. Conformed Data (Conformed Dimension)
  5. Data Capability
  6. Data Capability Framework (Data Capability Model)
  7. Data Capability Review (Data Capability Assessment)
  8. Data Driven
  9. Data Governance Framework
  10. Data Issue Management
  11. Data Maturity
  12. Data Maturity Model
  13. Data Owner
  14. Data Protection Officer (DPO)
  15. Data Roadmap
  16. Geospatial Tool
  17. Image Recognition (Computer Vision)
  18. Overfitting
  19. Pattern Recognition
  20. Robot (Robotics, Bot)
  21. Random Forest
  22. Structured Reporting Framework

Please remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form, we look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

Thank you to Ankit Rathi for including me in his list of Data Science / Artificial Intelligence practitioners that he admires

It’s always nice to learn that your work is appreciated and so thank you to Ankit Rathi for including me in his list of Data Science and Artificial Intelligence practitioners.

I am in good company as he also gives call outs to:


peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

The latest edition of The Data & Analytics Dictionary is now out

The Data and Analytics Dictionary

After a hiatus of a few months, the latest version of the peterjamesthomas.com Data and Analytics Dictionary is now available. It includes 30 new definitions, some of which have been contributed by people like Tenny Thomas Soman, George Firican, Scott Taylor and and Taru Väre. Thanks to all of these for their help.

  1. Analysis
  2. Application Programming Interface (API)
  3. Business Glossary (contributor: Tenny Thomas Soman)
  4. Chart (Graph)
  5. Data Architecture – Definition (2)
  6. Data Catalogue
  7. Data Community
  8. Data Domain (contributor: Taru Väre)
  9. Data Enrichment
  10. Data Federation
  11. Data Function
  12. Data Model
  13. Data Operating Model
  14. Data Scrubbing
  15. Data Service
  16. Data Sourcing
  17. Decision Model
  18. Embedded BI / Analytics
  19. Genetic Algorithm
  20. Geospatial Data
  21. Infographic
  22. Insight
  23. Management Information (MI)
  24. Master Data – additional definition (contributor: Scott Taylor)
  25. Optimisation
  26. Reference Data (contributor: George Firican)
  27. Report
  28. Robotic Process Automation
  29. Statistics
  30. Self-service (BI or Analytics)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form, we look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

New Thinking, Old Thinking and a Fairytale

BPR & Business Transformation vs Data Science

Of course it can be argued that you can use statistics (and Google Trends in particular) to prove anything [1], but I found the above figures striking. The above chart compares monthly searches for Business Process Reengineering (including its arguable rebranding as Business Transformation) and monthly searches for Data Science between 2004 and 2019. The scope is worldwide.



Brunel’s Heirs

Back then the engineers were real engineers...

Business Process Reengineering (BPR) used to be a big deal. Optimising business processes was intended to deliver reduced costs, increased efficiency and to transform also-rans into World-class organisations. Work in this area was often entwined with the economic trend of Globalisation. Supply chains were reinvented, moving from in-country networks to globe-spanning ones. Many business functions mirrored this change, moving certain types of work from locations where staff command higher salaries to ones in other countries where they don’t (or at least didn’t at the time [2]). Often BPR work explicitly included a dimension of moving process elements offshore, maybe sometimes to people who were better qualified to carry them out, but always to ones who were cheaper. Arguments about certain types of work being better carried out by co-located staff were – in general – sacrificed on the altar of reduced costs. In practice, many a BPR programme morphed into the narrower task of downsizing an organisation.

In 1995, Thomas Davenport, an EY consultant who was one of the early BPR luminaries, had this to say on the subject:

“When I wrote about ‘business process redesign’ in 1990, I explicitly said that using it for cost reduction alone was not a sensible goal. And consultants Michael Hammer and James Champy, the two names most closely associated with reengineering, have insisted all along that layoffs shouldn’t be the point. But the fact is, once out of the bottle, the reengineering genie quickly turned ugly.”

Fast Company – Reengineering – The Fad That Forgot People, Thomas Davenport, November 1995 [3a]

A decade later, Gartner had some rather sobering thoughts to offer on the same subject:

Gartner predicted that through 2008, about 60% of organizations that outsource customer-facing functions will see client defections and hidden costs that outweigh any potential cost savings. And reduced costs aren’t guaranteed […]. Gartner found that companies that employ outsourcing firms for customer service processes pay 30% more than top global companies pay to do the same functions in-house.

Computerworld – Gartner: Customer-service outsourcing often fails, Scarlet Pruitt, March 2005

It is important here to bear in mind that neither of the above critiques comes from people implacably opposed to BPR, but rather either a proponent or a neutral observer. Clearly, somewhere along the line, things started to go wrong in the world of BPR.



Dilbert’s Dystopia

© Scott Adams (2017) - dilbert.com

© Scott Adams (2017) – dilbert.com

Even when organisations abjured moving functions to other countries and continents, they generally embraced another 1990s / 2000s trend, open plan offices, with more people crammed into available space, allowing some facilities to be sold and freed-up space to be sub-let. Of course such changes have a tangible payback, no one would do them otherwise. What was not generally accounted for were the associated intangible costs. Some of these are referenced by The Atlantic in an article (which, in turn, cites a study published by The Royal Society entitled The impact of the ‘open’ workspace on human collaboration):

“If you’re under 40, you might have never experienced the joy of walls at work. In the late 1990s, open offices started to catch on among influential employers—especially those in the booming tech industry. The pitch from designers was twofold: Physically separating employees wasted space (and therefore money), and keeping workers apart was bad for collaboration. Other companies emulated the early adopters. In 2017, a survey estimated that 68 percent of American offices had low or no separation between workers.

Now that open offices are the norm, their limitations have become clear. Research indicates that removing partitions is actually much worse for collaborative work and productivity than closed offices ever were.”

The Atlantic – Workers Love AirPods Because Employers Stole Their Walls, Amanda Mull, April 2019

When you consider each of lost productivity, the collateral damage caused when staff vote with their feet and the substantial cost of replacing them, incremental savings on your rental bills can seem somewhat less alluring.



Reengineering Redux

Don't forget about us...

Nevertheless, some organisations did indeed reap benefits as a result of some or all of the activities listed above; it is worth noting however that these tended to be the organisations that were better run to start with. Others, maybe historically poor performers, spent years turning their organisations inside out with the anticipated payback receding ever further out of sight. In common with failure in many areas, issues with BPR have often been ascribed to a neglect of the human aspects of change. Indeed, one noted BPR consultant, the above-referenced Michael Hammer, said the following when interviewed by The Wall Street Journal:

“I wasn’t smart enough about that. I was reflecting my engineering background and was insufficiently appreciative of the human dimension. I’ve learned that’s critical.”

The Wall Street Journal – Reengineering Gurus Take Steps to Remodel Their Stalling Vehicles, Joseph White, November 1996 [3b]

As with most business trends, Business Transformation (to adopt the more current term) can add substantial value – if done well. An obvious parallel in my world is to consider another business activity that reached peak popularity in the 2000s, Data Warehouse programmes [4]. These could also add substantial value – if done well; but sadly many of them weren’t. Figures suggest that both BPR and Data Warehouse programmes have a failure rate of 60 – 70% [5]. As ever, the key is how you do these activities, but this is a topic I have covered before [6] and not part of my central thesis in this article.

My opinion is that the fall-off you see in searches for BPR / Business Transformation reflects two things: a) many organisations have gone through this process (or tried to) already and b) the results of such activities have been somewhat mixed.



“O Brave New World”

Constant Change

Many pundits opine that we are now in an era of constant change and also refer to the tectonic shift that technologies like Artificial Intelligence will lead to. They argue further that new approaches and new thinking will be needed to meet these new challenges. Take for example, Bernard Marr, writing in Forbes:

Since we’re in the midst of the transformative impact of the Fourth Industrial Revolution, the time is now to start preparing for the future of work. Even just five years from now, more than one-third of the skills we believe are essential for today’s workforce will have changed according to the Future of Jobs Report from the World Economic Forum. Fast-paced technological innovations mean that most of us will soon share our workplaces with artificial intelligences and bots, so how can you stay ahead of the curve?

Forbes – The 10 Vital Skills You Will Need For The Future Of Work, Bernard Marr, April 2019

However, neither these opinions, nor the somewhat chequered history of things like BPR and open plan office seem to stop many organisations seeking to apply 1990s approaches in the (soon to be) 2020s. As a result, the successors to BPR are still all too common. Indeed, to make a possibly contrarian point, in some cases this may be exactly what organisations should be doing. Where I agree with Bernard Marr and his ilk is that this is not all that they should be doing. The whole point of this article is to recommend that they do other things as well. As comforting as nostalgia can be, sometimes the other things are much more important than reliving the 1990s.



Gentlemen (and Ladies) Place your Bets

Place your bets

Here we come back to the upward trend in searches for Data Science. It could be argued of course that this is yet another business fad (indeed some are speaking about Big Data in just those terms already [7]), but I believe that there is more substance to the area than this. To try to illustrate this, let me start by telling you a fairytale [8]; yes your read that right, a fairytale.

   
What is this Python ye spake of?

\mathfrak{Once} upon a time, there was a Kingdom, the once great Kingdom of Suzerain. Of late it had fallen from its former glory and, accordingly, the King’s Chief Minister, one who saw deeper and further than most, devised a scheme which she prophesied would arrest the realm’s decline. This would entail a grand alliance with Elven artisans from beyond the Altitudinous Mountains and a tribe of journeyman Dwarves [9] from the furthermost shore of the Benthic Sea. Metalworking that had kept many a Suzerain smithy busy would now be done many leagues from the borders of the Kingdom. The artefacts produced by the Elves and Dwarves were of the finest quality, but their craftsmen and women demanded fewer golden coins than the Suzerain smiths.

\mathfrak{In} a vision the Chief Minister saw the Kingdom’s treasury swelling. Once all was in place, the new alliances would see a fifth more gold being locked in Suzerain treasure chests before each winter solstice. Yet the King’s Chief Minister also foresaw that reaching an agreement with the Elves and Dwarves would cost much gold; there were also Suzerain smiths to be requited. Further she predicted that the Kingdom would be in turmoil for many Moons; all told three winters would come and go before the Elves and Dwarves would be working with due celerity.

\mathfrak{Before} the Moon had changed, a Wizard appeared at court, from where none knew. He bore a leather bag, overspilling gold coins, in his long, delicate fingers. When the King demanded to know whence this bounty came, the Wizard stated that for five days and five nights he had surveyed Suzerain with his all-seeing-eye. This led him to discover that gold coins were being dispatched to the Goblins of the Great Arboreal Forest, gold which was not their rightful weregild [10]. The bag held those coins that had been put aside for the Goblins over the next four seasons. Just this bag contained a tenth of the gold that was customarily deposited in the King’s treasure chests by winter time. The Wizard declared his determination to deploy his discerning divination daily [11], should the King confer on him the high office of Chief Wizard of Suzerain [12].

\mathfrak{The} King was a wise King, but now he was gripped with uncertainty. The office of Chief Wizard commanded a stipend that was not inconsiderable. He doubted that he could both meet this and fulfil the Chief Minister’s vision. On one hand, the Wizard had shown in less than a Moon’s quarter that his thaumaturgy could yield gold from the aether. On the other, the Chief Minister’s scheme would reap dividends twofold the mage’s bounty every four seasons; but only after three winters had come and gone. The King saw that he must ponder deeply on these weighty matters and perhaps even dare to seek the counsel of his ancestors’ spirits. This would take time.

\mathfrak{As} it happens, the King never consulted the augurs and never decided as the Kingdom of Suzerain was totally obliterated by a marauding dragon the very next day, but the moral of the story is still crystal clear…

 

I will leave readers to infer the actual moral of the story, save to say that while few BPR practitioners self-describe as Wizards, Data Scientist have been known to do this rather too frequently.

It is hard to compare ad hoc Data Science projects, which can have a very major payback sometimes and a more middling one on other occasions, with a longer term transformation. On one side you have an immediate stream of one off and somewhat variable benefits, on the other deferred, but ongoing and steady, annual benefits. One thing that favours a Data Science approach is that this is seldom dependent on root and branch change to the organisation, just creative use of internal and external datasets that already exist. Another is that you can often start right away.

Perhaps the King in our story should have put his faith in both his Chief Minister and the Wizard (as well as maybe purchasing a dragon early warning system [13]); maybe a simple tax on the peasantry was all that was required to allow investment in both areas. However, if his supply of gold was truly limited, my commercial judgement is that new thinking is very often a much better bet than old. I’m on team Wizard.
 


  
Notes

 
[1]
 
There are many caveats around these figures. Just one obvious point is that people searching for a term on Google is not the same as what organisations are actually doing. However, I think it is hard to argue that that they are not at least indicative.
 
[2]
 
“Aye, there’s the rub”
 
[3a/b]
 
The Davenport and Hammer quotes were initially sourced from the Wikipedia page on BPR.
 
[4]
 
Feel free to substitute Data Lake for Data Warehouse if you want a more modern vibe, sadly it won’t change the failure statistics.
 
[5]
 
In Ideas for avoiding Big Data failures and for dealing with them if they happen I argued that a 60% failure rate for most human endeavours represents a fundamental Physical Constant, like the speed of light in a vacuum or the mass of an electron:

“Data warehouses play a crucial role in the success of an information program. However more than 50% of data warehouse projects will have limited acceptance, or will be outright failures”

– Gartner 2007

“60-70% of the time Enterprise Resource Planning projects fail to deliver benefits, or are cancelled”

– CIO.com 2010

“61% of acquisition programs fail”

– McKinsey 2009

 
[6]
 
For example in 20 Risks that Beset Data Programmes.
 
[7]
 
See Sic Transit Gloria Magnorum Datorum.
 
[8]
 
The scenario is an entirely real one, but details have been changed ever so slightly to protect the innocent.
 
[9]
 
Of course the plural of Dwarf is Dwarves (or Dwarrows), not Dwarfs, what is wrong with you?
 
[10]
 
Goblins are not renowned for their honesty it has to be said.
 
[11]
 
Wizards love alliteration.
 
[12]
 
CWO?
 
[13]
 
And a more competent Chief Risk Officer.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

A Simple Data Capability Framework

Introduction

As part of my consulting business, I end up thinking about Data Capability Frameworks quite a bit. Sometimes this is when I am assessing current Data Capabilities, sometimes it is when I am thinking about how to transition to future Data Capabilities. Regular readers will also recall my tripartite series on The Anatomy of a Data Function, which really focussed more on capabilities than purely organisation structure [1].

Detailed frameworks like the one contained in Anatomy are not appropriate for all audiences. Often I need to provide a more easily-absorbed view of what a Data Function is and what it does. The exhibit above is one that I have developed and refined over the last three or so years and which seems to have resonated with a number of clients. It has – I believe – the merit of simplicity. I have tried to distil things down to the essentials. Here I will aim to walk the reader through its contents, much of which I hope is actually self-explanatory.

The overall arrangement has been chosen intentionally, the top three areas are visible activities, the bottom three are more foundational areas [2], ones that are necessary for the top three boxes to be discharged well. I will start at the top left and work across and then down.
 
 
Collation of Data to provide Information

Dashboard

This area includes what is often described as “traditional” reporting [3], Dashboards and analysis facilities. The Information created here is invaluable for both determining what has happened and discerning trends / turning points. It is typically what is used to run an organisation on a day-to-day basis. Absence of such Information has been the cause of underperformance (or indeed major losses) in many an organisation, including a few that I have been brought in to help. The flip side is that making the necessary investments to provide even basic information has been at the heart of the successful business turnarounds that I have been involved in.

The bulk of Business Intelligence efforts would also fall into this area, but there is some overlap with the area I next describe as well.
 
 
Leverage of Data to generate Insight

Voronoi diagram

In this second area we have disciplines such as Analytics and Data Science. The objective here is to use a variety of techniques to tease out findings from available data (both internal and external) that go beyond the explicit purpose for which it was captured. Thus data to do with bank transactions might be combined with publically available demographic and location data to build an attribute model for both existing and potential clients, which can in turn be used to make targeted offers or product suggestions to them on Digital platforms.

It is my experience that work in this area can have a massive and rapid commercial impact. There are few activities in an organisation where a week’s work can equate to a percentage point increase in profitability, but I have seen insight-focussed teams deliver just that type of ground-shifting result.
 
 
Control of Data to ensure it is Fit-for-Purpose

Data controls

This refers to a wide range of activities from Data Governance to Data Management to Data Quality improvement and indeed related concepts such as Master Data Management. Here as well as the obvious policies, processes and procedures, together with help from tools and technology, we see the need for the human angle to be embraced via strong communications, education programmes and aligning personal incentives with desired data quality outcomes.

The primary purpose of this important work is to ensure that the information an organisation collates and the insight it generates are reliable. A helpful by-product of doing the right things in these areas is that the vast majority of what is required for regulatory compliance is achieved simply by doing things that add business value anyway.
 
 
Data Architecture / Infrastructure

Data architecture

Best practice has evolved in this area. When I first started focussing on the data arena, Data Warehouses were state of the art. More recently Big Data architectures, including things like Data Lakes, have appeared and – at least in some cases – begun to add significant value. However, I am on public record multiple times stating that technology choices are generally the least important in the journey towards becoming a data-centric organisation. This is not to say such choices are unimportant, but rather that other choices are more important, for example how best to engage your potential users and begin to build momentum [4].

Having said this, the model that seems to have emerged of late is somewhat different to the single version of the truth aspired to for many years by organisations. Instead best practice now encompasses two repositories: the first Operational, the second Analytical. At a high-level, arrangements would be something like this:

Data architecture

The Operational Repository would contain a subset of corporate data. It would be highly controlled, highly reconciled and used to support both regular reporting and a large chunk of dashboard content. It would be designed to also feed data to other areas, notably Finance systems. This would be complemented by the Analytical Repository, into which most corporate data (augmented by external data) would be poured. This would be accessed by a smaller number of highly skilled staff, Data Scientists and Analytics experts, who would use it to build models, produce one off analyses and to support areas such as Data Visualisation and Machine Learning.

It is not atypical for Operational Repositories to be SQL-based and Analytical Repsoitories to be Big Data-based, but you could use SQL for both or indeed Big Data for both according to the circumstances of an organisation and its technical expertise.
 
 
Data Operating Model / Organisation Design

Organisational design

Here I will direct readers to my (soon to be updated) earlier work on The Anatomy of a Data Function. However, it is worth mentioning a couple of additional points. First an Operating Model for data must encompass the whole organisation, not just the Data Function. Such a model should cover how data is captured, sourced and used across all departments.

Second I think that the concept of a Data Community is important here, a web of like-minded Data Scientists and Analytics people, sitting in various business areas and support functions, but linked to the central hub of the Data Function by common tooling, shared data sets (ideally Curated) and aligned methodologies. Such a virtual data team is of course predicated on an organisation hiring collaborative people who want to be part of and contribute to the Data Community, but those are the types of people that organisations should be hiring anyway [5].
 
 
Data Strategy

Data strategy

Our final area is that of Data Strategy, something I have written about extensively in these pages [6] and a major part of the work that I do for organisations.

It is an oft-repeated truism that a Data Strategy must reflect an overarching Business Strategy. While this is clearly the case, often things are less straightforward. For example, the Business Strategy may be in flux; this is particularly the case where a turn-around effort is required. Also, how the organisation uses data for competitive advantage may itself become a central pillar of its overall Business Strategy. Either way, rather than waiting for a Business Strategy to be finalised, there are a number of things that will need to be part of any Data Strategy: the establishment of a Data Function; a focus on making data fit-for-purpose to better support both information and insight; creation of consistent and business-focussed reporting and analysis; and the introduction or augmentation of Data Science capabilities. Many of these activities can help to shape a Business Strategy based on facts, not gut feel.

More broadly, any Data Strategy will include: a description of where the organisation is now (threats and opportunities); a vision for commercially advantageous future data capabilities; and a path for moving between the current and the future states. Rather than being PowerPoint-ware, such a strategy needs to be communicated assiduously and in a variety of ways so that it can be both widely understood and form a guide for data-centric activities across the organisation.
 
 
Summary
 
As per my other articles, the data capabilities that a modern organisation needs are broader and more detailed than those I have presented here. However, I have found this simple approach a useful place to start. It covers all the basic areas and provides a scaffold off of which more detailed capabilities may be hung.

The framework has been informed by what I have seen and done in a wide range of organisations, but of course it is not necessarily the final word. As always I would be interested in any general feedback and in any suggestions for improvement.
 


 
Notes

 
[1]
 
In passing, Anatomy is due for its second refresh, which will put greater emphasis on Data Science and its role as an indispensable part of a modern Data Function. Watch this space.
 
[2]
 
Though one would hope that a Data Strategy is also visible!
 
[3]
 
Though nowadays you hear “traditional” Analytics and “traditional” Big Data as well (on the latter see Sic Transit Gloria Magnorum Datorum), no doubt “traditional” Machine Learning will be with us at some point, if it isn’t here already.
 
[4]
 
See also Building Momentum – How to begin becoming a Data-driven Organisation.
 
[5]
 
I will be revisiting the idea of a Data Community in coming months, so again watch this space.
 
[6]
 
Most explicitly in my three-part series:

  1. Forming an Information Strategy: Part I – General Strategy
  2. Forming an Information Strategy: Part II – Situational Analysis
  3. Forming an Information Strategy: Part III – Completing the Strategy

 
peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 
 

A Retrospective of 2018’s Articles

A Review of 2018

This is the second year in which I have produced a retrospective of my blogging activity. As in 2017, I have failed miserably in my original objective of posting this early in January. Despite starting to write this piece on 18th December 2018, I have somehow sneaked into the second quarter before getting round to completing it. Maybe I will do better with 2019’s highlights!

Anyway, 2018 was a record-breaking year for peterjamesthomas.com. The site saw more traffic than in any other year since its inception; indeed hits were over a third higher than in any previous year. This increase was driven in part by the launch of my new Maths & Science section, articles from which claimed no fewer than 6 slots in the 2018 top 10 articles, when measured by hits [1]. Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009; our first year in business.

As with every year, some of my work was viewed by tens of thousands of people, while other pieces received less attention. This is my selection of the articles that I enjoyed writing most, which does not always overlap with the most popular ones. Given the advent of the Maths & Science section, there are now seven categories into which I have split articles. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO perspectives
  5. Programme Advice
  6. Analytics & Big Data
  7. Maths & Science

In each category, I will pick out one or two pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
A Brief History of Databases
 
February
A Brief History of Databases
An infographic spanning the history of Database technology from its early days in the 1960s to the landscape in the late 2010s..
 
Data Strategy Alarm Bell
 
July
How to Spot a Flawed Data Strategy
What alarm bells might alert you to problems with your Data Strategy; based on the author’s extensive experience of both developing Data Strategies and vetting existing ones.
 
Just the facts...
 
August
Fact-based Decision-making
Fact-based decision-making sounds like a no brainer, but just how hard is it to generate accurate facts?
 
 
Data Visualisation
 
Comparative Pie Charts
 
August
As Nice as Pie
A review of the humble Pie Chart, what it is good at, where it presents problems and some alternatives.
 
 
Statistics & Data Science
 
Data Science Challenges – It’s Deja Vu all over again!
 
August
Data Science Challenges – It’s Deja Vu all over again!
A survey of more than 10,000 Data Scientists highlights a set of problems that will seem very, very familiar to anyone working in the data space for a few years.
 
 
CDO Perspectives
 
The CDO Dilemma
 
February
The CDO – A Dilemma or The Next Big Thing?
Two Forbes articles argue different perspectives about the role of Chief Data Officer. The first (by Lauren deLisa Coleman) stresses its importance, the second (by Randy Bean) highlights some of the challenges that CDOs face.
 
2018 CDO Interviews
 
May onwards
The “In-depth” series of CDO interviews
Rather than a single article, this is a series of four talks with prominent CDOs, reflecting on the role and its challenges.
 
The Chief Marketing Officer and the CDO – A Modern Fable
 
October
The Chief Marketing Officer and the CDO – A Modern Fable
Discussing an alt-facts / “fake” news perspective on the Chief Data Officer role.
 
 
Programme Advice
 
Building Momentum
 
June
Building Momentum – How to begin becoming a Data-driven Organisation
Many companies want to become data driven, but getting started on the journey towards this goal can be tough. This article offers a framework for building momentum in the early stages of a Data Programme.
 
 
Analytics & Big Data
 
Enterprise Data Marketplace
 
January
Draining the Swamp
A review of some of the problems that can beset Data Lakes, together with some ideas about what to do to fix these from Dan Woods (Forbes), Paul Barth (Podium Data) and Dave Wells (Eckerson Group).
 
Sic Transit Gloria Mundi
 
February
Sic Transit Gloria Magnorum Datorum
In a world where the word has developed a very negative connotation, what’s so bad about being traditional?
 
Convergent Evolution of Data Architectures
 
August
Convergent Evolution
What the similarities (and differences) between Ichthyosaurs and Dolphins can tell us about different types of Data Architectures.
 
 
Maths & Science
 
Euler's Number
 
March
Euler’s Number
A long and winding road with the destination being what is probably the most important number in Mathematics.
 The Irrational Ratio  
August
The Irrational Ratio
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
 
Emmy Noether
 
October
Glimpses of Symmetry, Chapter 24 – Emmy
One of the more recent chapters in my forthcoming book on Group Theory and Particle Physics. This focuses on the seminal contributions of Mathematician Emmy Noether to the fundamentals of Physics and the connection between Symmetry and Conservation Laws.

 
Notes

 
[1]
 

The 2018 Top Ten by Hits
1. The Irrational Ratio
2. A Brief History of Databases
3. Euler’s Number
4. The Data and Analytics Dictionary
5. The Equation
6. A Brief Taxonomy of Numbers
7. When I’m 65
8. How to Spot a Flawed Data Strategy
9. Building Momentum – How to begin becoming a Data-driven Organisation
10. The Anatomy of a Data Function – Part I

 
peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 
 

The Chief Marketing Officer and the CDO – A Modern Fable

The Fox and the Grapes

This Fox has a longing for grapes:
He jumps, but the bunch still escapes.
So he goes away sour;
And, ’tis said, to this hour
Declares that he’s no taste for grapes.

— W.J.Linton (after Aesop)

Note:

Not all of the organisations I have worked with or for have had a C-level Executive accountable primarily for Marketing. Where they have, I have normally found the people holding these roles to be better informed about data matters than their peers. I have always found it easy and enjoyable to collaborate with such people. The same goes in general for Marketing Managers. This article is not about Marketing professionals, it is about poorly researched journalism.


 
Prelude…

The Decline and Fall of the CDO Empire?

I recently came across an article in Marketing Week with the clickbait-worthy headline of Why the rise of the chief data officer will be short-lived (their choice of capitalisation). The subhead continues in the same vein:

Chief data officers (ditto) are becoming increasingly common, but for a data strategy to work their appointments can only ever be a temporary fix.

Intrigued, I felt I had to avail myself of the wisdom and domain expertise contained in the article (the clickbait worked of course). The first few paragraphs reveal the actual motivation. The piece is a reaction [1] to the most senior Marketing person at easyJet being moved out of his role, which is being abolished, and – as part of the same reorganisation – a Chief Data Officer (CDO) being appointed. Now the first thing to say, based on the article’s introductory comments, is that easyJet did not have a Chief Marketing Officer. The role that was abolished was instead Chief Commercial Officer, so there was no one charged full-time with Marketing anyway. The Marketing responsibilities previously supported part-time by the CCO have now been spread among other executives.

The next part of the article covers the views of a Marketing Week columnist (pause for irony) before moving on to arrangements for the management of data matters in three UK-based organisations:

  • Camelot – who run the UK National Lottery
     
  • Mumsnet – which is a web-site for UK parents
     
  • Flubit – a growing on-line marketplace aiming to compete with Amazon

The first two of these have CDOs (albeit with one doing the role alongside other responsibilities). Both of these people:

[…] come at data as people with backgrounds in its use in marketing

Flubit does not have a CDO, which is used as supporting evidence for the superfluous nature of the role [2].

Suffice it to say that a straw poll consisting of the handful of organisations that the journalist was able to get a comment from is not the most robust of approaches [3]. Most of the time, the article does nothing more than to reflect the continuing confusion about whether or not organisations need CDOs and – assuming that they do – what their remit should be and who they should report to [4].

But then, without it has to be said much supporting evidence, the piece goes on to add that:

Most [CDOs – they would probably style it “Cdos”] are brought in to instill a data strategy across the business; once that is done their role should no longer be needed.

Symmetry

Now as a Group Theoretician, I am a great fan of symmetry. Symmetry relates to properties that remain invariant when something else is changed. Archetypally, an equilateral triangle is still an equilateral triangle when rotated by 120° [5]. More concretely, the laws of motion work just fine if we wind the clock forward 10 seconds (which incidentally leads to the principle of conservation of energy [6]).

Let’s assume that the Marketing Week assertion is true. I claim therefore that it must be still be true under the symmetry of changing the C-level role. This would mean that the following also has to be true:

Most [Chief marketing officers] are brought in to instill a marketing strategy across the business; once that is done their role should no longer be needed.

Now maybe this statement is indeed true. However, I can’t really see the guys and gals at Marketing Week agreeing with this. So maybe it’s false instead. Then – employing reductio ad absurdum – the initial statement is also false [7].

If you don’t work in Marketing, then maybe a further transformation will convince you:

Most [Chief financial officers] are brought in to instill a finance strategy across the business; once that is done their role should no longer be needed.

I could go on, but this is already becoming as tedious to write as it was to read the original Marketing Week claim. The closing sentence of the article is probably its most revealing and informative:

[…] marketers must make sure they are leading [the data] agenda, or someone else will do it for them.

I will leave readers to draw their own conclusions on the merits of this piece and move on to other thoughts that reading it spurred in me.


 
…and Fugue

Electrification

Sometimes buried in the strangest of places you can find something of value, even if the value is different to the intentions of the person who buried it. Around some of the CDO forums that I attend [8] there is occasionally talk about just the type of issue that Marketing Week raises. An historical role often comes up in these discussions is that of Chief Electrification Officer [9]. This supposedly was an Executive role in organisations as the 19th Century turned into the 20th and electricity grids began to be created. The person ostensibly filling this role would be responsible for shepherding the organisation’s transition from earlier forms of power (e.g. steam) to the new-fangled streams of electrons. Of course this role would be very important until the transition was completed, after that redundancy surely beckoned.

Well to my way of thinking, there are a couple of problems here. The first one of these is alluded to by my choice of the words “supposedly” and “ostensibly” above. I am not entirely sure, based on my initial research [10], that this role ever actually existed. All the references I can find to it are modern pieces comparing it to the CDO role, so perhaps it is apochryphal.

The second is somewhat related. Electrification was an engineering problem, indeed it the [US] National Academy of Engineering called it “the greatest engineering achievement of the 20th Century”. Surely the people tackling this would be engineers, potentially led by a Chief Engineer. Did the completion of electrification mean that there was no longer a need for engineers, or did they simply move on to the next engineering problem [11]?

Extending this analogy, I think that Chief Data Officers are more like Chief Engineers than Chief Electrification Officers, assuming that the latter even exists. Why the confusion? Well I think part of it is because, over the last decade and a bit, organisations have been conditioned to believe the one dimensional perspective that everything is a programme or a project [12]. I am less sure that this applies 100% to the CDO role.

It may well be that one thing that a CDO needs to get going is a data transformation programme. This may purely be focused on cultural aspects of how an organisation records, shares and otherwise uses data. It may be to build a new (or a first) Data Architecture. It may be to remediate issues with an existing Data Architecture. It may be to introduce or expand Data Governance. It may be to improve Data Quality. Or (and, in my experience, this is often the most likely) a combination of all these five, plus other work, such as rapid tactical or interim deliveries. However, there is also a large element of data-centric work which is not project-based and instead falls into the category often described as “business as usual” (I loathe this term – I think that Data Operations & Technology is preferable). A handful of examples are as follows (this is not meant to be an exhaustive list) [13]:

  1. Addressing architectural debt that results from neglect of a Data Assets or the frequently deleterious impact of improperly governed change portfolios [14]. This is often a series of small to medium-sized changes, rather than a project with a discrete scope and start and end dates.
     
  2. More positively, engaging proactively in the change process in an attempt to act as a steward of Data Assets.
     
  3. Establishing a regular Data Audit.
     
  4. Regular Data Management activities.
     
  5. Providing tailored Analytics to help understand some unscheduled or unexpected event.
     
  6. Establishment of a data “SWAT team” to respond to urgent architecture, quality or reporting needs.
     
  7. Running a Data Governance committee and related activities.
     
  8. Creating and managing a Data Science capability.
     
  9. Providing help and advice to those struggling to use Data facilities.
     
  10. Responding to new Data regulations.
     
  11. Creating and maintaining a target operating model for Data and is use.
     
  12. Supporting Data Services to aid systems integration.
     
  13. Production of regular reports and refreshing self-serve Data Repositories.
     
  14. Testing and re-testing of Data facilities subject to change or change in source Data.
     
  15. Providing training in the use of Data facilities or the importance of getting Data right-first-time.

The above all point to the need for an ongoing Data Function to meet these needs (and to form the core resources of any data programme / project work). I describe such a function in my series about The Anatomy of a Data Function.

Data Strategy

There are of course many other such examples, but instead of cataloguing each of them, let’s return to what Marketing Week describe as the central responsibility of a CDO, to formulate a Data Strategy. Surely this is a one-off activity, right?

Well is the Marketing strategy set once and then never changed? If there is some material shift in the overall Business strategy, might the Marketing strategy change as a result? What would be the impact on an existing Marketing strategy of insight showing that this was being less than effective; might this lead to the development of a new Marketing strategy? Would the Marketing strategy need to be revised to cater for new products and services, or new segments and territories? What would be the impact on the Marketing strategy of an acquisition or divestment?

As anyone who has spent significant time in the strategy arena will tell you, it is a fluid area. Things are never set in stone and strategies may need to be significantly revised or indeed abandoned and replaced with something entirely new as dictated by events. Strategy is not a fire and forget exercise, not if you want it to be relevant to your business today, as opposed to a year ago. Specifically with Data Strategy (as I explain in Building Momentum – How to begin becoming a Data-driven Organisation), I would recommend keeping it rather broad brush at the begining of its development, allowing it to be adpated based on feedback from initial interim work and thus ensuring it better meets business needs.

So expecting that a Data Strategy (or any other type of strategy) to be done and dusted, with the key strategist dispensed with, is probably rather naive.


 
Coda

Coda

It would be really nice to think that sorting out their Data problems and seizing their Data opportunities are things that organisations can do once and then forget about. With twenty years experience of helping organisations to become more Data-centric, often with technical matters firmly in the background, I have to disabuse people of this all too frequent misconception. To adapt the National Canine Defence League’s [15 long-lived slogan from 1978:

A Chief Data Officer is for life, not just for Christmas.

With that out of the way, I’m off to write a well-informed and insightful article about how Marketing Departments should go about their business. Wish me luck!
 


 
Notes

 
[1]
 
I first wrote “knee-jerk reaction” and then thought that maybe I was being unkind. “When they go low, we go high” is a better maxim. Note: link opens a YouTube video.
 
[2]
 
I am sure that I read somewhere about the importance of the number of data points in any analysis, maybe I should ask a Data Scientist to remind me about this.
 
[3]
 
For a more balanced view of what real CDOs do, please take a look at my ongoing series of in-depth interviews.
 
[4]
 
As discussed in:

 
[5]
 
See Glimpses of Symmetry, Chapter 3 – Shifting Shapes for more on the properties of equilateral triangles.
 
[6]
 
As demonstrated by Emmy Noether in 1915.
 
[7]
 
At this point I think I am meant to say “Fake news! SAD!!!”
 
[8]
 
The [informal] proceedings of some of these may be viewed at:

 
[9]
 
Or Chief Electrical Officer, or Chief Electricity Officer.
 
[10]
 
I am doing some more digging and will of course update this piece should I find the evidence that has so far been elusive.
 
[11]
 
Self-driving electric cars come to mind of course. That or running a Starship.

Scotty

 
[12]
 
As an aside, where do Programme Managers go when (or should that be if) their Programmes finish?
 
[13]
 
It might be argued that some of these operational functions could be handed to IT. However, given that some elements of data functions have probably been carved out of IT in the past, this might be a retrograde step.
 
[14]
 
See Bumps in the Road.
 
[15]
 
Now Dogs Trust.

 


From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

More Definitions in the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. Here are 20 new definitions, including the first from other contributors (thanks Tenny!):

  1. Artificial Intelligence Platform
  2. Data Asset
  3. Data Audit
  4. Data Classification
  5. Data Consistency
  6. Data Controls
  7. Data Curation (contributor: Tenny Thomas Soman)
  8. Data Democratisation
  9. Data Dictionary
  10. Data Engineering
  11. Data Ethics
  12. Data Integrity
  13. Data Lineage
  14. Data Platform
  15. Data Strategy
  16. Data Wrangling (contributor: Tenny Thomas Soman)
  17. Explainable AI (contributor: Tenny Thomas Soman)
  18. Information Governance
  19. Referential Integrity
  20. Testing Data (Training Data)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

People are now also welcome to contribute their own definitions. You can use the comments section here, or the dedicated form. Submissions will be subject to editorial review and are not guaranteed to be accepted.
 


 

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

Data Science Challenges – It’s Deja Vu all over again!

The late Yogi Berra

The rather famous tautology, “It’s déjà vu all over again”, has of course been ascribed to that darling of malapropisms, baseball catcher Yogi Berra [1]. The phrase came to mind for me today when coming across the following exhibit:

Business Over Broadway - Kaggle Survey (Click to view a larger version in a new window)

© Business Over Broadway (2018). Based on Kaggle’s State of Data Science Survey 2017 (Sample size: 10,153).

The text in the above exhibit is not that clear [2], so here are the 20 top challenges [3] faced by those running Data Science teams in human-readable form:

# Challenge Cited by
1 Dirty Data 35.9%
2 Lack of Data Science talent in the organization 30.2%
3 Company politics / Lack of management/financial support for a Data Science team 27.0%
4 The lack of a clear question to be answering or a clear direction to go with available data 22.1%
5 Unavailability of/difficult access to data 22.0%
6 Data Science results not used by business decision makers 17.7%
7 Explaining Data Science to others 16.0%
8 Privacy issues 14.4%
9 Lack of significant domain expert input 14.2%
10 Organization is small and cannot afford a Data Science team 13.0%
11 Team using multiple ad hoc development environments such as Python/R/Java etc. 12.7%
12 Limitations of tools 12.0%
13 Need to coordinate with IT 11.8%
14 Maintaining responsible expectations about the potential impact of Data Science projects 11.5%
15 Inability to integrate findings into organization’s decision-making process 9.8%
16 Lack of funds to buy useful datasets from external sources 9.6%
17 Difficulties in deployment/scoring 8.6%
18 Scaling Data Science solution up to full database 8.4%
19 Limitations in the state of the art in machine learning 7.7%
20 Did not instrument data useful for scientific analysis and decision-making 6.5%
21 I prefer not to say 4.8%
22 Other 2.9%

The table above is a transcription of a transcription, so it would be remarkable if no Data Quality issues had crept in, however let’s assume that the figures are robust enough for our purposes. Of course the people surveyed will have reported multiple issues, so the percentages above are not additive. Nevertheless there are some very obvious comments to be made (some of the above items are pertinent to more than one of the points I would like to make):

  • Data Quality / Availability remain major issues(1, 5, and 8)

    It is indeed true that Machine Learning can be quite good at dealing with some types or bad or missing data. But no technology or approach is going to be able to paper over all of the cracks if your data is essentially incomplete and of poor quality. This point (together with some others below) speaks to the need to not approach Data Science on a stand-alone basis, but as part of a more holistic approach to data matters [4].
     

  • The Human angle and a focus on Culture are imperative(3, 6, 7, 14 and 15)

    Findings are one thing; using these to take action is quite another. At the end of the day, most ventures are successful or fail because of people; the people conducting the venture, the people receiving its intended benefits and so on. Ignore this dimension of data work (or any type of work) at your peril [5].
     

  • Business Questions amd Business Involvement matter(4, 6, 9 and 15)

    While in some circumstances the data can indeed “speak for itself”, it makes a lot more sense for Data Scientists to partner with business colleagues to both get direction and to help ensure that their findings lead to action [6].
     

  • Tools & Technology typically Trumped(11, 12 and 18)

    These first appear outside of the Top 10 (and 11 is a bit dubious to include here – it relates more to a proliferation of tools than to issues with any of them). I would never say that tools and technology are unimportant, but they are typically much less important than other considerations [7].

The overriding point is of course that – much as I noted out recently in Convergent Evolution – there is little new under the Sun. A survey of Business Intelligence / Data Warehousing professionals back in 2010 would have generated something very like the list above. A survey of EIS [8] professionals back in 2000 would have done the same.

The important things to do – regardless of the technologies and approaches employed – are to:

  1. Understand what questions are key to the running of an organisation [9]
     
  2. Determine what data is available to support decisions in these key areas
     
  3. Ensure that the data is in a “good enough” state, appropriately consolidated / made consistent, augmented / corrected by any useful external data and made available to the right people in a timely manner
     
  4. Focus on the human aspects of acting on what data is telling us and how to use data outputs to drive positive actions

Here too, little is new under the Sun. I have been referring to essentially these same four pillars of good practice since the mid 2000s. Some of our technological advances since then have been amazing. The prospect of leveraging the power of both Data Science and Artificial Intelligence in a business context is very exciting. But to truly succeed with these newer approaches, it helps to recall the eternal verities that have always underpinned good data-centric work [10]. The survey above makes this point crystal clear.

A final corollary to this observation is something I covered in A truth universally acknowledged…. The replies to the Kaggle survey highlight the fact that, much like the conductor of an orchestra does not need to be able to play the violin to a virtuoso level, people leading Data Science teams (and broader Data Functions) need a set of rounded skills, ones honed to address the types of issues appearing in the exhibits above. The skill-set that makes for an excellent Data Scientist does not necessarily help so much with many of the less technical issues that will determine the success or failure of Data Science teams.
 


 
Notes

 
[1]
 
Other Yogi-isms included, “Always go to other people’s funerals; otherwise they won’t go to yours”, “You can observe a lot by watching” and “If you can’t imitate him, don’t copy him”.
 
[2]
 
A Data Visualisation challenge to include that much text I realise. I think I might have been tempted to come up with pithier categories to aid legibility.
 
[3]
 
Ignoring “I prefer not to say” and “Other”.
 
[4]
 
As laid out in my many articles about the importance of Cultural Transformation.
 
[5]
 
See: Building Momentum – How to begin becoming a Data-driven Organisation.
 
[6]
 
I make precisely this point in my recent interview for Venturi Voice (starting just after 31:38).
 
[7]
 
I make this point most forcibly back in: A bad workman blames his [Business Intelligence] tools. The technology may be different, but the learnings are just as relevant today.
 
[8]
 
Executive Information Systems for those of tender years.
 
[9]
 
Machine learning techniques can clearly help here, but only if in concert with dialogue with people actually on the front-line and leading business areas.
 
[10]
 
In your search for such eternal verities, you could do much worse than starting with: 20 Risks that Beset Data Programmes.

 


From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

 

Latest Interviews / Podcasts

Interviews and Podcasts

The interviews that I conduct with leaders in their fields as part of my “In-depth” series have hopefully brought a new and interesting aspect to this site. However, often the boot is on the other foot and I am the person being interviewed about my experience and expertise in the data field and related matters [1]. Maybe interviewing other people helps me when I am in turn interviewed, maybe it’s the other way round. Whatever the case, I enjoyed recording the two conversations appearing below (thanks to the interviewers in both cases) and hope that the content is of interest to readers.

In both instances a link to the site originally publishing the interview is followed by a locally hosted version of the audio track and then a download option. I’d encourage readers to explore the other excellent interviews contained on both sites.



 
Enterprise Management 360° Podcast – 31st July 2018


 



 
Venturi Voice 3650° Podcast – 22nd April 2018


 

Downloadable link: Conducting a Data Orchestra

 
If you would like to interview me for your site or periodical, of if you are just interested in further exploring some of the themes I discuss in these two interviews, then please feel free to get in contact.
 


 
Notes

 
[1]
 
A list of other video interviews and podcasts I have taken part in can be viewed in the Media section of this site.

 


From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases