Data Visualisation according to a Four-year-old

Solar System

When I recently published the latest edition of The Data & Analytics Dictionary, I included an entry on Charts which briefly covered a number of the most frequently used ones. Given that entries in the Dictionary are relatively brief [1] and that its layout allows little room for illustrations, I decided to write an expanded version as an article. This will be published in the next couple of weeks.

One of the exhibits that I developed for this charts article was to illustrate the use of Bubble Charts. Given my childhood interest in Astronomy, I came up with the following – somewhat whimsical – exhibit:

Bubble Planets

Bubble Charts are used to plot three dimensions of data on a two-dimensional graph. Here the horizontal axis is how far each of the gas and ice giants is from the Sun [2], the vertical axis is how many satellites each planet has [3] and the final dimension – indicated by the size of the “bubbles” – is the actual size of each planet [4].
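The choice mentioned in note [4], between scaling a bubble's radius and scaling its area, can be sketched in a few lines of Python. This is only an illustration: the equatorial radii are approximate published values supplied here rather than taken from the exhibit, and the function name bubble_radii and its max_px parameter are hypothetical.

```python
import math

# Approximate equatorial radii in km (illustrative values, not taken from the exhibit).
EQUATORIAL_RADIUS_KM = {
    "Jupiter": 71_492,
    "Saturn": 60_268,
    "Uranus": 25_559,
    "Neptune": 24_764,
}


def bubble_radii(values, max_px=50.0, scale_by="radius"):
    """Return an on-screen bubble radius (hypothetical pixels) for each value.

    scale_by="radius": the radius is proportional to the value.
    scale_by="area":   the area is proportional to the value, so the radius
                       grows with the square root of the value.
    """
    if scale_by not in ("radius", "area"):
        raise ValueError("scale_by must be 'radius' or 'area'")
    largest = max(values.values())
    radii = {}
    for name, value in values.items():
        fraction = value / largest
        if scale_by == "area":
            fraction = math.sqrt(fraction)
        radii[name] = max_px * fraction
    return radii


by_radius = bubble_radii(EQUATORIAL_RADIUS_KM, scale_by="radius")
by_area = bubble_radii(EQUATORIAL_RADIUS_KM, scale_by="area")

for planet in EQUATORIAL_RADIUS_KM:
    print(f"{planet:8s} radius-scaled {by_radius[planet]:5.1f}px   area-scaled {by_area[planet]:5.1f}px")
```

Under area scaling the smaller planets are drawn noticeably larger relative to Jupiter than under radius scaling, so the two options can give quite different visual impressions of the same data.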

Anyway, I thought it was a prettier illustration of the utility of Bubble Charts than the typical market size analysis they are often used to display.

However, while I was doing this, my older daughter wandered into my office and said “look at the picture I drew for you Daddy” [5]. Coincidentally my muse had been her muse and the result is the Data Visualisation appearing at the top of this article. Equally coincidentally, my daughter had also encoded three dimensions of data in her drawing:

  1. Rank of distance from the Sun
  2. Colour / appearance
  3. Number of satellites [6]

She also started off trying to capture relative size. After a great start with Mercury, Venus and Earth, she then ran into some Data Quality issues with the later planets (she is only four).

Here is an annotated version:

Solar System (annotated)

I think I’m at least OK at Data Visualisation, but my daughter’s drawing rather knocked mine into a cocked hat [7]. And she included a comet, which makes any Data Visualisation better in my humble opinion; what Chart would not benefit from the inclusion of a comet?
 


Notes

 
[1]
 
For me at least that is.
 
[2]
 
Actually the measurement is the closest that each planet comes to the Sun, its perihelion.
 
[3]
 
This may seem a somewhat arbitrary thing to plot, but a) the exhibit is meant to be illustrative only and b) there does nevertheless seem to be a correlation of sorts; I’m sure there is some Physical reason for this, which I’ll have to look into sometime.
 
[4]
 
Bubble Charts typically offer the option to scale bubbles such that either their radius / diameter or their area is in proportion to the value to be displayed. I chose the equatorial radius as my metric.
 
[5]
 
It has to be said that this is not an atypical occurrence.
 
[6]
 
For at least the four rocky planets, it might have taken a while to draw all 79 of Jupiter’s moons.
 
[7]
 
I often check my prose for phrases that may be part of British idiom but not used elsewhere. In doing this, I learnt today that “knock into a cocked hat” was originally an American phrase; it is first found in the 1830s.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

The latest edition of The Data & Analytics Dictionary is now out

The Data and Analytics Dictionary

After a hiatus of a few months, the latest version of the peterjamesthomas.com Data and Analytics Dictionary is now available. It includes 30 new definitions, some of which have been contributed by people like Tenny Thomas Soman, George Firican, Scott Taylor and Taru Väre. Thanks to all of these for their help.

  1. Analysis
  2. Application Programming Interface (API)
  3. Business Glossary (contributor: Tenny Thomas Soman)
  4. Chart (Graph)
  5. Data Architecture – Definition (2)
  6. Data Catalogue
  7. Data Community
  8. Data Domain (contributor: Taru Väre)
  9. Data Enrichment
  10. Data Federation
  11. Data Function
  12. Data Model
  13. Data Operating Model
  14. Data Scrubbing
  15. Data Service
  16. Data Sourcing
  17. Decision Model
  18. Embedded BI / Analytics
  19. Genetic Algorithm
  20. Geospatial Data
  21. Infographic
  22. Insight
  23. Management Information (MI)
  24. Master Data – additional definition (contributor: Scott Taylor)
  25. Optimisation
  26. Reference Data (contributor: George Firican)
  27. Report
  28. Robotic Process Automation
  29. Statistics
  30. Self-service (BI or Analytics)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form; we look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.


 

A Retrospective of 2018’s Articles

A Review of 2018

This is the second year in which I have produced a retrospective of my blogging activity. As in 2017, I have failed miserably in my original objective of posting this early in January. Despite starting to write this piece on 18th December 2018, I have somehow sneaked into the second quarter before getting round to completing it. Maybe I will do better with 2019’s highlights!

Anyway, 2018 was a record-breaking year for peterjamesthomas.com. The site saw more traffic than in any other year since its inception; indeed hits were over a third higher than in any previous year. This increase was driven in part by the launch of my new Maths & Science section, articles from which claimed no fewer than 6 slots in the 2018 top 10 articles, when measured by hits [1]. Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009, our first year in business.

As with every year, some of my work was viewed by tens of thousands of people, while other pieces received less attention. This is my selection of the articles that I enjoyed writing most, which does not always overlap with the most popular ones. Given the advent of the Maths & Science section, there are now seven categories into which I have split articles. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO perspectives
  5. Programme Advice
  6. Analytics & Big Data
  7. Maths & Science

In each category, I will pick out one or two pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
A Brief History of Databases
 
February
A Brief History of Databases
An infographic spanning the history of Database technology from its early days in the 1960s to the landscape in the late 2010s.
 
Data Strategy Alarm Bell
 
July
How to Spot a Flawed Data Strategy
What alarm bells might alert you to problems with your Data Strategy; based on the author’s extensive experience of both developing Data Strategies and vetting existing ones.
 
Just the facts...
 
August
Fact-based Decision-making
Fact-based decision-making sounds like a no-brainer, but just how hard is it to generate accurate facts?
 
 
Data Visualisation
 
Comparative Pie Charts
 
August
As Nice as Pie
A review of the humble Pie Chart, what it is good at, where it presents problems and some alternatives.
 
 
Statistics & Data Science
 
Data Science Challenges – It’s Deja Vu all over again!
 
August
Data Science Challenges – It’s Deja Vu all over again!
A survey of more than 10,000 Data Scientists highlights a set of problems that will seem very, very familiar to anyone working in the data space for a few years.
 
 
CDO Perspectives
 
The CDO Dilemma
 
February
The CDO – A Dilemma or The Next Big Thing?
Two Forbes articles argue different perspectives about the role of Chief Data Officer. The first (by Lauren deLisa Coleman) stresses its importance, the second (by Randy Bean) highlights some of the challenges that CDOs face.
 
2018 CDO Interviews
 
May onwards
The “In-depth” series of CDO interviews
Rather than a single article, this is a series of four talks with prominent CDOs, reflecting on the role and its challenges.
 
The Chief Marketing Officer and the CDO – A Modern Fable
 
October
The Chief Marketing Officer and the CDO – A Modern Fable
Discussing an alt-facts / “fake” news perspective on the Chief Data Officer role.
 
 
Programme Advice
 
Building Momentum
 
June
Building Momentum – How to begin becoming a Data-driven Organisation
Many companies want to become data-driven, but getting started on the journey towards this goal can be tough. This article offers a framework for building momentum in the early stages of a Data Programme.
 
 
Analytics & Big Data
 
Enterprise Data Marketplace
 
January
Draining the Swamp
A review of some of the problems that can beset Data Lakes, together with some ideas about what to do to fix these from Dan Woods (Forbes), Paul Barth (Podium Data) and Dave Wells (Eckerson Group).
 
Sic Transit Gloria Mundi
 
February
Sic Transit Gloria Magnorum Datorum
In a world where the word has developed a very negative connotation, what’s so bad about being traditional?
 
Convergent Evolution of Data Architectures
 
August
Convergent Evolution
What the similarities (and differences) between Ichthyosaurs and Dolphins can tell us about different types of Data Architectures.
 
 
Maths & Science
 
Euler's Number
 
March
Euler’s Number
A long and winding road with the destination being what is probably the most important number in Mathematics.
 
The Irrational Ratio
 
August
The Irrational Ratio
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
 
Emmy Noether
 
October
Glimpses of Symmetry, Chapter 24 – Emmy
One of the more recent chapters in my forthcoming book on Group Theory and Particle Physics. This focuses on the seminal contributions of Mathematician Emmy Noether to the fundamentals of Physics and the connection between Symmetry and Conservation Laws.

 
Notes

 
[1]
 

The 2018 Top Ten by Hits
1. The Irrational Ratio
2. A Brief History of Databases
3. Euler’s Number
4. The Data and Analytics Dictionary
5. The Equation
6. A Brief Taxonomy of Numbers
7. When I’m 65
8. How to Spot a Flawed Data Strategy
9. Building Momentum – How to begin becoming a Data-driven Organisation
10. The Anatomy of a Data Function – Part I

 

 
 

More Definitions in the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. Here are 20 new definitions, including the first from other contributors (thanks Tenny!):

  1. Artificial Intelligence Platform
  2. Data Asset
  3. Data Audit
  4. Data Classification
  5. Data Consistency
  6. Data Controls
  7. Data Curation (contributor: Tenny Thomas Soman)
  8. Data Democratisation
  9. Data Dictionary
  10. Data Engineering
  11. Data Ethics
  12. Data Integrity
  13. Data Lineage
  14. Data Platform
  15. Data Strategy
  16. Data Wrangling (contributor: Tenny Thomas Soman)
  17. Explainable AI (contributor: Tenny Thomas Soman)
  18. Information Governance
  19. Referential Integrity
  20. Testing Data (Training Data)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

People are now also welcome to contribute their own definitions. You can use the comments section here, or the dedicated form. Submissions will be subject to editorial review and are not guaranteed to be accepted.
 


 


 

As Nice as Pie

If you can't get your graphing tool to do the shading, just add some clip art of cosmologists discussing the unusual curvature of space in the area.

© Randall Munroe of xkcd.com – Image adjusted to fit dimensions of this page

Work by the inimitable Randall Munroe, author of long-running web-comic, xkcd.com, has been featured (with permission) multiple times on these pages [1]. The above image got me thinking that I had not penned a data visualisation article since the series starting with Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity nearly a year ago. Randall’s perspective led me to consider that staple of PowerPoint presentations, the humble and much-maligned Pie Chart.


 
While the history is not certain, most authorities credit the pioneer of graphical statistics, William Playfair, with creating this icon, which appeared in his Statistical Breviary, first published in 1801 [2]. Later Florence Nightingale (a statistician in case you were unaware) popularised Pie Charts. Indeed a Pie Chart variant (called a Polar Chart) that Nightingale compiled appears at the beginning of my article Data Visualisation – A Scientific Treatment.

I can’t imagine any reader has managed to avoid seeing a Pie Chart before reading this article. But, just in case, here is one (since writing Rainbow’s Gravity – see above for a link – I have tried to avoid a rainbow palette in visualisations, hence the monochromatic exhibit):

Basic Pie Chart

The above image is a representation of the following dataset:

 
Label Count
A 4,500
B 3,000
C 3,000
D 3,000
E 4,500
Total 18,000
 

The Pie Chart consists of a circle divided into five sectors, labelled A through E. The basic idea is of course that the amount of the circle taken up by each sector is proportional to the count of items associated with each category, A through E. What is meant by the innocent “amount of the circle” here? The easiest way to look at this is that going all the way round a circle consumes 360°. If we consider our data set, the total count is 18,000, which will equate to 360°. The count for A is 4,500 and we need to consider what fraction of 18,000 this represents and then apply this to 360°:

\dfrac{4{,}500}{18{,}000}\times 360^\circ=\dfrac{1}{4}\times 360^\circ=90^\circ

So A must take up 90°, or equivalently one quarter of the total circle. Similarly for B:

\dfrac{3{,}000}{18{,}000}\times 360^\circ=\dfrac{1}{6}\times 360^\circ=60^\circ

Or one sixth of the circle.

If we take this approach then – of course – the sum of all of the sectors must equal the whole circle and neither more nor less than this (pace Randall). In our example:

 
Label Degrees
A 90°
B 60°
C 60°
D 60°
E 90°
Total 360°
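The arithmetic behind this table can be sketched in Python. This is a minimal illustration rather than how any particular charting tool works; the function name sector_angles is hypothetical, and exact fractions are used only so that the total comes out at precisely 360.

```python
from fractions import Fraction

# The first data-set from the text.
counts = {"A": 4_500, "B": 3_000, "C": 3_000, "D": 3_000, "E": 4_500}


def sector_angles(counts):
    """Map each label to its sector angle in degrees: count / total * 360."""
    total = sum(counts.values())
    # Fractions keep the arithmetic exact, so the sectors sum to exactly 360.
    return {label: Fraction(count, total) * 360 for label, count in counts.items()}


angles = sector_angles(counts)
for label, angle in angles.items():
    print(f"{label}: {float(angle):.0f}°")
print(f"Total: {float(sum(angles.values())):.0f}°")
```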
 

So far, so simple. Now let’s consider a second data-set as follows:

 
Label Count
A 9,480,301
B 6,320,201
C 6,320,200
D 6,320,201
E 9,480,301
Total 37,921,204
 

What does its Pie Chart look like? Well, it’s actually rather familiar; it looks like this:

Basic Pie Chart

This observation stresses something important about Pie Charts. They show how a number of categories contribute to a whole figure, but they only show relative figures (percentages of the whole if you like) and not the absolute figures. The totals in our two data-sets differ by a factor of over 2,100, but their Pie Charts are identical. We will come back to this point again later on.
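A quick check of this point, again as a sketch with hypothetical names: computing the sector angles for both data-sets shows that they agree to within a tiny fraction of a degree (far below anything visible on a chart), even though the totals are wildly different.

```python
first = {"A": 4_500, "B": 3_000, "C": 3_000, "D": 3_000, "E": 4_500}
second = {"A": 9_480_301, "B": 6_320_201, "C": 6_320_200, "D": 6_320_201, "E": 9_480_301}


def sector_angles(counts):
    """Sector angle in degrees for each label: count / total * 360."""
    total = sum(counts.values())
    return {label: 360 * count / total for label, count in counts.items()}


first_angles = sector_angles(first)
second_angles = sector_angles(second)

# Largest disagreement between the two charts, over all five sectors.
largest_gap = max(abs(first_angles[k] - second_angles[k]) for k in first)
ratio_of_totals = sum(second.values()) / sum(first.values())

print(f"Totals differ by a factor of {ratio_of_totals:,.0f}")
print(f"Largest difference in any sector angle: {largest_gap:.6f} degrees")
```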


 
Pie Charts have somewhat fallen into disrepute over the years. Some of this is to do with their ubiquity, but there is also at least one more substantial criticism. This is that the human eye is bad at comparing angles, particularly if they are not aligned to some reference point, e.g. a vertical. To see this consider the two Pie Charts below (please note that these represent a different data set from above – for starters, there are only four categories plotted as opposed to five earlier on):

Comparative Pie Charts

The details of the underlying numbers don’t actually matter that much, but let’s say that the left-hand Pie Chart represents annual sales in 2016, broken down by four product lines. The right-hand chart has the same breakdown, but for 2017. This provides some context to our discussions.

Suppose what is of interest is how the sales for each product line in the 2016 chart compare to their counterparts in the 2017 one; e.g. A and A’, B and B’ and so on. Well, for the As, we have the helpful fact that they both start from a vertical line and then swing down and round, initially rightwards. This can be used to gauge that A’ is a bit bigger than A. What about B and B’? Well, they start in different places and end in different places; looking carefully, we can see that B’ is bigger than B. C and C’ are pretty easy: C is a lot bigger. Then we come to D and D’. I find this one a bit tricky, but we can eventually hazard a guess that they are pretty much the same.

So we can compare Pie Charts and talk about how sales change between two years; what’s the problem? The issue is that it takes some time and effort to reach even these basic conclusions. How about if, instead of working out which is bigger, A or A’, I ask the reader to guess by what percentage A’ is bigger? This is not trivial to do based on just the charts.

If we really want to look at year-on-year growth, we would prefer that the answer leaps off the page; after all, isn’t that the whole point of visualisations rather than tables of numbers? What if we focus on just the right-hand diagram? Can you say with certainty which is bigger, A or C, B or D? You can work to an answer, but it takes longer than should really be the case for a graphical exhibit.

Aside:

There is a further point to be made here and it relates to what we said Pie Charts show earlier in this piece. What we have in our two Pie Charts above is the make-up of a whole number (in the example we have been working through, this is total annual sales) by categories (product lines). These are percentages and what we have been doing above is to compare the fact that A made up 30% of the total sales in 2016 and 33% in 2017. What we cannot say based on just the above exhibits is how actual sales changed. The total sales may have gone up or down; the Pie Chart does not tell us this, it just deals in how the make-up of total sales has shifted.

Some people try to address this shortcoming, which can result in exhibits such as:

Comparative Pie Charts - with Growth

Here some attempt has been made to show the growth in the absolute value of sales year on year. The left-hand Pie Chart is smaller and so we assume that annual sales have increased between 2016 and 2017. The most logical thing to do would be to make the total areas of the two Pie Charts proportional to total sales in each year (in this case – based on the underlying data – 2017 sales are 69% bigger than 2016 sales). However, such an approach, while adding information, makes the task of comparing sectors from year to year even harder.
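To make the sizing concrete, here is a short worked step. It assumes the area-proportional approach just described and writes r for the radius of each chart (notation introduced here, not taken from the exhibit). Area grows with the square of the radius, so a 69% increase in sales implies:

\dfrac{r_{2017}}{r_{2016}}=\sqrt{\dfrac{\pi r_{2017}^2}{\pi r_{2016}^2}}=\sqrt{1.69}=1.3

In other words, the 2017 chart's radius would be only 30% larger than the 2016 chart's, even though sales grew by 69%.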


 
The general argument is that Nested Bar Charts are better for the type of scenario I have presented and the types of questions I asked above. Looking at the same annual sales data this way we could generate the following graph:

Comparative Bar Charts

Aside:

While Bar Charts are often used to show absolute values, what we have above is the same “percentage of the whole” data that was shown in the Pie Charts. We have already covered the relative / absolute issue inherent in Pie Charts; from now on, each new chart will be like a Pie Chart inasmuch as it will contain relative (percentage of the whole) data, not absolute. Indeed you could think about generating the bar graph above by moving the Pie Chart sectors around and squishing them into new shapes, while preserving their area.

The Bar Chart makes the yearly comparisons a breeze and it is also pretty easy to take a stab at percentage differences. For example B’ looks about a fifth bigger than B (it’s actually 17.5% bigger) [3]. However, what I think gets lost here is a sense of the make-up of the elements of the two sets. We can see that A is the biggest value in the first year and A’ in the second, but it is harder to gauge what percentage of the overall both A and A’ represent.
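Because the bars hold percentages of the whole, "how much bigger" can mean two different things: a change in percentage points, or a relative change in the share itself. A minimal sketch of the distinction follows (the function names are hypothetical), using A's 30% and 33% shares quoted earlier and the 40% and 50% example from note [3]:

```python
def point_change(old_share, new_share):
    """Absolute change between two shares, in percentage points."""
    return new_share - old_share


def relative_change(old_share, new_share):
    """Relative growth of a share, as a fraction of the old share."""
    return (new_share - old_share) / old_share


# A's share of total sales: 30% in 2016 and 33% in 2017 (figures quoted in the text).
print(f"A -> A': {point_change(30, 33)} points, {relative_change(30, 33):.0%} bigger")

# The example from note [3]: 50% is 25% bigger than 40%.
print(f"40% -> 50%: {point_change(40, 50)} points, {relative_change(40, 50):.0%} bigger")
```

The percentages quoted for the Bar Chart are of the second, relative kind.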

To do this better, we could move to a Stacked Bar Chart as follows (again with the same sales data):

Stacked Bar Chart

Aside:

Once more, we are dealing with how proportions have changed – to put it simply the height of both “skyscrapers” is the same. If we instead shifted to absolute values, then our exhibit might look more like:

Stacked Bar Chart (Absolute Values)

The observant reader will note that I have also added dashed lines linking the same category for each year. These help to show growth. Regardless of what angle to the horizontal the lower line for a category makes, if it and the upper category line diverge (as for B and B’), then the category is growing; if they converge (as for C and C’), the category is shrinking [4]. Parallel lines indicate a steady state. Using this approach, we can get a better sense of the relative size of categories in the two years.
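The diverge / converge rule can be checked with a small sketch. The shares below are hypothetical: they were chosen only to be consistent with the figures quoted in the text (A at 30% and 33%, B’ 17.5% bigger than B, C shrinking, D roughly flat) and are not the data behind the exhibits. The stacking order and the function names are likewise assumptions.

```python
# Hypothetical shares (percent of total sales), chosen only to be consistent
# with the figures quoted in the text; they are not the article's actual data.
shares_2016 = {"A": 30.0, "B": 20.0, "C": 35.0, "D": 15.0}
shares_2017 = {"A": 33.0, "B": 23.5, "C": 28.5, "D": 15.0}


def stack_boundaries(shares):
    """Return {label: (lower, upper)} heights for one stacked bar."""
    boundaries = {}
    running = 0.0
    for label, share in shares.items():
        boundaries[label] = (running, running + share)
        running += share
    return boundaries


def trend(label, before, after, tolerance=1e-9):
    """Compare the gap between a category's lower and upper lines in each year."""
    gap_before = before[label][1] - before[label][0]
    gap_after = after[label][1] - after[label][0]
    if gap_after > gap_before + tolerance:
        return "diverge (growing)"
    if gap_after < gap_before - tolerance:
        return "converge (shrinking)"
    return "parallel (steady)"


bars_2016 = stack_boundaries(shares_2016)
bars_2017 = stack_boundaries(shares_2017)

for label in shares_2016:
    print(label, trend(label, bars_2016, bars_2017))
```

Only the gap between a category's two lines matters here, not where the category happens to sit in the stack, which is why the angle of the lower line to the horizontal is irrelevant.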


 
However, here – despite the dashed lines – we lose at least some of the year-on-year comparative power of the Nested Bar Chart above. In turn the Nested Bar Chart loses some of the attributes of the original Pie Chart. In truth, there is no single chart which fits all purposes. Trying to find one is analogous to trying to find a planar projection of a sphere that preserves angles, distances and areas [5].

Rather than finding the Philosopher’s Stone [6] of an all-purpose chart, the challenge for those engaged in data visualisation is to anticipate the central purpose of an exhibit and to choose a chart type that best resonates with this. Sometimes, the Pie Chart can be just what is required, as I found myself in my article, A Tale of Two [Brexit] Data Visualisations, which closed with the following image:

Brexit Flag
UK Referendum on EU Membership – Number voting by age bracket (see caveats in original article)

Or, to put it another way:

You may very well be well bred
Chart aesthetics filling your head
But there’s always some special case, time or place
To replace perfect taste

For instance…

Never cry ’bout a Chart of Pie
You can still do fine with a Chart of Pie
People may well laugh at this humble graph
But it can be just the thing you need to help the staff

Never cry ’bout a Chart of Pie
Though without due care things can go awry
Bars are fine, Columns shine
Lines are ace, Radars race
Boxes fly, but never cry about a Chart of Pie

With apologies to the Disney Corporation!


 
Addendum:

It was pointed out to me by Adam Carless that I had omitted the following thing of beauty from my Pie Chart menagerie. How could I have forgotten?

3D Pie Chart

It is claimed that some Theoretical Physicists (and most Higher Dimensional Geometers) can visualise in four dimensions. Perhaps this facility would be of some use in discerning meaning from the above exhibit.
 


 
Notes

 
[1]
 
Including:

 
[2]
 
Playfair also most likely was the first to introduce line, area and bar charts.
 
[3]
 
Recall again we are comparing percentages, so 50% is 25% bigger than 40%.
 
[4]
 
This assertion would not hold for absolute values, or rather parallel lines would indicate that the absolute value of sales (not the relative one) had stayed constant across the two years.
 
[5]
 
A little-known Mathematician, going by the name of Gauss, had something to say about this back in 1828 – Disquisitiones generales circa superficies curvas. I hope you read Latin.
 
[6]
 
The Philosopher's Stone

No, not that one!

 



 

Convergent Evolution

Ichthyosaur and Dolphin

No, this article has not escaped from my Maths & Science section; it is actually about data matters. But first of all, channeling Jennifer Aniston [1], “here comes the Science bit – concentrate”.


 
Shared Shapes

The Theory of Common Descent holds that any two organisms, extant or extinct, will have a common ancestor if you roll the clock back far enough. For example, each of fish, amphibians, reptiles and mammals had a common ancestor over 500 million years ago. As shown below, the current organism which is most like this common ancestor is the Lancelet [2].

Chordate Common Ancestor

To bring things closer to home, each of the Great Apes (Orangutans, Gorillas, Chimpanzees, Bonobos and Humans) had a common ancestor around 13 million years ago.

Great Apes Common Ancestor

So far so simple. As one would expect, animals sharing a recent common ancestor would share many attributes with both it and each other.

Convergent Evolution refers to something else. It describes where two organisms independently evolve very similar attributes that were not features of their most recent common ancestor. Thus these features are not inherited; instead evolutionary pressure has led to the same attributes developing twice. An example is probably simpler to understand.

The image at the start of this article is of an Ichthyosaur (top) and Dolphin. It is striking how similar their body shapes are. They also share other characteristics such as live birth of young, tail first. The last Ichthyosaur died around 66 million years ago alongside many other archosaurs, notably the Dinosaurs [3]. Dolphins are happily still with us, but the first toothed whale (not a Dolphin, but probably an ancestor of them) appeared around 30 million years ago. The ancestors of the modern Bottlenose Dolphins appeared a mere 5 million years ago. Thus there is a tremendous gap of time between the last Ichthyosaur and the proto-Dolphins. Ichthyosaurs are reptiles; they were covered in small scales [4]. Dolphins are mammals and covered in skin not massively different to our own. The most recent common ancestor of Ichthyosaurs and Dolphins probably lived around a quarter of a billion years ago and looked like neither of them. So the shape and other attributes shared by Ichthyosaurs and Dolphins do not come from a common ancestor; they have developed independently (and millions of years apart) as adaptations to similar lifestyles as marine hunters. This is the essence of Convergent Evolution.

That was the Science, here comes the Technology…


 
A Brief Hydrology of Data Lakes

From 2000 to 2015, I had some success [5] with designing and implementing Data Warehouse architectures much like the following:

Data Warehouse Architecture

As a lot of my work then was in Insurance or related fields, the Analytical Repositories tended to be Actuarial Databases and / or Exposure Management Databases, developed in collaboration with such teams. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation.

Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises. Of course some architectures featured both paradigms as well.

One of the early promises of a Data Lake approach was that – once all relevant data had been ingested – this would be directly leveraged by Data Scientists to derive insight.

Over time, it became clear that it would be useful to also have some merged / conformed and cleansed data structures in the Data Lake. Once the output of Data Science began to be used to support business decisions, a need arose to consider how it could be audited and both data privacy and information security considerations also came to the fore.

Next, rather than just being the province of Data Scientists, there were moves to use Data Lakes to support general Data Discovery and even business Reporting and Analytics as well. This required additional investments in metadata.

The types of issues with Data Lake adoption that I highlighted in Draining the Swamp earlier this year also led to the advent of techniques such as Data Curation [6]. In parallel, concerns about expensive Data Science resource spending 80% of their time in Data Wrangling [7] led to the creation of a new role, that of Data Engineer. These people take on much of the heavy lifting of consolidating, fixing and enriching datasets, allowing the Data Scientists to focus on Statistical Analysis, Data Mining and Machine Learning.

Big Data Architecture

All of which leads to a modified Big Data / Data Lake architecture, embodying people and processes as well as technology and looking something like the exhibit above.

This is where the observant reader will see the concept of Convergent Evolution playing out in the data arena as well as the Natural World.


 
In Closing

Convergent Evolution of Data Architectures

Lest it be thought that I am saying that Data Warehouses belong to a bygone era, it is probably worth noting that the archosaurs, Ichthyosaurs included, dominated the Earth for orders of magnitude longer than the mammals and were only dethroned by an asymmetric external shock, not any flaw in their own finely honed characteristics.

Also, to be crystal clear, just as there are clear differences between Ichthyosaurs and Dolphins alongside their similarities, the same applies to Data Warehouse and Data Lake architectures. When you get into the details, differences between Data Lakes and Data Warehouses do emerge; there are capabilities that each has that are not features of the other. What is undoubtedly true however is that the same procedural and operational considerations that played a part in making some Warehouses seem unwieldy and unresponsive are also beginning to have the same impact on Data Lakes.

If you are in the business of turning raw data into actionable information, then there are inevitably considerations that will apply to any technological solution. The key lesson is that the shape of your architecture is going to be pretty similar, regardless of the technical underpinnings.


 
Notes

 
[1]
 
The two of us are constantly mistaken for one another.
 
[2]
 
To be clear the common ancestor was not a Lancelet, rather Lancelets sit on the branch closest to this common ancestor.
 
[3]
 
Ichthyosaurs are not Dinosaurs, but a different branch of ancient reptiles.
 
[4]
 
This is actually a matter of debate in paleontological circles, but recent evidence suggests small scales.
 
[5]
 
See:

 
[6]
 
A term that is unaccountably missing from The Data & Analytics Dictionary – something to add to the next release. UPDATE: Now remedied here.
 
[7]
 
Ditto. UPDATE: Now remedied here.

 



 

Version 2 of The Anatomy of a Data Function

Between November and December 2017, I published the three parts of my Anatomy of a Data Function. These were cunningly called Part I, Part II and Part III. Eight months is a long time in the data arena and I have now issued an update.

The Anatomy of a Data Function

Larger PDF version (opens in a new tab)

The changes in Version 2 are confined to the above organogram and Part I of the text. They consist of the following:

  1. Split Artificial Intelligence out of Data Science in order to better reflect the ascendancy of this area (and also its use outside of Data Science).
     
  2. Change Data Science to Data Science / Engineering in order to better reflect the continuing evolution of this area.

My aim will be to keep this trilogy up-to-date as best practice Data Functions change their shapes and contents.


 
If you would like help building or running your Data Function, or would just like to have an informal chat about the area, please get in touch.
 



 

Link directly to entries in the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary has always had internal tags (anchors for those old enough to recall their HTML) which allowed me, as its author, to link to individual entries from other web-pages I write. An example of the use of these is my article, A Brief History of Databases.

I have now made these tags public. Each entry in the Dictionary is followed by the full tag address in a box. This is accompanied by a link icon as follows:

Data Dictionary excerpt

Clicking on the link icon will copy the tag address to your clipboard. Alternatively the tag URL may just be copied from the box containing it directly. You can then use this address in your own article to link back to the D&AD entry.

As with the vast majority of my work, the contents of the Data and Analytics Dictionary are covered by a Creative Commons Attribution 4.0 International Licence. This means you can include my text or images in your own web-pages, presentations, Word documents etc. You can even modify my work, so long as you point out that you have done this.

If you would like to link back to the Data and Analytics Dictionary to provide definitions of terms that you are using, this should now be very easy. For example:

Lorem ipsum dolor sit amet, consectetur adipiscing Big Data elit. Duis tempus nisi sit amet libero vehicula Data Lake, sed tempor leo consectetur. Pellentesque suscipit sed felis Data Governance ac mattis. Fusce mattis luctus posuere. Duis a Spark mattis velit. In scelerisque massa ac turpis viverra, ac Logistic Regression pretium neque condimentum.

Equally, I’d be delighted if you wanted to include part or all of the text of an entry in the Data and Analytics Dictionary in your own work, commercial or personal; a link back using this new functionality would be very much appreciated.

I hope that this new functionality will be useful. An update to the Dictionary’s contents will be published in the next couple of months.
 



 

A Retrospective of 2017’s Articles

A Review of 2017

This article was originally intended for publication late in the year it reviews, but, as they [1] say, the best-laid schemes o’ mice an’ men gang aft agley…

In 2017 I wrote more articles [2] than in any year since 2009, which was the first full year of this site’s existence. Some were viewed by thousands of people, others received less attention. Here I am going to ignore the metric of popular acclaim and instead highlight a few of the articles that I enjoyed writing most, or sometimes re-reading a few months later [3]. Given the breadth of subject matter that appears on peterjamesthomas.com, I have split this retrospective into six areas, which are presented in decreasing order of the number of 2017 articles I wrote in each. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO Perspectives
  5. Programme Advice
  6. Analytics & Big Data

In each category, I will pick out two or three pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
The Data & Analytics Dictionary
 
August
The Data and Analytics Dictionary
My attempt to navigate the maze of data and analytics terminology. Everything from Algorithm to Web Analytics.
 
The Anatomy of a Data Function
 
November & December
The Anatomy of a Data Function: Part I, Part II and Part III
Three articles focussed on the structure and components of a modern Data Function and how its components interact with both each other and the wider organisation in order to support business goals.
 
 
Data Visualisation
 
Nucleosynthesis and Data Visualisation
 
January
Nucleosynthesis and Data Visualisation
How one of the most famous scientific data visualisations, the Periodic Table, has been repurposed to explain where the atoms we are all made of come from via the processes of nucleosynthesis.
 
Hurricanes and Data Visualisation
 
September & October
Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity and Part II – Map Reading
Two articles on how Data Visualisation is used in Meteorology. Part I provides a worked example illustrating some of the problems that can arise when adopting a rainbow colour palette in data visualisation. Part II grapples with hurricane prediction and covers some issues with data visualisations that are intended to convey safety information to the public.
 
 
Statistics & Data Science
 
Toast
 
February
Toast
What links Climate Change, the Manhattan Project, Brexit and Toast? How do these relate to the public’s trust in Science? What does this mean for Data Scientists?
Answers provided by Nature, The University of Cambridge and the author.
 
How to be Surprisingly Popular
 
February
How to be Surprisingly Popular
The wisdom of the crowd relies upon essentially democratic polling of a large number of respondents; an approach that has several shortcomings, not least the lack of weight attached to people with specialist knowledge. The Surprisingly Popular algorithm addresses these shortcomings and so far has out-performed existing techniques in a range of studies.
 
A Nobel Laureate’s views on creating Meaning from Data
 
October
A Nobel Laureate’s views on creating Meaning from Data
The 2017 Nobel Prize for Chemistry was awarded to Structural Biologist Richard Henderson and two other co-recipients. What can Machine Learning practitioners learn from Richard’s observations about how to generate images from Cryo-Electron Microscopy data?
 
 
CDO Perspectives
 
Alphabet Soup
 
January
Alphabet Soup
Musings on the overlapping roles of Chief Analytics Officer and Chief Data Officer and thoughts on whether there should be just one Top Data Job in an organisation.
 
A Sweeter Spot for the CDO?
 
February
A Sweeter Spot for the CDO?
An extension of my concept of the Chief Data Officer sweet spot, inspired by Bruno Aziza of AtScale.
 
A truth universally acknowledged…
 
September
A truth universally acknowledged…
Many Chief Data Officer job descriptions have a list of requirements that resemble Swiss Army Knives. This article argues that the CDO must be the conductor of an orchestra, not someone who is a virtuoso in every single instrument.
 
 
Programme Advice
 
Bumps in the Road
 
January
Bumps in the Road
What the aftermath of repeated roadworks can tell us about the potentially deleterious impact of Change Programmes on Data Landscapes.
 
20 Risks that Beset Data Programmes
 
February
20 Risks that Beset Data Programmes
A review of 20 risks that can plague data programmes. How effectively these are managed / mitigated can make or break your programme.
 
Ideas for avoiding Big Data failures and for dealing with them if they happen
 
March
Ideas for avoiding Big Data failures and for dealing with them if they happen
Paul Barsch (EY & Teradata) provides some insight into why Big Data projects fail, what you can do about this and how best to treat any such projects that head off the rails. With additional contributions from Big Data gurus Albert Einstein, Thomas Edison and Samuel Beckett.
 
 
Analytics & Big Data
 
Bigger and Better (Data)?
 
February
Bigger and Better (Data)?
Some examples of where bigger data is not necessarily better data. Provided by Bill Vorhies and Larry Greenemeier.
 
Elephants’ Graveyard?
 
March
Elephants’ Graveyard?
Thoughts on trends in interest in Hadoop and Spark, featuring George Hill, James Kobielus, Kashif Saiyed and Martyn Richard Jones, together with the author’s perspective on the importance of technology in data-centric work.
 
 
and Finally…

I would like to close this review of 2017 with a final article, one that somehow defies classification:

 
25 Indispensable Business Terms
 
April
25 Indispensable Business Terms
An illustrated Buffyverse take on Business gobbledygook – What would Buffy do about thinking outside the box? To celebrate 20 years of Buffy the Vampire Slayer and 1st April 2017.

 
Notes

 
[1]
 
“They” here obviously standing for Robert Burns.
 
[2]
 
Thirty-four articles and one new page.
 
[3]
 
Of course some of these may also have been popular, I’m not being masochistic here!

 


 

The Anatomy of a Data Function – Part III

Part I Part II Part III

Sepia's Anatomy

This is the third and final part of my review of the anatomy of a Data Function; Part I may be viewed here and Part II here.

Update:

The data arena is a fluid one. The original set of Anatomy of a Data Function articles dates back to November 2017. As of August 2018, the data function schematic has been updated to separate out Artificial Intelligence from Data Science and to change the latter to Data Science / Engineering. No doubt further changes will be made from time to time.

In the first article, I introduced the following Data Function organogram:

The Anatomy of a Data Function

Larger PDF version (opens in a new tab)

and went on to cover each of Data Strategy, Analytics & Insight and Data Operations & Technology. In Part II, I discussed the two remaining Data Function areas of Data Architecture and Data Management. In this final article, I wanted to cover the Related Areas that appear on the right of the above diagram. This naturally segues into talking about the practicalities of establishing a Data Function and highlighting some problems to be avoided or managed.

As in Parts I and II, unless otherwise stated, text indented as a quotation is excerpted from the Data and Analytics Dictionary.
 
 
Related Areas

Related Areas

I have outlined some of the key areas with which the Data Function will work. This is not intended to be a comprehensive list and indeed the boxes may be different in different organisations. Regardless of the departments that appear here, the general approach will however be similar. I won’t go through each function in great detail here. There are some obvious points to make however. The first is an overall one: a collaborative approach is clearly mandatory. While there are undeniably some police-like attributes of any Data Function, it would be best if these were carried out by friendly community policemen or women, not paramilitaries.

So rather more:

Community Police

and rather less:

Not quite so Community Police
 
Data Privacy and Information Security

Though strongly related, these areas do not generally fall under the Data Function. Indeed some legislation requires that they are separate functions. Data Privacy and Information Security are related, but also distinct from each other. Definitions are as follows:

[Data Privacy] pertains to data held by organisations about individuals (customers, counterparties etc.) and specifically to data that can be used to identify people (personally identifiable data), or is sensitive in nature, such as medical records, financial transactions and so on. There is a legal obligation to safeguard such information and many regulations around how it can be used and how long it can be retained. Often the storage and use of such data requires explicit consent from the person involved.

Data and Analytics Dictionary entry: Data Privacy

Information Security consists of the steps that are necessary to make sure that any data or information, particularly sensitive information (trade secrets, financial information, intellectual property, employee details, customer and supplier details and so on), is protected from unauthorised access or use. Threats to be guarded against would include everything from intentional industrial espionage, to ad hoc hacking, to employees releasing or selling company information. The practice of Information Security also applies to the (nowadays typical) situation where some elements of internal information are made available via the internet. There is a need here to ensure that only those people who are authenticated to access such information can do so.

Data and Analytics Dictionary entry: Information Security

 
Digital

Digital is not a box that would necessarily have appeared on this chart 15, or even 10, years ago. However, nowadays this is often an important (and large) department in many organisations. Digital departments leverage data heavily; both what they gather themselves and data drawn from other parts of the organisation. This can be to show customers their transactions, to guide next best actions, or to suggest potentially useful products or services. Given this, collaboration with the Data Function should be particularly strong.
 
Change Management

There are some specific points to make with respect to Change collaboration. One dimension of this was covered in Part II. Looking at things the other way round, as well as being a regular department, with what are laughingly referred to as “business as usual” responsibilities [1], the Data Function will also drive a number of projects and programmes. Depending on how this is approached in an organisation, this means either that the Data Function will need its own Project Managers etc., or to have such allocated from Change. This means that interactions with Change are bidirectional, which may be particularly challenging.

For some reason, Change departments have often ended up holding the purse strings for all projects and programmes (perhaps a less than ideal outcome), so a Data Function looking to get its own work done may run counter to this (see also the second section of this article).
 
IT

While the role of IT is perhaps narrower nowadays than historically [2], they are deeply involved in the world of data and the infrastructure that supports its movement around the organisation. This means that the Data Function needs to pay particular attention to its relationship with IT.
 
Embedded Analytics Teams

A wholly centralised approach to delivering Analytics is neither feasible, nor desirable. I generally recommend hybrid arrangements with a strong centralised group and affiliated analytical resource embedded in business teams. In some organisations such people may be part of the Data Function, or have a dotted line into it. In others the connection may be less formal. Whatever the arrangements, the best result would be if embedded analytical staff viewed themselves as part of a broader analytical and data community, which can share tips, work to standards and leverage each other’s work.
 
Data Stewards

Data Stewards are a concept that arises from a requirement to embed Data Governance policies and processes. Data Function Governance staff and Data Architects both need to work closely with Data Stewards. A definition is as follows:

This is a concept that arises out of Data Governance. It recognises that accountability for things like data quality, metadata and the implementation of data policies needs to be devolved to business departments and often locations. A Data Steward is the person within a particular part of an organisation who is responsible for ensuring that their data is fit for purpose and that their area adheres to data policies and guidelines.

Data and Analytics Dictionary entry: Data Steward

  
End User Computing

There are several good reasons for engaging with this area. First, the various EUCs that have been developed will embody some element (unsatisfied elsewhere) of requirements for the processing and / or distribution of data; these needs probably have to be met. Second, EUCs can present significant risks to organisations (as well as delivering significant benefits) and ameliorating these (while hopefully retaining the benefits) should be on the list of any Data Function. Third, the people who have built EUCs tend to be knowledgeable about an organisation’s data, the sort of people who can be useful sources of information and also potential allies.

[End User Computing] is a term used to cover systems developed by people other than an organisation’s IT department or an approved commercial software vendor. It may be that such software is developed and maintained by a small group of people within a department, but more typically a single person will have created and cares for the code. EUCs may be written in mainstream languages such as Java, C++ or Python, but are frequently instead Excel- or Access-based, leveraging their shared macro/scripting language, VBA (for Visual Basic for Applications). While related to Microsoft Visual Basic (the precursor to .NET), VBA is not a stand-alone language and can only run within a Microsoft Office application, such as Excel.

Data and Analytics Dictionary entry: End User Computing (EUC)

 
Third Party Providers

Often such organisations may be contracted through the IT function; however the Data Function may also hire its own consultants / service providers. In either case, the Data Function will need to pay similar attention to external groups as it does to internal service providers.
 
 
Building a Data Function for the Practical Man [3]

Flag Planting for the Practical Man

When I published Part I of this trilogy, many people were kind enough to say that they found reading it helpful. However, some of the same people went on to ask for some practical advice on how to go about setting up such a Data Function and – in particular – how to navigate the inevitable political hurdles. While I don’t believe in recipes for success that are guaranteed to work in all circumstances, the second section of this article will cover three selected high-level themes that I think are helpful to bear in mind at the start of a Data Function journey. Here I am assuming that you are the leader of the nascent Data Function and it is your accountability to build the team while adding demonstrable business value [4].

Starting Small

It is a truth universally acknowledged, that a Leader newly in possession of a Data Function, must be in want of some staff [5]. However seldom will such a person be furnished with a budget and headcount commensurate with the task at hand; at least in the early days. Often instead, the mission, should you choose to accept it, is to begin to make a difference in the Data World with a skeleton crew at best [6]. Well no one can work miracles and so it is a question of judgement where to apply scarce resource.

My view is that this is best applied in shining a light on the existing data landscape, but in two ways. First, at the Analytics end of the spectrum, looking to unearth novel findings from an organisation’s data; the sort of task you give to a capable Data Scientist with some background in the industry sector they are operating in. Second, at the Governance end of the spectrum, documenting failures in existing data processing and reporting; in particular any that could expose the organisation to specific and tangible risks. In B2C organisations, an obvious place to look is in customer data. In B2B ones instead you can look at transactions with counterparties, or in the preparation of data for external reports, either Financial or Regulatory. Here the ideal person is a competent Data Analyst with some knowledge of the existing data landscape, in particular the compromises that have to be made to work with it.

In both cases, the objective is to tell the organisation things it does not know. Positively, a glimmer of what nuggets its data holds and the impact this could have. Negatively, examples of where a poor data landscape leads to legal, regulatory, or reputational risks.

These activities can add value early on and increase demand for more of this type of work. The first investigation can lead to the creation of a Data Science team, the second to the establishment of regular Data Audits and people to run these.

A corollary here is a point that I ceaselessly make: data exploitation and data control are two sides of the same coin. By making progress in areas that are at least superficially at antipodal locations within a Data Function, the connective tissue between them becomes more apparent.

BAU or Project?

There is a pernicious opinion held by an awful lot of people which goes as follows.

  1. We have issues with our data, its quality, completeness and fitness for purpose.
  2. We do not do a good enough job of leveraging our data to guide decision making.
  3. Therefore we need a data project / programme to sort this out once and for all.
  4. Where is the telephone number of the Change Director?

Well there is some logic to the above and setting up a data project (more likely programme) is a helpful thing to do. However, this is necessary, but not sufficient [7]. Let’s think of a comparison.

  1. We need to ensure that our Financial and Management accounts are sound.
  2. It would be helpful if business leaders had good Financial reports to help them understand the state of their business.
  3. Therefore we need a Finance project / programme to sort this out once and for all.
  4. Where is the telephone number of the Change Director?

Most CFOs would view the above as their responsibility. They have an entire function focussed on such matters. Of course they may want to run some Finance projects and Change will help with this, but a Finance Department is an ongoing necessity.

To pick another example, one that illustrates just how quickly the make-up of organisations can change, just replace the word “Finance” with “Risk” in the above and “CFO” with “CRO”. While programmes may be helpful to improve either Risk or Finance, they do not run the Risk or Finance functions; the designated officers do and they have a complement of staff to assist them. It is exactly the same with data. Data programmes will enhance your use of data or control of it, but they will not ensure the day-to-day management and leverage of data in your organisation. Running “data” is the responsibility of the designated officer [8] and they should have a complement of staff to assist them as well.

The Data Function is a “business as usual” [9] function. Conveying this fact to a range of stakeholders is going to be one of the first challenges. It may be that the couple of examples I cite above can provide some ammunition for this task.

Demolishing Demoralising Demarcations

With Data Functions and their leaders both being relatively emergent phenomena [10], the separation of duties between them and other areas of a business that also deal with data can be less than clear. Scanning down the Related Areas column of the overall Data Function chart, three entities stand out who may feel that they have a strong role to play in data matters: Digital, Change Management and IT.

Of course each is correct and collaboration is the best way forward. However, human nature is not always so benign and I have several times seen jockeying for position between Data, Digital, Change and IT. Route A to resolving this is of course having clarity as to everyone’s roles and a lead Executive (normally a CEO or COO) who ensures that people play nicely with each other. Back in the real world, it will be down to the leaders in each of these areas to forge some sort of consensus about who does what and why. It is probably best to realise this upfront, rather than wasting time and effort lobbying Executives to rule on things they probably have no intention of ruling on.

Nascent Data Function leaders should be aware that there will be a tendency for other teams to carve out what might be seen as the sexier elements of Data work; this can almost seem logical when – for example – a Digital team already has a full complement of web analytics staff; surely it is just a matter of pointing these at other internal data sets, right?

If we assume that the Data Function is the last of the above mentioned departments to form, then “zero sum game” thinking would dictate that whatever is accretive to the Data Function is deleterious to existing data staff in other departments. Perhaps a good place to start in combatting this mind-set is to first acknowledge it and second to take steps to allay people’s fears. It may well make sense for some staff to gravitate to the Data Function, but only if there is a compelling logic and only if all parties agree. Offering the leaders of other departments joint decision-making on such sensitive issues can be a good confidence-building step.

Setting out explicitly to help colleagues in other departments, where feasible to do so, can make very good sense and begin the necessary work of building bridges. As with most areas of human endeavour, forging good relationships and working towards the common good are both the right thing to do and put the Data Function leader in a good place as and when more contentious discussions arise.

To make this concrete, when people in another function appear to be stepping on the toes of the Data Function, instead of reacting with outrage, it may be preferable to embrace and fully understand the work that is being done. It may even make sense to support such work, even if the ultimate view is to do things a bit differently. Insisting on organisational purity and a “my way, or the highway” attitude to data matters are both steps towards a failed Data Function. Instead, engage, listen, support and – maybe over time – seek to nudge things towards your desired state.
 
 
Closing Thoughts

That's All Folks

So we have reached the end of our anatomical journey. While maybe the information contained in these three articles would pale into insignificance compared to an actual course in human anatomy, we have nevertheless covered five main work-areas within a Data Function, splitting these down into nineteen sub-areas and cataloguing eight functions with which collaboration will be key in driving success. I have also typed over 8,000 words to convey my ideas. For those who have read all of them, thank you for your perseverance; I hope that the effort has been worthwhile and that you found some of my opinions thought-provoking.

I would also like to thank the various people who have provided positive feedback on this series via LinkedIn and Facebook. Your comments were particularly influential in shaping this final chapter.

So what are the main takeaways? Well first the word collaboration has cropped up a lot and – because data is so pervasive in organisations – the need to collaborate with a wide variety of people and departments is strong. Second, extending the human anatomy analogy, while each human shares a certain basic layout (upright, bipedal, two arms, etc.), there is considerable variation within the basic parameters. The same goes for the organogram of a Data Function that I have presented at the beginning of each of these articles. The boxes may be rearranged in some organisations, some may not sit in the Data Function in others, and the number of people allocated to each work-area will vary enormously. As with human anatomy, grasping the overall shape is more important than focussing on the inevitable variations between different people.

Third, a central concept is of course that a Data Function is necessary, not just a series of data-centric projects. Even if it starts small, some dedicated resource will be necessary and it would probably be foolish to embark on a data journey without at least a skeleton crew. Fourth, in such straitened circumstances, it is important to point early and clearly to the value of data, both in reducing potentially expensive risks and in driving insights that can save money, boost market share or improve products or services. If the budget is limited, attend to these two things first.

A fifth and final thought is how little these three articles have focussed on technology. Hadoop clusters, data visualisation suites and data governance tools all have their place, but the success or failure of data-centric work tends to pivot on more human and process considerations. This theme of technology being the least important part of data work is one I have come back to time and time again over the nine years that this blog has been published. This observation remains as true today as back in 2008.
 

Part I Part II Part III

 
Notes

 
[1]
 
BAU should in general be filed along with other mythical creatures such as Unicorns, Bigfoot, The Kraken and The Loch Ness Monster.
 
[2]
 
Not least because of the rise of Data Functions, Digital Teams and stand-alone Change Organisations.
 
[3]
 
A title borrowed from J E Thompson’s Calculus for the Practical Man; a tome read by the young Richard Feynman in childhood. Today “Calculus for the Practical Person” might be a more inclusive title.
 
[4]
 
Also known as “pulling yourself up by your bootstraps”.
 
[5]
 
I seem to be channelling JA a lot at present – see A truth universally acknowledged….
 
[6]
 
Indeed I have started on this particular journey with just myself for company on no fewer than four occasions (these three 1, 2, 3, plus at Bupa).
 
[7]
 
Once a Mathematician, always a Mathematician.
 
[8]
 
See Alphabet Soup for some ideas about what he or she might be called.
 
[9]
 
See note 1.
 
[10]
 
Despite early high-profile CDOs beginning to appear at the turn of the millennium – Joe Bugajski was appointed VP and Chief Data Officer at Visa International in 2001 (Wikipedia).

 
