An expanded and more mobile-friendly version of the Data & Analytics Dictionary

The Data and Analytics Dictionary

A revised and expanded version of the peterjamesthomas.com Data and Analytics Dictionary has been published.

Mobile version of The Data & Analytics Dictionary (yes I have an iPhone 6s in 2020, please don't judge me!)

The previous Dictionary was not the easiest to read on mobile devices. Because of this, the layout has been amended in this release and the mobile experience should now be greatly enhanced. Any feedback on usability would be welcome.

The new Dictionary includes 22 additional definitions, bringing the total number of entries to 220, totalling well over twenty thousand words. As usual, the new definitions range across the data arena: from Data Science and Machine Learning; to Information and Reporting; to Data Governance and Controls. They are as follows:

  1. Analysis Facility
  2. Analytical Repository
  3. Boosting [Machine Learning]
  4. Conformed Data (Conformed Dimension)
  5. Data Capability
  6. Data Capability Framework (Data Capability Model)
  7. Data Capability Review (Data Capability Assessment)
  8. Data Driven
  9. Data Governance Framework
  10. Data Issue Management
  11. Data Maturity
  12. Data Maturity Model
  13. Data Owner
  14. Data Protection Officer (DPO)
  15. Data Roadmap
  16. Geospatial Tool
  17. Image Recognition (Computer Vision)
  18. Overfitting
  19. Pattern Recognition
  20. Robot (Robotics, Bot)
  21. Random Forest
  22. Structured Reporting Framework

Please remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form, we look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

This Structure has Novel Features which are of Considerable Business Interest

A skilled practitioner, hard at work developing elements of a Structured Reporting FrameworkA skilled practitioner, hard at work developing elements of a Structured Reporting Framework
© Jennifer Thomas Photographyview full photo.
 
  For anyone who is unaware, the title of the article echoes a 1953 Nature paper [1], which was instead “of considerable biological interest” [2]  

 
Introduction

I have been very much focussing on the start of a data journey in a series of recent articles about Data Strategy [3]. Here I shift attention to a later stage of the journey [4] and attempt to answer the question: “How best to deliver information to an organisation in a manner that will encourage people to use it?” There are several activities that need to come together here: requirements gathering that is centred on teasing out the critical business questions to be answered, data repository [5] design and the overall approach to education and communication [6]. However, I want to focus on a further pillar in the edifice of easily accessible and comprehensible Insight and Information, a Structured Reporting Framework.

In my experience, Structured Reporting Frameworks are much misunderstood. It is sometimes assumed that they are a shiny, expensive and inconsequential trinket. Some espouse the opinion that the term is synonymous with Dashboards. Others claim that that immense effort is required to create one. I have even heard people suggesting that good training materials are an alternative to such a framework. In actual fact, for a greenfield site, a Structured Reporting Framework should mostly be a byproduct of taking a best practice approach to delivering data capabilities. Even for brownfield sites, layering at least a decent approximation to a Structured Reporting Framework over existing data assets should not be a prohibitively lengthy or costly exercise if approached in the right way.

But I am getting ahead of myself, what exactly is a Structured Reporting Framework? Let’s answer this question by telling a story, well actually two stories…


 
The New Job

 
Chapter One
In which we are introduced to Jane and she makes a surprising discovery.

Jane woke up. It was good to be alive. The sun was shining, the birds were singing and she had achieved one of her lifetime goals only three brief months earlier. Yes Jane was now the Chief Executive Officer of a major organisation: Jane Doe, CEO – how that ran off the tongue. Today was going to be a good day. Later she kissed her husband and one-year-old goodbye: “have a lovely day with Daddy, little boy!”, parcelled her six-year-old into the car and dropped her off at school, before heading into work. It was early January and, on the drive in, Jane thought about the poor accountants who had had a truncated Christmas break while they wrestled the annual accounts into submission. She must remember to write an email thanking them all for their hard work. As she swept into the staff car park and slotted into the closest bay to the entrance – that phrase again: “Jane Doe, CEO” in shiny black letters above her space – she felt a warm glow of pride and satisfaction.

Jane sunk into the padded leather chair in her spacious corner office, flipped open her MacBook Air and saw a note from her CFO. As she clicked, thoughts of pleasant meetings with investors crossed her mind. Thoughts of basking in the sort of market-beating results that the company had always posted. And then she read the mail…

… unprecedented deterioration in sales …

… many customers switched to a competitor …

… prices collapsed precipitously …

… costs escalated in Q4, the reasons are unclear …

… unexpected increase in bad debts …

… massive loss …

… capital erosion …

… issues are likely to continue and maybe increase …

… if nothing changes, potential bankruptcy …

… sorry Jane, nobody saw this coming!

Shaken, Jane wondered whether at least one person had seen this coming, her predecessor as CEO who had been so keen to take early retirement. Was there some insight as to the state of the business that he had been privy to and hidden from his fellow executives? There had been no sign, but maybe his gut had told him that bad things were coming.

Pushing such unhelpful thoughts aside, Jane began to ask herself more practical questions. How was she going to face the investors, and the employees? What was she going to do? And, she decided most pertinent of all, what exactly just happened and why?


 
In an Alternative Reality

Jane Doe - Happy CEO

 
Chapter One′
In which we have already met Jane and there are precious few surprises.
 
  Jane did some stuff before arriving at work which I won’t bore the reader with unnecessarily again. Cut to Jane opening an email from her CFO…  

… it’s not great, profit is down 10% …

… but our customer retention strategy is starting to work …

… we have been able to set a floor on prices …

… the early Q4 blip in expenses is now under control …

… I’m still worried about The Netherlands …

… but we are doing better than the competition …

… at least we saw this coming last year and acted!

Jane opened up her personal dashboard, which already showed the headline figures the CFO had been citing. She clicked a filter and the display changed to show the Netherlands operations. Still glancing at the charts and numbers, she dialled Amsterdam.

“Hi Luuk, I hope you had a good break.”

“Sure Jane, how about you?”

“Good Luuk, good thank you. How about you catch me up on how things are going?”

“Of course Jane, let me pull up the numbers… Now we both know that the turnaround has been poorer here than elsewhere. Let me show you what we think is the issue and explain what we are doing. If you can split the profit and loss figures by product first and order by ascending profit.”

“OK Luuk, I’ve done that.”

“Great. Now it’s obvious that a chunk of the losses, indeed virtually all of them, are to do with our Widget Q range. I’m sure you knew that anyway, but now let’s focus on Widget Q and break it down by territory. It’s pretty clear that the Rotterdam area is where we have a problem.”

“I see that Luuk, I did some work on these numbers myself over the weekend. What else can you tell me?”

“Well, hopefully I can provide some local colour Jane. Let’s look at the actual sales and then filter these by channel. Do you see what I see?”

“I do Luuk, what is driving this problem in sales via franchises?”

“Well, in my review of November, I mentioned a start-up competitor in the Widget Q sector. If you recall, they had launched an app for franchises which helps them to run their businesses and also makes it easy to order Widget Q equivalents from their catalogue. Well, I must admit that I didn’t envisage it having this level of impact. But at least we can see what is happening.

The app is damaging us, but it’s still early days and I believe we have a narrow window within which we can respond. When I discussed these same figures with my sales team earlier, they came up with what I think is a sound strategy to counterpunch.

Let me take you through what they suggested and link it back to these figures…”

The call with Luuk had assured Jane that the Netherlands would soon be back on track. She reflected that it was going to be tough to present the annual report to investors, but at least the early warning systems had worked. She had begun to see the problems start to build up in her previous role as EVP of UK and Ireland, not only in her figures, but in those of her counterparts around the world. Jane and her predecessor had jointly developed an evidence-based plan to address the emerging threats. The old CEO had retired, secure in the knowledge that Jane had the tools to manage what otherwise might have become a crisis. He also knew that, with Jane’s help, he had acted early and acted decisively.

Jane thought about how clear discussions about unambiguous figures had helped to implement the defensive strategy, calibrate it for local markets and allowed her and her team to track progress. She could only imagine what things would have been like if everybody was not using the same figures to flag potential problems, diagnose them, come up with solutions and test that the response was working. She shuddered to think how differently things might have gone without these tools…


 
The lie through which we tell the truth [7]

Schrödinger's Profitability

I know, I know! Don’t worry, I’m not going to give up my day job and instead focus on writing the next great British novel [8]. Equally I have no plans to author a scientific paper on Schrödinger’s Profitability, no matter how tempting. It may burst the bubble of those who have been marvelling at the depth of my creative skills, but in fact neither of the above stories are really entirely fictional. Instead they are based on my first hand experience of how access to timely, accurate and pertinent information and insight can be the difference between organisational failure and organisational success. The way that Jane and her old boss were able to identify issues and formulate a strategic response is a characteristic of a Structured Reporting Framework. The way that Jane and Luuk were able to discuss identical figures and to drill into the detail behind them is another such characteristic. Structured Reporting Frameworks are about making sure that everyone in an organisation uses the same figures and ensuring that these figures are easy to find and easy to understand.

To show how this works, let’s consider a schematic [9]:

A Structured Reporting Framework leads people logically and seamlessly from a high-level perspective of performance to more granular information exposing what factors are driving this performance. This functionality is canonically delivered by a series of tailored dashboards, each supported by lower-level dashboards, analysis facilities and reports (the last of which should be limited in number).

Busy Executives and Managers have their information needs best served via visual exhibits that are focussed on their areas of priority and highlight things that are of specific concern to them. Some charts or tables may be replicated across a number of dashboards, but others with be specific to a particular area of the business. If further attention is necessary (e.g. an indicator turns red) dashboard users should have the ability to investigate the causes themselves, if necessary drilling through to detailed transactional information. Symmetrically, more junior staff, engaged in the day-to-day operation of the organisation, need up-to-date (often real-time) information relating to their area, but may also need to set this within a broader business context. This means accessing more general exhibits. For example moving from a list of recent transactions to an historical perspective of the last two years.

Importantly, when a CEO like Jane Doe drills through from their dashboard all the way to a list report this would be the identical report with the identical figures as used by front-line staff day-to-day. When Jane picks up the ‘phone to ask a question of someone, regardless of whether they are a Country Manager, or an operations person, the figures that both see will be the same.

When not accessed from dashboards, reports and analysis facilities should be grouped into a simple menu hierarchy that allows users to navigate with ease and find what they need without having to trail through 30 reports, each with cryptic titles. As mentioned above, there should be a limited number of highly functional / customisable reports and analysis facilities, each of whose purpose is crystal clear.

The way that this consistency of figures is achieved is by all elements of the Structured Reporting Framework drawing their data from the same data repositories. In a modern Data Architecture, this tends to mean two repositories, an Analytical one delivering insight and an Operational one delivering information; these would obviously be linked to each other as well.


 
Banishing some Misconceptions

Banishing Misconceptions

I started by saying that some people make the mistake of thinking that a Structured Reporting Framework is an optional extra in a modern data landscape. In fact is is the crucial final link between an organisation’s data and the people who need to use it. In many ways how people experience data capabilities will be determined by this final link. Without paying attention to this, your shiny warehouse or data lake will be a technological curiosity, not an indispensable business tool. When the sadly common refrain of “we built state-of-the-art data capabilities, why is noone using them?” is heard, the lack of a Structured Reporting Framework is often the root cause of poor user adoption.

When building a data architecture from scratch, elements of your data repository should be so aligned with business needs that overlaying them with a Structured Reporting Framework should be a relatively easy task. But even an older and more fragmented data landscape can be improved at minimal cost by better organising current reports into more user-friendly menus [10] and by introducing some dashboards as alternative access points to them. Work is clearly required to do this, which might include some tweaks to the underlying repositories, but this is does not normally require re-writing all reports again from scratch. Such work can be approached pragmatically and incrementally, perhaps revamping reports for a given function, such as sales, before moving on to the next area. This way business value is also drip fed to the organisation.


 
I hope that this article will encourage some people to look at the idea of Structured Reporting Frameworks again. My experience is that attention paid to this concept can reap great returns at costs that can be much lower than you might expect.

It is worth thinking hard about which version of Jane Doe, CEO you want to be: the one in the dark reacting too late to events, or the one benefiting from the illumination provided by a Structured Reporting Framework.


 
If you would like to learn more about the impact that a Structured Reporting Framework can have on your organisation, or want to understand how to implement one, then you can get in contact via the form provided. You can also schedule a meeting with us directly, or speak to us on +44 (0) 20 8895 6826.


Notes

 
[1]
 
WATSON, J., CRICK, F. Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature 171, 737–738 (1953).
 
[2]
 
From what I have gleaned from those who knew (know in Watson’s case) the pair, neither was (is) the most modest of men. I therefore ascribe this not insubstantial understatement to either the editors at Nature or common-all-garden litotes.
 
[3]
 
All of which are handily collected into our Data Strategy Hub.
 
[4]
 
Though not necessarily much later if you adopt an incremental approach to the delivery of Data Capabilities.
 
[5]
 
Be that Curated Data Lake or Conformed Data Warehouse.
 
[6]
 
See the Cultural Transformation section of my repository of Keynote Articles.
 
[7]
 
Albert Camus, referring to fiction in L’Étranger.
 
[8]
 
I still have my work cut out to finish my factual book, Glimpses of Symmetry.
 
[9]
 
This is a simplified version of one that I use in my own data consulting work.
 
[10]
 
Ideally rationalising and standardising look and feel and terminology at the same time.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

Put our Knowledge and Writing Skills to Work for you

peterjamesthomas.com White Papers can be very absorbing reads

As well as consultancy, research and interim work, peterjamesthomas.com Ltd. helps organisations in a number of other ways. The recently launched Data Strategy Review Service is just one example.

Another service we provide is writing White Papers for clients. Sometimes the labels of these are white [1] as well as the paper. Sometimes Peter James Thomas is featured as the author. White Papers can be based on themes arising from articles published here, they can feature findings from de novo research commissioned in the data arena, or they can be on a topic specifically requested by the client.

Seattle-based Data Consultancy, Neal Analytics, is an organisation we have worked with on a number of projects and whose experience and expertise dovetails well with our own. They recently commissioned a White Paper expanding on our 2018 article, Building Momentum – How to begin becoming a Data-driven Organisation. The resulting paper, The Path to Data-Driven, has just been published on Neal Analytics’ site (they have a lot of other interesting content, which I would recommend checking out):

Neal Analytics White Paper - The Path to Data-Driven
Clicking on the above image will take you to Neal Analytics’ site, where the White Paper may be downloaded for free and without registration

If you find the articles published on this site interesting and relevant to your work, then perhaps – like Neal Analytics – you would consider commissioning us to write a White Paper or some other document. If so, please just get in contact, or simply schedule an introductory ‘phone call. We have a degree of flexibility on the commercial side and will most likely be able to come up with an approach that fits within your budget. Although we are based in the UK, commissions – like Neal Analytics’s one – from organisations based in other countries are welcome.
 


Notes

 
[1]
 
White-label Product – Wikipedia

 
peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

A Picture Paints a Thousand Numbers

Charts

Introduction

The recent update of The Data & Analytics Dictionary featured an entry on Charts. Entries in The Dictionary are intended to be relatively brief [1] and also the layout does not allow for many illustrations. Given this, I have used The Dictionary entries as a basis for this slightly expanded article on the subject of chart types.

A Chart is a way to organise and Visualise Data with the general objective of making it easier to understand and – in particular – to discern trends and relationships. This article will cover some of the most frequently used Chart types, which appear in alphabetical order.

Note:
 
Categories, Values and Axes

Here an “axis” is a fixed reference line (sometimes invisible for stylistic reasons) which typically goes vertically up the page or horizontally from left to right across the page (but see also Radar Charts). Categories and values (see below) are plotted on axes. Most charts have two axes.

Throughout I use the word “category” to refer to something discrete that is plotted on an axis, for example France, Germany, Italy and The UK, or 2016, 2017, 2018 and 2019. I use the word “value” to refer to something more continuous plotted on an axis, such as sales or number of items etc. With a few exceptions, the Charts described below plot values against categories. Both Bubble Charts and Scatter Charts plot values against other values.

Series

I use “series” to mean sets of categories and values. So if the categories are France, Germany, Italy and The UK; and the values are sales; then different series may pertain to sales of different products by country.


Index

Bar & Column Charts Bubble Charts Cartograms
Histograms Line Charts Map Charts
Pie Charts Radar Charts / Spider Charts Scatter Charts
Tree Maps    


Bar & Column Charts
Clustered Bar Charts, Stacked Bar Charts

Bar Chart

Bar Charts is the generic term, but this is sometimes reserved for charts where the categories appear on the vertical axis, with Column Charts being those where categories appear on the horizontal axis. In either case, the chart has a series of categories along one axis. Extending righwards (or upwards) from each category is a rectangle whose width (height) is proportional to the value associated with this category. For example if the categories related to products, then the size of rectangle appearing against Product A might be proportional to the number sold, or the value of such sales.

Click here to view a larger version in a new tab

|  © JMB (2014)  |  Used under a Creative Commons licence  |

The exhibit above, which is excerpted from Data Visualisation – A Scientific Treatment, is a compound one in which two bar charts feature prominently.

Clustered Column Chart with Products

Sometimes the bars are clustered to allow multiple series to be charted side-by-side, for example yearly sales for 2015 to 2018 might appear against each product category. Or – as above – sales for Product A and Product B may both be shown by country.

Stacked Bar Chart

Another approach is to stack bars or columns on top of each other, something that is sometimes useful when comparing how the make-up of something has changed.

See also: a section of As Nice as Pie



Bubble Charts

Bubble Chart

Bubble Charts are used to display three dimensions of data on a two dimensional chart. A circle is placed with its centre at a value on the horizontal and vertical axes according to the first two dimensions of data, but then then the area (or less commonly the diameter [2]) of the circle reflects the third dimension. The result is reminiscent of a glass of champagne (then maybe this says more about the author than anything else).

Bubble Planets

You can also use bubble charts in a quite visceral way, as exemplified by the chart above. The vertical axis plots the number of satellites of the four giant planets in the Solar System. The horizontal axis plots the closest that they ever come to the Sun. The size of the planets themselves is proportional to their relative sizes.

See also: Data Visualisation according to a Four-year-old



Cartograms

Cartogram

There does not seem to be a generally accepted definition of Cartograms. Some authorities describe them as any diagram using a map to display statistical data; I cover this type of general chart in Map Charts below. Instead I will define a Cartogram more narrowly as a geographic map where areas of map sections are changed to be proportional to some other value; resulting in a distorted map. So, in a map of Europe, the size of countries might be increased or decreased so that their new areas are proportional to each country’s GDP.

Cartogram

Alternatively the above cartogram of the United States has been distorted (and coloured) to emphasise the population of each state. The dark blue of California and the slightly less dark blues of Texas, Florida and New York dominate the map.



Histograms

Histogram

A type of Bar Chart (typically with categories along the horizontal axis) where the categories are bins (or buckets) and the bars are proportional to the number of items falling into a bin. For example, the bins might be ranges of ages, say 0 to 19, 20 to 39, 30 to 49 and 50+ and the bars appearing against each might be the UK female population falling into each bin.

Brexit Bar
UK Referendum on EU Membership – Voting by age bracket

The diagram above is a bipartite quasi-histogram [3] that I created to illustrate another article. It is not a true histogram as it shows percentages for and against in each bin rather than overall frequencies.

Brexit Bar 2
UK Referendum on EU Membership – Numbers voting by age bracket

In the same article, I addressed this shortcoming with a second view of the same data, which is more histogram-like (apart from having a total category) and appears above. The point that I was making related to how Data Visualisation can both inform and mislead depending on the presentational choices taken.



Line Charts
Fan Charts, Area Charts

Line Chart

These typically have categories across the horizontal axis and could be considered as a set of line segments joining up the tops of what would be the rectangles on a Bar Chart. Clearly multiple lines, associated with multiple series, can be plotted simultaneously without the need to cluster rectangles as is required with Bar Charts. Lines can also be used to join up the points on Scatter Charts assuming that these are sufficiently well ordered to support this.

Line (Fan) Chart

Adaptations of Line Charts can also be used to show the probability of uncertain future events as per the exhibit above. The single red line shows the actual value of some metric up to the middle section of the chart. Thereafter it is the central prediction of a range of possible values. Lying above and below it are shaded areas which show bands of probability. For example it may be that the probability of the actual value falling within the area that has the darkest shading is 50%. A further example is contained in Limitations of Business Intelligence. Such charts are sometimes called Fan Charts.

Area Chart

Another type of Line Chart is the Area Chart. If we can think of a regular Line Chart as linking the tops of an invisible Bar Chart, then an Area Chart links the tops of an invisible Stacked Bar Chart. The effect is that how a band expands and contracts as we move across the chart shows how the contribution this category makes to the whole changes over time (or whatever other category we choose for the horizontal axis).

See also: The first exhibit in New Thinking, Old Thinking and a Fairytale



Map Charts

Cartogram

These place data on top of geographic maps. If we consider the canonical example of a map of the US divided into states, then the degree of shading of each state could be proportional to some state-related data (e.g. average income quartile of residents). Or more simply, figures could appear against each state. Bubbles could be placed at the location of major cities (or maybe a bubble per country or state etc.) with their size relating to some aspect of the locale (e.g.population). An example of this approach might be a map of US states with their relative populations denoted by Bubble area.

Rainfall intensity

Also data could be overlaid on a map, for example – as shown above – coloured bands corresponding to different intensities of rainfall in different areas. This exhibit is excerpted from Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity.



Pie Charts

Pie Chart

These circular charts normally display a single series of categories with values, showing the proportion each category contributes to the total. For example a series might be the nations that make up the United Kingdom and their populations: England 55.62 million people, Scotland 5.43 million, Wales 3.13 million and Northern Ireland 1.87 million.

UK Population by Nation

The whole circle represents the total of all the category values (e.g. the UK population of 66.05 million people [4]). The ratio of a segment’s angle to 360° (i.e. the whole circle) is equal to the percentage of the total represented by the linked category’s value (e.g. Scotland is 8.2% of the UK population and so will have a segment with an angle of just under 30°).

Brexit Flag
UK Referendum on EU Membership – Number voting by age bracket (see notes)

Sometimes – as illustrated above – the segments are “exploded”away from each other. This is taken from the same article as the other voting analysis exhibits.

See also: As Nice as Pie, which examines the pros and cons of this type of chart in some depth.



Radar Charts / Spider Charts

Radar Chart

Radar Charts are used to plot one or more series of categories with values that fall into the same range. If there are six categories, then each has its own axis called a radius and the six of these radiate at equal angles from a central point. The calibration of each radial axis is the same. For example Radar Charts are often used to show ratings (say from 5 = Excellent to 1 = Poor) so each radius will have five points on it, typically with low ratings at the centre and high ones at the periphery. Lines join the values plotted on each adjacent radius, forming a jagged loop. Where more than one series is plotted, the relative scores can be easily compared. A sense of aggregate ratings can also be garnered by seeing how much of the plot of one series lies inside or outside of another.

Part of a Data Capability Assessment

I use Radar Charts myself extensively when assessing organisations’ data capabilities. The above exhibit shows how an organisation ranks in five areas relating to Data Architecture compared to the best in their industry sector [5].



Scatter Charts

Scatter Chart

In most of the cases we have dealt with to date, one axis has contained discrete categories and the other continuous values (though our rating example for the Radar Chart) had discrete categories and values). For a Scatter Chart both axes plot values, either continuous or discrete. A series would consist of a set of pairs of values, one to plotted on the horizontal axis and one to be plotted on the vertical axis. For example a series might be a number of pairs of midday temperature (to be plotted on the horizontal axis) and sales of ice cream (to be plotted on the vertical axis). As may be deduced from the example, often the intention is to establish a link between the pairs of values – do ice cream sales increase with temperature? This aspect can be highlighted by drawing a line of best fit on the chart; one that minimises the total distance between each plotted point and the line. Further series, say sales of coffee versus midday temperature can be added.

Here is a further example, which illustrates potential correlation between two sets of data, one on the x-axis and the other on the y-axis:

Climate Change Scatter Chart

As always a note of caution must be introduced when looking to establish correlations using scatter graphs. The inimitable Randall Munroe of xkcd.com [7] explains this pithility as follows:

By the third trimester there will be hundreds of babies inside you.

|  © Randall Munroe, xkcd.com (2009)  |  Excerpted from: Extrapolating  |

See also: Linear Regression



Tree Maps

Tree Map

Tree Maps require a little bit of explanation. The best way to understand them is to start with something more familiar, a hierarchy diagram with three levels (i.e. something like an organisation chart). Consider a cafe that sells beverages, so we have a top level box labeled Beverages. The Beverages box splits into Hot Beverages and Cold Beverages at level 2. At level 3, Hot Beverages splits into Tea, Coffee, Herbal Tea and Hot Chocolate; Cold Beverages splits into Still Water, Sparkling Water, Juices and Soda. So there is one box at level 1, two at level 2 and eight at level 3. As ever a picture paints a thousand words:

Hierarchy Diagram

Next let’s also label each of the boxes with the value of sales in the last week. If you add up the sales for Tea, Coffee, Herbal Tea and Hot Chocolate we obviously get the sales for Hot Beverages.

Labelled Hierarchy Diagram

A Tree Map takes this idea and expands on it. A Tree Map using the data from our example above might look like this:

Treemap

First, instead of being linked by lines, boxes at level 3 (leaves let’s say) appear within their parent box at level 2 (branches maybe) and the level 2 boxes appear within the overall level 1 box (the whole tree); so everything is nested. Sometimes, as is the case above, rather than having the level 2 boxes drawn explicitly, the level 3 boxes might be colour coded. So above Tea, Coffee, Herbal Tea and Hot Chocolate are mid-grey and the rest are dark grey.

Next, the size of each box (at whatever level) is proportional to the value associated with it. In our example, 66.7% of sales (\frac{1000}{1500}) are of Hot Beverages. Then two-thirds of the Beverages box will be filled with the Hot Beverages box and one-third (\frac{500}{1500}) with the Cold Beverage box. If 20% of Cold Beverages sales (\frac{100}{500}) are Still Water, then the Still Water box will fill one fifth of the Cold Beverages box (or one fifteenth – \frac{100}{1500} – of the top level Beverages box).

It is probably obvious from the above, but it is non-trivial to find a layout that has all the boxes at the right size, particularly if you want to do something else, like have the size of boxes increase from left to right. This is a task generally best left to some software to figure out.


In Closing

The above review of various chart types is not intended to be exhaustive. For example, it doesn’t include Waterfall Charts [8], Stock Market Charts (or Open / High / Low / Close Charts [9]), or 3D Surface Charts [10] (which seldom are of much utility outside of Science and Engineering in my experience). There are also a number of other more recherché charts that may be useful in certain niche areas. However, I hope we have covered some of the more common types of charts and provided some helpful background on both their construction and usage.
 


Notes

 
[1]
 
Certainly by my normal standards!
 
[2]
 
Research suggests that humans are more attuned to comparing areas of circles than say their diameters.
 
[3]
 
© peterjamesthomas.com Ltd. (2019).
 
[4]
 
Excluding overseas territories.
 
[5]
 
This has been suitably redacted of course. Typically there are four other such exhibits in my assessment pack: Data Strategy, Data Organisation, MI & Analytics and Data Controls, together with a summary radar chart across all five lower level ones.
 
[6]
 
The atmospheric CO2 records were sourced from the US National Oceanographic and Atmospheric Administration’s Earth System Research Laboratory and relate to concentrations measured at their Mauna Loa station in Hawaii. The Global Average Surface Temperature records were sourced from the Earth Policy Institute, based on data from NASA’s Goddard Institute for Space Studies and relate to measurements from the latter’s Global Historical Climatology Network. This exhibit is meant to be a basic illustration of how a scatter chart can be used to compare two sets of data. Obviously actual climatological research requires a somewhat more rigorous approach than the simplistic one I have employed here.
 
[7]
 
Randall’s drawings are used (with permission) liberally throughout this site,Including:

 
[8]
 
Waterfall Chart – Wikipedia.
 
[9]
 
Open-High-Low-Close Chart – Wikipedia.
 
[10]
 
Surface Chart – AnyCharts.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

The peterjamesthomas.com Data Strategy Hub

The peterjamesthomas.com Data Strategy Hub
Today we launch a new on-line resource, The Data Strategy Hub. This presents some of the most popular Data Strategy articles on this site and will expand in coming weeks to also include links to articles and other resources pertaining to Data Strategy from around the Internet.

If you have an article you have written, or one that you read and found helpful, please post a link in a comment here or in the actual Data Strategy Hub and I will consider adding it to the list.
 


peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

Data Visualisation according to a Four-year-old

Solar System

When I recently published the latest edition of The Data & Analytics Dictionary, I included an entry on Charts which briefly covered a number of the most frequently used ones. Given that entries in the Dictionary are relatively brief [1] and that its layout allows little room for illustrations, I decided to write an expanded version as an article. This will be published in the next couple of weeks (UPDATE: now published as A Picture Paints a Thousand Numbers).

One of the exhibits that I developed for this charts article was to illustrate the use of Bubble Charts. Given my childhood interest in Astronomy, I came up with the following – somewhat whimsical – exhibit:

Bubble Planets

Bubble Charts are used to plot three dimensions of data on a two dimensional graph. Here the horizontal axis is how far each of the gas and ice giants is from the Sun [2], the vertical axis is how many satellites each planet has [3] and the final dimension – indicated by the size of the “bubbles” – is the actual size of each planet [4].

Anyway, I thought it was a prettier illustration of the utility of Bubble Charts that the typical market size analysis they are often used to display.

However, while I was doing this, my older daughter wandered into my office and said “look at the picture I drew for you Daddy” [5]. Coincidentally my muse had been her muse and the result is the Data Visualisation appearing at the top of this article. Equally coincidentally, my daughter had also encoded three dimensions of data in her drawing:

  1. Rank of distance from the Sun
  2. Colour / appearance
  3. Number of satellites [6]

She also started off trying to capture relative size. After a great start with Mercury, Venus and Earth, she then ran into some Data Quality issues with the later planets (she is only four).

Here is an annotated version:

Solar System (annotated)

I think I’m at least OK at Data Visualisation, but my daughter’s drawing rather knocked mine into a cocked hat [7]. And she included a comet, which makes any Data Visualisation better in my humble opinion; what Chart would not benefit from the inclusion of a comet?
 


Notes

 
[1]
 
For me at least that is.
 
[2]
 
Actually the measurement is the closest that each planet comes to the Sun, its perihelion.
 
[3]
 
This may seem a somewhat arbitrary thing to plot, but a) the exhibit is meant to be illustrative only and b) there does nevertheless seem to be a correlation of sorts; I’m sure there is some Physical reason for this, which I’ll have to look into sometime.
 
[4]
 
Bubble Charts typically offer the option to scale bubbles such that either their radius / diameter or their area is in proportion to the value to be displayed. I chose the equatorial radius as my metric.
 
[5]
 
It has to be said that this is not an atypical occurence.
 
[6]
 
For at least the four rocky planets, it might have taken a while to draw all 79 of Jupiter’s moons.
 
[7]
 
I often check my prose for phrases that may be part of British idiom but not used elsewhere. In doing this, I learnt today that “knock into a cocked hat” was originally an American phrase; it is first found in the 1830s.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

The latest edition of The Data & Analytics Dictionary is now out

The Data and Analytics Dictionary

After a hiatus of a few months, the latest version of the peterjamesthomas.com Data and Analytics Dictionary is now available. It includes 30 new definitions, some of which have been contributed by people like Tenny Thomas Soman, George Firican, Scott Taylor and and Taru Väre. Thanks to all of these for their help.

  1. Analysis
  2. Application Programming Interface (API)
  3. Business Glossary (contributor: Tenny Thomas Soman)
  4. Chart (Graph)
  5. Data Architecture – Definition (2)
  6. Data Catalogue
  7. Data Community
  8. Data Domain (contributor: Taru Väre)
  9. Data Enrichment
  10. Data Federation
  11. Data Function
  12. Data Model
  13. Data Operating Model
  14. Data Scrubbing
  15. Data Service
  16. Data Sourcing
  17. Decision Model
  18. Embedded BI / Analytics
  19. Genetic Algorithm
  20. Geospatial Data
  21. Infographic
  22. Insight
  23. Management Information (MI)
  24. Master Data – additional definition (contributor: Scott Taylor)
  25. Optimisation
  26. Reference Data (contributor: George Firican)
  27. Report
  28. Robotic Process Automation
  29. Statistics
  30. Self-service (BI or Analytics)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form, we look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 

In praise of Jam Doughnuts or: How I learned to stop worrying and love Hybrid Data Organisations

The above infographic is the work of Management Consultants Oxbow Partners [1] and employs a novel taxonomy to categorise data teams. First up, I would of course agree with Oxbow Partners’ statement that:

Organisation of data teams is a critical component of a successful Data Strategy

Indeed I cover elements of this in two articles [2]. So the structure of data organisations is a subject which, in my opinion, merits some consideration.

Oxbow Partners draw distinctions between organisations where the Data Team is separate from the broader business, ones where data capabilities are entirely federated with no discernible “centre” and hybrids between the two. The imaginative names for these are respectively The Burger, The Smoothie and The Jam Doughnut. In this article, I review Oxbow Partners’s model and offer some of my own observations.



The Burger – Centralised

The Burger

Having historically recommended something along the lines of The Burger, not least when an organisation’s data capabilities are initially somewhere between non-existent and very immature, my views have changed over time, much as the characteristics of the data arena have also altered. I think that The Burger still has a role, in particular, in a first phase where data capabilities need to be constructed from scratch, but it has some weaknesses. These include:

  1. The pace of change in organisations has increased in recent years. Also, many organisations have separate divisions or product lines and / or separate geographic territories. Change can be happening in sometimes radically different ways in each of these as market conditions may vary considerably between Division A’s operations in Switzerland and Division B’s operations in Miami. It is hard for a wholly centralised team to react with speed in such a scenario. Even if they are aware of the shifting needs, capacity may not be available to work on multiple areas in parallel.
     
  2. Again in the above scenario, it is also hard for a central team to develop deep expertise in a range of diverse businesses spread across different locations (even if within just one country). A central team member who has to understand the needs of 12 different business units will necessarily be at a disadvantage when considering any single unit compared to a colleague who focuses on that unit and nothing else.
     
  3. A further challenge presented here is maintaining the relationships with colleagues in different business units that are typically a prerequisite for – for example – driving adoption of new data capabilities.


The Smoothie – Federated

The Smoothie

So – to address these shortcomings – maybe The Smoothie is a better organisational design. Well maybe, but also maybe not. Problems with these arrangements include:

  1. Probably biggest of all, it is an extremely high-cost approach. The smearing out of work on data capabilities inevitably leads to duplication of effort with – for example – the same data sourced or combined by different people in parallel. The pace of change in organisations may have increased, but I know few that are happy to bake large costs into their structures as a way to cope with this.
     
  2. The same duplication referred to above creates another problem, the way that data is processed can vary (maybe substantially) between different people and different teams. This leads to the nightmare scenario where people spend all their time arguing about whose figures are right, rather than focussing on what the figures say is happening in the business [3]. Such arrangements can generate business risk as well. In particular, in highly regulated industries heterogeneous treatment of the same data tends to be frowned upon in external reviews.
     
  3. The wholly federated approach also limits both opportunities for economies of scale and identification of areas where data capabilities can meet the needs of more than one business unit.
     
  4. Finally, data resources who are fully embedded in different parts of a business may become isolated and may not benefit from the exchange of ideas that happens when other similar people are part of the immediate team.

So to summarise we have:

Burger vs Smoothie



The Jam Doughnut – Hybrid

The Jam Doughnut

Which leaves us with The Jam Doughnut, in my opinion, this is a Goldilocks approach that captures as much as possible of the advantages of the other two set-ups, while mitigating their drawbacks. It is such an approach that tends to be my recommendation for most organisations nowadays. Let me spend a little more time describing its attributes.

I see the best way of implementing a Jam Doughnut approach is via a hub-and-spoke model. The hub is a central Data Team, the spokes are data-centric staff in different parts of the business (Divisions, Functions, Geographic Territories etc.).

Data Hub and Spoke

It is important to stress that each spoke satellite is not a smaller copy of the central Data Team. Some roles will be more federated, some more centralised according to what makes sense. Let’s consider a few different roles to illustrate this:

  • Data Scientist – I would see a strong central group of these, developing methodologies and tools, but also that many business units would have their own dedicated people; “spoke”-based people could also develop new tools and new approaches, which could be brought into the “hub” for wider dissemination
     
  • Analytics Expert – Similar to the Data Scientists, centralised “hub” staff might work more on standards (e.g. for Data Visualisation), developing frameworks to be leveraged by others (e.g. a generic harness for dashboards that can be leveraged by “spoke” staff), or selecting tools and technologies; “spoke”-based staff would be more into the details of meeting specific business needs
     
  • Data Engineer – Some “spoke” people may be hybrid Data Scientists / Data Engineers and some larger “spoke” teams may have dedicated Data Engineers, but the needle moves more towards centralisation with this role
     
  • Data Architect – Probably wholly centralised, but some “spoke” staff may have an architecture string to their bow, which would of course be helpful
     
  • Data Governance Analyst – Also probably wholly centralised, this is not to downplay the need for people in the “spokes” to take accountability for Data Governance and Data Quality improvement, but these are likely to be part-time roles in the “spokes”, whereas the “hub” will need full-time Data Governance people

It is also important to stress that the various spokes should also be in contact with each other, swapping successful approaches, sharing ideas and so on. Indeed, you could almost see the spokes beginning to merge together somewhat to form a continuum around the Data Team. Maybe the merged spokes could form the “dough”, with the Data Team being the “jam” something like this:

Data Hub and Spoke

I label these types of arrangements a Data Community and this is something that I have looked to establish and foster in a few recent assignments. Broadly a Data Community is something that all data-centric staff would feel part of; they are obviously part of their own segment of the organisation, but the Data Community is also part of their corporate identity. The Data Community facilities best practice approaches, sharing of ideas, helping with specific problems and general discourse between its members. I will be revisiting the concept of a Data Community in coming weeks. For now I would say that one thing that can help it to function as envisaged is sharing common tooling. Again this is a subject that I will return to shortly.

I’ll close by thanking Oxbow Partners for some good mental stimulation – I will look forward to their next data-centric publication.
 


 

Disclosure:

It is peterjamesthomas.com’s policy to disclose any connections with organisations or individuals mentioned in articles.

Oxbow Partners are an advisory firm for the insurance industry covering Strategy, Digital and M&A. Oxbow Partners and peterjamesthomas.com Ltd. have a commercial association and peterjamesthomas.com Ltd. was also engaged by one of Oxbow Partners’ principals, Christopher Hess, when he was at a former organisation.

 
Notes

 
[1]
 
Though the author might have had a minor role in developing some elements of it as well.
 
[2]
 
The Anatomy of a Data Function and A Simple Data Capability Framework.
 
[3]
 
See also The impact of bad information on organisations.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 
 

A Simple Data Capability Framework

Introduction

As part of my consulting business, I end up thinking about Data Capability Frameworks quite a bit. Sometimes this is when I am assessing current Data Capabilities, sometimes it is when I am thinking about how to transition to future Data Capabilities. Regular readers will also recall my tripartite series on The Anatomy of a Data Function, which really focussed more on capabilities than purely organisation structure [1].

Detailed frameworks like the one contained in Anatomy are not appropriate for all audiences. Often I need to provide a more easily-absorbed view of what a Data Function is and what it does. The exhibit above is one that I have developed and refined over the last three or so years and which seems to have resonated with a number of clients. It has – I believe – the merit of simplicity. I have tried to distil things down to the essentials. Here I will aim to walk the reader through its contents, much of which I hope is actually self-explanatory.

The overall arrangement has been chosen intentionally, the top three areas are visible activities, the bottom three are more foundational areas [2], ones that are necessary for the top three boxes to be discharged well. I will start at the top left and work across and then down.
 
 
Collation of Data to provide Information

Dashboard

This area includes what is often described as “traditional” reporting [3], Dashboards and analysis facilities. The Information created here is invaluable for both determining what has happened and discerning trends / turning points. It is typically what is used to run an organisation on a day-to-day basis. Absence of such Information has been the cause of underperformance (or indeed major losses) in many an organisation, including a few that I have been brought in to help. The flip side is that making the necessary investments to provide even basic information has been at the heart of the successful business turnarounds that I have been involved in.

The bulk of Business Intelligence efforts would also fall into this area, but there is some overlap with the area I next describe as well.
 
 
Leverage of Data to generate Insight

Voronoi diagram

In this second area we have disciplines such as Analytics and Data Science. The objective here is to use a variety of techniques to tease out findings from available data (both internal and external) that go beyond the explicit purpose for which it was captured. Thus data to do with bank transactions might be combined with publically available demographic and location data to build an attribute model for both existing and potential clients, which can in turn be used to make targeted offers or product suggestions to them on Digital platforms.

It is my experience that work in this area can have a massive and rapid commercial impact. There are few activities in an organisation where a week’s work can equate to a percentage point increase in profitability, but I have seen insight-focussed teams deliver just that type of ground-shifting result.
 
 
Control of Data to ensure it is Fit-for-Purpose

Data controls

This refers to a wide range of activities from Data Governance to Data Management to Data Quality improvement and indeed related concepts such as Master Data Management. Here as well as the obvious policies, processes and procedures, together with help from tools and technology, we see the need for the human angle to be embraced via strong communications, education programmes and aligning personal incentives with desired data quality outcomes.

The primary purpose of this important work is to ensure that the information an organisation collates and the insight it generates are reliable. A helpful by-product of doing the right things in these areas is that the vast majority of what is required for regulatory compliance is achieved simply by doing things that add business value anyway.
 
 
Data Architecture / Infrastructure

Data architecture

Best practice has evolved in this area. When I first started focussing on the data arena, Data Warehouses were state of the art. More recently Big Data architectures, including things like Data Lakes, have appeared and – at least in some cases – begun to add significant value. However, I am on public record multiple times stating that technology choices are generally the least important in the journey towards becoming a data-centric organisation. This is not to say such choices are unimportant, but rather that other choices are more important, for example how best to engage your potential users and begin to build momentum [4].

Having said this, the model that seems to have emerged of late is somewhat different to the single version of the truth aspired to for many years by organisations. Instead best practice now encompasses two repositories: the first Operational, the second Analytical. At a high-level, arrangements would be something like this:

Data architecture

The Operational Repository would contain a subset of corporate data. It would be highly controlled, highly reconciled and used to support both regular reporting and a large chunk of dashboard content. It would be designed to also feed data to other areas, notably Finance systems. This would be complemented by the Analytical Repository, into which most corporate data (augmented by external data) would be poured. This would be accessed by a smaller number of highly skilled staff, Data Scientists and Analytics experts, who would use it to build models, produce one off analyses and to support areas such as Data Visualisation and Machine Learning.

It is not atypical for Operational Repositories to be SQL-based and Analytical Repsoitories to be Big Data-based, but you could use SQL for both or indeed Big Data for both according to the circumstances of an organisation and its technical expertise.
 
 
Data Operating Model / Organisation Design

Organisational design

Here I will direct readers to my (soon to be updated) earlier work on The Anatomy of a Data Function. However, it is worth mentioning a couple of additional points. First an Operating Model for data must encompass the whole organisation, not just the Data Function. Such a model should cover how data is captured, sourced and used across all departments.

Second I think that the concept of a Data Community is important here, a web of like-minded Data Scientists and Analytics people, sitting in various business areas and support functions, but linked to the central hub of the Data Function by common tooling, shared data sets (ideally Curated) and aligned methodologies. Such a virtual data team is of course predicated on an organisation hiring collaborative people who want to be part of and contribute to the Data Community, but those are the types of people that organisations should be hiring anyway [5].
 
 
Data Strategy

Data strategy

Our final area is that of Data Strategy, something I have written about extensively in these pages [6] and a major part of the work that I do for organisations.

It is an oft-repeated truism that a Data Strategy must reflect an overarching Business Strategy. While this is clearly the case, often things are less straightforward. For example, the Business Strategy may be in flux; this is particularly the case where a turn-around effort is required. Also, how the organisation uses data for competitive advantage may itself become a central pillar of its overall Business Strategy. Either way, rather than waiting for a Business Strategy to be finalised, there are a number of things that will need to be part of any Data Strategy: the establishment of a Data Function; a focus on making data fit-for-purpose to better support both information and insight; creation of consistent and business-focussed reporting and analysis; and the introduction or augmentation of Data Science capabilities. Many of these activities can help to shape a Business Strategy based on facts, not gut feel.

More broadly, any Data Strategy will include: a description of where the organisation is now (threats and opportunities); a vision for commercially advantageous future data capabilities; and a path for moving between the current and the future states. Rather than being PowerPoint-ware, such a strategy needs to be communicated assiduously and in a variety of ways so that it can be both widely understood and form a guide for data-centric activities across the organisation.
 
 
Summary
 
As per my other articles, the data capabilities that a modern organisation needs are broader and more detailed than those I have presented here. However, I have found this simple approach a useful place to start. It covers all the basic areas and provides a scaffold off of which more detailed capabilities may be hung.

The framework has been informed by what I have seen and done in a wide range of organisations, but of course it is not necessarily the final word. As always I would be interested in any general feedback and in any suggestions for improvement.
 


 
Notes

 
[1]
 
In passing, Anatomy is due for its second refresh, which will put greater emphasis on Data Science and its role as an indispensable part of a modern Data Function. Watch this space.
 
[2]
 
Though one would hope that a Data Strategy is also visible!
 
[3]
 
Though nowadays you hear “traditional” Analytics and “traditional” Big Data as well (on the latter see Sic Transit Gloria Magnorum Datorum), no doubt “traditional” Machine Learning will be with us at some point, if it isn’t here already.
 
[4]
 
See also Building Momentum – How to begin becoming a Data-driven Organisation.
 
[5]
 
I will be revisiting the idea of a Data Community in coming months, so again watch this space.
 
[6]
 
Most explicitly in my three-part series:

  1. Forming an Information Strategy: Part I – General Strategy
  2. Forming an Information Strategy: Part II – Situational Analysis
  3. Forming an Information Strategy: Part III – Completing the Strategy

 
peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

 
 

A Retrospective of 2018’s Articles

A Review of 2018

This is the second year in which I have produced a retrospective of my blogging activity. As in 2017, I have failed miserably in my original objective of posting this early in January. Despite starting to write this piece on 18th December 2018, I have somehow sneaked into the second quarter before getting round to completing it. Maybe I will do better with 2019’s highlights!

Anyway, 2018 was a record-breaking year for peterjamesthomas.com. The site saw more traffic than in any other year since its inception; indeed hits were over a third higher than in any previous year. This increase was driven in part by the launch of my new Maths & Science section, articles from which claimed no fewer than 6 slots in the 2018 top 10 articles, when measured by hits [1]. Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009; our first year in business.

As with every year, some of my work was viewed by tens of thousands of people, while other pieces received less attention. This is my selection of the articles that I enjoyed writing most, which does not always overlap with the most popular ones. Given the advent of the Maths & Science section, there are now seven categories into which I have split articles. These are as follows:

  1. General Data Articles
  2. Data Visualisation
  3. Statistics & Data Science
  4. CDO perspectives
  5. Programme Advice
  6. Analytics & Big Data
  7. Maths & Science

In each category, I will pick out one or two pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.

 
 
General Data Articles
 
A Brief History of Databases
 
February
A Brief History of Databases
An infographic spanning the history of Database technology from its early days in the 1960s to the landscape in the late 2010s..
 
Data Strategy Alarm Bell
 
July
How to Spot a Flawed Data Strategy
What alarm bells might alert you to problems with your Data Strategy; based on the author’s extensive experience of both developing Data Strategies and vetting existing ones.
 
Just the facts...
 
August
Fact-based Decision-making
Fact-based decision-making sounds like a no brainer, but just how hard is it to generate accurate facts?
 
 
Data Visualisation
 
Comparative Pie Charts
 
August
As Nice as Pie
A review of the humble Pie Chart, what it is good at, where it presents problems and some alternatives.
 
 
Statistics & Data Science
 
Data Science Challenges – It’s Deja Vu all over again!
 
August
Data Science Challenges – It’s Deja Vu all over again!
A survey of more than 10,000 Data Scientists highlights a set of problems that will seem very, very familiar to anyone working in the data space for a few years.
 
 
CDO Perspectives
 
The CDO Dilemma
 
February
The CDO – A Dilemma or The Next Big Thing?
Two Forbes articles argue different perspectives about the role of Chief Data Officer. The first (by Lauren deLisa Coleman) stresses its importance, the second (by Randy Bean) highlights some of the challenges that CDOs face.
 
2018 CDO Interviews
 
May onwards
The “In-depth” series of CDO interviews
Rather than a single article, this is a series of four talks with prominent CDOs, reflecting on the role and its challenges.
 
The Chief Marketing Officer and the CDO – A Modern Fable
 
October
The Chief Marketing Officer and the CDO – A Modern Fable
Discussing an alt-facts / “fake” news perspective on the Chief Data Officer role.
 
 
Programme Advice
 
Building Momentum
 
June
Building Momentum – How to begin becoming a Data-driven Organisation
Many companies want to become data driven, but getting started on the journey towards this goal can be tough. This article offers a framework for building momentum in the early stages of a Data Programme.
 
 
Analytics & Big Data
 
Enterprise Data Marketplace
 
January
Draining the Swamp
A review of some of the problems that can beset Data Lakes, together with some ideas about what to do to fix these from Dan Woods (Forbes), Paul Barth (Podium Data) and Dave Wells (Eckerson Group).
 
Sic Transit Gloria Mundi
 
February
Sic Transit Gloria Magnorum Datorum
In a world where the word has developed a very negative connotation, what’s so bad about being traditional?
 
Convergent Evolution of Data Architectures
 
August
Convergent Evolution
What the similarities (and differences) between Ichthyosaurs and Dolphins can tell us about different types of Data Architectures.
 
 
Maths & Science
 
Euler's Number
 
March
Euler’s Number
A long and winding road with the destination being what is probably the most important number in Mathematics.
 The Irrational Ratio  
August
The Irrational Ratio
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
 
Emmy Noether
 
October
Glimpses of Symmetry, Chapter 24 – Emmy
One of the more recent chapters in my forthcoming book on Group Theory and Particle Physics. This focuses on the seminal contributions of Mathematician Emmy Noether to the fundamentals of Physics and the connection between Symmetry and Conservation Laws.

 
Notes

 
[1]
 

The 2018 Top Ten by Hits
1. The Irrational Ratio
2. A Brief History of Databases
3. Euler’s Number
4. The Data and Analytics Dictionary
5. The Equation
6. A Brief Taxonomy of Numbers
7. When I’m 65
8. How to Spot a Flawed Data Strategy
9. Building Momentum – How to begin becoming a Data-driven Organisation
10. The Anatomy of a Data Function – Part I

 
peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.