# An expanded and more mobile-friendly version of the Data & Analytics Dictionary

A revised and expanded version of the peterjamesthomas.com Data and Analytics Dictionary has been published.

The previous Dictionary was not the easiest to read on mobile devices. Because of this, the layout has been amended in this release and the mobile experience should now be greatly enhanced. Any feedback on usability would be welcome.

The new Dictionary includes 22 additional definitions, bringing the total number of entries to 220, totalling well over twenty thousand words. As usual, the new definitions range across the data arena: from Data Science and Machine Learning; to Information and Reporting; to Data Governance and Controls. They are as follows:

Please remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form. We look forward to hearing from you [1].

The Data & Analytics Dictionary will continue to be expanded in coming months.

Notes

 [1] Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

# The Song Jane [Doe, CEO] Likes

Note: This article was originally intended to be posted on 1st April 2020, but was delayed. I decided to share it now rather than waiting another year.

In my last post, we met Jane Doe, CEO. This article forms part of her further adventures [1]. The material may seem eerily familiar to anyone who – like me – has a pre-teen daughter.

The figures show red in the ledger of dread
Not a bright spot anywhere
A storm of profit warnings
And I’m tearing out my hair
Our shares are tumbling like the rain I hear outside
Can’t find out why, our accountants tried

We have no facts that we can trust
Much more of this and we’ll go bust
Oh what a bind, I have been blind
Well now I see…

CDO, CDO, they’ll know just what to do
CDO, CDO, they’ll help us to get through
I don’t care what I have to pay
Get us data now
I know that there’s got to be a better way

Can be really quite profound
And good quality of data
Can help us turn around

I heard machines can help us learn
And governance can have its turn
We need some stats to make things right
Insight!

CDO, CDO, can’t find me one anywhere
CDO, CDO, does no one really care?
Help us to create gold from clay
Get us data now…

Our CDO has helped us to work out a plan
With custom dashboards, every woman, every man
I slice our numbers now just like a piece of cake
We built a warehouse first, now for a data lake

CDO, CDO, we’re getting right back on track
CDO, CDO, our numbers are turning black
We all know that come what may
Got our data now
I knew that there had to be a better way…

 With apologies to the Dave Matthews Band (for the title); Robert Lopez, Kristen Anderson-Lopez and Christophe Beck (for the music); Chris Buck and Jennifer Lee (for the inspiration); Idina Menzel (I’m just so sorry Idina); and anyone else who knows me [2].

Notes

 [1] Eat your heart out Arthur Conan Doyle.

 [2] Plus The Disney Corporation for [hopefully] not suing.


# This Structure has Novel Features which are of Considerable Business Interest

A skilled practitioner, hard at work developing elements of a Structured Reporting Framework
© Jennifer Thomas Photography.

 For anyone who is unaware, the title of the article echoes a 1953 Nature paper [1], which was instead “of considerable biological interest” [2].

Introduction

I have been very much focussing on the start of a data journey in a series of recent articles about Data Strategy [3]. Here I shift attention to a later stage of the journey [4] and attempt to answer the question: “How best to deliver information to an organisation in a manner that will encourage people to use it?” There are several activities that need to come together here: requirements gathering that is centred on teasing out the critical business questions to be answered, data repository [5] design and the overall approach to education and communication [6]. However, I want to focus on a further pillar in the edifice of easily accessible and comprehensible Insight and Information, a Structured Reporting Framework.

In my experience, Structured Reporting Frameworks are much misunderstood. It is sometimes assumed that they are a shiny, expensive and inconsequential trinket. Some espouse the opinion that the term is synonymous with Dashboards. Others claim that immense effort is required to create one. I have even heard people suggesting that good training materials are an alternative to such a framework. In actual fact, for a greenfield site, a Structured Reporting Framework should mostly be a byproduct of taking a best practice approach to delivering data capabilities. Even for brownfield sites, layering at least a decent approximation to a Structured Reporting Framework over existing data assets should not be a prohibitively lengthy or costly exercise if approached in the right way.

But I am getting ahead of myself; what exactly is a Structured Reporting Framework? Let’s answer this question by telling a story, well actually two stories…

The New Job

Chapter One
In which we are introduced to Jane and she makes a surprising discovery.

Jane woke up. It was good to be alive. The sun was shining, the birds were singing and she had achieved one of her lifetime goals only three brief months earlier. Yes Jane was now the Chief Executive Officer of a major organisation: Jane Doe, CEO – how that ran off the tongue. Today was going to be a good day. Later she kissed her husband and one-year-old goodbye: “have a lovely day with Daddy, little boy!”, parcelled her six-year-old into the car and dropped her off at school, before heading into work. It was early January and, on the drive in, Jane thought about the poor accountants who had had a truncated Christmas break while they wrestled the annual accounts into submission. She must remember to write an email thanking them all for their hard work. As she swept into the staff car park and slotted into the closest bay to the entrance – that phrase again: “Jane Doe, CEO” in shiny black letters above her space – she felt a warm glow of pride and satisfaction.

Jane sank into the padded leather chair in her spacious corner office, flipped open her MacBook Air and saw a note from her CFO. As she clicked, thoughts of pleasant meetings with investors crossed her mind. Thoughts of basking in the sort of market-beating results that the company had always posted. And then she read the mail…

… unprecedented deterioration in sales …

… many customers switched to a competitor …

… prices collapsed precipitously …

… costs escalated in Q4, the reasons are unclear …

… unexpected increase in bad debts …

… massive loss …

… capital erosion …

… issues are likely to continue and maybe increase …

… if nothing changes, potential bankruptcy …

… sorry Jane, nobody saw this coming!

Shaken, Jane wondered whether at least one person had seen this coming, her predecessor as CEO who had been so keen to take early retirement. Was there some insight as to the state of the business that he had been privy to and hidden from his fellow executives? There had been no sign, but maybe his gut had told him that bad things were coming.

Pushing such unhelpful thoughts aside, Jane began to ask herself more practical questions. How was she going to face the investors, and the employees? What was she going to do? And, she decided most pertinent of all, what exactly just happened and why?

In an Alternative Reality

Chapter One′
In which we have already met Jane and there are precious few surprises.

 Jane did some stuff before arriving at work which I won’t bore the reader with unnecessarily again. Cut to Jane opening an email from her CFO…

… it’s not great, profit is down 10% …

… but our customer retention strategy is starting to work …

… we have been able to set a floor on prices …

… the early Q4 blip in expenses is now under control …

… I’m still worried about The Netherlands …

… but we are doing better than the competition …

… at least we saw this coming last year and acted!

Jane opened up her personal dashboard, which already showed the headline figures the CFO had been citing. She clicked a filter and the display changed to show the Netherlands operations. Still glancing at the charts and numbers, she dialled Amsterdam.

“Hi Luuk, I hope you had a good break.”

“Good Luuk, good thank you. How about you catch me up on how things are going?”

“Of course Jane, let me pull up the numbers… Now we both know that the turnaround has been poorer here than elsewhere. Let me show you what we think is the issue and explain what we are doing. If you can split the profit and loss figures by product first and order by ascending profit.”

“OK Luuk, I’ve done that.”

“Great. Now it’s obvious that a chunk of the losses, indeed virtually all of them, are to do with our Widget Q range. I’m sure you knew that anyway, but now let’s focus on Widget Q and break it down by territory. It’s pretty clear that the Rotterdam area is where we have a problem.”

“I see that Luuk, I did some work on these numbers myself over the weekend. What else can you tell me?”

“Well, hopefully I can provide some local colour Jane. Let’s look at the actual sales and then filter these by channel. Do you see what I see?”

“I do Luuk, what is driving this problem in sales via franchises?”

“Well, in my review of November, I mentioned a start-up competitor in the Widget Q sector. If you recall, they had launched an app for franchises which helps them to run their businesses and also makes it easy to order Widget Q equivalents from their catalogue. Well, I must admit that I didn’t envisage it having this level of impact. But at least we can see what is happening.

“The app is damaging us, but it’s still early days and I believe we have a narrow window within which we can respond. When I discussed these same figures with my sales team earlier, they came up with what I think is a sound strategy to counterpunch.

“Let me take you through what they suggested and link it back to these figures…”

The call with Luuk had assured Jane that the Netherlands would soon be back on track. She reflected that it was going to be tough to present the annual report to investors, but at least the early warning systems had worked. She had begun to see the problems start to build up in her previous role as EVP of UK and Ireland, not only in her figures, but in those of her counterparts around the world. Jane and her predecessor had jointly developed an evidence-based plan to address the emerging threats. The old CEO had retired, secure in the knowledge that Jane had the tools to manage what otherwise might have become a crisis. He also knew that, with Jane’s help, he had acted early and acted decisively.

Jane thought about how clear discussions about unambiguous figures had helped to implement the defensive strategy, calibrate it for local markets and allowed her and her team to track progress. She could only imagine what things would have been like if everybody was not using the same figures to flag potential problems, diagnose them, come up with solutions and test that the response was working. She shuddered to think how differently things might have gone without these tools…

The lie through which we tell the truth [7]

I know, I know! Don’t worry, I’m not going to give up my day job and instead focus on writing the next great British novel [8]. Equally I have no plans to author a scientific paper on Schrödinger’s Profitability, no matter how tempting. It may burst the bubble of those who have been marvelling at the depth of my creative skills, but in fact neither of the above stories is really entirely fictional. Instead they are based on my first-hand experience of how access to timely, accurate and pertinent information and insight can be the difference between organisational failure and organisational success. The way that Jane and her old boss were able to identify issues and formulate a strategic response is a characteristic of a Structured Reporting Framework. The way that Jane and Luuk were able to discuss identical figures and to drill into the detail behind them is another such characteristic. Structured Reporting Frameworks are about making sure that everyone in an organisation uses the same figures and ensuring that these figures are easy to find and easy to understand.

To show how this works, let’s consider a schematic [9]:

A Structured Reporting Framework leads people logically and seamlessly from a high-level perspective of performance to more granular information exposing what factors are driving this performance. This functionality is canonically delivered by a series of tailored dashboards, each supported by lower-level dashboards, analysis facilities and reports (the last of which should be limited in number).

Busy Executives and Managers have their information needs best served via visual exhibits that are focussed on their areas of priority and highlight things that are of specific concern to them. Some charts or tables may be replicated across a number of dashboards, but others will be specific to a particular area of the business. If further attention is necessary (e.g. an indicator turns red) dashboard users should have the ability to investigate the causes themselves, if necessary drilling through to detailed transactional information. Symmetrically, more junior staff, engaged in the day-to-day operation of the organisation, need up-to-date (often real-time) information relating to their area, but may also need to set this within a broader business context. This means accessing more general exhibits, for example moving from a list of recent transactions to an historical perspective of the last two years.

Importantly, when a CEO like Jane Doe drills through from their dashboard all the way to a list report this would be the identical report with the identical figures as used by front-line staff day-to-day. When Jane picks up the ‘phone to ask a question of someone, regardless of whether they are a Country Manager, or an operations person, the figures that both see will be the same.

When not accessed from dashboards, reports and analysis facilities should be grouped into a simple menu hierarchy that allows users to navigate with ease and find what they need without having to trail through 30 reports, each with cryptic titles. As mentioned above, there should be a limited number of highly functional / customisable reports and analysis facilities, each of whose purpose is crystal clear.

The way that this consistency of figures is achieved is by all elements of the Structured Reporting Framework drawing their data from the same data repositories. In a modern Data Architecture, this tends to mean two repositories, an Analytical one delivering insight and an Operational one delivering information; these would obviously be linked to each other as well.
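As a minimal sketch of this principle, the Python below derives a headline dashboard figure, a drill-down by product and a transaction-level list report from one shared set of rows. The products, territories, figures and function names are all invented for illustration and are not taken from any real system. The point is simply that, because every view reads the same repository, the totals reconcile at every level.

```python
from collections import defaultdict

# Hypothetical shared repository: one row per transaction.
# All names and figures are invented for illustration only.
TRANSACTIONS = [
    {"product": "Widget P", "territory": "Amsterdam", "profit": 120},
    {"product": "Widget Q", "territory": "Rotterdam", "profit": -200},
    {"product": "Widget Q", "territory": "Amsterdam", "profit": 30},
    {"product": "Widget R", "territory": "Rotterdam", "profit": 80},
]


def headline_profit(rows):
    """Top-level dashboard figure: total profit across all rows."""
    return sum(row["profit"] for row in rows)


def profit_by(rows, dimension):
    """Drill-down: split profit by a dimension, ordered by ascending profit."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["profit"]
    return sorted(totals.items(), key=lambda item: item[1])


def list_report(rows, **filters):
    """Lowest level: the transactions matching the given filters."""
    return [
        row for row in rows
        if all(row[key] == value for key, value in filters.items())
    ]


by_product = profit_by(TRANSACTIONS, "product")
widget_q_rows = list_report(TRANSACTIONS, product="Widget Q")

# Every level reconciles because every level reads the same rows.
assert headline_profit(TRANSACTIONS) == sum(total for _, total in by_product)
print(by_product[0])  # the worst-performing product comes first
```

Calling `profit_by(widget_q_rows, "territory")` would then take the drill-down one level further, much as Jane and Luuk do in the second story.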

Banishing some Misconceptions

I started by saying that some people make the mistake of thinking that a Structured Reporting Framework is an optional extra in a modern data landscape. In fact it is the crucial final link between an organisation’s data and the people who need to use it. In many ways how people experience data capabilities will be determined by this final link. Without paying attention to this, your shiny warehouse or data lake will be a technological curiosity, not an indispensable business tool. When the sadly common refrain of “we built state-of-the-art data capabilities, why is no one using them?” is heard, the lack of a Structured Reporting Framework is often the root cause of poor user adoption.

When building a data architecture from scratch, elements of your data repository should be so aligned with business needs that overlaying them with a Structured Reporting Framework should be a relatively easy task. But even an older and more fragmented data landscape can be improved at minimal cost by better organising current reports into more user-friendly menus [10] and by introducing some dashboards as alternative access points to them. Work is clearly required to do this, which might include some tweaks to the underlying repositories, but this does not normally require re-writing all reports from scratch. Such work can be approached pragmatically and incrementally, perhaps revamping reports for a given function, such as sales, before moving on to the next area. This way business value is also drip fed to the organisation.

I hope that this article will encourage some people to look at the idea of Structured Reporting Frameworks again. My experience is that attention paid to this concept can reap great returns at costs that can be much lower than you might expect.

It is worth thinking hard about which version of Jane Doe, CEO you want to be: the one in the dark reacting too late to events, or the one benefiting from the illumination provided by a Structured Reporting Framework.

If you would like to learn more about the impact that a Structured Reporting Framework can have on your organisation, or want to understand how to implement one, then you can get in contact via the form provided. You can also speak to us on +44 (0) 20 8895 6826.

Notes

 [1] WATSON, J., CRICK, F. Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature 171, 737–738 (1953).

 [2] From what I have gleaned from those who knew (know in Watson’s case) the pair, neither was (is) the most modest of men. I therefore ascribe this not insubstantial understatement to either the editors at Nature or common-or-garden litotes.

 [3] All of which are handily collected into our Data Strategy Hub.

 [4] Though not necessarily much later if you adopt an incremental approach to the delivery of Data Capabilities.

 [5] Be that Curated Data Lake or Conformed Data Warehouse.

 [6] See the Cultural Transformation section of my repository of Keynote Articles.

 [7] Albert Camus, referring to fiction in L’Étranger.

 [8] I still have my work cut out to finish my factual book, Glimpses of Symmetry.

 [9] This is a simplified version of one that I use in my own data consulting work.

 [10] Ideally rationalising and standardising look and feel and terminology at the same time.


# Data Strategy in a Working from Home Climate

The author, working from home together with his Executive Assistant
© Jennifer Thomas Photography.

When I occasionally re-read articles I penned back in 2009 or 2010, I’m often struck that – no matter how many things have undeniably changed over the intervening years in the data arena – there are some seemingly eternal verities. For example, it’s never just about the technology and indeed it’s seldom even predominantly about the technology [1]. True then, true now. These articles have a certain timeless quality to them. This is not that sort of article…

This is an article written at a certain time and in certain circumstances. I fervently hope that it rapidly becomes an anachronism.

Here and now, in late March 2020, many of us are adjusting to working from home for the first time on an extended basis [2]. From talking to friends and associates, I know that this can be a difficult transition. Humans are inherently social animals and limiting our social interactions can be injurious to mental health. Fortunately, people also seem to be coming up with creative ways to stay in touch and the array of tools at our disposal to do this has never been greater.

In this piece, I wanted to talk about my first experience of extended home working. In my last article, Data Strategy Creation – A Roadmap, I hopefully gave some sense of the complexities involved in developing a commercially focussed Data Strategy. Well my task while home working for the first time was to do just that!

Back then, I ended up being successful without the benefit of more modern communications facilities, which is hopefully a helpful lesson for people today. Not only can you get by when working from home, you can take on some types of complicated work and do it well.

To provide some more colour, let’s go back to 2007 / 2008. Even in the midst of what was then obviously the Dark Ages, we did have email and even the Internet, but to be honest the “revolutionary” technology I used most often was well over a hundred years old at that point. Let me introduce you to it…

First some context. I had successfully developed and then executed a Data Strategy for the European operations of a leading Global General Insurer. This work had played a pivotal role in returning the organisation to profitability following record losses. On the back of this, I was promoted to also be accountable for Data across the organisation’s businesses in Asia / Pacific, Canada and Latin America. The span of my new responsibilities is shown further down the page.

My first task was to develop an International Data Strategy. As per the framework that I began to develop as part of this assignment, I needed to speak to a lot of people, both business and technical. I needed to understand what was different about Insurance Markets as diverse as China and Brazil. I needed to understand a systems and data landscape spanning five continents. And – importantly – I needed to establish and then build on personal relationships with a lot of different people from different cultures in different locations. There was also the minor issue of time zones to be dealt with. Then again, I like a challenge.

My transition to home working in this role was not driven by the type of deadly pathogen that we currently face, but by more quotidian considerations. The work I was initially doing primarily related to the activities tagged as 1.3 Business Interviews and 1.5 Technical Staff Discussions in my Data Strategy framework. Relative to this, I found that I was speaking to Singapore (where there was a team of data developers as well as several stakeholders) at 6am or even earlier; seguing to my continuing European responsibilities not long after; then having Latin and North America come on stream in the afternoon, going through until late; and sometimes picking things up with Australia or Asia Pacific locations at 11pm or midnight.

I was often writing up notes straight after meetings, or comparing them to previous ones looking for commonalities and teasing out themes. Because of this, there was not a lot of time for a commute to central London. Equally as I was on the ‘phone or email all of the time, there was little need for me to go into the office. So working from home became my “new normal”.

I could of course go into the office if I wanted to. It was also possible to go out and have a meal with my wife. Finally, I was not worried about getting sick or this happening to my family. So things were not so difficult as they are today. It did however take me some time to adjust to these different arrangements. One thing I learnt was that I couldn’t work solidly every day from 6am to midnight – an amazing revelation I realise. Given the extended nature of my day, I had to build breaks in.

As I was working well in excess of my contractual hours, if a gap opened up in my day, I would do things like cycle to Regent’s Park and do laps of the outer circle. A major activity for me at the time was rock climbing and so I would take a break and work out on a training aid called a fingerboard, which we had two of (see above). Trying to hang from this by two fingers tended to clear the mind.

To return to the work, there were elements of this that blunted any feeling of isolation. Of course I ran my notes past the people interviewed, a second point of interaction. I also found a handful of people in each territory who were very positive about driving change through enhanced information. With these I held ongoing chats, discussing the views of their colleagues, contrasting these to those of other people around the organisation, sharing preliminary findings and nascent ideas for moving forward, getting their feedback on all of this. As well as helping me to have sounding boards for my ideas and getting alternative input, this was also great for building relationships; something that is harder over the ‘phone, but – as I found – far from impossible. However, it did require effort and, importantly, that effort needed to be sustained.

Something I thankfully figured out quite early was that email was not enough. Even with busy people on the other side of the world, perhaps particularly with busy people on the other side of the world, it was worth arranging time to talk. Despite the efficiency and convenience of email, I made a point of also speaking and of religiously rearranging any chats that fell through. Sometimes I wanted to just drop a colleague an email, but I tried to resist the temptation. In retrospect I think this approach helped a lot – on both ends of the ‘phone line.

Over time my work gradually shifted from gathering data to analysing and synthesising it, that is figuring out what the elements of the Data Strategy should be, for example current and future states. This is never an abrupt change, you start to analyse as soon as you have done a handful of interviews, but this work ramps up as more and more interviews are ticked off. Another thing that I found was, rather than sending people a whole slide deck and inviting comments, sharing one slide / exhibit at a time worked better. That way you assemble a deck out of agreed components and also have further opportunities for telephone interaction. If you are careful enough with structuring your individual slides, then the overall story can take care of itself, or at least require minimal “connective tissue” to be woven around it.

I won’t go into every aspect of this Data Strategy work, as the point of this article is instead to focus on the working from home element. However it is worth summarising the eventual number of interviews I held and documented:

I have no idea how many ‘phone calls that equated to, but it must have been an awful lot. Most of these were carried out over the initial three-month period, with a few stragglers picked up later in parallel with rounding out the Data Strategy. The whole exercise consumed a little under six months, with the back of the work broken in slightly more than four.

In closing, the Data Strategy I developed was adopted and the international data architecture that was later rolled out remains in place today; a testament both to the work done by the development teams and also – I hope – to my vision. Since then, I have carried out a number of similar international exercises, though not always with the working from home component. I found the lessons I learned in that initial period invaluable. For example, I use many of the approaches I developed in 2007 / 2008 in my work today as well.

So the closing message is that things are obviously far from normal right now. But – as challenging as working from home may seem – it is possible to be productive and also to lift your eyes above keeping the business running in order to contemplate more complicated transformation activities. I hope that this knowledge is of some help to those grappling, as I did years ago, with “the new normal”.

If – despite current circumstances – you need to develop a Data Strategy and would like some help, then please get in contact via the form provided. You can also speak to us on +44 (0) 20 8895 6826.

Notes

 [1] To see why, consider reading A bad workman blames his [Business Intelligence] tools.

 [2] I of course appreciate that many people do not have this option due to the type of work that they do. I also appreciate that many unfortunate people will have no work in current circumstances.


# Data Strategy Creation – A Roadmap

Data Strategy creation is one of the main pieces of work that I have been engaged in over the last decade [1]. In my last article, Measuring Maturity, I wrote about Data Maturity and how this relates to both Data Strategy and a Data Capability Review. Here I wanted to step back and look at the big picture.

The exhibit above is one that I use to chart my work in Data Strategy development [2]. An obvious thing to say upfront is that this is not a trivial exercise to embark on. There are many different interrelated activities, each one of which requires experience and expertise in both what makes businesses tick and the types of Data-related capabilities and organisation designs that can better support this. These need to be woven together to form the fabric of a Data Strategy and to deliver several other more detailed supporting documents, such as Data Roadmaps, or Cost / Benefit Analyses.

I tend to often tag Data Strategy with the adjective “commercial” and commercial awareness is for me what makes the difference between a Data Technology Strategy and a true Data Strategy. The latter has to be imbued with real commercial benefits being delivered.

Several of the activities in the diagram are looked at in greater detail in my trilogy on strategy development that starts with Forming an Information Strategy: Part I – General Strategy. I have also added some new areas to my approach since writing these articles back in 2014. As previously trailed, I will be penning a more comprehensive piece on Data Strategies in coming months.

I find Data Strategy creation a very rewarding process. Turning this into Data Capabilities that add business value is even more stimulating.

As I have helped 10 organisations to develop their Data Strategies, the above activities are second nature to me. There is also a logical flow (mostly from left to right) and the various elements come together like the plot of a well-written book to yield the actual Data Strategy on the far right.

However I can appreciate that the complexity and reach of a Data Strategy exercise may seem rather daunting to someone looking at the area for the first time. In response to such a feeling, I’d suggest taking a leaf out of what used to be my main leisure activity, rock climbing [3]. I am a pretty experienced rock climber, but if I wanted to get into some unfamiliar aspect of the sport – say Alpinism – then I would make sure to hire a guide; someone whose experience and expertise I could rely upon and from whom I could also learn.

In my opinion, Data Strategy is an area in which such a guide is also indispensable.

It was rightly pointed out by one of my associates, Andrew Willimott, that the above roadmap does not explicitly reference Business Strategy. This is a very important point.

Here is an excerpt from some comments I made on this subject on Quora only the other day:

A sound commercially-focussed Data Strategy must be tailored to a specific organisation, the markets they operate in, the products or services they sell, the competitive landscape, their current Data Capabilities and – most importantly – their overarching business strategy.

I had this area implicitly covered by a combination of 1.2 Documentation Review and 1.3 Business Interviews, but I agree that the connection should be more explicit. The diagrams have now been revised accordingly with thanks to Andrew.

If you would like to better understand any aspect of the Data Strategy creation process, then please get in contact via the form provided. You can also speak to us on +44 (0) 20 8895 6826.

Notes

 [1] Often followed on by then helping to get the execution of the Data Strategy going.

 [2] I also use cut-down versions to play back progress to clients.

 [3] For example see A bad workman blames his [Business Intelligence] tools.


# Measuring Maturity

The author, engaged in measuring maturity – © Jennifer Thomas Photography.

In the thirteen years that have passed since the beginning of 2007, I have helped ten organisations to develop commercially-focused Data Strategies [1]. I last wrote about the process of creating a Data Strategy back in 2014 and – with the many changes that the field has seen since then – am overdue publishing an update, so watch this space [2]. However, in this initial article, I wanted to focus on one tool that I have used as part of my Data Strategy engagements; a Data Maturity Model.

A key element of developing any type of strategy is knowing where you are now and the pros and cons associated with this. I used to talk about carrying out a Situational Analysis of Data Capabilities; nowadays I am more likely to refer to a Data Capability Review. I make such reviews with respect to my own Data Capability Framework, which I introduced to the public in 2019 via A Simple Data Capability Framework.

Typically I break each of the areas appearing in boxes above into sub-areas, score the organisation against these, roll the results back up and present them back to the client with accompanying commentary; normally also including some sort of benchmark for comparison [3].
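As a minimal sketch of the kind of roll-up described above, the Python below averages sub-area scores into a score per area and sets each against a benchmark. The sub-area names, the 1 to 5 scale, the unweighted averaging and the benchmark figures are all assumptions made purely for illustration; an actual review may well score and weight things differently.

```python
from statistics import mean

# Hypothetical sub-area scores on an assumed 1 (poor) to 5 (excellent) scale.
scores = {
    "Data Controls": {"Data Quality": 2, "Data Ownership": 3, "Master Data": 1},
    "Analytics": {"Reporting": 4, "Data Science": 2},
}

# Assumed benchmark figures, for illustration only.
benchmark = {"Data Controls": 3.0, "Analytics": 3.5}


def roll_up(sub_scores):
    """Average the sub-area scores to give a single score for the area."""
    return mean(sub_scores.values())


summary = {area: roll_up(subs) for area, subs in scores.items()}

for area, score in summary.items():
    gap = score - benchmark[area]
    print(f"{area}: {score:.1f} (benchmark {benchmark[area]:.1f}, gap {gap:+.1f})")
```

A simple average treats every sub-area as equally important; weighting the sub-areas would be an equally reasonable choice.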

A Data Maturity Model is simply one way of presenting the outcome of a Data Capability Review; it has the nice feature of also pointing the way to the future. Such a model presents a series of states into which an organisation may fall with respect to its data. These are generally arranged in order, with the least beneficial state at the bottom and the most beneficial at the top. Data Maturity Models often adopt visual metaphors like ladders, or curves arching upwards, or – as I do myself – a flight of stairs. All of these metaphors – not so subtly – suggest ascending to a higher state of being.

Here is the Data Maturity Model that I use:

The various levels of Data Maturity appear on the left, ranging from Disorder to Advanced and graded – in a way reminiscent of exams – between the lowest score of E and the highest of A. To the right of the diagram is the aforementioned “staircase”. Each “step” describes attributes of an organisation with the given level of Data Maturity. Here there is an explicit connection to the Data Capability Framework. The six numbered areas that appear in the Framework also appear in each “step” of the Model (and are listed in the Key); together with a brief description of the state of each Data Capability at the given level of Data Maturity. Obviously things improve as you climb up the “stairs”.

Of course organisations may be at a more advanced stage with respect to Data Controls than they are with Analytics. Equally one division or geographic territory might be at a different level with its Information than another. Nevertheless I generally find it useful to place an entire organisation somewhere on the flight of stairs, leaving a more detailed assessment to the actual Data Capability Review; such an approach tends to also resonate with clients.

So, supposing a given organisation is at level “D – Emergent”, an obvious question is where should it aspire to be instead? In my experience, not all organisations need to be at level “A – Advanced”. It may be that a solid “B – Basic” (or perhaps B+ splitting the difference) is a better target. Much as Einstein may have said that everything should be as simple as possible, but no simpler [4], Data Maturity should be as great as necessary, but no greater; over-engineering has been the downfall of many a Data Transformation Programme.

Of course, while I attempt to introduce some scientific rigour and consistency into both my Data Capability Reviews and the resulting Data Maturity Assessments, there is also an element of judgement to be applied; in many ways it is this judgement that I am actually paid to provide. When opining on an organisation’s state, I tend to lay the groundwork by first playing back what its employees say about this area (including the Executives that I am typically presenting my findings to). Most typically my own findings are fairly in line with what the average person says, but perhaps in general a bit less positive. Given my extensive work implementing modern Data Architectures that deliver positive commercial outcomes, this is not a surprising state of affairs.

If a hypothetical organisation is at level “D – Emergent”, then the Model’s description of the next level up, “C – Transitional”, can provide strong pointers as to some of the activities that need to be undertaken in order to ratchet up Data Maturity one notch. The same goes if more of a step change to, say, “B – Basic” is required. Initial ideas for improvement can be further buttressed by more granular Data Capability Review findings. The two areas should be mutually reinforcing.
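The ordering of the levels can be captured in a few lines of Python. This is only a sketch: the five grades and names are those mentioned in this article, while the function name `steps_to_climb` is something I have made up for the example.

```python
# The five levels named in the article, ordered from least to most mature.
LEVELS = [
    ("E", "Disorder"),
    ("D", "Emergent"),
    ("C", "Transitional"),
    ("B", "Basic"),
    ("A", "Advanced"),
]
GRADES = [grade for grade, _ in LEVELS]


def steps_to_climb(current, target):
    """Return the levels to pass through, in order, from current up to target."""
    start, end = GRADES.index(current), GRADES.index(target)
    if end < start:
        raise ValueError("target is below the current level")
    return [f"{grade} – {name}" for grade, name in LEVELS[start + 1 : end + 1]]


# An organisation at D aiming for B has two steps to climb.
print(steps_to_climb("D", "B"))
```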

One thing that I have found very useful is to revisit the area of Data Maturity after, for example, a year working on the area. If the organisation has scaled another step, or is at least embarked on the climb and making progress, this can be evidence of the success of the approach I have recommended and can also have a motivational effect.

As with many things, where you are with respect to Data Maturity is probably less important than your direction of travel.

If you would like to learn more about Data Maturity Models, or want to better understand how mature the data capabilities of your organisation are, then please get in touch via the form provided. You can also speak to us on +44 (0) 20 8895 6826.

Notes

 [1] In case you were wondering, much of the rest of the time has been spent executing these Data Strategies, or at least getting the execution in motion. Having said that, I also did a lot of other stuff as per: Experience at different Organisations. You can read about some of this work in our Case Studies section.

 [2] The first such article is Data Strategy Creation – A Roadmap.

 [3] I’ll be covering this area in greater detail in the forthcoming article I mentioned in the introductory paragraph.

 [4] There is actually very significant doubt that he ever uttered or wrote those words. However, in 1933, he did deliver a lecture which touched on similar themes. The closest that the great man came to saying the words attributed to him was: “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.” From “On the Method of Theoretical Physics”, the Herbert Spencer Lecture, Oxford, June 10, 1933.


# Put our Knowledge and Writing Skills to Work for you

As well as consultancy, research and interim work, peterjamesthomas.com Ltd. helps organisations in a number of other ways. The recently launched Data Strategy Review Service is just one example.

Another service we provide is writing White Papers for clients. Sometimes the labels of these are white [1] as well as the paper. Sometimes Peter James Thomas is featured as the author. White Papers can be based on themes arising from articles published here, they can feature findings from de novo research commissioned in the data arena, or they can be on a topic specifically requested by the client.

Seattle-based Data Consultancy, Neal Analytics, is an organisation we have worked with on a number of projects and whose experience and expertise dovetails well with our own. They recently commissioned a White Paper expanding on our 2018 article, Building Momentum – How to begin becoming a Data-driven Organisation. The resulting paper, The Path to Data-Driven, has just been published on Neal Analytics’ site (they have a lot of other interesting content, which I would recommend checking out):

If you find the articles published on this site interesting and relevant to your work, then perhaps – like Neal Analytics – you would consider commissioning us to write a White Paper or some other document. If so, please just get in contact. We have a degree of flexibility on the commercial side and will most likely be able to come up with an approach that fits within your budget. Although we are based in the UK, commissions from organisations based in other countries, such as this one from Neal Analytics, are welcome.

Notes


# A Picture Paints a Thousand Numbers

Introduction

The recent update of The Data & Analytics Dictionary featured an entry on Charts. Entries in The Dictionary are intended to be relatively brief [1] and also the layout does not allow for many illustrations. Given this, I have used The Dictionary entries as a basis for this slightly expanded article on the subject of chart types.

A Chart is a way to organise and Visualise Data with the general objective of making it easier to understand and – in particular – to discern trends and relationships. This article will cover some of the most frequently used Chart types, which appear in alphabetical order.

 Note:   Here an “axis” is a fixed reference line (sometimes invisible for stylistic reasons) which typically goes vertically up the page or horizontally from left to right across the page (but see also Radar Charts). Categories and values (see below) are plotted on axes. Most charts have two axes. Throughout I use the word “category” to refer to something discrete that is plotted on an axis, for example France, Germany, Italy and The UK, or 2016, 2017, 2018 and 2019. I use the word “value” to refer to something more continuous plotted on an axis, such as sales or number of items etc. With a few exceptions, the Charts described below plot values against categories. Both Bubble Charts and Scatter Charts plot values against other values. I use “series” to mean sets of categories and values. So if the categories are France, Germany, Italy and The UK; and the values are sales; then different series may pertain to sales of different products by country.


Bar & Column Charts
Clustered Bar Charts, Stacked Bar Charts

Bar Chart is the generic term, but this is sometimes reserved for charts where the categories appear on the vertical axis, with Column Charts being those where categories appear on the horizontal axis. In either case, the chart has a series of categories along one axis. Extending rightwards (or upwards) from each category is a rectangle whose width (height) is proportional to the value associated with this category. For example if the categories related to products, then the size of rectangle appearing against Product A might be proportional to the number sold, or the value of such sales.

|  © JMB (2014)  |  Used under a Creative Commons licence  |

The exhibit above, which is excerpted from Data Visualisation – A Scientific Treatment, is a compound one in which two bar charts feature prominently.

Sometimes the bars are clustered to allow multiple series to be charted side-by-side, for example yearly sales for 2015 to 2018 might appear against each product category. Or – as above – sales for Product A and Product B may both be shown by country.

Another approach is to stack bars or columns on top of each other, something that is sometimes useful when comparing how the make-up of something has changed.
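To make the basic idea of proportionality concrete, here is a minimal sketch in Python that draws a text-only Bar Chart in which the length of each bar is proportional to its value. The product names, the sales figures and the 40 character maximum width are all invented for the purposes of illustration.

```python
# Hypothetical units sold per product (illustrative figures only).
sales = {"Product A": 120, "Product B": 80, "Product C": 30}

MAX_WIDTH = 40  # number of characters used for the longest bar


def bar_lengths(values, max_width=MAX_WIDTH):
    """Scale each value so that the largest maps to max_width characters."""
    largest = max(values.values())
    return {name: round(value / largest * max_width) for name, value in values.items()}


for name, length in bar_lengths(sales).items():
    print(f"{name:<10}|{'#' * length} {sales[name]}")
```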

Bubble Charts

Bubble Charts are used to display three dimensions of data on a two dimensional chart. A circle is placed with its centre at a value on the horizontal and vertical axes according to the first two dimensions of data, but then the area (or less commonly the diameter [2]) of the circle reflects the third dimension. The result is reminiscent of a glass of champagne (then again, maybe this says more about the author than anything else).

You can also use bubble charts in a quite visceral way, as exemplified by the chart above. The vertical axis plots the number of satellites of the four giant planets in the Solar System. The horizontal axis plots the closest that they ever come to the Sun. The size of each bubble is proportional to the relative size of the planet it represents.
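The difference between scaling by area and scaling by diameter is easy to see with a little arithmetic. In the sketch below, which uses made-up values and an arbitrary maximum radius of 30 plot units, scaling by area means taking the square root of the value ratio, whereas scaling by diameter uses the ratio directly.

```python
import math

# Hypothetical values for the third dimension (illustrative only).
values = {"Small": 1.0, "Medium": 4.0, "Large": 9.0}

MAX_RADIUS = 30.0  # radius, in arbitrary plot units, of the largest bubble


def radius_for_area_scaling(value, largest, max_radius=MAX_RADIUS):
    """Radius such that the bubble's AREA is proportional to value."""
    return max_radius * math.sqrt(value / largest)


def radius_for_diameter_scaling(value, largest, max_radius=MAX_RADIUS):
    """Radius such that the bubble's DIAMETER is proportional to value."""
    return max_radius * value / largest


largest = max(values.values())
for name, value in values.items():
    by_area = radius_for_area_scaling(value, largest)
    by_diameter = radius_for_diameter_scaling(value, largest)
    print(f"{name:<7} area-scaled r = {by_area:5.1f}   diameter-scaled r = {by_diameter:5.1f}")
```

With diameter scaling the smallest bubble here has one ninth of the radius of the largest, and so only one eighty-first of its area, which tends to exaggerate differences.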

Cartograms

There does not seem to be a generally accepted definition of Cartograms. Some authorities describe them as any diagram using a map to display statistical data; I cover this type of general chart in Map Charts below. Instead I will define a Cartogram more narrowly as a geographic map where areas of map sections are changed to be proportional to some other value; resulting in a distorted map. So, in a map of Europe, the size of countries might be increased or decreased so that their new areas are proportional to each country’s GDP.

Alternatively the above cartogram of the United States has been distorted (and coloured) to emphasise the population of each state. The dark blue of California and the slightly less dark blues of Texas, Florida and New York dominate the map.

Histograms

A type of Bar Chart (typically with categories along the horizontal axis) where the categories are bins (or buckets) and the bars are proportional to the number of items falling into a bin. For example, the bins might be ranges of ages, say 0 to 19, 20 to 39, 40 to 59 and 60+ and the bars appearing against each might be the UK female population falling into each bin.
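Here is a small sketch of the binning step in Python, assuming non-overlapping age bins of 0 to 19, 20 to 39, 40 to 59 and 60+. The list of ages is made up; a real Histogram would of course be based on actual population data.

```python
from collections import Counter

# Assumed non-overlapping bins: (label, lower bound, inclusive upper bound).
BINS = [("0 to 19", 0, 19), ("20 to 39", 20, 39), ("40 to 59", 40, 59), ("60+", 60, None)]


def bin_label(age):
    """Return the label of the bin into which an age falls."""
    for label, low, high in BINS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age {age} does not fall into any bin")


# Made-up ages, for illustration only.
ages = [3, 17, 22, 25, 38, 41, 59, 60, 72, 88]
counts = Counter(bin_label(age) for age in ages)

# Print the bins in order, with a bar whose length is the count.
for label, _, _ in BINS:
    print(f"{label:<9}{'#' * counts[label]} {counts[label]}")
```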

The diagram above is a bipartite quasi-histogram [3] that I created to illustrate another article. It is not a true histogram as it shows percentages for and against in each bin rather than overall frequencies.

In the same article, I addressed this shortcoming with a second view of the same data, which is more histogram-like (apart from having a total category) and appears above. The point that I was making related to how Data Visualisation can both inform and mislead depending on the presentational choices taken.

Line Charts
Fan Charts, Area Charts

These typically have categories across the horizontal axis and could be considered as a set of line segments joining up the tops of what would be the rectangles on a Bar Chart. Clearly multiple lines, associated with multiple series, can be plotted simultaneously without the need to cluster rectangles as is required with Bar Charts. Lines can also be used to join up the points on Scatter Charts assuming that these are sufficiently well ordered to support this.

Adaptations of Line Charts can also be used to show the probability of uncertain future events as per the exhibit above. The single red line shows the actual value of some metric up to the middle section of the chart. Thereafter it is the central prediction of a range of possible values. Lying above and below it are shaded areas which show bands of probability. For example it may be that the probability of the actual value falling within the area that has the darkest shading is 50%. A further example is contained in Limitations of Business Intelligence. Such charts are sometimes called Fan Charts.

Another type of Line Chart is the Area Chart. If we can think of a regular Line Chart as linking the tops of an invisible Bar Chart, then an Area Chart links the tops of an invisible Stacked Bar Chart. The effect is that how a band expands and contracts as we move across the chart shows how the contribution this category makes to the whole changes over time (or whatever other category we choose for the horizontal axis).

See also: The first exhibit in New Thinking, Old Thinking and a Fairytale

Map Charts

These place data on top of geographic maps. If we consider the canonical example of a map of the US divided into states, then the degree of shading of each state could be proportional to some state-related data (e.g. average income quartile of residents). Or more simply, figures could appear against each state. Bubbles could be placed at the location of major cities (or maybe a bubble per country or state etc.) with their size relating to some aspect of the locale (e.g. population). An example of this approach might be a map of US states with their relative populations denoted by Bubble area.

Also data could be overlaid on a map, for example – as shown above – coloured bands corresponding to different intensities of rainfall in different areas. This exhibit is excerpted from Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity.

Pie Charts

These circular charts normally display a single series of categories with values, showing the proportion each category contributes to the total. For example a series might be the nations that make up the United Kingdom and their populations: England 55.62 million people, Scotland 5.43 million, Wales 3.13 million and Northern Ireland 1.87 million.

The whole circle represents the total of all the category values (e.g. the UK population of 66.05 million people [4]). The ratio of a segment’s angle to 360° (i.e. the whole circle) is equal to the percentage of the total represented by the linked category’s value (e.g. Scotland is 8.2% of the UK population and so will have a segment with an angle of just under 30°).
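The arithmetic above can be checked with a few lines of Python. The population figures are the ones quoted in this section; the function name `segment_angle` is simply one I have chosen for the sketch.

```python
# UK populations in millions, as quoted in the text above.
populations = {
    "England": 55.62,
    "Scotland": 5.43,
    "Wales": 3.13,
    "Northern Ireland": 1.87,
}

total = sum(populations.values())


def segment_angle(value, total):
    """Angle in degrees of the Pie Chart segment representing value."""
    return 360 * value / total


for nation, population in populations.items():
    share = 100 * population / total
    angle = segment_angle(population, total)
    print(f"{nation:<17}{share:5.1f}%  {angle:6.1f} degrees")
```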

Sometimes – as illustrated above – the segments are “exploded” away from each other. This is taken from the same article as the other voting analysis exhibits.

See also: As Nice as Pie, which examines the pros and cons of this type of chart in some depth.

Radar Charts

Radar Charts are used to plot one or more series of categories with values that fall into the same range. If there are six categories, then each has its own axis called a radius and the six of these radiate at equal angles from a central point. The calibration of each radial axis is the same. For example Radar Charts are often used to show ratings (say from 5 = Excellent to 1 = Poor) so each radius will have five points on it, typically with low ratings at the centre and high ones at the periphery. Lines join the values plotted on each adjacent radius, forming a jagged loop. Where more than one series is plotted, the relative scores can be easily compared. A sense of aggregate ratings can also be garnered by seeing how much of the plot of one series lies inside or outside of another.
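For those interested in the geometry, the sketch below works out where the corners of the jagged loop fall for one series. It assumes ratings from 1 to 5, a first axis pointing straight up, axes proceeding clockwise and a chart of radius 1; these are conventions I have picked for the example, and charting software may choose differently.

```python
import math


def radar_points(ratings, max_rating=5):
    """Return the (x, y) corner points for one series on a Radar Chart.

    The axes radiate at equal angles, starting at 12 o'clock and going
    clockwise; every axis shares the same calibration.
    """
    n = len(ratings)
    points = []
    for i, rating in enumerate(ratings):
        angle = math.pi / 2 - 2 * math.pi * i / n
        r = rating / max_rating  # distance from the centre, between 0 and 1
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# Hypothetical ratings for six categories.
for x, y in radar_points([5, 3, 4, 2, 5, 1]):
    print(f"({x:+.2f}, {y:+.2f})")
```

Joining consecutive points (and the last back to the first) gives the loop for the series.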

I use Radar Charts myself extensively when assessing organisations’ data capabilities. The above exhibit shows how an organisation ranks in five areas relating to Data Architecture compared to the best in their industry sector [5].

Scatter Charts

In most of the cases we have dealt with to date, one axis has contained discrete categories and the other continuous values (though our rating example for the Radar Chart had discrete categories and values). For a Scatter Chart both axes plot values, either continuous or discrete. A series would consist of a set of pairs of values, one to be plotted on the horizontal axis and one to be plotted on the vertical axis. For example a series might be a number of pairs of midday temperature (to be plotted on the horizontal axis) and sales of ice cream (to be plotted on the vertical axis). As may be deduced from the example, often the intention is to establish a link between the pairs of values – do ice cream sales increase with temperature? This aspect can be highlighted by drawing a line of best fit on the chart; one that minimises the total distance (typically measured as the sum of squared vertical distances) between the plotted points and the line. Further series, say sales of coffee versus midday temperature, can be added.
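One common way to arrive at a line of best fit is ordinary least squares, which minimises the sum of the squared vertical distances between the points and the line. The sketch below assumes this method and uses invented temperature and ice cream sales figures; other fitting approaches are available.

```python
from statistics import mean

# Made-up pairs of (midday temperature in Celsius, ice creams sold).
data = [(15, 22), (18, 30), (21, 33), (24, 41), (27, 44)]


def least_squares_line(pairs):
    """Return (slope, intercept) of the ordinary least squares line."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares_line(data)
print(f"sales = {slope:.2f} x temperature {intercept:+.2f}")
```

A positive slope here is consistent with ice cream sales rising with temperature, though it says nothing by itself about cause and effect.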

Here is a further example, which illustrates potential correlation between two sets of data, one on the x-axis and the other on the y-axis:

As always a note of caution must be introduced when looking to establish correlations using scatter graphs. The inimitable Randall Munroe of xkcd.com [7] explains this pithily as follows:

|  © Randall Munroe, xkcd.com (2009)  |  Excerpted from: Extrapolating  |

Tree Maps

Tree Maps require a little bit of explanation. The best way to understand them is to start with something more familiar, a hierarchy diagram with three levels (i.e. something like an organisation chart). Consider a cafe that sells beverages, so we have a top level box labelled Beverages. The Beverages box splits into Hot Beverages and Cold Beverages at level 2. At level 3, Hot Beverages splits into Tea, Coffee, Herbal Tea and Hot Chocolate; Cold Beverages splits into Still Water, Sparkling Water, Juices and Soda. So there is one box at level 1, two at level 2 and eight at level 3. As ever a picture paints a thousand words:

Next let’s also label each of the boxes with the value of sales in the last week. If we add up the sales for Tea, Coffee, Herbal Tea and Hot Chocolate we obviously get the sales for Hot Beverages.

A Tree Map takes this idea and expands on it. A Tree Map using the data from our example above might look like this:

First, instead of being linked by lines, boxes at level 3 (leaves let’s say) appear within their parent box at level 2 (branches maybe) and the level 2 boxes appear within the overall level 1 box (the whole tree); so everything is nested. Sometimes, as is the case above, rather than having the level 2 boxes drawn explicitly, the level 3 boxes might be colour coded. So above Tea, Coffee, Herbal Tea and Hot Chocolate are mid-grey and the rest are dark grey.

Next, the size of each box (at whatever level) is proportional to the value associated with it. In our example, 66.7% of sales ($\frac{1000}{1500}$) are of Hot Beverages. Then two-thirds of the Beverages box will be filled with the Hot Beverages box and one-third ($\frac{500}{1500}$) with the Cold Beverages box. If 20% of Cold Beverages sales ($\frac{100}{500}$) are Still Water, then the Still Water box will fill one fifth of the Cold Beverages box (or one fifteenth – $\frac{100}{1500}$ – of the top level Beverages box).

It is probably obvious from the above, but it is non-trivial to find a layout that has all the boxes at the right size, particularly if you want to do something else, like have the size of boxes increase from left to right. This is a task generally best left to some software to figure out.
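To give a flavour of what such software does, here is a sketch of one of the simplest layout approaches, often called slice-and-dice, in which each box is cut up among its children in proportion to their values, alternating between vertical and horizontal cuts at each level. The totals for Hot Beverages (1,000), Cold Beverages (500) and Still Water (100) come from the example above; the other leaf figures and the 150 by 100 canvas are assumptions made for illustration. Real tools often use more sophisticated algorithms that produce less elongated boxes.

```python
# Weekly sales by beverage. Only the Hot (1000) and Cold (500) totals and
# Still Water (100) come from the text; the other figures are invented.
sales = {
    "Hot Beverages": {"Tea": 300, "Coffee": 500, "Herbal Tea": 100, "Hot Chocolate": 100},
    "Cold Beverages": {"Still Water": 100, "Sparkling Water": 50, "Juices": 150, "Soda": 200},
}


def total(node):
    """Value of a leaf, or the sum of its children for a branch."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


def layout(node, x, y, width, height, vertical=True, name="Beverages"):
    """Return a list of (name, x, y, width, height) boxes for the leaves."""
    if not isinstance(node, dict):
        return [(name, x, y, width, height)]
    boxes = []
    node_total = total(node)
    offset = 0.0  # fraction of the parent box already allocated
    for child_name, child in node.items():
        share = total(child) / node_total
        if vertical:  # children sit side by side, splitting the width
            boxes += layout(child, x + offset * width, y, share * width, height, False, child_name)
        else:  # children are stacked, splitting the height
            boxes += layout(child, x, y + offset * height, width, share * height, True, child_name)
        offset += share
    return boxes


boxes = {name: (x, y, w, h) for name, x, y, w, h in layout(sales, 0, 0, 150, 100)}
for name, (x, y, w, h) in boxes.items():
    print(f"{name:<16} x={x:6.1f} y={y:6.1f} width={w:6.1f} height={h:6.1f} area={w * h:7.1f}")
```

On a 150 by 100 canvas the Still Water box has an area of 1,000 out of 15,000, or one fifteenth of the whole, matching the arithmetic above.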

In Closing

The above review of various chart types is not intended to be exhaustive. For example, it doesn’t include Waterfall Charts [8], Stock Market Charts (or Open / High / Low / Close Charts [9]), or 3D Surface Charts [10] (which seldom are of much utility outside of Science and Engineering in my experience). There are also a number of other more recherché charts that may be useful in certain niche areas. However, I hope we have covered some of the more common types of charts and provided some helpful background on both their construction and usage.

Notes

 [1] Certainly by my normal standards!

 [2] Research suggests that humans are more attuned to comparing areas of circles than say their diameters.

 [3] © peterjamesthomas.com Ltd. (2019).

 [4] Excluding overseas territories.

 [5] This has been suitably redacted of course. Typically there are four other such exhibits in my assessment pack: Data Strategy, Data Organisation, MI & Analytics and Data Controls, together with a summary radar chart across all five lower level ones.

 [6] The atmospheric CO2 records were sourced from the US National Oceanic and Atmospheric Administration’s Earth System Research Laboratory and relate to concentrations measured at their Mauna Loa station in Hawaii. The Global Average Surface Temperature records were sourced from the Earth Policy Institute, based on data from NASA’s Goddard Institute for Space Studies and relate to measurements from the latter’s Global Historical Climatology Network. This exhibit is meant to be a basic illustration of how a scatter chart can be used to compare two sets of data. Obviously actual climatological research requires a somewhat more rigorous approach than the simplistic one I have employed here.

 [7] Randall’s drawings are used (with permission) liberally throughout this site.

 [8] Waterfall Chart – Wikipedia.

 [9] Open-High-Low-Close Chart – Wikipedia.

 [10] Surface Chart – AnyCharts.


# The peterjamesthomas.com Data Strategy Hub

Today we launch a new on-line resource, The Data Strategy Hub. This presents some of the most popular Data Strategy articles on this site and will expand in coming weeks to also include links to articles and other resources pertaining to Data Strategy from around the Internet.

If you have an article you have written, or one that you read and found helpful, please post a link in a comment here or in the actual Data Strategy Hub and I will consider adding it to the list.


# Data Visualisation according to a Four-year-old

When I recently published the latest edition of The Data & Analytics Dictionary, I included an entry on Charts which briefly covered a number of the most frequently used ones. Given that entries in the Dictionary are relatively brief [1] and that its layout allows little room for illustrations, I decided to write an expanded version as an article. This will be published in the next couple of weeks (UPDATE: now published as A Picture Paints a Thousand Numbers).

One of the exhibits that I developed for this charts article was to illustrate the use of Bubble Charts. Given my childhood interest in Astronomy, I came up with the following – somewhat whimsical – exhibit:

Bubble Charts are used to plot three dimensions of data on a two dimensional graph. Here the horizontal axis is how far each of the gas and ice giants is from the Sun [2], the vertical axis is how many satellites each planet has [3] and the final dimension – indicated by the size of the “bubbles” – is the actual size of each planet [4].

Anyway, I thought it was a prettier illustration of the utility of Bubble Charts than the typical market size analysis they are often used to display.

However, while I was doing this, my older daughter wandered into my office and said “look at the picture I drew for you Daddy” [5]. Coincidentally my muse had been her muse and the result is the Data Visualisation appearing at the top of this article. Equally coincidentally, my daughter had also encoded three dimensions of data in her drawing:

1. Rank of distance from the Sun
2. Colour / appearance
3. Number of satellites [6]

She also started off trying to capture relative size. After a great start with Mercury, Venus and Earth, she then ran into some Data Quality issues with the later planets (she is only four).

Here is an annotated version:

I think I’m at least OK at Data Visualisation, but my daughter’s drawing rather knocked mine into a cocked hat [7]. And she included a comet, which makes any Data Visualisation better in my humble opinion; what Chart would not benefit from the inclusion of a comet?

Notes

 [1] For me at least that is.

 [2] Actually the measurement is the closest that each planet comes to the Sun, its perihelion.

 [3] This may seem a somewhat arbitrary thing to plot, but a) the exhibit is meant to be illustrative only and b) there does nevertheless seem to be a correlation of sorts; I’m sure there is some Physical reason for this, which I’ll have to look into sometime.

 [4] Bubble Charts typically offer the option to scale bubbles such that either their radius / diameter or their area is in proportion to the value to be displayed. I chose the equatorial radius as my metric.

 [5] It has to be said that this is not an atypical occurrence.

 [6] For at least the four rocky planets; it might have taken a while to draw all 79 of Jupiter’s moons.

 [7] I often check my prose for phrases that may be part of British idiom but not used elsewhere. In doing this, I learnt today that “knock into a cocked hat” was originally an American phrase; it is first found in the 1830s.
