This Structure has Novel Features which are of Considerable Business Interest

A skilled practitioner, hard at work developing elements of a Structured Reporting Framework
© Jennifer Thomas Photography
 
  For anyone who is unaware, the title of the article echoes a 1953 Nature paper [1], which was instead “of considerable biological interest” [2]  

 
Introduction

I have been very much focussing on the start of a data journey in a series of recent articles about Data Strategy [3]. Here I shift attention to a later stage of the journey [4] and attempt to answer the question: “How best to deliver information to an organisation in a manner that will encourage people to use it?” There are several activities that need to come together here: requirements gathering that is centred on teasing out the critical business questions to be answered, data repository [5] design and the overall approach to education and communication [6]. However, I want to focus on a further pillar in the edifice of easily accessible and comprehensible Insight and Information, a Structured Reporting Framework.

In my experience, Structured Reporting Frameworks are much misunderstood. It is sometimes assumed that they are shiny, expensive and inconsequential trinkets. Some espouse the opinion that the term is synonymous with Dashboards. Others claim that immense effort is required to create one. I have even heard people suggesting that good training materials are an alternative to such a framework. In actual fact, for a greenfield site, a Structured Reporting Framework should mostly be a byproduct of taking a best practice approach to delivering data capabilities. Even for brownfield sites, layering at least a decent approximation to a Structured Reporting Framework over existing data assets should not be a prohibitively lengthy or costly exercise if approached in the right way.

But I am getting ahead of myself: what exactly is a Structured Reporting Framework? Let’s answer this question by telling a story, well actually two stories…


 
The New Job

 
Chapter One
In which we are introduced to Jane and she makes a surprising discovery.

Jane woke up. It was good to be alive. The sun was shining, the birds were singing and she had achieved one of her lifetime goals only three brief months earlier. Yes Jane was now the Chief Executive Officer of a major organisation: Jane Doe, CEO – how that ran off the tongue. Today was going to be a good day. Later she kissed her husband and one-year-old goodbye: “have a lovely day with Daddy, little boy!”, parcelled her six-year-old into the car and dropped her off at school, before heading into work. It was early January and, on the drive in, Jane thought about the poor accountants who had had a truncated Christmas break while they wrestled the annual accounts into submission. She must remember to write an email thanking them all for their hard work. As she swept into the staff car park and slotted into the closest bay to the entrance – that phrase again: “Jane Doe, CEO” in shiny black letters above her space – she felt a warm glow of pride and satisfaction.

Jane sank into the padded leather chair in her spacious corner office, flipped open her MacBook Air and saw a note from her CFO. As she clicked, thoughts of pleasant meetings with investors crossed her mind. Thoughts of basking in the sort of market-beating results that the company had always posted. And then she read the mail…

… unprecedented deterioration in sales …

… many customers switched to a competitor …

… prices collapsed precipitously …

… costs escalated in Q4, the reasons are unclear …

… unexpected increase in bad debts …

… massive loss …

… capital erosion …

… issues are likely to continue and maybe increase …

… if nothing changes, potential bankruptcy …

… sorry Jane, nobody saw this coming!

Shaken, Jane wondered whether at least one person had seen this coming, her predecessor as CEO who had been so keen to take early retirement. Was there some insight as to the state of the business that he had been privy to and hidden from his fellow executives? There had been no sign, but maybe his gut had told him that bad things were coming.

Pushing such unhelpful thoughts aside, Jane began to ask herself more practical questions. How was she going to face the investors, and the employees? What was she going to do? And, she decided most pertinent of all, what exactly just happened and why?


 
In an Alternative Reality

Jane Doe - Happy CEO

 
Chapter One′
In which we have already met Jane and there are precious few surprises.
 
  Jane did some stuff before arriving at work which I won’t bore the reader with unnecessarily again. Cut to Jane opening an email from her CFO…  

… it’s not great, profit is down 10% …

… but our customer retention strategy is starting to work …

… we have been able to set a floor on prices …

… the early Q4 blip in expenses is now under control …

… I’m still worried about The Netherlands …

… but we are doing better than the competition …

… at least we saw this coming last year and acted!

Jane opened up her personal dashboard, which already showed the headline figures the CFO had been citing. She clicked a filter and the display changed to show the Netherlands operations. Still glancing at the charts and numbers, she dialled Amsterdam.

“Hi Luuk, I hope you had a good break.”

“Sure Jane, how about you?”

“Good Luuk, good thank you. How about you catch me up on how things are going?”

“Of course Jane, let me pull up the numbers… Now we both know that the turnaround has been poorer here than elsewhere. Let me show you what we think is the issue and explain what we are doing. If you can split the profit and loss figures by product first and order by ascending profit.”

“OK Luuk, I’ve done that.”

“Great. Now it’s obvious that a chunk of the losses, indeed virtually all of them, are to do with our Widget Q range. I’m sure you knew that anyway, but now let’s focus on Widget Q and break it down by territory. It’s pretty clear that the Rotterdam area is where we have a problem.”

“I see that Luuk, I did some work on these numbers myself over the weekend. What else can you tell me?”

“Well, hopefully I can provide some local colour Jane. Let’s look at the actual sales and then filter these by channel. Do you see what I see?”

“I do Luuk, what is driving this problem in sales via franchises?”

“Well, in my review of November, I mentioned a start-up competitor in the Widget Q sector. If you recall, they had launched an app for franchises which helps them to run their businesses and also makes it easy to order Widget Q equivalents from their catalogue. Well, I must admit that I didn’t envisage it having this level of impact. But at least we can see what is happening.

The app is damaging us, but it’s still early days and I believe we have a narrow window within which we can respond. When I discussed these same figures with my sales team earlier, they came up with what I think is a sound strategy to counterpunch.

Let me take you through what they suggested and link it back to these figures…”

The call with Luuk had assured Jane that the Netherlands would soon be back on track. She reflected that it was going to be tough to present the annual report to investors, but at least the early warning systems had worked. She had begun to see the problems start to build up in her previous role as EVP of UK and Ireland, not only in her figures, but in those of her counterparts around the world. Jane and her predecessor had jointly developed an evidence-based plan to address the emerging threats. The old CEO had retired, secure in the knowledge that Jane had the tools to manage what otherwise might have become a crisis. He also knew that, with Jane’s help, he had acted early and acted decisively.

Jane thought about how clear discussions about unambiguous figures had helped to implement the defensive strategy, calibrate it for local markets and allowed her and her team to track progress. She could only imagine what things would have been like if everybody was not using the same figures to flag potential problems, diagnose them, come up with solutions and test that the response was working. She shuddered to think how differently things might have gone without these tools…


 
The lie through which we tell the truth [7]

Schrödinger's Profitability

I know, I know! Don’t worry, I’m not going to give up my day job and instead focus on writing the next great British novel [8]. Equally I have no plans to author a scientific paper on Schrödinger’s Profitability, no matter how tempting. It may burst the bubble of those who have been marvelling at the depth of my creative skills, but in fact neither of the above stories is entirely fictional. Instead they are based on my first-hand experience of how access to timely, accurate and pertinent information and insight can be the difference between organisational failure and organisational success. The way that Jane and her old boss were able to identify issues and formulate a strategic response is a characteristic of a Structured Reporting Framework. The way that Jane and Luuk were able to discuss identical figures and to drill into the detail behind them is another such characteristic. Structured Reporting Frameworks are about making sure that everyone in an organisation uses the same figures and ensuring that these figures are easy to find and easy to understand.

To show how this works, let’s consider a schematic [9]:

A Structured Reporting Framework leads people logically and seamlessly from a high-level perspective of performance to more granular information exposing what factors are driving this performance. This functionality is canonically delivered by a series of tailored dashboards, each supported by lower-level dashboards, analysis facilities and reports (the last of which should be limited in number).

Busy Executives and Managers have their information needs best served via visual exhibits that are focussed on their areas of priority and highlight things that are of specific concern to them. Some charts or tables may be replicated across a number of dashboards, but others will be specific to a particular area of the business. If further attention is necessary (e.g. an indicator turns red), dashboard users should have the ability to investigate the causes themselves, if necessary drilling through to detailed transactional information. Symmetrically, more junior staff, engaged in the day-to-day operation of the organisation, need up-to-date (often real-time) information relating to their area, but may also need to set this within a broader business context. This means accessing more general exhibits, for example moving from a list of recent transactions to a historical perspective covering the last two years.

Importantly, when a CEO like Jane Doe drills through from their dashboard all the way to a list report, this is the identical report, with the identical figures, that front-line staff use day-to-day. When Jane picks up the ‘phone to ask a question of someone, regardless of whether they are a Country Manager or an operations person, the figures that both see will be the same.

When not accessed from dashboards, reports and analysis facilities should be grouped into a simple menu hierarchy that allows users to navigate with ease and find what they need without having to trawl through 30 reports, each with cryptic titles. As mentioned above, there should be a limited number of highly functional / customisable reports and analysis facilities, each with a crystal-clear purpose.

The way that this consistency of figures is achieved is by all elements of the Structured Reporting Framework drawing their data from the same data repositories. In a modern Data Architecture, this tends to mean two repositories, an Analytical one delivering insight and an Operational one delivering information; these would obviously be linked to each other as well.
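To make the idea concrete, here is a minimal, hypothetical sketch in Python of “one set of figures, many levels of detail”: the dashboard headline and the drilled-through detail both come from the same repository rows and the same measure definition, so they can never disagree. The transactions, names and figures below are invented for illustration; in a real implementation the shared definitions would live in the warehouse or the BI platform’s semantic layer rather than in application code.

```python
# A minimal sketch: the dashboard headline and the drill-through detail share
# one repository and one measure definition, so the figures always agree.
# All table contents, names and figures are invented for illustration.

from collections import namedtuple

Transaction = namedtuple("Transaction", "country product revenue cost")

# Stand-in for the shared data repository (warehouse or curated lake).
TRANSACTIONS = [
    Transaction("Netherlands", "Widget Q", 120.0, 150.0),
    Transaction("Netherlands", "Widget P", 200.0, 160.0),
    Transaction("UK & Ireland", "Widget Q", 310.0, 240.0),
    Transaction("UK & Ireland", "Widget P", 280.0, 210.0),
]

def profit(rows):
    """The single shared definition of 'Profit' used by every exhibit."""
    return sum(r.revenue - r.cost for r in rows)

def dashboard_headline(rows):
    """Top of the framework: one figure for the CEO's dashboard tile."""
    return {"Total profit": profit(rows)}

def drill_through(rows, **filters):
    """Lower level: the same rows and the same measure, just filtered."""
    selected = [r for r in rows
                if all(getattr(r, field) == value for field, value in filters.items())]
    return {"Profit": profit(selected), "Detail rows": selected}

if __name__ == "__main__":
    print(dashboard_headline(TRANSACTIONS))                      # what Jane sees
    print(drill_through(TRANSACTIONS, country="Netherlands"))    # what Luuk sees
    print(drill_through(TRANSACTIONS, country="Netherlands",
                        product="Widget Q"))                     # the problem area
```

Because every exhibit calls the same `profit()` definition against the same rows, drilling from the headline to the detail simply narrows the filter; it never changes the arithmetic.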


 
Banishing some Misconceptions

Banishing Misconceptions

I started by saying that some people make the mistake of thinking that a Structured Reporting Framework is an optional extra in a modern data landscape. In fact it is the crucial final link between an organisation’s data and the people who need to use it. In many ways, how people experience data capabilities will be determined by this final link. Without paying attention to this, your shiny warehouse or data lake will be a technological curiosity, not an indispensable business tool. When the sadly common refrain of “we built state-of-the-art data capabilities, why is no one using them?” is heard, the lack of a Structured Reporting Framework is often the root cause of poor user adoption.

When building a data architecture from scratch, the elements of your data repository should be so aligned with business needs that overlaying them with a Structured Reporting Framework is a relatively easy task. But even an older and more fragmented data landscape can be improved at minimal cost by better organising current reports into more user-friendly menus [10] and by introducing some dashboards as alternative access points to them. Work is clearly required to do this, which might include some tweaks to the underlying repositories, but it does not normally require re-writing all reports from scratch. Such work can be approached pragmatically and incrementally, perhaps revamping reports for a given function, such as sales, before moving on to the next area. This way business value is also drip-fed to the organisation.


 
I hope that this article will encourage some people to look at the idea of Structured Reporting Frameworks again. My experience is that attention paid to this concept can reap great returns at costs that can be much lower than you might expect.

It is worth thinking hard about which version of Jane Doe, CEO you want to be: the one in the dark reacting too late to events, or the one benefiting from the illumination provided by a Structured Reporting Framework.


 
If you would like to learn more about the impact that a Structured Reporting Framework can have on your organisation, or want to understand how to implement one, then you can get in contact via the form provided. You can also speak to us on +44 (0) 20 8895 6826.


Notes

 
[1]
 
WATSON, J., CRICK, F. Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature 171, 737–738 (1953).
 
[2]
 
From what I have gleaned from those who knew (know in Watson’s case) the pair, neither was (is) the most modest of men. I therefore ascribe this not insubstantial understatement to either the editors at Nature or common-or-garden litotes.
 
[3]
 
All of which are handily collected into our Data Strategy Hub.
 
[4]
 
Though not necessarily much later if you adopt an incremental approach to the delivery of Data Capabilities.
 
[5]
 
Be that Curated Data Lake or Conformed Data Warehouse.
 
[6]
 
See the Cultural Transformation section of my repository of Keynote Articles.
 
[7]
 
Albert Camus, referring to fiction in L’Étranger.
 
[8]
 
I still have my work cut out to finish my factual book, Glimpses of Symmetry.
 
[9]
 
This is a simplified version of one that I use in my own data consulting work.
 
[10]
 
Ideally rationalising and standardising look and feel and terminology at the same time.

peterjamesthomas.com

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

The latest edition of The Data & Analytics Dictionary is now out

The Data and Analytics Dictionary

After a hiatus of a few months, the latest version of the peterjamesthomas.com Data and Analytics Dictionary is now available. It includes 30 new definitions, some of which have been contributed by people like Tenny Thomas Soman, George Firican, Scott Taylor and Taru Väre. Thanks to all of these for their help.

  1. Analysis
  2. Application Programming Interface (API)
  3. Business Glossary (contributor: Tenny Thomas Soman)
  4. Chart (Graph)
  5. Data Architecture – Definition (2)
  6. Data Catalogue
  7. Data Community
  8. Data Domain (contributor: Taru Väre)
  9. Data Enrichment
  10. Data Federation
  11. Data Function
  12. Data Model
  13. Data Operating Model
  14. Data Scrubbing
  15. Data Service
  16. Data Sourcing
  17. Decision Model
  18. Embedded BI / Analytics
  19. Genetic Algorithm
  20. Geospatial Data
  21. Infographic
  22. Insight
  23. Management Information (MI)
  24. Master Data – additional definition (contributor: Scott Taylor)
  25. Optimisation
  26. Reference Data (contributor: George Firican)
  27. Report
  28. Robotic Process Automation
  29. Statistics
  30. Self-service (BI or Analytics)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form; we look forward to hearing from you [1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.
 


Notes

 
[1]
 
Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.


 

More Definitions in the Data and Analytics Dictionary

The Data and Analytics Dictionary

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. Here are 20 new definitions, including the first from other contributors (thanks Tenny!):

  1. Artificial Intelligence Platform
  2. Data Asset
  3. Data Audit
  4. Data Classification
  5. Data Consistency
  6. Data Controls
  7. Data Curation (contributor: Tenny Thomas Soman)
  8. Data Democratisation
  9. Data Dictionary
  10. Data Engineering
  11. Data Ethics
  12. Data Integrity
  13. Data Lineage
  14. Data Platform
  15. Data Strategy
  16. Data Wrangling (contributor: Tenny Thomas Soman)
  17. Explainable AI (contributor: Tenny Thomas Soman)
  18. Information Governance
  19. Referential Integrity
  20. Testing Data (Training Data)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

People are now also welcome to contribute their own definitions. You can use the comments section here, or the dedicated form. Submissions will be subject to editorial review and are not guaranteed to be accepted.
 


 


 

Convergent Evolution

Ichthyosaur and Dolphin

No, this article has not escaped from my Maths & Science section; it is actually about data matters. But first of all, channelling Jennifer Aniston [1], “here comes the Science bit – concentrate”.


 
Shared Shapes

The Theory of Common Descent holds that any two organisms, extant or extinct, will have a common ancestor if you roll the clock back far enough. For example, each of fish, amphibians, reptiles and mammals had a common ancestor over 500 million years ago. As shown below, the current organism which is most like this common ancestor is the Lancelet [2].

Chordate Common Ancestor

To bring things closer to home, each of the Great Apes (Orangutans, Gorillas, Chimpanzees, Bonobos and Humans) had a common ancestor around 13 million years ago.

Great Apes Common Ancestor

So far so simple. As one would expect, animals sharing a recent common ancestor would share many attributes with both it and each other.

Convergent Evolution refers to something else. It describes where two organisms independently evolve very similar attributes that were not features of their most recent common ancestor. Thus these features are not inherited; instead evolutionary pressure has led to the same attributes developing twice. An example probably makes this simpler to understand.

The image at the start of this article is of an Ichthyosaur (top) and a Dolphin. It is striking how similar their body shapes are. They also share other characteristics, such as giving live birth to their young, tail first. The last Ichthyosaurs died out around 90 million years ago, well before the Dinosaurs and the other archosaurs met their end [3]. Dolphins are happily still with us, but the first toothed whale (not a Dolphin, but probably an ancestor of them) appeared only around 30 million years ago, and the ancestors of the modern Bottlenose Dolphin appeared a mere 5 million years ago. Thus there is a tremendous gap of time between the last Ichthyosaur and the proto-Dolphins. Ichthyosaurs were reptiles, covered in small scales [4]; Dolphins are mammals, covered in skin not massively different to our own. The most recent common ancestor of Ichthyosaurs and Dolphins probably lived around a quarter of a billion years ago and looked like neither of them. So the shape and other attributes shared by Ichthyosaurs and Dolphins do not come from a common ancestor; they developed independently (and millions of years apart) as adaptations to similar lifestyles as marine hunters. This is the essence of Convergent Evolution.

That was the Science, here comes the Technology…


 
A Brief Hydrology of Data Lakes

From 2000 to 2015, I had some success [5] with designing and implementing Data Warehouse architectures much like the following:

Data Warehouse Architecture (click to view larger version in a new window)

As a lot of my work then was in Insurance or related fields, the Analytical Repositories tended to be Actuarial Databases and / or Exposure Management Databases, developed in collaboration with such teams. Even back then, these were used for activities such as Analytics, Dashboards, Statistical Modelling, Data Mining and Advanced Visualisation.

Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises. Of course some architectures featured both paradigms as well.

One of the early promises of a Data Lake approach was that – once all relevant data had been ingested – this would be directly leveraged by Data Scientists to derive insight.

Over time, it became clear that it would be useful to also have some merged / conformed and cleansed data structures in the Data Lake. Once the output of Data Science began to be used to support business decisions, a need arose to consider how it could be audited and both data privacy and information security considerations also came to the fore.

Next, rather than just being the province of Data Scientists, there were moves to use Data Lakes to support general Data Discovery and even business Reporting and Analytics as well. This required additional investments in metadata.

The types of issues with Data Lake adoption that I highlighted in Draining the Swamp earlier this year also led to the advent of techniques such as Data Curation [6]. In parallel, concerns about expensive Data Science resources spending 80% of their time on Data Wrangling [7] led to the creation of a new role, that of the Data Engineer. These people take on much of the heavy lifting of consolidating, fixing and enriching datasets, allowing the Data Scientists to focus on Statistical Analysis, Data Mining and Machine Learning.
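To give a flavour of that heavy lifting, the following deliberately simplified Python sketch shows the sort of consolidation, fixing and enrichment a Data Engineer might perform before anything reaches a Data Scientist. The feeds, field names and reference data are all invented for illustration; real pipelines would of course run against proper data platforms rather than in-memory lists.

```python
# A hedged sketch of typical Data Engineering work: consolidate two raw feeds,
# fix obvious quality problems and enrich the result. Field names are invented.

RAW_FEED_A = [
    {"cust_id": "001", "country": "NL ", "spend": "120.5"},
    {"cust_id": "002", "country": "uk", "spend": None},
]
RAW_FEED_B = [
    {"customer": "003", "country": "Netherlands", "spend": "87.0"},
]

# Reference data used to standardise country values (enrichment).
COUNTRY_CODES = {"nl": "Netherlands", "uk": "United Kingdom",
                 "netherlands": "Netherlands"}

def clean(record, id_field):
    """Standardise one raw record: trim, map codes, default missing values."""
    return {
        "customer_id": record[id_field],
        "country": COUNTRY_CODES.get(record["country"].strip().lower(),
                                     record["country"].strip()),
        "spend": float(record["spend"]) if record["spend"] is not None else 0.0,
    }

def build_curated_set():
    """Merge both feeds into a single, conformed structure."""
    return ([clean(r, "cust_id") for r in RAW_FEED_A] +
            [clean(r, "customer") for r in RAW_FEED_B])

if __name__ == "__main__":
    for row in build_curated_set():
        print(row)
```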

Big Data Architecture (click to view larger version in a new window)

All of which leads to a modified Big Data / Data Lake architecture, embodying people and processes as well as technology and looking something like the exhibit above.

This is where the observant reader will see the concept of Convergent Evolution playing out in the data arena as well as the Natural World.


 
In Closing

Convergent Evolution of Data Architectures

Lest it be thought that I am saying that Data Warehouses belong to a bygone era, it is probably worth noting that the great reptiles, Ichthyosaurs included, dominated the Earth for far longer than the mammals have to date and were only dethroned by an asymmetric external shock, not by any flaw in their own finely honed characteristics.

Also, to be crystal clear, just as there are both similarities and clear differences between Ichthyosaurs and Dolphins, the same applies to Data Warehouse and Data Lake architectures. When you get into the details, differences between Data Lakes and Data Warehouses do emerge; there are capabilities that each has that are not features of the other. What is undoubtedly true, however, is that the same procedural and operational considerations that played a part in making some Warehouses seem unwieldy and unresponsive are also beginning to have the same impact on Data Lakes.

If you are in the business of turning raw data into actionable information, then there are inevitably considerations that will apply to any technological solution. The key lesson is that the shape of your architecture is going to be pretty similar, regardless of the technical underpinnings.


 
Notes

 
[1]
 
The two of us are constantly mistaken for one another.
 
[2]
 
To be clear the common ancestor was not a Lancelet, rather Lancelets sit on the branch closest to this common ancestor.
 
[3]
 
Ichthyosaurs are not Dinosaurs, but a different branch of ancient reptiles.
 
[4]
 
This is actually a matter of debate in paleontological circles, but recent evidence suggests small scales.
 
[5]
 
See:

 
[6]
 
A term that is unaccountably missing from The Data & Analytics Dictionary – something to add to the next release. UPDATE: Now remedied here.
 
[7]
 
Ditto. UPDATE: Now remedied here.

 



 

The revised and expanded Data and Analytics Dictionary

The Data and Analytics Dictionary

Since its launch in August of this year, the peterjamesthomas.com Data and Analytics Dictionary has received a welcome amount of attention with various people on different social media platforms praising its usefulness, particularly as an introduction to the area. A number of people have made helpful suggestions for new entries or improvements to existing ones. I have also been rounding out the content with some more terms relating to each of Data Governance, Big Data and Data Warehousing. As a result, The Dictionary now has over 80 main entries (not including ones that simply refer the reader to another entry, such as Linear Regression, which redirects to Model).

The most recently added entries are as follows:

  1. Anomaly Detection
  2. Behavioural Analytics
  3. Complex Event Processing
  4. Data Discovery
  5. Data Ingestion
  6. Data Integration
  7. Data Migration
  8. Data Modelling
  9. Data Privacy
  10. Data Repository
  11. Data Virtualisation
  12. Deep Learning
  13. Flink
  14. Hive
  15. Information Security
  16. Metadata
  17. Multidimensional Approach
  18. Natural Language Processing (NLP)
  19. On-line Transaction Processing
  20. Operational Data Store (ODS)
  21. Pig
  22. Table
  23. Sentiment Analysis
  24. Text Analytics
  25. View

It is my intention to continue to revise this resource. Adding some more detail about Machine Learning and related areas is probably the next focus.

As ever, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).
 


 


 

The peterjamesthomas.com Data and Analytics Dictionary

The Data and Analytics Dictionary

I find myself frequently being asked questions around terminology in Data and Analytics and so thought that I would try to define some of the more commonly used phrases and words. My first attempt to do this can be viewed in a new page added to this site (this also appears in the site menu):

The Data and Analytics Dictionary

I plan to keep this up-to-date as the field continues to evolve.

I hope that my efforts to explain some concepts in my main area of specialism are both of interest and utility to readers. Any suggestions for new entries or comments on existing ones are more than welcome.
 

 

Data Visualisation – A Scientific Treatment

Introduction

Diagram of the Causes of Mortality in the Army of the East (click to view a larger version in a new tab)

The above diagram was compiled by Florence Nightingale, who was – according to The Font – “a celebrated English social reformer and statistician, and the founder of modern nursing”. It is gratifying to see her less high-profile role as a number-cruncher acknowledged up-front and central; particularly as she died in 1910, eight years before women in the UK were first allowed to vote and eighteen before universal suffrage. This diagram is one of two which are generally cited in any article on Data Visualisation. The other is Charles Minard’s exhibit detailing the advance on, and retreat from, Moscow of Napoleon Bonaparte’s Grande Armée in 1812 (Data Visualisation had a military genesis in common with – amongst many other things – the internet). I’ll leave the reader to look at this second famous diagram if they want to; it’s just a click away.

While there are more elements of numeric information in Minard’s work (what we would now call measures), there is a differentiating point to be made about Nightingale’s diagram. This is that it was specifically produced to aid members of the British parliament in their understanding of conditions during the Crimean War (1853-56); particularly given that such non-specialists had struggled to understand traditional (and technical) statistical reports. Again, rather remarkably, we have here a scenario where the great and the good were listening to the opinions of someone who was barred from voting on the basis of lacking a Y chromosome. Perhaps more pertinently to this blog, this scenario relates to one of the objectives of modern-day Data Visualisation in business; namely explaining complex issues, which don’t leap off a page of figures, to busy decision makers, some of whom may not be experts in the specific subject area (another is of course allowing the expert to discern less than obvious patterns in large or complex sets of data). Fortunately most business decision makers don’t have to grapple with the progression in number of “deaths from Preventible or Mitigable Zymotic diseases” versus “deaths from wounds” over time, but the point remains.
 
 
Data Visualisation in one branch of Science

von Laue, Bragg Senior & Junior, Crowfoot Hodgkin, Kendrew, Perutz, Crick, Franklin, Watson & Wilkins

Coming much more up to date, I wanted to consider a modern example of Data Visualisation. As with Nightingale’s work, this is not business-focused, but contains some elements which should be pertinent to the professional considering the creation of diagrams in a business context. The specific area I will now consider is Structural Biology. For the incognoscenti (no advert for IBM intended!), this area of science is focussed on determining the three-dimensional shape of biologically relevant macro-molecules, most frequently proteins or protein complexes. The history of Structural Biology is intertwined with the development of X-ray crystallography by Max von Laue and father and son team William Henry and William Lawrence Bragg; its subsequent application to organic molecules by a host of pioneers including Dorothy Crowfoot Hodgkin, John Kendrew and Max Perutz; and – of greatest resonance to the general population – Francis Crick, Rosalind Franklin, James Watson and Maurice Wilkins’s joint determination of the structure of DNA in 1953.


X-ray diffraction image of the double helix structure of the DNA molecule, taken 1952 by Raymond Gosling, commonly referred to as “Photo 51”, during work by Rosalind Franklin on the structure of DNA

While the masses of data gathered in modern X-ray crystallography need computer software to extrapolate them to physical structures, things were more accessible in 1953. Indeed, it could be argued that Gosling and Franklin’s famous image, its characteristic “X” suggestive of two helices and thus driving Crick and Watson’s model building, is another notable example of Data Visualisation; at least in the sense of a picture (rather than numbers) suggesting some underlying truth. In this case, the production of Photo 51 led directly to the creation of the even more iconic image below (which was drawn by Francis Crick’s wife Odile and appeared in his and Watson’s seminal Nature paper[1]):

Odile and Francis Crick - structure of DNA

© Nature (1953)
Posted on this site under the non-commercial clause of the right-holder’s licence

It is probably fair to say that the visualisation of data which is displayed above has had something of an impact on humankind in the sixty years since it was first drawn.
 
 
Modern Structural Biology

The X-ray Free Electron Laser at Stanford

Today, X-ray crystallography is one of many tools available to the structural biologist with other approaches including Nuclear Magnetic Resonance Spectroscopy, Electron Microscopy and a range of biophysical techniques which I will not detain the reader by listing. The cutting edge is probably represented by the X-ray Free Electron Laser, a device originally created by repurposing the linear accelerators of the previous generation’s particle physicists. In general Structural Biology has historically sat at an intersection of Physics and Biology.

However, before trips to synchrotrons can be planned, the Structural Biologist often faces the prospect of stabilising their protein of interest, ensuring that they can generate sufficient quantities of it, successfully isolating the protein and finally generating crystals of appropriate quality. This process often consumes years, in some cases decades. As with most forms of human endeavour, there are few short-cuts and the outcome is at least loosely correlated to the amount of time and effort applied (though sadly with no guarantee that hard work will always be rewarded).
 
 
From the general to the specific

The Journal of Molecular Biology (October 2014)

At this point I should declare a personal interest, the example of Data Visualisation which I am going to consider is taken from a paper recently accepted by the Journal of Molecular Biology (JMB) and of which my wife is the first author[2]. Before looking at this exhibit, it’s worth a brief detour to provide some context.

In recent decades, the exponential growth in the breadth and depth of scientific knowledge (plus of course the velocity with which this can be disseminated), coupled with the increase in the range and complexity of techniques and equipment employed, has led to the emergence of specialists. In turn this means that, in a manner analogous to the early production lines, science has become a very collaborative activity; expert in stage one hands over the fruits of their labour to expert in stage two and so on. For this reason the typical scientific paper (and certainly those in Structural Biology) will have several authors, often spread across multiple laboratory groups and frequently in different countries. By way of example the previous paper my wife worked on had 16 authors (including a Nobel Laureate[3]). In this context, the fact the paper I will now reference was authored by just my wife and her group leader is noteworthy.

The reader may at this point be relieved to learn that I am not going to endeavour to explain the subject matter of my wife’s paper, nor the general area of biology to which it pertains (the interested are recommended to Google “membrane proteins” or “G Protein Coupled Receptors” as a starting point). Instead let’s take a look at one of the exhibits.

Click to view a larger version in a new tab

© The Journal of Molecular Biology (2014)
Posted on this site under a Creative Commons licence

The above diagram (in common with Nightingale’s much earlier one) attempts to show a connection between sets of data, rather than just the data itself. I’ll elide the scientific specifics here and focus on more general issues.

First the grey upper section with the darker blots on it – which is labelled (a) – is an image of a biological assay called a Western Blot (for the interested, details can be viewed here); each vertical column (labelled at the top of the diagram) represents a sub-experiment on protein drawn from a specific sample of cells. The vertical position of a blot indicates the size of the molecules found within it (in kilodaltons); the intensity of a given blot indicates how much of the substance is present. Aside from the headings and labels, the upper part of the figure is a photographic image and so essentially analogue data[4]. So, in summary, this upper section represents the findings from one set of experiments.

At the bottom – and labelled (b) – appears an artefact familiar to anyone in business, a bar-graph. This presents results from a parallel experiment on samples of protein from the same cells (for the interested, this set of data relates to the degree to which proteins in the samples bind to a specific radiolabelled ligand). The second set of data is taken from what I might refer to as a “counting machine” and is thus essentially digital. To be 100% clear, the bar chart is not a representation of the data in the upper part of the diagram; it pertains to results from a second experiment on the same samples. As indicated by the labelling, for a given sample, the column in the bar chart (b) is aligned with the column in the Western Blot above (a), connecting the two different sets of results.

Taken together the upper and lower sections[5] establish a relationship between the two sets of data. Again I’ll skip on the specifics, but the general point is that while the Western Blot (a) and the binding assay (b) tell us the same story, the Western Blot is a much more straightforward and speedy procedure. The relationship that the paper establishes means that just the Western Blot can be used to perform a simple new assay which will save significant time and effort for people engaged in the determination of the structures of membrane proteins; a valuable new insight. Clearly the relationships that have been inferred could equally have been presented in a tabular form instead and be just as relevant. It is however testament to the more atavistic side of humans that – in common with many relationships between data – a picture says it more surely and (to mix a metaphor) more viscerally. This is the essence of Data Visualisation.
 
 
What learnings can Scientific Data Visualisation provide to Business?

Scientific presentation (c/o Nature, but looks a lot like PhD Comics IMO)

Using the JMB exhibit above, I wanted to now make some more general observations and consider a few questions which arise out of comparing scientific and business approaches to Data Visualisation. I think that many of these points are pertinent to analysis in general.
 
 
Normalisation

Broadly, normalisation[6] consists of defining results in relation to some established yardstick (or set of yardsticks); displaying relative, as opposed to absolute, numbers. In the JMB exhibit above, the amount of protein solubilised in various detergents is shown with reference to the un-solubilised amount found in native membranes; these reference figures appear as 100% columns to the right and left extremes of the diagram.

The most common usage of normalisation in business is growth percentages. Here the fact that London business has grown by 5% can be compared to Copenhagen having grown by 10% despite total London business being 20-times the volume of Copenhagen’s. A related business example, depending on implementation details, could be comparing foreign currency amounts at a fixed exchange rate to remove the impact of currency fluctuation.

Normalised figures are very typical in science, but, aside from the growth example mentioned above, considerably less prevalent in business. In both avenues of human endeavour, the approach should be used with caution; something that increases 200% from a very small starting point may not be relevant, be that the result of an experiment or weekly sales figures. Bearing this in mind, normalisation is often essential when looking to present data of different orders of magnitude on the same graph[7]; the alternative often being that the smaller data is swamped by the larger, which is not always what is desirable.
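To make the London / Copenhagen example above concrete, here is a tiny Python illustration (the volumes are invented) of why normalised growth figures tell a different story from absolute increases:

```python
# A small worked example of normalisation: invented sales volumes for two
# offices. Absolute increases make Copenhagen look negligible; growth does not.

volumes = {
    "London":     {"last_year": 20_000_000, "this_year": 21_000_000},
    "Copenhagen": {"last_year":  1_000_000, "this_year":  1_100_000},
}

for city, v in volumes.items():
    increase = v["this_year"] - v["last_year"]
    growth = increase / v["last_year"] * 100
    print(f"{city}: absolute increase {increase:,}, growth {growth:.0f}%")

# London shows 5% growth on a base twenty times larger; Copenhagen shows 10%.
# Normalising (here, dividing by last year's volume) puts the two on a
# comparable footing, at the price of hiding the difference in scale.
```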
 
 
Controls

I’ll use an anecdote to illustrate this area from a business perspective. Imagine an organisation which (as you would expect) tracks the volume of sales of a product or service it provides via a number of outlets. Imagine further that it launches some sort of promotion, perhaps valid only for a week, and notices an uptick in these sales. It is extremely tempting to state that the promotion has resulted in increased sales[8].

However this cannot always be stated with certainty. Sales may have increased for some totally unrelated reason such as (depending on what is being sold) good or bad weather, a competitor increasing prices or closing one or more of their comparable outlets and so on. Equally perniciously, the promotion may have simply moved sales in time – people may have been going to buy the organisation’s product or service in the weeks following a promotion, but have brought the expenditure forward to take advantage of it. If this is indeed the case, an uptick in sales may well be due to the impact of a promotion, but will be offset by a subsequent decrease.

In science, it is this type of problem that the concept of control tests is designed to combat. As well as testing a result in the presence of substance or condition X, a well-designed scientific experiment will also be carried out in the absence of substance or condition X, the latter being the control. In the JMB exhibit above, the controls appear in the columns with white labels.

There are ways to make the business “experiment” I refer to above more scientific of course. In retail business, the current focus on loyalty cards can help, assuming that these can be associated with the relevant transactions. If the business is on-line then historical records of purchasing behaviour can be similarly referenced. In the above example, the organisation could decide to offer the promotion at only a subset of its outlets, allowing a comparison to those where no promotion applied. This approach may improve rigour somewhat, but of course it does not cater for purchases transferred from a non-promotion outlet to a promotion one (unless a whole raft of assumptions are made). There are entire industries devoted to helping businesses deal with these rather messy scenarios, but it is probably fair to say that it is normally easier to devise and carry out control tests in science.

The general take away here is that a graph which shows some change in a business output (say sales or profit) correlated to some change in a business input (e.g. a promotion, a new product launch, or a price cut) would carry a lot more weight if it also provided some measure of what would have happened without the change in input (not that this is always easy to measure).
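A hedged sketch of the “promotion versus control outlets” idea discussed above, with invented weekly sales figures, might look like this in Python:

```python
# Compare sales uplift at outlets running a promotion with a control group of
# outlets that did not. All outlet groupings and figures are invented.

promotion_outlets = {"week_before": [100, 120, 90],  "promo_week": [130, 150, 110]}
control_outlets   = {"week_before": [95, 110, 105],  "promo_week": [100, 112, 108]}

def uplift_percent(group):
    """Percentage change in total sales from the week before to the promo week."""
    before, after = sum(group["week_before"]), sum(group["promo_week"])
    return (after - before) / before * 100

promo = uplift_percent(promotion_outlets)
control = uplift_percent(control_outlets)

print(f"Promotion outlets uplift: {promo:.1f}%")
print(f"Control outlets uplift:   {control:.1f}%")
print(f"Uplift attributable to the promotion (naively): {promo - control:.1f} points")
# Without the control group, the full ~26% uplift would be credited to the
# promotion; the control suggests roughly 3% of it would have happened anyway.
```

As the closing comment notes, the control row is what stops the whole uplift being wrongly attributed to the promotion itself.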
 
 
Rigour and Scrutiny

I mention in the footnotes that the JMB paper in question includes versions of the exhibit presented above for four other membrane proteins, this being in order to firmly establish a connection. Looking at just the figure I have included here, each element of the data presented in the lower bar-graph area is based on duplicated or triplicated tests, with average results (and error bars – see the next section) being shown. When you consider that upwards of three months’ preparatory work could have gone into any of these elements and that a mistake at any stage during this time would have rendered the work useless, some impression of the level of rigour involved emerges. The result of this assiduous work is that the authors can be confident that the exhibits they have developed are accurate and will stand up to external scrutiny. Of course such external scrutiny is a key part of the scientific process and the manuscript of the paper was reviewed extensively by independent experts before being accepted for publication.

In the business world, such external scrutiny tends to apply most frequently to publicly published figures (such as audited Financial Accounts); of course external financial analysts will also look to dig into figures. There may be some internal scrutiny around both the additional numbers used to run the business and the graphical representations of these (and indeed some companies take this area very seriously), but not every internal KPI is vetted the way that the report and accounts are. Particularly in the area of Data Visualisation, there is a tension here. Graphical exhibits can have a lot of impact if they relate to the current situation or present trends; contrariwise, if they are substantially out-of-date, people may question their relevance. There is sometimes the expectation that a dashboard is just like its aeronautical counterpart, showing real-time information about what is going on now[9]. However a lot of the value of Data Visualisation is not about the here and now so much as trends and explanations of the factors behind the here and now. A well-thought-out graph can tell a very powerful story, more powerful for most people than a table of figures. However a striking graph based on poor quality data, data which has been combined in the wrong way, or even – as sometimes happens – the wrong datasets entirely, can tell a very misleading story and lead to the wrong decisions being taken.

I am not for a moment suggesting here that every exhibit produced using Data Visualisation tools must be subject to months of scrutiny. As referenced above, in the hands of an expert such tools have the value of sometimes quickly uncovering hidden themes or factors. However, I would argue that – as in science – if the analyst involved finds something truly striking, an association which he or she feels will really resonate with senior business people, then double- or even triple-checking the data would be advisable. Asking a colleague to run their eye over the findings and to then probe for any obvious mistakes or weaknesses sounds like an appropriate next step. Internal Data Visualisations are never going to be subject to peer-review, however their value in taking sound business decisions will be increased substantially if their production reflects at least some of the rigour and scrutiny which are staples of the scientific method.
 
 
Dealing with Uncertainty

In the previous section I referred to the error bars appearing on the JMB figure above. Error bars are acknowledgements that what is being represented is variable and they indicate the extent of such variability. When dealing with a physical system (be that mechanical or – as in the case above – biological), behaviour is subject to many factors, not all of which can be eliminated or adjusted for and not all of which are predictable. This means that repeating an experiment under ostensibly identical conditions can lead to different results[10]. If the experiment is well-designed and if the experimenter is diligent, then such variability is minimised, but never eliminated. Error bars are a recognition of this fundamental aspect of the universe as we understand it.

While de rigueur in science, error bars seldom make an appearance in business, even – in my experience – in estimates of business measures which emerge from statistical analyses[11]. Even outside the realm of statistically generated figures, more business measures are subject to uncertainty than might initially be thought. An example here might be a comparison (perhaps as part of the externally scrutinised report and accounts) of the current quarter’s sales to the previous one (or the same one last year). In companies where sales may be tied to – for example – the number of outlets, care is taken to make these figures like-for-like. This might include only showing numbers for outlets which were in operation in the prior period and remain in operation now (i.e. excluding sales from both closed and newly opened outlets). However, outside the area of high-volume low-value sales where the Law of Large Numbers[12] rules, other factors could substantially skew a given quarter’s results for many organisations. Something as simple as a key customer delaying a purchase (so that it fell in Q3 this year instead of Q2 last) could have a large impact on quarterly comparisons. Again companies will sometimes look to include adjustments to cater for such timing or related issues, but this cannot be a precise process.

The main point I am making here is that much of the information produced in companies is uncertain. The cash transactions in a quarter are of course the cash transactions in a quarter, but the above scenario suggests that they may not always 100% reflect actual business conditions (and you cannot adjust for everything). Equally, where you get into figures that would be part of most companies’ financial results, such as outstanding receivables and the allowance for bad debts, the spectre of uncertainty arises again without a statistical model in sight. In many industries, regulators are pushing for companies to include more forward-looking estimates of future assets and liabilities in their Financials. While this may be a sensible reaction to recent economic crises, the approach inevitably leads to more figures being produced from models. Even when these models are subject to external review, as is the case with most regulatory-focussed ones, they are still models and there will be uncertainty around the numbers that they generate. While companies will often provide a range of estimates for things like guidance on future earnings per share, providing a range of estimates for historical financial exhibits is not really a mainstream activity.

Which perhaps gets me back to the subject of error bars on graphs. In general I think that their presence in Data Visualisations can only add value, not subtract it. In my article entitled Limitations of Business Intelligence I include the following passage which contains an exhibit showing how the Bank of England approaches communicating the uncertainty inevitably associated with its inflation estimates:

Business Intelligence is not a crystal ball, Predictive Analytics is not a crystal ball either. They are extremely useful tools […] but they are not universal panaceas.

The Old Lady of Threadneedle Street is clearly not a witch
An inflation prediction from The Bank of England
Illustrating the fairly obvious fact that uncertainty increases in proportion to time from now.

[…] Statistical models will never give you precise answers to what will happen in the future – a range of outcomes, together with probabilities associated with each is the best you can hope for (see above). Predictive Analytics will not make you prescient, instead it can provide you with useful guidance, so long as you remember it is a prediction, not fact.

While I can’t see them figuring in formal financial statements any time soon, perhaps there is a case for more business Data Visualisations to include error bars.
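As a closing, hedged illustration (the weekly figures are invented and the standard error of the mean is just one possible measure of variability), adding error bars to an everyday business bar chart takes only a few lines with standard Python tooling:

```python
# A minimal sketch of adding error bars to a business exhibit, assuming you
# have several comparable observations (here, invented weekly sales figures)
# behind each quarterly bar rather than a single point estimate.

import numpy as np
import matplotlib.pyplot as plt

weekly_sales = {                      # invented figures, thousands per week
    "Q1": [210, 195, 220, 205, 215, 200, 190, 225, 210, 205, 215, 220, 200],
    "Q2": [230, 240, 210, 250, 235, 245, 225, 255, 240, 230, 250, 245, 235],
}

quarters = list(weekly_sales)
means = [np.mean(weekly_sales[q]) for q in quarters]
# Standard error of the mean as a simple, honest indication of variability.
errors = [np.std(weekly_sales[q], ddof=1) / np.sqrt(len(weekly_sales[q]))
          for q in quarters]

plt.bar(quarters, means, yerr=errors, capsize=8, color="steelblue")
plt.ylabel("Average weekly sales (£000)")
plt.title("Quarterly sales with an indication of uncertainty")
plt.savefig("sales_with_error_bars.png")
```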
 
 
In Summary

So, as is often the case, I have embarked on a journey. I started with an early example of Data Visualisation, diverted into a particular branch of science with which I have some familiarity and hopefully returned, again as is often the case, to make some points which I think are pertinent to both the Business Intelligence practitioner and the consumers (and indeed commissioners) of Data Visualisations. Back in “All that glisters is not gold” – some thoughts on dashboards I made some more general comments about the best Data Visualisations having strong informational foundations underpinning them. While this observation remains true, I do see a lot of value in numerically able and intellectually curious people using Data Visualisation tools to quickly make connections which had not been made before and to tease out patterns from large data sets. In addition there can be great value in using Data Visualisation to present more quotidian information in a more easily digestible manner. However I also think that some of the learnings from science which I have presented in this article suggest that – as with all powerful tools – appropriate discretion on the part of the people generating Data Visualisation exhibits and on the part of the people consuming such content would be prudent. In particular the business equivalents of establishing controls, applying suitable rigour to data generation / combination and including information about uncertainty on exhibits where appropriate are all things which can help make Data Visualisation more honest and thus – at least in my opinion – more valuable.
 


 
Notes

 
[1]
 
Watson, J.D., Crick, F.H.C. (1953). Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171, 737–738.
 
[2]
 
Thomas, J.A., Tate, C.G. (2014). Quality Control in Eukaryotic Membrane Protein Overproduction. J. Mol. Biol. [Epub ahead of print].
 
[3]
 
The list of scientists involved in the development of X-ray Crystallography and Structural Biology which was presented earlier in the text encompasses a further nine such laureates (four of whom worked at my wife’s current research institute), though sadly this number does not include Rosalind Franklin. Over 20 Nobel Prizes have been awarded to people working in the field of Structural Biology, you can view an interactive time line of these here.
 
[4]
 
The intensity, size and position of blots are often digitised by specialist software, but this is an aside for our purposes.
 
[5]
 
Plus four other analogous exhibits which appear in the paper and relate to different proteins.
 
[6]
 
Normalisation has a precise mathematical meaning, actually (somewhat ironically for that most precise of activities) more than one. Here I am using the term more loosely.
 
[7]
 
That’s assuming you don’t want to get into log scales, something I have only come across once in over 25 years in business.
 
[8]
 
The uptick could be as compared to the week before, or to some other week (e.g. the same one last year or last month maybe) or versus an annual weekly average. The change is what is important here, not what the change is with respect to.
 
[9]
 
Of course some element of real-time information is indeed both feasible and desirable; for more analytic work (which encompasses many aspects of Data Visualisation) what is normally more important is sufficient historical data of good enough quality.
 
[10]
 
Anyone interested in some of the reasons for this is directed to my earlier article Patterns patterns everywhere.
 
[11]
 
See my series of three articles on Using historical data to justify BI investments for just one example of these.
 
[12]
 
But then 1=2 for very large values of 1

 

 

Using multiple business intelligence tools in an implementation – Part II

Rather unsurprisingly, this article follows on from: Using multiple business intelligence tools in an implementation – Part I.

On further reflection about this earlier article, I realised that I missed out one important point. This was perhaps implicit in the diagram that I posted (and which I repeat below), but I think that it makes sense for me to make things explicit.

An example of a multi-tier BI architecture with different tools

The point is that in this architecture, with different BI tools in different layers, it remains paramount to have consistency in terminology and behaviour for dimensions and measures. So “Country” and “Profit” must mean the same things in your dashboard as they do in your OLAP cubes. The way that I have achieved this before is to have virtually all of the logic defined in the warehouse itself. Of course some things may need to be calculated “on-the-fly” within the BI tool; in this case care needs to be taken to ensure consistency.
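By way of a minimal sketch of this principle (the measure names, expressions and the two “front ends” below are invented, and in practice the definitions would live in the warehouse or a semantic layer rather than in application code), the idea is that every tool evaluates the same definition of a measure rather than re-implementing it:

```python
# A sketch of a single, shared set of measure definitions that every BI layer
# reads, so that "Profit" is calculated identically everywhere. Names invented.

SHARED_MEASURES = {
    # One definition per measure, maintained alongside the warehouse.
    "Sales":  lambda row: row["revenue"],
    "Profit": lambda row: row["revenue"] - row["cost"],
}

def evaluate(measure_name, rows):
    """Apply a shared measure definition to a set of warehouse rows."""
    formula = SHARED_MEASURES[measure_name]
    return sum(formula(r) for r in rows)

# Two different "front ends" consuming the same definitions.
def dashboard_widget(rows):
    return {"Profit": evaluate("Profit", rows)}

def olap_cell(rows, country):
    subset = [r for r in rows if r["country"] == country]
    return {"country": country, "Profit": evaluate("Profit", subset)}

if __name__ == "__main__":
    warehouse_rows = [
        {"country": "Netherlands", "revenue": 120.0, "cost": 90.0},
        {"country": "UK & Ireland", "revenue": 200.0, "cost": 150.0},
    ]
    print(dashboard_widget(warehouse_rows))
    print(olap_cell(warehouse_rows, "Netherlands"))
```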

It has been pointed out that the approach of using the warehouse to drive consistency may circumscribe your ability to fully exploit the functionality of some BI tools. While this is sometimes true, I think it is not just a price worth paying, but a price that it is mandatory to pay. Inconsistency of any kind is the enemy of all BI implementations. If your systems do not have credibility with your users, then all is already lost and no amount of flashy functionality will save you.
 

Using multiple business intelligence tools in an implementation – Part I


Introduction

This post follows on from a question that was asked on the LinkedIn.com Data Warehousing Institute (TDWI™) 2.0 group. Unfortunately the original thread is no longer available for whatever reason, but the gist of the question was whether anyone had experience with using a number of BI tools to cover different functions within an implementation. So the scenario might be: Tool A for dashboards, Tool B for OLAP, Tool C for Analytics, Tool D for formatted reports and even Tool E for visualisation.

In my initial response I admitted that I had not faced precisely this situation, but that I had worked with the set-up shown in the following diagram, which I felt was not that dissimilar:

An example of a multi-tier BI architecture with different tools

Here there is no analytics tool (in the statistical modelling sense – Excel played that role) and no true visualisation (unless you count graphs in PowerPlay), but dashboards, OLAP cubes, formatted reports and simple list reports are all present. The reason that this arrangement might not at first sight appear pertinent to the question asked on LinkedIn.com is that two of the layers (and three of the report technologies) are from one vendor: Cognos at the time, IBM-Cognos now. The reason that I nevertheless felt there was some relevance was that the Cognos products were from different major releases: the dashboard tool was from their Version 8 architecture, while the OLAP cubes and formatted reports were from their Version 7 architecture.
 
 
A little history

London Bridge circa 1600

Maybe a note of explanation is necessary, as clearly we did not plan to have this slight mismatch of technologies. We initially built out our BI infrastructure without a dashboard layer. Partly this was because dashboards weren’t as much of a hot topic for CEOs when we started. However, I also think it makes sense to overlay dashboards on an established information architecture (something I cover in my earlier article, “All that glisters is not gold” – some thoughts on dashboards, which is also pertinent to these discussions).

When we started to think about adding icing to our BI cake, ReportStudio in Cognos 8 had just come out and we thought that it made sense to look at this; both to deliver dashboards and to assess its potential future role in our BI implementation. At that point, the initial Cognos 8 version of Analysis Studio wasn’t an attractive upgrade path for existing PowerPlay users and so we wanted to stay on PowerPlay 7.3 for a while longer.

The other thing that I should mention is that we had integrated an in-house developed web-based reporting tool with PowerPlay as the drill down tool. The reasons for this were a) we had already trained 750 users in this tool and it seemed sensible to leverage it and b) employing it meant that we didn’t have to buy an additional Cognos 7 product, such as Impromptu, to support this need. This hopefully explains the mild heterogeneity of our set up. I should probably also say that users could directly access any one of the BI tools to get at information and that they could navigate between them as shown by the arrows in the diagram.

I am sure that things have improved immensely in the Cognos toolset since then, but at the time there was no truly seamless integration between ReportStudio and PowerPlay as they were on different architectures. This meant that we had to code the passing of parameters between the ReportStudio dashboard and the PowerPlay cubes ourselves. Although there were some similarities between the two products, there were also some differences at the time and these, plus the custom integration we had to develop, meant that you could view the two Cognos products as essentially separate tools. Add in the additional custom integration of our in-house reporting application with PowerPlay and maybe you can begin to see why I felt that there were some similarities between our implementation and one using different vendors for each tool.
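Purely for illustration (and emphatically not the actual Cognos integration), the sort of glue code involved amounts to little more than encoding the user’s current context and handing it to the drill-down tool. A hypothetical Python sketch, with invented parameter names and URL:

    # An illustrative sketch (not the actual Cognos integration) of the sort of
    # glue needed to pass context from a dashboard to a drill-down tool.
    # All parameter names and the target URL are hypothetical.
    from urllib.parse import urlencode

    def drill_through_url(base_url, country, business_unit, period):
        """Build a link that re-establishes the user's current context in the target tool."""
        params = {
            "country": country,
            "businessUnit": business_unit,
            "period": period,
        }
        return f"{base_url}?{urlencode(params)}"

    # e.g. attached to a click on the "France / Retail / 2009-06" cell of a dashboard
    print(drill_through_url("https://bi.example.com/cubes/sales", "France", "Retail", "2009-06"))

The hard part is usually agreeing which elements of context need to be carried across, rather than the code itself.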

I am going to speak a bit about the benefits and disadvantages of a single-vendor approach later, but for now an obvious question is: “did our set-up work?” The answer was a resounding yes. Though the IT work behind the scenes was maybe not the most elegant (everything was nevertheless eminently supportable), from the users’ perspective things were effectively seamless. To slightly pre-empt a later point, I think that the user experience is what really matters, more than what happens on the IT side of the house. Nevertheless, let’s move on from these specifics to some general comments.
 
 
The advantages of a single vendor approach to BI

One-stop shopping

I think that it makes sense if I lay my cards on the table up-front. I am a paid up member of the BI standardisation club. I think that you only release the true potential of BI when you take a broad based approach and bring as many areas as you can into your warehouse (see my earlier article, Holistic vs Incremental approaches to BI, for my reasons for believing this).

Within the warehouse itself there should be a standardised approach to dimensions (business entities and the hierarchies they are built into should be the same everywhere – I’m sure this will please all my MDM friends out there) and to measures (what is the point if profitability is defined in different ways in different reports?). It is almost clichéd nowadays to speak about “the single version of the truth”, but I have always been a proponent of this approach.

I also think that you should have the minimum number of BI tools. Here however the minimum is not necessarily always one. To misquote one of Württemberg’s most famous sons:

Everything should be made as simple as possible, but no simpler.

What he actually said was:

It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.

but maybe the common rendition is itself paying tribute to the principle that he propounded. Let me pause to cover the main reasons usually quoted for adopting a single-vendor approach in BI:

  1. Consistent look-and-feel: The tools will have a common look-and-feel, making it easier for people to use them and simplifying training.
  2. Better interoperability: Interoperability between the tools is out-of-the-box, saving on time and effort in developing and maintaining integration.
  3. Clarity in problem resolution: If something goes wrong with your implementation, you don’t get different vendors blaming each other for the problem.
  4. Simpler upgrades: You future-proof your architecture: when one element has a new release, it is the vendor’s job to ensure it works with everything else, not yours.
  5. Fewer people needed: You don’t need to hire an expert for each different vendor tool, thereby reducing the size and cost of your BI team.
  6. Cheaper licensing: It should be cheaper to buy a bundled solution from one vendor and ongoing maintenance fees should also be less.

This all seems to make perfect sense and each of the above points can be seen to be reducing the complexity and cost of your BI solution. Surely it is a no-brainer to adopt this approach? Well maybe. Let me offer some alternative perspectives on each item – none of these wholly negates the point, but I think it is nevertheless worth considering a different perspective before deciding what is best for your organisation.

  1. Consistent look-and-feel: It is not always 100% true that different tools from the same vendor have the same look-and-feel. This might be down to quality control at the vendor, or because the vendor has recently acquired part of their product set and not yet fully integrated it, or – even more basically – because different tools are intended to do different things. To pick one example from outside of BI that has frustrated me endlessly over the years: PowerPoint and Word seem to have very little in common, even in Office 2007. Hopefully different tools from the same vendor will be able to share the same metadata, but this is not always the case, so some research is probably required before assuming this point holds. Also, picking up on the Bauhaus ethos of form following function, you probably don’t want your dashboard looking exactly like your OLAP cubes – it wouldn’t be a dashboard then, would it? Additional user training will generally be required for each tier in your BI architecture and a single-vendor approach will at best reduce this somewhat.
  2. Better interoperability: I mention a problem with the interoperability of the Cognos toolset above. This is hopefully now a historical oddity, but I would be amazed if similar issues did not arise at least from time to time with most BI vendors. Cognos itself has now been acquired by IBM and I am sure everyone in the new organisation is doing a fine job of consolidating the product lines, but it would be incredible if there were not some mismatches along the way. Even without acquisitions, it is likely that elements of a vendor’s product set get slightly out of alignment from time to time.
  3. Clarity in problem resolution: This is hopefully a valid point, however it probably won’t stop your BI tool vendor from suggesting that it is your web-server software, or network topology, or database version that is causing the issue. Call me cynical if you wish, I prefer to think of myself as a seasoned IT professional!
  4. Simpler upgrades: Again, this is most likely to be a plus point, but problems can occur when only parts of a product set have upgrades. Also, you may need to upgrade Tool A to the latest version to address a bug or to deliver desired functionality, but have equally valid reasons for keeping Tool B at the previous release. This can cause problems in a single-supplier scenario precisely because the elements are likely to be more tightly coupled with each other, something that you may have a chance of being insulated against if you use tools from different vendors.
  5. Fewer people needed: While there might be half a point here, I think that this is mostly fallacious. The skills required to build an easy-to-use and impactful dashboard are not the same as those required to build OLAP cubes. It may be that you have flexible and creative people who can do both (I have been thus blessed myself in past projects), but this type of person would most likely be equally adept whatever tool they were using. Again there may be some efficiencies in sharing metadata, but it is important not to over-state these. You may well still need a dashboard person and an OLAP person; if you don’t, then the person who can do both will probably not care about which vendor provides the tools.
  6. Cheaper licensing: Let’s think about this. How many vendors give you Tool B free when you purchase Tool A? Not many, in my experience; they are commercial entities after all. It may be more economical to purchase bundles of products from a vendor, but having more than one vendor in the game may be an even better way of ensuring that costs are kept down. This is another area that requires close examination before deciding what to do.

 
A more important consideration

Overall it is still likely that a single-vendor solution is cheaper than a multi-vendor one, but I hope that I have raised enough points to make you think that this is not guaranteed. Also, the cost differential may not be as substantial as might initially be thought. You should certainly explore both approaches and figure out what works best for you. However, there is another overriding point to consider here, the one I alluded to earlier: your users. The most important thing is that your users have the best experience and that whatever tools you employ are the ones that will deliver this. If you can do this while sticking to a single vendor then great. However, if your users will be better served by different tools in different tiers, then this should be your approach, regardless of whether it makes things a bit more complicated for your team.

Of course there may be some additional costs associated with such an approach, but I doubt that this issue is insuperable. One comparison that it may help to keep in mind is that the per user cost of many BI tools is similar to desktop productivity tools such as Office. The main expense of BI programmes is not the tools that you use to deliver information, but all the work that goes on behind the scenes to ensure that it is the right information, at the right time and with the appropriate degree of accuracy. The big chunks of BI project costs are located in the four pillars that I consistently refer to:

  1. Understand the important business decisions and what figures are necessary to support these.
  2. Understand the data available in the organisation, how it relates to other data and to business decisions.
  3. Transform the data to provide information answering business questions.
  4. Focus on embedding the use of information in the corporate DNA.

The cost of the BI tools themselves is only a minor part of the above (see also, BI implementations are like icebergs). Of course any savings made on tools may make funds available for other parts of the project. It is, however, important not to cut off your nose to spite your face here. Picking the right tools for the job, be they from one vendor or two (or even three at a push), will be much more important to the overall payback of your project than saving a few nickels and dimes by sticking to a one-vendor strategy just for the sake of it.
 


 
Continue reading about this area in: Using multiple business intelligence tools in an implementation – Part II
 

“All that glisters is not gold” – some thoughts on dashboards

Fool's gold

Yesterday I was tweeting quotes from Poe and blogging lines attributed to Heraclitus. Today I’m moving on to Shakespeare. Kudos to anyone posting a comment pointing out the second quote that appears later in the text.
 
 
Introduction

Dashboards are all the rage at present. The basic idea is that they provide a way to quickly see what is happening, without getting lost in a sea of numbers. There are lots of different technologies out there that can help with dashboards. These range from parts of the product suites of all the main BI vendors, through boutique products dedicated to the area, all the way to simply using Java to write your own.

A lot of effort needs to go into how a dashboard is presented. The information really does need to leap off the screen and it is important that it looks professional. People are used to seeing well-designed sites on the web and if your corporate dashboard looks like it is only one step removed from Excel charts, you may have a problem. While engaging a design firm to help craft a dashboard might be overkill, it helps to get some graphic design input. I have been lucky enough over the years to have had people on my teams with experience in this area. They have mostly been hobbyists, but they had enough flair and enough aesthetic taste to make a difference.

However, echoing my comments on BI tools in general, I think an attractive-looking dashboard is really only the icing on the cake. The cake itself has two other main ingredients:

  1. The actual figures that it presents (and how well they have been chosen) and
  2. The Information Architecture that underpins them

I’ll now consider the importance of these two areas.
 
 
Choosing the KPIs

Filtering out the KPIs

The acronym KPI is bandied about with enormous vigour in the BI community. Sometimes what the ‘K’ stands for can get a bit lost in the cacophony. Stepping back from dashboards for a few minutes, I want to focus on the measures that you have in your general business intelligence applications such as analysis cubes. Things like: sales revenue, units sold, growth, head count, profit and so on.

[Note: If you don’t like BI buzzwords, please feel free to read “figures”, or “numbers” where ever you see “measures”. I may attempt to provide my own definitions of some of these terms in the future as the Wikipedia entries aren’t always that illuminating.]

When you have built a Data Mart for a particular subject area and are looking to develop one or more cubes based on this, you may well have a myriad of measures to select from. In some of the earliest prototype cubes that my teams built, we made the mistake of having too many measures. The same observation equally applied to the number of dimensions (things that you want to slice and dice the measures by, e.g. geography, line of business, product, customer etc.). Having too many measures and dimensions led to a cube that was cumbersome, difficult to navigate and whose business purpose was less than crystal clear. These are all cardinal sins, but the last is the worst, as I have noted elsewhere. The clear objective is to cut down on both the figures and the business attributes that you want to look at them by. We set a rule (which we did break a couple of times for specialist applications) of generally having no more than ten measures and ten dimensions in a cube, and ideally fewer.
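For what it is worth, this sort of guideline is trivial to check automatically; here is a toy Python sketch (with a made-up cube specification) that simply flags designs breaching the ten-by-ten rule of thumb:

    # A toy sketch of checking a proposed cube design against the
    # "no more than ten measures and ten dimensions" rule of thumb.
    # The example cube below is invented.
    MAX_MEASURES = 10
    MAX_DIMENSIONS = 10

    def review_cube(name, measures, dimensions):
        problems = []
        if len(measures) > MAX_MEASURES:
            problems.append(f"{len(measures)} measures (limit {MAX_MEASURES})")
        if len(dimensions) > MAX_DIMENSIONS:
            problems.append(f"{len(dimensions)} dimensions (limit {MAX_DIMENSIONS})")
        if problems:
            print(f"Cube '{name}' needs pruning: " + "; ".join(problems))
        else:
            print(f"Cube '{name}' is within the guideline.")

    review_cube(
        "Sales analysis",
        measures=["Revenue", "Units sold", "Growth", "Head count", "Profit"],
        dimensions=["Geography", "Line of business", "Product", "Customer", "Period"],
    )

Of course the difficult part is not counting the measures, it is agreeing which ones make the cut.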

Well, this all sounds great; the problem – and the reason for this diversion away from dashboards – is which measures you keep and which you drop. Here there is no real alternative to lots of discussions with business partners, building multiple prototypes to test out different combinations and, ultimately, accepting that you might make some mis-steps in your first release and need to revisit the area after it has been “shaken down” by real business use. I won’t delve into this particular process any deeper now. Suffice it to say that choosing which measures to include in a cube is both an area that is important to get right and one in which it is all too easy to make mistakes.

So, returning to our main discussion, if picking measures at the level of an analysis cube is hard, just how hard is it to pick KPIs for a dashboard? I recall a conversation with the CEO of a large organisation in which he basically told me to just pick the six most important figures and put them on a dashboard (with the clear implication that sooner would be rather better than later). After I had explained that the view of the CEO in this area was of paramount importance and that his input on which figures to use would be very valuable, we began to talk about what should be in and what should be out. After a period of going round in circles, I at least managed to convey the fact that this was not a trivial decision.

What you want with the KPIs on a dashboard is that they are genuinely key and that you can actually tell something from graphing them. The exercise in determining which figures to use and how to present them was a lengthy one, but very worthwhile. You need to rigorously apply the “so what?” test – what action will people take based on the trends and indicators that are presented to them. In the end we went for simplicity, with a focus on growth.

There was a map showing how each country was doing against plan, colour-coded red, amber or green according to its results. There were graphs comparing revenue to budget, both by month and cumulatively, and there was a break-down by business unit. The only two elements of interaction were filters for a region or country and for a business unit or line of business. Any further analysis required pulling up an underlying cube (actually we integrated the cube with the dashboard so that context was maintained moving from one to the other – this was not so easy, as the dashboard and cube tools, while from the same vendor, were on two different major releases).
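To give a flavour of the red, amber and green logic behind the map, here is a minimal Python sketch; the thresholds and the country figures are hypothetical stand-ins for whatever the business actually signs off:

    # A minimal sketch of red / amber / green status against plan.
    # Thresholds and figures are hypothetical.
    def rag_status(actual, plan, amber_threshold=0.95, green_threshold=1.0):
        """Classify performance against plan."""
        ratio = actual / plan
        if ratio >= green_threshold:
            return "green"
        if ratio >= amber_threshold:
            return "amber"
        return "red"

    # country -> (actual, plan), in whatever units the business uses
    countries = {"UK": (10.2, 10.0), "France": (9.1, 9.5), "Germany": (7.8, 9.0)}
    for country, (actual, plan) in countries.items():
        print(country, rag_status(actual, plan))   # UK: green, France: amber, Germany: red

The coding is trivial; agreeing the thresholds with the business, and what action each colour should trigger, is where the real work lies.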

There were many iterations of the dashboard, but the one we eventually went live with received general acclaim. I’m not sure what we could have done differently to shorten the process.
 
 
Where does the data come from?

A dashboard without an underlying Information Architecture

The same range of dashboard tools that I mention in the introduction are of course mostly capable of sourcing their data from pretty much anywhere. If the goal is to build a dashboard, then maybe it is tempting to do this as quickly as possible, based on whatever data sources are to hand (as in the diagram above). This is probably the quickest way to produce a dashboard, but it is unlikely to produce something that is used much, tells people anything useful, or adds any value. Why do I say this?

Well, the problem with this approach is that all you are doing is reflecting what is likely to be a somewhat fragmented (and maybe even chaotic) set of information tools. Out of your sources, is there a unique place to go to get a definitive value for measure A? Do the various different sources hold data in the same way and calculate values using the same formulae? Do sources overlap (either duplicating data or function) and, if so, which ones do you use? Do different sources get refreshed with the same frequency and do they treat currency the same way? Are customers and products defined consistently everywhere?
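To illustrate why these questions matter, here is a rough Python sketch of the sort of reconciliation that quickly exposes such issues; the source names, figures and tolerance are all invented:

    # A rough sketch of reconciling the same measure across two candidate sources.
    # Source names, figures and the tolerance are invented for illustration.
    def reconcile(measure, source_a, source_b, tolerance=0.01):
        """Flag periods where two sources disagree by more than the tolerance, or where one is missing."""
        for period in sorted(set(source_a) | set(source_b)):
            a, b = source_a.get(period), source_b.get(period)
            if a is None or b is None:
                print(f"{measure} {period}: missing from one source")
            elif abs(a - b) > tolerance * max(abs(a), abs(b)):
                print(f"{measure} {period}: {a} vs {b} - differing definitions or refresh timing?")

    finance_system = {"2009-05": 1200.0, "2009-06": 1315.0}
    sales_tracker  = {"2009-05": 1200.0, "2009-06": 1290.0, "2009-07": 1400.0}
    reconcile("Revenue", finance_system, sales_tracker)

Until discrepancies like these are resolved (ideally in the warehouse rather than in the dashboard itself), any figure shown on the dashboard is open to challenge.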

A dashboard underpinned by a proper Information Architecture

Leaving issues like these unresolved is a sure way to perpetuate a poor state of information. They are best addressed by establishing a wider information architecture (a simplified diagram of which appears above). I am not going to go into all of the benefits of such an approach, if readers would like more information, then please browse through the rest of this blog and the links to other resources that it contains (maybe this post would be a good place to start). What I will state is that a dashboard will only add value if it is part of an overall consistent approach to information, something that best practice indicates requires an Information Architecture. Anything else is simply going to be a pretty picture, signifying nothing.
 
 
Summary

So my advice to those seeking to build their first dashboard has three parts. First of all, keep it simple and identify a small group of measures and dimensions that are highly pertinent to the core of the business and susceptible to graphical presentation. Second, dashboards are not a short-cut to management information Nirvana; they only really work when they are the final layer in a proper approach to information that spans all areas of the organisation. Finally, and partly driven by the first two observations, if you are in charge of building a dashboard, make sure that the plans you draw up reflect the complexity of the task and that you manage expectations accordingly.