The need for collaboration between teams using the same data in different ways

7 Dec 2014, 16 Jan 2017 Peter James Thomas big data, business analytics, business intelligence, business intelligence competency centres, data science, data warehousing, it business alignment, Physics actuarial, collaboration, insurance, LinkedIn, neil raden, quantum entanglement, tdwi

This article is based on conversations that took place recently on the TDWI LinkedIn Group^[1].

The title of the discussion thread posted was “Business Intelligence vs. Business Analytics: What’s the Difference?” and the original poster was Jon Dohner from Information Builders. To me the thread topic is something of an old chestnut and takes me back to the heady days of early 2009. Back then, Big Data was maybe a lot more than just a twinkle in Doug Cutting and Mike Cafarella’s eyes, but it had yet to rise to its current level of media ubiquity.

Nostalgia is not going to be enough for me to start quoting from my various articles of the time^[2] and neither am I going to comment on the pros and cons of Information Builders’ toolset. Instead I am more interested in a different turn that discussions took based on some comments posted by Peter Birksmith of Insurance Australia Group.

Peter talked about two streams of work being carried out on the same source data. These are Business Intelligence (BI) and Information Analytics (IA). I’ll let Peter explain more himself:

BI only produces reports based on data sources that have been transformed to the requirements of the Business and loaded into a presentation layer. These reports present KPI’s and Business Metrics as well as paper-centric layouts for consumption. Analysis is done via Cubes and DQ although this analysis is being replaced by IA.

[…]

IA does not produce a traditional report in the BI sense, rather, the reporting is on Trends and predictions based on raw data from the source. The idea in IA is to acquire all data in its raw form and then analysis this data to build the foundation KPI and Metrics but are not the actual Business Metrics (If that makes sense). This information is then passed back to BI to transform and generate the KPI Business report.

I was interested in the dual streams that Peter referred to and, given that I have some experience of insurance organisations and how they work, penned the following reply^[3]:

Hi Peter,

I think you are suggesting an organisational and technology framework where the source data bifurcates and goes through two parallel processes and two different “departments”. On one side, there is a more traditional, structured, controlled and rules-based transformation; probably as the result of collaborative efforts of a number of people, maybe majoring on the technical side – let’s call it ETL World. On the other a more fluid, analytical (in the original sense – the adjective is much misused) and less controlled (NB I’m not necessarily using this term pejoratively) transformation; probably with greater emphasis on the skills and insights of individuals (though probably as part of a team) who have specific business knowledge and who are familiar with statistical techniques pertinent to the domain – let’s call this ~ETL World, just to be clear :-).

You seem to be talking about the two of these streams constructively interfering with each other (I have been thinking about X-ray Crystallography recently). So insights and transformations (maybe down to either pseudo-code or even code) from ~ETL World influence and may be adopted wholesale by ETL World.

I would equally assume that, if ETL World’s denizens are any good at their job, structures, datasets and master data which they create (perhaps early in the process before things get multidimensional) may make work more productive for the ~ETLers. So it should be a collaborative exercise with both groups focused on the same goal of adding value to the organisation.

If I have this right (an assumption I realise) then it all seems very familiar. Given we both have Insurance experience, this sounds like how a good information-focused IT team would interact with Actuarial or Exposure teams. When I have built successful information architectures in insurance, in parallel with delivering robust, reconciled, easy-to-use information to staff in all departments and all levels, I have also created, maintained and extended databases for the use of these more statistically-focused staff (the ~ETLers).

These databases, which tend to be based on raw data have become more useful as structures from the main IT stream (ETL World) have been applied to these detailed repositories. This might include joining key tables so that analysts don’t have to repeat this themselves every time, doing some basic data cleansing, or standardising business entities so that different data can be more easily combined. You are of course right that insights from ~ETL World often influence the direction of ETL World as well. Indeed often such insights will need to move to ETL World (and be produced regularly and in a manner consistent with existing information) before they get deployed to the wider field.

It is sort of like a research team and a development team, but where both “sides” do research and both do development, but in complementary areas (reminiscent of a pair of entangled electrons in a singlet state, each of whose spin is both up and down until they resolve into one up and one down in specific circumstances – sorry again I did say “no more science analogies”). Of course, once more, this only works if there is good collaboration and both ETLers and ~ETLers are focussed on the same corporate objectives.

So I suppose I’m saying that I don’t think – at least in Insurance – that this is a new trend. I can recall working this way as far back as 2000. However, what you describe is not a bad way to work, assuming that the collaboration that I mention is how the teams work.

I am aware that I must have said “collaboration” 20 times – your earlier reference to “silos” does however point to a potential flaw in such arrangements.

Peter

PS I talk more about interactions with actuarial teams in: BI and a different type of outsourcing

PPS For another perspective on this area, maybe see comments by @neilraden in his 2012 article What is a Data Scientist and what isn’t?

I think that the perspective of actuaries having been data scientists long before the latter term emerged is a sound one.

Although the genesis of this thread dates to over five years ago (an aeon in terms of information technology), I think that – in the current world where some aspects of the old divide between technically savvy users^[4] and IT staff with strong business knowledge^[5] have begun to disappear – there is both an opportunity for businesses and a threat. If silos develop and the skills of a range of different people are not combined effectively, then we have a situation where:

| ETL World | + | ~ETL World | < | ETL World ∪ ~ETL World |

If instead collaboration, transparency and teamwork govern interactions between different sets of people then the equation flips to become:

| ETL World | + | ~ETL World | ≥ | ETL World ∪ ~ETL World |

Perhaps the way that Actuarial and IT departments work together in enlightened insurance companies points the way to a general solution for the organisational dynamics of modern information provision. Maybe also the, by now somewhat venerable, concept of a Business Intelligence Competency Centre, a unified team combining the best and brightest from many fields, is an idea whose time has come.

Notes

^[1]	A link to the actual discussion thread is provided here. However, you need to be a member of the TDWI Group to view this.
^[2]	Anyone interested in ancient history is welcome to take a look at the following articles from a few years back: Business Analytics vs Business Intelligence; A business intelligence parable; The Dictatorship of the Analysts
^[3]	I have mildly edited the text from its original form and added some new links and new images to provide context.
^[4]	Particularly those with a background in quantitative methods – what we now call data scientists
^[5]	Many of whom seem equally keen to also call themselves data scientists

Forming an Information Strategy: Part II – Situational Analysis

1 Dec 2014, 15 Dec 2014 Peter James Thomas business intelligence, data strategy, data visualisation, it business alignment, strategy information strategy

Forming an Information Strategy
I – General Strategy	II – Situational Analysis	III – Completing the Strategy

Maybe we could do with some better information, but how to go about getting it? Hmm...

This article is the second of three which address how to formulate an Information Strategy. I have written a number of other articles which touch on this subject^[1] and have also spoken about the topic^[2]. However I realised that I had never posted an in-depth review of this important area. This series of articles seeks to remedy this omission.

The first article, Part I – General Strategy, explored the nature of strategy, laid some foundations and presented a framework of questions which will need to be answered in order to formulate any general strategy. This chapter, Part II – Situational Analysis, explains how to adapt the first element of this general framework – The Situational Analysis – to creating an Information Strategy. In Part I, I likened formulating an Information Strategy to a journey; Part III – Completing the Strategy sees us reaching the destination by working through the rest of the general framework and showing how this can be used to produce a fully-formed Information Strategy.

As with all of my other articles, this essay is not intended as a recipe for success, a set of instructions which – if slavishly followed – will guarantee the desired outcome. Instead the reader is invited to view the following as a set of observations based on what I have learnt during a career in which the development of both Information Strategies and technology strategies in general has played a major role.

A Recap of the Strategic Framework

I closed Part I of this series by presenting a set of questions, the answers to which will facilitate the formation of any strategy. These have a geographic / journey theme and are as follows:

Where are we?
Where do we want to be instead and why?
How do we get there, how long will it take and what will it cost?
Will the trip be worth it?
What else can we do along the way?

In this article I will focus on how to answer the first question, Where are we? This is the province of a Situational Analysis. I will now move on from general strategy and begin to be specific about how to develop a Situational Analysis in the context of an overall Information Strategy.

But first a caveat: if the last article was prose-heavy, this one is question-heavy; the reader is warned!

Where are we? The anatomy of an Information Strategy’s Situational Analysis

The unfashionable end of the western spiral arm of the Galaxy

If we take this question and, instead of aiming to plot our celestial coordinates, look to consider what it would mean in the context of an Information Strategy, then a number of further questions arise. Here are just a few examples of the types of questions that the strategist should investigate, broken down into five areas:

Business-focussed questions

What do business people use current information to do?
In their opinion, is current information adequate for this task and if not in what ways is it inadequate?
Are there any things that business people would like to do with information, but where the figures don’t exist or are not easily accessible?
How reliable and trusted is existing information, is it complete, accurate and suitably up-to-date?
If there are gaps in information provision, what are these and what is the impact of missing data?
How consistent is information provision, are business entities and calculated figures ambiguously labeled and can you get different answers to the same question in different places?
Is existing information available at the level that different people need, e.g. by department, country, customer, team or at a transactional level?
Are there areas where business people believe that data is available, but no facilities exist to access this?
What is the extent of End User Computing, is this at an appropriate level and, if not, is poor information provision a driver for work in this area?
Related to this, are the needs of analytical staff catered for, or are information facilities targeted mostly at management reporting only?
How easy do business people view it as being to get changes made to information facilities, or to get access to the sort of ad hoc data sets necessary to support many business processes?
What training have business people received, what is the general level of awareness of existing information facilities and how easy is it for people to find what they need?
How intuitive are existing information facilities and how well-structured are menus which provide access to these?
Is current information provision something that is an indispensable part of getting work done, or at best an afterthought?

Design questions

How were existing information facilities created, who designed and built them and what level of business input was involved?
What are the key technical design components of the overall information architecture and how do they relate to each other?
If there is more than one existing information architecture (e.g. in different geographic locations or different business units), what are the differences between them?
How many different tools are used in various layers of the information architecture? E.g.
- Databases
- Extract Transform Load tools
- Multidimensional data stores
- Reporting and Analysis tools
- Data Visualisation tools
- Dashboard tools
- Tools to provide information to applications or web-portals
What has been the role of data modeling in designing and developing information facilities?
If there is a target data model for the information facilities, is this fit for purpose and does it match business needs?
Has a business glossary been developed in parallel to the design of the information capabilities and if so is this linked to reporting layers?
What is the approach to master data and how is this working?

Technical questions

What are the key source systems and what are their types, are these integrated with each other in any way?
How does data flow between source systems?
Is there redundancy of data and can similar datasets in different systems get out of synch with each other, if so which are the master records?
How robust are information facilities, do they suffer outages, if so how often and what are the causes?
Are any issues experienced in making changes to information facilities, either extended development time, or post-implementation failures?
Are there similar issues related to the time taken to fix information facilities when they go wrong?
Are various development tools integrated with each other in a way that helps developers and makes code more rigorous?
How are errors in input data handled and how robust are information facilities in the face of these challenges?
How well-optimised is the regular conversion of data into information?
How well do information facilities cope with changes to business entities (e.g. the merger of two customers)?
Is the IT infrastructure(s) underpinning information facilities suitable for current data volumes, what about future data volumes?
Is there a need for redundancy in the IT infrastructure supporting information facilities, if so, how is this delivered?
Are suitable arrangements in place for disaster recovery?

Process questions

Is there an overall development methodology applied to the creation of information facilities?^[3]
If so, is it adhered to and is it fit for purpose?
What controls are applied to the development of new code and data structures?
How are requests for new facilities estimated and prioritised?
How do business requirements get translated into what developers actually do and is this process working?
Is the level, content and completeness of documentation suitable, is it up-to-date and readily accessible to all team members?
What is the approach to testing new information facilities?
Are there any formal arrangements for Data Governance and any initiatives to drive improvements in data quality?
How are day-to-day support and operational matters dealt with and by whom?

Information Team questions

Is there a single Information Team or many, if many, how do they collaborate and share best practice?
What is the demand for work required of the existing team(s) and how does this relate to their capacity for delivery?
What are the skills of current team members and how do these complement each other?
Are there any obvious skill gaps or important missing roles?
How do information people relate to other parts of IT and to their business colleagues?
How is the information team(s) viewed by their stakeholders in terms of capability, knowledge and attitude?

An Approach to Geolocation

It's good to talk. I was going to go with a picture of the late Bob Hoskins, but figured that this might not resonate outside of my native UK.

So that’s a long list of questions^[4], to add to the list: what is the best way of answering them? Of course it may be that there is existing documentation which can help in some areas, however the majority of questions are going to be answered via the expedient of talking to people. While this may appear to be a simple approach, if these discussions are going to result in an accurate and relevant Situational Analysis, then how to proceed needs to be thought about up-front and work needs to be properly structured.

Business conversations

A challenge here is the range and number of people^[5]. It is of course crucial to start with the people who consume information. These discussions would ideally allow the strategist to get a feeling for what different business people do and how they do it. This would cover their products/services, the markets that they operate in and the competitive landscape they face. With some idea of these matters established, the next item is their needs for information and how well these are met at present. Together feedback in these areas will begin to help to shape answers to some of the business-focussed questions referenced above (and to provide pointers to guide investigations in other areas). However it is not as simple an equation as:

Talk to Business People = Answer all Business-focussed Questions

The feedback from different people will not be identical; variations may be driven by their personal experience, how long they have been at the company and what part of its operations they work in. Different people will also approach their work in different ways: some will want to be very numerically focussed in decision-making, others will rely more on experience and relationships. Also even getting information out of people in the first place is a skill in itself; it is a capital mistake for even the best analyst to theorise before they have data^[6].

This heterogeneity means that one challenge in writing the business-focussed component of a Situational Analysis within an overall Information Strategy is sifting through the different feedback looking for items which people agree upon, or patterns in what people said and the frequency with which different people made similar points. This work is non-trivial and there is no real substitute for experience. However, one thing that I would suggest can help is to formally document discussions with business people. This has a number of advantages, such as being able to run this past them to check the accuracy and completeness of your notes^[7] and being able to defend any findings as based on actual fact. However, documenting meetings also facilitates the analysis and synthesis process described above. These meeting notes can be read and re-read (or shared between a number of people collectively engaged in the strategy formulation process) and – when draft findings have been developed – these can be compared to the original source material to ensure consistency and completeness.
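Once discussions have been documented, part of the sifting described above can be mechanised. The following is a minimal Python sketch, assuming that each set of meeting notes has been tagged by hand with short theme labels. The interviewee roles, the theme labels and the `min_share` threshold are all hypothetical and are included purely for illustration; judgement and experience are still needed to decide what the themes are in the first place.

```python
from collections import Counter

# Hypothetical, hand-tagged meeting notes: interviewee -> themes they raised.
meeting_notes = {
    "Finance Director": {"inconsistent figures", "slow change requests"},
    "Head of Sales": {"inconsistent figures", "no customer-level data"},
    "Underwriting Manager": {"inconsistent figures", "slow change requests"},
    "Claims Analyst": {"no customer-level data"},
}


def common_themes(notes, min_share=0.5):
    """Return (theme, count) pairs raised by at least min_share of interviewees.

    Each interviewee counts at most once per theme, so one person repeating
    a point several times does not inflate its frequency.
    """
    counts = Counter(theme for themes in notes.values() for theme in set(themes))
    threshold = min_share * len(notes)
    # most_common() orders themes from most to least frequently raised.
    return [(theme, n) for theme, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    for theme, n in common_themes(meeting_notes):
        print(f"{theme}: raised by {n} of {len(meeting_notes)} interviewees")
```

A tally like this does not replace reading and re-reading the notes; it simply gives a quick check that a draft finding is supported by more than one voice.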

IT conversations

I preferred Father Ted (or the first series of Black Books) myself; can't think where the inspiration for these characters came from.

Depending on circumstances, talking to business people can often be the largest activity and will do most to formulate proposals that will appear in other parts of the Information Strategy. However the other types of questions also need to be considered and parallel discussions with general IT people are a prerequisite. An objective here is for the strategist to understand (and perhaps document) the overall IT landscape and how this flows into current information capabilities. Such a review can also help to identify mismatches between business aspirations and system capabilities; there may be a desire to report on data which is captured nowhere in the organisation for example.

The final tranche of discussions needs to be with the information professionals who have built the current information landscape (assuming that they are still at the company, if not then the people to target are those who maintain information facilities). There can sometimes be an element of defensiveness to be overcome in such discussions, but equally no one will have a better idea about the challenges with existing information provision than the people who deal with this area day in and day out. It is worth taking the time to understand their thoughts and opinions. With both of these groups of IT people, formally documented notes and/or schematics are just as valuable as with the business people and for the same reasons.

Rinse and Repeat

The above conversations have been described sequentially, but some element of them will probably be in parallel. Equally the process is likely to be somewhat iterative. It is perhaps a good idea to meet with a subset of business people first, draw some very preliminary conclusions from these discussions and then hold some initial meetings with various IT people to both gather more information and potentially kick the tyres on your embryonic findings. Sometimes after having done a lot of business interviews, it is also worth circling back to the first cohort both to ask some different questions based on later feedback and also to validate the findings which you are hopefully beginning to refine by now.

Of course a danger here is that you could spend an essentially limitless time engaging with people and not ever landing your Situational Analysis; in particular person A may suggest what a good idea it would be for you to also meet with person B and person C (and so on exponentially). The best way to guard against this is time-boxing. Give yourself a deadline, perhaps arrange for a presentation of an initial Situational Analysis to an audience at a point in the not-so-distant future. This will help to focus your efforts. Of course mentioning a presentation, or at least some sort of abridged Situational Analysis, brings up the idea of how to summarise the detailed information that you have uncovered through the process described above. This is the subject of the final section of this article.

In Summary

I will talk further about how to summarise findings and recommendations in Part III; for now I want to focus on just two aspects of this. First, a mechanism to begin to identify areas of concern and, second, a visual way to present the key elements of an information-focussed Situational Analysis in a relatively simple exhibit.

Sorting the wheat from the chaff

To an extent, sifting through large amounts of feedback from a number of people is one way in which good IT professionals earn their money. Again experience is the most valuable tool to apply in this situation. However, I would suggest some intermediate steps would also be useful here both to the novice and the seasoned professional. If you have extensive primary material from your discussions with a variety of people and have begun to discern some common themes through this process, then – rather than trying to progress immediately to an overall summary – I would recommend writing notes around each of these common themes as a good place to start. These notes may be only for your own purposes, or they may be something that you also later choose to circulate as additional information; if you take the latter approach, then bear the eventual audience in mind while writing. Probably while you are composing these intermediate-level notes a number of things will happen. First it may occur to you that some sections could be split to more precisely target the issues. Equally other sections may overlap somewhat and could benefit from being merged. Also you may come to realise that you have overlooked some areas and need to address these.

Whatever else is happening, this approach is likely to give your subconscious some time to chew over the material in parallel. It is for this reason that sometimes the strategist will wake at night with an insight that had previously eluded them. Whether or not the subconscious contributes this dramatically, this rather messy and organic process will leave you with a number of paragraphs (or maybe pages) on a handful of themes. This can then form the basis of the more summary exhibit which I describe in the next section; namely a scorecard.

An Information Provision Scorecard

Of course a scorecard about the state of information provision approaches levels of self-reference that Douglas R Hofstadter^[8] would be proud of. I would suggest that such a scorecard could be devised by thinking about each of the common themes that have arisen, considering each of the areas of questioning described above (business, design, technical, process and team), or perhaps a combination of both. The example scorecard which I provide above uses the areas of questions as its intermediate level. These are each split out into a number of sub-categories (these will vary from situation to situation and hence I have not attempted to provide actual sub-category names). A score can be allocated (based on your research) to each of these on some scale (the example uses a 5 point one) and these base figures can be rolled up to get a score for each of the intermediate categories. These can then be further summarised to give a single, overall score^[9].

While a data visualisation such as the one presented here may be a good way to present overall findings, it is important that this can be tied back to the notes that have been compiled during the analysis. Sometimes such scores will be challenged and it is important that they are based in fact and can thus be defended.
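The roll-up arithmetic behind such a scorecard can be sketched in a few lines of Python. This is an illustration only: the scores below are invented, the sub-category names are placeholders (as in the article, no real names are attempted), and the choice of a simple unweighted mean at each level and of rounding to the nearest quarter are assumptions made for the sketch rather than a prescribed method.

```python
from statistics import mean

# Hypothetical sub-category scores on a 1-5 scale, grouped by the five
# areas of questioning. Sub-category names are placeholders.
scores = {
    "Business": {"sub-category 1": 2, "sub-category 2": 3, "sub-category 3": 2},
    "Design": {"sub-category 1": 3, "sub-category 2": 4},
    "Technical": {"sub-category 1": 4, "sub-category 2": 3, "sub-category 3": 5},
    "Process": {"sub-category 1": 2, "sub-category 2": 2},
    "Team": {"sub-category 1": 3, "sub-category 2": 4},
}


def nearest_quarter(value):
    """Round to the nearest 0.25, for graphics that are drawn in quarters."""
    return round(value * 4) / 4


def roll_up(scores):
    """Return (area_scores, overall) using unweighted means at each level."""
    area_scores = {area: mean(subs.values()) for area, subs in scores.items()}
    overall = mean(area_scores.values())
    return area_scores, overall


if __name__ == "__main__":
    area_scores, overall = roll_up(scores)
    for area, score in area_scores.items():
        print(f"{area}: {score:.2f} (shown as {nearest_quarter(score)})")
    print(f"Overall: {overall:.2f} (shown as {nearest_quarter(overall)})")
```

Keeping the raw, unrounded figures alongside the displayed ones makes it easier to tie a challenged score back to the underlying notes, and to override the arithmetic where a rounded-up or rounded-down figure is judged to be more representative.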

Next steps

Of course your scorecard, or overall Situational Analysis, could tell you that all is well. If this is the case, then our work here may be done^[10]. If however the Situational Analysis reveals areas where improvements can be made, or if there is a desire to move the organisation forward in a way that requires changes to information provision, then thought must be given to either what can be done to remediate problems or what is necessary to seize opportunities; most often a mixture of both. Considering these questions will be the subject of the final article in this series, Forming an Information Strategy: Part III – Completing the Strategy.

Addendum

When I published the first part of this series, I received an interesting comment from Gary Nuttall, Head of Business Intelligence at Chaucer Syndicates (you can view Gary’s profile on LinkedIn and he posts as @gpn01 on Twitter). I reproduce an extract from this verbatim below:

[When considering questions such as “Where are we?”] one thing I’d add, which for smaller organisations may not be relevant, is to consider who the “we” is (are?). For a multinational it can be worth scoping out whether the strategy is for the legal entity or group of companies, does it include the ultimate parent, etc. It can also help in determining the culture of the enterprise too which will help to shape the size, depth and span of the strategy too – for some companies a two pager is more than enough for others a 200 pager would be considered more appropriate.

I think that this is a valuable additional perspective and I thank Gary for providing this insightful and helpful feedback.

Notes

^[1]	These include (in chronological order): Scaling-up Performance Management; Developing an international BI strategy; “Involving users in Business intelligence Strategy key for success” – Christina Torode on SearchCio-Midmarket.com; A single version of the truth?; A bad workman blames his [Business Intelligence] tools
^[2]	IRM European Data Warehouse and Business Intelligence Conference – November 2012
^[3]	There are a whole raft of sub-questions here and I don’t propose to be exhaustive in this article.
^[4]	In practice it’s at best a representative subset of the questions that would need to be answered to assemble a robust situational analysis.
^[5]	To get some perspective on the potential range of business people whom it is necessary to engage in such a process, again see the aforementioned Developing an international BI strategy.
^[6]	With apologies to Arthur Conan Doyle and his most famous creation.
^[7]	It is not atypical for this approach to lead to people coming up with new observations based on reviewing your meeting notes. This is a happy outcome.
^[8]	Gödel, Escher, Bach: An Eternal Golden Braid has been referenced a number of times on this site (see above from New Adventures in Wi-Fi – Track 3: LinkedIn), but I think that this is the first time that I have explicitly acknowledged its influence.
^[9]	You can try to be cute here and weight scores before rolling them up. In practice this is seldom helpful and can give the impression that the precision of scoring is higher than can ever actually be the case. Judgement also needs to be exercised in determining which graphic to use to best represent a rolled up score as these will seldom precisely equal the fractions selected; quarters in this example. The strategist should think about whether a rounded-up or rounded-down summary score is more representative of reality as pure arithmetic may not suffice in all cases.
^[10]	There remains the possibility that the current situation is well-aligned with current business practices, but will have problems supporting future ones. In this case perhaps a situational analysis is less useful, unless this is comparing to some desired future state (of which more in the next chapter).

Phillip Hughes RIP

27 Nov 2014 Peter James Thomas general cricket, phillip hughes

© International Cricket Council www.icc-cricket.com 2015 — © International Cricket Council http://www.icc-cricket.com 2014

Virtually all of the time this blog is focussed on aspects of business, technology and change; even when my writing starts out in an ostensibly different place, I habitually bring my thoughts back to one of these three areas. This post is not like my normal ones. I trust that readers will understand my motivations in deviating from my regular subject matter.

Earlier today it was announced that Australian international cricketer Phillip Hughes (ESPNcricinfo profile here) had succumbed to injuries suffered two days earlier while playing for South Australia against New South Wales in the Sheffield Shield (Australia’s domestic First Class cricket league). Hughes was struck by the cricket ball somewhere between the top of his neck and base of his skull. He was wearing a protective helmet as most cricketers do nowadays, but these do not cover every angle from which the head can be hit. Without getting into morbid details, Hughes was very unlucky, being hit in exactly the wrong place in exactly the wrong way; cricket is a dangerous sport, but mercifully fatalities are rare.

Regular readers of these pages will know that cricket is my favourite sport. I played in some capacity – though never very well – from under the age of 10 to my mid-twenties. I have followed it ever since and cricket-related articles have appeared on this site on many occasions before. Indeed I have written here about Phil Hughes; here is part of what I had to say about him back in July 2009:

Hughes is only 20 and has burst onto the international cricketing scene in a matter of months. Before the current tour to England, he had played just three Tests (the name given to five day cricket matches between different countries). However, these were all against South Africa, one of the strongest teams in the world at present. In his six innings (a team generally bats twice in a Test Match) he had made 415 runs at the eye-catching average of 69.16 [number of runs / (number of innings – times not out)]. By way of reference, this is higher than any other player in either of the current Australian and English teams.

While to play an international sport for your country is the pinnacle of athletic success, Hughes was not able to fully establish himself at the highest level, though it is possible that – absent these tragic events – he would have had a recall to the Australian team in the near future. His playing record will now sadly show that he played just 26 Test Matches and 25 One Day Internationals. It is however arguable that his best days in cricket were ahead of him. Sometimes batsmen who bloom early, as Hughes did, have a second and more sustained later flowering based on a better understanding of their own game and greater experience.

Of course any life cut short is a tragedy, leaving questions around what the person could have done if granted more time. However, when someone with demonstrable talent, a clutch of achievements and the likelihood of more to come, has their story end too early it does lead to a special type of sadness, coupled with musings about what could have been. Having said that, and even if there is a theme of potential not being wholly fulfilled here, it is worth thinking about just how few people are good enough to represent their country at a sport. Compared to the general population, Hughes had a very special talent, even if he will now not have the opportunity to display this over a longer career.

I didn’t know Phillip Hughes, but it is part of the life of an international sportsperson that we all want a part of them; be they the “heroes” who play for our team or the “villains” who represent the opposition (the quotation marks in both cases are wholly intentional). Part of the appeal of sport is its vicarious nature. Having played and watched cricket for years and having both suffered and seen injuries during my playing years, I suppose I do somehow feel close to what has happened to Phil. Perhaps this is just part of the general human need to connect, perhaps we all want to augment our own stories by borrowing from those of other people. In any case, the news of his untimely death while playing my chosen sport has touched me, touched me enough to write about it.

Seldom will associates of a departed person have bad things to say about them, particularly when the person’s life has ended so suddenly. However it is very noticeable that, amongst the many tributes from the cricketing world, virtually everyone has taken the time to single out what a good person Phillip Hughes was off the field. While his passing is very sad, both for onlookers like me and in a much more real sense for his family and friends, these comments are testament to Hughes the man and not just the sportsman. While the ending was horrible, teammates have said that cricket was the biggest part of his life and – recognising the overused cliché – he died doing something that he loved. I hope that this fact and the kind words said about Phil by everyone who knew him bring some comfort to his family circle.

In closing I’d like to also spare a thought for another person very much affected by these awful events, up-and-coming New South Wales fast-medium bowler Sean Abbott (ESPNcricinfo profile here). It was Abbott who delivered the ball which inadvertently led to Hughes’s demise. It has been good to see the cricketing world rallying round to support Abbott as well; I suspect he will need a lot of help in coming days and months.

Forming an Information Strategy: Part I – General Strategy

26 Nov 2014 | 9 Feb 2016 | Peter James Thomas | business intelligence, data strategy, it business alignment, strategy | information strategy

Forming an Information Strategy
I – General Strategy	II – Situational Analysis	III – Completing the Strategy

Maybe we could do with some better information, but how to go about getting it? Hmm...

This article is the first of three which address how to formulate an Information Strategy. I have written a number of other articles which touch on this subject^[1] and have also spoken about the topic^[2]. However I realised that I had never posted an in-depth review of this important area. This series of articles seeks to remedy this omission.

Part I – General Strategy explores the nature of strategy, lays some foundations and presents a framework of questions which will need to be answered in order to form any general strategy. Part II – Situational Analysis adapts the first part of this general framework – The Situational Analysis – to the task of starting to form an Information Strategy. The final chapter, Part III – Completing the Strategy, rounds out this process by working through the rest of the general framework and explaining how this can be used to produce a fully-formed Information Strategy.

A more optimal strategy would probably have been to beware the Ides of March

What is a strategy? This would seem to be a relatively easy question to answer as the word is used (more likely over-used) in many areas of human endeavour and in business in particular. Let’s start by seeing if we can reach a consensus by the power of Google:

Wikipedia	Strategy (from Greek στρατηγία stratēgia, “art of troop leader; office of general, command, generalship”) is a high level plan to achieve one or more goals under conditions of uncertainty. [and also later in the same article] Max McKeown (2011) argues that “strategy is about shaping the future” and is the human attempt to get to “desirable ends with available means”.
The Oxford English Dictionary^[3]	/stráttiji/ n. 1 the art of war. 2 a the management of an army or armies in a campaign. b the art of moving troops, ships, aircraft, etc. into favourable positions (cf. TACTICS). c an instance of this or a plan formed according to it. 3 a plan of action or policy in business or politics etc. (economic strategy) [F stratégie f. Gk strātegia generalship f. stratēgeos]
Meta-entry	[…] a plan of action designed to achieve a long-term or overall aim. “time to develop a coherent economic strategy” […]
The Business Dictionary	[…] a method or plan chosen to bring about a desired future, such as achievement of a goal or solution to a problem. […]

So – assuming we decide, in the context of this blog, that the objective is probably not to better order military affairs, or to teach infantry men and women to write MDX – some sort of loose consensus emerges from the various definitions above. It seems that a strategy is something which seeks to influence the future, to bring about some conditions or cause an event, neither of which would manifest themselves without some action being taken^[4]. I am going to adopt the definition that a strategy is a method to achieve some future objective; or at least to make the realisation of this aim more likely. This means that a strategy implies change. If the situation is now X, then after the strategy has been successfully enacted the situation will be Y^[5].

A Metaphor for Strategy

Plotting a Strategy

The role of change in strategy leads me to think about strategy formulation in the following way^[6]. I think of situation X (the current one) as a place on a map. Then situation Y (the desired one) is a second place on the same map. We are at X and we want to get to Y; we have a starting point and a destination, an origin and a terminus. The shortest distance between two points is of course a straight line^[7]. However a straight line between X and Y may not exist (there could be a lake in between with no method to traverse this), or it might not be the quickest route (if the line passes over an intervening mountain, which could instead be more quickly circumnavigated). In general there may be more than one route between X and Y and each may have its advantages and disadvantages. I tend to think of strategy formation as the process by which the best (or, if this is all that is achievable, least bad) route is established.

Of course a challenge here is that – outside the realms of mathematics (or indeed SatNav) – there may not be an optimum route and equally no optimum strategy. Even if an optimum strategy does exist, the strategist may not have enough information^[8] to hand to discern this. Also, while effecting change is the objective of a strategy, this aim may itself be impacted by change; to employ our metaphor of travel, change to the destination, or to the territory in between. This may mean that the route (the strategy) must be adjusted, or in some cases wholly abandoned in favour of a different approach. Strategy formulation has some scientific-like qualities and I will focus on some of these shortly. However for the reasons just put forward (and indeed others we will examine later) elements of the strategy formation process can sometimes be more of an art form.

Of course another problem could be that you don’t have a map!

Having introduced a geographic quality to describing strategy formation, I’ll leverage this analogy^[9] for the rest of the article. However, first a slight detour to establish the credentials of your guide to the terrain of Information Strategy; namely me. Any readers who are already familiar with my work are encouraged to scroll past the next section.

So what do I know about Information Strategies anyway?

Résumé

I have worked in IT for over a quarter of a century with much of that related to turning data into information. Indeed one of my early tasks during my first job at a software house was to help design and develop the automated Balance Sheet and Profit and Loss statements provided as part of the company’s flagship product. These took the transactions entered into the company’s General Ledger system and assembled them into sensible Financial statements, which could be sliced and diced^[10] by period, cost centre or project code. However, my full initiation to the related areas of Business Intelligence and Data Warehousing was not until the beginning of 2000, when I was asked to establish a Management Information function for a pan-European insurance organisation. This means that I don’t reach my 15-year BI/DW milestone until New Year (actually probably some point in the middle of January 2015)^[11].

Having both developed and executed an Information Strategy for the European part of this company, I extended both of these processes to encompass Latin America. I then developed a broader Information Strategy which included all of their International operations. It is gratifying to note that this strategy still guides information provision at this organisation to this day. After this, I went on to shape Information Strategies for other companies in sectors such as Manufacturing, Retail and back to Reinsurance / Insurance again. In each of these cases, I either saw the execution of these strategies through to at least their first delivery, or the programmes of work that I crafted were then executed by the teams that I had built.

My teams also won a couple of awards for this work along the way.

The Questions driving Strategy Formation

Questions

There are many good resources available in printed form and on-line for those who want to understand various approaches to general strategy formulation. For readers who are interested in strategy outside of a technology context and specifically outside of the area of Information Strategy, then Google is your friend. For anyone who is still with us, then while I would not claim to be an all-purpose strategy guru, I think that it is worth starting by presenting some general questions that pertain to the area of strategy formation. I am going to cast these in the shape of the geographic / journey metaphor that I developed above. Adopting this framework, any general strategy will have to answer the following questions:

Where are we?
Answering this question is the province of a Situational Analysis. Such a study will highlight what is good about the current situation as well as what needs to be changed.
Where do we want to be instead and why?
Here it is useful to consider two things: first Drivers for Change (which may emerge from the Situational Analysis); second a further question, What does good look like? This area is thus a mixture of what is wrong with the current situation and what would be good about the one proposed as the objective of the strategy^[12].
How do we get there, how long will it take and what will it cost?
Thinking of the most perfect of destinations is going to be of little use if it costs too much to get there or the journey time is prohibitive. Here the strategist needs to get more concrete and consider realistic estimates of time and money.
Will the trip be worth it?
There is a relation here to areas covered under the earlier bullet points, but answering this question will normally require some sort of cost/benefit analysis. In describing what good looks like, many potential benefits may be articulated; here there is a need to quantify them as far as is possible.
What else can we do along the way?
Some might quibble at the inclusion of this item. However I think that the metaphor of a journey lends itself to considering what tactical work can help buttress the central activities of the strategy.

The framing of the above in terms pertinent to a journey may not be familiar^[13], but I think that it is useful. This metaphor also has the benefit of alluding to what is inevitably the case with each of strategy development, strategy execution and the most worthwhile of journeys; they seldom happen overnight.

Having laid some general foundations, the next article in this series, Part II, will begin to be more specific and consider how these questions can be applied to forming the first element of an Information Strategy, a Situational Analysis.


Notes

^[1]	These include (in chronological order): Scaling-up Performance Management; Developing an international BI strategy; “Involving users in Business intelligence Strategy key for success” – Christina Torode on SearchCio-Midmarket.com; A single version of the truth?; A bad workman blames his [Business Intelligence] tools
^[2]	IRM European Data Warehouse and Business Intelligence Conference – November 2012
^[3]	Actually The Oxford English Dictionary and Thesaurus (1997). Ink on cellulose pulp edition (this format used to be quite popular once upon a time). Also still available from antiquarian booksellers.
^[4]	Here I assume that waiting around for something to happen, when it was going to happen anyway, is not a strategic approach; “that would be to mistake lethargy for strategy” (© Antony Jay and Jonathan Lynn. Yes Minister – The Writing on the Wall)
^[5]	I suppose the assumption here is that situation Y is preferable to situation X; at least from the point of view of the strategist.
^[6]	Amongst many other articles on this site, see The confluence of BI and change management for an explicit link between change and Business Intelligence.
^[7]	Assuming Euclidean Geometry, if not then maybe try this instead.
^[8]	Here I am using the term generally rather than in the sense of information generated by systems, which even today remains mostly numeric; albeit that other forms of data have streaked ahead of dowdy old numbers some time ago. Numeric data tends to be somewhat easier to transform into information than non-numeric; at least for now.
^[9]	Perhaps it is worth introducing a note of caution about the over-extension of analogies here – I do this in an earlier article bearing the same name.
^[10]	To this day, I have a compulsion to write “dice and slice” as opposed to “slice and dice”, despite the latter being a more logical sequence of events when approaching – say – a butternut squash.
^[11]	I am looking forward to my engraved TDWI decanter immensely.
^[12]	Sometimes the current situation is so bad that simply addressing its shortcomings is enough work for a strategy to consider. More often a strategy will look to add value beyond just remediating current issues.
^[13]	Though I can hardly claim to be the first person to come up with this metaphor.


The Kindness of Strangers

20 Nov 2014 | 21 Nov 2014 | Peter James Thomas | blogging, social media, twitter | cindi howson, datavizblog, michael sandberg, pauline cabrera, Simon Barnes, twelveskip

“Whoever you are, I have always depended on the kindness of strangers.” – A Streetcar Named Desire by Tennessee Williams

It is so often stated that it has become a truism of sorts that on-line interactions, particularly those via social media, displace what is termed “real world” or “face to face” interactions. My view is that this perspective, rather than being self-evidently true, is actually apocryphal. I am sure that there are examples of people who have become more isolated (in a physical sense) through use of social media; those who are engaged in a zero-sum game where time spent on-line is at the expense of being around other humans. Most communications media can be accused of the same thing, though I am not aware that anyone ever told Jane Austen to stop wasting her time writing letters and instead get out and meet people. It wasn’t so long ago that people, particularly younger people, were berated for spending so much time on the ‘phone; even back when those were connected to a wall socket by a wire. The same barbs were thrown (and still are) at what we now call Video Games; another area which I admit has occupied a lot of my time in other periods of my life.

There is however a different way of looking at this supposed issue. As I explain in my now rather antiquated review of the Twitterverse:

I have been involved in running web-sites and various on-line communities since 1999.

[…]

I think that Twitter.com^[1] can be an extremely useful way of interacting with people, expanding your network and coming into contact with interesting new people.

– Taken from New Adventures in Wi-Fi – Track 2: Twitter April 2010

I have indeed come into contact with a wide range of different people through my, admittedly rather intermittent, use of what we now call social media. Importantly, a lot of these people are based in parts of the world, or even parts of my own country, where our paths would have been unlikely to cross. I suppose that a case could be made that any time I spend writing or reading blog articles, or talking to people on Twitter or LinkedIn, could instead have been more profitably employed sitting on a barstool; perhaps in the hope that someone with complementary interests would start talking to me. However, this does seem to be a doubtful assertion to make. As with most things in life (except chocolate of course) balance is the key. If you spend all of your time on social media (or indeed all of your time in bars) you will rule out some social experiences. If instead you spend some time on social media as part of a healthy, balanced diet, then this should lead to a wider range of associates and sometimes even friends. It is also a pretty frictionless way to find people who are passionate about the things that you are passionate about; or indeed to find out why people are passionate about areas that you think might be interesting.

I mention above that – despite the observations I make later in the same paragraph – my own use of social media has been sporadic^[2]. Having made some progress in understanding some elements of the area in an earlier stage of its evolution, jumping back in as I am doing now can feel a little daunting. These fears have been somewhat ameliorated by reconnecting with a lot of people, who still seem interested in me and what I have to say^[3]. I have also connected with some new people and acknowledging this second occurrence is the actual purpose of this article.

First, I’d like to offer thanks to Ontario-based Pauline Cabrera (@twelveskip) of twelveskip.com. Pauline describes herself thus on Twitter:

Savvy Digital Strategist / Blogger / Web Designer / Virtual Assistant (http://GeekyVA.com). I dig #SEO, blogging, social media & content marketing.

I found Pauline’s web-site when I was thinking about sprucing up my Twitter header and looking for some advice^[4]. Pauline’s observations were clear and helpful, but while I get by OK in creating images (both in a business context and with many of the diagrams on this site), I am not a graphic designer. Given Pauline’s greater experience, I decided to reach out to her. The fruits of this interaction can now be viewed on my Twitter site, @peterjthomas.

Pauline and I reached a commercial arrangement, so I’m not here referring to the kindness of strangers always meaning doing stuff for free. However, while I am sure many other people provide the services that Pauline does, I’m equally confident that very few do it with such speed and professionalism. When you couple these attributes with her being ultra-friendly and displaying an evident delight in doing what she does, you end up with someone it is a pleasure to do business with.

I mentioned that Pauline resides in Canada; I live in the UK. We wouldn’t have bumped into each other without those modern inventions of the Internet, search engines, web-sites and (the subject of the search that allowed me to find Pauline) Twitter.

Second, I recently composed an article with a Data Visualisation theme and as part of researching this looked at a number of blogs covering this area. One that stood out was Michael Sandberg’s Data Visualization Blog. Michael describes himself thus:

My main work-related areas of interest are in developing self-service interactive, dynamic reports for Web and Mobile (most notably iPad). I currently develop using MicroStrategy in the Cloud with Netezza.

Michael and I also share a mutual connection in Cindi Howson (@BIScorecard) of BI Scorecard. Despite this, I had not been aware of Michael’s work until recently. I did however connect with him via his web-site. Today he has been kind enough to feature the data visualisation piece I wrote on his blog. It is always gratifying when a fellow professional thinks that your work merits sharing with their network.

In this case, Michael is based in Arizona. The chances of us bumping into each other, except through us both blogging, would have been slim as well.

The final person that I would like to mention is Simon Barnes, the award-winning sports and wildlife author and journalist. I based my recent blog article, Ten Million Aliens – More musings on BI-ology, on his book of a similar name. Aside from his articles for various newspapers being published on-line, Simon has not been noted for his social media presence until recently. This has now been remedied via his blog Simon Barnes Author and Twitter account, @SimonBarnesWild; Simon has been using the former to showcase chapters from his book.

The kindness that I wanted to point out here is the diligence with which Simon responds to comments on his site. Of course, on a personal note, there is always a frisson of excitement when someone whose work you admire and who is also something of a public figure in the UK replies to you directly as Simon has to me. Politeness and consideration for others pre-date the Internet of course, but treating people reasonably gets you a long way in social media. As Simon seems to do this naturally, I am sure this characteristic will stand him in good stead.

I can’t claim that Simon lives a long way from me; his home in Norfolk is pretty adjacent to my current one in Cambridge. However, despite having read his articles for years, it was only once Simon established a web presence that the opportunity to correspond opened up.

So, in the couple of weeks during which I have dipped my toe back into the social media water, I have had the privilege to connect (in a number of different ways) with the three people that I mention above. Each of Pauline, Michael and Simon is on-line for different reasons and each has different things to say about very different areas. However, I am interested in what each of them does, as are many other people around the world. It’s hard to imagine an easier way in which I could have formed connections with these three people, one from Canada, one from the US and one from my native UK, than via the Internet and – in these cases – Twitter and Blogging. I think these are useful facts to remember in the face of accusations that social media makes people insular, closed-off and lonely. It may do that to some people, but this is a million miles away from my own experiences and – I strongly suspect – those of many of the people who are now able to access a wider world through their keyboards or touchscreens.

Notes

^[1]	The “.com” was still in use back in 2010
^[2]	This is something that I cover in another earlier article: Four [Social Media] Failures and a Success. The section describing the first failure (in this case a personal one) begins: Failure 1 – Thinking that you can dip in and out of Social Media
^[3]	Probably strongly correlated to me being interested in what they have to say of course.
^[4]	I think that the actual search terms were the rather prosaic “twitter header dimensions”.


Scienceogram.org’s Infographic to celebrate the Philae landing

15 Nov 2014 | 6 Feb 2017 | Peter James Thomas | business intelligence, data visualisation, infographics, Mathematics & Science | philae, rosetta

As a picture is said to paint a thousand words, I’ll (mostly) leave it to Scienceogram’s infographic to deliver the message.

However, The Center for Responsive Politics (I have no idea whether or not they have a political affiliation, they claim to be nonpartisan) estimates the cost of the recent US Congressional elections at around $3.67 bn (€2.93 bn). I found a lower (but still rather astonishing) figure of $1.34 bn (€1.07 bn) at the Federal Election Commission web-site, but suspect that this number excludes Political Action Committees and their like.

To make a European comparison to a European space project, the Common Agricultural Policy cost €57.5 bn ($72.0 bn) in 2013 according to the BBC. Given that Rosetta’s costs were spread over nearly 20 years, it makes sense to move the decimal point rightwards one place in both the euro and dollar figures and then to double the resulting numbers before making comparisons (this is left as an exercise for the reader).
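For anyone who would rather not shift decimal points by hand, here is a minimal sketch of that arithmetic in Python. It assumes (and this is only an assumption made for illustration) that the 2013 Common Agricultural Policy figure is representative of every one of the roughly 20 years, and it uses the €1.4 bn figure quoted in this article for the mission; the function and variable names are hypothetical.

```python
# Rough scaling of a one-year cost to Rosetta's ~20-year timespan.
# Assumption: the 2013 CAP cost is treated as typical of every year.

ROSETTA_YEARS = 20          # "nearly 20 years", rounded for convenience
ROSETTA_COST_EUR_BN = 1.4   # mission figure quoted in the article

CAP_2013_EUR_BN = 57.5
CAP_2013_USD_BN = 72.0


def scale_to_mission_span(annual_cost_bn: float) -> float:
    """Move the decimal point one place right (x10), then double (x2)."""
    return annual_cost_bn * 10 * 2


cap_eur_over_span = scale_to_mission_span(CAP_2013_EUR_BN)
cap_usd_over_span = scale_to_mission_span(CAP_2013_USD_BN)

# How many Rosetta-sized missions the scaled euro figure would cover.
missions_equivalent = cap_eur_over_span / ROSETTA_COST_EUR_BN

print(f"CAP over {ROSETTA_YEARS} years: EUR {cap_eur_over_span:,.0f} bn "
      f"(USD {cap_usd_over_span:,.0f} bn)")
print(f"Roughly {missions_equivalent:.0f} times the Rosetta figure")
```

Multiplying by ten and then by two is of course just multiplying by twenty; the two-step form simply mirrors the wording above.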

Of course I am well aware that a quick Google could easily produce figures (such as how many meals, or vaccinations, or so on you could get for €1.4 bn) making points that are entirely antipodal to the ones presented. At the end of the day we landed on a comet and will – fingers crossed – begin to understand more about the formation of the Solar System and potentially Life on Earth itself as a result. Whether or not you think that is good value for money probably depends mostly on what sort of person you are. As I relate in a previous article, infographics only get you so far.

Scienceogram provides précis [correct plural] of UK science spending, giving overviews of how investment in science compares to the size of the problems it’s seeking to solve.


Ten Million Aliens – More musings on BI-ology

14 Nov 2014 | 20 Dec 2014 | Peter James Thomas | Biology, business intelligence, data warehousing | biological classification, cricket, david gower, dimension, hierarchy, Simon Barnes, taxonomy

Introduction

This article relates to the book Ten Million Aliens – A Journey Through the Entire Animal Kingdom by British journalist and author Simon Barnes, but is not specifically a book review. My actual review of this entertaining and informative work appears on Amazon and is as follows:

Having enjoyed Simon’s sport journalism (particularly his insightful and amusing commentary on Test Match cricket) for many years, I was interested to learn about this new book via his web-site. As an avid consumer of pop-science literature and already being aware of Simon’s considerable abilities as a writer, I was keen to read Ten Million Aliens. To be brief, I would recommend the book to anyone with an enquiring mind, an interest in the natural world and its endless variety, or just an affection for good science writing. My only sadness was that the number of phyla eventually had to come to an end. I laughed in places, I was better informed than before reading a chapter in others and the autobiographical anecdotes and other general commentary on the state of our stewardship of the planet added further dimensions. I look forward to Simon’s next book.

Instead this piece contains some general musings which came to mind while reading Ten Million Aliens and – as is customary – applies some of these to my own fields of professional endeavour.

Some Background

David Ivon Gower

Regular readers of this blog will be aware of my affection for Cricket^[1] and also my interest in Science^[2]. Simon Barnes’s work spans both of these passions. I became familiar with Simon’s journalism when he was Chief Sports Writer for The Times^[3], an organ for which he wrote for over 32 years. Given my own sporting interests, I first read his articles specifically about Cricket and sometimes Rugby Union, but began to appreciate his writing in general and to consume his thoughts on many other sports.

There is something about Simon’s writing which I (and no doubt many others) find very engaging. He manages to be both insightful and amusing and displays both elegance of phrase and erudition without ever seeming to show off, or to descend into the overly-florid prose of which I can sometimes (OK often) be guilty. It also helps that we seem to share a favourite cricketer in the shape of David Gower, who appears above and was the most graceful batsman to have played for England in the last forty years. However, it is not Simon’s peerless sports writing that I am going to focus on here. For several years he also penned a wildlife column for The Times and is a patron of a number of wildlife charities. He has written books on, amongst other topics, birds, horses, his safari experiences and conservation in general.

Green Finch, Great Tit, Lesser Spotted Woodpecker, Tawny Owl, Magpie, Carrion Crow, Eurasian Jay, Jackdaw

My own interest in science merges into an appreciation of the natural world, perhaps partly also related to the amount of time I have spent in remote and wild places rock-climbing and bouldering. As I started to write this piece, some welcome November Cambridge sun threw shadows of the Green Finches and Great Tits on our feeders across the monitor. Earlier in the day, my wife and I managed to catch a Lesser Spotted Woodpecker, helping itself to our peanuts. Last night we stood on our balcony listening to two Tawny Owls serenading each other. Our favourite Corvidae family are also very common around here and we have had each of the birds appearing in the bottom row of the above image on our balcony at some point. My affection for living dinosaurs also extends to their cousins, the herpetiles, but that is perhaps a topic for another day.

Ten Million Aliens has the modest objectives, revealed by its sub-title, of saying something interesting about each of the (at the last count) thirty-five phyla of the Animal Kingdom^[4] and of providing some insights into a few of the thousands of families and species that make these up. Simon’s boundless enthusiasm for the life he sees around him (and indeed the life that is often hidden from all bar the most intrepid of researchers), his ability to bring even what might be viewed as ostensibly dull subject matter^[5] to life and a seemingly limitless trove of pertinent personal anecdotes, all combine to ensure not only that he achieves these objectives, but that he does so with some élan.

Classifications and Hierarchies

Biological Classification

Well having said that this article wasn’t going to be a book review, I guess it has borne a striking resemblance to one so far. Now to take a different tack; one which relates to three of the words that I referenced and provided links to in the last paragraph of the previous section: phylum, family and species. These are all levels in the general classification of life. At least one version of where these three levels fit into the overall scheme of things appears in the image above^[6]. Some readers may even be able to recall a related mnemonic from years gone by: Kings Play Chess on Fine Green Sand^[7].

The father of modern taxonomy, Carl Linnaeus, founded his original biological classification – not unreasonably – on the shared characteristics of organisms; things that look similar are probably related. Relations mean that like things can be collected together into groups and that the groups can be further consolidated into super-groups. This approach served science well for a long time. However when researchers began to find more and more examples of convergent evolution^[8], Linnaeus’s rule of thumb was seen to not always apply and complementary approaches also began to be adopted.

Cladogram

One of these approaches, called Cladistics, focuses on common ancestors rather than shared physical characteristics. Breakthroughs in understanding the genetic code provided impetus to this technique. The above diagram, referred to as a cladogram, represents one school of thought about the relationship between avian dinosaurs, non-avian dinosaurs and various other reptiles that I mentioned above.

It is at this point that the Business Intelligence professional may begin to detect something somewhat familiar^[9]. I am of course talking about both dimensions and organising these into hierarchies. Dimensions are the atoms of Business Intelligence and Data Warehousing^[10]. In Biological Classification: H. sapiens is part of Homo, which is part of Hominidae, which is part of Primates, which is part of Mammalia, which is part of Chordata, which then gets us back up to Animalia^[11]. In Business Intelligence: Individuals make up Teams, which make up Offices, which make up Countries and Regions.

Above I referenced different approaches to Biological Classification, one based on shared attributes, the other on homology of DNA. This also reminds me of the multiple ways to roll-up dimensions. To pick the most obvious, Day rolls up to Month, Quarter, Half-Year and Year; but also in a different manner to Week and then Year. Given that the aforementioned DNA evidence has caused a reappraisal of the connections between many groups of animals, the structures of Biological Classification are not rigid and instead can change over time^[12]. Different approaches to grouping living organisms can provide a range of perspectives, each with its own benefits. In a similar way, good BI/DW design practices should account for both dimensions changing and the fact that different insights may well be provided by parallel dimension hierarchies.
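For readers who like to see such things concretely, the parallel roll-ups of Day can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and label formats are hypothetical, and the Week hierarchy is assumed to follow ISO 8601 week numbering, which is just one of several conventions an organisation might adopt.

```python
from datetime import date


def calendar_rollup(d: date) -> dict:
    """Roll a day up the Month -> Quarter -> Half-Year -> Year hierarchy."""
    quarter = (d.month - 1) // 3 + 1
    half = 1 if d.month <= 6 else 2
    return {
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "half_year": f"{d.year}-H{half}",
        "year": str(d.year),
    }


def week_rollup(d: date) -> dict:
    """Roll the same day up the parallel Week -> Year hierarchy.

    Assumption: ISO 8601 weeks, so the 'week year' can differ from
    the calendar year for days close to 1 January.
    """
    iso_year, iso_week, _ = d.isocalendar()
    return {"week": f"{iso_year}-W{iso_week:02d}", "week_year": str(iso_year)}


day = date(2014, 12, 29)
print(calendar_rollup(day))
print(week_rollup(day))
```

Under the ISO assumption, 29 December 2014 sits in calendar year 2014 but in week 2015-W01, which is exactly why the Week hierarchy has to be modelled alongside the Month one rather than nested inside it.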

In summary, I suppose what I am saying is that BI/DW practitioners, as well as studying the works of Inmon and Kimball, might want to consider expanding their horizons to include Barnes; to say nothing of Linnaeus^[13]. They might find something instructive in these other taxonomical works.

Notes

^[1]	Articles from this blog in which I intertwine Cricket and aspects of business, technology and change include (in chronological order): Accuracy More Cricket and Twitter The Big Picture Wager
^[2]	Articles on this site which reference either Science or Mathematics are far too numerous to list in full. A short selection of the ones I enjoyed writing most would include (again in chronological order): A single version of the truth?; Patterns patterns everywhere; Analogies; Data Visualisation – A Scientific Treatment
^[3]	Or perhaps The London Times for non-British readers, despite the fact that it was the first newspaper to bear that name.
^[4]	Here “Animal Kingdom” is used in the taxonomical sense and refers to Animalia.
^[5]	For an example of the transformation of initially unpromising material, perhaps check out the chapter of Ten Million Aliens devoted to Entoprocta.
^[6]	With acknowledgment to The Font.
^[7]	Though this elides both Domains and Johnny-come-latelies like super-families, sub-genuses and hyper-orders [I may have made that last one up of course].
^[8]	For example the wings of Pterosaurs, Birds and Bats.
^[9]	No pun intended.
^[10]	This metaphor becomes rather cumbersome when one tries to extend it to cover measures. It’s tempting to perhaps align these with fundamental forces, and thus bosons as opposed to combinations of fermions, but the analogy breaks down pretty quickly, so let’s conveniently forget that multidimensional data structures have fact tables at their hearts for now.
^[11]	Here I am going to strive manfully to avoid getting embroiled in discussions about domains, superregnums, superkingdoms, empires, or regios and instead leave the interested reader to explore these areas themselves if they so desire. Ten Million Aliens itself could be one good starting point, as could the following link.
^[12]	Science is yet to determine whether these slowly changing dimensions are of Type 1, 2, 3 or 4 (it has however been definitively established that they are not Type 6 / Hybrid).
^[13]	Interesting fact of the day: Linnaeus’s seminal work included an entry for The Kraken, under Cephalopoda

Data Visualisation – A Scientific Treatment

6 Nov 2014 | 29 Aug 2017 | Peter James Thomas | Biology, business intelligence, dashboards, data visualisation, Statistics | dna, journal of molecular biology, structural biology, X-ray crystallography

Introduction

The above diagram was compiled by Florence Nightingale, who was – according to The Font – “a celebrated English social reformer and statistician, and the founder of modern nursing”. It is gratifying to see her less high-profile role as a number-cruncher acknowledged up-front and central; particularly as she died in 1910, eight years before women in the UK were first allowed to vote and eighteen before universal suffrage. This diagram is one of two which are generally cited in any article on Data Visualisation. The other is Charles Minard’s exhibit detailing the advance on, and retreat from, Moscow of Napoleon Bonaparte’s Grande Armée in 1812 (Data Visualisation had a military genesis in common with – amongst many other things – the internet). I’ll leave the reader to look at this second famous diagram if they want to; it’s just a click away.

While there are more elements of numeric information in Minard’s work (what we would now call measures), there is a differentiating point to be made about Nightingale’s diagram. This is that it was specifically produced to aid members of the British parliament in their understanding of conditions during the Crimean War (1853-56); particularly given that such non-specialists had struggled to understand traditional (and technical) statistical reports. Again, rather remarkably, we have here a scenario where the great and the good were listening to the opinions of someone who was barred from voting on the basis of lacking a Y chromosome. Perhaps more pertinently to this blog, this scenario relates to one of the objectives of modern-day Data Visualisation in business; namely explaining complex issues, which don’t leap off a page of figures, to busy decision makers, some of whom may not be experts in the specific subject area (another is of course allowing the expert to discern less than obvious patterns in large or complex sets of data). Fortunately most business decision makers don’t have to grapple with the progression in number of “deaths from Preventible or Mitigable Zymotic diseases” versus “deaths from wounds” over time, but the point remains.

Data Visualisation in one branch of Science

von Laue, Bragg Senior & Junior, Crowfoot Hodgkin, Kendrew, Perutz, Crick, Franklin, Watson & Wilkins

Coming much more up to date, I wanted to consider a modern example of Data Visualisation. As with Nightingale’s work, this is not business-focused, but contains some elements which should be pertinent to the professional considering the creation of diagrams in a business context. The specific area I will now consider is Structural Biology. For the incognoscenti (no advert for IBM intended!), this area of science is focussed on determining the three-dimensional shape of biologically relevant macro-molecules, most frequently proteins or protein complexes. The history of Structural Biology is intertwined with the development of X-ray crystallography by Max von Laue and father and son team William Henry and William Lawrence Bragg; its subsequent application to organic molecules by a host of pioneers including Dorothy Crowfoot Hodgkin, John Kendrew and Max Perutz; and – of greatest resonance to the general population – Francis Crick, Rosalind Franklin, James Watson and Maurice Wilkins’s joint determination of the structure of DNA in 1953.

X-ray diffraction image of the double helix structure of the DNA molecule, taken 1952 by Raymond Gosling, commonly referred to as “Photo 51”, during work by Rosalind Franklin on the structure of DNA

While the masses of data gathered in modern X-ray crystallography need computer software to extrapolate them to physical structures, things were more accessible in 1953. Indeed, it could be argued that Gosling and Franklin’s famous image, its characteristic “X” suggestive of two helices and thus driving Crick and Watson’s model building, is another notable example of Data Visualisation; at least in the sense of a picture (rather than numbers) suggesting some underlying truth. In this case, the production of Photo 51 led directly to the creation of the even more iconic image below (which was drawn by Francis Crick’s wife Odile and appeared in his and Watson’s seminal Nature paper^[1]):

Odile and Francis Crick - structure of DNA

It is probably fair to say that the visualisation of data which is displayed above has had something of an impact on humankind in the sixty or so years since it was first drawn.

Modern Structural Biology

The X-ray Free Electron Laser at Stanford

Today, X-ray crystallography is one of many tools available to the structural biologist with other approaches including Nuclear Magnetic Resonance Spectroscopy, Electron Microscopy and a range of biophysical techniques which I will not detain the reader by listing. The cutting edge is probably represented by the X-ray Free Electron Laser, a device originally created by repurposing the linear accelerators of the previous generation’s particle physicists. In general Structural Biology has historically sat at an intersection of Physics and Biology.

However, before trips to synchrotrons can be planned, the Structural Biologist often faces the prospect of stabilising their protein of interest, ensuring that they can generate sufficient quantities of it, successfully isolating the protein and finally generating crystals of appropriate quality. This process often consumes years, in some cases decades. As with most forms of human endeavour, there are few short-cuts and the outcome is at least loosely correlated to the amount of time and effort applied (though sadly with no guarantee that hard work will always be rewarded).

From the general to the specific

At this point I should declare a personal interest: the example of Data Visualisation which I am going to consider is taken from a paper recently accepted by the Journal of Molecular Biology (JMB) and of which my wife is the first author^[2]. Before looking at this exhibit, it’s worth a brief detour to provide some context.

In recent decades, the exponential growth in the breadth and depth of scientific knowledge (plus of course the velocity with which this can be disseminated), coupled with the increase in the range and complexity of techniques and equipment employed, has led to the emergence of specialists. In turn this means that, in a manner analogous to the early production lines, science has become a very collaborative activity; expert in stage one hands over the fruits of their labour to expert in stage two and so on. For this reason the typical scientific paper (and certainly those in Structural Biology) will have several authors, often spread across multiple laboratory groups and frequently in different countries. By way of example the previous paper my wife worked on had 16 authors (including a Nobel Laureate^[3]). In this context, the fact the paper I will now reference was authored by just my wife and her group leader is noteworthy.

The reader may at this point be relieved to learn that I am not going to endeavour to explain the subject matter of my wife’s paper, nor the general area of biology to which it pertains (the interested are recommended to Google “membrane proteins” or “G Protein Coupled Receptors” as a starting point). Instead let’s take a look at one of the exhibits.

The above diagram (in common with Nightingale’s much earlier one) attempts to show a connection between sets of data, rather than just the data itself. I’ll elide the scientific specifics here and focus on more general issues.

First the grey upper section with the darker blots on it – which is labelled (a) – is an image of a biological assay called a Western Blot (for the interested, details can be viewed here); each vertical column (labelled at the top of the diagram) represents a sub-experiment on protein drawn from a specific sample of cells. The vertical position of a blot indicates the size of the molecules found within it (in kilodaltons); the intensity of a given blot indicates how much of the substance is present. Aside from the headings and labels, the upper part of the figure is a photographic image and so essentially analogue data^[4]. So, in summary, this upper section represents the findings from one set of experiments.

At the bottom – and labelled (b) – appears an artefact familiar to anyone in business, a bar-graph. This presents results from a parallel experiment on samples of protein from the same cells (for the interested, this set of data relates to the degree to which proteins in the samples bind to a specific radiolabelled ligand). The second set of data is taken from what I might refer to as a “counting machine” and is thus essentially digital. To be 100% clear, the bar chart is not a representation of the data in the upper part of the diagram; it pertains to results from a second experiment on the same samples. As indicated by the labelling, for a given sample, the column in the bar chart (b) is aligned with the column in the Western Blot above (a), connecting the two different sets of results.

Taken together the upper and lower sections^[5] establish a relationship between the two sets of data. Again I’ll skip on the specifics, but the general point is that while the Western Blot (a) and the binding assay (b) tell us the same story, the Western Blot is a much more straightforward and speedy procedure. The relationship that the paper establishes means that just the Western Blot can be used to perform a simple new assay which will save significant time and effort for people engaged in the determination of the structures of membrane proteins; a valuable new insight. Clearly the relationships that have been inferred could equally have been presented in a tabular form instead and be just as relevant. It is however testament to the more atavistic side of humans that – in common with many relationships between data – a picture says it more surely and (to mix a metaphor) more viscerally. This is the essence of Data Visualisation.

What learnings can Scientific Data Visualisation provide to Business?

Scientific presentation (c/o Nature, but looks a lot like PhD Comics IMO)

Using the JMB exhibit above, I wanted to now make some more general observations and consider a few questions which arise out of comparing scientific and business approaches to Data Visualisation. I think that many of these points are pertinent to analysis in general.

Normalisation

Broadly, normalisation^[6] consists of defining results in relation to some established yardstick (or set of yardsticks); displaying relative, as opposed to absolute, numbers. In the JMB exhibit above, the amount of protein solubilised in various detergents is shown with reference to the un-solubilised amount found in native membranes; these reference figures appear as 100% columns to the right and left extremes of the diagram.

The most common usage of normalisation in business is growth percentages. Here the fact that London business has grown by 5% can be compared to Copenhagen having grown by 10% despite total London business being 20-times the volume of Copenhagen’s. A related business example, depending on implementation details, could be comparing foreign currency amounts at a fixed exchange rate to remove the impact of currency fluctuation.

Normalised figures are very typical in science, but, aside from the growth example mentioned above, considerably less prevalent in business. In both avenues of human endeavour, the approach should be used with caution; something that increases 200% from a very small starting point may not be relevant, be that the result of an experiment or weekly sales figures. Bearing this in mind, normalisation is often essential when looking to present data of different orders on the same graph^[7]; the alternative often being that smaller data is swamped by larger, not always what is desirable.
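The growth comparison above can be made concrete with a small Python sketch. The sales figures are invented purely for illustration (chosen so that London is 20 times the size of Copenhagen), and `growth_pct` is a hypothetical helper rather than anything taken from a particular BI tool.

```python
def growth_pct(previous: float, current: float) -> float:
    """Percentage growth of `current` over `previous`."""
    if previous == 0:
        raise ValueError("growth is undefined for a zero base")
    return (current - previous) / previous * 100.0


# Hypothetical (previous, current) sales; London is 20x Copenhagen's volume.
sales = {
    "London": (2000.0, 2100.0),
    "Copenhagen": (100.0, 110.0),
}

for city, (prev, curr) in sales.items():
    change = curr - prev
    print(f"{city}: {change:+.0f} absolute, {growth_pct(prev, curr):+.1f}% growth")
```

Plotted as absolute numbers, Copenhagen’s change of 10 would be swamped by London’s 100; expressed as growth, Copenhagen’s 10% stands out against London’s 5%. Showing both figures side by side is one way to guard against the small-base caveat noted above.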

Controls

I’ll use an anecdote to illustrate this area from a business perspective. Imagine an organisation which (as you would expect) tracks the volume of sales of a product or service it provides via a number of outlets. Imagine further that it launches some sort of promotion, perhaps valid only for a week, and notices an uptick in these sales. It is extremely tempting to state that the promotion has resulted in increased sales^[8].

However this cannot always be stated with certainty. Sales may have increased for some totally unrelated reason such as (depending on what is being sold) good or bad weather, a competitor increasing prices or closing one or more of their comparable outlets and so on. Equally perniciously, the promotion may have simply moved sales in time – people may have been going to buy the organisation’s product or service in the weeks following a promotion, but have brought the expenditure forward to take advantage of it. If this is indeed the case, an uptick in sales may well be due to the impact of a promotion, but will be offset by a subsequent decrease.

In science, it is this type of problem that the concept of control tests is designed to combat. As well as testing a result in the presence of substance or condition X, a well-designed scientific experiment will also be carried out in the absence of substance or condition X, the latter being the control. In the JMB exhibit above, the controls appear in the columns with white labels.

There are ways to make the business “experiment” I refer to above more scientific of course. In retail business, the current focus on loyalty cards can help, assuming that these can be associated with the relevant transactions. If the business is on-line then historical records of purchasing behaviour can be similarly referenced. In the above example, the organisation could decide to offer the promotion at only a subset of its outlets, allowing a comparison to those where no promotion applied. This approach may improve rigour somewhat, but of course it does not cater for purchases transferred from a non-promotion outlet to a promotion one (unless a whole raft of assumptions are made). There are entire industries devoted to helping businesses deal with these rather messy scenarios, but it is probably fair to say that it is normally easier to devise and carry out control tests in science.

The general take away here is that a graph which shows some change in a business output (say sales or profit) correlated to some change in a business input (e.g. a promotion, a new product launch, or a price cut) would carry a lot more weight if it also provided some measure of what would have happened without the change in input (not that this is always easy to measure).
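One simple way to put a number on “what would have happened without the change” is to subtract the change seen at control outlets from the change seen at promoted ones. The Python sketch below does this with invented weekly sales figures; the function name is hypothetical, and the approach (a basic difference-in-differences comparison) is offered as one possible technique rather than the only or best one. It assumes the two groups of outlets would otherwise have moved in step, and it does nothing about purchases that migrate from a control outlet to a promoted one.

```python
from statistics import mean


def uplift_vs_control(promo_before, promo_during, control_before, control_during):
    """Change in average sales at promoted outlets minus the change at controls."""
    promo_change = mean(promo_during) - mean(promo_before)
    control_change = mean(control_during) - mean(control_before)
    return promo_change - control_change


# Hypothetical weekly sales per outlet, before and during the promotion week.
promo_before = [100, 120, 110]
promo_during = [130, 150, 140]
control_before = [90, 100, 110]
control_during = [100, 110, 120]

uplift = uplift_vs_control(promo_before, promo_during, control_before, control_during)
print(f"Raw uplift at promoted outlets: {mean(promo_during) - mean(promo_before)}")
print(f"Uplift net of control outlets:  {uplift}")
```

With these made-up numbers the promoted outlets rise by 30 on average, but the controls rise by 10 without any promotion, so only 20 of the uptick can plausibly be attributed to the promotion itself.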

Rigour and Scrutiny

I mention in the footnotes that the JMB paper in question includes versions of the exhibit presented above for four other membrane proteins, this being in order to firmly establish a connection. Looking at just the figure I have included here, each element of the data presented in the lower bar-graph area is based on duplicated or triplicated tests, with average results (and error bars – see the next section) being shown. When you consider that upwards of three months’ preparatory work could have gone into any of these elements and that a mistake at any stage during this time would have rendered the work useless, some impression of the level of rigour involved emerges. The result of this assiduous work is that the authors can be confident that the exhibits they have developed are accurate and will stand up to external scrutiny. Of course such external scrutiny is a key part of the scientific process and the manuscript of the paper was reviewed extensively by independent experts before being accepted for publication.

In the business world, such external scrutiny tends to apply most frequently to publicly published figures (such as audited Financial Accounts); of course external financial analysts also will look to dig into figures. There may be some internal scrutiny around both the additional numbers used to run the business and the graphical representations of these (and indeed some companies take this area very seriously), but not every internal KPI is vetted the way that the report and accounts are. Particularly in the area of Data Visualisation, there is a tension here. Graphical exhibits can have a lot of impact if they relate to the current situation or present trends; contrariwise if they are substantially out-of-date, people may question their relevance. There is sometimes the expectation that a dashboard is just like its aeronautical counterpart, showing real-time information about what is going on now^[9]. However a lot of the value of Data Visualisation is not about the here and now so much as trends and explanations of the factors behind the here and now. A well-thought-out graph can tell a very powerful story, more powerful for most people than a table of figures. However a striking graph based on poor quality data, data which has been combined in the wrong way, or even – as sometimes happens – the wrong datasets entirely, can tell a very misleading story and lead to the wrong decisions being taken.

I am not for a moment suggesting here that every exhibit produced using Data Visualisation tools must be subject to months of scrutiny. As referenced above, in the hands of an expert such tools have the value of sometimes quickly uncovering hidden themes or factors. However, I would argue that – as in science – if the analyst involved finds something truly striking, an association which he or she feels will really resonate with senior business people, then double- or even triple-checking the data would be advisable. Asking a colleague to run their eye over the findings and to then probe for any obvious mistakes or weaknesses sounds like an appropriate next step. Internal Data Visualisations are never going to be subject to peer-review, however their value in taking sound business decisions will be increased substantially if their production reflects at least some of the rigour and scrutiny which are staples of the scientific method.

Dealing with Uncertainty

In the previous section I referred to the error bars appearing on the JMB figure above. Error bars are acknowledgements that what is being represented is variable and they indicate the extent of such variability. When dealing with a physical system (be that mechanical or – as in the case above – biological), behaviour is subject to many factors, not all of which can be eliminated or adjusted for and not all of which are predictable. This means that repeating an experiment under ostensibly identical conditions can lead to different results^[10]. If the experiment is well-designed and if the experimenter is diligent, then such variability is minimised, but never eliminated. Error bars are a recognition of this fundamental aspect of the universe as we understand it.
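As a rough illustration of where an error bar comes from, the Python sketch below summarises a set of repeated measurements as a mean plus the standard error of that mean. The readings are invented, the helper name is hypothetical, and the choice of standard error (rather than, say, standard deviation or a confidence interval) is an assumption made for the example; nothing here says which convention the JMB figure uses.

```python
from math import sqrt
from statistics import mean, stdev


def summarise(replicates):
    """Return (mean, standard error of the mean) for repeated measurements."""
    n = len(replicates)
    if n < 2:
        raise ValueError("need at least two replicates to estimate variability")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(replicates), stdev(replicates) / sqrt(n)


# Hypothetical triplicate readings of the same quantity.
readings = [98.0, 102.0, 100.0]
centre, error = summarise(readings)
print(f"{centre:.1f} +/- {error:.2f}")
```

The bar height on a chart would be the mean, and the error bar would extend one standard error either side of it; the same calculation applies equally to repeated business measurements such as weekly sales across comparable weeks.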

While de rigueur in science, error bars seldom make an appearance in business, even – in my experience – in estimates of business measures which emerge from statistical analyses^[11]. Even outside the realm of statistically generated figures, more business measures are subject to uncertainty than might initially be thought. An example here might be a comparison (perhaps as part of the externally scrutinised report and accounts) of the current quarter’s sales to the previous one (or the same one last year). In companies where sales may be tied to – for example – the number of outlets, care is paid to making these figures like-for-like. This might include only showing numbers for outlets which were in operation in the prior period and remain in operation now (i.e. excluding sales from both closed outlets and newly opened ones). However, outside the area of high-volume low-value sales where the Law of Large Numbers^[12] rules, other factors could substantially skew a given quarter’s results for many organisations. Something as simple as a key customer delaying a purchase (so that it fell in Q3 this year instead of Q2 last) could have a large impact on quarterly comparisons. Again companies will sometimes look to include adjustments to cater for such timing or related issues, but this cannot be a precise process.
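The like-for-like adjustment described above amounts to restricting both periods to the outlets that traded in each. A minimal Python sketch, using invented outlet names and sales figures and a hypothetical helper function, might look like this:

```python
def like_for_like(prior: dict, current: dict) -> tuple:
    """Total sales in each period, restricted to outlets trading in both."""
    common = prior.keys() & current.keys()  # drops closed and newly opened outlets
    return sum(prior[o] for o in common), sum(current[o] for o in common)


# Hypothetical quarterly sales by outlet: C closed, D newly opened.
prior_quarter = {"A": 100, "B": 80, "C": 60}
current_quarter = {"A": 110, "B": 85, "D": 70}

raw = (sum(prior_quarter.values()), sum(current_quarter.values()))
lfl = like_for_like(prior_quarter, current_quarter)
print(f"Raw totals:           {raw}")
print(f"Like-for-like totals: {lfl}")
```

Note that this only removes the effect of the changing outlet estate; it does nothing about the timing issues, such as a key customer delaying a purchase, that the paragraph goes on to describe.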

The main point I am making here is that many aspects of the information produced in companies are uncertain. The cash transactions in a quarter are of course the cash transactions in a quarter, but the above scenario suggests that they may not always 100% reflect actual business conditions (and you cannot adjust for everything). Equally, where you get into figures that would be part of most companies’ financial results, such as outstanding receivables and allowance for bad debts, the spectre of uncertainty arises again without a statistical model in sight. In many industries, regulators are pushing for companies to include more forward-looking estimates of future assets and liabilities in their Financials. While this may be a sensible reaction to recent economic crises, the approach inevitably leads to more figures being produced from models. Even when these models are subject to external review, as is the case with most regulatory-focussed ones, they are still models and there will be uncertainty around the numbers that they generate. While companies will often provide a range of estimates for things like guidance on future earnings per share, providing a range of estimates for historical financial exhibits is not really a mainstream activity.

Which perhaps gets me back to the subject of error bars on graphs. In general I think that their presence in Data Visualisations can only add value, not subtract it. In my article entitled Limitations of Business Intelligence I include the following passage which contains an exhibit showing how the Bank of England approaches communicating the uncertainty inevitably associated with its inflation estimates:

Business Intelligence is not a crystal ball, Predictive Analytics is not a crystal ball either. They are extremely useful tools […] but they are not universal panaceas.

An inflation prediction from The Bank of England
Illustrating the fairly obvious fact that uncertainty increases in proportion to time from now.

[…] Statistical models will never give you precise answers to what will happen in the future – a range of outcomes, together with probabilities associated with each is the best you can hope for (see above). Predictive Analytics will not make you prescient, instead it can provide you with useful guidance, so long as you remember it is a prediction, not fact.

While I can’t see them figuring in formal financial statements any time soon, perhaps there is a case for more business Data Visualisations to include error bars.

In Summary

So, as is often the case, I have embarked on a journey. I started with an early example of Data Visualisation, diverted into a particular branch of science with which I have some familiarity and hopefully returned, again as is often the case, to make some points which I think are pertinent to both the Business Intelligence practitioner and the consumers (and indeed commissioners) of Data Visualisations. Back in “All that glisters is not gold” – some thoughts on dashboards I made some more general comments about the best Data Visualisations having strong informational foundations underpinning them. While this observation remains true, I do see a lot of value in numerically able and intellectually curious people using Data Visualisation tools to quickly make connections which had not been made before and to tease out patterns from large data sets. In addition there can be great value in using Data Visualisation to present more quotidian information in a more easily digestible manner. However I also think that some of the learnings from science which I have presented in this article suggest that – as with all powerful tools – appropriate discretion on the part of the people generating Data Visualisation exhibits and on the part of the people consuming such content would be prudent. In particular the business equivalents of establishing controls, applying suitable rigour to data generation / combination and including information about uncertainty on exhibits where appropriate are all things which can help make Data Visualisation more honest and thus – at least in my opinion – more valuable.

Notes

^[1]	Watson, J.D., Crick, F.H.C. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature.
^[2]	Thomas, J.A., Tate, C.G. (2014). Quality Control in Eukaryotic Membrane Protein Overproduction. J. Mol. Biol. [Epub ahead of print].
^[3]	The list of scientists involved in the development of X-ray Crystallography and Structural Biology which was presented earlier in the text encompasses a further nine such laureates (four of whom worked at my wife’s current research institute), though sadly this number does not include Rosalind Franklin. Over 20 Nobel Prizes have been awarded to people working in the field of Structural Biology; you can view an interactive time line of these here.
^[4]	The intensity, size and position of blots are often digitised by specialist software, but this is an aside for our purposes.
^[5]	Plus four other analogous exhibits which appear in the paper and relate to different proteins.
^[6]	Normalisation has a precise mathematical meaning, actually (somewhat ironically for that most precise of activities) more than one. Here I am using the term more loosely.
^[7]	That’s assuming you don’t want to get into log scales, something I have only come across once in over 25 years in business.
^[8]	The uptick could be as compared to the week before, or to some other week (e.g. the same one last year or last month maybe) or versus an annual weekly average. The change is what is important here, not what the change is with respect to.
^[9]	Of course some element of real-time information is indeed both feasible and desirable; for more analytic work (which encompasses many aspects of Data Visualisation) what is normally more important is sufficient historical data of good enough quality.
^[10]	Anyone interested in some of the reasons for this is directed to my earlier article Patterns patterns everywhere.
^[11]	See my series of three articles on Using historical data to justify BI investments for just one example of these.
^[12]

Follow @peterjthomas

The 23 Most Influential Business Intelligence Blogs

2 Nov 2014 / 15 Sep 2017 Peter James Thomas blogging, business intelligence Augusto Albeghi, Barney Finucane, bi software insight, bruno aziza, cindi howson, Howard Dresner, Marcus Borba

I was flattered to be included in the recent list of the 23 most influential BI bloggers published by Better Buys. To be 100% honest, I was also a little surprised as, due to other commitments, this blog has received very little of my attention in recent years. Taking a glass half full approach, maybe my content stands the test of time; it would be nice to think so.

It was also good to be in the company of various members of the BI community whose work I respect and several of whom I have got to know on-line or in person. These include (as per the original article, in no particular order):

Blogger	Blog
Augusto Albeghi	Upstream Info
Bruno Aziza *	His blog on Forbes
Howard Dresner	Business Intelligence
Barney Finucane	Business Intelligence Products and Trends
Marcus Borba	Business Analytics News
Cindi Howson	BI Scorecard

* You can see Bruno and me talking on Microsoft’s YouTube channel here.

BI Software Insight helps organizations make smarter purchasing decisions on Business Intelligence Software. Their team of experts helps organizations find the right BI solution with expert reviews, objective resource guides, and insights on the latest BI news and trends.


Patterns patterns everywhere – The Sequel

26 Jan 2014 Peter James Thomas Mathematics, Statistics xkcd

Back in 2010 I posted a piece called Patterns patterns everywhere, which took as its entry point various articles on a number of web-sites relating to the then-current Eyjafjallajokull eruption. I went on to reference, amongst other phenomena, the weather.

The incomparable Randall Munroe from xkcd.com has just knocked my earlier work into a cocked hat with his (perhaps unsurprisingly) much more laconic observations from last Friday, which are instead inspired by the recent cold snaps in the US: