Forming an Information Strategy: Part III – Completing the Strategy

Forming an Information Strategy
I – General Strategy II – Situational Analysis III – Completing the Strategy

Maybe we could do with some better information, but how to go about getting it? Hmm...

This article is the final of three which address how to formulate an Information Strategy. I have written a number of other articles which touch on this subject [1] and have also spoken about the topic [2]. However I realised that I had never posted an in-depth review of this important area. This series of articles seeks to remedy this omission.

The first article, Part I – General Strategy, explored the nature of strategy, laid some foundations and presented a framework of questions which will need to be answered in order to formulate any general strategy. The second, Part II – Situational Analysis, explained how to adapt the first element of this general framework – the Situational Analysis – to creating an Information Strategy. In Part I, I likened formulating an Information Strategy to a journey; Part III – Completing the Strategy sees us reaching the destination by working through the rest of the general framework and showing how this can be used to produce a fully-formed Information Strategy.

As with all of my other articles, this essay is not intended as a recipe for success, a set of instructions which – if slavishly followed – will guarantee the desired outcome. Instead the reader is invited to view the following as a set of observations based on what I have learnt during a career in which the development of both Information Strategies and technology strategies in general have played a major role.
 
 
A Recap of the Strategic Framework

Forth Rail Bridge
© http://www.thomashogben.co.uk

I closed Part I of this series by presenting a set of questions, the answers to which will facilitate the formation of any strategy. These have a geographic / journey theme and are as follows:

  1. Where are we?
  2. Where do we want to be instead and why?
  3. How do we get there, how long will it take and what will it cost?
  4. Will the trip be worth it?
  5. What else can we do along the way?

Part II explained the process of answering question 1 through the medium of a Situational Analysis. It is worth pointing out at this juncture that the Situational Analysis will also naturally form the first phase of the more lengthy process of gathering and analysing business requirements. For the purposes of the rest of this article, when such requirements are mentioned, they are taken as being the embryonic ones captured as part of the Situational Analysis.

In this final article I will focus on how to approach obtaining answers to questions 2 to 5. Having spent quite some time considering question 1 in the previous chapter, the content here will be somewhat briefer for the remaining questions; not least as I have covered some of this territory in earlier articles [3].
 
 
2. Where do we want to be instead and why?

My thoughts here split into two sub-sections. The second, What does Good look like?, is (as will be obvious from the title) more forward looking than backward; it covers reasons why the destination may be worth the journey. The first is more to do with why staying in the current location may not be a great idea [4]. However, one motivation for not staying put is that somewhere else may well be better. For this reason, there is no definitive border between the two sub-sections and it will be evident from the text that they instead bleed into each other.

2a. Drivers for Change

Change Next Exit

People often say that the gains that result from Information Programmes are intangible. Of course some may indeed be fairly intangible, but even the most ephemeral of these will not be entirely immune from some sort of valuation. Other benefits, when examined closely enough, can turn out to be surprisingly tangible [5]. In making a case for change (and of course the expenditure associated with this) it is good to try to have a balance of tangible and intangible factors. Here is a selection which may be applicable:

Internal IT drivers

  • These often centre around both the cost and confusion associated with a fragmented and inconsistent Information Landscape; something which, even as we head into 2015, is still not atypical.
  • Opportunity costs may arise from an inability to combine data from different repositories or to roll up data to cover an entire organisation.
  • There is also a case to be made here around things like the licensing costs that result from having too many information repositories and too many tools being used to access them.
  • More often, however, the cost of such fragmentation appears in the shape of additional IT headcount devoted to maintaining a complex landscape and additional business headcount devoted to remediating information shortcomings.

Productivity gains

  • Less number crunching, more business-focussed analysis. Often an organisation’s most highly qualified (and highly paid) staff can spend much of their time repeating quotidian tasks that computers could do far more reliably. Freeing up such able and creative people to add more business value should be an objective and should have benefits.
  • At one company I estimated that teams would spend 5-7 days assembling the information necessary to support a meeting with one of a number of key business partners, or with a major client; our goal became to provide the same information effectively instantaneously. These types of benefit can be costed and also tend to resonate with business stakeholders.
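
By way of illustration only, a back-of-an-envelope costing of this sort of productivity benefit might look like the sketch below; every figure in it is an assumption invented for the example rather than a number from the engagement described above.

```python
# Illustrative only: all figures below are assumptions, not numbers from a real engagement.
prep_days_per_meeting = 6   # midpoint of the 5-7 days mentioned above
meetings_per_year = 40      # assumed number of partner / client meetings per year
loaded_day_rate = 600       # assumed fully-loaded daily cost (GBP) of the staff involved

# If the same pack can be produced effectively instantaneously,
# almost all of the preparation effort is released for other work.
annual_benefit = prep_days_per_meeting * meetings_per_year * loaded_day_rate
print(f"Indicative annual productivity benefit: £{annual_benefit:,}")  # £144,000
```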

Increasing sales / improving profitability

  • All information programmes (indeed almost any business activity) should of course be dedicated to increasing profitability. In some industries the leverage of high-quality information is more readily associated with profitability than in others. However, with enough time spent understanding the dynamics of an organisation, I would suggest that it is possible to make this linkage in a credible manner in pretty much any industry sector.
  • With respect to sales, if you want to increase, say, cross-selling, then sometimes a very effective way is simply to measure it, maybe by department and salesperson. If there is some reliable way to track this, improvements in cross-selling will inevitably follow.

Mitigating operational risk

  • More reliable, unbiased and transparent production of information can address a number of operational risks; what these are specifically will vary from organisation to organisation.
  • However, most years see one organisation or another having to restate its results – there have been cases where adding two figures rather than subtracting them has led to a later restatement. Cases can often be built around the specific pain points in an organisation, or sometimes even around near misses that were caught at the 11th hour.
  • Equally the cost of checking and re-checking figures before publication can be extremely high.

It is also generally worth asking business users what value they would ascribe to improved information, for example what things could they do under new arrangements that they cannot do now? It is important here that any benefits – and in particular any ones which prove to be intangible – are expressed in business language, not technical jargon.

2b. What does Good look like?

OK this dates me - I don't care!

Answering this question is predicated on both experience of successful information improvement programmes and a degree of knowledge about the general information market. There are two main elements here: what good looks like technically, and what it looks like from a process / people perspective.

To cover the technical first, this is the simpler area, not least because we have understood how to develop robust, flexible and high-performing information architectures for at least 15 years.

Integrated Information Architecture

The basics are shown in the diagram above [6]. Questions to consider here include:

  • What would a new information architecture look like?
  • What are the characteristics of the new architecture which would indicate that it is an improvement on the old, and can these be articulated to non-technical people?
  • What are the required elements and how do they relate to the high-level needs captured in the Situational Analysis?
  • How does the proposed architecture relate to incumbent technologies and current staff skills?
  • Can any elements of existing information provision be leveraged, either temporarily or on an ongoing basis?
  • What has worked for other organisations and why would this be pertinent to the organisation in question?
  • Are any new developments in technology pertinent?

Arguably the more important area is the non-technical. Here there is a range of items to consider, some of which are captured in the following exhibit [7]:

Information Process

I could devote a separate set of articles to commenting on the elements of the above diagram; indeed I already have, and interested readers are directed to the footnotes for links to some of these [8]. However it is worth pointing out the critical role to be played by both user education (a more apt phrase than training) and formal Data Governance. Also, certain elements of information tend to work well when they sit within a regular business process, such as a monthly or quarterly review of specific aspects of results and future projections.
 
 
3. How do we get there, how long will it take and what will it cost?

Tube ticket machines

3a. Outline an Indicative Programme of Work

I am not going to offer Programme Planning 101 here, but briefly the first step in putting together an indicative programme of work is to decompose the overall journey into chunks, each of which can then be estimated. Each chunk should cover a group of reports / analyses and include activities from requirements gathering through to testing and finally deployment [9]. For the purposes of an indicative programme within a strategy document, the strategist can rely upon both information gathered in the Situational Analysis and their own experience of how to best decompose such work. Ultimately the size and number of the chunks should be dictated by business need, but at this stage estimates can be based upon experience and reasonable assumptions.
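
As a purely illustrative sketch of this sort of chunk-based estimation, the arithmetic might be captured along the following lines; the chunks, phase splits and day rate are all invented for the example.

```python
# A sketch of rolling chunk estimates up into an indicative plan.
# The chunks, phase splits and day rate are invented for illustration.
chunks = {
    "Finance reporting": {"requirements": 15, "build": 40, "test": 20, "deploy": 10},
    "Sales analysis":    {"requirements": 10, "build": 30, "test": 15, "deploy": 5},
    "Operational KPIs":  {"requirements": 12, "build": 35, "test": 18, "deploy": 8},
}
day_rate = 550  # assumed blended daily rate (GBP)

for name, phases in chunks.items():
    effort = sum(phases.values())  # person-days for this chunk
    print(f"{name:20s} {effort:3d} person-days  ~£{effort * day_rate:,}")

total = sum(sum(p.values()) for p in chunks.values())
print(f"{'Total':20s} {total:3d} person-days  ~£{total * day_rate:,}")
```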

It is important that each chunk (or sub-chunk) delivers value and offers an opportunity for the approach and progress to be reviewed. A further factor to consider when estimating these chunks is that they should be delivered at a pace which allows them to be properly digested by users; resource allocations should reflect this. For each chunk the strategist should consider the type and quantum of resource required and the timing with which these are applied.

The indicative programme plan should also include a first phase which relates to reviewing the plan itself. Forming a strategy involves fewer people than running a programme. Even if initial estimation is carried out very diligently, it is likely that further issues will emerge once more detailed work commences. As the information programme team ramps up, it is important that time is allocated for new team members to kick the tyres on the plan and make recommendations for improvement.

3b. How much will it cost?

Coins on scales

A big element of cost estimates will be a by-product of the indicative programme plan, which will cover programme duration and the amount of resource required at different points. Some further questions to consider when looking to catalogue costs include the following:

  • What are baseline costs for current information provision?
  • To what degree do these need to be incurred in parallel with an information improvement programme, and are there ways to reduce these legacy costs to free up funds for the central programme?
  • What transitional costs are needed to execute the Information Strategy?
    • Hardware and software: is change necessary?
    • People: what is the best balance between internal, contract and outsourced resources, and to what degree can existing staff be leveraged without compromising their current responsibilities?
    • How will costs vary by programme phase, and will these taper as elements of older information systems are replaced by new facilities?
    • Can costs be reduced by having people play different roles at different points in the programme?
  • What costs will be ongoing once the strategy has been executed?
  • How do these compare to the current baseline?
  • Sometimes one aim of an Information Strategy will be to reduce the cost of ongoing support and maintenance; if so, how will this be achieved and how will any transition be managed?

A consideration here is whether the most important thing is to maximise speed of delivery or to minimise risk. Things that will reduce risk could include: initial exploratory phases; starting with a small number of programme resources and increasing these only on the back of success; and instigating appropriate governance processes. However each of these will also increase duration and therefore cost. In some areas a trade-off will be necessary and which side of the equation is more important will vary from organisation to organisation.
 
 
4. Will the trip be worth it?

Pros and cons

Answering parts of question 2 will help with getting a handle on the potential benefits of executing an Information Strategy. Work on question 3 will give us an idea of the timeframes and costs involved. There is a need to combine the two of these into a cost / benefit analysis. This should be an honest and transparent assessment of the potential payback of adopting the Information Strategy. Given that most Information Strategies will take more than a year to implement and that benefits may equally be realised on an ongoing basis, it will generally make sense to look at figures over a 3-5 year period. It may be possible to draw up a quasi-P&L statement showing the impact of adopting the strategy; such an approach can resonate with senior stakeholders.
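
A minimal sketch of such a quasi-P&L view follows; all of the figures are invented for illustration, and any real analysis would of course be built from the programme plan and benefit estimates discussed above.

```python
# A sketch of a quasi-P&L view of adopting the strategy (all figures invented, GBP '000s).
programme_cost = [800, 600, 300, 100, 100]   # transitional spend tapering to ongoing run costs
legacy_savings = [0,   100, 250, 400, 400]   # legacy spend retired as new facilities land
new_benefits   = [0,   200, 500, 800, 900]   # tangible benefits as capabilities are adopted

cumulative = 0
for year, (cost, saving, benefit) in enumerate(zip(programme_cost, legacy_savings, new_benefits), start=1):
    net = saving + benefit - cost
    cumulative += net
    print(f"Year {year}: net {net:+5d}k, cumulative {cumulative:+6d}k")
# Payback is reached in the first year in which the cumulative figure turns positive.
```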

Points to recall and questions to consider here include:

  • Costs will emerge from the Indicative Programme Plan, but remember the ongoing costs of maintaining existing information capabilities.
  • As with most initiatives, the benefits of information programmes split into tangible and intangible components:
    • Where possible make benefits tangible even if this requires a degree of guesstimation [10].
    • Remember that many supposed intangibles can be estimated with some thought.
  • What benefits have other companies seen from similar programmes, particularly ones in the same industry sector?
  • Is it possible to perform “what if?” scenarios with current and future capabilities; could better information have led to better outcomes? [11]
  • Ask business people to estimate the impact of better information.
  • Intangible benefits resonate where they are expressed in clear business language, not IT speak.

It should be borne in mind here that the cost / benefit analysis may not add up. If this is the case, then either a less expensive approach is more suitable for the company, or the potential benefits need to be looked at again. Where progress can genuinely not be made on either of these areas, the responsible strategist will acknowledge that doing nothing may well be the logical approach for the organisation in question.
 
 
5. What else can we do along the way?

Here be elephants

Finally, it is worth noting that short-term tactical deliveries can strongly support a strategy [12]. Interim work can meet urgent business needs in a timely manner. This is a substantial benefit in itself and also evidences progress in the area of improving information capabilities. It also demonstrates that the programme team understands commercial pressures. This type of work is also complementary in that it can be used to:

  • Validate some elements of the cost / benefit analysis.
  • Round out requirements gathering.
  • Highlight any areas which have been overlooked.
  • Provide invaluable deployment and training experience, which can be leveraged for the implementation of more strategic capabilities.

It can also be useful to make mistakes early and with small deliverables, rather than later with major ones. For these reasons, it is suggested that any Information Strategy should embrace “throw away” work. However this should be reflected in the overall programme plan and resources should be specifically allocated to this area. If this is not done, then tactical work can easily overwhelm the team and prevent progress being made on more strategic areas; generally a death knell for a programme.
 
 
A Recap of the Main Points

  1. Carry out a Situational Analysis.
  2. As part of this, start the process of capturing High-level Business Requirements.
  3. Establish Drivers for Change: what benefits can be realised by better information, or by producing information in a better way?
  4. Ask “What Does Good Look Like?”, from both a technical and a process / people point of view.
  5. Develop an Indicative Programme of Work with realistic resource estimates and durations.
  6. Estimate Current, Transitional and Ongoing Costs.
  7. Itemise some of the major Interim Deliverables.
  8. Create a Cost / Benefits Analysis.

 
Bringing everything together

Chickie in dee Basget! Ing vurn spuur dee Chickie, Uun yeh vurn spay dee Basget!

There is a need to take the detailed work described over the course of the last three articles, together with the documentation created as part of the process, and to distil these down into a format that is digestible by senior management. There is no silver bullet here; summarising screeds of detail in a way that preserves the main points and presents them in a form that resonates is not easy. It takes judgement, an understanding of how businesses operate and strong analytical, writing and often diagrammatic skills. These will not be acquired by reading a blog article, but by honing experience and expertise over many years of work. To an extent, producing relevant and cogent summaries is where good IT professionals earn their money.

Unfortunately, at the time of writing, there is no book entitled Summarising Complex Issues for Dummies [13], [14].

This article and its two predecessors have been akin to listing the ingredients required to make a complex meal. While it is difficult to make great food without good ingredients or with some key spice missing, these things are not sufficient to ensure culinary excellence; what is also needed is a competent chef [15]. I cook a lot myself and, whenever I try a recipe for the first time, it can be a bit fraught. Sometimes I don’t get all of the elements of the meal ready at the same time; sometimes, while I’m paying attention to reading the instructions for one part, another part boils over or gets burnt. These problems with cooking tend to dissipate with repetition. In the same way, what is generally needed in developing a sound Information Strategy is the equivalent of great ingredients and a chef who is both competent and experienced.
 

Forming an Information Strategy
I – General Strategy II – Situational Analysis III – Completing the Strategy

 
Notes

 
[1]
 
These include (in chronological order):

 
[2]
 
IRM European Data Warehouse and Business Intelligence Conference
– November 2012
 
[3]
 
Where this is the case, I will of course provide links back to my previous work.
 
[4]
 
Some of the factors here may come to light as a result of the previous Situational Analysis of course.
 
[5]
 
I grapple with estimating the potential payback of Information Programmes in a series of earlier articles:

 
[6]
 
This is an expanded version of the diagram I posted as part of Using multiple business intelligence tools in an implementation – Part I back in May 2009. I have elided details such as the fine structure of the warehouse (staging, relational, multidimensional etc.), master data sources and also which parts of it are accessed by different tools and different types of users. In a severe breach with the traditional IT approach, I have also left some arrows out.
 
[7]
 
This is an updated version of an exhibit I put together working with an actuarial colleague back in 2001, early in my journey into information improvement programmes.
 
[8]
 
These include my trilogy on the change management aspects of information programmes:

and a number of articles relating to Data Governance / Data Quality, notably:

 
[9]
 
Sometimes the first level of decomposition will need to be broken up into further and smaller chunks with this process iterating until the strategist reaches tasks which they are happy to estimate with a degree of certainty.
 
[10]
 
It may make sense to have different versions of the cost / benefit analysis, more conservative ones including only the most tangible benefits and more aggressive ones taking in to account benefits which have to be somewhat less certain.
 
[11]
 
Again see the series of three articles starting with Using historical data to justify BI investments – Part I.
 
[12]
 
For further thoughts on the strategic benefits of tactical work see:

 
[13]
 
Given both the two interpretations of this phrase and the typical audience for summaries of strategies, perhaps this is a fortunate thing.
 
[14]
 
I did however find the following title:

I can't however seem to find either Quantum Chromodynamics or Brain Surgery for Dummies

 
[15]
 
Contrary to the image above, a muppet (in the English sense of the word) won’t suffice.

 

 

The need for collaboration between teams using the same data in different ways

The Data Warehousing Institute

This article is based on conversations that took place recently on the TDWI LinkedIn Group [1].

The title of the discussion thread posted was “Business Intelligence vs. Business Analytics: What’s the Difference?” and the original poster was Jon Dohner from Information Builders. To me the thread topic is something of an old chestnut and takes me back to the heady days of early 2009. Back then, Big Data was maybe a lot more than just a twinkle in Doug Cutting and Mike Cafarella’s eyes, but it had yet to rise to its current level of media ubiquity.

Nostalgia is not going to be enough for me to start quoting from my various articles of the time [2] and neither am I going to comment on the pros and cons of Information Builders’ toolset. Instead I am more interested in a different turn that discussions took based on some comments posted by Peter Birksmith of Insurance Australia Group.

Peter talked about two streams of work being carried out on the same source data. These are Business Intelligence (BI) and Information Analytics (IA). I’ll let Peter explain more himself:

BI only produces reports based on data sources that have been transformed to the requirements of the Business and loaded into a presentation layer. These reports present KPI’s and Business Metrics as well as paper-centric layouts for consumption. Analysis is done via Cubes and DQ although this analysis is being replaced by IA.

[…]

IA does not produce a traditional report in the BI sense, rather, the reporting is on Trends and predictions based on raw data from the source. The idea in IA is to acquire all data in its raw form and then analysis this data to build the foundation KPI and Metrics but are not the actual Business Metrics (If that makes sense). This information is then passed back to BI to transform and generate the KPI Business report.

I was interested in the dual streams that Peter referred to and, given that I have some experience of insurance organisations and how they work, penned the following reply [3]:

Hi Peter,

I think you are suggesting an organisational and technology framework where the source data bifurcates and goes through two parallel processes and two different “departments”. On one side, there is a more traditional, structured, controlled and rules-based transformation; probably as the result of collaborative efforts of a number of people, maybe majoring on the technical side – let’s call it ETL World. On the other a more fluid, analytical (in the original sense – the adjective is much misused) and less controlled (NB I’m not necessarily using this term pejoratively) transformation; probably with greater emphasis on the skills and insights of individuals (though probably as part of a team) who have specific business knowledge and who are familiar with statistical techniques pertinent to the domain – let’s call this ~ETL World, just to be clear :-).

You seem to be talking about the two of these streams constructively interfering with each other (I have been thinking about X-ray Crystallography recently). So insights and transformations (maybe down to either pseudo-code or even code) from ~ETL World influence and may be adopted wholesale by ETL World.

I would equally assume that, if ETL World‘s denizens are any good at their job, structures, datasets and master data which they create (perhaps early in the process before things get multidimensional) may make work more productive for the ~ETLers. So it should be a collaborative exercise with both groups focused on the same goal of adding value to the organisation.

If I have this right (an assumption I realise) then it all seems very familiar. Given we both have Insurance experience, this sounds like how a good information-focused IT team would interact with Actuarial or Exposure teams. When I have built successful information architectures in insurance, in parallel with delivering robust, reconciled, easy-to-use information to staff in all departments and all levels, I have also created, maintained and extended databases for the use of these more statistically-focused staff (the ~ETLers).

These databases, which tend to be based on raw data, have become more useful as structures from the main IT stream (ETL World) have been applied to these detailed repositories. This might include joining key tables so that analysts don’t have to repeat this themselves every time, doing some basic data cleansing, or standardising business entities so that different data can be more easily combined. You are of course right that insights from ~ETL World often influence the direction of ETL World as well. Indeed often such insights will need to move to ETL World (and be produced regularly and in a manner consistent with existing information) before they get deployed to the wider field.

Now where did I put that hairbrush?

It is sort of like a research team and a development team, but where both “sides” do research and both do development, but in complementary areas (reminiscent of a pair of entangled electrons in a singlet state, each of whose spin is both up and down until they resolve into one up and one down in specific circumstances – sorry again I did say “no more science analogies”). Of course, once more, this only works if there is good collaboration and both ETLers and ~ETLers are focussed on the same corporate objectives.

So I suppose I’m saying that I don’t think – at least in Insurance – that this is a new trend. I can recall working this way as far back as 2000. However, what you describe is not a bad way to work, assuming that the collaboration that I mention is how the teams work.

I am aware that I must have said “collaboration” 20 times – your earlier reference to “silos” does however point to a potential flaw in such arrangements.

Peter

PS I talk more about interactions with actuarial teams in: BI and a different type of outsourcing

PPS For another perspective on this area, maybe see comments by @neilraden in his 2012 article What is a Data Scientist and what isn’t?

I think that the perspective of actuaries having been data scientists long before the latter term emerged is a sound one.

I couldn't find a suitable image from Sesame Street :-o

Although the genesis of this thread dates to over five years ago (an aeon in terms of information technology), I think that – in the current world where some aspects of the old divide between technically savvy users [4] and IT staff with strong business knowledge [5] have begun to disappear – there is both an opportunity for businesses and a threat. If silos develop and the skills of a range of different people are not combined effectively, then we have a situation where:

| ETL World | + | ~ETL World | < | ETL World ∪ ~ETL World |

If instead collaboration, transparency and teamwork govern interactions between different sets of people then the equation flips to become:

| ETL World | + | ~ETL World | ≥ | ETL World ∪ ~ETL World |

Perhaps the way that Actuarial and IT departments work together in enlightened insurance companies points the way to a general solution for the organisational dynamics of modern information provision. Maybe also the, by now somewhat venerable, concept of a Business Intelligence Competency Centre, a unified team combining the best and brightest from many fields, is an idea whose time has come.
 
 
Notes

 
[1]
 
A link to the actual discussion thread is provided here. However, you need to be a member of the TDWI Group to view this.
 
[2]
 
Anyone interested in ancient history is welcome to take a look at the following articles from a few years back:

  1. Business Analytics vs Business Intelligence
  2. A business intelligence parable
  3. The Dictatorship of the Analysts
 
[3]
 
I have mildly edited the text from its original form and added some new links and new images to provide context.
 
[4]
 
Particularly those with a background in quantitative methods – what we now call data scientists
 
[5]
 
Many of whom seem equally keen to also call themselves data scientists

 

 

Ten Million Aliens – More musings on BI-ology

Introduction

Ten Million Aliens by Simon Barnes

This article relates to the book Ten Million Aliens – A Journey Through the Entire Animal Kingdom by British journalist and author Simon Barnes, but is not specifically a book review. My actual review of this entertaining and informative work appears on Amazon and is as follows:

Having enjoyed Simon’s sport journalism (particularly his insightful and amusing commentary on Test Match cricket) for many years, I was interested to learn about this new book via his web-site. As an avid consumer of pop-science literature and already being aware of Simon’s considerable abilities as a writer, I was keen to read Ten Million Aliens. To be brief, I would recommend the book to anyone with an enquiring mind, an interest in the natural world and its endless variety, or just an affection for good science writing. My only sadness was that the number of phyla eventually had to come to an end. I laughed in places, I was better informed than before reading a chapter in others and the autobiographical anecdotes and other general commentary on the state of our stewardship of the planet added further dimensions. I look forward to Simon’s next book.

Instead this piece contains some general musings which came to mind while reading Ten Million Aliens and – as is customary – applies some of these to my own fields of professional endeavour.
 
 
Some Background

David Ivon Gower

Regular readers of this blog will be aware of my affection for Cricket[1] and also my interest in Science[2]. Simon Barnes’s work spans both of these passions. I became familiar with Simon’s journalism when he was Chief Sports Writer for The Times[3], an organ for which he wrote for over 32 years. Given my own sporting interests, I first read his articles specifically about Cricket and sometimes Rugby Union, but began to appreciate his writing in general and to consume his thoughts on many other sports.

There is something about Simon’s writing which I (and no doubt many others) find very engaging. He manages to be both insightful and amusing and displays both elegance of phrase and erudition without ever seeming to show off, or to descend into the overly-florid prose of which I can sometimes (OK often) be guilty. It also helps that we seem to share a favourite cricketer in the shape of David Gower, who appears above and was the most graceful batsman to have played for England in the last forty years. However, it is not Simon’s peerless sports writing that I am going to focus on here. For several years he also penned a wildlife column for The Times and is a patron of a number of wildlife charities. He has written books on, amongst other topics, birds, horses, his safari experiences and conservation in general.

Green Finch, Great Tit, Lesser Spotted Woodpecker, Tawny Owl, Magpie, Carrion Crow, Eurasian Jay, Jackdaw

My own interest in science merges into an appreciation of the natural world, perhaps partly also related to the amount of time I have spent in remote and wild places rock-climbing and bouldering. As I started to write this piece, some welcome November Cambridge sun threw shadows of the Green Finches and Great Tits on our feeders across the monitor. Earlier in the day, my wife and I managed to catch a Lesser Spotted Woodpecker, helping itself to our peanuts. Last night we stood on our balcony listening to two Tawny Owls serenading each other. Our favourite Corvidae family are also very common around here and we have had each of the birds appearing in the bottom row of the above image on our balcony at some point. My affection for living dinosaurs also extends to their cousins, the herpetiles, but that is perhaps a topic for another day.

Ten Million Aliens has the modest objectives, revealed by its sub-title, of saying something interesting about each of the (at the last count) thirty-five phyla of the Animal Kingdom[4] and of providing some insights into a few of the thousands of families and species that make these up. Simon’s boundless enthusiasm for the life he sees around him (and indeed the life that is often hidden from all bar the most intrepid of researchers), his ability to bring even what might be viewed as ostensibly dull subject matter[5] to life and a seemingly limitless trove of pertinent personal anecdotes, all combine to ensure not only that he achieves these objectives, but that he does so with some élan.
 
 
Classifications and Hierarchies

Biological- Classification

Well having said that this article wasn’t going to be a book review, I guess it has borne a striking resemblance to one so far. Now to take a different tack; one which relates to three of the words that I referenced and provided links to in the last paragraph of the previous section: phylum, family and species. These are all levels in the general classification of life. At least one version of where these three levels fit into the overall scheme of things appears in the image above[6]. Some readers may even be able to recall a related mnemonic from years gone by: Kings Play Chess on Fine Green Sand[7].

The father of modern taxonomy, Carl Linnaeus, founded his original biological classification – not unreasonably – on the shared characteristics of organisms; things that look similar are probably related. Relations mean that like things can be collected together into groups and that the groups can be further consolidated into super-groups. This approach served science well for a long time. However when researchers began to find more and more examples of convergent evolution[8], Linnaeus’s rule of thumb was seen to not always apply and complementary approaches also began to be adopted.

Cladogram

One of these approaches, called Cladistics, focuses on common ancestors rather than shared physical characteristics. Breakthroughs in understanding the genetic code provided impetus to this technique. The above diagram, referred to as a cladogram, represents one school of thought about the relationship between avian dinosaurs, non-avian dinosaurs and various other reptiles that I mentioned above.

It is at this point that the Business Intelligence professional may begin to detect something somewhat familiar[9]. I am of course talking about both dimensions and organising these into hierarchies. Dimensions are the atoms of Business Intelligence and Data Warehousing[10]. In Biological Classification: H. sapiens is part of Homo, which is part of Hominidae, which is part of Primates, which is part of Mammalia, which is part of Chordata, which then gets us back up to Animalia[11]. In Business Intelligence: Individuals make up Teams, which make up Offices, which make up Countries and Regions.

Above I referenced different approaches to Biological Classification, one based on shared attributes, the other on homology of DNA. This also reminds me of the multiple ways to roll up dimensions. To pick the most obvious, Day rolls up to Month, Quarter, Half-Year and Year; but also, in a different manner, to Week and then Year. Given that the aforementioned DNA evidence has caused a reappraisal of the connections between many groups of animals, the structures of Biological Classification are not rigid and instead can change over time[12]. Different approaches to grouping living organisms can provide a range of perspectives, each with its own benefits. In a similar way, good BI/DW design practices should account for both dimensions changing and the fact that different insights may well be provided by parallel dimension hierarchies.
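
For readers who like things concrete, the following sketch (using pandas and invented sales figures) shows the same facts being rolled up through two parallel Date hierarchies, in the spirit of the parallel classifications discussed above.

```python
# Parallel roll-ups of the same Date dimension; the sales data is invented.
import pandas as pd

sales = pd.DataFrame({
    "date":   pd.to_datetime(["2014-11-03", "2014-11-28", "2014-12-01", "2014-12-29"]),
    "amount": [120, 80, 200, 150],
})

# Hierarchy 1: Day -> Month -> Quarter -> Year
by_month   = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()
by_quarter = sales.groupby(sales["date"].dt.to_period("Q"))["amount"].sum()

# Hierarchy 2: Day -> (ISO) Week -> Year. Note that 2014-12-29 falls into ISO week 1
# of 2015, so the two hierarchies genuinely group the same facts differently.
by_week = sales.groupby(sales["date"].dt.isocalendar().week)["amount"].sum()

print(by_month, by_quarter, by_week, sep="\n\n")
```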

In summary, I suppose what I am saying is that BI/DW practitioners, as well as studying the works of Inmon and Kimball, might want to consider expanding their horizons to include Barnes; to say nothing of Linnaeus[13]. They might find something instructive in these other taxonomical works.
 


 
Notes

 
[1]
 
Articles from this blog in which I intertwine Cricket and aspects of business, technology and change include (in chronological order):

 
[2]
 
Articles on this site which reference either Science or Mathematics are far too numerous to list in full. A short selection of the ones I enjoyed writing most would include (again in chronological order):

 
[3]
 
Or perhaps The London Times for non-British readers, despite the fact that it was the first newspaper to bear that name.
 
[4]
 
Here “Animal Kingdom” is used in the taxonomical sense and refers to Animalia.
 
[5]
 
For an example of the transformation of initially unpromising material, perhaps check out the chapter of Ten Million Aliens devoted to Entoprocta.
 
[6]
 
With acknowledgment to The Font.
 
[7]
 
Though this elides both Domains and Johnny-come-latelies like super-families, sub-genuses and hyper-orders [I may have made that last one up of course].
 
[8]
 
For example the wings of Pterosaurs, Birds and Bats.
 
[9]
 
No pun intended.
 
[10]
 
This metaphor becomes rather cumbersome when one tries to extend it to cover measures. It’s tempting to perhaps align these with fundamental forces, and thus bosons as opposed to combinations of fermions, but the analogy breaks down pretty quickly, so let’s conveniently forget that multidimensional data structures have fact tables at their hearts for now.
 
[11]
 
Here I am going to strive manfully to avoid getting embroiled in discussions about domains, superregnums, superkingdoms, empires, or regios and instead leave the interested reader to explore these areas themselves if they so desire. Ten Million Aliens itself could be one good starting point, as could the following link.
 
[12]
 
Science is yet to determine whether these slowly changing dimensions are of Type 1, 2, 3 or 4 (it has however been definitively established that they are not Type 6 / Hybrid).
 
[13]
 
Interesting fact of the day: Linnaeus’s seminal work included an entry for The Kraken, under Cephalopoda

 

 

A Dictionary of the Business Intelligence Language

Software Advice article

Michael Koploy of on-line technology consulting company Software Advice recently asked me, together with four other people from the Business Intelligence / Data Warehousing community, to contribute some definitions of commonly-used technology jargon pertinent to our field. The results can be viewed in his article, BI Buzzword Breakdown. Readers may be interested in the differing, but hopefully complementary, definitions that were offered.

In jockeying for space with my industry associates, only one of my definitions (that relating to Data Mining) was used. Here are two others, which were left on the cutting room floor. Maybe they’ll make it to the DVD extras.
The equivalent of the Unicorn dream sequence in Bladerunner, but imbued with greater dramatic meaning...

Big Data – Rather than having the entirely obvious meaning, this term has come to be associated with a set of technologies, some of them open source, that emerged from the needs of several of the major on-line businesses (Google, Yahoo, Facebook and Amazon) to analyse the large amounts of data they hold relating to how people interact with their web-sites. The area is often linked to Apache Hadoop, a low-cost technology that allows commodity servers to be combined to collectively store large amounts of data, particularly where the structure of these varies considerably and particularly where there is a need to support unpredictably-growing volumes.
   
Data Warehouse – A collection of data, generally emanating from a number of different systems, which is combined to form a consistent structure suitable for the support of a variety of reporting and analytical needs. Most warehouses will have an element of data stored in a multi-dimensional format; i.e. one that is intended to support pivot-table-like slicing and dicing. This is achieved using specific data structures: fact tables, which hold figures, or measures (like profit, or sales, or growth); and dimension tables, which hold business entities, or dimensions (like countries, weeks, product lines, salespeople etc.). The dimensions are often nested into hierarchies, such as Region => Country => City => Area. Warehouse data is generally leveraged using traditional reports, On-Line Analytical Processing (OLAP) and more advanced analytical approaches, such as data mining.
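
To make the fact / dimension idea concrete, here is a toy sketch of a star-schema fragment being rolled up a location hierarchy; the tables and figures are invented purely for illustration.

```python
import pandas as pd

# Dimension table: business entities arranged in a City -> Country -> Region hierarchy.
dim_location = pd.DataFrame({
    "city_id": [1, 2, 3],
    "city":    ["London", "Manchester", "Paris"],
    "country": ["UK", "UK", "France"],
    "region":  ["EMEA", "EMEA", "EMEA"],
})

# Fact table: one row per city per week, holding the measure (sales).
fact_sales = pd.DataFrame({
    "city_id": [1, 1, 2, 3],
    "week":    ["2011-W44", "2011-W45", "2011-W44", "2011-W45"],
    "sales":   [1000, 1200, 800, 950],
})

# Slicing and dicing: join the facts to the dimension, then roll up the hierarchy.
joined = fact_sales.merge(dim_location, on="city_id")
print(joined.groupby("country")["sales"].sum())
print(joined.groupby("region")["sales"].sum())
```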

Approximately 5.5 cm isn't THAT big is it?

The above comments are perhaps most notable for representing my first reference to the latest information hot topic, the rather misleadingly named Big Data. To date I have rather avoided the rampaging herd in this area – maybe through fear of being crushed in the stampede – but it is probably a topic to which I will return once there is less hype and more substance to comment on.
 

I will be presenting at the IRM European Data Warehouse and Business Intelligence Conference

IRM UK - European Data Warehousing and Business Intelligence Conference - 2011

This IRM UK event will be taking place in central London from the 7th to 9th November 2011. It is co-located with the IRM Data Management & Information Quality Conference. Full details may be obtained from the IRM conference web-site here. I am speaking on the morning of the 9th and will be building on themes introduced in my previous article: A Single Version of the Truth?
 

 

Using historical data to justify BI investments – Part III

The earliest recorded surd

This article completes the three-part series which started with Using historical data to justify BI investments – Part I and continued (somewhat inevitably) with Using historical data to justify BI investments – Part II. Having presented a worked example, which focused on using historical data both to develop a profit-enhancing rule and then to test its efficacy, this final section considers the implications for justifying Business Intelligence / Data Warehouse programmes and touches on some more general issues.
 
 
The Business Intelligence angle

In my experience when talking to people about the example I have just shared, there can be an initial “so what?” reaction. It can maybe seem that we have simply adopted the all-too-frequently-employed business ruse of accentuating the good and down-playing the bad. Who has not heard colleagues say “this was a great month excluding the impact of X, Y and Z”? Of course the implication is that when you include X, Y and Z, it would probably be a much less great month; but this is not what we have done.

One goal of business intelligence is to help in estimating what is likely to happen in the future and guiding users in taking decisions today that will influence this. What we have really done in the above example is as follows:

Look out Morlocks, here I come... [alumni of Imperial College London are so creative aren't they?]

  1. shift “now” back two years in time
  2. pretend we know nothing about what has happened in these most recent two years
  3. develop a predictive rule based solely on the three years preceding our back-shifted “now”
  4. then use the most recent two years (the ones we have metaphorically been covering with our hand) to see whether our proposed rule would have been efficacious

For the avoidance of doubt, in the previously attached example, the losses incurred in 2009 – 2010 have absolutely no influence on the rule we adopt; this is based solely on 2006 – 2008 losses. All the 2009 – 2010 losses are used for is to validate our rule.
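
For the programmatically minded, a minimal sketch of this back-testing idea appears below. The handful of policies, the 80% threshold and the loss figures are all invented for the sketch and are much simpler than those in the accompanying spreadsheet; the point is solely to show the mechanics of building a rule on 2006 – 2008 data and judging it against 2009 – 2010.

```python
# A minimal sketch of the back-test described above: build a rule on 2006-2008
# loss ratios only, then judge it against 2009-2010. All figures are invented.
policies = [
    # (annual premium, claims 2006-2008, claims 2009-2010)
    (10_000, 12_000,  9_000),
    (10_000,  4_000,  5_500),
    (10_000, 28_000, 14_000),
    (10_000,  8_000, 16_000),   # a counter-example: good history, bad outcome
    (10_000,  6_000,  3_000),
]

THRESHOLD = 0.8  # assumed rule: keep policies whose 2006-2008 paid loss ratio is below 80%

def loss_ratio(claims, years, annual_premium):
    """Paid loss ratio: claims divided by premium earned over the period."""
    return claims / (annual_premium * years)

kept = [p for p in policies if loss_ratio(p[1], 3, p[0]) < THRESHOLD]

# Validate the rule solely on the 2009-2010 figures it never saw.
whole_book = sum(p[2] for p in policies) / sum(2 * p[0] for p in policies)
kept_book  = sum(p[2] for p in kept)     / sum(2 * p[0] for p in kept)
print(f"2009-2010 loss ratio, whole book: {whole_book:.0%}; rule-selected book: {kept_book:.0%}")
```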

We have therefore achieved two things:

  1. Established that better decisions could have been taken historically at the juncture of 2008 and 2009
  2. Devised a rule that would have been more effective and displayed at least some indication that this could work going forward in 2011 and beyond

From a Business Intelligence / Data Warehousing perspective, the general pitch is then something like:

Eight out of ten cats said that their owners got rid of stubborn stains no other technology could shift with BI - now with added BA

  1. If we can mechanically take such decisions, based on a very unsophisticated analysis of data, then making even simple information available to the humans taking those decisions (i.e. basic BI) should surely improve the quality of their decision-making
  2. If we go beyond this to provide more sophisticated analyses (e.g. including industry segmentation, analysis of insured attributes, specific products sold etc., i.e. regular BI) then we can – by extrapolation from the example – better shape the evolution of the performance of whole books of business
  3. We can also monitor the decisions taken to determine the relative effectiveness of individuals and teams and compare these to their peers – ideally these comparisons would also be made available to the individuals and teams themselves, allowing them to assess their relative performance (again regular BI)
  4. Finally, we can also use more sophisticated approaches, such as statistical modelling to tease out trends and artefacts that would not be easily apparent when using a standard numeric or graphical approach (i.e. sophisticated BI, though others might use the terms “data mining”, “pattern recognition” or the now ubiquitous marketing term “analytics”)

The example also says something else – although we may already have reporting tools, analysis capabilities and even people dabbling in statistical modelling, it appears that there is room for improvement in our approach. The 2009 – 2010 loss ratio was 54% and it could have been closer to 40%. Thus what we are doing now is demonstrably not as good as it could be and the monetary value of making a stepped change in information capabilities can be estimated.

The generation of which should be the object of any BI/DW project worth its salt - thinking of which, maybe a mound of salt would also have worked as an illustration

In the example, we are talking about £1m of premium over the two years and £88k of increased profit. What would be the impact of better information on an annual book of £1bn premium? Assuming a linear relationship and using some advanced Mathematics, we might suggest £44m. What is more, these gains would not be one-off, but repeatable every year. Even if we moderate our projected payback to a more conservative figure, our exercise implies that we would not be out of line to suggest, say, an ongoing annual payback of £10m. These are numbers and concepts which are likely to resonate with Executive decision-makers.
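
For completeness, the arithmetic behind the £44m can be sketched as follows; the £1m is simply the premium written by the 50 example policies over the two years 2009 – 2010 (50 × £10,000 × 2).

```python
# Reproducing the rough scaling above (figures from the worked example).
premium_2009_2010 = 1_000_000   # 50 policies x £10,000 a year x 2 years
profit_improvement = 88_000     # improvement achieved over the same two years

annual_rate = (profit_improvement / premium_2009_2010) / 2   # ~4.4% of annual premium
annual_book = 1_000_000_000
print(f"Implied annual benefit on a £1bn book: £{annual_rate * annual_book:,.0f}")
# -> £44,000,000, before any moderation to a more conservative figure
```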

To put it even more directly, an increase of £10m a year in profits would quickly swamp the cost of a BI/DW programme, yielding very substantial net benefits. These are payback ratios that most IT managers can only dream of.

As an aside, it may have occurred to readers that the mechanistic rule is actually rather good and – if so – why exactly do we need the underwriters? Taking to one side examples of solely rule-based decision-making going somewhat awry (LTCM anyone?), the human angle is often necessary in messy things like business acquisition and maintaining relationships. Maybe because of this, very few insurance organisations rely on rules to take all decisions. However it is increasingly common for rules to play some role in their overall approach. This is likely to take the form of triage of some sort. For example:

  1. A rule – maybe not much more sophisticated than the one I describe above – is established and run over policies before renewal.
  2. This is used to score policies as maybe having green, amber or red lights associated with them.
  3. Green policies may be automatically renewed with no intervention from human staff
  4. Amber policies may be looked at by junior staff, who may either OK the renewal if they satisfy themselves that the issues picked up are minor, or refer it to more senior and experienced colleagues if they remain concerned
  5. Red policies go straight to the most experienced staff for their close attention

In this way process efficiencies are gained. Staff time is only applied where it is necessary and the most expensive resources are applied to those cases that most merit their abilities.
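
A minimal sketch of such a triage rule follows; the scoring input and the green / amber / red cut-offs are invented for illustration, and any real rule would of course be considerably richer.

```python
# A minimal sketch of the triage idea described above; the rule and cut-offs are invented.
def triage(historical_loss_ratio: float) -> str:
    """Score a policy ahead of renewal based on its historical paid loss ratio."""
    if historical_loss_ratio < 0.5:
        return "green"   # renew automatically, no human intervention
    if historical_loss_ratio < 0.8:
        return "amber"   # junior staff review, escalate if concerned
    return "red"         # straight to the most experienced underwriters

for ratio in (0.32, 0.64, 1.10):
    print(f"loss ratio {ratio:.0%}: {triage(ratio)}")
```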

 
Correlation

From the webcomic of the inimitable Randall Munroe - his mouse-over text is a lot better than mine BTW
© xkcd.com

Let’s pause for a moment and consider the Insurance example a little more closely. What has actually happened? Well we seem to have established that performance of policies in 2006 – 2008 is at least a reasonable predictor of performance of the same policies in 2009 – 2010. Taking the mutual fund vendors’ constant reminder that past performance does not indicate future performance to one side, what does this actually mean?

What we have done is to establish a loose correlation between 2006 – 2008 and 2009 – 2010 loss ratios. But I also mentioned a while back that I had fabricated the figures, so how does that work? In the same section, I also said that the figures contained an intentional bias. I didn’t adjust my figures to make the year-on-year comparison work out. However, at the policy level, I was guilty of making the numbers look like the type of results that I have seen with real policies (albeit of a specific type). Hopefully I was reasonably realistic about this. If every policy that was bad in 2006 – 2008 continued in exactly the same vein in 2009 – 2010 (and vice versa) then my good segment would have dropped from an overall loss ratio of 54% to considerably less than 40%. The actual distribution of losses is representative of real Insurance portfolios that I have analysed. It is worth noting that only a small bias towards policies that start bad continuing to be bad is enough for our rule to work and profits to be improved. Close scrutiny of the list of policies will reveal that I intentionally introduced several counter-examples to our rule; good business going bad and vice versa. This is just as it would be in a real book of business.

Not strongly correlated

Rather than continuing to justify my methodology, I’ll make two statements:

  1. I have carried out the above sort of analysis on multiple books of Insurance business and come up with comparable results; sometimes the implied benefit is greater, sometimes it is less, but it has been there without exception (of course statistics being what it is, if I did the analysis frequently enough I would find just such an exception!).
  2. More mathematically speaking, the actual figure for the correlation between the two sets of years is a less than stellar 0.44. Of course a figure of 1 (or indeed -1) would imply total correlation, and one of 0 would imply a complete lack of correlation, so I am not working with doctored figures. Even a very mild correlation in data sets (one much less than the threshold for establishing statistical dependence) can still yield a significant impact on profit.
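
For anyone wanting to reproduce this sort of check on their own data, the calculation itself is trivial; the loss ratios below are invented for a handful of policies rather than taken from the accompanying spreadsheet.

```python
# The correlation check referred to above, using invented loss ratios for ten
# policies rather than the fifty in the accompanying spreadsheet.
from statistics import correlation  # available from Python 3.10

lr_2006_2008 = [0.40, 0.13, 0.93, 0.27, 0.20, 0.55, 0.70, 0.35, 0.60, 0.45]
lr_2009_2010 = [0.45, 0.28, 0.70, 0.80, 0.15, 0.50, 0.85, 0.30, 0.40, 0.55]

r = correlation(lr_2006_2008, lr_2009_2010)  # Pearson's r
print(f"Pearson correlation between the two periods: {r:.2f}")
# Even a figure well short of 1 can be enough for a simple selection rule to add value.
```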

 
Closing thoughts

Ground floor: Perfumery, Stationery and leather goods, Wigs and haberdashery, Kitchenware and food…. Going up!

Having gone into a lot of detail over the course of these three articles, I wanted to step back and assess what we have covered. Although the worked-example was drawn from my experience in Insurance, there are some generic learnings to be made.

Broadly I hope that I have shown that – at least in Insurance, but I would argue with wider applicability – it is possible to use the past to infer what actions we should take in the future. By a slight tweak of timeframes, we can even take some steps to validate approaches suggested by our information. It is important that we remember that the type of basic analysis I have carried out is not guaranteed to work. The same can be said of the most advanced statistical models; both will give you some indication of what may happen and how likely this is to occur, but neither of them is foolproof. However, either of these approaches has more chance of being valuable than, for example, solely applying instinct, or making decisions at random.

In Patterns, patterns everywhere, I wrote about the dangers associated with making predictions about events that are essentially unpredictable. This is another caveat to be borne in mind. However, to balance this it is worth reiterating that even partial correlation can lead to establishing rules (or more sophisticated models) that can have a very positive impact.

While any approach based on analysis or statistics will have challenges and need careful treatment, I hope that my example shows that the option of doing nothing, of continuing to do things how they have been done before, is often fraught with even more problems. In the case of Insurance at least – and I suspect in many other industries – the risks associated with using historical data to make predictions about the future are, in my opinion, outweighed by the risks of not doing this; on average of course!

But then 1=2 for very large values of 1
 

Using historical data to justify BI investments – Part II

The earliest recorded surd

This article is the second in what has now expanded from a two-part series to a three-part one. This started with Using historical data to justify BI investments – Part I and finishes with Using historical data to justify BI investments – Part III (once again exhibiting my talent for selecting buzzy blog post titles).
 
 
Introduction and some belated acknowledgements

The intent of these three pieces is to present a fairly simple technique by which existing, historical data can be used to provide one element of the justification for a Business Intelligence / Data Warehousing programme. Although the specific example I will cover applies to Insurance (and indeed I spent much of the previous, introductory segment discussing some Insurance-specific concepts which are referred to below), my hope is that readers from other sectors (or whose work crosses multiple sectors) will be able to gain something from what I write. My learnings from this period of my career have certainly informed my subsequent work and I will touch on more general issues in the third and final section.

This second piece will focus on the actual insurance example. The third will relate the example to justifying BI/DW programmes and, as mentioned above, also consider the area more generally.

Before starting on this second instalment in earnest, I wanted to pause and mention a couple of things. At the beginning of the last article, I referenced one reason for choosing to put fingertip to keyboard now, namely my briefly referring to my work in this area in my interview with Microsoft’s Bruno Aziza (@brunoaziza). There were a couple of other drivers, which I feel rather remiss not to have mentioned earlier.

First, James Taylor (@jamet123) recently published his own series of articles about the use of BI in Insurance. I have browsed these and fully intend to go back and read them more carefully in the near future. I respect James and his thoughts brought some of my own Insurance experiences to the fore of my mind.

Second, I recently posted some reflections on my presentation at the IRM MDM / Data Governance seminar. These focussed on one issue that was highlighted in the post-presentation discussion. The approach to justifying BI/DW investments that I will outline shortly also came up during these conversations and this fact provided additional impetus for me to share my ideas more widely.
 
 
Winners and losers

Before him all the nations will be gathered, and he will separate them one from another, as a shepherd separates the sheep from the goats

The main concept that I will look to explain is based on dividing sheep from goats. The idea is to look at a set of policies that make up a book of insurance business and determine whether there is some simple factor that can be used to predict their performance and split them into good and bad segments.

In order to do this, it is necessary to select policies that have the following characteristics:

  1. Having been continuously renewed so that they at least cover a contiguous five-year period (policies that have been “in force” for five years in Insurance parlance).

    The reason for this is that we are going to divide this five-year term into two pieces (the first three and the final two years) and treat these differently.

  2. Ideally with the above-mentioned five-year period terminating in the most recent complete year – at the time of writing, 2010.

    This is so that the associated loss ratios better reflect current market conditions.

  3. Being short-tail policies.

    I explained this concept last time round. Short-tail policies (or lines of business) are ones in which any claims are highly likely to be reported as soon as they occur (for example property or accident insurance).

    These policies tend to have a low contribution from IBNR (again see the previous piece for a definition). In practice this means that we can use the simplest of the Insurance ratios, the paid loss ratio (i.e. simply Claims divided by Premium), with some confidence that it will capture most of the losses that will be attached to the policy, even if we are talking about, say, 2010.

    Another way of looking at this is that (borrowing an idea discussed last time round) for this type of policy the Underwriting Year and Calendar Year treatments are closer than in areas where claims may be reported many years after the policy was in force.

Before proceeding further, it perhaps helps to make things more concrete. To achieve this, you can download a spreadsheet containing a sample set of Insurance policies, together with their premiums and losses over a five-year period from 2006 to 2010 by clicking here (this is in Office 97-2003 format – if you would prefer, there is also a PDF version available here). Hopefully you will be able to follow my logic from the text alone, but the figures may help.

A few comments about the spreadsheet. First, these are entirely fabricated policies and are not even loosely based on any data set that I have worked with before. Second, I have also adopted a number of simplifications:

  1. There are only 50 policies, normally many thousand would be examined.
  2. Each policy has the same annual premium – £10,000 (I am British!) – and this premium does not change over the five years being considered. In reality these would vary immensely according to changes in cover and the insurer’s pricing strategy.
  3. I have entirely omitted dates. In practice not every policy will fit neatly into a year and account will normally need to be taken of this fact.
  4. Given that this is a fabricated dataset, the claims activity has not been generated randomly. Instead I have simply selected values (though I did perform a retrospective sense check on their distribution). While this example is not meant to reflect reality exactly, there is an intentional bias in the figures; one that I will come back to later.

The sheet also calculates the policy paid loss ratio for each year and figures for the whole portfolio appear at the bottom. While the in-year performance of any particular policy can gyrate considerably, it may be seen from the aggregate figures that overall performance of this rather small book of business is relatively consistent:

Year Paid Loss Ratio
2006 53%
2007 59%
2008 54%
2009 53%
2010 54%
Total 54%
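
For those who prefer code to spreadsheets, the sketch below shows the same loss-ratio arithmetic in Python. It is only an illustration: the two policies and their claims are made up on the spot (they are not the figures in the downloadable workbook), and the dictionary-of-claims structure is simply one convenient way of holding the data.

```python
# A minimal sketch of the loss-ratio calculations the spreadsheet performs.
# The policies and claims below are illustrative only; they are not the
# figures in the downloadable workbook.

ANNUAL_PREMIUM = 10_000                  # every policy pays the same premium each year
YEARS = [2006, 2007, 2008, 2009, 2010]

# Claims paid, per policy, per year: {policy_id: {year: claims_paid}}
claims = {
    "P001": {2006: 0, 2007: 4_500, 2008: 0, 2009: 12_000, 2010: 0},
    "P002": {2006: 7_000, 2007: 6_500, 2008: 8_000, 2009: 9_000, 2010: 11_000},
    # ... the real exercise would cover 50 (or many thousands of) policies
}

def paid_loss_ratio(total_claims, total_premium):
    """Paid loss ratio = claims paid divided by premium."""
    return total_claims / total_premium

# Per-year ratio across the whole (tiny) book
for year in YEARS:
    year_claims = sum(policy[year] for policy in claims.values())
    year_premium = ANNUAL_PREMIUM * len(claims)
    print(f"{year}: {paid_loss_ratio(year_claims, year_premium):.0%}")

# Aggregate ratio over the full five years
total_claims = sum(sum(policy.values()) for policy in claims.values())
total_premium = ANNUAL_PREMIUM * len(claims) * len(YEARS)
print(f"Total: {paid_loss_ratio(total_claims, total_premium):.0%}")
```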

Above I mentioned looking at the five years in two parts. At least metaphorically we are going to use our right hand to cover the results from years 2009 and 2010 and focus on the first three years on the left. Later – after we have established a hypothesis based on 2006 to 2008 results – we can lift our hand and check how we did against the “real” figures.

For the purposes of this illustration, I want to choose a rather mechanistic way to differentiate business that has performed well from business that has performed badly. In doing this I have to remember that a policy may have a single major loss one year and then run free of losses for the next 20. If I were simply to say that any policy with a large loss is bad, I would potentially be drastically and unnecessarily culling my book (and also closing the stable door after the horse has bolted). Instead we need to develop a rule that takes this into account.

In thinking about overall profitability, while we have greatly reduced the impact of both reported-but-unpaid claims and IBNR by virtue of picking a short-tail business, it might be prudent to make, say, a 5% allowance for these. If we also assume an expense ratio of 35%, then we have total non-underwriting-related outgoings of 40%. This means that we can afford a paid loss ratio of up to 60% (100% – 40%) and still turn a profit.

Using this insight, my simple rule is as follows:

A policy will be tagged as “bad” if two things occur:

  1. The overall three-year loss ratio is in excess of 60%

    i.e. it has been unprofitable over this period; and

  2. The loss ratio is in excess of 30% in at least two of the three years

    i.e. there is a sustained element to the poor performance and not just the one-off bad luck that can hit the best underwritten of policies

This rule splits the book roughly 75 / 25, with 74% of policies being good. Other choices of parameters may result in other splits, and it would be advisable to spend a little time optimising them. Perhaps flagging 26% of policies as bad is too aggressive, for example (though this rather depends on what you do about them – see below). However, in the simpler world of this example, I’ll press on to the next stage with my first pick.
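
For the programmatically minded, the rule translates into a few lines of Python. Again this is only a sketch of my own devising: the claims dictionary has the same shape as in the earlier snippet, and the threshold values are simply the ones chosen above.

```python
# Tag a policy as "bad" based on its 2006-2008 experience only.
# The 60% threshold is 100% less assumed outgoings of 40%
# (35% expenses plus a 5% allowance for unpaid / IBNR claims).

ANNUAL_PREMIUM = 10_000
EXPERIENCE_YEARS = [2006, 2007, 2008]   # the years we allow ourselves to look at
OVERALL_THRESHOLD = 0.60                # three-year loss ratio above this...
IN_YEAR_THRESHOLD = 0.30                # ...and at least two individual years above this

def is_bad(policy_claims):
    """policy_claims is a {year: claims_paid} dictionary for a single policy."""
    three_year_lr = (
        sum(policy_claims[y] for y in EXPERIENCE_YEARS)
        / (ANNUAL_PREMIUM * len(EXPERIENCE_YEARS))
    )
    poor_years = sum(
        1 for y in EXPERIENCE_YEARS
        if policy_claims[y] / ANNUAL_PREMIUM > IN_YEAR_THRESHOLD
    )
    return three_year_lr > OVERALL_THRESHOLD and poor_years >= 2

# Applying the rule to the claims dictionary from the earlier sketch:
bad_policies = {pid for pid, c in claims.items() if is_bad(c)}
print(f"{len(bad_policies) / len(claims):.0%} of policies flagged as bad")
```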

The ultimate sense of perspective

Well, all we have done so far is to tag policies that have performed badly – in the parlance of Analytics zealots, we are being backward-looking. Now it is time to lift our hand on 2009 to 2010 and try to be forward-looking. While these figures are obviously also backward-looking (the day that someone comes up with future data I will eat my hat), from the frame of reference of our experimental perspective (sitting at the close of 2008), they can be thought of as “the future back then”. We will use the actual performance of the policies in 2009 – 2010 to validate our choice of good and bad that was based on 2006 – 2008 results.

Overall the 50 policies had a loss ratio of 54% in 2009 – 2010. However those flagged as bad in our above exercise had a subsequent loss ratio of 92%. Those flagged as good had a subsequent loss ratio of 40%. The latter is a 14 point improvement on the overall performance of the book.
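
The validation step is equally mechanical: recompute the paid loss ratio over the held-back 2009 – 2010 years, separately for the policies tagged bad and for the rest. The sketch below carries on from the previous snippets (so claims and bad_policies are the objects defined there); the function name is mine, not anything from the spreadsheet.

```python
# Check how the 2006-2008 tagging fared against the held-back 2009-2010 results.
VALIDATION_YEARS = [2009, 2010]

def segment_loss_ratio(policy_ids, claims, years, annual_premium=10_000):
    """Paid loss ratio for a group of policies over the given years."""
    segment_claims = sum(claims[pid][y] for pid in policy_ids for y in years)
    segment_premium = annual_premium * len(policy_ids) * len(years)
    return segment_claims / segment_premium

good_policies = set(claims) - bad_policies
print(f"Bad segment, 2009-2010:  {segment_loss_ratio(bad_policies, claims, VALIDATION_YEARS):.0%}")
print(f"Good segment, 2009-2010: {segment_loss_ratio(good_policies, claims, VALIDATION_YEARS):.0%}")
```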

So we can say with some certainty that our rule, though simplistic, has produced some interesting results. The third part of this series will focus more closely on why this has worked. For now, let’s consider what actions the split we have established could drive.
 
 
What to do with the bad?

You shall be taken to the place from whence you came...

We were running a 54% paid loss ratio in 2009 – 2010. Using the same assumptions as above, this might have equated to a 94% combined ratio. Our book of business had an annual premium of £0.5m, so we received £1m over the two years. The 94% combined would have implied making a £60k profit if we had done nothing different. So what might have happened if we had done something?

There are a number of options. The most radical of these would have been to not renew any of the bad policies; to have carried out a cull. Let us consider what the impact of such an approach would have been. Well, our book of business would have shrunk to £740k over the two years at a combined ratio of 40% (the ratio of the good book) + 40% (other outgoings) = 80%, which implies a profit of £148k, up £88k. However there are reasons why we might not have wanted to shrink our business so drastically. A smaller pot of money for investment purposes might have been one. Also we might have had customers with policies in both the good and bad segments, and it might have been tricky to cancel the bad while retaining the good. And so on…

Another option would have been to have refined our rule to catch fewer policies. Inevitably, however, this would have reduced the positive impact on profits.

At the other extreme, we might have chosen to take less drastic action relating to the bad policies. This could have included increasing the premium we charged (which of course could also have resulted in us losing the business, but via the insured’s choice), raising the deductible payable on any losses, or looking to work with insureds to put in place better risk management processes. Let’s be conservative and say that, if the bad book was running at 92% and the overall book at 54%, then perhaps it would have been feasible to improve the bad book’s performance to a neutral figure of, say, 60% (implying a break-even combined ratio of 100%). This would have enabled the insurance organisation to maintain its investment base, to avoid losing good business as a result of culling related bad policies, and to preserve the profit increase generated by the cull.
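
Pulling these options together, the back-of-the-envelope arithmetic looks something like the following in Python. The loss ratios and the 40% of other outgoings are the figures assumed above; the function and variable names are simply mine.

```python
# Back-of-the-envelope comparison of the options discussed above,
# all over the two-year 2009-2010 window.
OTHER_OUTGOINGS = 0.40            # 35% expenses + 5% unpaid / IBNR allowance
TOTAL_PREMIUM = 1_000_000         # £0.5m a year for two years
GOOD_SHARE, BAD_SHARE = 0.74, 0.26
GOOD_LR, BAD_LR, BOOK_LR = 0.40, 0.92, 0.54

def profit(premium, loss_ratio):
    """Profit = premium x (100% minus the combined ratio)."""
    return premium * (1 - (loss_ratio + OTHER_OUTGOINGS))

# Option 1: do nothing - the whole book runs at a 94% combined ratio
do_nothing = profit(TOTAL_PREMIUM, BOOK_LR)                        # about £60k

# Option 2: cull the bad policies - keep only the good 74% at an 80% combined
cull = profit(TOTAL_PREMIUM * GOOD_SHARE, GOOD_LR)                 # about £148k

# Option 3: retain but remediate the bad policies to break-even (100% combined)
remediate = (profit(TOTAL_PREMIUM * GOOD_SHARE, GOOD_LR)
             + profit(TOTAL_PREMIUM * BAD_SHARE, 0.60))            # about £148k, full book kept

print(f"Do nothing: £{do_nothing:,.0f}, cull: £{cull:,.0f}, remediate: £{remediate:,.0f}")
```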

In practice of course it is likely that some sort of mixed approach would have been taken. The general point is that we have been able to come up with a simple strategy to separate good and bad business and then been able to validate how accurate our choices were. If, in the future, we possessed similar information, then there is ample scope for better decisions to be taken, with potentially positive impact on profits.
 
 
Next time…

In the final part of what is now a trilogy, I will look more deeply at what we have learnt from the above example, tie these learnings into how to pitch a BI/DW programme in Insurance and make some more general observations.