Using historical data to justify BI investments – Part II

12 May 2011

The earliest recorded surd

This article is the second in what has now expanded from a two-part series to a three-part one. This started with Using historical data to justify BI investments – Part I and finishes with Using historical data to justify BI investments – Part III (once again exhibiting my talent for selecting buzzy blog post titles).
 
 
Introduction and some belated acknowledgements

The intent of these three pieces is to present a fairly simple technique by which existing, historical data can be used to provide one element of the justification for a Business Intelligence / Data Warehousing programme. Although the specific example I will cover applies to Insurance (and indeed I spent much of the previous, introductory segment discussing some Insurance-specific concepts which are referred to below), my hope is that readers from other sectors (or whose work crosses multiple sectors) will be able to gain something from what I write. My learnings from this period of my career have certainly informed my subsequent work and I will touch on more general issues in the third and final section.

This second piece will focus on the actual insurance example. The third will relate the example to justifying BI/DW programmes and, as mentioned above, also consider the area more generally.

Before starting on this second instalment in earnest, I wanted to pause and mention a couple of things. At the beginning of the last article, I referenced one reason for me choosing to put fingertip to keyboard now, namely me briefly referring to my work in this area in my interview with Microsoft’s Bruno Aziza (@brunoaziza). There were a couple of other drivers, which I feel rather remiss to have not mentioned earlier.

First, James Taylor (@jamet123) recently published his own series of articles about the use of BI in Insurance. I have browsed these and fully intend to go back and read them more carefully in the near future. I respect James and his thoughts brought some of my own Insurance experiences to the fore of my mind.

Second, I recently posted some reflections on my presentation at the IRM MDM / Data Governance seminar. These focussed on one issue that was highlighted in the post-presentation discussion. The approach to justifying BI/DW investments that I will outline shortly also came up during these conversations and this fact provided additional impetus for me to share my ideas more widely.
 
 
Winners and losers

Before him all the nations will be gathered, and he will separate them one from another, as a shepherd separates the sheep from the goats

The main concept that I will look to explain is based on dividing sheep from goats. The idea is to look at a set of policies that make up a book of insurance business and determine whether there is some simple factor that can be used to predict their performance and split them into good and bad segments.

In order to do this, it is necessary to select policies that have the following characteristics:

  1. Having been continuously renewed so that they at least cover a contiguous five-year period (policies that have been “in force” for five years in Insurance parlance).

    The reason for this is that we are going to divide this five-year term into two pieces (the first three and the final two years) and treat these differently.

  2. Ideally with the above mentioned five-year period terminating in the most recent complete year – at the time of writing 2010.

    This is so that the associated loss ratios better reflect current market conditions.

  3. Being short-tail policies.

    I explained this concept last time round. Short-tail policies (or lines of business) are ones in which any claims are highly likely to be reported as soon as they occur (for example property or accident insurance).

    These policies tend to have a low contribution from IBNR (again see the previous piece for a definition). In practice this means that we can use the simplest of the Insurance ratios, paid loss-ratio (i.e. simply Claims divided by Premium), with some confidence that it will capture most of the losses that will be attached to the policy, even if we are talking about say 2010.

    Another way of looking at this is that (borrowing an idea discussed last time round) for this type of policy the Underwriting Year and Calendar Year treatments are closer than in areas where claims may be reported many years after the policy was in force.
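As a rough illustration, the three selection criteria above could be expressed as a simple filter. This is only a sketch: the Policy record, its field names and the three sample policies are invented for the purpose and do not come from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical record layout, assumed purely for illustration
    policy_id: str
    years_in_force: frozenset  # calendar years in which the policy was in force
    short_tail: bool           # True for lines such as property or accident


def eligible(policy, end_year=2010, term=5):
    """Return True if the policy was in force for every year of the
    five-year window ending in end_year and is a short-tail policy."""
    window = set(range(end_year - term + 1, end_year + 1))  # 2006..2010 by default
    return policy.short_tail and window <= set(policy.years_in_force)


# Invented sample book
book = [
    Policy("P1", frozenset(range(2004, 2011)), True),         # in force throughout
    Policy("P2", frozenset({2006, 2007, 2009, 2010}), True),  # not in force in 2008
    Policy("P3", frozenset(range(2006, 2011)), False),        # long-tail line
]

selected = [p.policy_id for p in book if eligible(p)]
print(selected)
```

Only the first policy survives the filter; the second fails the contiguity test and the third is not short-tail.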

Before proceeding further, it perhaps helps to make things more concrete. To achieve this, you can download a spreadsheet containing a sample set of Insurance policies, together with their premiums and losses over a five-year period from 2006 to 2010 by clicking here (this is in Office 97-2003 format – if you would prefer, there is also a PDF version available here). Hopefully you will be able to follow my logic from the text alone, but the figures may help.

A few comments about the spreadsheet. First these are entirely fabricated policies and are not even loosely based on any data set that I have worked with before. Second I have also adopted a number of simplifications:

  1. There are only 50 policies; normally many thousands would be examined.
  2. Each policy has the same annual premium – £10,000 (I am British!) – and this premium does not change over the five years being considered. In reality these would vary immensely according to changes in cover and the insurer’s pricing strategy.
  3. I have entirely omitted dates. In practice not every policy will fit neatly into a year and account will normally need to be taken of this fact.
  4. Given that this is a fabricated dataset, the claims activity has not been generated randomly. Instead I have simply selected values (though I did perform a retrospective sense check as to their distribution). While this example is not meant to 100% reflect reality, there is an intentional bias in the figures; one that I will come back to later.

The sheet also calculates the policy paid loss ratio for each year and figures for the whole portfolio appear at the bottom. While the in-year performance of any particular policy can gyrate considerably, it may be seen from the aggregate figures that overall performance of this rather small book of business is relatively consistent:

Year Paid Loss Ratio
2006 53%
2007 59%
2008 54%
2009 53%
2010 54%
Total 54%

Above I mentioned looking at the five years in two parts. At least metaphorically we are going to use our right hand to cover the results from years 2009 and 2010 and focus on the first three years on the left. Later – after we have established a hypothesis based on 2006 to 2008 results – we can lift our hand and check how we did against the “real” figures.

For the purposes of this illustration, I want to choose a rather mechanistic way to differentiate business that has performed well and badly. In doing this I have to remember that a policy may have a single major loss one year and then run free of losses for the next 20. If I was simply to say any policy with a large loss is bad, I am potentially drastically and unnecessarily culling my book (and also closing the stable door after the horse has bolted). Instead we need to develop a rule that takes this into account.

In thinking about overall profitability, while we have greatly reduced the impact of both reported but unpaid claims and IBNR by virtue of picking a short-tail business, it might be prudent to make say a 5% allowance for these. If we also assume an expense ratio of 35%, then we have a total of non-underwriting-related outgoings of 40%. This means that we can afford to have a paid loss ratio of up to 60% (100% – 40%) and still turn a profit.

Using this insight, my simple rule is as follows:

A policy will be tagged as “bad” if two things occur:

  1. The overall three-year loss ratio is in excess of 60%

    i.e. it has been unprofitable over this period; and

  2. The loss ratio is in excess of 30% in at least two of the three years

    i.e. there is a sustained element to the poor performance and not just the one-off bad luck that can hit the best underwritten of policies
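For readers who prefer code to prose, the rule above might be sketched as follows. The function name, parameter names and the three sample policies are my own inventions for illustration (the claims figures are not taken from the spreadsheet); only the 60% and 30% thresholds and the "two of the three years" test come from the rule itself.

```python
def is_bad(claims_by_year, premium_by_year,
           overall_limit=0.60, yearly_limit=0.30, min_poor_years=2):
    """Tag a policy as bad if its overall paid loss ratio exceeds
    overall_limit AND its in-year paid loss ratio exceeds yearly_limit
    in at least min_poor_years of the years supplied."""
    overall = sum(claims_by_year) / sum(premium_by_year)
    poor_years = sum(
        1 for claims, premium in zip(claims_by_year, premium_by_year)
        if claims / premium > yearly_limit
    )
    return overall > overall_limit and poor_years >= min_poor_years


premiums = [10_000, 10_000, 10_000]  # flat annual premium, as in the example

# Invented claims histories for 2006 to 2008
one_off_loss = [0, 25_000, 0]        # 83% overall, but only one poor year
sustained = [7_000, 8_000, 6_000]    # 70% overall, three poor years
profitable = [4_000, 4_000, 2_000]   # 33% overall

print(is_bad(one_off_loss, premiums))  # False
print(is_bad(sustained, premiums))     # True
print(is_bad(profitable, premiums))    # False
```

The first case is the one the second condition exists for: a single large loss makes the policy unprofitable over three years, but it is not tagged as bad because the poor performance is not sustained.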

This rule splits the book roughly 75 / 25, with 74% of policies being good. Other choices of parameters may result in other splits and it would be advisable to spend a little time optimising things. Perhaps 26% of policies being flagged as bad is too aggressive, for example (though this rather depends on what you do about them – see below). However, in the simpler world of this example, I’ll press on to the next stage with my first pick.

The ultimate sense of perspective

Well all we have done so far is to tag policies that have performed badly – in the parlance of Analytics zealots we are being backward-looking. Now it is time to lift our hand on 2009 to 2010 and try to be forward-looking. While these figures are obviously also backward looking (the day that someone comes up with future data I will eat my hat), from the frame of reference of our experimental perspective (sitting at the close of 2008), they can be thought of as “the future back then”. We will use the actual performance of the policies in 2009 – 2010 to validate our choice of good and bad that was based on 2006 – 2008 results.

Overall the 50 policies had a loss ratio of 54% in 2009 – 2010. However those flagged as bad in our above exercise had a subsequent loss ratio of 92%. Those flagged as good had a subsequent loss ratio of 40%. The latter is a 14 point improvement on the overall performance of the book.
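As a quick sanity check, the segment figures quoted above can be blended back into the overall ratio. The sketch below relies on the fact that every policy carries the same premium, so the 74 / 26 split by policy count is also the split by premium; the variable names are mine.

```python
good_share, bad_share = 0.74, 0.26  # share of premium in each segment
good_lr, bad_lr = 0.40, 0.92        # 2009-2010 paid loss ratios by segment

# Premium-weighted average of the two segment loss ratios
blended = good_share * good_lr + bad_share * bad_lr
improvement_points = round(blended * 100) - round(good_lr * 100)

print(f"Blended loss ratio: {blended:.1%}")                       # 53.5%
print(f"Improvement of good book: {improvement_points} points")   # 14 points
```

The blended figure of 53.5% rounds to the 54% quoted for the whole book, which is where the 14 point improvement comes from.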

So we can say with some certainty that our rule, though simplistic, has produced some interesting results. The third part of this series will focus more closely on why this has worked. For now, let’s consider what actions the split we have established could drive.
 
 
What to do with the bad?

You shall be taken to the place from whence you came...

We were running a 54% paid ratio in 2009-2010. Using the same assumptions as above, this might have equated to a 94% combined ratio. Our book of business had an annual premium of £0.5m so we received £1m over the two years. The 94% combined would have implied making a £60k profit if we had done nothing different. So what might have happened if we had done something?

There are a number of options. The most radical of these would have been to not renew any of the bad policies; to have carried out a cull. Let us consider what would have been the impact of such an approach. Well our book of business would have shrunk to £740k over the two years at a combined of 40% (the ratio of the good book) + 40% (other outgoing) = 80%, which implies a profit of £148k, up £88k. However there are reasons why we might not have wanted to so drastically shrink our business. A smaller pot of money for investment purposes might have been one. Also we might have had customers with policies in both the good and bad segments and it might have been tricky to cancel the bad while retaining the good. And so on…

Another option would have been to have refined our rule to catch fewer policies. Inevitably, however, this would have reduced the positive impact on profits.

At the other extreme, we might have chosen to take less drastic action relating to the bad policies. This could have included increasing the premium we charged (which of course could also have resulted in us losing the business but via the insured’s choice), raising the deductible payable on any losses, or looking to work with insureds to put in place better risk management processes. Let’s be conservative and say that if the bad book was running at 92% and the overall book at 54% then perhaps it would have been feasible to improve the bad book’s performance to a neutral figure of say 60% (implying a break-even combined of 100%). This would have enabled the insurance organisation to maintain its investment base, to have not lost good business as a result of culling related bad and to have preserved the profit increase generated by the cull.
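The arithmetic behind the three scenarios discussed above (do nothing, cull the bad policies, or improve them to break-even) can be laid out in a few lines. This is a sketch under the assumptions already stated in the text: 40% of premium goes on other outgoings, the good book accounts for £740k of the £1m two-year premium and the bad book for the remaining £260k. The function and variable names are mine.

```python
OTHER_OUTGOINGS_PCT = 40  # 35% expenses plus the 5% allowance assumed earlier


def profit(premium, paid_loss_ratio_pct):
    """Profit in pounds for a given premium and paid loss ratio (in percent)."""
    combined_pct = paid_loss_ratio_pct + OTHER_OUTGOINGS_PCT
    return premium * (100 - combined_pct) // 100


total_premium = 1_000_000  # £0.5m a year over 2009 and 2010
good_premium = 740_000     # 37 policies x £10,000 x 2 years
bad_premium = 260_000      # 13 policies x £10,000 x 2 years

do_nothing = profit(total_premium, 54)                          # 94% combined
cull = profit(good_premium, 40)                                 # 80% combined, good book only
remediate = profit(good_premium, 40) + profit(bad_premium, 60)  # bad book at break-even

print(do_nothing, cull, remediate)  # 60000 148000 148000
```

The cull and the remediation give the same profit, as the remediated bad book merely breaks even; the difference is that remediation keeps the full £1m of premium on the books.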

In practice of course it is likely that some sort of mixed approach would have been taken. The general point is that we have been able to come up with a simple strategy to separate good and bad business and then been able to validate how accurate our choices were. If, in the future, we possessed similar information, then there is ample scope for better decisions to be taken, with potentially positive impact on profits.
 
 
Next time…

In the final part of what is now a trilogy, I will look more deeply at what we have learnt from the above example, tie these learnings into how to pitch a BI/DW programme in Insurance and make some more general observations.
 


Data visualisation

10 April 2011

Some pictures speak for themselves:

If you don't know what this is, check out the announcement from the CDF Collaboration at: http://www.fnal.gov/pub/today/archive_2011/today11-04-07_CDFpeakresult.html - All you have to do is click here. HINT: the peak at 140 GeV/c^2 may be important.
 


The triangle paradox – solved

10 April 2011

When I posted The triangle paradox, I said that I would post a solution in a few days. As per the comments on my earlier article, some via Twitter and indeed the context of the article in which this supposed mathematical conundrum was posted, the heart of the matter is an optical illusion.

If we consider just the first part of the paradox:

More than meets the eyes

Then the key is in realising that the red and green triangles are not similar (in the geometric sense of the word). In particular the left hand angles are not the same, thus when lined-up they do not form the hypotenuse of the larger, compound triangle that our eyes see. In the example above, the line tracing the red and green triangles dips below what would be the hypotenuse of the big triangle. In the rearranged version, it bulges above. This is where the extra white square comes from.

It is probably easier to see this diagrammatically. The following figure has been distorted to make things easier to understand:

Dimensions exaggerated

Let’s start with my point about the triangles not being similar:

∠EAB = tan⁻¹(2/5) ≈ 21.8°

∠FAC = tan⁻¹(3/8) ≈ 20.6°

So the two triangles are not similar and, as stated above, the two arrangements don’t quite line up to form the big triangle shown in the paradox. There is a “gap” between them formed by the grey parallelogram above, whose size has been exaggerated. This difference gets lost in the thickness of the lines and also our eyes just assume that the two arrangements form the same big triangle.
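The two angles quoted above are easy to check numerically. A minimal sketch, using only the 2 by 5 and 3 by 8 dimensions of the two small triangles (the variable names are mine):

```python
import math

# Angle at A in each small triangle: opposite side over adjacent side
angle_eab = math.degrees(math.atan2(2, 5))
angle_fac = math.degrees(math.atan2(3, 8))

print(f"EAB = {angle_eab:.1f} degrees")                # 21.8
print(f"FAC = {angle_fac:.1f} degrees")                # 20.6
print(f"Difference = {angle_eab - angle_fac:.2f} degrees")
```

A difference of little more than one degree is small enough to be lost in the thickness of the lines.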

To work out the area of the parallelogram:

AE = √(2² + 5²) = √29
EI = √(3² + 8²) = √73
AI = √(5² + 13²) = √194

The area of a triangle with sides a, b and c is given by:

Area = √(s(s – a)(s – b)(s – c)), where s = ½(a + b + c)

Sparing you the arithmetic, when you substitute the values for AE, EI and AI in the above equation, the area of ∆ AEI is precisely ½.

∆ AEI and ∆ AFI are clearly identical, so the area of parallelogram AEIF, which is twice the area of either, is

2 x ½ = 1

This is where the “missing” square comes from.
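For anyone who would rather not be spared the arithmetic, here is a short check. It assumes that the formula for the area of a triangle from its three sides is Heron's formula, rearranged to work with the squared side lengths so that everything stays in whole numbers. The coordinates used in the cross-check are my own reading of the figure (A at the origin, E at (5, 2) and I at (13, 5)).

```python
from fractions import Fraction
import math

# Squared side lengths of triangle AEI
ae_sq, ei_sq, ai_sq = 29, 73, 194

# Heron's formula rearranged: 16 * Area^2 = 4 a^2 b^2 - (a^2 + b^2 - c^2)^2
area_sq = Fraction(4 * ae_sq * ei_sq - (ae_sq + ei_sq - ai_sq) ** 2, 16)
triangle_area = math.sqrt(area_sq)
parallelogram_area = 2 * triangle_area

# Cross-check with the shoelace formula on the assumed coordinates
ex, ey = 5, 2
ix, iy = 13, 5
shoelace_area = abs(ex * iy - ix * ey) / 2

print(area_sq, triangle_area, parallelogram_area, shoelace_area)
```

The numerator works out as 8468 – 8464 = 4, so the squared area is ¼, the triangle has area ½ and the parallelogram has area 1.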
 


 
As was pointed out in a comment on the original post, the above should form something of a warning to those who place wholly uncritical faith in data visualisation. Much like statistics, while this is a powerful tool in the hands of the expert, it can mislead if used without due care and attention.
 


Illuminating the darkness

8 April 2011

Recrudescence

My partner was kind enough to buy me an Amazon Kindle for Christmas and I have enjoyed using it. Yes there were the problems with them registering me to Amazon.com, rather than Amazon.co.uk (thereby incurring foreign transaction charges). And yes they didn’t cancel a trial Economist subscription I took out on the former when I was transferred to the latter. However, these issues were sorted out and money refunded.

I suppose I had the same initial reaction as many people; that they had left a sticker covering the screen, which was intended to demonstrate what the display looked like. After failing to peel it off (thankfully not too energetically) I realised that the screen was actually that clear and that different from a “normal” computer display (I was thinking smart ‘phone or laptop). I am writing this post on one of my many laptops; the screen is OK, but the Kindle is much easier on the eye and pretty close to a high-quality printed page. Suffice it to say that I downloaded new copies of several of my favourite books to it with the prospect of re-engaging with them at my leisure.

But enough of me singing the general praises of the device, I have discovered a particular benefit. While this may well be realised by other people, it is of particular pertinence to devotees of the works of Joseph Conrad.

Joseph Conrad

As one of the undisputed giants of English prose, it is rather ironic that English itself was either Conrad’s fifth, or sixth, language (chronologically: Polish; Russian – though he later, perhaps understandably given the turbulence of the times, repudiated this as a language; French; Latin; German; and – finally, when he was in his twenties, English). I have greatly appreciated his work, since first reading Heart of Darkness. I won’t attempt to offer a literary appreciation of his genius and leave this to others with greater talents in that area. However, despite coming late to the English tongue, Conrad was a master of it and had an amazing vocabulary.

An indispensable companion to Conrad's works

I generally view myself as being reasonably erudite (less charitably I have been accused of having swallowed a thesaurus), but used to have to keep a dictionary at hand when reading Conrad; either that or try to impute meaning from context (probably getting it wrong more times than I care to admit). In some ways, my own limitations slightly diluted my enjoyment of reading. It is a bit distracting to put down one book, pick up a dictionary, look up a word and then revert to the original tome (it was even more complicated as a child reading Jules Verne’s 20,000 Leagues under the Sea with both a dictionary and gazetteer to hand!).

Incidentally my fondness of Conrad led to my one contribution to the field of science. I established my result after extensive fieldwork involving Nostromo and a daily commute. Thomas’ Theorem is as follows:

While this feat is more than achievable with the works of other authors, it is impossible to read Conrad on the Tube.

However, the Kindle is a joy in this respect as you can look up words using the built-in dictionary, quickly, easily and without disturbing the thread of the narrative too much. This has got me out of my rather lazy habit of assuming that I sort of know what a word means and thereby given me a few surprises. Based on the initial illustration above, for example, I had to modify my understanding of recrudescence!

Of course this means that I may have to re-evaluate whether Thomas’ Theorem holds in all conditions. Perhaps a sub-clause excluding the use of a Kindle is required. I will report back…
 


 
This is not the first time that Conrad has appeared in the pages of this blog; I had the temerity to also reference him in Aphorism of the Week some time ago.
 


What is wrong with this picture?

7 April 2011

Following on from the optical illusions that I featured earlier in the week, here is another picture with something subtly (or perhaps not so subtly) wrong with it. Can you spot what?

So which one is your favourite?
 


The triangle paradox

4 April 2011

This seems to be turning into Mathematics week at peterjamesthomas.com. The “paradox” shown in the latter part of this article was presented to the author and some of his work colleagues at a recent seminar. It kept company with some well-known trompe l’œil such as:

Old or young woman?

and

Quadruped?

and

Parallel lines?

However the final item presented was rather more worrying as it seemed to be less related to the human eye’s (or perhaps more accurately the human brain’s) ability to discern shape from minimal cues and more to do with mathematical fallacy. The person presenting these images (actually they were slightly different ones, I have simplified the problem) claimed that they themselves had no idea about the solution.

Consider the following two triangles:

Spot the difference...

The upper one has been decomposed into two smaller triangles – one red, one green – a blue rectangle and a series of purple squares.

These shapes have then been rearranged to form the lower triangle. But something is going wrong here. Where has the additional white square come from?

Without even making recourse to Gödel, surely this result stabs at the heart of Mathematics. What is going on?

After a bit of thought and going down at least one blind alley, I managed to work this one out (and thereby save Mathematics single-handedly). I’ll publish the solution in a later article. Until then, any suggestions are welcome.
 


 
For those who don’t want to think about this too much, the solution has now been posted here.
 


Half full, or half empty?

3 April 2011

Glass half, er...

Someone being described as a “glass half-full” or “glass half-empty” sort of person is something that one hears increasingly frequently. I was recently discussing this with a friend and we both agreed that the analogy was unhelpful. First it supports a drastically simplistic and binary view of people having fixed attitudes and behaviours in all circumstances. Day-to-day observation suggests on the contrary that a person may be an avid optimist one day about one thing and a manic pessimist the next day about another thing. This rather shallow type of characterisation rather reminds me of some of the subjects I touched on in The Big Picture and Pigeonholing – A tragedy some time ago.

However, there is a more fundamental consideration; wilful inaccuracy. A glass that is half empty is also half full; that’s the definition of a half. Either description is 100% valid and therefore logically can tell you nothing about the person’s mindset.

Instead what might be more apposite is to adopt a different way to divide sheep from goats. This is still rather too binary for my taste, but at least it has the merit of a greater degree of rigour. I propose dividing people according to how they view a glass that is three quarters empty:

  • I still have some left: optimist
  • There isn’t very much left: pessimist

I think that all of our lives would be much the better for adopting this simple principle.

The International Organisation for stamping out sloppiness in spoken speech

Accordingly, I am going to submit this recommendation to the International Standards Organisation for their urgent consideration. I’ll make sure that I keep readers up-to-date with how my submission progresses.
 


I will be presenting at the IRM European Data Governance Conference

2 March 2011

This IRM UK event will be taking place in central London from the 21st to 23rd March 2011. It is co-located with another related IRM conference on Master Data Management.

My presentation will be entitled Making Business Intelligence an Integral part of your Data Quality Programme. Full details may be obtained from the IRM conference web-site here.
 


How to use your BI Tool to Highlight Deficiencies in Data

28 January 2011

My interview with Microsoft’s Bruno Aziza (@brunoaziza), which I trailed in Another social media-inspired meeting, was published today on his interesting and entertaining bizintelligence.tv site.

You can take a look at the canonical version here and the YouTube version appears below:

The interview touches on themes that I have discussed in:

 


Another social media-inspired meeting

31 October 2010

Lights, camera, action!

Back in June 2009, I wrote an article entitled A first for me. In this I described meeting up with Seth Grimes (@SethGrimes), an acknowledged expert in analytics and someone I had initially “met” via Twitter.com.

I have vastly expanded my network of international contacts through social media interactions such as these. Indeed I am slated to meet up with a few other people during November; a month in which I have a couple of slots speaking at BI/DW conferences (IRM later this week and Obis Omni towards the end of the month).

Another person that I became a virtual acquaintance of via social media is Bruno Aziza (@brunoaziza), Worldwide Strategy Lead for Business Intelligence at Microsoft. I originally “met” Bruno via LinkedIn.com and then also connected on Twitter.com. Later Bruno asked me for my thoughts on his article, Use Business Intelligence To Compete More Effectively, and I turned these into a blog post called BI and competition.

bizintelligence.tv - by Bruno Aziza of Microsoft

We have kept in touch since and last week Bruno asked me to be interviewed on the bizintelligence.tv channel that he is setting up. It was good to meet in person and I thought that we had some interesting discussions. Though I have done video and audio interviews before with organisations like IBM Cognos, Informatica, Computing Magazine and SmartDataCollective (see the foot of this article for links), these were mostly a while back and so it was interesting to be in front of a camera again.

The bizintelligence.tv format seems to be an interesting one, with key points in BI discussed in a focussed and punchy manner (not an approach that I am generally associated with) and a target audience of busy senior IT managers. As I have remarked elsewhere, it is also notable that the more foresighted of corporations are now taking social media seriously and getting quite good at engaging without any trace of hard selling; something that perhaps compromised the earlier efforts of some organisations in this area (for the avoidance of doubt, this is a general comment and not one levelled at Microsoft).

Bruno and I touched on a number of areas including driving improvements in data quality, measuring the value of BI programmes, using historical data to justify BI investments (something that I am overdue writing about – UPDATE: now remedied here) and the cultural change aspect of BI. I am looking forward to seeing the results. Watch this space and in the meantime, take a look at some of the earlier interviews that Bruno has conducted.
 


 

Other video and audio interviews that I have recorded:

 
