New Adventures in Wi-Fi – Track 3: LinkedIn

New Adventures in Wi-Fi (with apologies to R.E.M.)

Forming the final part of the trilogy, earlier episodes being:

New Adventures in Wi-Fi – Track 1: Blogging

New Adventures in Wi-Fi – Track 2: Twitter

 
Introduction

Having recently published an entire trilogy whose gestation had consumed more than three times that of a human infant, I am now returning to another troika whose first part I published back in July 2009. Before starting, I’ll repeat something that I mentioned at the beginning of both of the previous articles: I am not a great believer in Recipes for Success. This piece reflects my journey within LinkedInLand and your path may be very different. The intention is to provide some ideas, not to offer a foolproof set of steps that will lead to instant success in the media.

I should also stress that the suggestions that I present here are related to the professional aspects of Social Media. The personal aspects are different and, while there may be some overlap, please don’t expect my recommendations to wow your friends and relations!
 

Facelessness

If there's something strange in your neighBAAhood Who ya gonna call?

It may have occurred to some readers that my trilogy is winding to a close without encompassing the doyen of dozens of SM mavens; Facebook. I am probably exhibiting my occasional Luddite tendencies here, but I have always rather struggled to form the equation:

Facebook = Professional

To me throwing farm animals at other people is not 100% consistent with a medium for raising your industry profile (unless you are in on-line games development that is). If you are a B2C organisation, then I can see the point (The Arch Climbing Wall in London is a good example of a small business using Facebook well). If you are a B2B behemoth, then a Facebook presence seems more like a wheeze dreamt up by those awfully creative people in Marketing.

I do use Facebook, but used to 100% separate this from professional networking. Because I interact with a number of people that I have met through Blogging / LinkedIn / Twitter in areas outside the strictly professional (and also, if I am honest, as clicking the thumbs-up button is rather easy), I have strayed somewhat from this purist path of late. However, it remains true that I have one sixth as many Facebook friends as I do LinkedIn connections.

Maybe at some point in the not too distant future my trio of professional Social Media outlets will become a quartet, but for now Facebook remains a peripheral business activity for me.

 
Why LinkedIn?

LinkedIn.com

I joined LinkedIn in July 2005 and so have been engaged in it for much longer than I have either blogged or tweeted. However, my devoting any real time to this area dates to around the same time that I embarked on these other activities: late 2008. At that point I was looking to achieve a few, fairly limited things:

  1. To build on my public speaking to establish a profile in the IT industry
  2. To develop a network of fellow professionals, both in my native UK and more widely
  3. To create another platform from which to showcase my abilities and experience
  4. To reconnect with past colleagues
  5. To try out what was – even at that point – an emerging medium

It is perhaps odd to think, but I believe now that item five was probably much more influential than the others back then.

Over time these objectives have morphed as I have become more familiar with LinkedIn. Today the list would more often mention either “grow” or “maintain” than “develop”. Also LinkedIn has become the main channel through which my content – such as this article – reaches people who may be interested in reading it. This is one notable aspect of LinkedIn and the observation raises two points that I will come back to later in this article. First, that LinkedIn is a great way to find, or even form, groups who are interested in niche subjects (and I am not as yet arrogant enough to think that much of what I write is in the mainstream). Second, that LinkedIn tends to work best in conjunction with other elements of Social Media; for me at least the two that I cover in the earlier articles in this series.
 
 
The Seven Habits of Highly Connected People

I tend to have an allergic reaction to articles entitled “10 steps towards successful X”. I certainly don’t have all the answers and the last thing that I would ever want to do is to stop readers thinking for themselves. However, the material I will cover in this piece, which is based on no greater insight than my own experiences, is inevitably going to fit fairly and squarely within this blogosphere cliché.
 

  1. Your page – a shop window

    Once upon a time. Not so long ago. There was a little girl and her name was Emily. And she had a shop. There it is. It was rather an unusual shop because it didn't sell anything. You see, everything in that shop window was a thing that somebody had once lost. And Emily had found. And brought home to Bagpuss. Emily's cat Bagpuss. The most Important. The most Beautiful. The most Magical. Saggy old cloth cat in the whole wide world.

    First things first, once you have signed up for LinkedIn, you will need to build your own page. This is not as daunting as it might seem and LinkedIn have done most of the hard work for you. Also they are always coming up with new sections and new features that will allow you to position snippets of information about yourself. However, in essence, your LinkedIn page is your shop window and it is important to realise that developing its contents merits some care and attention.

    It is useful to bear in mind your main objective for using LinkedIn. If this is to get a new job, then you should be looking to highlight the same things that you would highlight in a CV (try Googling “10 steps towards writing a successful CV”). However, remember that you can also easily host your actual CV on LinkedIn, so it will probably be productive to take a slightly different slant on your page itself. If you are a consultant and want to generate new clients, then explaining what you offer and why it is different from others will be valuable. If you are simply interested in connecting with like-minded individuals, with whom you can converse about issues and trends in your industry or sector, then perhaps listing the types of areas that you would like to talk about is a good idea. Of course, most people will have multiple and overlapping reasons for being on LinkedIn and – if so – a measured and blended approach will probably be best.

    Example of Professional Headline

    As with a CV or a static advert, you probably have only a fleeting amount of time to engage the reader’s attention before they move on elsewhere. Given this, it makes sense to make use of things like your Professional Headline to pithily pitch yourself. It does no harm at all to also have a decent photo posted. My opinion is that a business-related one sets the right tone, but others think differently.

    An example of what you can do with your LinkedIn status

    If you catch the eye of passers-by, then your next hook is your Status – this can be something that you type in yourself, an update from your activity on a group, recent Twitter postings, or a link to other content. Again a little thought here will pay dividends. This is a chance to convey something distinctive to your readers, so do your best to take advantage of it.

    After the summary of basic career details that LinkedIn auto-generates, your next opportunity to engage with readers is the experience section. Here (within a limited number of characters) you can build on what you have led with in your Professional Headline and Status to provide a more rounded perspective of you as an individual.

    Although it makes most sense to get the upper pieces of your page just right (whatever that means for you), I would recommend also paying close attention to each of the details of your career (or those that you choose to post anyway) and even interests and other information. If you do manage to engage a reader and they invest the time to go through all of your information, then the last thing you want is to put them off right at the end with a glaring typo or inane comment. Whatever your reasons for being on LinkedIn, you probably would like readers to take away the idea that you are professional in what you do and a little thoroughness never hurt anyone.

    I will cover other ways in which you can use your LinkedIn page to greater effect later on, for now – as with most things in life – the more time and thought that you spend on this area, the better the results are likely to be.
     

  2. Who will you look to connect with?

    Knee bone connected to your thigh bone. Thigh bone connected to your hip bone. Hip bone connected to your Back bone. Back bone connected to your shoulder bone. Shoulder bone connected to your Neck bone. Neck bone connected to your Head bone. Now hear the word of the Lord.

    There are two ways that connections are forged: you initiate the bond being formed, or someone else does. I’ll consider the second case in the next section; for now, what type of people does it make sense for the LinkedIn user to try to actively connect with? There are a number of obvious categories:

    1. Current colleagues or business partners

      It is becoming increasingly prevalent that connecting on LinkedIn plays the role that exchanging business cards used to in previous times (it is actually not that uncommon to see LinkedIn details on business cards either). This is the most obvious source of connections and LinkedIn will helpfully suggest people who work for your organisation as candidates.

      Available for weddings, bar mitzvahs and christenings

      Having recently started at a new company, I would not suggest indiscriminately inviting everyone at your place of work to connect. As and when you meet people face-to-face and begin to interact more, a LinkedIn invitation can help to expand your relationship (and also potentially showcase aspects of your experience that have not formed part of your day-to-day dealings with someone). If you gave new colleagues or business partners a copy of your CV, they would probably never read it. People do however seem to have the habit of checking out LinkedIn profiles, no matter how similar the two activities would appear to be on the face of things.

      Anyone that you work with extensively at the current moment is a prime candidate for a LinkedIn contact; not least as you may be able to call on such people to recommend you at some later point (see below).

    2. Former colleagues or business partners

      The same comments apply (and the same LinkedIn suggestions), but it may pay to be a little more discerning with this group. It might even make sense to be a little hard-nosed – think about what such a connection might do for you and what being connected to them might say about you. Of course where you have enjoyed a very good and mutually productive business relationship with someone, why would you not want to connect? If you instead occasionally came across someone in an old organisation and you don’t have much in common, the case for sending out an invitation may be much less strong.

      Don’t get caught in the trap of chasing connections just for the sake of it; there are better ways to receive validation in life than via the cardinality of the set of people you are linked to!

    3. People who you have never met

      It's got to hurt having a question mark branded on your face...

      This is a strange one. Typically the advice from LinkedIn gurus – and from LinkedIn itself – is not to make such connections. I am actually in rather close connection with several people I have never met via the combination of Blogging, Twitter and LinkedIn, but they generally all fall into the next section. Approaching people that you really have no business approaching is probably just as much of an antisocial behaviour on LinkedIn as it is in real life.

      Unless you share a group (or pay to upgrade to a premium account), you will need the e-mail of a target connection in order for an invitation to reach such a person. If you find yourself trying to Google this, you have probably crossed a line and should carefully consider if you really want to continue in this way.

    4. People who you have never met, but with whom you have some other connection

      What you have in common could be anything from both being members of a group on LinkedIn (see below again), to having read one of their blog articles, which you found interesting. Best is if you have actually “met” them virtually, e.g. struck up a discussion on LinkedIn, or via Twitter, or on the comments section of their (or your) blog. There are any number of people who I first “met” virtually and then physically later (see A first for me…, Another social media-inspired meeting and Some thoughts on the IRM(UK) DW/BI conference for some examples), most also were LinkedIn connections before we met face-to-face.

    5. Friends

      Aside from showing other people that you are not a sociopath (and excepting the case where friends are in a similar line of business), I’m not sure what value having cohorts of friends as connections serves. Returning to the box at the beginning of this article, maybe Facebook is the place for this.

    Finally in this section, asking someone to connect doesn’t have a major downside. At best they accept. At worst they ignore you (actually at worst they write to you and say how they would love to connect except for issues A, B and C and how this is all very unfortunate, but have a nice life). If you do get snubbed, you can comfort yourself by thinking that probably no one else will ever know, or indeed care!
     

  3. Who should you accept invitations from?

    I was going to quote 50 Cent's ditty 'The Invitation' then decided that maybe this was a bad idea for a family blog

    This is a shorter section than the previous one. The answer to the question is “all of the above”. The only exception is in the People You Have Never Met section. I used to follow the received LinkedIn wisdom of only connecting with people with whom I had had some previous interaction (either on-line or IRL). Latterly I have come to the conclusion that if someone has gone to the substantial trouble of finding, or figuring out, my e-mail and then asking to be my connection, they must have some valid reason and who am I to deny them? Of course if the valid reason is wanting to sell me something, then it is not too onerous to disconnect. This actually seems to happen less frequently than one might think.
     

  4. Groups and what to share with them

    Every finite simple group is isomorphic to one of the following groups: 1) A cyclic group with prime order; 2) An alternating group of degree at least 5; 3) A simple group of Lie type, including both: a) the classical Lie groups, namely the groups of projective special linear, unitary, symplectic, or orthogonal transformations over a finite field; b) the exceptional and twisted groups of Lie type (including the Tits group which is not strictly a group of Lie type); 4) The 26 sporadic simple groups.

    As alluded to above, groups are one of the strongest points of LinkedIn. It could be argued that they have proliferated and splintered too much since their inception, but they remain a great way to interact with people who share your interests (for me everything from Mountain Biking to Data Warehouse Architecture). Joining a group both flags your areas of enthusiasm or expertise to the reader of your profile and provides a mechanism to connect with people via just what you have in common (you can generally send an invitation to the members of a group you belong to without needing to know their e-mail address).

    However the greatest benefit of joining a group is that you can get involved in discussions. These may be responding to topics that others have raised, or web-pages that they have shared, or you may choose to initiate discussion threads of your own. For example, and anticipating the final part of this piece, I have lost track of how many of my blog articles had their genesis in LinkedIn group discussions. Of course when a group inspires you to write, you can then share the results back with the very people who provided the inspiration; a virtuous circle. You can learn a lot by just reading, but even more by jumping in and getting involved.

    Particular LinkedIn groups that have inspired me to write include:

    1. Business Intelligence Group
    2. Chief Information Officer (CIO) Network
    3. CIO Forum
    4. TDWI’s Business Intelligence and Data Warehousing Discussion Group

    Nowadays, of the above, you are most likely to find me hanging out here:

    The Data Warehousing Institute

    At the time of writing there is a limit of 50 groups to which a LinkedIn user can belong. I am at that limit and probably need to do some weeding out in order to focus on the truly useful versus the mildly interesting. A final suggestion here is to – unlike me at present – devote your time to a smaller number of groups, giving each the attention that it deserves.

    A final recommendation under this sub-heading: don’t get into discussions with Young Earth advocates, especially those who somehow managed to graduate from your science-based alma mater – you have been warned.
     

  5. Recommendations – giving and getting

    Glenn McGrath, Australian cricket legend, recommending England as the team most likely to become world number one after their home and away Ashes victories

    Recommendations are another tricky area. Ideally you will receive these spontaneously, but back in the real world you may need to ask. As ever the praise of the praiseworthy is the most treasured of all, so I would strongly suggest that you do not ask for recommendations from all and sundry. Qualifications should be a) that you respect the person you are asking to recommend you, b) that you did substantive work together, c) that the person’s recommendation is pertinent to whatever you are trying to achieve on LinkedIn and d) [sadly this one is not within your control] that the recommendation conveys something other than mere platitudes. You can of course ask people to edit their recommendations, but maybe at that point the trickiness becomes terminal.

    Some people suggest that recommendations from superiors, or customers, are the only ones that are worth having. I say poppycock! Two of the LinkedIn recommendations that I am most proud of come from colleagues who worked for people who worked for me. If displaying man-management or leadership skills plays any part in your LinkedIn objectives – and of course if such recommendations appear genuine – then surely there is an awful lot of value in any recommendation from a colleague. Perhaps solely having testimonials from people who have worked for you might not set the right tone, but having none also says something in my opinion.
     

  6. Applications – closing the loop

    I mentioned above that there are other ways to jazz-up your LinkedIn page. Amongst these are add-in applications. The number of these has increased of late, but don’t expect the Apple or Android app stores. There are apps that will let you share presentations, tell people what you are reading (via Amazon), or flag your travels around the globe (useful if you are a rock band on its world tour, less helpful for a humble ITer like me). I only use a couple, but they both seem to add value.

    Box.net

    First I use Box.net, a cloud-based document repository on which I store nothing more exciting than my CV and some other career documentation. The app tells you when a document is downloaded (though obviously not who has downloaded it) and I am surprised how many readers have taken advantage of this. I hope that they found my CV a riveting read.

    Wordpress LinkedIn application

    Second I use WordPress’ own add-in which allows content from my blog to be displayed (see next section). The app doesn’t provide tracking information, but I can tell whence (anonymised) visitors to my blog arrive and a fair percentage appear to originate from this LinkedIn feature.

    Despite a slow start, I anticipate a growing number of LinkedIn apps becoming available in coming months. It will be interesting to see what other opportunities these provide. The core value of LinkedIn is going to continue to be vested in the sections that I describe above, but I can see future applications enhancing this in interesting ways.
     

  7. Combination with other elements of Social Media

    Media est omnis divisa in partes tres

    Way back in the first segment of this series I said that I felt that the interplay between Blogging, Twitter and LinkedIn was more powerful than any single element. I have probably come into contact with a wider range of people via Twitter, maybe due to the low friction associated with following someone, but most of the more useful relationships have also become connections on LinkedIn. I mention above that LinkedIn groups have inspired a number of my blog articles. These include some of my most highly-rated pieces such as Who should be accountable for data quality?, A single version of the truth? and “Why Business Intelligence projects fail”. Perhaps the fact that they related to topical issues that people clearly wanted to discuss was a contributory factor in their popularity. I like to think that I often take a different slant from the original discussions on LinkedIn, but I would have often not put fingertip to keyboard without the initial conversation giving me a nudge.

    Of the three media, I put the most effort into blogging (as attested to by the length of this piece for example), but I interact with people more on LinkedIn. The way that WordPress reports referring URLs makes it difficult to be precise, but a back-of-the-envelope calculation suggests that linkedin.com is my most frequent referring domain by some way. My Twitter output has fallen somewhat in recent months, both due to other things consuming my time and also my developing opinion that it is becoming tougher to tell signal from noise. Nevertheless, it is a very common occurrence that a Twitter follow leads to a LinkedIn invitation in rapid succession and vice versa; it helps that each of the three sites has many links off to the others.

    You can link your Twitter output to LinkedIn, but I find that this can be a bit overwhelming for me, let alone people reading my LinkedIn page, so have generally turned this off again. Although I think there is great value in forming connections between LinkedIn and Twitter, I also think it is important to remember that they are distinct media which people peruse for different reasons, albeit with some overlap.

 
Final thoughts

B.T.L. - An eternal golden braid (with apologies to Douglas R Hofstadter)

It has been a long journey, but I have now completed my traverse of the triangle formed by Blogging, Twitter and LinkedIn, with each “side” having its own dedicated article. I think that I will risk over-extending this analogy by saying two things.

First, in arriving back where I started, it is important to state that you can never declare success in Social Media; you are only as good as your last article or tweet (OK maybe the bar is not set that high for tweets). In fact I feel mildly motivated to re-read the first article in this trilogy and see which of my own blogging tips I have been ignoring recently. As with most activities, Social Media success is driven by practice and, to borrow from the other Seven Habits, by continually sharpening the saw.

Second a triangle, if properly formed, has structural integrity beyond that of its component parts. I think that the same holds true for the three parts of Social Media that I have covered in this series. For those readers who have persevered this far, there is just one thing that I would like you to take away from this article. This is the strength generated by using Blogging, Twitter and LinkedIn in a mutually reinforcing way.
 
Usurpo - Sustineo - Servo
 

 

Analogies

Disaster Area's chief research accountant has recently been appointed Professor of Neomathematics at the University of Maximegalon, in recognition of both his General and his Special Theories of Disaster Area Tax Returns, in which he proves that the whole fabric of the space-time continuum is not merely curved, it is in fact totally bent.

Note: In the following I have used the abridgement Maths when referring to Mathematics. I appreciate that this may be jarring to US readers, but omitting the ‘s’ is jarring to me, so please accept my apologies in advance.

Introduction

Regular readers of this blog will be aware of my penchant for analogies. Dominant amongst these have been sporting ones, which have formed a major part of articles such as:

Rock climbing: Perseverance
A bad workman blames his [BI] tools
Running before you can walk
Feasibility studies continued…
Incremental Progress and Rock Climbing
Cricket: Accuracy
The Big Picture
Mountain Biking: Mountain Biking and Systems Integration
Football (Soccer): “Big vs. Small BI” by Ann All at IT Business Edge

I have also used other types of analogy from time to time, notably scientific ones such as in the middle sections of Recipes for Success?, or A Single Version of the Truth? – I was clearly feeling quizzical when I wrote both of those pieces! Sometimes these analogies have been buried in illustrations rather than the text as in:

Synthesis RNA Polymerase transcribing DNA to produce RNA in the first step of protein synthesis
The Business Intelligence / Data Quality symbiosis A mitochondrion, the possible product of endosymbiosis of proteobacteria and eukaryotes
New Adventures in Wi-Fi – Track 2: Twitter Paul Dirac, the greatest British Physicist since Newton

On other occasions I have posted overtly Mathematical articles such as Patterns, patterns everywhere, The triangle paradox and the final segment of my recently posted trilogy Using historical data to justify BI investments.

Jim Harris' OCDQ Blog

Jim Harris (@ocdqblog) frequently employs analogies on his excellent Obsessive Compulsive Data Quality blog. If there is a way to form a title “The X of Data Quality”, and relate this in a meaningful way back to his area of expertise, Jim’s creative brain will find it. So it is encouraging to feel that I am not alone in adopting this approach. Indeed I see analogies employed increasingly frequently in business and technology blogs, to say nothing of in day-to-day business life.

However, recently two things have given me pause for thought. The first was the edition of Randall Munroe’s highly addictive webcomic, xkcd.com, that appeared on 6th May 2011, entitled “Teaching Physics”. The second was a blog article I read which likened a highly abstract research topic in one branch of Theoretical Physics to what BI practitioners do in their day job.

An homage to xkcd.com

Let’s consider xkcd.com first. Anyone who finds some nuggets of interest in the type of – generally rather oblique – references to matters Mathematical or Scientific that I mention above is likely to fall in love with xkcd.com. Indeed anyone who did a numerate degree, works in a technical role, or is simply interested in Mathematics, Science or Engineering would as well – as Randall says in a footnote:

“this comic occasionally contains […] advanced mathematics (which may be unsuitable for liberal-arts majors)”

Although Randall’s main aim is to entertain – something he manages to excel at – his posts can also be thought-provoking, bitter-sweet and even resonate with quite profound experiences and emotions. Who would have thought that some stick figures could achieve all that? It is perhaps indicative of the range of topics dealt with on xkcd.com that I have used it to illustrate no fewer than seven of my articles (including this one, a full list appears at the end of the article). It is encouraging that Randall’s team of corporate lawyers has generally viewed my requests to republish his work favourably.

The example of Randall’s work that I wanted to focus on is as follows.

Space-time is like some simple and familiar system which is both intuitively understandable and precisely analogous, and if I were Richard Feynman I’d be able to come up with it.
© xkcd.com (adapted from the original to fit the dimensions of this page)

It is worth noting that often the funniest / most challenging xkcd.com observations appear in the mouse-over text of comic strips (alt or title text for any HTML heads out there – assuming that there are any of us left). I’ll reproduce this below as it is pertinent to the discussion:

Space-time is like some simple and familiar system which is both intuitively understandable and precisely analogous, and if I were Richard Feynman I’d be able to come up with it.

If anyone needs some background on the science referred to, then have a skim of this article; if you need some background on the scientist mentioned (who has also made an appearance on peterjamesthomas.com in Presenting in Public), then glance through this second one.

Here comes the Science…

Randall points out the dangers of over-extending an analogy. While it has always helped me to employ the rubber-sheet analogy of warped space-time when thinking about the area, it is rather tough (for most people) to extrapolate a 2D surface being warped to a 4D hyperspace experiencing the same thing. As an erstwhile Mathematician, I find it easy enough to cope with the following generalisation:

S(1) = The set of all points defined by one variable (x1)
– i.e. a straight line
S(2) = The set of all points defined by two variables (x1, x2)
– i.e. a plane
S(3) = The set of all points defined by three variables (x1, x2, x3)
– i.e. “normal” 3-space
S(4) = The set of all points defined by four variables (x1, x2, x3, x4)
– i.e. 4-space
…
S(n) = The set of all points defined by n variables (x1, x2, … , xn)
– i.e. n-space

As we increase the dimensions, the Maths continues to work and you can do calculations in n-space (e.g. to determine the distance between two points) just as easily (OK with some more arithmetic) as in 3-space; Pythagoras still holds true. However, actually visualising say 7-space might be rather taxing for even a Fields Medallist or Nobel-winning Physicist.
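For anyone who would like to see the "some more arithmetic" spelled out, here is a minimal Python sketch of Pythagoras at work in n-space. The function name distance and the sample points are my own illustrative choices, nothing more; the only assumption is that a point is written as a tuple of n coordinates (x1, x2, … , xn).

```python
import math


def distance(p, q):
    """Straight-line (Euclidean) distance between two points in n-space.

    Each point is a sequence of n coordinates; n is simply the length
    of the sequence, so the same code serves for 3-space or 7-space.
    """
    if len(p) != len(q):
        raise ValueError("both points must live in the same n-space")
    # Pythagoras generalised: sum the squared differences along each
    # of the n axes, then take the square root of the total.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# 3-space: squared differences are 1 + 4 + 4 = 9
print(distance((0, 0, 0), (1, 2, 2)))

# 4-space: squared differences are 1 + 1 + 1 + 1 = 4
print(distance((0, 0, 0, 0), (1, 1, 1, 1)))

# 7-space: no harder to calculate, merely harder to picture
print(distance((0,) * 7, (1,) * 7))
```

The calculation never needs to know how many dimensions it is working in, which is rather the point: the arithmetic scales happily, even if our powers of visualisation do not.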

… and the Maths

More importantly while you can – for example – use 3-space as an analogue for some aspects of 4-space, there are also major differences. To pick on just one area, some pieces of string that are irretrievably knotted in 3-space can be untangled with ease in 4-space.

To briefly reference a probably familiar example, starting with 2-space we can look at what is clearly a family of related objects:

2-space: A square has 4 vertexes, 4 edges joining them and 4 “faces” (each consisting of a line – so the same as edges in this case)
3-space: A cube has 8 vertexes, 12 edges and 6 “faces” (each consisting of a square)
4-space: A tesseract (or 4-hypercube) has 16 vertexes, 32 edges and 8 “faces” (each consisting of a cube)
Note: The reason that faces appears in inverted commas is that the physical meaning changes, only in 3-space does this have the normal connotation of a surface with two dimensions. Instead of faces, one would normally talk about the bounding cubes of a tesseract forming its cells.

Even without any particular insight into multidimensional geometry, it is not hard to see from the way that the numbers stack up that:

n-space: An n-hypercube has 2^n vertexes, n × 2^(n-1) edges and 2n “faces” (each consisting of an (n-1)-hypercube)
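Readers who prefer to check such a pattern rather than take it on trust could do so with a brute-force count. The Python sketch below is only an illustration under my own assumptions: it models the vertexes of an n-hypercube as all strings of n zeros and ones, treats two vertexes as joined by an edge when they differ in exactly one coordinate, and treats each bounding "face" as the set of vertexes sharing a fixed value on one axis. The name hypercube_counts is hypothetical.

```python
from itertools import combinations, product


def hypercube_counts(n):
    """Count the vertexes, edges and bounding "faces" of an n-hypercube."""
    # Vertexes: every combination of 0 or 1 along each of the n axes.
    vertexes = list(product((0, 1), repeat=n))

    # Edges: pairs of vertexes that differ along exactly one axis.
    edges = [
        (u, v)
        for u, v in combinations(vertexes, 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    ]

    # "Faces": fix one axis to 0 or to 1 and collect the vertexes that
    # remain; each such collection is an (n-1)-hypercube.
    faces = {
        frozenset(v for v in vertexes if v[axis] == value)
        for axis in range(n)
        for value in (0, 1)
    }
    return len(vertexes), len(edges), len(faces)


for n in (2, 3, 4):
    counted = hypercube_counts(n)
    predicted = (2 ** n, n * 2 ** (n - 1), 2 * n)
    print(n, counted, predicted)
```

For the square, cube and tesseract the counted and predicted figures agree with those listed above. A check of three cases is of course not a proof, but it is reassuring.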

Again, while the Maths is compelling, it is pretty hard to visualise a tesseract. If you think that a drawing of a cube is an attempt to render a 3D object on a 2D surface, then a picture of a tesseract would be a projection of a projection. The French (with a proud history of Mathematics) came up with a solution: just do one projection by building a 3D “picture” of a tesseract.

La Grande Arche de la Défense

As an aside, it could be noted that the above photograph is of course a 2D projection of a 3D building, which is in turn a projection of a 4D shape; however recursion can sometimes be pushed too far!

Drawing multidimensional objects in 2D, or even building them in 3D, is perhaps a bit like employing an analogy (this sentence being of course a meta-analogy). You may get some shadowy sense of what the true object is like in n-space, but the projection can also mask essential features, or even mislead. For some things, this shadowy sense may be more than good enough and even allow you to better understand the more complex reality. However, a 2D projection will not be good enough (indeed cannot be good enough) to help you understand all properties of the 3D, let alone the 4D. Hopefully, I have used one element of the very subject matter that Randall raises in his webcomic to further bolster what I believe are a few of the general points that he is making, namely:

  1. Analogies only work to a degree and you over-extend them at your peril
  2. Sometimes the wholly understandable desire to make a complex subject accessible by comparing it to something simpler can confuse rather than illuminate
  3. There are subject areas that very manfully resist any attempts to approach them in a manner other than doing the hard yards – not everything is like something less complex

Why BI is not [always] like Theoretical Physics

Hand with reflecting sphere - Maurits Cornelis Escher (1935). This is your only clue.

Having hopefully supported these points, I’ll move on to the second thing that I mentioned reading; a BI-related blog also referencing Theoretical Physics. I am not going to name the author, mention where I read their piece, state what the title was, or even cite the precise area of Physics they referred to. If you are really that interested, I’m sure that the nice people at Google can help to assuage your curiosity. With that out of the way, what were the concerns that reading this piece raised in my mind?

Well first of all, from the above discussion (and indeed the general tone of this blog), you might think that such an article would be right up my street. Sadly I came away feeling that the connection made was tenuous at best and rather unhelpful (it didn’t really tell you anything about Business Intelligence), and that the piece exhibited a lack of anything bar a superficial understanding of the scientific theory involved.

The analogy had been drawn based on a single word which is used in both some emerging (but as yet unvalidated) hypotheses in Theoretical Physics and in Business Intelligence. While, just like the 2D projection of a 4D shape, there are some elements in common between the two, there are some fundamental differences. This is a general problem in Science and Mathematics: everyday words are used because they have some connection with the concept in hand, but this does not always imply as close a relationship as the casual reader might infer. Some examples:

  1. In Pure Mathematics, the members of a group may be associative, but this doesn’t mean that they tend to hang out together.
  2. In Particle Physics, an object may have spin, but this does not mean that it has been bowled by Murali
  3. In Structural Biology, a residue is not precisely what a Chemist might mean by one, let alone a lay-person

Part of the blame for what was, in my opinion, an erroneous connection between things that are not actually that similar lies with something that, in general, I view more positively; the popular science book. The author of the BI/Physics blog post referred to just such a tome in making his argument. I have consumed many of these books myself and I find them an interesting window into areas in which I do not have a background. The danger with them lies when – in an attempt to convey meaning that is only truly embodied (if that is the word) in Mathematical equations – our good friend the analogy is employed again. When done well, this can be very powerful and provide real insight for the non-expert reader (often the writers of pop-science books are better at this kind of thing than the scientists themselves). When done less well, this can do more than fail to illuminate, it can confuse, or even in some circumstances leave people with the wrong impression.

Tridimensional realisation of the Riemann Zeta function
© Jean-François Colonna

During my MSc, I spent a year studying the Riemann Hypothesis and the myriad of results that are built on the (unproven) assumption that it is true. Before this I had spent three years obtaining a Mathematics BSc. Before this I had taken two Maths A-levels (national exams taken in the UK during and at the end of what would equate to High School in the US), plus (less relevantly perhaps) Physics and Chemistry. One way or another I had been studying Maths for probably 15 plus years before I encountered this most famous and important of ideas.

So what is the Riemann Hypothesis? A statement of it is as follows:

The real part of all non-trivial zeros of the Riemann Zeta function is equal to one half

There! Are you any the wiser? If I wanted to explain this statement to those who have not studied Pure Mathematics at a graduate level, how would I go about it? Maybe my abilities to think laterally and be creative are not well-developed, but I struggle to think of an easily accessible way to rephrase the proposal. I could say something gnomic such as, “it is to do with the distribution of prime numbers” (while trying to avoid the heresy of adding that prime numbers are important because of cryptography – I believe that they are important because they are prime numbers!).

I spent a humble year studying this area, after years of preparation. Some of the finest Mathematical minds of the last century (sadly not a set of which I am a member) have spent vast chunks of their careers trying to inch towards a proof. The Riemann Hypothesis is not like something from normal experience; it is complicated. Some things are complicated and not easily susceptible to analogy.

Equally – despite how interesting, stimulating, rewarding and even important Business Intelligence can be – it is not Theoretical Physics and ne’er the twain shall meet.

And so what?

So after this typically elliptical journey through various parts of Science and Mathematics, what have I learnt? Mainly that analogies must be treated with care and not over-extended lest they collapse in a heap. Will I therefore stop filling these pages with BI-related analogies, both textual and visual? Probably not, but maybe I’ll think twice before hitting the publish key in future!

Euler's product formula for the Riemann Zeta function
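For reference, the formula named in the caption above is usually written as follows. This is the standard textbook form, valid where the real part of s exceeds 1, with the product taken over all primes p; it is what ties the Zeta function to the distribution of prime numbers.

```latex
\[
  \zeta(s) \;=\; \sum_{n=1}^{\infty} \frac{1}{n^{s}}
          \;=\; \prod_{p \,\text{prime}} \frac{1}{1 - p^{-s}},
  \qquad \operatorname{Re}(s) > 1 .
\]
```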


Chronological list of articles using xkcd.com illustrations:

  1. A single version of the truth?
  2. Especially for all Business Analytics professionals out there
  3. New Adventures in Wi-Fi – Track 1: Blogging
  4. Business logic [My adaptation]
  5. New Adventures in Wi-Fi – Track 2: Twitter
  6. Using historical data to justify BI investments – Part III

 

Using historical data to justify BI investments – Part III

The earliest recorded surd

This article completes the three-part series which started with Using historical data to justify BI investments – Part I and continued (somewhat inevitably) with Using historical data to justify BI investments – Part II. Having presented a worked example, which focused on using historical data both to develop a profit-enhancing rule and then to test its efficacy, this final section considers the implications for justifying Business Intelligence / Data Warehouse programmes and touches on some more general issues.
 
 
The Business Intelligence angle

In my experience when talking to people about the example I have just shared, there can be an initial “so what?” reaction. It can maybe seem that we have simply adopted the all-too-frequently-employed business ruse of accentuating the good and down-playing the bad. Who has not heard colleagues say “this was a great month excluding the impact of X, Y and Z”? Of course the implication is that when you include X, Y and Z, it would probably be a much less great month; but this is not what we have done.

One goal of business intelligence is to help in estimating what is likely to happen in the future and guiding users in taking decisions today that will influence this. What we have really done in the above example is as follows:

Look out Morlocks, here I come... [alumni of Imperial College London are so creative aren't they?]

  1. shift “now” back two years in time
  2. pretend we know nothing about what has happened in these most recent two years
  3. develop a predictive rule based solely on the three years preceding our back-shifted “now”
  4. then use the most recent two years (the ones we have metaphorically been covering with our hand) to see whether our proposed rule would have been efficacious

For the avoidance of doubt, in the previously attached example, the losses incurred in 2009 – 2010 have absolutely no influence on the rule we adopt; this is based solely on 2006 – 2008 losses. All the 2009 – 2010 losses are used for is to validate our rule.
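The "hand over the last two years" idea can be sketched in a few lines of Python. Everything here is illustrative: the function names `split_history` and `visible` are hypothetical, and the claims figures for the single policy are invented rather than taken from the spreadsheet.

```python
def split_history(years, holdout):
    """Split chronologically ordered years into (fit, check) windows."""
    if not 0 < holdout < len(years):
        raise ValueError("holdout must leave at least one year on each side")
    return years[:-holdout], years[-holdout:]


def visible(claims_by_year, years):
    """Return only the claims that fall inside the given window."""
    return {year: claims_by_year[year] for year in years}


fit_years, check_years = split_history([2006, 2007, 2008, 2009, 2010], holdout=2)

# Invented claims for one policy, in pounds
policy_claims = {2006: 0, 2007: 12_000, 2008: 3_000, 2009: 9_000, 2010: 0}

print(visible(policy_claims, fit_years))    # 2006 to 2008: used to build the rule
print(visible(policy_claims, check_years))  # 2009 to 2010: used only to validate it
```

The point of filtering through `visible` is discipline: whatever rule is developed is only ever handed the earlier window, so the later years cannot leak into it.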

We have therefore achieved two things:

  1. Established that better decisions could have been taken historically at the juncture of 2008 and 2009
  2. Devised a rule that would have been more effective and displayed at least some indication that this could work going forward in 2011 and beyond

From a Business Intelligence / Data Warehousing perspective, the general pitch is then something like:

Eight out of ten cats said that their owners got rid of stubborn stains no other technology could shift with BI - now with added BA

  1. If we can mechanically take such decisions, based on a very unsophisticated analysis of data, then if we make even simple information available to the humans taking decisions (i.e. basic BI), then surely the quality of their decision-making will improve
  2. If we go beyond this to provide more sophisticated analyses (e.g. including industry segmentation, analysis of insured attributes, specific products sold etc., i.e. regular BI) then we can – by extrapolation from the example – better shape the evolution of the performance of whole books of business
  3. We can also monitor the decisions taken to determine the relative effectiveness of individuals and teams and compare these to their peers – ideally these comparisons would also be made available to the individuals and teams themselves, allowing them to assess their relative performance (again regular BI)
  4. Finally, we can also use more sophisticated approaches, such as statistical modelling to tease out trends and artefacts that would not be easily apparent when using a standard numeric or graphical approach (i.e. sophisticated BI, though others might use the terms “data mining”, “pattern recognition” or the now ubiquitous marketing term “analytics”)

The example also says something else – although we may already have reporting tools, analysis capabilities and even people dabbling in statistical modelling, it appears that there is room for improvement in our approach. The 2009 – 2010 loss ratio was 54% and it could have been closer to 40%. Thus what we are doing now is demonstrably not as good as it could be and the monetary value of making a step change in information capabilities can be estimated.

The generation of which should be the object of any BI/DW project worth its salt - thinking of which, maybe a mound of salt would also have worked as an illustration

In the example, we are talking about £1m of biennial premium and £88k of increased profit. What would be the impact of better information on an annual book of £1bn premium? Assuming a linear relationship and using some advanced Mathematics, we might suggest £44m. What is more, these gains would not be one-off, but repeatable every year. Even if we moderate our projected payback to a more conservative figure, our exercise implies that we would be not out of line to suggest say an ongoing annual payback of £10m. These are numbers and concepts which are likely to resonate with Executive decision-makers.

To put it even more directly an increase of £10m a year in profits would quickly swamp the cost of a BI/DW programme in very substantial benefits. These are payback ratios that most IT managers can only dream of.

As an aside, it may have occurred to readers that the mechanistic rule is actually rather good and – if so – why exactly do we need the underwriters? Taking to one side examples of solely rule-based decision-making going somewhat awry (LTCM anyone?) the human angle is often necessary in messy things like business acquisition and maintaining relationships. Maybe because of this, very few insurance organisations are relying on rules to take all decisions. However it is increasingly common for rules to play some role in their overall approach. This is likely to take the form of triage of some sort. For example:

  1. A rule – maybe not much more sophisticated than the one I describe above – is established and run over policies before renewal.
  2. This is used to score policies as maybe having green, amber or red lights associated with them.
  3. Green policies may be automatically renewed with no intervention from human staff
  4. Amber policies may be looked at by junior staff, who may either OK the renewal if they satisfy themselves that the issues picked up are minor, or refer it to more senior and experienced colleagues if they remain concerned
  5. Red policies go straight to the most experienced staff for their close attention

In this way process efficiencies are gained. Staff time is only applied where it is necessary and the most expensive resources are applied to those cases that most merit their abilities.
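The triage described above might look something like the following Python sketch. Please treat the details as assumptions: I have used a policy's historical loss ratio as the score, and the amber and red thresholds (60% and 90%) are invented purely for illustration, as are the names `triage` and `ROUTING`.

```python
# Who deals with each light; wording is illustrative only
ROUTING = {
    "green": "renew automatically",
    "amber": "review by junior staff, who may refer upwards",
    "red": "review by the most experienced staff",
}


def triage(loss_ratio, amber_at=0.60, red_at=0.90):
    """Score a policy as green, amber or red (thresholds are hypothetical)."""
    if loss_ratio >= red_at:
        return "red"
    if loss_ratio >= amber_at:
        return "amber"
    return "green"


for ratio in (0.20, 0.75, 1.10):
    light = triage(ratio)
    print(f"{ratio:.0%} -> {light}: {ROUTING[light]}")
```

In practice the score would come from whatever rule or model the organisation trusts, and the thresholds would be tuned so that the volume of amber and red cases matches the staff time available.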

 
Correlation

From the webcomic of the inimitable Randall Munroe - his mouse-over text is a lot better than mine BTW
© xkcd.com

Let’s pause for a moment and consider the Insurance example a little more closely. What has actually happened? Well we seem to have established that performance of policies in 2006 – 2008 is at least a reasonable predictor of performance of the same policies in 2009 – 2010. Taking the mutual fund vendors’ constant reminder that past performance does not indicate future performance to one side, what does this actually mean?

What we have done is to establish a loose correlation between 2006 – 2008 and 2009 – 2010 loss ratios. But I also mentioned a while back that I had fabricated the figures, so how does that work? In the same section, I also said that the figures contained an intentional bias. I didn’t adjust my figures to make the year-on-year comparison work out. However, at the policy level, I was guilty of making the numbers look like the type of results that I have seen with real policies (albeit of a specific type). Hopefully I was reasonably realistic about this. If every policy that was bad in 2006 – 2008 continued in exactly the same vein in 2009 – 2010 (and vice versa) then my good segment would have dropped from an overall loss ratio of 54% to considerably less than 40%. The actual distribution of losses is representative of real Insurance portfolios that I have analysed. It is worth noting that only a small bias towards policies that start bad continuing to be bad is enough for our rule to work and profits to be improved. Close scrutiny of the list of policies will reveal that I intentionally introduced several counter-examples to our rule; good business going bad and vice versa. This is just as it would be in a real book of business.

Not strongly correlated

Rather than continuing to justify my methodology, I’ll make two statements:

  1. I have carried out the above sort of analysis on multiple books of Insurance business and come up with comparable results; sometimes the implied benefit is greater, sometimes it is less, but it has been there without exception (of course statistics being what it is, if I did the analysis frequently enough I would find just such an exception!).
  2. More mathematically speaking, the actual figure for the correlation between the two sets of years is a less than stellar 0.44. Of course a figure of 1 (or indeed -1) would imply total correlation, and one of 0 would imply a complete lack of correlation, so I am not working with doctored figures. Even a very mild correlation in data sets (one much less than the threshold for establishing statistical dependence) can still yield a significant impact on profit.
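For anyone wanting to run the same sort of check on their own figures, here is a minimal Python sketch of the standard (Pearson) correlation coefficient. The function name `pearson` and the two short lists of loss ratios are invented for illustration; they are not the figures from my spreadsheet and will not reproduce the 0.44 quoted above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / (spread_x * spread_y)


# Invented per-policy loss ratios for the two periods
earlier = [0.10, 0.80, 0.40, 1.20, 0.00, 0.55]
later = [0.30, 0.50, 0.60, 0.90, 0.20, 0.10]
print(round(pearson(earlier, later), 2))
```

A result near 1 or -1 indicates a strong relationship and one near 0 indicates very little, which is the scale against which a figure such as 0.44 should be read.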

 
Closing thoughts

Ground floor: Perfumery, Stationery and leather goods, Wigs and haberdashery, Kitchenware and food…. Going up!

Having gone into a lot of detail over the course of these three articles, I wanted to step back and assess what we have covered. Although the worked-example was drawn from my experience in Insurance, there are some generic learnings to be made.

Broadly I hope that I have shown that – at least in Insurance, but I would argue with wider applicability – it is possible to use the past to infer what actions we should take in the future. By a slight tweak of timeframes, we can even take some steps to validate approaches suggested by our information. It is important that we remember that the type of basic analysis I have carried out is not guaranteed to work. The same can be said of the most advanced statistical models; both will give you some indication of what may happen and how likely this is to occur, but neither of them is foolproof. However, either of these approaches has more chance of being valuable than, for example, solely applying instinct, or making decisions at random.

In Patterns, patterns everywhere, I wrote about the dangers associated with making predictions about events that are essentially unpredictable. This is another caveat to be borne in mind. However, to balance this it is worth reiterating that even partial correlation can lead to establishing rules (or more sophisticated models) that can have a very positive impact.

While any approach based on analysis or statistics will have challenges and need careful treatment, I hope that my example shows that the option of doing nothing, of continuing to do things how they have been done before, is often fraught with even more problems. In the case of Insurance at least – and I suspect in many other industries – the risks associated with using historical data to make predictions about the future are, in my opinion, outweighed by the risks of not doing this; on average of course!

But then 1=2 for very large values of 1
 

A quantised approach to formal group interactions of hominidae (size > 2)

Much as he liked to debunk the field, I always thought that photon paths in Feynman diagrams presaged elements of String Theory

I am very excited to be able to report that I have taken a major step forward in expanding our understanding of the universe. The paper has been lodged on arXiv.org and it is only a matter of time before the Nobel Committee gets round to calling.

I have established that the fundamental element of time (at least between 9am and 5pm Monday to Friday) can exist only in pre-determined, discrete quantities. Furthermore I have shown that these also have a minimum value, with all other quantities being multiples of this. More conventional (aka hidebound) researchers would slavishly adhere to established, but outmoded, protocol and allocate this minimum quantity the number 1. I have been braver and less mentally constrained than my more quotidian colleagues in deciding to associate the number ½ with this quantity. My reasoning for this is that while quantum numbers of 1, 2, 3 and so on are regularly observed, those consisting of an odd multiple (n > 1) of the minimum value are rarer than free lunches.

However, I have left the most exciting finding until last. I have rigorously calculated the value of the initial quantum number. My work determines beyond any doubt that this is 1.8 × 10^9 µs (p < 0.003). I have modestly called this fundamental building block of nature Peter’s Constant and – as is customary – selected an appropriate Greek letter to represent it. The first letter of my name is ‘P’ and the Greek letter for ‘P’ is π, so I have naturally adopted this.

I believe that there may be some other antiquated use for this letter, but am confident that the importance of my discoveries is such that π will soon come to be associated only with its more relevant (albeit slightly newer) meaning and justice will have been seen to be done.

Congratulatory telegrams, bouquets of flowers and magna of Champagne (one of my hobbies is Latin plurals and Bollinger would be nice) may be sent to the normal address.
 


 
With acknowledgements to S. L. Cooper PhD – Department of Theoretical Physics, California Institute of Technology, Pasadena, CA 91125, USA – without whose inspiration this work would not have been possible.

Using historical data to justify BI investments – Part II

The earliest recorded surd

This article is the second in what has now expanded from a two-part series to a three-part one. This started with Using historical data to justify BI investments – Part I and finishes with Using historical data to justify BI investments – Part III (once again exhibiting my talent for selecting buzzy blog post titles).
 
 
Introduction and some belated acknowledgements

The intent of these three pieces is to present a fairly simple technique by which existing, historical data can be used to provide one element of the justification for a Business Intelligence / Data Warehousing programme. Although the specific example I will cover applies to Insurance (and indeed I spent much of the previous, introductory segment discussing some Insurance-specific concepts which are referred to below), my hope is that readers from other sectors (or whose work crosses multiple sectors) will be able to gain something from what I write. My learnings from this period of my career have certainly informed my subsequent work and I will touch on more general issues in the third and final section.

This second piece will focus on the actual insurance example. The third will relate the example to justifying BI/DW programmes and, as mentioned above, also consider the area more generally.

Before starting on this second instalment in earnest, I wanted to pause and mention a couple of things. At the beginning of the last article, I referenced one reason for my choosing to put fingertip to keyboard now, namely my briefly referring to my work in this area in my interview with Microsoft’s Bruno Aziza (@brunoaziza). There were a couple of other drivers, which I feel rather remiss to have not mentioned earlier.

First, James Taylor (@jamet123) recently published his own series of articles about the use of BI in Insurance. I have browsed these and fully intend to go back and read them more carefully in the near future. I respect James and his thoughts brought some of my own Insurance experiences to the fore of my mind.

Second, I recently posted some reflections on my presentation at the IRM MDM / Data Governance seminar. These focussed on one issue that was highlighted in the post-presentation discussion. The approach to justifying BI/DW investments that I will outline shortly also came up during these conversations and this fact provided additional impetus for me to share my ideas more widely.
 
 
Winners and losers

Before him all the nations will be gathered, and he will separate them one from another, as a shepherd separates the sheep from the goats

The main concept that I will look to explain is based on dividing sheep from goats. The idea is to look at a set of policies that make up a book of insurance business and determine whether there is some simple factor that can be used to predict their performance and split them into good and bad segments.

In order to do this, it is necessary to select policies that have the following characteristics:

  1. Having been continuously renewed so that they at least cover a contiguous five-year period (policies that have been “in force” for five years in Insurance parlance).

    The reason for this is that we are going to divide this five-year term into two pieces (the first three and the final two years) and treat these differently.

  2. Ideally with the above mentioned five-year period terminating in the most recent complete year – at the time of writing 2010.

    This is so that the associated loss ratios better reflect current market conditions.

  3. Being short-tail policies.

    I explained this concept last time round. Short-tail policies (or lines of business) are ones in which any claims are highly likely to be reported as soon as they occur (for example property or accident insurance).

    These policies tend to have a low contribution from IBNR (again see the previous piece for a definition). In practice this means that we can use the simplest of the Insurance ratios, paid loss-ratio (i.e. simply Claims divided by Premium), with some confidence that it will capture most of the losses that will be attached to the policy, even if we are talking about say 2010.

    Another way of looking at this is that (borrowing an idea discussed last time round) for this type of policy the Underwriting Year and Calendar Year treatments are closer than in areas where claims may be reported many years after the policy was in force.

Before proceeding further, it perhaps helps to make things more concrete. To achieve this, you can download a spreadsheet containing a sample set of Insurance policies, together with their premiums and losses over a five-year period from 2006 to 2010 by clicking here (this is in Office 97-2003 format – if you would prefer, there is also a PDF version available here). Hopefully you will be able to follow my logic from the text alone, but the figures may help.

A few comments about the spreadsheet. First these are entirely fabricated policies and are not even loosely based on any data set that I have worked with before. Second I have also adopted a number of simplifications:

  1. There are only 50 policies; normally many thousands would be examined.
  2. Each policy has the same annual premium – £10,000 (I am British!) – and this premium does not change over the five years being considered. In reality these would vary immensely according to changes in cover and the insurer’s pricing strategy.
  3. I have entirely omitted dates. In practice not every policy will fit neatly into a year and account will normally need to be taken of this fact.
  4. Given that this is a fabricated dataset, the claims activity has not been generated randomly. Instead I have simply selected values (though I did perform a retrospective sense check as to their distribution). While this example is not meant to 100% reflect reality, there is an intentional bias in the figures; one that I will come back to later.

The sheet also calculates the policy paid loss ratio for each year and figures for the whole portfolio appear at the bottom. While the in-year performance of any particular policy can gyrate considerably, it may be seen from the aggregate figures that overall performance of this rather small book of business is relatively consistent:

Year     Paid Loss Ratio
2006     53%
2007     59%
2008     54%
2009     53%
2010     54%
Total    54%

Above I mentioned looking at the five years in two parts. At least metaphorically we are going to use our right hand to cover the results from years 2009 and 2010 and focus on the first three years on the left. Later – after we have established a hypothesis based on 2006 to 2008 results – we can lift our hand and check how we did against the “real” figures.

For the purposes of this illustration, I want to choose a rather mechanistic way to differentiate business that has performed well and badly. In doing this I have to remember that a policy may have a single major loss one year and then run free of losses for the next 20. If I was simply to say any policy with a large loss is bad, I am potentially drastically and unnecessarily culling my book (and also closing the stable door after the horse has bolted). Instead we need to develop a rule that takes this into account.

In thinking about overall profitability, while we have greatly reduced the impact of both reported but unpaid claims and IBNR by virtue of picking a short-tail business, it might be prudent to make say a 5% allowance for these. If we also assume an expense ratio of 35%, then we have a total of non-underwriting-related outgoings of 40%. This means that we can afford to have a paid loss ratio of up to 60% (100% – 40%) and still turn a profit.

Using this insight, my simple rule is as follows:

A policy will be tagged as “bad” if two things occur:

  1. The overall three-year loss ratio is in excess of 60%

    i.e. it has been unprofitable over this period; and

  2. The loss ratio is in excess of 30% in at least two of the three years

    i.e. there is a sustained element to the poor performance and not just the one-off bad luck that can hit the best underwritten of policies

This rule roughly splits the book 75 / 25; with 74% of policies being good. Other choices of parameters may result in other splits and it would be advisable to spend a little time optimising things. Perhaps 26% of policies being flagged as bad is too aggressive for example (though this rather depends on what you do about them – see below). However in the simpler world of this example, I’ll press on to the next stage with my first pick.
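Expressed as code, the rule above comes out as the short Python sketch below. The function name `is_bad` and its parameter names are my own inventions for illustration, and the three sample policies are made up; the 60% and 30% thresholds and the "at least two of the three years" test are the ones stated above.

```python
def is_bad(premiums, claims, overall_limit=0.60, yearly_limit=0.30, min_poor_years=2):
    """Tag a policy as bad using its first three years of premiums and claims.

    premiums and claims are per-year amounts in pounds, in the same order.
    """
    overall_ratio = sum(claims) / sum(premiums)
    poor_years = sum(
        1 for premium, claim in zip(premiums, claims) if claim / premium > yearly_limit
    )
    # Both conditions must hold: unprofitable overall AND persistently so
    return overall_ratio > overall_limit and poor_years >= min_poor_years


premiums = [10_000, 10_000, 10_000]

print(is_bad(premiums, [25_000, 0, 0]))          # one-off large loss: not tagged
print(is_bad(premiums, [8_000, 7_000, 6_000]))   # sustained poor performance: tagged
print(is_bad(premiums, [4_000, 4_000, 4_000]))   # profitable overall: not tagged
```

Keeping the thresholds as parameters makes it easy to experiment with other splits of the book, as suggested above.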

The ultimate sense of perspective

Well all we have done so far is to tag policies that have performed badly – in the parlance of Analytics zealots we are being backward-looking. Now it is time to lift our hand on 2009 to 2010 and try to be forward-looking. While these figures are obviously also backward looking (the day that someone comes up with future data I will eat my hat), from the frame of reference of our experimental perspective (sitting at the close of 2008), they can be thought of as “the future back then”. We will use the actual performance of the policies in 2009 – 2010 to validate our choice of good and bad that was based on 2006 – 2008 results.

Overall the 50 policies had a loss ratio of 54% in 2009 – 2010. However those flagged as bad in our above exercise had a subsequent loss ratio of 92%. Those flagged as good had a subsequent loss ratio of 40%. The latter is a 14 point improvement on the overall performance of the book.
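As a sanity check, the three loss ratios just quoted should hang together. The Python sketch below rebuilds the overall figure from the two segments, using only numbers already given: 50 policies at £10,000 a year, 74% of them (37) flagged good, and two years of premium. Because the published ratios are rounded, the reconstruction is approximate.

```python
ANNUAL_PREMIUM = 10_000  # pounds per policy per year
YEARS = 2                # 2009 and 2010

good_premium = 37 * ANNUAL_PREMIUM * YEARS  # 74% of 50 policies
bad_premium = 13 * ANNUAL_PREMIUM * YEARS   # the remaining 26%

# Claims implied by the (rounded) segment loss ratios of 40% and 92%
good_claims = good_premium * 40 // 100
bad_claims = bad_premium * 92 // 100

overall = (good_claims + bad_claims) / (good_premium + bad_premium)
print(f"{overall:.1%}")
```

The blended figure rounds to the 54% quoted for the whole book, so the segment numbers are consistent with each other.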

So we can say with some certainty that our rule, though simplistic, has produced some interesting results. The third part of this series will focus more closely on why this has worked. For now, let’s consider what actions the split we have established could drive.
 
 
What to do with the bad?

You shall be taken to the place from whence you came...

We were running a 54% paid ratio in 2009-2010. Using the same assumptions as above, this might have equated to a 94% combined ratio. Our book of business had an annual premium of £0.5m so we received £1m over the two years. The 94% combined would have implied making a £60k profit if we had done nothing different. So what might have happened if we had done something?

There are a number of options. The most radical of these would have been to not renew any of the bad policies; to have carried out a cull. Let us consider what would have been the impact of such an approach. Well our book of business would have shrunk to £740k over the two years at a combined of 40% (the ratio of the good book) + 40% (other outgoing) = 80%, which implies a profit of £148k, up £88k. However there are reasons why we might not have wanted to so drastically shrink our business. A smaller pot of money for investment purposes might have been one. Also we might have had customers with policies in both the good and bad segments and it might have been tricky to cancel the bad while retaining the good. And so on…

Another option would have been to have refined our rule to catch fewer policies. Inevitably, however, this would have reduced the positive impact on profits.

At the other extreme, we might have chosen to take less drastic action relating to the bad policies. This could have included increasing the premium we charged (which of course could also have resulted in us losing the business but via the insured’s choice), raising the deductible payable on any losses, or looking to work with insureds to put in place better risk management processes. Let’s be conservative and say that if the bad book was running at 92% and the overall book at 54% then perhaps it would have been feasible to improve the bad book’s performance to a neutral figure of say 60% (implying a break-even combined of 100%). This would have enabled the insurance organisation to maintain its investment base, to have not lost good business as a result of culling related bad and to have preserved the profit increase generated by the cull.
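The less drastic option can be sketched in the same way. The names are illustrative, the 40 points of other outgoings is the same simplifying assumption as before, and the £260k premium for the bad book is not quoted above; it is inferred here as the £1m total less the £740k good book.

```python
OTHER_PCT = 40  # other outgoings as a % of premium (simplifying assumption)

def underwriting_profit(premium, loss_pct, other_pct=OTHER_PCT):
    """Profit = premium x (100% - loss ratio - other outgoings)."""
    return premium * (100 - loss_pct - other_pct) / 100

good_premium = 740_000
bad_premium = 1_000_000 - good_premium  # inferred, not quoted in the text

# Sanity check: blending the 40% and 92% books should land near the 54% overall figure
blended_loss_pct = (good_premium * 40 + bad_premium * 92) / (good_premium + bad_premium)

# Remediation: keep every policy, but improve the bad book from 92% to 60%
remediated = underwriting_profit(good_premium, 40) + underwriting_profit(bad_premium, 60)

print(f"Blended loss ratio: {blended_loss_pct:.2f}%")
print(f"Profit after remediation: £{remediated:,.0f}")
```

On these assumptions the remediated bad book breaks even, so the profit matches that of the cull while the premium base stays at £1m.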

In practice of course it is likely that some sort of mixed approach would have been taken. The general point is that we have been able to come up with a simple strategy to separate good and bad business and then been able to validate how accurate our choices were. If, in the future, we possessed similar information, then there is ample scope for better decisions to be taken, with potentially positive impact on profits.
 
 
Next time…

In the final part of what is now a trilogy, I will look more deeply at what we have learnt from the above example, tie these learnings into how to pitch a BI/DW programme in Insurance and make some more general observations.
 

Using historical data to justify BI investments – Part I

The earliest recorded surd

This is the first of what was originally a two part piece that has now expanded into three. In the initial chapter, I provide some background on Insurance industry concepts and practices. These are built on in the second chapter (Using historical data to justify BI investments – Part II), in which I offer an Insurance-based worked example. In the final piece, which is cunningly named Part III, I will explain how such an approach to analysing historical data can be used to justify BI investments.

Readers who are already au fait with insurance may choose to wait for the next instalment.

Introduction

Quite some time ago, when I wrote Measuring the Benefits of Business Intelligence, I mentioned that, in some circumstances, I had been able to leverage historical data (is there any other kind?) to justify Business Intelligence investments. I briefly touched on this area in my recent interview with Microsoft’s Bruno Aziza (@brunoaziza) and thought that it was well past time that I wrote more fully on the topic.

My general approach applies where there are periodic decisions to be made about a business relationship and where how that relationship has performed in the past informs these decisions. These criteria particularly pertain to the industry in which I ran my first BI / DW project; commercial property and casualty insurance. While I hope that users from other sectors may be able to extrapolate my example to apply to them, it is to insurance that I will turn to explain what I did.

An insurance primer

I have always wanted to launch a '[...] for Pacifiers' series in the US

My previous article, The Specific Benefits of Business Intelligence in Insurance, starts with a widely used and pig-related (no typo) explanation of how insurance works, both for the insurer and the insured. I won’t repeat this here, but if you are unfamiliar with the area I recommend that you take a look at it first.

Although of course there are exceptions (event related insurance for example), many commercial insurance policies – just like those that most of us purchase in our personal lives to cover cars and property – have an annual term after which either party can decide whether or not to renew the cover. At renewal, as in the pig example, the insurer will first of all want to assess whether or not they have received more money than they have paid out over the past year. However, the entire point of insurance is that sometimes an event occurs which requires the insurer to give the insured a sum in excess of the premium that they have paid in a given year (or indeed over many years). The insurer is therefore less interested in whether a particular year has been bad – from their perspective – than whether the overall relationship has been, or will become, bad. Perhaps I am oversimplifying, but if in most years the insurer pays out less in settling claims than they receive in premium (or ideally there are no claims at all) and if one bad year’s claims are unlikely to negate the benefits accrued in the normal years, then this is good business for the insurer.

Some rational comments

The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift

I have bandied about a number of rather woolly concepts in the previous section which include: how much money the insurer has paid out and how much they have taken in. Of course these things tend to be more complicated. On the simpler side of the equation, broadly speaking, money coming in is from the insurance premiums paid by customers (but see also the box appearing below).

Investment income

Some insurers are actually relatively relaxed about paying out more in claims than they receive in premium over the life of a policy. This is because of timing differences. So long as the claims are settled some time after premium is received and so long as there are relatively lucrative investment opportunities (remember that?), it may be that the investment income that the insurer can generate while it has use of the insured’s premium will more than compensate for what might be termed an operating loss on the policy. Equally some insurers will have the business goal of – at least in aggregate – always having premiums exceeding claims and thus making a profit on their core underwriting activities. In this case any investment income is added to the underwriting-related profits, rather than compensating for underwriting-related losses. I won’t complicate this article any further by including investment income, but it is a factor in the profitability of insurance companies.

Equally broadly speaking, money going out is normally in six categories:

  1. settlement of claims – often referred to as case payments
  2. claims adjusters’ estimates for the settlement of specific claims that have been notified to the insurer, but not as yet paid – often referred to as case reserves
  3. actuarial estimates of insurance events that have occurred, but which have not yet been reported to the insurer – generally known as incurred but not reported losses, or IBNR (more on this later)
  4. fees paid to insurance intermediaries for placing their clients’ business with the carrier – commission
  5. premiums paid to other organisations to transfer some of the risk associated with specific policies, or baskets of types of policies – facultative or treaty reinsurance
  6. the general expense of being in business (staff, premises, consumables, equipment, IT, advertising, uncollectable premiums etc.)

In the cause of clarity, I will lump commission, reinsurance and the general expense of being in business into Other Expenses for what follows. However please bear in mind that, as is often the case in life, things are not as simple as I will make them out to be.

Rather than dealing in monetary units, insurance companies like percentages; though they then insist on referring to these as ratios. Taking the above categories of money flowing in and out of an insurance company, the main ratios that they consider are then:

 
Insurance Ratios
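The figure itself is not reproduced in this text, so what follows is only a hedged sketch of how ratios of this kind are conventionally built from the categories listed above. The class name, method names and sample figures are all illustrative assumptions, and the exact definitions in the original figure may differ.

```python
from dataclasses import dataclass

@dataclass
class BookFigures:
    """Illustrative container for the money flows described above (all in £)."""
    premium: float
    case_payments: float   # claims actually settled
    case_reserves: float   # adjusters' estimates for notified but unpaid claims
    ibnr: float            # actuarial estimate of unreported losses
    other_expenses: float  # commission + reinsurance + general expenses

    def paid_loss_ratio(self):
        return self.case_payments / self.premium

    def incurred_loss_ratio(self):
        return (self.case_payments + self.case_reserves) / self.premium

    def ultimate_loss_ratio(self):
        return (self.case_payments + self.case_reserves + self.ibnr) / self.premium

    def expense_ratio(self):
        return self.other_expenses / self.premium

    def combined_ratio(self):
        # Assumption: combined = the fullest loss ratio available + expense ratio
        return self.ultimate_loss_ratio() + self.expense_ratio()

# Made-up numbers, chosen only so the ratios are easy to read
book = BookFigures(premium=1_000_000, case_payments=540_000,
                   case_reserves=0, ibnr=0, other_expenses=400_000)
print(f"Paid loss ratio: {book.paid_loss_ratio():.0%}")
print(f"Expense ratio:   {book.expense_ratio():.0%}")
print(f"Combined ratio:  {book.combined_ratio():.0%}")
```

A combined ratio below 100% indicates an underwriting profit before any investment income; above 100% indicates an underwriting loss.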

Incurred but not reported

Not sure whether the Nixon administration set up any Watergate-related reserves

This concept requires a short diversion as later on I will exclude it from our discussions and will need to explain why. There are some interesting time lags in insurance. Take the sad case of asbestosis (also mentioned in my previous article). Here those unfortunately exposed developed symptoms of the disease in some cases many years later. However if their exposure was in say 1972, they would be covered by whatever Employers Liability policy their organisation held or whatever personal policy they held in the case of the self-employed. An asbestosis sufferer may have changed insurance company ten times since their exposure, but it is the insurance company who provided cover at the time who is liable for any claims.

Rather than waiting for such claims to emerge, insurance companies follow the best practice of recognising liabilities at the earliest point. Because of this, they set up estimated reserves for claims that they may receive in future years (or decades) and apply these to the year in which the policy was in force. Of course in some lines of business, say Property cover, most claims are reported as soon as they occur and so IBNR reserves are low. However in others, say Directors and Officers Liability, or the Employers Liability mentioned above, claims may arise many years hence and IBNR can be a big factor in results.

It should be stressed that IBNR is seldom calculated for a single policy (though it is conceivable that this would happen on a very large risk). Instead it is estimated for classes of policies, often grouped into lines of business, and the same “rate” of IBNR is applied across the board. Of course IBNR is calculated based on experience of losses in the same baskets of policies in previous years, adjusted to take account of current differences (e.g. more or less favourable economic conditions for Directors and Officers Liability, or maybe rising or falling property indices for Property).

For reasons that are probably obvious, lines of business where most claims are promptly reported (i.e. low IBNR) are called short-tail lines. Those where claims may emerge some time after the period covered by the policy (i.e. high IBNR) are called long-tail lines. Later on I will be focussing just on short-tail business.

[Incidentally, improving this process of estimation is one of the specific benefits of Business Intelligence in insurance that I highlighted in my previous article.]

Underwriting Year

Fundamental particles of the Underwriting Year

Something else may have occurred to readers when considering the time lags that I reference in the previous section, namely that while a policy may last from say 1st January 2006 to 31st December 2006, claims against this may occur either during this period, or after it. The financial statements of an insurance company will place claims in the period that they are notified or settled. So in the above example, a claim paid on 23rd April 2008 (assuming the financial and calendar years coincide) will be reflected in the 2008 report and accounts.

However it is often useful for analysis purposes to lump together all of the claims relating to a policy and associate these with the year in which it was written. Again in our example this would mean our 23rd April 2008 claim would be recorded in the Underwriting Year of 2006. So an Underwriting Year report comparing 2006 and 2007 say would have the premium for all policies written in 2006 and all the claims against these policies – regardless of when they occur – compared to the premium for 2007 and all the claims against these policies, whenever they occur.

Because of this, Underwriting Year reports provide a good measure of the performance of policies (or books of business) over time, regardless of how associated losses are dispersed. By contrast Calendar Year (i.e. financial) reports will often have premium from policies written in say 2010 combined with losses from policies written in say 2000 – 2010.
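The difference between the two views is easy to see in code. The sketch below uses three made-up claims (only the 23rd April 2008 date comes from the example above; the amounts and the other claims are invented for illustration) and totals them first by the year the policy was written and then by the year the claim was paid.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claims: (policy inception date, claim payment date, amount in £)
claims = [
    (date(2006, 1, 1), date(2006, 9, 14), 1_000),
    (date(2006, 1, 1), date(2008, 4, 23), 5_000),  # the late claim from the example
    (date(2007, 1, 1), date(2008, 2, 1), 2_000),
]

def totals_by(claims, key):
    """Sum claim amounts under whichever year the key function picks out."""
    totals = defaultdict(int)
    for inception, paid_on, amount in claims:
        totals[key(inception, paid_on)] += amount
    return dict(totals)

# Underwriting Year view: every claim follows its policy back to the year it was written
by_underwriting_year = totals_by(claims, lambda inception, paid_on: inception.year)

# Calendar Year view: every claim lands in the year it was actually paid
by_calendar_year = totals_by(claims, lambda inception, paid_on: paid_on.year)

print("Underwriting Year:", by_underwriting_year)
print("Calendar Year:    ", by_calendar_year)
```

In the Underwriting Year view the 2008 payment counts against 2006, whereas in the Calendar Year view it is mixed in with a payment on a 2007 policy.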

Tune in next time…

BBC ANNOUNCER: Tune in to the next exciting instalment of... CAST: Dick Barton, Special Agent!

Having laid some foundations, in the next article, I will draw on the various concepts that I have introduced above to offer a worked example. In the closing chapter, I will explain how I used such an example to justify a major, multi-year Business Intelligence / Data Warehousing programme within the insurance industry.

Trouble at the top

IRM MDM/DG

Several weeks back now, I presented at IRM’s collocated European Master Data Management Summit and Data Governance Conference. This was my second IRM event, having also spoken at their European Data Warehouse and Business Intelligence Conference back in 2010. The conference was impeccably arranged and the range of speakers was both impressive and interesting. However, as always happens to me, my ability to attend meetings was curtailed by both work commitments and my own preparations. One of these years I will go to all the days of a seminar and listen to a wider variety of speakers.

Anyway, my talk – entitled Making Business Intelligence an Integral part of your Data Quality Programme – was based on themes I had introduced in Using BI to drive improvements in data quality and developed in Who should be accountable for data quality?. It centred on the four-pillar framework that I introduced in the latter article (yes I do have a fetish for four-pillar frameworks as per):

The four pillars of improved data quality

Given my lack of exposure to the event as a whole, I will restrict myself to writing about a comment that came up in the question section of my slot. As per my article on presenting in public, I try to always allow time at the end for questions as this can often be the most interesting part of the talk; for delegates and for me. My IRM slot was 45 minutes this time round, so I turned things over to the audience after speaking for half-an-hour.

There were a number of good questions and I did my best to answer them, based on past experience of both what had worked and what had been less successful. However, one comment stuck in my mind. For obvious reasons, I will not identify either the delegate, or the organisation that she worked for; but I also had a brief follow-up conversation with her afterwards.

She explained that her organisation had in place a formal data governance process and that a lot of time and effort had been put into communicating with the people who actually entered data. In common with my first pillar, this had focused on educating people as to the importance of data quality and how this fed into the organisation’s objectives; a textbook example of how to do things, on which the lady in question should be congratulated. However, she also faced an issue; one that is probably more common than any of us information professionals would care to admit. Her problem was not at the bottom, or in the middle of her organisation, but at the top.

So how many miles per gallon do you get out of that?

In particular, though data governance and a thorough and consistent approach to both the entry of data and transformation of this to information were all embedded into the organisation, this did not prevent the leaders of each division having their own people take the resulting information, load it into Excel and “improve” it by “adjusting anomalies”, “smoothing out variations”, “allowing for the impact of exceptional items”, “better reflecting the opinions of field operatives” and the whole panoply of euphemisms for changing figures so that they tell a more convenient story.

In one sense this was rather depressing, someone having got so much right, but still facing challenges. However, it also chimes with another theme that I have stressed many times under the banner of cultural transformation; it is crucially important that any information initiative either has, or works assiduously to establish, the active support of all echelons of the organisation. In some of my most successful BI/DW work, I have had the benefit of the direct support of the CEO. Equally, it is very important to ensure that the highest levels of your organisation buy in before commencing on a step-change to its information capabilities.

I am way overdue employing another sporting analogy - odd however how most of my rugby-related ones tend to be non-explicit

My experience is that enhanced information can have enormous payback. But it is risky to embark on an information programme without this being explicitly recognised by the senior management team. If you avoid laying this important foundation, then this is simply storing up trouble for the future. The best BI/DW projects are totally aligned with the strategic goals of the organisation. Given this, explaining their objectives and soliciting executive support should be all the easier. This is something that I would encourage my fellow information professionals to seek without exception.
 

Data visualisation

Some pictures speak for themselves:

If you don't know what this is, check out the announcement from the CDF Collaboration at: http://www.fnal.gov/pub/today/archive_2011/today11-04-07_CDFpeakresult.html - All you have to do is click here. HINT: the peak at 140 GeV/c^2 may be important.
 

The triangle paradox – solved

When I posted The triangle paradox, I said that I would post a solution in a few days. As per the comments on my earlier article, some via Twitter and indeed the context of the article in which this supposed mathematical conundrum was posted, the heart of the matter is an optical illusion.

If we consider just the first part of the paradox:

More than meets the eyes

Then the key is in realising that the red and green triangles are not similar (in the geometric sense of the word). In particular the left hand angles are not the same, thus when lined-up they do not form the hypotenuse of the larger, compound triangle that our eyes see. In the example above, the line tracing the red and green triangles dips below what would be the hypotenuse of the big triangle. In the rearranged version, it bulges above. This is where the extra white square comes from.

It is probably easier to see this diagrammatically. The following figure has been distorted to make things easier to understand:

Dimensions exaggerated

Let’s start with my point about the triangles not being similar:

∠EAB = tan⁻¹(2/5) ≈ 21.8°

∠FAC = tan⁻¹(3/8) ≈ 20.6°

So the two triangles are not similar and, as stated above, the two arrangements don’t quite line up to form the big triangle shown in the paradox. There is a “gap” between them formed by the grey parallelogram above, whose size has been exaggerated. This difference gets lost in the thickness of the lines and also our eyes just assume that the two arrangements form the same big triangle.
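For anyone who would rather check the angles than take them on trust, a couple of lines of Python will do it. The variable names are just labels for the two angles discussed above.

```python
import math

# Left-hand angle of each small triangle, from its rise over its run
angle_eab = math.degrees(math.atan2(2, 5))  # rises 2 over a run of 5
angle_fac = math.degrees(math.atan2(3, 8))  # rises 3 over a run of 8

print(f"EAB = {angle_eab:.1f} degrees")
print(f"FAC = {angle_fac:.1f} degrees")
print(f"Difference = {angle_eab - angle_fac:.2f} degrees")
```

The two angles differ by a little over one degree, which is small enough to hide in the thickness of a drawn line.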

To work out the area of the parallelogram:

AE = √(2² + 5²) = √29
EI = √(3² + 8²) = √73
AI = √(5² + 13²) = √194

The area of a triangle with sides a, b and c is given by:

Area of triangle

Sparing you the arithmetic, when you substitute the values for AE, EI and AI in the above equation, the area of ∆ AEI is precisely ½.
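For those who would prefer not to be spared, the arithmetic can be checked exactly. The sketch below does not reproduce the formula in the figure; it assumes a rearrangement of Heron's formula that needs only the squares of the sides, so everything stays in whole numbers. The coordinates used for the cross-check are also an assumption, chosen to match the side lengths above.

```python
from fractions import Fraction

# Squared side lengths of triangle AEI, from the text: AE² = 29, EI² = 73, AI² = 194
a2, b2, c2 = 29, 73, 194

# Heron's formula rearranged to use squared sides only:
#   16 x Area² = 2(a²b² + b²c² + c²a²) - (a⁴ + b⁴ + c⁴)
sixteen_area_sq = 2 * (a2 * b2 + b2 * c2 + c2 * a2) - (a2**2 + b2**2 + c2**2)
area_squared = Fraction(sixteen_area_sq, 16)

# Cross-check with the shoelace formula, assuming A = (0, 0), E = (5, 2), I = (13, 5)
(ax, ay), (ex, ey), (ix, iy) = (0, 0), (5, 2), (13, 5)
shoelace_area = Fraction(abs((ex - ax) * (iy - ay) - (ix - ax) * (ey - ay)), 2)

print("Area squared (Heron):", area_squared)
print("Area (shoelace):     ", shoelace_area)
```

Both routes agree: the area squared is ¼, so the area of the triangle is exactly ½.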

∆ AEI and ∆ AFI are clearly identical, so the area of parallelogram AEIF, being twice the area of either, is

2 × ½ = 1

This is where the “missing” square comes from.
 


 
As was pointed out in a comment on the original post, the above should form something of a warning to those who place wholly uncritical faith in data visualisation. Much like statistics, while this is a powerful tool in the hands of the expert, it can mislead if used without due care and attention.
 

Illuminating the darkness

Recrudescence

My partner was kind enough to buy me an Amazon Kindle for Christmas and I have enjoyed using it. Yes there were the problems with them registering me to Amazon.com, rather than Amazon.co.uk (thereby incurring foreign transaction charges). And yes they didn’t cancel a trial Economist subscription I took out on the former when I was transferred to the latter. However, these issues were sorted out and money refunded.

I suppose I had the same initial reaction as many people; that they had left a sticker covering the screen, which was intended to demonstrate what the display looked like. After failing to peel it off (thankfully not too energetically) I realised that the screen was actually that clear and that different from a “normal” computer display (I was thinking smart ‘phone or laptop). I am writing this post on one of my many laptops, the screen is OK, but the Kindle is much easier on the eye and pretty close to a high-quality printed page. Suffice it to say that I downloaded new copies of several of my favourite books to it with the prospect of re-engaging with them at my leisure.

But enough of me singing the general praises of the device, I have discovered a particular benefit. While this may well be realised by other people, it is of particular pertinence to devotees of the works of Joseph Conrad.

Joseph Conrad

As one of the undisputed giants of English prose, it is rather ironic that English itself was either Conrad’s fifth, or sixth, language (chronologically: Polish; Russian – though he later, perhaps understandably given the turbulence of the times, repudiated this as a language; French; Latin; German; and – finally, when he was in his twenties, English). I have greatly appreciated his work, since first reading Heart of Darkness. I won’t attempt to offer a literary appreciation of his genius and leave this to others with greater talents in that area. However, despite coming late to the English tongue, Conrad was a master of it and had an amazing vocabulary.

An indispensable companion to Conrad's works

I generally view myself as being reasonably erudite (less charitably I have been accused of having swallowed a thesaurus), but used to have to keep a dictionary at hand when reading Conrad; either that or try to impute meaning from context (probably getting it wrong more times than I care to admit). In some ways, my own limitations slightly diluted my enjoyment of reading. It is a bit distracting to put down one book, pick up a dictionary, look up a word and then revert to the original tome (it was even more complicated as a child reading Jules Verne’s 20,000 Leagues under the Sea with both a dictionary and gazetteer to hand!).

Incidentally my fondness for Conrad led to my one contribution to the field of science. I established my result after extensive fieldwork involving Nostromo and a daily commute. Thomas’ Theorem is as follows:

While this feat is more than achievable with the works of other authors, it is impossible to read Conrad on the Tube.

However, the Kindle is a joy in this respect as you can look up words using the built in dictionary, quickly, easily and without disturbing the thread of the narrative too much. This has got me out of my rather lazy habit of assuming that I sort of know what a word means and thereby given me a few surprises. Based on the initial illustration above, for example, I had to modify my understanding of recrudescence!

Of course this means that I may have to re-evaluate whether Thomas’ Theorem holds in all conditions. Perhaps a sub-clause excluding the use of a Kindle is required. I will report back…
 


 
This is not the first time that Conrad has appeared in the pages of this blog; I had the temerity to also reference him in Aphorism of the Week some time ago.