Alphabet Soup


This article is about the latest consumer product from the Google stable, something which will revolutionise your eating experience by combining a chicken-broth base with a nanotechnology garnish and a soupçon of deep learning techniques to create a warming meal that also provides a gastro-intestinal health-check. Wait…

…I may have got my wires crossed a bit there. No, I mis-spoke, the article is actually about the ever-increasing number of CxO titles [1], which has made the roster of many organisations’ executives come to resemble a set of Scrabble tiles.

Specifically I will focus on two values of x, A and D, so the CAO and CDO roles [2]. What do these TLAs [3] stand for, what do people holding these positions do and can we actually prove that, for these purposes only, “A” ≡ “D”?
Breaking the Code


The starting position is not auspicious. What might CAO stand for? Existing roles that come to mind include: Chief Accounting Officer and Chief Administrative Officer. However, in our context, it actually stands for Chief Analytics Officer. There is no ISO definition of Analytics, as I note in one of my recent seminar decks [4] (quoting the Gartner IT Glossary, but with my underlining):

Analytics has emerged as a catch-all term for a variety of different business intelligence and application-related initiatives. In particular, BI vendors use the ‘analytics’ moniker to differentiate their products from the competition. Increasingly, ‘analytics’ is used to describe statistical and mathematical data analysis that clusters, segments, scores and predicts what scenarios are most likely to happen.

I should of course mention here that my current role incorporates the word “Analytics” [5], so I may be making a point against myself. But before I start channeling my 2009 article, Business Analytics vs Business Intelligence [6], I’ll perhaps instead move on to the second acronym. How to decode CDO? Well an equally recent translation would be Chief Digital Officer, but you also come across Chief Development Officer and sometimes even Chief Diversity Officer. Our meaning will however be Chief Data Officer. You can read about what I think a CDO does here.

An observation that is perhaps obvious to make at this juncture is that when the acronym of a role is not easy to pin down, the content of the role may be equally amorphous. It is probably fair to say that this is true of both CAO and CDO job descriptions. Both are emerging roles in the majority of organisations.
Before the Flood

HMS/USS* Chief Information Officer (* delete as applicable)

One thing that both roles have in common is that – in antediluvian days – their work used to be the province of another CxO, the CIO. This was before many CIOs became people who focus on solution architecture, manage relationships with outsourcers and have their time consumed by running Service Desks and heading off infrastructure issues [7]. Where organisations may once have had just a CIO, they may well now have a CIO, a CAO and a CDO (and perhaps also a CTO, which splits one original “C” role into four).

Aside from being a job creation scheme, the reasons for such splits are well-documented. The prevalence of outsourcing (and the complexity of managing such arrangements); the pervasiveness and criticality of technology leading to many CIOs focussing more on the care and feeding of systems than how businesses employ them; the relentless rise of Change organisations; and (frequently related to the last point) the increase in size of IT departments (particularly if staff in external partner organisations are included). All of these have pushed CIOs into more business as usual / back-room / engineering roles, leaving a vacuum in the nexus between business, technology and transformation. The fact that data processing is very different to data collation and synthesis has been another factor in CAOs and / or CDOs filling this vacuum.
Some other Points of View

James Taylor, Robert Morison, Jen Stirrup

As trailed in some previous articles [8], I have been thinking about the potential CAO / CDO dichotomy for some time. Towards the beginning of this period I read some notes that decision management luminary James Taylor had published based on the proceedings of the 2015 Chief Analytics Officer Summit. In the first part of these he cites comments made by Robert Morison as follows:

Practically speaking organizations need both roles [CAO and CDO] filled – either by one person or by two working closely together. This is hard because the roles are both new and evolving – role clarity was not the norm creating risk. In particular if both roles exist they must have some distinction such as demand v supply, offense v defense – adding value to data with analytics v managing data quality and consistency. But enterprises need to be ready – in particular when data is being identified as an asset by the CEO and executive team. CDOs tend to be driven by fragmented data environments, regulatory challenges, customer centricity. CAO tends to be driven by a focus on improving decision-making, moving to predictive analytics, focusing existing efforts.

Where CAO and CDO roles are separate, the former tends to work on exploiting data, the latter on data foundations / compliance. These are precisely the two vertical extremities of the spectrum I highlighted in The Chief Data Officer “Sweet Spot”. As Robert points out, in order for both to be successful, the CAO and CDO need to collaborate very closely.

Around the same time, another take on the same general question was offered by Jen Stirrup in her 2015 PASS Diary [9] article, Why are PASS doing Business Analytics at all? Here Jen cites the Gartner distinctions between descriptive, diagnostic, predictive and prescriptive analytics, adding that:

Business Intelligence and Business Analytics are a continuum. Analytics is focused more on a forward motion of the data, and a focus on value.

Channeling Douglas Adams, this model can be rehashed as:

  1. What happened?
  2. Why did it happen?
  3. What is going to happen next?
  4. What should we be doing?

As well as providing a finer grain distinguishing different types of analytics, the steps necessary to answer these questions also tend to form a bridge between what might be regarded as definitively CDO work and what might be regarded as definitively CAO work. As Jen notes, it’s a continuum. Answering “What happened?” with any accuracy requires solid data foundations and decent data quality; working out “What is going to happen next?” requires each of solid data foundations, decent data quality and a statistical approach.
Much CDO about Nothing

Just an excuse to revisit a happy ending for Wesley Wyndam-Pryce and Winifred Burkle - I'm such a fanboy :-o

In some organisations, particularly the type where headcount is not a major factor in determining overall results, separate CAO and CDO departments can coexist; assuming of course that their leaders recognise their mutual dependency, park their egos at the door and get on with working together. However, even in such organisations, the question arises of to whom the CAO and CDO should report: a single person, two different people, or should one of them report to the other? In more cost-conscious organisations entirely separate departments may feel like something of a luxury.

My observation is that CAO staff generally end up doing data collation and cleansing, while CDO staff often get asked to provide data and carry out data analysis. This blurs what is already a fairly specious distinction between the two areas and provides scope for both duplication of work and – more worryingly – different answers to the same business questions. As I have mentioned in earlier articles, to anyone engaged in the fields, Analytics and Data Management are two sides of the same coin and both benefit from being part of the same unitary management structure.

Alignment of Data teams

If we consider the arrangements on the left-hand side of the above diagram, the two departments may end up collaborating, but the structure does not naturally lead to this. Indeed, where the priorities of the people that the CAO and CDO report in to differ, then there is scope for separate agendas, unhealthy competition and – again – duplication and waste. It is my assertion that the arrangements on the right-hand side are more likely to lead to a cohesive treatment of the spectrum of data matters and thus superior business outcomes.

In the right-hand exhibit, I have intentionally steered away from CAO and CDO titles. I recognise that there are different disciplines within the data world, but would expect virtual teams to form, disband and reform as required, drawing on a variety of skills and experience. I have also indicated that the whole area should report into a single person, here given the moniker of TDJ (or Top Data Job [10]). You could of course map Analytics Lead to CAO and Data Management Lead to CDO if you chose. Equally you could map one or other of these to the TDJ, with the other subservient. To an extent it doesn’t really matter. What I do think matters is that the TDJ goes to someone who understands the whole data arena; both the CAO and CDO perspectives. In my opinion this rules out most CEOs, COOs and CFOs from this role.
More or less Mandatory Sporting Analogy [11]

Association Football Free Kick

An analogy here comes from Robert Morison’s mention of “offense v defense” [12]. This puts me in mind of an [Association] Football Manager. In Soccer (to avoid further confusion), there are not separate offensive and defensive teams, whose presence on the field of play is mutually exclusive. Instead your defenders and attackers are different roles within one team; also sometimes defenders have to attack and attackers have to defend. The arrangements in the left-hand organogram are as if the defenders in a Soccer team were managed by one person, the attackers by another and yet they were all expected to play well together. Of course there are specialist coaches, but there is one Manager of a Soccer team who has overall accountability for tactics, selection and style of play (they also manage any specialist coaches). It is generally the Manager who lives or dies according to their team’s success. Equally, in the original right-hand organogram, if the TDJ is held by someone who understands just analytics or just data management, then it is like a Soccer Manager who only understands attack, but not defence.

The point I am trying to make is probably more readily apprehended via the following diagram:


On the assumption that the Manager on the right knows a lot about both attack and defence in Soccer, whereas the team owner is at best an interested amateur, then is the set up on the left or on the right likely to be a more formidable footballing force?

Even in American Football the analogy still holds. There are certainly offensive and defensive coaches, each of whom has “their” team on the park for a period. However, it is the Head Coach who calls the shots and this person needs to understand all of the nuances of the game.
In Closing

So, my recommendation is that – in data matters – you similarly have someone in the Top Data Job, with a broad knowledge of all aspects of data. They can be supported by specialists of course, but again someone needs to be accountable. To my mind, we already have a designation for such a person, a Chief Data Officer. However, to an extent this is semantics. A Chief Analytics Officer who is knowledgeable about Data Governance and Data Management could be the head data honcho [13], but one who only knows about analytics is likely to have their work cut out for them. Equally if CAO and CDO functions are wholly separate and only come together in an organisation under someone who has no background in data matters, then nothing but problems are going to arise.

The Top Data Job – or CDO in my parlance – has to be au fait with the span of data activities in an organisation and accountable for all work pertaining to data. If not then they will be as useful as a Soccer Manager who only knows about one aspect of the game and can only direct a handful of the 11 players on the field. Do organisations want some chance of winning the game, or to tie their hands behind their backs and don a blindfold before engaging in data activities? The choice should not really be a difficult one.


[1] x : 65 ≤ ascii(x) ≤ 90.
[2] “C”, “A”, “O” + “C”, “D”, “O” + (for no real reason save expediency) “R” allows you to spell ACCORD, which scores 11 in Executive Scrabble.
[3] Three Letter Acronyms.
[4] Data Management, Analytics, People: An Eternal Golden Braid – A Metaphorical Fugue On The Data ⇒ Information ⇒ Insight ⇒ Action Journey In The Spirit Of Douglas R. Hofstadter, IRM(UK) Enterprise Data / Business Intelligence 2016
[5] I hasten to add that it also contains the phrase “Data Management” – see here.
[6] Probably not a great idea for any of those involved.
[7] Whether or not this evolution (or indeed regression) of the CIO role has proved to be a good thing is perhaps best handled in a separate article.
[8]
  1. Wanted – Chief Data Officer
  2. 5 Themes from a Chief Data Officer Forum
  3. 5 More Themes from a Chief Data Officer Forum and
  4. The Chief Data Officer “Sweet Spot”
[9] PASS was co-founded by CA Technologies and Microsoft Corporation in 1999 to promote and educate SQL Server users around the world. Since its founding, PASS has expanded globally and diversified its membership to embrace professionals using any Microsoft data technology.
[10] With acknowledgement to Peter Aiken.
[11] A list of my articles that employ sporting analogies appears – appropriately enough – at the beginning of Analogies.
[12] That’s “offence vs defence” in case any readers were struggling.
[13] Maybe organisations should consider adding HDH to their already very crowded Executive alphabet soup.



Using historical data to justify BI investments – Part II

The earliest recorded surd

This article is the second in what has now expanded from a two-part series to a three-part one. This started with Using historical data to justify BI investments – Part I and finishes with Using historical data to justify BI investments – Part III (once again exhibiting my talent for selecting buzzy blog post titles).
Introduction and some belated acknowledgements

The intent of these three pieces is to present a fairly simple technique by which existing, historical data can be used to provide one element of the justification for a Business Intelligence / Data Warehousing programme. Although the specific example I will cover applies to Insurance (and indeed I spent much of the previous, introductory segment discussing some Insurance-specific concepts which are referred to below), my hope is that readers from other sectors (or whose work crosses multiple sectors) will be able to gain something from what I write. My learnings from this period of my career have certainly informed my subsequent work and I will touch on more general issues in the third and final section.

This second piece will focus on the actual insurance example. The third will relate the example to justifying BI/DW programmes and, as mentioned above, also consider the area more generally.

Before starting on this second instalment in earnest, I wanted to pause and mention a couple of things. At the beginning of the last article, I referenced one reason for me choosing to put fingertip to keyboard now, namely me briefly referring to my work in this area in my interview with Microsoft’s Bruno Aziza (@brunoaziza). There were a couple of other drivers, which I feel rather remiss to have not mentioned earlier.

First, James Taylor (@jamet123) recently published his own series of articles about the use of BI in Insurance. I have browsed these and fully intend to go back and read them more carefully in the near future. I respect James and his thoughts brought some of my own Insurance experiences to the fore of my mind.

Second, I recently posted some reflections on my presentation at the IRM MDM / Data Governance seminar. These focussed on one issue that was highlighted in the post-presentation discussion. The approach to justifying BI/DW investments that I will outline shortly also came up during these conversations and this fact provided additional impetus for me to share my ideas more widely.
Winners and losers

Before him all the nations will be gathered, and he will separate them one from another, as a shepherd separates the sheep from the goats

The main concept that I will look to explain is based on dividing sheep from goats. The idea is to look at a set of policies that make up a book of insurance business and determine whether there is some simple factor that can be used to predict their performance and split them into good and bad segments.

In order to do this, it is necessary to select policies that have the following characteristics:

  1. Having been continuously renewed so that they at least cover a contiguous five-year period (policies that have been “in force” for five years in Insurance parlance).

    The reason for this is that we are going to divide this five-year term into two pieces (the first three and the final two years) and treat these differently.

  2. Ideally with the above mentioned five-year period terminating in the most recent complete year – at the time of writing 2010.

    This is so that the associated loss ratios better reflect current market conditions.

  3. Being short-tail policies.

    I explained this concept last time round. Short-tail policies (or lines of business) are ones in which any claims are highly likely to be reported as soon as they occur (for example property or accident insurance).

    These policies tend to have a low contribution from IBNR (again see the previous piece for a definition). In practice this means that we can use the simplest of the Insurance ratios, paid loss-ratio (i.e. simply Claims divided by Premium), with some confidence that it will capture most of the losses that will be attached to the policy, even if we are talking about say 2010.

    Another way of looking at this is that (borrowing an idea discussed last time round) for this type of policy the Underwriting Year and Calendar Year treatments are closer than in areas where claims may be reported many years after the policy was in force.

Before proceeding further, it perhaps helps to make things more concrete. To achieve this, you can download a spreadsheet containing a sample set of Insurance policies, together with their premiums and losses over a five-year period from 2006 to 2010 by clicking here (this is in Office 97-2003 format – if you would prefer, there is also a PDF version available here). Hopefully you will be able to follow my logic from the text alone, but the figures may help.

A few comments about the spreadsheet. First these are entirely fabricated policies and are not even loosely based on any data set that I have worked with before. Second I have also adopted a number of simplifications:

  1. There are only 50 policies; normally many thousands would be examined.
  2. Each policy has the same annual premium – £10,000 (I am British!) – and this premium does not change over the five years being considered. In reality these would vary immensely according to changes in cover and the insurer’s pricing strategy.
  3. I have entirely omitted dates. In practice not every policy will fit neatly into a year and account will normally need to be taken of this fact.
  4. Given that this is a fabricated dataset, the claims activity has not been generated randomly. Instead I have simply selected values (though I did perform a retrospective sense check as to their distribution). While this example is not meant to 100% reflect reality, there is an intentional bias in the figures; one that I will come back to later.

The sheet also calculates the policy paid loss ratio for each year and figures for the whole portfolio appear at the bottom. While the in-year performance of any particular policy can gyrate considerably, it may be seen from the aggregate figures that overall performance of this rather small book of business is relatively consistent:

Year     Paid Loss Ratio
2006     53%
2007     59%
2008     54%
2009     53%
2010     54%
Total    54%
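For readers who prefer code to spreadsheets, the portfolio figures above can be sketched as follows. This is only an illustration: the function names and the three example policies are my own inventions and are not taken from the sample spreadsheet. The point it makes is that the aggregate ratio is total claims over total premium, rather than an average of the individual policy ratios (with identical premiums the two coincide, but in general they will not).

```python
def paid_loss_ratio(claims_paid, premium):
    """Paid loss ratio: claims paid divided by premium."""
    if premium <= 0:
        raise ValueError("premium must be positive")
    return claims_paid / premium


def portfolio_paid_loss_ratio(policies):
    """Aggregate ratio for (claims paid, premium) pairs: total claims over total premium."""
    total_claims = sum(claims for claims, _ in policies)
    total_premium = sum(premium for _, premium in policies)
    return paid_loss_ratio(total_claims, total_premium)


# Hypothetical one-year figures for three policies, in pounds: (claims paid, premium).
example_year = [(0, 10_000), (4_000, 10_000), (12_200, 10_000)]

for claims, premium in example_year:
    print(f"policy ratio: {paid_loss_ratio(claims, premium):.0%}")
print(f"portfolio ratio: {portfolio_paid_loss_ratio(example_year):.0%}")
```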

Above I mentioned looking at the five years in two parts. At least metaphorically we are going to use our right hand to cover the results from years 2009 and 2010 and focus on the first three years on the left. Later – after we have established a hypothesis based on 2006 to 2008 results – we can lift our hand and check how we did against the “real” figures.

For the purposes of this illustration, I want to choose a rather mechanistic way to differentiate business that has performed well and badly. In doing this I have to remember that a policy may have a single major loss one year and then run free of losses for the next 20. If I were simply to say any policy with a large loss is bad, I would potentially be drastically and unnecessarily culling my book (and also closing the stable door after the horse has bolted). Instead we need to develop a rule that takes this into account.

In thinking about overall profitability, while we have greatly reduced the impact of both reported but unpaid claims and IBNR by virtue of picking a short-tail business, it might be prudent to make say a 5% allowance for these. If we also assume an expense ratio of 35%, then we have a total of non-underwriting-related outgoings of 40%. This means that we can afford to have a paid loss ratio of up to 60% (100% – 40%) and still turn a profit.
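The arithmetic behind the 60% figure can be written down as a tiny sketch. The function and parameter names are mine; the 35% expense ratio and 5% allowance are the assumptions stated in the paragraph above, and a different insurer would plug in its own values.

```python
def break_even_paid_loss_ratio(expense_ratio, reserve_allowance):
    """Highest paid loss ratio that still keeps the combined ratio at or below 100%."""
    return 1.0 - (expense_ratio + reserve_allowance)


# Assumptions from the text: 35% expenses, 5% allowance for unpaid claims and IBNR.
threshold = break_even_paid_loss_ratio(expense_ratio=0.35, reserve_allowance=0.05)
print(f"Break-even paid loss ratio: {threshold:.0%}")
```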

Using this insight, my simple rule is as follows:

A policy will be tagged as “bad” if two things occur:

  1. The overall three-year loss ratio is in excess of 60%

    i.e. it has been unprofitable over this period; and

  2. The loss ratio is in excess of 30% in at least two of the three years

    i.e. there is a sustained element to the poor performance and not just the one-off bad luck that can hit the best underwritten of policies
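The two-part rule above could be expressed along the following lines. This is a minimal sketch rather than anything I ran at the time: the function name, its parameters and the three example policies are hypothetical, while the 60%, 30% and "two of three years" values are the ones given in the rule.

```python
def is_bad_policy(yearly_claims, yearly_premiums,
                  overall_threshold=0.60, yearly_threshold=0.30, min_bad_years=2):
    """Apply the two-part rule to one policy's observation period (here three years)."""
    if not yearly_claims or len(yearly_claims) != len(yearly_premiums):
        raise ValueError("need matching, non-empty claims and premium histories")

    # Condition 1: the policy has been unprofitable over the whole period.
    overall_ratio = sum(yearly_claims) / sum(yearly_premiums)

    # Condition 2: the poor performance is sustained, not a one-off bad year.
    bad_years = sum(
        1 for claims, premium in zip(yearly_claims, yearly_premiums)
        if claims / premium > yearly_threshold
    )
    return overall_ratio > overall_threshold and bad_years >= min_bad_years


premiums = [10_000, 10_000, 10_000]  # same annual premium as in the sample book

# Hypothetical claim histories, in pounds, for three different policies.
print(is_bad_policy([4_000, 9_000, 7_000], premiums))  # sustained losses
print(is_bad_policy([0, 25_000, 0], premiums))         # single major loss
print(is_bad_policy([3_500, 4_000, 3_200], premiums))  # frequent but profitable
```

Only the first of these is tagged as bad. The second fails the overall test but has just one poor year, which is exactly the "one-off bad luck" case the rule is meant to spare; the third has three years above 30% but remains profitable overall.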

This rule roughly splits the book 75 / 25; with 74% of policies being good. Other choices of parameters may result in other splits and it would be advisable to spend a little time optimising things. Perhaps 26% of policies being flagged as bad is too aggressive for example (though this rather depends on what you do about them – see below). However in the simpler world of this example, I’ll press on to the next stage with my first pick.

The ultimate sense of perspective

Well all we have done so far is to tag policies that have performed badly – in the parlance of Analytics zealots we are being backward-looking. Now it is time to lift our hand on 2009 to 2010 and try to be forward-looking. While these figures are obviously also backward looking (the day that someone comes up with future data I will eat my hat), from the frame of reference of our experimental perspective (sitting at the close of 2008), they can be thought of as “the future back then”. We will use the actual performance of the policies in 2009 – 2010 to validate our choice of good and bad that was based on 2006 – 2008 results.

Overall the 50 policies had a loss ratio of 54% in 2009 – 2010. However those flagged as bad in our above exercise had a subsequent loss ratio of 92%. Those flagged as good had a subsequent loss ratio of 40%. The latter is a 14 point improvement on the overall performance of the book.
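As a quick sanity check, the segment figures should blend back to the overall ratio. The sketch below assumes the segment premiums that follow from the numbers already given (74% and 26% of 50 policies at £10,000 a year over two years, so £740k and £260k); the function name is mine.

```python
def blended_loss_ratio(segments):
    """Premium-weighted loss ratio across segments given as (premium, loss ratio) pairs."""
    total_premium = sum(premium for premium, _ in segments)
    total_claims = sum(premium * ratio for premium, ratio in segments)
    return total_claims / total_premium


# Two-year premiums derived from the text: 37 good and 13 bad policies at £10,000 a year.
good_segment = (740_000, 0.40)
bad_segment = (260_000, 0.92)

overall = blended_loss_ratio([good_segment, bad_segment])
print(f"Blended 2009-2010 loss ratio: {overall:.1%}")
```

The blend comes out at roughly 53.5%, which is consistent with the 54% quoted for the whole book once the rounding of the segment ratios is taken into account.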

So we can say with some certainty that our rule, though simplistic, has produced some interesting results. The third part of this series will focus more closely on why this has worked. For now, let’s consider what actions the split we have established could drive.
What to do with the bad?

You shall be taken to the place from whence you came...

We were running a 54% paid ratio in 2009-2010. Using the same assumptions as above, this might have equated to a 94% combined ratio. Our book of business had an annual premium of £0.5m so we received £1m over the two years. The 94% combined would have implied making a £60k profit if we had done nothing different. So what might have happened if we had done something?

There are a number of options. The most radical of these would have been to not renew any of the bad policies; to have carried out a cull. Let us consider what would have been the impact of such an approach. Well our book of business would have shrunk to £740k over the two years at a combined of 40% (the ratio of the good book) + 40% (other outgoing) = 80%, which implies a profit of £148k, up £88k. However there are reasons why we might not have wanted to so drastically shrink our business. A smaller pot of money for investment purposes might have been one. Also we might have had customers with policies in both the good and bad segments and it might have been tricky to cancel the bad while retaining the good. And so on…
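The comparison between doing nothing and culling can be reproduced in a few lines. This is a sketch using only the figures quoted above; the function name is mine, and the 40 points of other outgoings is the earlier assumption of 35% expenses plus a 5% allowance. Ratios are held in percentage points to keep the arithmetic exact.

```python
OTHER_OUTGOINGS_PCT = 40  # assumed: 35% expenses + 5% allowance for unpaid claims and IBNR


def profit(premium, paid_loss_ratio_pct, other_outgoings_pct=OTHER_OUTGOINGS_PCT):
    """Profit implied by a combined ratio of paid loss ratio plus other outgoings."""
    combined_ratio_pct = paid_loss_ratio_pct + other_outgoings_pct
    return premium * (100 - combined_ratio_pct) / 100


do_nothing = profit(1_000_000, 54)  # whole book, two years of premium
cull = profit(740_000, 40)          # good policies only

print(f"Do nothing: £{do_nothing:,.0f}")        # £60,000
print(f"Cull:       £{cull:,.0f}")              # £148,000
print(f"Uplift:     £{cull - do_nothing:,.0f}") # £88,000
```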

Another option would have been to have refined our rule to catch fewer policies. Inevitably, however, this would have reduced the positive impact on profits.

At the other extreme, we might have chosen to take less drastic action relating to the bad policies. This could have included increasing the premium we charged (which of course could also have resulted in us losing the business but via the insured’s choice), raising the deductible payable on any losses, or looking to work with insureds to put in place better risk management processes. Let’s be conservative and say that if the bad book was running at 92% and the overall book at 54% then perhaps it would have been feasible to improve the bad book’s performance to a neutral figure of say 60% (implying a break-even combined of 100%). This would have enabled the insurance organisation to maintain its investment base, to have not lost good business as a result of culling related bad and to have preserved the profit increase generated by the cull.
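To see why remediation preserves the profit of the cull while keeping the premium, it helps to treat the book as two segments. Again this is only a sketch under the assumptions already stated (40 points of other outgoings, the bad book improved to a 60% paid loss ratio); the function name and data layout are hypothetical.

```python
def book_result(segments, other_outgoings_pct=40):
    """Return (total premium, total profit) for (premium, paid loss ratio %) segments."""
    total_premium = sum(premium for premium, _ in segments)
    total_profit = sum(
        premium * (100 - paid_pct - other_outgoings_pct) / 100
        for premium, paid_pct in segments
    )
    return total_premium, total_profit


good_book = (740_000, 40)
remediated_bad_book = (260_000, 60)  # assumed improvement from 92% to 60%

cull = book_result([good_book])
remediate = book_result([good_book, remediated_bad_book])

for label, (premium, result) in (("Cull", cull), ("Remediate", remediate)):
    print(f"{label}: premium £{premium:,.0f}, profit £{result:,.0f}")
```

Both scenarios produce the same profit, because the remediated bad book merely breaks even; the difference is that remediation retains the full £1m of premium for investment purposes.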

In practice of course it is likely that some sort of mixed approach would have been taken. The general point is that we have been able to come up with a simple strategy to separate good and bad business and then been able to validate how accurate our choices were. If, in the future, we possessed similar information, then there is ample scope for better decisions to be taken, with potentially positive impact on profits.
Next time…

In the final part of what is now a trilogy, I will look more deeply at what we have learnt from the above example, tie these learnings into how to pitch a BI/DW programme in Insurance and make some more general observations.