Toast

1 Feb 201723 Feb 2017 Peter James Thomas Biology, data science, Mathematics & Science, Physics, Statistics Anita Makri, David Spiegelhalter, food standards agency, nature.com, public trust in science, University of Cambridge

Foreword

This blog touches on a wide range of topics, including social media, cultural transformation, general technology and – last but not least – sporting analogies. However, its primary focus has always been on data and information-centric matters in a business context. Having said this, all but the more cursory of readers will have noted the prevalence of pieces with a Mathematical or Scientific bent. To some extent this is a simple reflection of the author’s interests and experience, but a stronger motivation is often to apply learnings from different fields to the business data arena. This article is probably more scientific in subject matter than most, but I will also look to highlight some points pertinent to commerce towards the end.

Introduction

The topic I want to turn my attention to in this article is public trust in science. This is a subject that has consumed many column inches in recent years. One particular area of focus has been climate science, which, for fairly obvious political reasons, has come in for even more attention than other scientific disciplines of late. It would be distracting to get into the arguments about climate change and humanity’s role in it here ^[1] and in a sense this is just the latest in a long line of controversies that have somehow become attached to science. An obvious second example here is the misinformation circling around both the efficacy and side effects of vaccinations ^[2]. In both of these cases, it seems that at least a sizeable minority of people are willing to query well-supported scientific findings. In some ways, this is perhaps linked to the general mistrust of “experts” and “elites” ^[3] that was explicitly to the fore in the UK’s European Union Referendum debate ^[4].

“People in this country have had enough of experts”

– Michael Gove ^[5], at this point UK Justice Secretary and one of the main proponents of the Leave campaign, speaking on Sky News, June 2016.

Mr Gove was talking about economists who held a different point of view to his own. However, his statement has wider resonance and cannot be simply dismissed as the misleading sound-bite of an experienced politician seeking to press his own case. It does indeed appear that in many places around the world experts are trusted much less than they used to be and that includes scientists.

“Many political upheavals of recent years, such as the rise of populist parties in Europe, Donald Trump’s nomination for the American presidency and Britain’s vote to leave the EU, have been attributed to a revolt against existing elites.”

– The Buttonwood column, The Economist, September 2016.

Why has this come to be?

A Brief ^[6] History of the Public Perception of Science

Note: This section is focussed on historical developments in the public’s trust in science. If the reader would like to skip on to more toast-centric content, then please click here.

Answering questions about the erosion of trust in politicians and the media is beyond the scope of this humble blog. Wondering what has happened to trust in science is firmly in its crosshairs. One part of the answer is that – for some time – scientists were held in too much esteem and the pendulum was inevitably going to swing back the other way. For a while the pace of scientific progress and the miracles of technology which this unleashed placed science on a pedestal from which there was only one direction of travel. During this period in which science was – in general – uncritically held in great regard, the messy reality of actual science was never really highlighted. The very phrase “scientific facts” is actually something of an oxymoron. What we have is instead scientific theories. Useful theories are consistent with existing observations and predict new phenomena. However – as I explained in Patterns patterns everywhere – a theory is only as good as the latest set of evidence and some cherished scientific theories have been shown to be inaccurate; either in general, or in some specific circumstances ^[7]. However saying “we have a good model that helps us explain many aspects of a phenomenon and predict more, but it doesn’t cover everything and there are some uncertainties” is a little more of a mouthful than “we have discovered that…”.

There have been some obvious landmarks along the way to science’s current predicament. The unprecedented destruction unleashed by the team working on the Manhattan Project at first made the scientists involved appear God-like. It also seemed to suggest that the path to Great Power status was through growing or acquiring the best Physicists. However, as the prolonged misery caused in Japan by the twin nuclear strikes became more apparent and as the Cold War led to generations living under the threat of mutually assured destruction, the standing attached by the general public to Physicists began to wane; the God-like mantle began to slip. While much of our modern world and its technology was created off the back of now fairly old theories like Quantum Chromodynamics and – most famously – Special and General Relativity, the actual science involved became less and less accessible to the man or woman in the street. For all the (entirely justified) furore about the detection of the Higgs Boson, few people would be able to explain much about what it is and how it fits into the Standard Model of particle physics.

In the area of medicine and pharmacology, the Thalidomide tragedy, where a drug prescribed to help pregnant women suffering from morning sickness instead led to terrible birth defects in more than 10,000 babies, may have led to more stringent clinical trials, but also punctured the air of certainty that had surrounded the development of the latest miracle drug. While medical science and related disciplines have vastly improved the health of much of the globe, the glacial progress in areas such as oncology has served as a reminder of the fallibility of some scientific endeavours. In a small way, the technical achievements of that apogee of engineering, NASA, were undermined by loss of crafts and astronauts. Most notably the Challenger and Columbia fatalities served to further remove the glossy veneer that science had acquired in the 1940s to 1960s.

Lest it be thought at this point that I am decrying science, or even being anti-scientific, nothing could be further from the truth. I firmly believe that the ever growing body of scientific knowledge is one of humankind’s greatest achievements, if not its greatest. From our unpromising vantage point on an unremarkable little planet in our equally common-all-garden galaxy we have been able to grasp many of the essential truths about the whole Universe from the incomprehensibly gigantic to the most infinitesimal constituent of a sub-atomic particle. However, it seems that many people do not fully embrace the grandeur of our achievements, or indeed in many cases the unexpected beauty and harmony that they have revealed ^[8]. It is to the task of understanding this viewpoint that I am addressing my thoughts.

More recently, the austerity that has enveloped much of the developed world since the 2008 Financial Crisis has had two reinforcing impacts on science in many countries. First funding has often been cut, leading to pressure on research programmes and scientists increasingly having to make an economic case for their activities; a far cry from the 1950s. Second, income has been effectively stagnant for the vast majority of people, this means that scientific expenditure can seem something of a luxury and also fuels the anti-elite feelings cited by The Economist earlier in this article.

Into this seeming morass steps Anita Makri, “editor/writer/producer and former research scientist”. In a recent Nature article she argues that the form of science communicated in popular media leaves the public vulnerable to false certainty. I reproduce some of her comments here:

“Much of the science that the public knows about and admires imparts a sense of wonder and fun about the world, or answers big existential questions. It’s in the popularization of physics through the television programmes of physicist Brian Cox and in articles about new fossils and quirky animal behaviour on the websites of newspapers. It is sellable and familiar science: rooted in hypothesis testing, experiments and discovery.

Although this science has its place, it leaves the public […] with a different, outdated view to that of scientists of what constitutes science. People expect science to offer authoritative conclusions that correspond to the deterministic model. When there’s incomplete information, imperfect knowledge or changing advice — all part and parcel of science — its authority seems to be undermined. […] A popular conclusion of that shifting scientific ground is that experts don’t know what they’re talking about.”

– Anita Makri, Give the public the tools to trust scientists, Nature, January 2017.

I’ll come back to Anita’s article again later.

Food Safety – The Dangers Lurking in Toast

After my speculations about the reasons why science is held in less esteem than once was the case, I’ll return to more prosaic matters; namely food and specifically that humble staple of many a breakfast table, toast. Food science has often fared no better than its brother disciplines. The scientific guidance issued to people wanting to eat healthily can sometimes seem to gyrate wildly. For many years fat was the source of all evil, more recently sugar has become public enemy number one. Red wine was meant to have beneficial effects on heart health, then it was meant to be injurious; I’m not quite sure what the current advice consists of. As Makri states above, when advice changes as dramatically as it can do in food science, people must begin to wonder whether the scientists really know anything at all.

So where does toast fit in? Well the governmental body charged with providing advice about food in the UK is called the Food Standards Agency. They describe their job as “using our expertise and influence so that people can trust that the food they buy and eat is safe and honest.” While the FSA do sterling work in areas such as publicly providing ratings of food hygiene for restaurants and the like, their most recent campaign is one which seems at best ill-advised and at worst another nail in the public perception of the reliability of scientific advice. Such things matter because they contribute to the way that people view science in general. If scientific advice about food is seen as unsound, surely there must be questions around scientific advice about climate change, or vaccinations.

Before I am accused of belittling the FSA’s efforts, let’s consider the campaign in question, which is called Go for Gold and encourages people to consume less acrylamide. Here is some of what the FSA has to say about the matter:

“Today, the Food Standards Agency (FSA) is launching a campaign to ‘Go for Gold’, helping people understand how to minimise exposure to a possible carcinogen called acrylamide when cooking at home.

Acrylamide is a chemical that is created when many foods, particularly starchy foods like potatoes and bread, are cooked for long periods at high temperatures, such as when baking, frying, grilling, toasting and roasting. The scientific consensus is that acrylamide has the potential to cause cancer in humans.

[…]

as a general rule of thumb, aim for a golden yellow colour or lighter when frying, baking, toasting or roasting starchy foods like potatoes, root vegetables and bread.”

– Food Standards Agency, Families urged to ‘Go for Gold’ to reduce acrylamide consumption, January 2017.

The Go for Gold campaign was picked up by various media outlets in the UK. For example the BBC posted an article on its web-site which opened by saying:

“Bread, chips and potatoes should be cooked to a golden yellow colour, rather than brown, to reduce our intake of a chemical which could cause cancer, government food scientists are warning.”

– BBC, Browned toast and potatoes are ‘potential cancer risk’, say food scientists, January 2017.

The BBC has been obsessed with neutrality on all subjects recently ^[9], but in this case they did insert the reasonable counterpoint that:

“However, Cancer Research UK ^[10] said the link was not proven in humans.”

Acrylamide is certainly a nasty chemical. Amongst other things, it is used in polyacrylamide gel electrophoresis, a technique used in biochemistry. If biochemists mix and pour their own gels, they have to monitor their exposure and there are time-based and lifetime limits as to how often they can do such procedures ^[11]. Acrylamide has also been shown to lead to cancer in mice. So what could be more reasonable that the FSA’s advice?

Food Safety – A Statistical / Risk Based Approach

Earlier I introduced Anita Makri, it is time to meet our second protagonist, David Spiegelhalter, Winton Professor for the Public Understanding of Risk in the Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge ^[12]. Professor Spiegelhalter has penned a response to the FSA’s Go for Gold campaign. I feel that this merits reading in entirety, but here are some highlights:

“Very high doses [of Acrylamide] have been shown to increase the risk of mice getting cancer. The IARC (International Agency for Research on Cancer) considers it a ‘probable human carcinogen’, putting it in the same category as many chemicals, red meat, being a hairdresser and shift-work.

However, there is no good evidence of harm from humans consuming acrylamide in their diet: Cancer Research UK say that ‘At the moment, there is no strong evidence linking acrylamide and cancer.’

This is not for want of trying. A massive report from the European Food Standards Agency (EFSA) lists 16 studies and 36 publications, but concludes

‘In the epidemiological studies available to date, AA intake was not associated with an increased risk of most common cancers, including those of the GI or respiratory tract, breast, prostate and bladder. A few studies suggested an increased risk for renal cell, and endometrial (in particular in never-smokers) and ovarian cancer, but the evidence is limited and inconsistent. Moreover, one study suggested a lower survival in non-smoking women with breast cancer with a high pre-diagnostic exposure to AA but more studies are necessary to confirm this result. (p185)’

[…]

[Based on the EFSA study] adults with the highest consumption of acrylamide could consume 160 times as much and still only be at a level that toxicologists think unlikely to cause increased tumours in mice.

[…]

This all seems rather reassuring, and may explain why it’s been so difficult to observe any effect of acrylamide in diet.”

– David Spiegelhalter, Opinion: How dangerous is burnt toast?, University of Cambridge, January 2017.

Indeed, Professor Spiegelhalter, an esteemed statistician, also points out that most studies will adopt the standard criteria for statistical significance. Given that such significance levels are often set at 5%, then this means that:

“[As] each study is testing an association with a long list of cancers […], we would expect 1 in 20 of these associations to be positive by chance alone.”

He closes his article by stating – not unreasonably – that the FSA’s time and attention might be better spent on areas where causality between an agent and morbidity is well-established, for example obesity. My assumption is that the FSA has a limited budget and has to pick and choose what food issues to weigh in on. Even if we accept for the moment that there is some slight chance of a causal link between the consumption of low levels of acrylamide and cancer, there are plenty of other areas in which causality is firmly established; obesity as mentioned by Professor Spiegelhalter, excessive use of alcohol, even basic kitchen hygiene. It is hard to understand why the FSA did not put more effort into these and instead focussed on an area where the balance of scientific judgement is that there is unlikely to be an issue.

Having a mathematical background perhaps biases me, but I tend to side with Professor Spiegelhalter’s point of view. I don’t want to lay the entire blame for the poor view that some people have of science at the FSA’s door, but I don’t think campaigns like Go for Gold help very much either. The apocryphal rational man or woman will probably deduce that there is not an epidemic of acrylamide poisoning in progress. This means that they may question what the experts at the FSA are going on about. In turn this reduces respect for other – perhaps more urgent – warnings about food and drink. Such a reaction is also likely to colour how the same rational person thinks about “expert” advice in general. All of this can contribute to further cracks appearing in the public edifice of science, an outcome I find very unfortunate.

So what is to be done?

A Call for a New and More Honest Approach to Science Communications

As promised I’ll return to Anita Makri’s thoughts in the same article referenced above:

“It’s more difficult to talk about science that’s inconclusive, ambivalent, incremental and even political — it requires a shift in thinking and it does carry risks. If not communicated carefully, the idea that scientists sometimes ‘don’t know’ can open the door to those who want to contest evidence.

[…]

Scientists can influence what’s being presented by articulating how this kind of science works when they talk to journalists, or when they advise on policy and communication projects. It’s difficult to do, because it challenges the position of science as a singular guide to decision making, and because it involves owning up to not having all of the answers all the time while still maintaining a sense of authority. But done carefully, transparency will help more than harm. It will aid the restoration of trust, and clarify the role of science as a guide.”

The scientific method is meant to be about honesty. You record what you see, not what you want to see. If the data don’t support your hypothesis, you discard or amend your hypothesis. The peer-review process is meant to hold scientists to the highest levels of integrity. What Makri seems to be suggesting is for scientists to turn their lenses on themselves and how they communicate their work. Being honest where there is doubt may be scary, but not as scary as being caught out pushing certainty where no certainty is currently to be had.

Epilogue

At the beginning of this article, I promised that I would bring things back to a business context. With lots of people with PhDs in numerate sciences now plying their trade as data scientists and the like, there is an attempt to make commerce more scientific ^[13]. Understandably, the average member of a company will have less of an appreciation of statistics and statistical methods than their data scientists do. This can lead to data science seeming like magic; the philosopher’s stone ^[14]. There are obvious parallels here with how Physicists were seen in the period immediately after the Second World War.

Earlier in the text, I mused about what factors may have led to a deterioration in how the public views science and scientists. I think that there is much to be learnt from the issues I have covered in this article. If data scientists begin to try to peddle absolute truth and perfect insight (both of which, it is fair to add, are often expected from them by non-experts), as opposed to ranges of outcomes and probabilities, then the same decline in reputation probably awaits them. Instead it would be better if data scientists heeded Anita Makri’s words and tried to always be honest about what they don’t know as well as what they do.

Notes

^[1]	Save to note that there really is no argument in scientific circles. As ever Randall Munroe makes the point pithily in his Earth Temperature Timeline – https://xkcd.com/1732/. For a primer on the area, you could do worse than watching The Royal Society‘s video:
^[2]	For the record, my daughter has had every vaccine known to the UK and US health systems and I’ve had a bunch of them recently as well.
^[3]	Most scientists I know would be astonished that they are considered part of the amorphous, ill-defined and obviously malevolent global “elite”. Then “elite” is just one more proxy for “the other” something which it is not popular to be in various places in the world at present.
^[4]	Or what passed for debate in these post-truth times.
^[5]	Mr Gove studied English at Lady Margaret Hall, Oxford, where he was also President of the Oxford Union. Clearly Oxford produces less experts than it used to in previous eras.
^[6]	One that is also probably wildly inaccurate and certainly incomplete.
^[7]	So Newton’s celebrated theory of gravitation is “wrong” but actually works perfectly well in most circumstances. The the Rutherford–Bohr model, where atoms are little Solar Systems, with the nucleus circled by electrons much as the planets circle the Sun is “wrong”, but actually does serve to explain a number of things; if sadly not the orbital angular momentum of electrons.
^[8]	Someone should really write a book about that – watch this space!
^[9]	Not least in the aforementioned EU Referendum where it felt the need to follow the views of the vast majority of economists with those of the tiny minority, implying that the same weight be attached to both points of view. For example, 99.9999% of people believe the world to be round, but in the interests of balance my mate Jim reckons it is flat.
^[10]	According to their web-site: “the world’s leading charity dedicated to beating cancer through research”.
^[11]	As attested to personally by the only proper scientist in our family.
^[12]	Unlike Oxford (according to Mr Gove anyway), Cambridge clearly still aspires to creating experts.
^[13]	By this I mean proper science and not pseudo-science like management theory and the like.
^[14]	In the original, non-J.K. Rowling sense of the phrase.

Follow @peterjthomas

Do any technologies grow up or do they only come of age?

26 Jan 201726 Jan 2017 Peter James Thomas big data, cloud computing, data governance atscale, bruno aziza

I must of course start by offering my apologies to that doyen of data experts, Stephen King, for mangling his words to suit the purposes of this article ^[1].

The AtScale Big Data Maturity Survey for 2016 came to my attention through a connection (see Disclosure below). The survey covers “responses from more than 2,550 Big Data professionals, across more than 1,400 companies and 77 countries” and builds on their 2015 survey.

I won’t use the word clickbait ^[2], but most of the time documents like this lead you straight to a form where you can add your contact details to the organisation’s marketing database. Indeed you, somewhat inevitably, have to pay the piper to read the full survey. However AtScale are to be commended for at least presenting some of the high-level findings before asking you for the full entry price.

These headlines appear in an article on their blog. I won’t cut and paste the entire text, but a few points that stood out for me included:

Close to 70% [of respondents] have been using Big Data for more than a year (vs. 59% last year)
More than 53% of respondents are using Cloud for their Big Data deployment today and 14% of respondents have all their Big Data in the Cloud
Business Intelligence is [the] #1 workload for Big Data with 75% of respondents planning on using BI on Big Data
Accessibility, Security and Governance have become the fastest growing areas of concern year-over-year, with Governance growing most at 21%
Organizations who have deployed Spark ^[3] in production are 85% more likely to achieve value

Bullet 3 is perhaps notable as Big Data is often positioned – perhaps erroneously – as supporting analytics as opposed to “traditional BI” ^[4]. On the contrary, it appears that a lot of people are employing it in very “traditional” ways. On reflection this is hardly surprising as many organisations have as yet failed to get the best out of the last wave of information-related technology ^[5], let alone the current one.

However, perhaps the two most significant trends are the shift from on-premises Big Data to Cloud Big Data and the increased importance attached to Data Governance. The latter was perhaps more of a neglected area in the earlier and more free-wheeling era of Big Data. The rise in concerns about Big Data Governance is probably the single greatest pointer towards the increasing maturity of the area.

It will be interesting to see what the AtScale survey of 2017 has to say in 12 months.

Disclosure:

The contact in question is Bruno Aziza (@brunoaziza), AtScale’s Chief Marketing Officer. While I have no other connections with AtScale, Bruno and I did make the following video back in 2011 when both of us were at other companies.

Notes

^[1]	Excerpted from The Gunslinger.
^[2]	Oops!
^[3]	Apache Hadoop – which has become almost synonymous with Big Data – has two elements, the Hadoop Distributed File Store (HDFS, the piece which deals with storage) and MapReduce (which does processing of data). Apache Spark was developed to improve upon the speed of the MapReduce approach where the same data is accessed many times, as can happen in some queries and algorithms. This is achieved in part by holding some or all of the data to be accessed in memory. Spark works with HDFS and also other distributed file systems, such as Apache Cassandra.
^[4]	How phrases from the past come around again!
^[5]	Some elements of the technology have changed, but the vast majority of the issues I covered in “Why Business Intelligence projects fail” hold as true today as they did back in 2009 when I wrote this piece.

Follow @peterjthomas

Bumps in the Road

20 Jan 201727 Mar 2017 Peter James Thomas change management, data governance roadworks, sherlock holmes

The above image appears in my updated ^[1] seminar deck Data Management, Analytics and People: An Eternal Golden Braid. It is featured on a slide titled “Why Data Management? – The negative case” ^[2]. So what was the point that I was so keen to make?

Well the whole slide looks like this…

…and the image on the left relates most directly to the last item of bulleted text on the right-hand side ^[3].

An Introductory Anecdote

Before getting into the meat of this article, an aside which may illuminate where I am coming from. I currently live in London, a city where I was born and to which I returned after a sojourn in Cambridge while my wife completed her PhD. Towards the end of my first period in London, we lived on a broad, but one-way road in West London. One day we received notification that the road was going to be resurfaced and moving our cars might be a useful thing to consider. The work was duly carried out and our road now had a deep black covering of fresh asphalt ^[4], criss-crossed by gleaming and well-defined dashed white lines demarking parking bays. Within what seemed like days, but was certainly no more than a few weeks, roadworks signs reappeared on our road, together with red and white fencing, a digger and a number of people with pneumatic drills ^[5] and shovels. If my memory serves me well, it was the local water company (Thames Water) who visited our road first.

The efforts of the Thames Water staff, while no doubt necessary and carried out professionally, rather spoiled our pristine road cover. I guess these things happen and coordination between local government, private firms and the sub-contractors that both employ cannot be easy ^[6]. However what was notable was that things did not stop with Thames Water. Over the next few months the same stretch of road was also dug up by both the Electricity and Gas utilities. There was a further set of roadworks on top of these, but my memory fails me on which organisation carried these out and for what purpose ^[7]; we are talking about events that occurred over eight years ago here.

The result of all this uncoordinated work was a previously pristine road surface now pock-marked by a series of new patches of asphalt, or maybe other materials; they certainly looked different and (as in the above photo) had different colours and grains. Several of these patches of new road covering overlapped each other; that is one hole redug sections previously excavated by earlier holes. Also the new patches of road surface were often either raised or depressed from the main run of asphalt, leading to a very uneven terrain. I have no idea how much it cost to repave the road in the first instance, but a few months of roadworks pretty much buried the repaving and led to a road whose surface was the opposite of smooth and consistent. I’d go so far as to say that the road was now in considerably worse condition than before the initial repaving. In any case, it could be argued that the money spent on the repaving was, for all intents and purposes, wasted.

After all this activity, our road was somewhat similar to the picture at the top of this article, but its state was much worse with more extensive patching and more overlapping layers. To this day I rather wish I had taken a photograph, which would also have saved me some money on stock photos!

I understand that each of the roadworks was in support of something that was probably desirable. For example, better sewerage, or maintenance to gas supplies which might otherwise have become dangerous. My assumption is that all of the work that followed on from the repaving needed to be done and that each was done at least as well as it had to be. Probably most of these works were completed on time and on budget. However, from the point of view of the road as a whole, the result of all these unconnected and uncoordinated works was a substantial deterioration in both its appearance and utility.

In summary, the combination of a series of roadworks, each of which either needed to be done or led to an improvement in some area, resulted in the environment in which they were carried out becoming degraded and less fit-for-purpose. A series of things which could be viewed as beneficial in isolation were instead deleterious in aggregate. At this point, the issue that I wanted to highlight in the data world is probably swimming into focus for many readers.

The Entropy of a Data Asset exposed to Change tends to a Maximum ^[8]

Returning to the slide I reproduce above, my assertion – which has been borne out during many years of observing the area – is that Change Programmes and Projects, if not subject to appropriately rigorous Data Governance, inevitably led to the degradation of data assets over time.

Here both my roadworks anecdote and the initial photograph illustrate the point that I am looking to make. Over the last decade or so, the delivery of technological change has evolved ^[9] to the point where many streams of parallel work are run independently of each other with each receiving very close management scrutiny in order to ensure delivery on-time and on-budget ^[10]. It should be recognised that some of this shift in modus operandi has been as a result of IT departments running projects that have spiralled out of control, or where delivery has been significantly delayed or compromised. The gimlet-like focus of Change on delivery “come Hell or High-water” represents the pendulum swinging to the other extreme.

What this shift in approach means in practice is that – as is often the case – when things go wrong or take longer than anticipated ^[11], areas of work are de-scoped to secure delivery dates. In my experience, 9 times out of 10 one of the things that gets thrown out is data-related work; be that not bothering to develop reporting on top of new systems, not integrating new data into existing repositories, not complying with data standards, or not implementing master data management.

As well as the danger of skipping necessary data related work, if some data-related work is actually undertaken, then corners may be cut to meet deadlines and budgets. It is not atypical for instance that a Change Programme, while adding their new capabilities to interfaces or ETL, compromises or overwrites existing functionality. This can mean that data-centric code is in a worse state after a Change Programme than before. My roadworks anecdote begins to feel all too apt a metaphor to employ.

Looking more broadly at Change Programmes, even without the curse of de-scopes, their focus is seldom data and the expertise of Change staff is not often in data matters. Because of this, such work can indeed seem to be analogous to continually digging up the same stretch of road for different purposes, combined with patching things up again in a manner that can sometimes be barely adequate. Extending our metaphor ^[12], the result of Change that is not controlled from a data point of view can be a landscape with lumps, bumps and pot-holes. Maybe the sewer was re-laid on time and to budget, but the road has been trashed in the process. Perhaps a new system was shoe-horned in to production, but rendered elements of an Analytical Repository useless in the process.

Avoiding these calamities is the central role of Data Governance. What these examples also stress is that, rather than the dry, policy-based area that Data Governance is often assumed to be, it must be more dynamic and much more engaged in Change Portfolios. Such engagement should ideally be early and in a helpful manner, not late and in a policing role.

The analogy I have employed here also explains why leveraging existing Governance arrangements to add in a Data Governance dimension seldom works. This would be like asking the contractors engaged in roadworks to be extra careful to liaise with each other. This won’t work as there is no real incentive for such collaboration, the motivation of getting their piece of work done quickly and cheaply will trump other considerations. Instead some independent oversight is required. Like any good “regulator” this will work best if Data Governance professionals seek to be part of the process and focus on improving it. The alternative of simply pointing out problems after the fact adds much less business value.

And Finally

In A Study in Scarlet John Watson reads an article, which turns out to have been written by his illustrious co-lodger. A passage is as follows:

“From a drop of water,” said the writer, “a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other. So all life is a great chain, the nature of which is known whenever we are shown a single link of it.”

While I don’t claim to have the same acuity of mind as Conan-Doyle’s most famous creation, I can confirm that you can learn a lot about the need for Data Governance by simply closely observing the damage done by roadworks.

Notes

^[1]	I have updated my latest deck to use a different photo due to a dispute with the company I purchased the original photo from.
^[2]	Which you may be glad to hear is followed directly by one titled “Why Data Management? – The positive case”.
^[3]	It may be noted that I am going through a minimalist phase in my decks for public speaking. Indeed I did toy with having a deck consisting primarily of images before chickening out. Of course one benefit of being text-light is that you can focus on different elements and tell different stories for different audiences (see Presenting in Public).
^[4]	Blacktop.
^[5]	Jackhammers.
^[6]	Indeed sometime in the late 1980s or early 1990s I was approached by one of the big consultancies about a job on a project to catalogue all proposed roadworks across London in an Oracle database. The objective of this was to better coordinate roadworks. I demurred and I believe that the project was unsuccessful, certainly by the evidence of what happened to our road.
^[7]	It could well have been Thames Water again – the first time sewers, the second household water supply. It might have been British Telecom, but it probably wasn’t a cable company as they had been banned from excavations in Westminster after failing to make good after previous installations.
^[8]	Rudolf Clausius in 1865, with reference to the Second Law of Thermodynamics.
^[9]	As with the last time I used this word (see the notes section of Alphabet Soup) and also as applies with the phenomenon in the narual world, evolution implies change, but not necessarily always improvement.
^[10]	Or perhaps more realistically to ensure that delays are minimised and cost overruns managed downwards.
^[11]	Frequently it must be added because of either insufficient, or the wrong type of up-front analysis, or because a delivery timeframe was agreed based on some external factor rather than on what could practically be delivered in the time available. Oftentimes both factors are present and compound each other. The overall timetable is not based on any concrete understanding of what is to be done and analysis is either curtailed to meet timeframes, or – more insidiously – its findings are massaged to fit the desired milestones.
^[12]	Hopefully not over-extending it.

Follow @peterjthomas

Metamorphosis

13 Jan 201716 Jan 2017 Peter James Thomas data science, data visualisation automattic, Boris Gorelik, python, wordpress

No neither my observations on the work of Kafka, nor that of Escher ^[1]. Instead some musings relating on how to transform a bare bones and unengaging chart into something that both captures the attention of the reader and better informs them of the message that the data displayed is relaying. Let’s consider an example:

Before:

After:

The two images above are both renderings of the same dataset, which tracks the degree of fragmentation of the Israeli parliament – the Knesset – over time ^[2]. They are clearly rather different and – I would argue – the latter makes it a lot easier to absorb information and thus to draw inferences.

Both are the work of Boris Gorelik a data scientist at Automattic, a company that is most well-known for creating freemium SAAS blogging platform, WordPress.com and open source blogging software, WordPress ^[3].

I have been a contented WordPress.com user since the inception of this blog back in November 2008, so it was with interest that I learnt that Automattic have their own data-focussed blog, Data for Breakfast, unsurprisingly hosted on WordPress.com. It was on Data for Breakfast that I found Boris’s article, Evolution of a Plot: Better Data Visualization, One Step at a Time. In this he takes the reader step by step through what he did to transform his data visualisation from the ugly duckling “before” exhibit to the beautiful swan “after” exhibit.

Boris is using Python and various related libraries to do his data visualisation work. Given that I stopped commercially programming sometime around 2009 (admittedly with a few lapses since), I typically use the much more quotidian Excel for most of the charts that appear on peterjamesthomas.com ^[4]. Sometimes, where warranted, I enhance these using Visio and / or PaintShop Pro.

For example, the three ^[5] visualisations featured in A Tale of Two [Brexit] Data Visualisations were produced this way. Despite the use of Calibri, which is probably something of a giveaway, I hope that none of these resembles a straight-out-of-the-box Excel graph ^[6].

Brexit Bar — UK Referendum on EU Membership – Percentage voting by age bracket (see notes)

Brexit Bar 2 — UK Referendum on EU Membership – Numbers voting by age bracket (see notes)

Brexit Flag — UK Referendum on EU Membership – Number voting by age bracket (see notes)

While, in the above, I have not gone to the lengths that Boris has in transforming his initial and raw chart into something much more readable, I do my best to make my Excel charts look at least semi-professional. My reasoning is that, when the author of a chart has clearly put some effort into what their chart looks like and has at least attempted to consider how it will be read by people, then this is a strong signal that the subject matter merits some closer consideration.

Next time I develop a chart for posting on these pages, I may take Boris’s lead and also publish how I went about creating it.

Notes

^[1]	Though the latter’s work has adorned these pages on several occasions and indeed appears in my seminar decks.
^[2]	Boris has charted a metric derived from how many parties there have been and how many representatives of each. See his article itself for further background.
^[3]	You can learn more about the latter at WordPress.org.
^[4]	Though I have also used GraphPad Prism for producing more scientific charts such as the main one featured in Data Visualisation – A Scientific Treatment.
^[5]	Yes I can count. I have certificates which prove this.
^[6]	Indeed the final one was designed to resemble a fractured British flag. I’ll leave readers to draw their own conclusions here.

Follow @peterjthomas

Alphabet Soup

10 Jan 201723 Sep 2017 Peter James Thomas business analytics, chief data officer football manager, james taylor, jen stirrup, PASS, robert morison

This article is about the latest consumer product from the Google stable, something which will revolutionise your eating experience by combining a chicken-broth base with a nanotechnology garnish and a soupçon of deep learning techniques to create a warming meal that also provides a gastro-intestinal health-check. Wait…

…I may have got my wires crossed a bit there. No, I mis-spoke, the article is actually about ever increasing number of CxO titles ^[1], which has made a roster of many organisations’ executives come to resemble a set of Scrabble tiles.

Specifically I will focus on two values of x, A and D, so the CAO and CDO roles ^[2]. What do these TLAs ^[3] stand for, what do people holding these positions do and can we actually prove that, for these purposes only, “A” ≡ “D”?

Breaking the Code

The starting position is not auspicious. What might CAO stand for? Existing roles that come to mind include: Chief Accounting Officer and Chief Administrative Officer. However, in our context, it actually stands for Chief Analytics Officer. There is no ISO definition of Analytics, as I note in one of my recent seminar decks ^[4] (quoting the Gartner IT Glossary, but with my underlining):

Analytics has emerged as a catch-all term for a variety of different business intelligence and application-related initiatives. In particular, BI vendors use the ‘analytics’ moniker to differentiate their products from the competition. Increasingly, ‘analytics’ is used to describe statistical and mathematical data analysis that clusters, segments, scores and predicts what scenarios are most likely to happen.

I should of course mention here that my current role incorporates the word “Analytics” ^[5], so I may be making a point against myself. But before I start channeling my 2009 article, Business Analytics vs Business Intelligence ^[6], I’ll perhaps instead move on to the second acronym. How to decode CDO? Well an equally recent translation would be Chief Digital Officer, but you also come across Chief Development Officer and sometimes even Chief Diversity Officer. Our meaning will however be Chief Data Officer. You can read about what I think a CDO does here.

A observation that is perhaps obvious to make at this juncture is that when the acronym of a role is not easy to pin down, the content of the role may be equally amorphous. It is probably fair to say that this is true of both CAO and CDO job descriptions. Both are emerging roles in the majority of organisations.

Before the Flood

One thing that both roles have in common is that – in antediluvian days – their work used to be the province of another CxO, the CIO. This was before many CIOs became people who focus on solution architecture, manage relationships with outsourcers and have their time consumed by running Service Desks and heading off infrastructure issues ^[7]. Where organisations may have had just a CIO, they may well now have a CIO, a CAO and a CDO (and also a CTO perhaps which splits one original “C” role into four).

Aside from being a job creation scheme, the reasons for such splits are well-documented. The prevalence of outsourcing (and the complexity of managing such arrangements); the pervasiveness and criticality of technology leading to many CIOs focussing more on the care and feeding of systems than how businesses employ them; the relentless rise of Change organisations; and (frequently related to the last point) the increase in size of IT departments (particularly if staff in external partner organisations are included). All of these have pushed CIOs into more business as usual / back-room / engineering roles, leaving a vacuum in the nexus between business, technology and transformation. The fact that data processing is very different to data collation and synthesis has been another factor in CAOs and / or CDOs filling this vacuum.

Some other Points of View

As trailed in some previous articles ^[8], I have been thinking about the potential CAO / CDO dichotomy for some time. Towards the beginning of this period I read some notes that decision management luminary James Taylor had published based on the proceedings of the 2015 Chief Analytics Officer Summit. In the first part of these he cites comments made by Robert Morison as follows:

Practically speaking organizations need both roles [CAO and CDO] filled – either by one person or by two working closely together. This is hard because the roles are both new and evolving – role clarity was not the norm creating risk. In particular if both roles exist they must have some distinction such as demand v supply, offense v defense – adding value to data with analytics v managing data quality and consistency. But enterprises need to be ready – in particular when data is being identified as an asset by the CEO and executive team. CDOs tend to be driven by fragmented data environments, regulatory challenges, customer centricity. CAO tends to be driven by a focus on improving decision-making, moving to predictive analytics, focusing existing efforts.

Where CAO and CDO roles are separate, the former tends to work on exploiting data, the latter on data foundations / compliance. These are precisely the two vertical extremities of the spectrum I highlighted in The Chief Data Officer “Sweet Spot”. As Robert points out, in order for both to be successful, the CAO and CDO need to collaborate very closely.

Around the same time, another take on the same general question was offered by Jen Stirrup in her 2015 PASS Diary ^[9] article, Why are PASS doing Business Analytics at all?. Here Jen cites the Gartner distinctions between descriptive, diagnostic, predictive and prescriptive analytics adding that:

Business Intelligence and Business Analytics are a continuum. Analytics is focused more on a forward motion of the data, and a focus on value.

Channeling Douglas Adams, this model can be rehashed as:

What happened?
Why did it happen?
What is going to happen next?
What should we be doing?

As well as providing a finer grain distinguishing different types of analytics, the steps necessary to answer these questions also tend to form a bridge between what might be regarded as definitively CDO work and what might be regarded as definitively CAO work. As Jen notes, it’s a continuum. Answering “What happened?” with any accuracy requires solid data foundations and decent data quality, working out “What is going to happen next?” requires each of solid data foundations, decent data quality and a statistical approach.

Much CDO about Nothing

In some organisations, particularly the type where headcount is not a major factor in determining overall results, separate CAO and CDO departments can coexist; assuming of course that their leaders recognise their mutual dependency, park their egos at the door and get on with working together. However, even in such organisations, the question arises of to whom should the CAO and CDO report, a single person, two different people, or should one of them report to the other? In more cost-conscious organisations entirely separate departments may feel like something of a luxury.

My observation is that CAO staff generally end up doing data collation and cleansing, while CDO staff often get asked to provide data and carry out data analysis. This blurs what is already a fairly specious distinction between the two areas and provides scope for both duplication of work and – more worryingly – different answers to the same business questions. As I have mentioned in earlier articles, to anyone engaged in the fields, Analytics and Data Management are two sides of the same coin and both benefit from being part of the same unitary management structure.

If we consider the arrangements on the left-hand side of the above diagram, the two departments may end up collaborating, but the structure does not naturally lead to this. Indeed, where the priorities of the people that the CAO and CDO report in to differ, then there is scope for separate agendas, unhealthy competition and – again – duplication and waste. It is my assertion that the arrangements on the right-hand side are more likely to lead to a cohesive treatment of the spectrum of data matters and thus superior business outcomes.

In the right-hand exhibit, I have intentionally steered away from CAO and CDO titles. I recognise that there are different disciplines within the data world, but would expect virtual teams to form, disband and reform as required drawing on a variety of skills and experience. I have also indicated that the whole area should report into a single person, here given the monicker of TDJ (or Top Data Job ^[10]). You could of course map Analytics Lead to CAO and Data Management lead to CDO if you chose. Equally you could map one or other of these to the TDJ, with the other subservient. To an extent it doesn’t really matter. What I do think matters is that the TDJ goes to someone who understands the whole data arena; both the CAO and CDO perspectives. In my opinion this rules out most CEOs, COOs and CFOs from this role.

More or less Mandatory Sporting Analogy ^[11]

An analogy here comes from Robert Morison’s mention of “offense v defense” ^[12]. This puts me in mind of an [Association] Football Manager. In Soccer (to avoid further confusion), there are not separate offensive and defensive teams, whose presence on the field of play are mutually exclusive. Instead your defenders and attackers are different roles within one team; also sometimes defenders have to attack and attackers have to defend. The arrangements in the left-hand organogram are as if the defenders in a Soccer team were managed by one person, the attackers by another and yet they were all expected to play well together. Of course there are specialist coaches, but there is one Manager of a Soccer team who has overall accountability for tactics, selection and style of play (they also manage any specialist coaches). It is generally the Manager who lives or dies according to their team’s success. Equally, in the original right-hand organogram, if the TDJ is held by someone who understands just analytics or just data management, then it is like a Soccer Manager who only understands attack, but not defence.

The point I am trying to make is probably more readily apprehended via the following diagram:

On the assumption that the Manager on the right knows a lot about both attack and defence in Soccer, whereas the team owner is at best an interested amateur, then is the set up on the left or on the right likely to be a more formidable footballing force?

Even in American Football the analogy still holds. There are certainly offensive and defensive coaches, each of whom has “their” team on the park for a period. However, it is the Head Coach who calls the shots and this person needs to understand all of the nuances of the game.

In Closing

So, my recommendation is that – in data matters – you similarly have someone in the Top Data Job, with a broad knowledge of all aspects of data. They can be supported by specialists of course, but again someone needs to be accountable. To my mind, we already have a designation for such as person, a Chief Data Officer. However, to an extent this is semantics. A Chief Analytics Officer who is knowledgeable about Data Governance and Data Management could be the head data honcho ^[13], but one who only knows about analytics is likely to have their work cut out for them. Equally if CAO and CDO functions are wholly separate and only come together in an organisation under someone who has no background in data matters, then nothing but problems is going to arise.

The Top Data Job – or CDO in my parlance – has to be au fait with the span of data activities in an organisation and accountable for all work pertaining to data. If not then they will be as useful as a Soccer Manager who only knows about one aspect of the game and can only direct a handful of the 11 players on the field. Do organisations want some chance of winning the game, or to tie their hands behind their backs and don a blindfold before engaging in data activities? The choice should not really be a difficult one.

Notes

^[1]	∀ x : 65 ≤ ascii(x) ≤ 90.
^[2]	“C”, “A”, “O” + “C”, “D”, “O” + (for no real reason save expediency) “R” allows you to spell ACCORD, which scores 11 in Executive Scrabble.
^[3]	Three Letter Acronyms.
^[4]	Data Management, Analytics, People: An Eternal Golden Braid – A Metaphorical Fugue On The Data ⇒ Information ⇒ Insight ⇒ Action Journey In The Spirit Of Douglas R. Hofstadter – IRM(UK) Enterprise Data / Business Intelligence 2016
^[5]	I hasten to add that it also contains the phrase “Data Management” – see here.
^[6]	Probably not a great idea for any of those involved.
^[7]	Whether or not this evolution (or indeed regression) of the CIO role has proved to be a good thing is perhaps best handled in a separate article.
^[8]	Including: Wanted – Chief Data Officer 5 Themes from a Chief Data Officer Forum 5 More Themes from a Chief Data Officer Forum and The Chief Data Officer “Sweet Spot”
^[9]	PASS was co-founded by CA Technologies and Microsoft Corporation in 1999 to promote and educate SQL Server users around the world. Since its founding, PASS has expanded globally and diversified its membership to embrace professionals using any Microsoft data technology.
^[10]	With acknowledgement to Peter Aiken.
^[11]	A list of my articles that employ sporting analogies appears – appropriately enough – at the beginning of Analogies.
^[12]	That’s “offence vs defence” in case any readers were struggling.
^[13]	Maybe organisations should consider adding HDH to their already very crowded Executive alphabet soup.

Follow @peterjthomas

The Chief Data Officer “Sweet Spot”

22 Dec 20166 Feb 2017 Peter James Thomas business analytics, chief data officer, data governance, data management, data quality Cast of Serenity, Firefly, IRM UK

I verbally “scribbled” something quite like the exhibit above recently in conversation with a longstanding professional associate. This was while we were discussing where the CDO role currently sat in some organisations and his or her span of responsibilities. We agreed that – at least in some cases – the role was defined sub-optimally with reference to the axes in my virtual diagram.

This discussion reminded me that I was overdue a piece commenting on November’s IRM(UK) CDO Executive Forum; the third in a sequence that I have covered in these pages ^{[1], [2]}. In previous CDO Exec Forum articles, I have focussed mainly on the content of the day’s discussions. Here I’m going to be more general and bring in themes from the parent event; IRM(UK) Enterprise Data / Business Intelligence 2016. However I will later return to a theme central to the Exec Forum itself; the one that is captured in the graphic at the head of this article.

As well as attending the CDO Forum, I was speaking at the umbrella event. The title of my talk was Data Management, Analytics, People: An Eternal Golden Braid ^[3].

The real book, whose title I had plagiarised, is Gödel, Escher and Bach, an Eternal Golden braid, by Pulitzer-winning American Author and doyen of 1970s pop-science books, Douglas R. Hofstadter ^[4]. This book, which I read in my youth, explores concepts in consciousness, both organic and machine-based, and their relation to recursion and self-reference. The author argued that these themes were major elements of the work of each of Austrian Mathematician Kurt Gödel (best known for his two incompleteness theorems), Dutch graphic artist Maurits Cornelis Escher (whose almost plausible, but nevertheless impossible buildings and constantly metamorphosing shapes adorn both art galleries and college dorms alike) and German composer Johann Sebastian Bach (revered for both the beauty and mathematical elegance of his pieces, particularly those for keyboard instruments). In an age where Machine Learning and other Artificial Intelligence techniques are moving into the mainstream – or at least on to our Smartphones – I’d recommend this book to anyone who has not had the pleasure of reading it.

In my talk, I didn’t get into anything as metaphysical as Hofstadter’s essays that intertwine patterns in Mathematics, Art and Music, but maybe some of the spirit of his book rubbed off on my much lesser musings. In any case, I felt that my session was well-received and one particular piece of post-presentation validation had me feeling rather like these guys for the rest of the day:

What happened was that a longstanding internet contact ^[5] sought me out and commended me on both my talk and the prescience of my July 2009 article, Is the time ripe for appointing a Chief Business Intelligence Officer? He argued convincingly that this foreshadowed the emergence of the Chief Data Officer. While it is an inconvenient truth that Visa International had a CDO eight years earlier than my article appeared, on re-reading it, I was forced to acknowledge that there was some truth in his assertion.

To return to the matter in hand, one point that I made during my talk was that Analytics and Data Management are two sides of the same coin and that both benefit from being part of the same unitary management structure. By this I mean each area reporting into an Executive who has a strong grasp of what they do, rather than to a general manager. More specifically, I would see Data Compliance work and Data Synthesis work each being the responsibility of a CDO who has experience in both areas.

It may seem that crafting and implementing data policies is a million miles from data visualisation and machine learning, but to anyone with a background in the field, they are much more strongly related. Indeed, if managed well (which is often the main issue), they should be mutually reinforcing. Thus an insightful model can support business decision-making, but its authors would generally be well-advised to point out any areas in which their work could be improved by better data quality. Efforts to achieve the latter then both improve the usefulness of the model and help make the case for further work on data remediation; a virtuous circle.

Here we get back to the vertical axis in my initial diagram. In many organisations, the CDO can find him or herself at the extremities. Particularly in Financial Services, an industry which has been exposed to more new regulation than many in recent years, it is not unusual for CDOs to have a Risk or Compliance background. While this is very helpful in areas such as Governance, it is less of an asset when looking to leverage data to drive commercial advantage.

Symmetrically, if a rookie CDO was a Data Scientist who then progressed to running teams of Data Scientists, they will have a wealth of detailed knowledge to fall back on when looking to guide business decisions, but less familiarity with the – sometimes apparently thankless, and generally very arduous – task of sorting out problems in data landscapes.

Despite this, it is not uncommon to see CDOs who have a background in just one of these two complementary areas. If this is the case, then the analytics expert will have to learn bureaucratic and programme skills as quickly as they can and the governance guru will need to expand their horizons to understand the basics of statistical modelling and the presentation of information in easily digestible formats. It is probably fair to say that the journey to the centre is somewhat perilous when either extremity is the starting point.

Let’s now think about the second and horizontal axis. In some organisations, a newly appointed CDO will be freshly emerged from the ranks of IT (in some they may still report to the CIO, though this is becoming more of an anomaly with each passing year). As someone whose heritage is in IT (though also from very early on with a commercial dimension) I understand that there are benefits to such a career path, not least an in-depth understanding of at least some of the technologies employed, or that need to be employed. However a technology master who is also a business neophyte is unlikely to set the world alight as a newly-minted CDO. Such people will need to acquire new skills, but the learning curve is steep.

To consider the other extreme of this axis, it is undeniable that a CDO organisation will need to undertake both technical and technological work (or at least to guide this in other departments). Therefore, while an in-depth understanding of a business, its products, markets, customers and competitors will be of great advantage to a new CDO, without at least a reasonable degree of technical knowledge, they may struggle to connect with some members of their team; they may not be able to immediately grasp what technology tasks are essential and which are not; and they may not be able to paint an accurate picture of what good looks like in the data arena. Once more rapid assimilation of new information and equally rapid acquisition of new skills will be called for.

At this point it will be pretty obvious that my central point here is that the “sweet spot” for a CDO, the place where they can have greatest impact on an organisation and deliver the greatest value, is at the centre point of both of these axes. When I was talking to my friend about this, we agreed that one of the reasons why not many CDOs sit precisely at this nexus is because there are few people with equal (or at least balanced) expertise in the business and technology fields; few people who understand both data synthesis and data compliance equally well; and vanishingly few who sit in the centre of both of these ranges.

Perhaps these facts would also have been apparent from revewing the CDO job description I posted back in November 2015 as part of Wanted – Chief Data Officer. However, as always, a picture paints a thousand words and I rather like the compass-like exhibit I have come up with. Hopefully it conveys a similar message more rapidly and more viscerally.

To bring things back to the IRM(UK) CDO Executive Forum, I felt that issues around where delegates sat on my CDO “sweet spot” diagram (or more pertinently where they felt that they should sit) were a sub-text to many of our discussions. It is worth recalling that the mainstream CDO is still an emergent role and a degree of confusion around what they do, how they do it and where they sit in organisations is inevitable. All CxO roles (with the possible exception of the CEO) have gone through similar journeys. It is probably instructive to contrast the duties of a Chief Risk Officer before 2008 with the nature and scope of their responsibilities now. It is my opinion that the CDO role (and individual CDOs) will travel an analogous path and eventually also settle down to a generally accepted set of accountabilities.

In the meantime, if your organisation is lucky enough to have hired one of the small band of people whose experience and expertise already place them in the CDO “sweet spot”, then you are indeed fortunate. If not, then not all is lost, but be prepared for your new CDO to do a lot of learning on the job before they too can join the rather exclusive club of fully rounded CDOs.

Epilogue

As an erstwhile Mathematician, I’ve never seen a framework that I didn’t want to generalise. It occurs to me and – I assume – will also occur to many readers that the North / South and East / West diagram I have created could be made even more compass-like by the addition of North East / South West and North West / South East axes, with our idealised CDO sitting in the middle of these spectra as well ^[6].

Readers can debate amongst themselves what the extremities of these other dimensions might be. I’ll suggest just a couple: “Change” and “Business as Usual”. Given how organisations seem to have evolved in recent years, it is often unfortunately a case of never the twain shall meet with these two areas. However a good CDO will need to be adept at both and, from personal experience, I would argue that mastery of one does not exclude mastery of the other.

Notes

^[1]	See each of: 5 Themes from a Chief Data Officer Forum 5 More Themes from a Chief Data Officer Forum and Themes from a Chief Data Officer Forum – the 180 day perspective
^[2]	The main reasons for delay were a house move and a succession of illnesses in my family – me included – so I’m going to give myself a pass.
^[3]	The sub-title was A Metaphorical Fugue On The Data ⇨ Information ⇨ Insight ⇨ Action Journey in The Spirt Of Douglas R. Hofstadter, which points to the inspiration behind my talk rather more explicity.
^[4]	Douglas R. Hofstadter is the son of Nobel-wining physicist Robert Hofstadter. Prize-winning clearly runs in the Hofstadter family, much as with the Braggs, Bohrs, Curies, Euler-Chelpins, Kornbergs, Siegbahns, Tinbergens and Thomsons.
^[5]	I am omitting any names or other references to save his blushes.
^[6]	I could have gone for three or four dimensional Cartesian coordinates as well I realise, but sometimes (very rarely it has to be said) you can have too much Mathematics.

Follow @peterjthomas

Curiouser and Curiouser – The Limits of Brexit Voting Analysis

12 Jul 201614 Jan 2017 Peter James Thomas data quality, data science, infographics, Statistics alice in wonderland, ashcroft polling, omnium polling, through the looking glass and what alice found there

An original illustration from Charles Lutwidge Dodgson's seminal work would have been better, but sadly none such seems to be extant

Down the Rabbit-hole

When I posted my Brexit infographic reflecting the age of voters an obvious extension was to add an indication of the number of people in each age bracket who did not vote as well as those who did. This seemed a relatively straightforward task, but actually proved to be rather troublesome (this may be an example of British understatement). Maybe the caution I gave about statistical methods having a large impact on statistical outcomes in An Inconvenient Truth should have led me to expect such issues. In any case, I thought that it would be instructive to talk about the problems I stumbled across and to – once again – emphasise the perils of over-extending statistical models.

Brexit ages infographic — Click to download a larger PDF version in a new window.

Regular readers will recall that my Brexit Infographic (reproduced above) leveraged data from an earlier article, A Tale of two [Brexit] Data Visualisations. As cited in this article, the numbers used were from two sources:

The UK Electoral Commission – I got the overall voting numbers from here.
Lord Ashcroft’s Poling organisation – I got the estimated distribution of votes by age group from here.

In the notes section of A Tale of two [Brexit] Data Visualisations I [prophetically] stated that the breakdown of voting by age group was just an estimate. Based on what I have discovered since, I’m rather glad that I made this caveat explicit.

The Pool of Tears

In order to work out the number of people in each age bracket who did not vote, an obvious starting point would be the overall electorate, which the UK Electoral Commission stated as being 46,500,001. As we know that 33,551,983 people voted (an actual figure rather than an estimate), then this is where the turnout percentage of 72.2% (actually 72.1548%) came from (33,551,983 / 45,500,001).

A clarifying note, the electorate figures above refer to people who are eligible to vote. Specifically, in order to vote in the UK Referendum, people had to meet the following eligibility criteria (again drawn from the UK Electoral Commission):

To be eligible to vote in the EU Referendum, you must be:

A British or Irish citizen living in the UK, or

A Commonwealth citizen living in the UK who has leave to remain in the UK or who does not require leave to remain in the UK, or

A British citizen living overseas who has been registered to vote in the UK in the last 15 years, or

An Irish citizen living overseas who was born in Northern Ireland and who has been registered to vote in Northern Ireland in the last 15 years.

EU citizens are not eligible to vote in the EU Referendum unless they also meet the eligibility criteria above.

So far, so simple. The next thing I needed to know was how the electorate was split by age. This is where we begin to run into problems. One place to start is the actual population of the UK as at the last census (2011). This is as follows:

Ages (years)	Population	% of total
0–4	3,914,000	6.2
5–9	3,517,000	5.6
10–14	3,670,000	5.8
15–19	3,997,000	6.3
20–24	4,297,000	6.8
25–29	4,307,000	6.8
30–34	4,126,000	6.5
35–39	4,194,000	6.6
40–44	4,626,000	7.3
45–49	4,643,000	7.3
50–54	4,095,000	6.5
55–59	3,614,000	5.7
60–64	3,807,000	6.0
65–69	3,017,000	4.8
70–74	2,463,000	3.9
75–79	2,006,000	3.2
80–84	1,496,000	2.4
85–89	918,000	1.5
90+	476,000	0.8
Total	63,183,000	100.0

If I roll up the above figures to create the same age groups as in the Ashcroft analysis (something that requires splitting the 15-19 range, which I have assumed can be done uniformly), I get:

Ages (years)	Population	% of total
0-17	13,499,200	21.4
18-24	5,895,800	9.3
25-34	8,433,000	13.3
35-44	8,820,000	14.0
45-54	8,738,000	13.8
55-64	7,421,000	11.7
65+	10,376,000	16.4
Total	63,183,000	100.0

The UK Government isn’t interested in the views of people under 18^{[citation needed]}, so eliminating this row we get:

Ages (years)	Population	% of total
18-24	5,895,800	11.9
25-34	8,433,000	17.0
35-44	8,820,000	17.8
45-54	8,738,000	17.6
55-64	7,421,000	14.9
65+	10,376,000	20.9
Total	49,683,800	100.0

As mentioned, the above figures are from 2011 and the UK population has grown since then. Web-site WorldOMeters offers an extrapolated population of 65,124,383 for the UK in 2016 (this is as at 12th July 2016; if extrapolation and estimates make you queasy, I’d suggest closing this article now!). I’m going to use a rounder figure of 65,125,000 people; there is no point pretending that precision exists where it clearly doesn’t. Making the assumption that such growth is uniform across all age groups (please refer to my previous bracketed comment!), then the above exhibit can also be extrapolated to give us:

Ages (years)	Population	% of total
18-24	6,077,014	11.9
25-34	8,692,198	17.0
35-44	9,091,093	17.8
45-54	9,006,572	17.6
55-64	7,649,093	14.9
65+	10,694,918	20.9
Total	51,210,887	100.0

Looking Glass House

So our – somewhat fabricated – figure for the 18+ UK population in 2016 is 51,210,887, let’s just call this 51,200,000. As at the beginning of this article the electorate for the 2016 UK Referendum was 45,500,000 (dropping off the 1 person with apologies to him or her). The difference is explicable based on the eligibility criteria quoted above. I now have a rough age group break down of the 51.2 million population, how best to apply this to the 45.5 million electorate?

I’ll park this question for the moment and instead look to calculate a different figure. Based on the Ashcroft model, what percentage of the UK population (i.e. the 51.2 million) voted in each age group? We can work this one out without many complications as follows:

Ages (years)	Population (A)	Voted (B)	Turnout % (B/A)
18-24	6,077,014	1,701,067	28.0
25-34	8,692,198	4,319,136	49.7
35-44	9,091,093	5,656,658	62.2
45-54	9,006,572	6,535,678	72.6
55-64	7,649,093	7,251,916	94.8
65+	10,694,918	8,087,528	75.6
Total	51,210,887	33,551,983	65.5

(B) = Size of each age group in the Ashcroft sample as a percentage multiplied by the total number of people voting (see A Tale of two [Brexit] Data Visualisations).

Remember here that actual turnout figures have electorate as the denominator, not population. As the electorate is less than the population, this means that all of the turnout percentages should actually be higher than the ones calculated (e.g. the overall turnout with respect to electorate is 72.2% whereas my calculated turnout with respect to population is 65.5%). So given this, how to explain the 94.8% turnout of 55-64 year olds? To be sure this group does reliably turn out to vote, but did essentially all of them (remembering that the figures in the above table are too low) really vote in the referendum? This seems less than credible.

The turnout for 55-64 year olds in the 2015 General Election has been estimated at 77%, based on an overall turnout of 66.1% (web-site UK Political Info; once more these figures will have been created based on techniques similar to the ones I am using here). If we assume a uniform uplift across age ranges (that “assume” word again!) then one might deduce that an increase in overall turnout from 66.1% to 72.2%, might lead to the turnout in the 55-64 age bracket increasing from 77% to 84%. 84% turnout is still very high, but it is at least feasible; close to 100% turnout in from this age group seems beyond the realms of likelihood.

So what has gone wrong? Well so far the only culprit I can think of is the distribution of voting by age group in the Ashcroft poll. To be clear here, I’m not accusing Lord Ashcroft and his team of sloppy work. Instead I’m calling out that the way that I have extrapolated their figures may not be sustainable. Indeed, if my extrapolation is valid, this would imply that the Ashcroft model over estimated the proportion of 55-64 year olds voting. Thus it must have underestimated the proportion of voters in some other age group. Putting aside the likely fact that I have probably used their figures in an unintended manner, could it be that the much-maligned turnout of younger people has been misrepresented?

To test the validity of this hypothesis, I turned to a later poll by Omnium. To be sure this was based on a sample size of around 2,000 as opposed to Ashcroft’s 12,000, but it does paint a significantly different picture. Their distribution of voter turnout by age group was as follows:

Ages (years)	Turnout %
18-24	64
25-39	65
40-54	66
55-64	74
65+	90

I have to say that the Omnium age groups are a bit idiosyncratic, so I have taken advantage of the fact that the figures for 25-54 are essentially the same to create a schedule that matches the Ashcroft groups as follows:

Ages (years)	Turnout %
18-24	64
25-34	65
35-44	65
45-54	65
55-64	74
65+	90

The Omnium model suggests that younger voters may have turned out in greater numbers than might be thought based on the Ashcroft data. In turn this would suggest that a much greater percentage of 18-24 year olds turned out for the Referendum (64%) than for the last General Election (43%); contrast this with an estimated 18-24 turnout figure of 47% based on the just increase in turnout between the General Election and the Referendum. The Omnium estimates do still however recognise that turnout was still greater in the 55+ brackets, which supports the pattern seen in other elections.

Humpty Dumpty

While it may well be that the Leave / Remain splits based on the Ashcroft figures are reasonable, I’m less convinced that extrapolating these same figures to make claims about actual voting numbers by age group (as I have done) is tenable. Perhaps it would be better to view each age cohort as a mini sample to be treated independently. Based on the analysis above, I doubt that the turnout figures I have extrapolated from the Ashcroft breakdown by age group are robust. However, that is not the same as saying that the Ashcroft data is flawed, or that the Omnium figures are correct. Indeed the Omnium data (at least those elements published on their web-site) don’t include an analysis of whether the people in their sample voted Leave or Remain, so direct comparison is not going to be possible. Performing calculation gymnastics such as using the Omnium turnout for each age group in combination with the Ashcroft voting splits for Leave and Remain for the same age groups actually leads to a rather different Referendum result, so I’m not going to plunge further down this particular rabbit hole.

In summary, my supposedly simple trip to the destitution of an enhanced Brexit Infographic has proved unexpectedly arduous, winding and beset by troubles. These challenges have proved so great that I’ve abandoned the journey and will be instead heading for home.

Which dreamed it?

Based on my work so far, I have severe doubts about the accuracy of some of the age-based exhibits I have published (versions of which have also appeared on many web-sites, the BBC to offer just one example, scroll down to “How different age groups voted” and note that the percentages cited reconcile to mine). I believe that my logic and calculations are sound, but it seems that I am making too many assumptions about how I can leverage the Ashcroft data. After posting this article, I will accordingly go back and annotate each of my previous posts and link them to these later findings.

I think the broader lesson to be learnt is that estimates are just that, attempts (normally well-intentioned of course) to come up with figures where the actual numbers are not accessible. Sometimes this is a very useful – indeed indispensable – approach, sometimes it is less helpful. In either case estimation should always be approached with caution and the findings ideally sense-checked in the way that I have tried to do above.

Occam’s razor would suggest that when the stats tell you something that seems incredible, then 99 times out of 100 there is an error or inaccurate assumption buried somewhere in the model. This applies when you are creating the model yourself and doubly so where you are relying upon figures calculated by other people. In the latter case not only is there the risk of their figures being inaccurate, there is the incremental risk that you interpret them wrongly, or stretch their broader application to breaking point. I was probably guilty of one or more of the above sins in my earlier articles. I’d like my probable misstep to serve as a warning to other people when they too look to leverage statistics in new ways.

A further point is the most advanced concepts I have applied in my calculations above are addition, subtraction, multiplication and division. If these basic operations – even in the hands of someone like me who is relatively familiar with them – can lead to the issues described above, just imagine what could result from the more complex mathematical techniques (e.g. ambition, distraction, uglification and derision) used by even entry-level data scientists. This perhaps suggests an apt aphorism: Caveat calculator!

Beware the Jabberwock, my son! // The jaws that bite, the claws that catch! // Beware the Jubjub bird, and shun // The frumious Bandersnatch!

Follow @peterjthomas

Themes from a Chief Data Officer Forum – the 180 day perspective

20 May 20164 Feb 2017 Peter James Thomas big data, business, change management, chief data officer, cultural transformation, data governance, data management, education, strategy dama, IRM UK

The author would like to acknowledge the input and assistance of his fellow delegates, both initially at the IRM(UK) CDO Executive Forum itself and later in reviewing earlier drafts of this article. As ever, responsibility for any errors or omissions remains mine alone.

Introduction

Time flies as Virgil observed some 2,045 years ago. A rather shorter six months back I attended the inaugural IRM(UK) Chief Data Officer Executive Forum and recently I returned for the second of what looks like becoming biannual meetings. Last time the umbrella event was the IRM(UK) Enterprise Data and Business Intelligence Conference 2015 ^[1], this session was part of the companion conference: IRM(UK) Master Data Management Summit / and Data Governance Conference 2016.

This article looks to highlight some of the areas that were covered in the forum, but does not attempt to be exhaustive, instead offering an impressionistic view of the meeting. One reason for this (as well as the author’s temperament) is that – as previously – in order to allow free exchange of ideas, the details of the meeting are intended to stay within the confines of the room.

Last November, ten themes emerged from the discussions and I attempted to capture these over two articles. The headlines appear in the box below:

Themes from the previous Forum:

One area of interest for me was how things had moved on in the intervening months and I’ll look to comment on this later.

By way of background, some of the attendees were shared with the November 2015 meeting, but there was also a smattering of new faces, including the moderator, Peter Campbell, President of DAMA’s Belgium and Luxembourg chapter. Sectors represented included: Distribution, Extractives, Financial Services, and Governmental.

The discussions were wide ranging and perhaps less structured than in November’s meeting, maybe a facet of the familiarity established between some delegates at the previous session. However, there were four broad topics which the attendees spent time on: Management of Change (Theme 5); Data Privacy / Trust; Innovation; and Value / Business Outcomes.

While clearly the second item on this list has its genesis in the European Commission’s recently adopted General Data Protection Regulation (GDPR ^[2]), it is interesting to note that the other topics suggest that some elements of the CDO agenda appear to have shifted in the last six months. At the time of the last meeting, much of what the group talked about was foundational or even theoretical. This time round there was both more of a practical slant to the conversation, “how do we get things done?” and a focus on the future, “how do we innovate in this space?”

Perhaps this also reflects that while CDO 1.0s focussed on remedying issues with data landscapes and thus had a strong risk mitigation flavour to their work, CDO 2.0s are starting to look more at value-add and delivering insight (Theme 6). Of course some organisations are yet to embark on any sort of data-related journey (CDO 0.0 maybe), but in the more enlightened ones at least, the CDO’s focus is maybe changing, or has already changed (Theme 3).

Some flavour of the discussions around each of the above topics is provided below, but as mentioned above, these observations are both brief and impressionistic:

Management of Change

The title of Managing Change has been chosen (by the author) to avoid any connotations of Change Management. It was recognised by the group that there are two related issues here. The first is the organisational and behavioural change needed to both ensure that data is fit-for-purpose and that people embrace a more numerical approach to decision-making; perhaps this area is better described as Cultural Transformation. The second is the fact (also alluded to at the previous forum) that Change Programmes tend to have the effect of degrading data assets over time, especially where monetary or time factors lead data-centric aspects of project to be de-scoped.

On Cultural Transformation, amongst a number of issues discussed, the need to answer the question “What’s in it for me?” stood out. This encapsulates the human aspect of driving change, the need to engage with stakeholders ^[3] (at all levels) and the importance of sound communication of what is being done in the data space and – more importantly – why. These are questions to which an entire sub-section of this blog is devoted.

On the potentially deleterious impact of Change ^[4] on data landscapes, it was noted that whatever CDOs build, be these technological artefacts or data-centric processes, they must be designed to be resilient in the face of both change and Change.

Data Privacy / Trust

As referenced above, the genesis of this topic was GDPR. However, it was interesting that the debate extended from this admittedly important area into more positive territory. This related to the observation that the care with which an organisation treats its customers’ or business partners’ data (and the level of trust which this generates) can potentially become a differentiator or even a source of competitive advantage. It is good to report an essentially regulatory requirement possibly morphing into a more value-added set of activities.

Innovation

It might be expected that discussions around this topic would focus on perennials such as Big Data or Advanced Analytics. Instead the conversation was around other areas, such as distributed / virtualised data and the potential impact of Block Chain technology ^[5] on Data Management work. Inevitably The Internet of Things ^[6] also featured, together with the ethical issues that this can raise. Other areas discussed were as diverse as the gamification of Data Governance and Social Physics, so we cast the net widely.

Value / Business Outcomes

Here we have the strongest link back into the original ten themes (specifically Theme 6). Of course the acme of data strategies is of little use if it does not deliver positive business outcomes. In many organisations, focus on just remediating issues with the current data landscape could consume a massive chunk of overall Change / IT expenditure. This is because data issues generally emanate from a wide variety of often linked and frequently long-standing organisational weaknesses. These can be architectural, integrational, procedural, operational or educational in nature. One of the challenges for CDOs everywhere is how to parcel up their work in a way that adds value, gets things done and is accretive to both the overall Business and Data strategies (which are of course intimately linked as per Theme 10). There is also the need to balance foundational work with more tactical efforts; the former is necessary for lasting benefits to be secured, but the latter can showcase the value of Data Management and thus support further focus on the area.

While the risk aspect of data issues gets a foot in the door of the Executive Suite, it is only by demonstrating commercial awareness and linking Data Management work to increased business value that any CDO is ever going to get traction. (Theme 6).

The next IRM(UK) CDO Executive Forum will take place on 9th November 2016 in London – if you would like to apply for a place please e-mail jeremy.hall@irmuk.co.uk.

Notes

^[1]	I’ll be speaking at IRM(UK) ED&BI 2016 in November. Book early to avoid disappointment!
^[2]	Wikipedia offers a digestible summary of the regulation here. Anyone tempted to think this is either a parochial or arcane area is encouraged to calculate what the greater of €20 million and 4% of their organisation’s worldwide turnover might be and then to consider that the scope of the Regulation covers any company (regardless of its domicile) that processes the data of EU residents.
^[3]	I’ve been itching to use this classic example of stakeholder management for some time:
^[4]	The capital “c” is intentional.
^[5]	Harvard Business Review has an interesting and provocative article on the subject of Block Chain technology.
^[6]	GIYF

Follow @peterjthomas

Data Management as part of the Data to Action Journey

24 Dec 20156 Feb 2017 Peter James Thomas business intelligence, chief data officer, data governance, data management, data quality, infographics peter aiken

Data Information Insight Action (w700)

| Larger Version | Detailed and Annotated Version (as PDF) |

This brief article is actually the summation of considerable thought and reflects many elements that I covered in my last two pieces (5 Themes from a Chief Data Officer Forum and 5 More Themes from a Chief Data Officer Forum), in particular both the triangle I used as my previous Data Management visualisation and Peter Aiken’s original version, which he kindly allowed me to reproduce on this site (see here for more information about Peter).

What I began to think about was that both of these earlier exhibits (and indeed many that I have seen pertaining to Data Management and Data Governance) suggest that the discipline forms a solid foundation upon which other areas are built. While there is a lot of truth in this view, I have come round to thinking that Data Management may alternatively be thought of as actively taking part in a more dynamic process; specifically the same iterative journey from Data to Information to Insight to Action and back to Data again that I have referenced here several times before. I have looked to combine both the static, foundational elements of Data Management and the dynamic, process-centric ones in the diagram presented at the top of this article; a more detailed and annotated version of which is available to download as a PDF via the link above.

I have also introduced the alternative path from Data to Insight; the one that passes through Statistical Analysis. Data Management is equally critical to the success of this type of approach. I believe that the schematic suggests some of the fluidity that is a major part of effective Data Management in my experience. I also hope that the exhibit supports my assertion that Data Management is not an end in itself, but instead needs to be considered in terms of the outputs that it helps to generate. Pristine data is of little use to an organisation if it is not then exploited to form insights and drive actions. As ever, this need to drive action necessitates a focus on cultural transformation, an area that is covered in many other parts of this site.

This diagram also calls to mind the subject of where and how the roles of Chief Analytics Officer and Chief Data Officer intersect and whether indeed these should be separate roles at all. These are questions to which – as promised on several previous occasions – I will return to in future articles. For now, maybe my schematic can give some data and information practitioners a different way to view their craft and the contributions that it can make to organisational success.

Follow @peterjthomas

5 More Themes from a Chief Data Officer Forum

17 Nov 20154 Feb 2017 Peter James Thomas big data, business analytics, business intelligence, chief data officer, data governance, data management, data science big data, dama, IRM UK, peter aiken

A rather famous theme

This article is the second of two pieces reflecting on the emerging role of the Chief Data Officer. Each article covers 5 themes. You can read the first five themes here.

As with the first article, I would like to thank both Peter Aiken, who reviewed a first draft of this piece and provided useful clarifications and additional insights, and several of my fellow delegates, who also made helpful suggestions around the text. Again any errors of course remain my responsibility.

Introduction Redux

After reviewing a draft of the first article in this series and also scanning an outline of this piece, one of the other attendees at the inaugural IRM(UK) / DAMA CDO Executive Forum rightly highlighted that I had not really emphasised the strategic aspects of the CDO’s work; both data / information strategy and the close linkage to business strategy. I think the reason for this is that I spend so much of my time on strategic work that I’ve internalised the area. However, I’ve come to the not unreasonable conclusion that internalisation doesn’t work so well on a blog, so I will call out this area up-front (as well as touching on it again in Theme 10 below).

For more of my views on strategy formation in the data / information space please see my trilogy of articles starting with: Forming an Information Strategy: Part I – General Strategy.

With that said, I’ll pick up where we left off with the themes that arose in the meeting:

Theme 6 – While some CDO roles have their genesis in risk mitigation, most are focussed on growth

Epidermal growth factor receptor

This theme gets to the CDO / CAO debate (which I will be writing about soon). It is true that the often poor state of data governance in organisations is one reason why the CDO role has emerged and also that a lot of CDO focus is inevitably on this area. The regulatory hurdles faced by many industries (e.g. Solvency II in my current area of Insurance) also bring a significant focus on compliance to the CDO role. However, in the unanimous view of the delegates, while cleaning the Augean Stables is important and equally organisations which fail to comply with regulatory requirements tend to have poor prospects, most CDOs have a growth-focussed agenda. Their primary objective is to leverage data (or to facilitate its leverage) to drive growth and open up new opportunities. Of course good data management is a prerequisite for achieving this objective in a sustainable manner, but it is not an end in itself. Any CDO who allows themself to be overwhelmed by what should just be part of their role is probably heading in the same direction as a non-compliant company.

Theme 7 – New paradigms are data / analytics-centric not application-centric

Applications & Data

Historically, technology landscapes used to be application-centric. Often there would be a cluster of systems in the centre (ideally integrated with each other in some way) and each with their own analytics capabilities; a CRM system with customer analytics “out-of-the-box” (whatever that really means in practice), an ERP system with finance analytics and maybe supply-chain analytics, digital estates with web analytics and so on. Even if there was a single-central system (those of us old enough will still remember the ERP vision), then this would tend to have various analytical repositories around it used by different parts of the organisation for different purposes. Equally some of the enterprise data warehouses I have built have included specialist analytical repositories, e.g. to support pricing, or risk, or other areas.

Today a new paradigm is emerging. Under this, rather than being at the periphery, data and analytics are in the centre, operating in a more joined-up manner. Many companies have already banked the automation and standardisation benefits of technology and are now looking instead to exploit the (often considerably larger) information and insight benefits ^[1]. This places information and insight assets at the centre of the landscape. It also means that finally information needs can start to drive system design and selection, not the other way round.

Theme 8 – Data and Information need to be managed together

Data and Information in harness

We see a further parallel with the CAO vs CDO debate here ^[2]. After 27 years with at least one foot in IT (though often in hybrid roles with dual business / IT reporting) and 15 explicitly in the data and information space, I really fail to see how data and information are anything other than two sides of the same coin.

To people who say that the CAO is the one who really understands the business and the CDO worries instead about back-end data governance, I would reply that an engine is only as good as the fuel that you put into it. I’d over-extend the analogy (as is my wont ^[3]) by saying that the best engineers will have a thorough understanding of:

what purpose the engine will be applied to – racing car, or lorry (truck)
the parameters within which it is required to perform
the actual performance requirements
what that means in terms of designing the engine
what inputs the engine will have: petrol/diesel/bio-fuel/electricity
what outputs it will produce (with no reference to poor old Volkswagen intended)

It may be that the engineering team has experts in various areas from metallurgy, to electronics, to chemistry, to machining, to quality control, to noise and vibration suppression, to safety, to general materials science and that these are required to work together. But whoever is in charge of overall design, and indeed overall production, would need to have knowledge spanning all these areas and would in addition need to ensure that specialists under their supervision worked harmoniously together to get the best result.

Data is the basic building block of information. Information is the embodiment of things that people want or need to know. You cannot generate information (let alone insight) without a very strong understanding of data. You can neither govern, nor exploit, data in any useful way without knowledge of the uses to which it will be put. Like the chief product engineer, there is a need for someone who understands all of the elements, all of the experts working on these and can bring them together just as harmoniously ^[4]).

Theme 9 – Data Science is not enough

If you don't understand the notation, you've failed in your application to be a Data Scientist

In Part One of this article I repeated an assertion about the typical productivity of data scientists:

“Data Scientists are only 10-20% productive; if you start a week-long piece of work on Monday, the actual statistical analysis will commence on Friday afternoon; the rest of the time is battling with the data”

While the many data scientists I know would attest to the truth of this, there is a broader point to be made. That is the need for what can be described as Data Interpreters. This role is complementary to the data science community, acting as an interface between those with PhDs in statistics and the rest of the world. At IRM(UK) ED&BI one speaker even went so far as to present a photo graph of two ladies who filled these ying and yang roles at a European organisation.

More broadly, the advent of data science, while welcome, has not obviated the need to pass from data through information to get to insight for most of an organisation’s normal measurements. Of course an ability to go straight from data to insight is also a valuable tool, but it is not suitable for all situations. There are also a number of things to be aware of before uncritically placing full reliance on statistical models ^[5].

Theme 10 – Information is often a missing link between Business and IT strategies

Business => Information => IT

This was one of the most interesting topics of discussion at the forum and we devoted substantial time to exploring issues and opportunities in this area. The general sense was that – as all agreed – IT strategy needs to be aligned with business strategy ^[6]. However, there was also agreement that this can be hard and in many ways is getting harder. With IT leaders nowadays often consumed by the need to stay abreast of both technology opportunities (e.g. cloud computing) and technology threats (e.g. cyber crime) as well as inevitably having both extensive business as usual responsibilities and significant technology transformation programmes to run, it could be argued that some IT departments are drifting away from their business partners; not through any desire to do so, but just because of the nature (and volume) of current work. Equally with the increasing pace of business change, few non-IT executives can spend as much time understanding the role of technology as was once perhaps the case.

Given that successful information work must have a foot in both the business and technology camps (“what do we want to do with our data?” and “what data do we have available to work with?” being just two pertinent questions), the argument here was that an information strategy can help to build a bridge these two increasingly different worlds. Of course this chimes with the feedback on the primacy of strategy that I got on my earlier article from another delegate; and which I reference at the beginning of this piece. It also is consistent with my own view that the data → information → insight → action journey is becoming an increasingly business-focused one.

A couple of CDO Forum delegates had already been thinking about this area and went so far as to present models pertaining to a potential linkage, which they had either created or adapted from academic journals. These placed information between business and IT pillars not just with respect to strategy but also architecture and implementation. This is a very interesting area and one which I hope to return to in coming weeks.

Concluding thoughts

As I mentioned in Part One, the CDO Forum was an extremely useful and thought-provoking event. One thing which was of note is that – despite the delegates coming from many different backgrounds, something which one might assume would be a barrier to effective communication – they shared a common language, many values and comparable views on how to take the areas of data management and data exploitation forward. While of course delegates at an such an eponymous Forum might be expected to emphasise the importance of their position, it was illuminating to learn just how seriously a variety organisations were taking the CDO role and that CDOs were increasingly becoming agents of growth rather than just risk and compliance tsars.

Amongst the many other themes captured in this piece and its predecessor, perhaps a stand-out was how many organisations view the CDO as a firmly commercial / strategic role. This can only be a positive development and my hope is that CDOs can begin to help organisations to better understand the asset that their data represents and then start the process of leveraging this to unlock its substantial, but often latent, business value.

Notes

^[1]	See Measuring the benefits of Business Intelligence
^[2]	Someone really ought to write an article about that! UPDATE: They now have in: The Chief Data Officer “Sweet Spot” and Alphabet Soup
^[3]	See Analogies for some further examples as well as some of the pitfalls inherent in such an approach.
^[4]	I cover this duality in many places in this blog, for the reader who would like to learn more about my perspectives on the area, A bad workman blames his [Business Intelligence] tools is probably a good place to start; this links to various other resources on this site.
^[5]	I cover some of these here, including (in reverse chronological order): An Inconvenient Truth Patterns patterns everywhere – The Sequel – c/o xkcd.com Patterns patterns everywhere
^[6]	I tend to be allergic to the IT / Business schism as per: Business is from Mars and IT is from Venus (incidentally the first substantive article on I wrote for this site), but at least it serves some purpose in this discussion, rather than leading to unproductive “them and us” syndrome, that is sadly all to often the outcome.