The CDO – A Dilemma or The Next Big Thing?

Janus

It wasn’t so long ago that I last wrote about Forbes’s perspective on the data arena [1]. In this piece, I am going to compare and contrast two more recent Forbes articles. The first is 3 Reasons Why The Chief Data Officer Will Become The Next Big Thing by Lauren deLisa Coleman (@ultra_Lauren). The second is The Chief Data Officer Dilemma by Randy Bean (@RandyBeanNVP) [2].

While the contents of the two articles differ substantially – the first is positive about the future of the role, the second highlights some of its current challenges – there are interesting points made in each of them. In the midst of confusion about what a Chief Data Officer (CDO) is and what they do, it is perhaps not surprising that fundamentally different takes on the area can both contain seeds of truth.
 


 
Lauren deLisa Coleman

In the first piece, deLisa Coleman refers to the twin drivers of meeting increasingly stringent regulatory demands [3] and leveraging data to drive enhanced business outcomes, noting that:

Expertise and full dedication is needed particularly since data is threaded into nearly all facets of today’s businesses [4].

She states that appointing a CDO is the canonical response of Executive teams, while noting that there is not full consensus on all facets of this role. In covering the title’s “three reasons” why organisations need CDOs, deLisa Coleman references a survey by Infogix [5]. This highlights the increasing importance of each of the following areas: Metadata, Data Governance and the Internet of Things.

Expanding on these themes, deLisa Coleman adds:

Those who seize success within these new parameters will be companies that not only adapt most quickly but those that can also best leverage their company’s data in a strategic manner in innovative ways while continuing to gathering massive amounts under flawless methods of protection.

So far, so upbeat. To introduce a note of caution, I am aware that, in the last few years – and no doubt in part driven by articles in Forbes, Harvard Business Review and their ilk – most companies have set forth a vision for becoming a “data-driven organisation” [6]. However, the number that have actually achieved this objective – or even taken significant steps towards it – is of course much smaller. The central reason for this is that it is not easy to become a “data-driven organisation”. As with most difficult things, reaching this goal requires hard work, focus, perseverance and, it has to be said, innate aptitude. Some experience of what is involved is of course also invaluable and, even in 2018, this is a rare commodity.

A sub-issue within this over-arching problem is miracle-worker syndrome: “we’ll hire a great CDO and then we don’t need to worry about data any more” [7]. Of course becoming a “data-driven organisation” requires the whole organisation to change. A good CDO will articulate the need for change, generate enthusiasm for moving forward and coordinate the necessary metamorphosis. What they cannot do, however, is enact such a fundamental change without the active commitment of all tiers of the organisation.
 


 
Randy Bean

Of course this is where the second article becomes pertinent. Bean starts by noting the increasing prevalence of the CDO. He cites an annual study by his consultancy [8], which surveys Fortune 1000 companies. In 2012, this found that only 12% of the companies surveyed had appointed a CDO. By 2018, the figure had risen to over 63%, a notable trend [9].

However, he goes on to say that:

In spite of the common recognition of the need for a Chief Data Officer, there appears to be a profound lack of consensus on the nature of the role and responsibilities, mandate, and background that qualifies an executive to operate as a successful CDO. Further, because few organizations — 13.5% — have assigned revenue responsibility to their Chief Data Officers, for most firms the CDO role functions primarily as an influencer, not a revenue generator.

This divergence of opinion on CDO responsibilities, mandate, and importance of the role underscores why the Chief Data Officer may be the toughest job in the executive c-suite within many organizations, and why the position has become a hot seat with high turnover in a number of firms.

In my experience, while deLisa Coleman’s sunnier interpretation of the CDO environment both holds some truth and points to the future, Bean’s grittier perspective is closer to the conditions currently experienced by many CDOs. This is reinforced by a later passage:

While 39.4% of survey respondents identify the Chief Data Officer as the executive with primary responsibility for data strategy and results within their firm, a majority of survey respondents – 60.6% — identify other C-Executives as the point person, or claim no single point of accountability. This is remarkable and highly significant, for it highlights the challenges that CDO’s face within many organizations.

Bean explains that some of this is natural, making a similar point to the one I advance above: the journey towards being “data-driven” is not a simple one, and parts of organisations may not only be reluctant to take the trip but may even dissuade colleagues from doing so. Passive or active resistance is something that all major transformations need to deal with. He adds that lack of clarity about the CDO role, especially around the involved / accountable question as it relates to strategy, planning and execution, is a complicating factor.

Some particularly noteworthy points arose when the survey asked about the background and skills of a CDO. Findings included:

While 34% of executives believe the ideal CDO should be an external change agent (outsider) who brings fresh perspectives, an almost equivalent 32.1% of executives believe the ideal CDO should be an internal company veteran (insider) who understands the culture and history of the firm and knows how to get things done within that organization.

22.6% of executives […] indicated that the CDO must be either a data scientist or a technologist who is highly conversant with data. An additional 11.3% responded that a successful CDO must be a line-of-business executive who has been accountable for financial results.

The above may begin to sound somewhat familiar to some readers. It perhaps brings to mind the following figure [10]:

Expanded CDO Sweet Spot

As I pointed out last year in A truth universally acknowledged…, organisations sometimes take a kitchen sink approach to experience and expertise, demanding a lengthy list of requirements that will never be found in one person. From the above survey, it seems that this approach probably reflects the divergent thinking of different executives.

I endorse one of Bean’s final points:

The lack of consensus on the Chief Data Officer role aptly mirrors the diversity of opinion on the value and importance of data as an enterprise asset and how it should be managed.

Back in my more technologically flavoured youth, I used to say that organisations get the IT that they deserve. The survey findings suggest that the same aphorism can be applied to both CDOs and the data landscapes that they are meant to oversee.
 


 
So, two contrasting pieces from the same site. The first paints what I believe is an accurate picture of the importance of the CDO role in fulfilling corporate objectives. The second highlights some of the challenges the role faces in delivering on its promise. Each perspective is valid. I would recommend readers take a look at both articles and then blend some of the insights with their own opinions and ideas.
 


 
Acknowledgements

I would like to thank Lauren deLisa Coleman and Randy Bean for both reviewing this article and allowing me to quote their work. Their openness and helpfulness are very much appreciated.
 


 
Notes

 
[1]
 
Draining the Swamp.
 
[2]
 
Text is reproduced with the kind permission of the authors.

Forbes has a limited free access policy for non-subscribers, which means that the number of articles you can view is restricted.

 
[3]
 
To which I would add both customer and business partner expectations about how their data is treated and used by organisations.
 
[4]
 
Echoing points from my two 2015 articles: 5 Themes from a Chief Data Officer Forum and 5 More Themes from a Chief Data Officer Forum, specifically:

It’s gratifying to make predictions that end up coming to pass.

 
[5]
 
Infogix Identifies the Top Game Changing Data Trends for 2018.
 
[6]
 
It would be much easier to list those who do not share this aspiration.
 
[7]
 
Having been described as “the Messiah” in more than one organisation, I can empathise with the problems that this causes. Perhaps Moses – a normal man – leading his people out of the data desert is a more apt Biblical metaphor, should you care for such things.
 
[8]
 
New Vantage Partners.
 
[9]
 
These are clearly figures for US companies and it is generally acknowledged that the US approach to data is more mature than elsewhere. In Europe, it may be that GDPR (plus, in my native UK, the dark clouds of Brexit) has tipped the compliance / leverage balance too much towards data introspection and away from revenue-generating data insights.
 
[10]
 
The first version of this image appeared in 2016’s The Chief Data Officer “Sweet Spot”, with the latest version being published in 2017’s A Sweeter Spot for the CDO?.

 


 

Draining the Swamp

The title phrase of this article has entered the collective consciousness from political circles in recent months and years. Readers will be glad to hear that the political commentary content of this piece is precisely zero. Instead I am going to talk about Data Lakes, also referred to pejoratively as Data Swamps by those who are not fans.

Having started my relationship with Data matters back in the early days of Relational Databases and having driven corporate success through Data Warehouses and Business Intelligence, I have also done work in the Big Data arena since around 2013. A central concept in the Big Data paradigm is that of a Data Lake: a large Hadoop repository into which all data that an organisation might want to use is poured, often essentially as is. The thinking is that – in a Big Data implementation – storage is cheap [1] and you never fully know what data you might need in advance, so why not save it all?
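
To make the “pour it all in as is” pattern concrete, below is a minimal sketch of schema-on-read ingestion. It assumes a Spark-based lake; the bucket, paths and dataset names are invented for illustration and do not come from any of the articles discussed:

```python
# A minimal, illustrative "save it all" Data Lake pattern (assumes a
# Spark/Hadoop environment; all paths and names are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-ingest").getOrCreate()

# Ingest: land the source file untouched - no modelling, cleansing or
# schema is applied at write time.
raw = spark.read.text("s3://example-bucket/exports/crm_contacts.csv")
raw.write.mode("append").text("hdfs:///lake/raw/crm_contacts/")

# Consume: a schema is only imposed when somebody reads the data
# ("schema on read") - which is where many Data Lake programmes
# discover that the deferred modelling work has not gone away.
contacts = spark.read.option("header", "true").csv(
    "hdfs:///lake/raw/crm_contacts/"
)
contacts.show(5)
```

The deferral of modelling is the whole point of the pattern; as the rest of this piece argues, it is also the source of much of its peril.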

It is probably fair to say that – much like many other major programmes of work over the years [2] – the creation of Data Lakes, or perhaps more accurately the leverage of their contents, has produced at best mixed results for the organisations that undertake such an endeavour. The thing with mixed results is that it is not all doom and gloom: some people are successful, others are not. The important thing is to determine which factors lead to good outcomes and which to bad ones.

Well, first of all, I would suggest that – like any other data programme – the formation of a Data Lake is subject to the types of potential issues that I review in my 2017 article, 20 Risks that Beset Data Programmes. Of these, Data Lakes are particularly susceptible to risk 16:

In the absence of [understanding key business decisions], the programme becoming a technology-driven one.

The business gets what IT or Change think that they need, not what is actually needed. There is more focus on shiny toys than on actionable information. The programme forgets the needs of its customers.

The issue here is that some people buy into the misconception that all you have to do is fill the Data Lake and sit back and wait for precious Data gems to flow from it. Understanding a business and its key decisions is tough and perhaps it is not surprising that people would like to skip this step and instead focus on easier activities. Sadly, this approach is not going to work for Data Lakes or anything else.
 


 
Dan Woods

However, Data Lakes also face some specific risks and, in search of a better understanding of these, I turned to a recent Forbes article, Can Failed Data Lakes Succeed As Data Marketplaces?, penned by Dan Woods (@danwoodsearly) [3]. Dan does not mince words in his introduction:

All over the world, data lake projects are foundering, not because they are not a step in the right direction, but because they are essentially uncompleted experiments.

He adds:

The main roadblock has been that once companies store their data in the data lake, they struggle to find a way to operationalize it. The data lake has never become a product like a data warehouse. Proof of concepts are tweaked to keep a desultory flow of signals going.

Finally, he states:

[…] for certain use cases, Hadoop and purpose-built data lake-like infrastructure are solving complex and high-value problems. But in most other businesses, the data lake got stuck at the proof of concept stage.

This chimes with my experience – the ability to synthesise and analyse vast troves of data is indispensable in addressing some business problems, but a sledge-hammer to crack a walnut for others. Data Lakes are no more universal panaceas than anything else we have invented to date. As always, the main issues are not technology, but good processes, consistent definitions, improved data quality and matching available data to real business questions.
 


 
Paul Barth

In seeking salvation (Dan’s word) for Data Lakes, he sought the opinion of one of my LinkedIn contacts, Paul Barth (@BarthPS), CEO of Podium Data. Paul analyses the root causes of Data Lake issues, splitting these into three main ones [4]:

  1. Polluted data lakes

    Too many projects targeted at filling or exploiting the Data Lake kick off in parallel. This leads to an incoherent landscape and inaccessible, difficult-to-understand data.
     

  2. Bottlenecked data lakes

    Essentially treating the Data Lake as if it were a Data Warehouse, when the technology is designed for different and less structured purposes. This leads to a quasi-warehouse that is less performant than actual warehouses.
     

  3. Risky data lakes

    Where there is a desire to quickly populate the Data Lake, not least to provide grist to the Data Science mill, appropriate controls on access to data can be neglected; this is particularly an issue where personally identifiable data is involved. It can lead to regulatory, legal and reputational peril.

Barth’s solution to these problems is the establishment of a Data Marketplace. This is a concept previously referenced on these pages in Predictions about Prediction, a review of consultancy Eckerson Group’s views on Data and Analytics in 2017 [5]. Back then, Eckerson Group had the following to say about the area:

[An Enterprise Data Marketplace (EDM) is] an Amazon-like data marketplace where analysts can seek datasets, see reviews of others, and select the best-fit datasets for their needs helps to encourage dataset reuse, minimize redundancy, and prevent flawed analysis that results from working with less than ideal data. Data cataloging tools, data curation practices, data preparation technologies, and data services will be combined to create a marketplace for data seekers. Enterprise Data Marketplaces return us to the single-source vision that was once touted as the real benefit of Enterprise Data Warehouses.

Enterprise Data Marketplace

So, as illustrated above, a Data Marketplace is essentially a collection of tagged data sets, which have in some cases been treated to increase consistency and utility, combined with information about their contents and usages. These are overlaid by what is essentially a “social media” layer where “shoppers” can search for data and provide feedback on its utility (e.g. a rating mechanism) and also add their own documentation. This means that useful data sets get highly rated and have more explanatory material attached to them.
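
As a toy illustration of this structure – the class, field names and rating scheme below are my own invention, not Podium Data’s or Eckerson Group’s – a marketplace listing and its “social media” layer might be modelled along these lines:

```python
# A toy model of a Data Marketplace: tagged data sets overlaid with a
# "social media" layer of shopper ratings and notes. Entirely
# illustrative - no vendor's actual schema is implied.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class DataSetListing:
    name: str
    description: str
    tags: set[str]
    ratings: list[int] = field(default_factory=list)        # 1-5 "stars"
    shopper_notes: list[str] = field(default_factory=list)  # crowd-added docs

    def average_rating(self) -> float:
        return mean(self.ratings) if self.ratings else 0.0

def search(catalog: list[DataSetListing], tag: str) -> list[DataSetListing]:
    """Return listings carrying the tag, best-rated first."""
    hits = [d for d in catalog if tag in d.tags]
    return sorted(hits, key=lambda d: d.average_rating(), reverse=True)
```

With something like this in place, well-used and well-documented data sets naturally float to the top of search results, which is precisely the virtuous cycle that Dave Wells describes below.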
 


 
Dave Wells

Eckerson Group build on this concept in their white paper The Rise of the Data Marketplace (opens a PDF document), work commissioned in part by Podium Data. In this, Eckerson’s Dave Wells (@_DaveWells_) characterises an Enterprise Data Marketplace as having the following attributes [6]:

  • Categorization organises the marketplace to simplify browsing. For example a shopper seeking budget data doesn’t need to browse through unrelated data sets about customers, employees or other data subjects. Categories complement tagging and smart search algorithms, offering a variety of ways to find data sets.
     
  • Curation is active management of the data sets that are available in the EDM. Curation selects and qualifies data sets, describes each data set, and collects and manages metadata about the collection and each individual data set.
     
  • Cataloging exposes data sets for data shoppers, including descriptions and metadata. The catalog is a view into the inventory of curated data sets. Rich metadata and powerful search are important catalog features.
     
  • Crowdsourcing is the equivalent of a social network for data. Data shoppers actively participate in cataloging, curating and categorizing data. This virtuous cycle (a chain of events that reinforces outcomes through a feedback loop) continuously improves the quality and value of data in the marketplace.

Back in the Forbes article, Barth focuses on using the Data Marketplace’s interactive elements to identify the most valuable data (that which is searched for most frequently and has the best shopper rating). This data can then be the subject of focussed investment. Such investment is of the sort familiar in Data Warehouse activities, but it is directed by shoppers’ “social media” preferences rather than more formal requirements gathering exercises.
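
By way of illustration only – the weighting below is my own invention, not Barth’s – combining search frequency with shopper ratings to prioritise such investment might look like this:

```python
# Continuing the toy marketplace model above: rank data sets for
# investment by combining search frequency with average shopper
# rating. The scoring function is purely illustrative.
def investment_priority(search_count: int, average_rating: float) -> float:
    return search_count * average_rating

# Hypothetical figures, e.g. harvested from the marketplace's search logs.
searches = {"budget_2018": 120, "crm_contacts": 35}
ratings = {"budget_2018": 4.5, "crm_contacts": 2.0}

ranked = sorted(searches, reverse=True,
                key=lambda n: investment_priority(searches[n], ratings[n]))
print(ranked)  # ['budget_2018', 'crm_contacts'] - invest here first
```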
 


 
Dan Woods makes the pertinent observation that:

So, as the challenge now is not one of technology, but of setting a vision, companies have to decide how to incorporate a new set of requirements to get the most out of their data. […] Even within one company, there may be the need for multiple requirements to be met. Marketing may not need the precision that the accounting department requires. Groups with regulatory mandates may have strong compliance requirements that drive the need for data that is 100% accurate, while those doing exploration for product development purposes may prefer to have larger datasets to work with, and 90% accuracy is all that they require. The data lake must be able to employ multiple approaches as needed by different applications and groups of users.

His article finishes with the following clarion call to implement the Data Marketplace vision:

Companies achieve data transparency with data warehouses because of the use of canonical data models. Yet data in data warehouses was trapped in slow processes that lacked agility. The data warehouse data was well understood but couldn’t evolve at the speed of business. The data lake wasn’t able to correct this problem because companies didn’t implement lakes with a sufficiently comprehensive vision. That’s what they need to do now.


 
"Grimpen Mire"

When I hear about Data Warehouses that take months to change, poor design and a lack of automation both come to mind; nevertheless, it is unarguable that some Data Warehouses can be plagued by long turn-around times [7]. Equally, I have seen enough Data Lakes turn into Grimpen Mire to recognise that there are some major issues inherent in an unmodified approach to this area [8]. The Data Marketplace idea is an intriguing one, a mash-up [9] of different approaches that may just yield some tangible results.

I also think that the inherent focus on users’ needs as opposed to technological considerations is the right way to go. I have been making this point for many years now [10] and have full confidence that I will still be doing so in ten years’ time. As with most aspects of life, it is with people, and how a programme interacts with them, that success and failure factors are most readily found. It seems to me that the Data Marketplace approach seeks to embrace this verity, which can only be a point in its favour.
 


 
Acknowledgements

I would like to thank each of Forbes / Dan Woods, Podium Data / Paul Barth and Eckerson Group / Dave Wells for both reviewing this article and allowing me to quote their work. Such generous behaviour is not as typical as one might like to think and always merits recognition.
 


 
Notes

 
[1]
 
Though the total cost of saving such data extends beyond just disk costs and can become significant.
 
[2]
 
See my earlier article Ever tried? Ever failed? for a treatment of what is clearly a fundamental physical constant – that 60–70% of all types of major programmes don’t fully achieve their objectives (aka fail). Data Lakes appear to also be governed by this Law of Nature.
 
[3]
 
You may need to navigate past a Forbes banner screen before you can access the actual article.
 
[4]
 
The following is my take on Paul’s analysis; for his actual words, see the Forbes article.
 
[5]
 
Watch this space for a review of Eckerson Group’s predictions for 2018.
 
[6]
 
Which I reproduce with permission.
 
[7]
 
By way of contrast, warehouses that my teams have built have been able to digest acquisitions and meet new and onerous regulatory requirements in a matter of weeks, not months.
 
[8]
 
I should stress here a difference between Data Lakes, which seek to be all-embracing, and more focussed Big Data activities, e.g. the building of complex seismological or meteorological models to assess catastrophic insurance risk (see Hurricanes and Data Visualisation: Part II – Map Reading). I have helped the latter to be very successful myself and seen good results in other organisations.
 
[9]
 
Do people still say “mash-up”?
 
[10]
 
For example in my 2008 trilogy:

  1. Marketing Change
  2. Education and cultural transformation
  3. Sustaining Cultural Change

 


 

25 Indispensable Business Terms

Business School Reunion - BTVS-style. Back row: Tara (Amber Benson), Willow (Alyson Hannigan), Xander (Nicholas Brendon), Anya (Emma Caulfield), Wesley (Alexis Denisof), Cordelia (Charisma Carpenter), Oz (Seth Green), Angel (David Boreanaz), Spike (James Marsters), Dawn (Michelle Trachtenberg) & Joyce (Kristine Sutherland). Front row: Creator/Writer/Director (Joss Whedon) & Buffy (Sarah Michelle Gellar). Absent: Giles (Anthony Stewart Head)

The first episode of Buffy the Vampire Slayer aired on 10th March 1997. To commemorate its 20th anniversary – and of course to celebrate 1st April 2017 – peterjamesthomas.com is pleased to present this comprehensive – and wholly indispensable – illustrated list of business terminology, Slayer-style:
 
 
1. Stakeholder [1]

Stakeholder

“The Freshman” – Season 4, Episode 1.

2. Stakeholder Management [2]

Stakeholder Management

Promotional shot.

3. Stakeholder Engagement

Stakeholder Engagement

“Something Blue” – Season 4, Episode 9.

4. Cross-functional

Cross functional

“Consequences” – Season 3, Episode 15.

5. Cross-functional Team

Cross-functional Team

“The Wish” – Season 3, Episode 9.

6. Institutionalise

Institutionalise

“Normal Again” – Season 6, Episode 17.

7. Presentation Deck

Presentation Deck

“First Date” – Season 7, Episode 14.

8. Playing Hardball

Playing Hardball

“The Gift” – Season 5, Episode 22.

9. Key Player

Key Player

“Buffy vs Dracula” – Season 5, Episode 1.

10. Platform

Platform

“The Gift” – Season 5, Episode 22.

11. Think Outside the Box

Think outside the box

“Bargaining, Part 1” – Season 6, Episode 1.

12. Vision Statement

Vision statement

“Restless” – Season 4, Episode 22.

13. Machine Learning

Machine Learning

“Intervention” – Season 5, Episode 18.

14. Doing the Heavy Lifting

Doing the Heavy Lifting

“The Gift” – Season 5, Episode 22.

15. Town Hall Meeting

Town Hall Meeting

“Band Candy” – Season 3, Episode 6.

16. Empower

Empower

“Chosen” – Season 7, Episode 22.

17. Drinking the Kool Aid

Drinking the Kool Aid

“Destiny” – Angel, Season 5, Episode 8 [3].

18. Bleeding Edge

Bleeding Edge

“This Year’s Girl” – Season 4, Episode 15.

19. Best Practice

Best Practice

“Once More with Feeling” – Season 6, Episode 7.

20. Animal Spirits

Animal Spirits

“The Pack” – Season 1, Episode 6.

21. Change Management

Change Management

“Checkpoint” – Season 5, Episode 12.

22. Face Time

Face Time

“Who Are You?” – Season 4, Episode 16.

23. Scalable

Scalable

“The Gift” – Season 5, Episode 22.

24. Take Offline

Take offline

“I, Robot… You, Jane” – Season 1, Episode 8.

25. Wow Factor

Wow Factor

“Tabula Rasa” – Season 6, Episode 8.

 
With inspiration drawn from Business Buzzword Bingo and Forbes Most Annoying Business Jargon, as well as the author’s own experience. With love (and of course enormous apologies) to everyone who worked at, or for, Mutant Enemy.

Except for the banner photo, which is © Entertainment Weekly / Time Inc., all images are most likely © Warner Brothers Entertainment, but sourced from all over.
 


 
Notes

 
[1]
 
I must admit that it was the word “stakeholder” which first planted the seed that grew into this article. Whenever I hear the rather vapid term in a business context, an image of Sarah Michelle Gellar wafts unbidden into my consciousness, which is no doubt what the person using the term intended all along, of course.
 
[2]
 
This image already crept into the notes section of Themes from a Chief Data Officer Forum – the 180 day perspective.
 
[3]
 
OK, I may have cheated here, though it could be argued that Angel also “started” on 10th March 1997 rather than 5th October 1999. I’d welcome any suggestions for a more BTVS-themed image.

 

 

Ed Sperling highlights the importance of the CIO understanding the business


I was interested to read an article by Ed Sperling at Forbes.com. In this, Ed states that:

In order to understand the flow of information, CIOs need to be intimately familiar with the direction of the business. This way, they can automate pieces of that business where it will do the most good. That can’t be done without a good understanding of how information moves through an organization, and the movement of information can’t be fully understood without understanding the business units.

It will come as no surprise to anyone who has read my earlier article about spurious distinctions between business and IT (Business is from Mars and IT is from Venus) that I strongly endorse this sentiment. Maybe the fact that mainstream commentators are talking about IT in business terms is indicative of IT beginning to come of age.
 


 
Ed Sperling is editor-in-chief of System-Level Design and a contributing writer at Forbes.com.