An expanded and more mobile-friendly version of the Data & Analytics Dictionary

4 August 20204 August 2020 Peter James Thomas big data, business intelligence, chief data officer, data governance, data quality, data science, data visualisation, data warehousing, machine learning, management information

A revised and expanded version of the peterjamesthomas.com Data and Analytics Dictionary has been published.

The previous Dictionary was not the easiest to read on mobile devices. Because of this, the layout has been amended in this release and the mobile experience should now be greatly enhanced. Any feedback on usability would be welcome.

The new Dictionary includes 22 additional definitions, bringing the total number of entries to 220, totalling well over twenty thousand words. As usual, the new definitions range across the data arena: from Data Science and Machine Learning; to Information and Reporting; to Data Governance and Controls. They are as follows:

Please remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form, we look forward to hearing from you ^[1].

If you have found The Data & Analytics Dictionary helpful, we would love to learn more about this. Please post something in the comments section or contact us and we may even look to feature you in a future article.

The Data & Analytics Dictionary will continue to be expanded in coming months.

Notes

^[1]	Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

Follow @peterjthomas

The latest edition of The Data & Analytics Dictionary is now out

2 August 20192 August 2019 Peter James Thomas big data, business analytics, business intelligence, chief data officer, dashboards, data architecture, data governance, data management, data quality, data science, data visualisation, data warehousing, infographics, machine learning, management information

After a hiatus of a few months, the latest version of the peterjamesthomas.com Data and Analytics Dictionary is now available. It includes 30 new definitions, some of which have been contributed by people like Tenny Thomas Soman, George Firican, Scott Taylor and and Taru Väre. Thanks to all of these for their help.

Analysis
Application Programming Interface (API)
Business Glossary (contributor: Tenny Thomas Soman)
Chart (Graph)
Data Architecture – Definition (2)
Data Catalogue
Data Community
Data Domain (contributor: Taru Väre)
Data Enrichment
Data Federation
Data Function
Data Model
Data Operating Model
Data Scrubbing
Data Service
Data Sourcing
Decision Model
Embedded BI / Analytics
Genetic Algorithm
Geospatial Data
Infographic
Insight
Management Information (MI)
Master Data – additional definition (contributor: Scott Taylor)
Optimisation
Reference Data (contributor: George Firican)
Report
Robotic Process Automation
Statistics
Self-service (BI or Analytics)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

If you would like to contribute a definition, which will of course be acknowledged, you can use the comments section here, or the dedicated form, we look forward to hearing from you ^[1].

The Data & Analytics Dictionary will continue to be expanded in coming months.

Notes

^[1]	Please note that any submissions will be subject to editorial review and are not guaranteed to be accepted.

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

Follow @peterjthomas

A Retrospective of 2018’s Articles

10 April 20192 August 2019 Peter James Thomas big data, business analytics, change management, chief data officer, cultural transformation, data architecture, data science, data visualisation, in-depth, Mathematics, Physics, project management, Pure Mathematics

This is the second year in which I have produced a retrospective of my blogging activity. As in 2017, I have failed miserably in my original objective of posting this early in January. Despite starting to write this piece on 18^th December 2018, I have somehow sneaked into the second quarter before getting round to completing it. Maybe I will do better with 2019’s highlights!

Anyway, 2018 was a record-breaking year for peterjamesthomas.com. The site saw more traffic than in any other year since its inception; indeed hits were over a third higher than in any previous year. This increase was driven in part by the launch of my new Maths & Science section, articles from which claimed no fewer than 6 slots in the 2018 top 10 articles, when measured by hits ^[1]. Overall the total number of articles and new pages I published exceeded 2017’s figures to claim the second spot behind 2009; our first year in business.

As with every year, some of my work was viewed by tens of thousands of people, while other pieces received less attention. This is my selection of the articles that I enjoyed writing most, which does not always overlap with the most popular ones. Given the advent of the Maths & Science section, there are now seven categories into which I have split articles. These are as follows:

General Data Articles
Data Visualisation
Statistics & Data Science
CDO perspectives
Programme Advice
Analytics & Big Data
Maths & Science

In each category, I will pick out one or two pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.


General Data Articles

	February A Brief History of Databases
An infographic spanning the history of Database technology from its early days in the 1960s to the landscape in the late 2010s..
	July How to Spot a Flawed Data Strategy
What alarm bells might alert you to problems with your Data Strategy; based on the author’s extensive experience of both developing Data Strategies and vetting existing ones.
	August Fact-based Decision-making
Fact-based decision-making sounds like a no brainer, but just how hard is it to generate accurate facts?

Data Visualisation

	August As Nice as Pie
A review of the humble Pie Chart, what it is good at, where it presents problems and some alternatives.

Statistics & Data Science

	August Data Science Challenges – It’s Deja Vu all over again!
A survey of more than 10,000 Data Scientists highlights a set of problems that will seem very, very familiar to anyone working in the data space for a few years.

CDO Perspectives

	February The CDO – A Dilemma or The Next Big Thing?
Two Forbes articles argue different perspectives about the role of Chief Data Officer. The first (by Lauren deLisa Coleman) stresses its importance, the second (by Randy Bean) highlights some of the challenges that CDOs face.
	May onwards The “In-depth” series of CDO interviews
Rather than a single article, this is a series of four talks with prominent CDOs, reflecting on the role and its challenges.
	October The Chief Marketing Officer and the CDO – A Modern Fable
Discussing an alt-facts / “fake” news perspective on the Chief Data Officer role.

Programme Advice

	June Building Momentum – How to begin becoming a Data-driven Organisation
Many companies want to become data driven, but getting started on the journey towards this goal can be tough. This article offers a framework for building momentum in the early stages of a Data Programme.

Analytics & Big Data

	January Draining the Swamp
A review of some of the problems that can beset Data Lakes, together with some ideas about what to do to fix these from Dan Woods (Forbes), Paul Barth (Podium Data) and Dave Wells (Eckerson Group).
	February Sic Transit Gloria Magnorum Datorum
In a world where the word has developed a very negative connotation, what’s so bad about being traditional?
	August Convergent Evolution
What the similarities (and differences) between Ichthyosaurs and Dolphins can tell us about different types of Data Architectures.

Maths & Science

	March Euler’s Number
A long and winding road with the destination being what is probably the most important number in Mathematics.
	August The Irrational Ratio
The number π is surrounded by a fog of misunderstanding and even mysticism. This article seeks to address some common misconceptions about π, to show that in many ways it is just like any other number, but also to demonstrate some of its less common properties.
	October Glimpses of Symmetry, Chapter 24 – Emmy
One of the more recent chapters in my forthcoming book on Group Theory and Particle Physics. This focuses on the seminal contributions of Mathematician Emmy Noether to the fundamentals of Physics and the connection between Symmetry and Conservation Laws.

Notes

^[1]

The 2018 Top Ten by Hits
1.	The Irrational Ratio
2.	A Brief History of Databases
3.	Euler’s Number
4.	The Data and Analytics Dictionary
5.	The Equation
6.	A Brief Taxonomy of Numbers
7.	When I’m 65
8.	How to Spot a Flawed Data Strategy
9.	Building Momentum – How to begin becoming a Data-driven Organisation
10.	The Anatomy of a Data Function – Part I

Another article from peterjamesthomas.com. The home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases.

Follow @peterjthomas

More Definitions in the Data and Analytics Dictionary

19 September 2018 Peter James Thomas big data, business analytics, business intelligence, chief data officer, dashboards, data governance, data management, data quality, data science, data visualisation, data warehousing, Statistics

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. Here are 20 new definitions, including the first from other contributors (thanks Tenny!):

Artificial Intelligence Platform
Data Asset
Data Audit
Data Classification
Data Consistency
Data Controls
Data Curation (contributor: Tenny Thomas Soman)
Data Democratisation
Data Dictionary
Data Engineering
Data Ethics
Data Integrity
Data Lineage
Data Platform
Data Strategy
Data Wrangling (contributor: Tenny Thomas Soman)
Explainable AI (contributor: Tenny Thomas Soman)
Information Governance
Referential Integrity
Testing Data (Training Data)

Remember that The Dictionary is a free resource and quoting contents (ideally with acknowledgement) and linking to its entries (via the buttons provided) are both encouraged.

People are now also welcome to contribute their own definitions. You can use the comments section here, or the dedicated form. Submissions will be subject to editorial review and are not guaranteed to be accepted.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary, The Anatomy of a Data Function and A Brief History of Databases

Follow @peterjthomas

Sic Transit Gloria Magnorum Datorum

23 February 2018 Peter James Thomas big data, business analytics, business intelligence

It happens to all of us eventually I suppose.

Just the other day, I heard someone referring to “traditional Big Data”. Since when did Big Data become “traditional”, I didn’t get the e-mail? Of course, in the technology field, the epithet “traditional” is code for “broken”, “no longer of any use” and – most damningly of all – “deeply uncool”. The term is widely used, whether – with this connotation – it is either helpful or accurate is perhaps a matter for debate. This usage makes me recall the rather silly debate about Analytics versus “traditional” Business Intelligence that occurred around 2009 ^[1].

By way of context, the person talking about “traditional Big Data” was referring to the difference between some of the original denizens of the Hadoop ecosystem and more recent offerings like Databricks or Beam. They also had in mind the various quasi-proprietary flavours of Big Data and/or Big Data plug-ins offered by (that word again) “traditional” vendors. In this sense, the usage is probably appropriate, albeit somewhat jarring. In the more pejorative sense I refer to above, “traditional” is somewhat misleading when applied to either Big Data or – in the author’s opinion – several of its precursors.

While we inhabit a world which places a premium on innovation, favouring the new and the shiny ^[2], traditional methods have much to offer. If something – a technique or technology – has achieved “traditional” status, it means that it has become part of how things are done. While shaking up the status quo can be beneficial, “traditional” approaches have the not insignificant benefit of having been tried and tested. “Traditional” data tools are ones that have survived some time and are still used. While not guaranteeing success, it should at least be possible to be successful with such tools because other people have done this before.

Maybe, several years after its move into the mainstream, Big Data has become “traditional”. However I would take this as meaning “fit for purpose”, “useful” and “still pretty cool”. Then I think the same about many of the technologies that were described as “traditional” in contrast to Big Data. As ever, the main things that lead to either success or failure in data-centric work ^[3] have very little to do with technology, be that traditional or à la mode.

Notes

^[1]	If you have the stomach for it, see Business Analytics vs Business Intelligence and succeeding articles.
^[2]	See also 2009’s The latest and greatest versus the valuable.
^[3]	I itemise a few of these in last year’s 20 Risks that Beset Data Programmes.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

Follow @peterjthomas

A Brief History of Databases

6 February 201826 April 2018 Peter James Thomas big data, infographics, technology database, neil raden

Larger PDF version (opens in a new tab)

The pace of change in the field of database technology seems to be constantly accelerating. No doubt in five year’s time ^[1], Big Data and the Hadoop suite ^[2] will seem to be as old-fashioned as earlier technologies can appear to some people nowadays. Today there is a great variety of database technologies that are in use in different organisations for different purposes. There are also a lot of vendors, some of whom have more than one type of database product. I think that it is worthwhile considering both the genesis of databases and some of the major developments that have occurred between then and now.

The infographic appearing at the start of this article seeks to provide just such a perspective. It presents an abridged and simplified perspective on the history of databases from the 1960s to the late 2010s. It is hard to make out the text in the above diagram, so I would recommend that readers click on the link provided in order to view a much larger version with bigger and more legible text.

The infographic references a number of terms. Below I provide links to definitions of several of these, which are taken from The Data and Analytics Dictionary. The list progresses from the top of the diagram downwards, but starts with a definition of “database” itself:

To my mind, it is interesting to see just how long we have been grappling with the best way to set up databases. Also of note is that some of the Big Data technologies are actually relatively venerable, dating to the mid-to-late 2000s (some elements are even older, consisting of techniques for handling flat files on UNIX or Mainframe computers back in the day).

I hope that both the infographic and the definitions provided above contribute to the understanding of the history of databases and also that they help to elucidate the different types of database that are available to organisations today.

Acknowledgements

The following people’s input is acknowledged on the document itself, but my thanks are also repeated here:

Neil Raden (@NeilRaden) of Hired Brains Research both reviewed the infographic and make significant contributions to its contents.

Of course any errors and omissions remain the responsibility of the author.

Notes

^[1]	If not significantly before then.
^[2]	One of J K Rowling’s lesser-known works.

From: peterjamesthomas.com, home of The Data and Analytics Dictionary and The Anatomy of a Data Function

Follow @peterjthomas

A further extension of the Data and Analytics Dictionary

23 January 201823 January 2018 Peter James Thomas big data, business analytics data function

The peterjamesthomas.com Data and Analytics Dictionary is an active document and I will continue to issue revised versions of it periodically. A larger update is in the works, but for now here are a dozen new definitions:

As previously stated, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

Follow @peterjthomas

Draining the Swamp

17 January 201824 March 2021 Peter James Thomas big data, data management dan woods, data marketplace, dave wells, eckerson group, forbes, paul barth, podium data

The title phrase of this article has entered the collective consciousness from political circles in recent months and years. Readers will be glad to hear that the political commentary content of this piece is precisely zero. Instead I am going to talk about Data Lakes, also referred to pejoratively by those who are not fans as Data Swamps.

Having started my relationship with Data matters back in the early days of Relational Databases and having driven corporate success through Data Warehouses and Business Intelligence, I have also done work in the Big Data arena since around 2013. A central concept in the Big Data paradigm is that of a Data Lake; a large Hadoop repository into which all data that an organisation might want to use is poured, often essentially as is. The thinking is that – in a Big Data implementation – storage is cheap ^[1] and you never fully know what data you might need in advance, so why not save it all?

It is probably fair to say that – much like many other major programmes of work over the years ^[2] – the creation of Data Lakes, or perhaps more accurately the leverage of their contents, has resulted in at best mixed results for the organisations that undertake such an endeavour. The thing with mixed results is that it is not all doom and gloom, some people are successful, others are not. The important thing is to determine what are the factors that lead to good and bad outcomes.

Well first of all, I would suggest that – like any other data programme – the formation of a Data Lake is subject to the types of potential issues that I review in my 2017 article, 20 Risks that Beset Data Programmes. Of these, Data Lakes are particularly susceptible to risk 16:

In the absence of [understanding key business decisions], the programme becoming a technology-driven one.

The business gets what IT or Change think that they need, not what is actually needed. There is more focus on shiny toys than on actionable information. The programme forgets the needs of its customers.

The issue here is that some people buy into the misconception that all you have to do is fill the Data Lake and sit back and wait for precious Data gems to flow from it. Understanding a business and its key decisions is tough and perhaps it is not surprising that people would like to skip this step and instead focus on easier activities. Sadly, this approach is not going to work for Data Lakes or anything else.

However Data Lakes also face some specific risks and in search of better understanding these, I turned to a recent Forbes article, Can Failed Data Lakes Succeed As Data Marketplaces? penned by Dan Woods (@danwoodsearly) ^[3]. Dan does not mince words in his introduction:

All over the world, data lake projects are foundering, not because they are not a step in the right direction, but because they are essentially uncompleted experiments.

he adds:

The main roadblock has been that once companies store their data in the data lake, they struggle to find a way to operationalize it. The data lake has never become a product like a data warehouse. Proof of concepts are tweaked to keep a desultory flow of signals going.

and finally states:

[…] for certain use cases, Hadoop and purpose-built data lake-like infrastructure are solving complex and high-value problems. But in most other businesses, the data lake got stuck at the proof of concept stage.

This chimes with my experience – the ability to synthesise and analyse vast troves of data is indispensable in addressing some business problems, but a sledge-hammer to crack a walnut for others. Data Lakes are no more universal panaceas than anything else we have invented to date. As always, the main issues are not technology, but good processes, consistent definitions, improved data quality and matching available data to real business questions.

In seeking salvation (Dan’s word) for Data Lakes, he sought the opinion of one of my LinkedIn contacts, Paul Barth (@BarthPS), CEO of Podium Data. Paul analyses the root causes of Data Lake issues, splitting these into three main ones ^[4]:

Polluted data lakes
Too many projects targeted at filling or exploiting the Data Lake kick off in parallel. This leads to an incoherent landscape and inaccessible / difficult to understand data.
Bottlenecked data lakes
Essentially treating the Data Lake as if it was a Data Warehouse where the technology is designed for different and less structured purposes. This leads to a quasi-warehouse that is less performant than actual warehouses.
Risky data lakes
Where there is a desire to quickly populate the Data Lake, not least to provide grist to the Data Science mill, appropriate controls on access to data can be neglected; particularly an issue where personally identifiable data is involved. This can lead to regulatory, legal and reputational peril.

Barth’s solution to these problems is the establishment of a Data Marketplace. This is a concept previously referenced on these pages in Predictions about Prediction, a review of consultancy Eckerson Group‘s views on Data and Analytics in 2017 ^[5]. Back then, Eckerson Group had the following to say about the area:

[An Enterprise Data Marketplace (EDM) is] an Amazon-like data marketplace where analysts can seek datasets, see reviews of others, and select the best-fit datasets for their needs helps to encourage dataset reuse, minimize redundancy, and prevent flawed analysis that results from working with less than ideal data. Data cataloging tools, data curation practices, data preparation technologies, and data services will be combined to create a marketplace for data seekers. Enterprise Data Marketplaces return us to the single-source vision that was once touted as the real benefit of Enterprise Data Warehouses.

So, as illustrated above, a Data Marketplace is essentially a collection of tagged data sets, which have in some cases been treated to increase consistency and utility, combined with information about their contents and usages. These are overlaid by what is essentially a “social media” layer where “shoppers” can search for data and provide feedback on its utility (e.g. a rating mechanism) and also add their own documentation. This means that useful data sets get highly rated and have more explanatory material attached to them.

Eckerson Group build on this concept in their white paper The Rise of the Data Marketplace (opens a PDF document), work commissioned in part by Podium Data. In this Eckerson’s Dave Wells (@_DaveWells_) characterises an Enterprise Data Marketplace as having the following attributes ^[6]:

Categorization organises the marketplace to simplify browsing. For example a shopper seeking budget data doesn’t need to browse through unrelated data sets about customers, employees or other data subjects. Categories complement tagging and smart search algorithms, offering a variety of ways to find data sets.

Curation is active management of the data sets that are available in the EDM. Curation selects and qualifies data sets, describes each data set, and collects and manages metadata about the collection and each individual data set.

Cataloging exposes data sets for data shoppers, including descriptions and metadata. The catalog is a view into the inventory of curated data sets. Rich metadata and powereful search are important catalog features.

Crowdsourcing is the equivalent of a social network for data. Data shoppers actively participate in catloging, curating and categorizing data. This virtuous cycle (a chain of events that reinforces outcomes through a feedback loop) continuously improves the quality and value of data in the marketplace.

Back in the Forbes article, Barth focuses on using the Data Marketplace’s interactive elements to identify the most valuable data (that which is searched for most frequently and has the best shopper rating). This data can then be the subject of focussed investment. Such investment is of the sort familiar in Data Warehouse activities, but it is directed by shoppers’ “social media” preferences rather than more formal requirements gathering exercises.

Dan Woods makes the pertinent observation that:

So, as the challenge now is not one of technology, but of setting a vision, companies have to decide how to incorporate a new set of requirements to get the most out of their data. […] Even within one company, there may be the need for multiple requirements to be met. Marketing may not need the precision that the accounting department requires. Groups with regulatory mandates may have strong compliance requirements that drive the need for data that is 100% accurate, while those doing exploration for product development purposes may prefer to have larger datasets to work with, and 90% accuracy is all that they require. The data lake must be able to employ multiple approaches as needed by different applications and groups of users.

His article finishes with the following clarion call to implement the Data Marketplace vision:

Companies achieve data transparency with data warehouses because of the use of canonical data models. Yet data in data warehouses was trapped in slow processes that lacked agility. The data warehouse data was well understood but couldn’t evolve at the speed of business. The data lake wasn’t able to correct this problem because companies didn’t implement lakes with a sufficiently comprehensive vision. That’s what they need to do now.

While when I hear about Data Warehouses that take months to change, poor design and a lack of automation both come to mind, it is unarguable that some Data Warehouses can be plagued by long turn-around times ^[7]. Equally I have seen enough Data Lakes turn into Grimpen Mire to perceive that there are some major issues inherent in an unmodified approach to this area ^[8]. The Data Marketplace idea is an intriguing one, a mash-up ^[9] of different approaches that may just yield some tangible results.

I also think that the inherent focus on users’ needs as opposed to technological considerations is the right way to go. I have been making this point for many years now ^[10] and have full confidence that I will still be doing so in ten years’ time. As with most aspects of life, it is with people, and how a programme interacts with them, that success and failure factors are most readily found. It seems to me that the Data Marketplace approach seeks to embrace this verity, which can only be a point in its favour.

Acknowledgements

I would like to thank each of Forbes / Dan Woods, Podium Data / Paul Barth and Eckerson Group / Dave Wells for both reviewing this article and allowing me to quote their work. Such generous behaviour is not as typical as one might like to think and always merits recognition.

Notes

^[1]	Though the total cost of saving such data extends beyond just disk costs and can become significant.
^[2]	See my earlier article Ever tried? Ever failed? for a treatment of what is clearly a fundamental physical constant – that 60- 70% of all types of major programmes don’t fully achieve their objectives (aka fail). Data Lakes appear to also be governed by this Law of Nature.
^[3]	You may need to navigate past a Forbes banner screen before you can access the actual article.
^[4]	The following is my take in Paul’s analysis, for his actual words, see the Forbes article.
^[5]	Watch this space for a review of Eckerson Group’s predictions for 2018.
^[6]	Which I reproduce with permission.
^[7]	By way of contrast, warehouses that my teams have built have been able to digest acquisitions and meet new and onerous regulatory requirements in a matter of weeks, not months.
^[8]	I should stress here a difference between Data Lakes, which seek to be all-embracing, and more focussed Big Data activities, e.g. the building of complex seismological or meteorological models to assess catastrophic insurance risk (see Hurricanes and Data Visualisation: Part II – Map Reading). I have helped the latter to be very successful myself and seen good results in other organisations.
^[9]	Do people still say “mash-up”?
^[10]	For example in my 2008 trilogy: Marketing Change Education and cultural transformation Sustaining Cultural Change

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

Follow @peterjthomas

A Retrospective of 2017’s Articles

11 January 2018 Peter James Thomas astronomy, big data, business analytics, change management, chief data officer, data science, data visualisation, project management, Statistics

This article was originally intended for publication late in the year it reviews, but, as they ^[1] say, the best-laid schemes o’ mice an’ men gang aft agley…

In 2017 I wrote more articles ^[2] than in any year since 2009, which was the first full year of this site’s existence. Some were viewed by thousands of people, others received less attention. Here I am going to ignore the metric of popular acclaim and instead highlight a few of the articles that I enjoyed writing most, or sometimes re-reading a few months later ^[3]. Given the breadth of subject matter that appears on peterjamesthomas.com, I have split this retrospective into six areas, which are presented in decreasing order of the number of 2017 articles I wrote in each. These are as follows:

General Data Articles
Data Visualisation
Statistics & Data Science
CDO perspectives
Programme Advice
Analytics & Big Data

In each category, I will pick out two or three of pieces which I feel are both representative of my overall content and worth a read. I would be more than happy to receive any feedback on my selections, or suggestions for different choices.


General Data Articles

	August The Data and Analytics Dictionary
My attempt to navigate the maze of data and analytics terminology. Everything from Algorithm to Web Analytics.
	November & December The Anatomy of a Data Function: Part I, Part II and Part III
Three articles focussed on the structure and components of a modern Data Function and how its components interact with both each other and the wider organisation in order to support business goals.

Data Visualisation

	January Nucleosynthesis and Data Visualisation
How one of the most famous scientific data visualisations, the Periodic Table, has been repurposed to explain where the atoms we are all made of come from via the processes of nucleosynthesis.
	September & October Hurricanes and Data Visualisation: Part I – Rainbow’s Gravity and Part II – Map Reading
Two articles on how Data Visualisation is used in Meteorology. Part I provides a worked example illustrating some of the problems that can arise when adopting a rainbow colour palette in data visualisation. Part II grapples with hurricane prediction and covers some issues with data visualisations that are intended to convey safety information to the public.

Statistics & Data Science

	February Toast
What links Climate Change, the Manhattan Project, Brexit and Toast? How do these relate to the public’s trust in Science? What does this mean for Data Scientists? Answers provided by Nature, The University of Cambridge and the author.
	February How to be Surprisingly Popular
The wisdom of the crowd relies upon essentially democratic polling of a large number of respondents; an approach that has several shortcomings, not least the lack of weight attached to people with specialist knowledge. The Surprisingly Popular algorithm addresses these shortcomings and so far has out-performed existing techniques in a range of studies.
	October A Nobel Laureate’s views on creating Meaning from Data
The 2017 Nobel Prize for Chemistry was awarded to Structural Biologist Richard Henderson and two other co-recipients. What can Machine Learning practitioners learn from Richard’s observations about how to generate images from Cryo-Electron Microscopy data?

CDO Perspectives

	January Alphabet Soup
Musings on the overlapping roles of Chief Analytics Officer and Chief Data Officer and thoughts on whether there should be just one Top Data Job in an organisation.
	February A Sweeter Spot for the CDO?
An extension of my concept of the Chief Data Officer sweet spot, inspired by Bruno Aziza of AtScale.
	September A truth universally acknowledged…
Many Chief Data Officer job descriptions have a list of requirements that resemble Swiss Army Knives. This article argues that the CDO must be the conductor of an orchestra, not someone who is a virtuoso in every single instrument.

Programme Advice

	January Bumps in the Road
What the aftermath of repeated roadworks can tell us about the potentially deleterious impact of Change Programmes on Data Landscapes.
	February 20 Risks that Beset Data Programmes
A review of 20 risks that can plague data programmes. How effectively these are managed / mitigated can make or break your programme.
	March Ideas for avoiding Big Data failures and for dealing with them if they happen
Paul Barsch (EY & Teradata) provides some insight into why Big Data projects fail, what you can do about this and how best to treat any such projects that head off the rails. With additional contributions from Big Data gurus Albert Einstein, Thomas Edison and Samuel Beckett.

Analytics & Big Data

	February Bigger and Better (Data)?
Some examples of where bigger data is not necessarily better data. Provided by Bill Vorhies and Larry Greenemeier .
	March Elephants’ Graveyard?
Thoughts on trends in interest in Hadoop and Spark, featuring George Hill, James Kobielus, Kashif Saiyed and Martyn Richard Jones, together with the author’s perspective on the importance of technology in data-centric work.

and Finally…

I would like to close this review of 2017 with a final article, one that somehow defies classification:


	April 25 Indispensable Business Terms
An illustrated Buffyverse take on Business gobbledygook – What would Buffy do about thinking outside the box? To celebrate 20 years of Buffy the Vampire Slayer and 1st April 2017.

Notes

^[1]	“They” here obviously standing for Robert Burns.
^[2]	Thirty-four articles and one new page.
^[3]	Of course some of these may also have been popular, I’m not being masochistic here!

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

Follow @peterjthomas

The revised and expanded Data and Analytics Dictionary

20 September 201719 September 2018 Peter James Thomas big data, business analytics, business intelligence, chief data officer, dashboards, data governance, data management, data quality, data science, data visualisation, data warehousing, Statistics

Since its launch in August of this year, the peterjamesthomas.com Data and Analytics Dictionary has received a welcome amount of attention with various people on different social media platforms praising its usefulness, particularly as an introduction to the area. A number of people have made helpful suggestions for new entries or improvements to existing ones. I have also been rounding out the content with some more terms relating to each of Data Governance, Big Data and Data Warehousing. As a result, The Dictionary now has over 80 main entries (not including ones that simply refer the reader to another entry, such as Linear Regression, which redirects to Model).

The most recently added entries are as follows:

It is my intention to continue to revise this resource. Adding some more detail about Machine Learning and related areas is probably the next focus.

As ever, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

Follow @peterjthomas

	Welcome to: peterjamesthomas.com, a site which covers my thoughts on the confluence of business, technology and change.
+44 (0) 2088 956 826
	[Web-site Contact]
	[Commercial Contact]
	[Creative Commons]
	[Report Problems]

Peter James Thomas

Data & Analytics: Consultancy, Interim Services and Research

big data

An expanded and more mobile-friendly version of the Data & Analytics Dictionary

Like this:

The latest edition of The Data & Analytics Dictionary is now out

Like this:

A Retrospective of 2018’s Articles

Like this:

More Definitions in the Data and Analytics Dictionary

Like this:

Sic Transit Gloria Magnorum Datorum

Like this:

A Brief History of Databases

Like this:

A further extension of the Data and Analytics Dictionary

Like this:

Draining the Swamp

Like this:

A Retrospective of 2017’s Articles

Like this:

The revised and expanded Data and Analytics Dictionary

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: