The revised and expanded Data and Analytics Dictionary

The Data and Analytics Dictionary

Since its launch in August of this year, the peterjamesthomas.com Data and Analytics Dictionary has received a welcome amount of attention with various people on different social media platforms praising its usefulness, particularly as an introduction to the area. A number of people have made helpful suggestions for new entries or improvements to existing ones. I have also been rounding out the content with some more terms relating to each of Data Governance, Big Data and Data Warehousing. As a result, The Dictionary now has over 80 main entries (not including ones that simply refer the reader to another entry, such as Linear Regression, which redirects to Model).

The most recently added entries are as follows:

  1. Anomaly Detection
  2. Behavioural Analytics
  3. Complex Event Processing
  4. Data Discovery
  5. Data Ingestion
  6. Data Integration
  7. Data Migration
  8. Data Modelling
  9. Data Privacy
  10. Data Repository
  11. Data Virtualisation
  12. Deep Learning
  13. Flink
  14. Hive
  15. Information Security
  16. Metadata
  17. Multidimensional Approach
  18. Natural Language Processing (NLP)
  19. On-line Transaction Processing
  20. Operational Data Store (ODS)
  21. Pig
  22. Table
  23. Sentiment Analysis
  24. Text Analytics
  25. View

It is my intention to continue to revise this resource. Adding some more detail about Machine Learning and related areas is probably the next focus.

As ever, ideas for what to include next would be more than welcome (any suggestions used will also be acknowledged).
 


 

From: peterjamesthomas.com, home of The Data and Analytics Dictionary

 

The peterjamesthomas.com Data and Analytics Dictionary

The Data and Analytics Dictionary

I find myself frequently being asked questions around terminology in Data and Analytics and so thought that I would try to define some of the more commonly used phrases and words. My first attempt to do this can be viewed in a new page added to this site (this also appears in the site menu):

The Data and Analytics Dictionary

I plan to keep this up-to-date as the field continues to evolve.

I hope that my efforts to explain some concepts in my main area of specialism are both of interest and utility to readers. Any suggestions for new entries or comments on existing ones are more than welcome.
 

 

The need for collaboration between teams using the same data in different ways

The Data Warehousing Institute

This article is based on conversations that took place recently on the TDWI LinkedIn Group [1].

The title of the discussion thread posted was “Business Intelligence vs. Business Analytics: What’s the Difference?” and the original poster was Jon Dohner from Information Builders. To me the thread topic is something of an old chestnut and takes me back to the heady days of early 2009. Back then, Big Data was maybe a lot more than just a twinkle in Doug Cutting and Mike Cafarella‘s eyes, but it had yet to rise to its current level of media ubiquity.

Nostalgia is not going to be enough for me to start quoting from my various articles of the time [2] and neither am I going to comment on the pros and cons of Information Builders’ toolset. Instead I am more interested in a different turn that discussions took based on some comments posted by Peter Birksmith of Insurance Australia Group.

Peter talked about two streams of work being carried out on the same source data. These are Business Intelligence (BI) and Information Analytics (IA). I’ll let Peter explain more himself:

BI only produces reports based on data sources that have been transformed to the requirements of the Business and loaded into a presentation layer. These reports present KPI’s and Business Metrics as well as paper-centric layouts for consumption. Analysis is done via Cubes and DQ although this analysis is being replaced by IA.

[…]

IA does not produce a traditional report in the BI sense, rather, the reporting is on Trends and predictions based on raw data from the source. The idea in IA is to acquire all data in its raw form and then analysis this data to build the foundation KPI and Metrics but are not the actual Business Metrics (If that makes sense). This information is then passed back to BI to transform and generate the KPI Business report.

I was interested in the dual streams that Peter referred to and, given that I have some experience of insurance organisations and how they work, penned the following reply [3]:

Hi Peter,

I think you are suggesting an organisational and technology framework where the source data bifurcates and goes through two parallel processes and two different “departments”. On one side, there is a more traditional, structured, controlled and rules-based transformation; probably as the result of collaborative efforts of a number of people, maybe majoring on the technical side – let’s call it ETL World. On the other a more fluid, analytical (in the original sense – the adjective is much misused) and less controlled (NB I’m not necessarily using this term pejoratively) transformation; probably with greater emphasis on the skills and insights of individuals (though probably as part of a team) who have specific business knowledge and who are familiar with statistical techniques pertinent to the domain – let’s call this ~ETL World, just to be clear :-).

You seem to be talking about the two of these streams constructively interfering with each other (I have been thinking about X-ray Crystallography recently). So insights and transformations (maybe down to either pseudo-code or even code) from ~ETL World influence and may be adopted wholesale by ETL World.

I would equally assume that, if ETL World‘s denizens are any good at their job, structures, datasets and master data which they create (perhaps early in the process before things get multidimensional) may make work more productive for the ~ETLers. So it should be a collaborative exercise with both groups focused on the same goal of adding value to the organisation.

If I have this right (an assumption I realise) then it all seems very familiar. Given we both have Insurance experience, this sounds like how a good information-focused IT team would interact with Actuarial or Exposure teams. When I have built successful information architectures in insurance, in parallel with delivering robust, reconciled, easy-to-use information to staff in all departments and all levels, I have also created, maintained and extended databases for the use of these more statistically-focused staff (the ~ETLers).

These databases, which tend to be based on raw data have become more useful as structures from the main IT stream (ETL World) have been applied to these detailed repositories. This might include joining key tables so that analysts don’t have to repeat this themselves every time, doing some basic data cleansing, or standardising business entities so that different data can be more easily combined. You are of course right that insights from ~ETL World often influence the direction of ETL World as well. Indeed often such insights will need to move to ETL World (and be produced regularly and in a manner consistent with existing information) before they get deployed to the wider field.

Now where did I put that hairbrush?

It is sort of like a research team and a development team, but where both “sides” do research and both do development, but in complementary areas (reminiscent of a pair of entangled electrons in a singlet state, each of whose spin is both up and down until they resolve into one up and one down in specific circumstances – sorry again I did say “no more science analogies”). Of course, once more, this only works if there is good collaboration and both ETLers and ~ETLers are focussed on the same corporate objectives.

So I suppose I’m saying that I don’t think – at least in Insurance – that this is a new trend. I can recall working this way as far back as 2000. However, what you describe is not a bad way to work, assuming that the collaboration that I mention is how the teams work.

I am aware that I must have said “collaboration” 20 times – your earlier reference to “silos” does however point to a potential flaw in such arrangements.

Peter

PS I talk more about interactions with actuarial teams in: BI and a different type of outsourcing

PPS For another perspective on this area, maybe see comments by @neilraden in his 2012 article What is a Data Scientist and what isn’t?

I think that the perspective of actuaries having been data scientists long before the latter term emerged is a sound one.

I couldn't find a suitable image from Sesame Street :-o

Although the genesis of this thread dates to over five years ago (an aeon in terms of information technology), I think that – in the current world where some aspects of the old divide between technically savvy users [4] and IT staff with strong business knowledge [5] has begun to disappear – there is both an opportunity for businesses and a threat. If silos develop and the skills of a range of different people are not combined effectively, then we have a situation where:

| ETL World | + | ~ETL World | < | ETL World ∪ ~ETL World |

If instead collaboration, transparency and teamwork govern interactions between different sets of people then the equation flips to become:

| ETL World | + | ~ETL World | ≥ | ETL World ∪ ~ETL World |

Perhaps the way that Actuarial and IT departments work together in enlightened insurance companies points the way to a general solution for the organisational dynamics of modern information provision. Maybe also the, by now somewhat venerable, concept of a Business Intelligence Competency Centre, a unified team combining the best and brightest from many fields, is an idea whose time has come.
 
 
Notes

 
[1]
 
A link to the actual discussion thread is provided here. However You need to be a member of the TDWI Group to view this.
 
[2]
 
Anyone interested in ancient history is welcome to take a look at the following articles from a few years back:

  1. Business Analytics vs Business Intelligence
  2. A business intelligence parable
  3. The Dictatorship of the Analysts
 
[3]
 
I have mildly edited the text from its original form and added some new links and new images to provide context.
 
[4]
 
Particularly those with a background in quantitative methods – what we now call data scientists
 
[5]
 
Many of whom seem equally keen to also call themselves data scientists

 

 

Business Intelligence Competency Centres

Introduction

The subject of this article ought to be reasonably evident from its title. However there is perhaps some room for misinterpretation around even this. Despite the recent furore about definitions, most reasonable people should be comfortable with a definition of business intelligence. My take on this is that BI is simply using information to drive better business decisions. In this definition, the active verb “drive” and the subject “business decisions” are the key elements; something that is often forgotten in a rush for technological fripperies.
 
 
The central issue

Having hopefully addressed of the “BI” piece of the BICC acronym, let’s focus on the “CC” part. I’ll do this in reverse order, first of all considering what is meant by “centre”. As ever I will first refer to my trusted Oxford English Dictionary for help. In a discipline, such as IT, which is often accused of mangling language and even occasionally using it to obscure more than to clarify, a back-to-basics approach to words can sometimes yield unexpected insights.

  centre / séntər / n. & v. (US center) 3 a a place or group of buildings forming a central point in a district, city, etc., or a main area for an activity (shopping centre, town centre).
(O.E.D.)
 

Ignoring the rather inexcusable use of the derived adjective “central” in the definition of the noun “centre”, then it is probably the “main area for an activity” sense that is meant to be conveyed in the final “C” of BICC. However, there is also perhaps some illumination to be had in considering another meaning of the word:

Centre of a Sphere

  n. 1 a the middle point, esp. of a line, circle or sphere, equidistant from the ends, or from any point on the circumference or surface.
(O.E.D.)
 

As well as appealing to the mathematician in me, this meaning gives the sense that a BICC is physically central geographically, or metaphorically central with respect to business units. Of course this doesn’t meant than a BICC needs to be at the precise centre of gravity of an organisation, with each branch contributing a “weight” calculated by its number of staff, or revenue; but it does suggest that the competency centre is located at a specific point, not dispersed through the organisation.

Of course, not all organisations have multiple locations. The simplest may not have multiple business units either. However, there is a sense by which “centre” means that a BICC should straddle whatever diversity there is an organisation. If it is in multiple countries, then the BICC will be located in one of these, but serve the needs of the others. If a company has several different divisions, or business units, or product streams; then again the BICC should be a discrete area that supports all of them. Often what will make most sense is for the BICC to be located within an organisation’s Head Office function. There are a number of reasons for this:

  1. Head Office similarly straddles geographies and business units and so is presumably located in a place that makes sense to do this from (maybe in an organisation’s major market, certainly close to a transport hub if the organisation is multinational, and so on).
  2. If a BICC is to properly fulfil the first two letters of its abbreviation, then it will help if it is collocated with business decision-makers. Head Office is one place than many of these are found, including generally the CEO, the CFO, the Head of Marketing and Business Unit Managers. Of course key decision makers will also be spread throughout the organisation (think of Regional and Country Managers), but it is not possible to physically collocate with all of these.
  3. Another key manager who is hopefully located in Head Office is the CIO (though this is dispiritingly not always the case, with some CIOs confined to IT ghettos, far from the rest of the executive team and with a corresponding level of influence). Whilst business issues are pre-eminent in BI, of course there is a major technological dimension and a need to collaborate closely with those charged with running the organisation’s IT infrastructure and those responsible for care and feeding of source data systems.
  4. If a BI system is to truly achieve its potential, then it must become all pervasive; including a wide range of information from profitability, to sales, to human resources statistics, to expense numbers. This means that it needs to sit at the centre of a web of different systems: ERP, CRM, line of business systems, HR systems etc. Often the most convenient place to do this from will be Head Office.

Thusfar, I haven’t commented on the business benefits of a BICC. Instead I have confined myself to explaining what people mean by the second “C” in the name and why this might be convenient. Rather than making this an even longer piece, I am going to cover both the benefits and disadvantages of a BICC in a follow-on article. Instead let’s now move on to considering the first “C”: Competency.
 
 
Compos centris

Returning to our initial theme of generating insights via an examination of the meaning of words in a non-IT context, let’s start with another dictionary definition:

Motar board

  competence /kómpit’nss/ n. (also competency /kómpitənsi/) 1 (often foll. by for, or to + infin.) ability; the state of being competent.

and given the recursive reliance of the above on the definition of competent…
  competent /kómpit’nt/ adj. 1 a (usu. foll. by to + infin.) properly qualified or skilled (not competent to drive); adequately capable, satisfactory. b effective (a competent bastman*).
(O.E.D.)
 

* People who are not fully conversant with the mysteries of cricket may substitute “batter” here.

To me the important thing to highlight here is that, while it is to be hoped that a BICC will continue to become more competent once it is up and running, in order to successfully establish such a centre, a high degree of existing competence is a prerequisite. It is not enough to simply designate some floor space and allocate a number of people to your BICC, what you need is at least a core of seasoned professionals who have experience of delivering transformational information and know how to set about doing it.

There are many skills that will be necessary in such a group. These match the four main pillars of a BI implementation (I cover these in more depth in several places on the blog, including BI implementations are like icebergs and the middle section of Is outsourcing business intelligence a good idea?):

  1. Understand the important business decisions and what figures are necessary to support these.
  2. Understand the data available in the organisation, how it relates to other data and to business decisions.
  3. Transform the data to provide information answering business questions.
  4. Focus on embedding the use of information in the corporate DNA.

So a successful BICC must include: people with strong analytical skills and an understanding of general business practices; high-calibre designers; reliable and conscientious ETL and general programmers; experts in the care, feeding and design of databases; excellent quality assurance professionals; resource conversant with both whatever front-end tools you are using to deliver information and general web programming; staff with skills in technical project management; people who can both design and deliver training programmes; help desk personnel; and last, but by no means least, change managers.

Of course if your BI project is big enough, then you may be able to afford to have people dedicated to each of these roles. If resources are tighter (and where is this not the case nowadays?) then it is better to have people who can wear more than one hat: business analysts who can also design; BI programmers who will also take support calls; project managers who will also run training classes; and so on. This approach saves money and also helps to deal with the inevitable peaks and troughs of resource requirements at different stages in a project. I would recommend setting things up this way (or looking to stretch your people’s abilities into new areas) even if you have the luxury of a budget that would allow a more discrete approach. The challenge of course is going to be finding and retaining such multi-faceted staff.

Also, it hopefully goes without saying that BI is a very business-focussed area and some BICCs will explicitly include business people in them. Even if you do not go this far, then the BICC will have to form a strong partnership with key business stakeholders, often spread across multiple territories. The skill to manage this effectively is in itself a major requirement of the leading personnel of the centre.

Given all of the above, the best way to staff a BICC is with members of a team who have already been successful with a BI project within your organisation; maybe one that was confined to a given geographic region or business unit. If you have no such team, then starting with a BICC is probably a bridge too far. Instead my recommendation would be to build up some competency via a smaller BI project. Alternatively, if you have more than one successful BI team (and, despite the manifold difficulties in getting BI right, such things are not entirely unheard of) then maybe blending these together makes sense. This is unless there is some overriding reason not to (e.g. vastly different team cultures or methodologies. In this case, picking a “winner” may be a better course of action.

Such a team will already have the skills outlined above in abundance (else they could never have been successful). It is also likely that whatever information was needed in their region or business unit will be at least part of what is needed at the broader level of a BICC. Given that there are many examples of BI projects not delivering or consuming vastly more resource than anticipated, then leveraging those exceptional people who have managed to swim against this tide is eminently sensible. Such battle-hardened professionals will know what pitfalls to avoid, which areas are most important to concentrate on and can use their existing products to advertise the benefits of a wider system. If you have such people at the core of your BICC, then it will be easier to integrate new joiners and quickly shepherd them up the learning curve (something that can be particularly long in BI due to the many different aspects of the work).

Of course having been successful in one business unit or region is not enough to guarantee success on a larger scale. I spoke about some of the challenges of doing this in an earlier article, Developing an international BI strategy. Another issue that is likely to raise its head is the political dimension, in particular where different business units or regions already have a management information strategy at some stage of development. This is another area that I will also cover in more detail in a forthcoming piece.
 
 
Conclusions

It seems that simply musing on the normal meanings of the words “competency” and “centre” has led us into some useful discussions. As mentioned above, at least two other blog postings will expand upon areas that have been highlighted in this piece. For now what I believe we have learned so far is:

  • BICCs should (by definition) straddle multiple geographies and/or business units.
  • There are sound reasons for collocating the BICC with Head Office.
  • There is need for a wide range of skills in your BICC, both business-focussed and technical.
  • At least the core of your BICC should be made up of competent (and experienced) BI professionals .

More thoughts on the benefits and disadvantages of business intelligence competency centres and also the politcs that they have to negotiate will appear on this blog in future weeks.