Who should be accountable for data quality?

The cardinality of a countable set - ex-mathematicians are allowed the occasional pun


Asking the wrong question

Once more this post is inspired by a conversation on LinkedIn.com, this time the CIO Magazine forum and a thread entitled BI tool[s] can not deliver the expected results unless the company focuses on quality of data posted by Caroline Smith (normal caveat: you must be a member of LinkedIn.com and the group to view the actual thread).

The discussion included the predictable references to GIGO, but conversation then moved on to who has responsibility for data quality, IT or the business.

My view on how IT and The Business should be aligned

As regular readers of this column will know, I view this as an unhelpful distinction. My belief is that IT is a type of business department, with specific skills, but engaged in business work and, in this, essentially no different to, say, the sales department or the strategy department. Looking at the question through this prism, it becomes tautological. However, if we set aside my peccadillo about this issue, we could instead ask whether responsibility for data quality should reside in IT or not-IT (I will manfully resist the temptation to write ~IT or indeed IT′); with such a change, I accept that this is now a reasonable question.

Answering a modified version of the question

In information technology, telecommunications, and related fields, handshaking is an automated process of negotiation that dynamically sets parameters of a communications channel established between two entities before normal communication over the channel begins. It follows the physical establishment of the channel and precedes normal information transfer.

My basic answer is that both groups will bring specific skills to the party and a partnership approach is the one that is most likely to end in success. There are however some strong arguments for IT playing a pivotal role and my aim is to expand on these in the rest of this article.

The four pillars of improved data quality

Before I enumerate these, one thing that I think is very important is that data quality is seen as a broad issue that requires a broad approach to remedy it. I laid out what I see as the four pillars of improving data quality in an earlier post: Using BI to drive improvements in data quality. This previous article goes into much more detail about the elements of a successful data quality improvement programme and its title provides a big clue as to what I see as the fourth pillar. More on this later.
1. The change management angle

Again, as with virtually all IT projects, the aim of a data quality initiative is to drive different behaviours. This means that change management skills are just as important in these types of projects as in the business intelligence work that they complement. This is a factor to consider when taking decisions about who takes the lead in looking to improve data quality: who amongst the available resources has established and honed change management skills? The best IT departments will have a number of individuals who fit this bill; if not-IT has them as well, then the organisation is spoilt for choice.
2. The pan-organisational angle

Elsewhere I have argued that BI adds greatest value when it is all-pervasive. The same observations apply to data quality. Assume that an organisation has a number of divisions, each with their own systems (due to the nature of their business and maybe also history), but perhaps also sharing some enterprise applications. While it would undeniably be beneficial for Division A to get their customer files in order, it would be of even greater value if all divisions did this at the same time and with a consistent purpose. This would allow the dealings of Customer X across all parts of the business to be calculated and analysed. It could also drive cross-selling opportunities in particular market segments.

While it is likely that a number of corporate staff of different sorts will have a very good understanding about the high-level operations of each of the divisions, it is at least probable that only IT staff (specifically those engaged in collating detailed data from each division for BI purposes) will have an in-depth understanding of how transactions and master data are stored in different ways across the enterprise. This knowledge is a by-product of running a best practice BI project and the collateral intellectual property built up can be of substantial business value.
3. The BI angle

It was this area that formed the backbone of the earlier data quality article that I referenced above. My thesis was that you could turn the good data quality => good BI relationship on its head and use the BI tool to drive data quality improvements. The key here was not to sanitise data problems, but instead to expose them, also leveraging standard BI functionality like drill through to allow people to identify what was causing an issue.

One of the most pernicious data quality issues is that of the valid, but wrong, entry. For example, a transaction is allocated a category code of X, which is valid, but the business event demands the value Y. Sometimes it is possible to guard against this eventuality with business rules, e.g. Product A can only be sold by Business Unit W, but this will not be possible for all such data. A variant of this issue is data being entered in the wrong field. Having spent a while in the Insurance industry, I can attest that it was not atypical for a policy number to be entered as a claim value, for example. Sometimes there is no easy systematic way to detect this type of occurrence, but exposing issues in a well-designed BI system is one way of noticing odd figures and then – crucially – being able to determine what is causing them.
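To make the valid-but-wrong idea concrete, here is a minimal sketch of the sort of business-rule check described above; the field names, rules and thresholds are all hypothetical, invented for illustration rather than drawn from any real system:

```python
# Hypothetical business-rule checks for "valid but wrong" entries.
# Field names, codes and thresholds are illustrative only.

# Rule: certain products may only be sold by certain business units.
ALLOWED_UNITS = {
    "PRODUCT_A": {"UNIT_W"},
    "PRODUCT_B": {"UNIT_W", "UNIT_X"},
}

def check_transaction(txn):
    """Return a list of suspected data quality issues for one transaction."""
    issues = []

    # Valid-but-wrong category: the code exists, but breaks a business rule.
    allowed = ALLOWED_UNITS.get(txn["product"])
    if allowed is not None and txn["business_unit"] not in allowed:
        issues.append("product sold by unexpected business unit")

    # Wrong-field entry: a policy number mistyped as a claim value will
    # often be implausibly large for the field it landed in.
    if txn["claim_value"] > 1_000_000:
        issues.append("claim value implausibly large - possible policy number?")

    return issues

txn = {"product": "PRODUCT_A", "business_unit": "UNIT_X", "claim_value": 20060715}
print(check_transaction(txn))
```

In practice such rules would live in a data quality or ETL tool rather than hand-written code, but the principle – codify what “valid but wrong” looks like and expose it, rather than silently sanitising it – is the same.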
4. The IT character angle

I was searching round for a way to put this nicely and then realised that Jim Harris had done the job for me in naming his excellent Obsessive-Compulsive Data Quality blog (OCDQ Blog). I’m an IT person, I may have general management experience and a reasonable understanding of many parts of business, but I remain essentially an IT person. Before that, I was a Mathematician. People in both of those lines of work tend to have a certain reputation; to put it positively, the ability to focus extremely hard on something for long periods is a common characteristic.

  Aside: for the avoidance of doubt, as I pointed out in Pigeonholing – A tragedy, the fact that someone is good at the details does not necessarily preclude them from also excelling at seeing the big picture – in fact without a grasp on the details the danger of painting a Daliesque big picture is perhaps all too real!  

Improving data quality is one of the areas where this personality trait pays dividends. I’m sure that there are some marketing people out there who have relentless attention to detail and whose middle name is “thoroughness”; however, I suspect there are rather fewer of them than among the ranks of my IT colleagues. While leadership from the pertinent parts of not-IT is very important, a lot of the hard yards are going to be done by IT people; therefore it makes sense if they have a degree of accountability in this area.
In closing

Much like most business projects, improving data quality is going to require a cross-functional approach to achieve its goals. While you often hear the platitudinous statement that “the business must be responsible for the quality of its own data”, this ostensible truism hides the fact that one of the best ways for not-IT to improve the quality of an organisation’s data is to get IT heavily involved in all aspects of this work.

IT for its part can leverage both its role as one of the supra-business unit departments and its knowledge of how business transactions are recorded and move from one system to another to become an effective champion of data quality.

22 thoughts on “Who should be accountable for data quality?”

  1. Peter,

    Good post.

    I think I agree with some aspects and slightly disagree with others (but only slightly).

    The visual illustrating where “business” and “IT” sit in relation to each other is a useful one and is as much an expression of the internal relationships of the organisation as a whole. The boundaries between business and IT are a lot less clear than they used to be in most organisations, but this still means that both functions should exist within the organisation as a whole. The key here is clarity and communication – there should be clarity to all on departmental responsibilities and communication between parties on all relevant matters.

    Your four pillars analogy, and the different angles from which business people approach DQ problems, are valuable and link well with the lively debate started by Henrik Sorenson at http://liliendahl.wordpress.com/2010/03/01/bad-word-data-owner/. I think that we are both in a similar position here, in that the nature and complexity of the problem is so broad that any attempt to neatly pigeon-hole responsibilities into “data owner” or “Business/IT” is too simplistic.

    What is probably needed here is a widely agreed set of generic roles and descriptions that can be applied to data activities. If these titles were to become commonly understood in general business circles, then the overall approaches to improving data quality should become easier.

    Looking forward to a lively debate on this one….

    • Julian,

      Thanks for the reply – as you say it doesn’t sound as if we are a million miles away from each other – and without divergence of opinion the world would get awfully boring :-).


      • Peter,

        I agree that sometimes disagreement and debate makes for an interesting life (and can liven up blog debates etc.). However, we need to ensure that we do not get into a position where we are so involved in debate amongst the DQ community that we forget to engage with “the business”.

        If we can get to a point where there is general agreement and a clear message to sell, then the overall approaches to DQ management in businesses should improve.


  2. Peter

    This is an excellent post.

    A few things stand out:

    One of the most pernicious data quality issues is of the valid, but wrong entry.


    Second, I agree that it’s a partnership. Most LOB users don’t have the technical savvy to fix highly complex data- and system-related issues (at the risk of generalizing). Further, when IT and LOB folks work together to solve a problem, I’d argue that the mutual exchange of information and understanding benefits each person, department, and the organization as a whole.

    If this knowledge is transcribed, shared, and codified, then the organization will find itself with better DQ, BI, and ultimately results.

  3. Good one, Peter!!!

    The article cleanly depicts the manner in which IT and business users must work in conjunction with each other. I would like to add just one variation to this scenario where
    1. There is a business team in the IT department who are domain experts and appreciate technology

    2. An IT team in the business departments who are tech geeks and understand the business

    If I were to represent it as a diagram these two teams would fall in the intersection of ‘IT’ and ‘The Business’ spheres of your diagram 1.

    The BI angle of performing root cause analysis for DQ problems is another point that is worthy of a mention. Traditionally organizations adopt a data profiling exercise before venturing into MDM/data quality/data warehousing applications; adding a BI dimension to this is certainly worth it.

    “Data Quality is as good as the business rules that govern their entry, flow/processing, and exit”, which necessitates a cross-functional approach pooling resources from both the business and IT.


    • Thank you for the comment Satesh,

      I prefer the model of IT being a subset of The Business, rather than a separate set intersecting with it. I do however realise where you are coming from.


  4. Great post Peter.

    I think as Julian says the lines are far more blurred than ever these days.

    I was reading a data quality survey of financial institutions recently, I think it quoted that in 36% of companies it was IT that was responsible for data quality. That clearly illustrates that major swathes of the organisation (which we can call “the business”) are now taking the reins far more than ever.

    What I think is important is to make the distinction between the types of data quality.

    Most data quality activities are, let’s be honest, data processing with bells on. In this case it makes far more sense for the IT team, who are often embedded within the business as you point out, to take control, implement the software, ramp up training, perform the ongoing maintenance etc.

    We then have the process-oriented data quality work: change management initiatives and process re-engineering programmes which clearly don’t always fit in IT, nor should they, as they are primarily about business change; technology can support, but these should be business driven.

    Finally, I think DQ products are finally becoming easy enough for true business users to manage so I think the lines are going to be blurred even more.

    • Thanks for the feedback Dylan,

      I agree that there is more than one component to data quality (and you can also add data integrity and data consistency to the list) and that responsibility may sit in different places. I think we agree that all will require IT support and some will require IT leadership.

      My suggestion is that saying “the business is responsible for their own data” should not become an excuse for IT to renege on its natural responsibilities.


  5. Peter

    I have enjoyed your eloquence and fully support you in promoting discussion on this topic. A few thoughts from me:-
    You discuss the roles of IT and non-IT in data quality ownership and point out that IT typically have the system knowledge, data manipulation skills and personal characteristics to fix the issues and should therefore have a pivotal, proactive role in leading DQ improvement. You also acknowledge that this work requires business sponsorship and collaboration; other contributors speak of business owning the process side, etc. All good, no argument here, but nothing radically new. Most companies would agree with this whilst also having DQ “issues”. Let’s be honest here: most companies can’t manage their own employee data, never mind their customers’!
    Perhaps this is because fundamentally not much has changed with database data management. IT still provide the container which the business then fill. GIGO is still the joker card played by IT.
    So how do we start?
    My top 6 suggestions as a strawman for debate (the order of importance will vary with company):-
    1) Data dictionary. This sits exactly between IT and the business because neither can complete it alone. IT should construct it (on the intranet) and populate the field names, data types etc., but the business need to assist with business meaning. Let’s have a column for owner.
    2) Reference data management. Too big for here and contextual but critical to DQ.
    3) Conforming Dimensions. Sometimes different versions of the truth are actually required in which case a definition of each must exist in the same document. (I mean IT should not try and impose a single definition of a term on the business).
    4) Precision over accuracy. This is where BI can really help.
    5) Tolerancing!!! Fully understood by engineers and pretty much ignored by IT. Ever seen a data dictionary with a tolerance column?
    6) Natural keys. Yes, I know why automatically generated sequential system keys are used, but that does not excuse the table not having constraints to enforce the natural key. Whilst I’m there, let’s make sure the natural key is defined in the data dictionary (i.e. the business meaning of a row in each database table).
    The above 6 points are all very well but will go nowhere without a sponsor. The sponsor needs to be someone with enough gravitas to square up to the IT Director and the Finance Director, as these are ongoing BAU maintenance costs which are very hard to build a business case for, since the benefits can’t be easily quantified. I also suggest this person cannot be in IT, as their suggestions (e.g. adding constraints) would not be prioritised.
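    As a minimal sketch of point 6 (the table and column names are invented purely for illustration), a surrogate primary key and a constraint enforcing the natural key can happily coexist:

```python
import sqlite3

# Hypothetical customer table: surrogate id plus a natural key constraint.
# A UNIQUE constraint on the natural key stops duplicate rows even though
# the auto-generated id would happily allow them.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        country    TEXT NOT NULL,
        tax_number TEXT NOT NULL,
        name       TEXT NOT NULL,
        UNIQUE (country, tax_number)                   -- natural key
    )
""")
conn.execute("INSERT INTO customer (country, tax_number, name) "
             "VALUES ('GB', '123', 'Acme Ltd')")
try:
    # Same natural key, different spelling of the name: rejected at source.
    conn.execute("INSERT INTO customer (country, tax_number, name) "
                 "VALUES ('GB', '123', 'ACME Limited')")
except sqlite3.IntegrityError as exc:
    print("duplicate natural key rejected:", exc)
```

    The surrogate id remains the join key, but the database itself now refuses duplicate natural keys at the point of entry.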
    Comments welcome!

    Jon Moore

    • Hi Jon,

      Some great ideas as ever. I’ll pen a more considered approach soon, but for now I have sort of built something related to tolerancing into BI.

      Specifically this was in using prior seasonal trends (within various combinations of dimensions) to provide a seasonally adjusted current year budget view. The standard deviation of the prior years’ seasonality (speaking rather loosely here I realise) was used to provide a “health check” on this year’s seasonality. If things were pretty stable in the past (and there was enough data) then the predicted seasonality got a “green”, if there were wild year-on-year gyrations historically (or not a lot of data from which to draw conclusions), the predicted seasonality got a “red”. “Amber” marked the middle ground.

      We used these colours in the cubes, so as you sliced and diced (and by definition got to smaller data sets) the uncertainty rose and the “reds” began to proliferate.

      Not 100% what you are suggesting, but I would submit related to it.
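      As a rough sketch of the health-check logic described above (the thresholds and data shapes are invented for illustration; the original implementation is not shown here):

```python
from statistics import mean, stdev

def seasonality_status(yearly_shares, min_years=3, amber_cv=0.1, red_cv=0.25):
    """RAG-rate the stability of one month's share of annual volume.

    yearly_shares holds that month's fraction of the year's total, one
    value per historical year. Thresholds are illustrative only.
    """
    if len(yearly_shares) < min_years:
        return "red"  # not enough history to trust the prediction
    cv = stdev(yearly_shares) / mean(yearly_shares)  # coefficient of variation
    if cv < amber_cv:
        return "green"
    if cv < red_cv:
        return "amber"
    return "red"

print(seasonality_status([0.10, 0.11, 0.10, 0.105]))  # stable history
print(seasonality_status([0.05, 0.20, 0.02, 0.15]))   # wild gyrations
```

      Smaller slices have noisier year-on-year shares, so the coefficient of variation rises and more cells fall into “red” – which matches the behaviour described above.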

