Trouble at the top

18 April 2011

IRM MDM/DG

Several weeks back now, I presented at IRM’s co-located European Master Data Management Summit and Data Governance Conference. This was my second IRM event, having also spoken at their European Data Warehouse and Business Intelligence Conference back in 2010. The conference was impeccably arranged and the range of speakers was both impressive and interesting. However, as always seems to happen to me, my ability to attend sessions was curtailed by both work commitments and my own preparations. One of these years I will go to all the days of a seminar and listen to a wider variety of speakers.

Anyway, my talk – entitled Making Business Intelligence an Integral part of your Data Quality Programme – was based on themes I had introduced in Using BI to drive improvements in data quality and developed in Who should be accountable for data quality?. It centred on the four-pillar framework that I introduced in the latter article (yes, I do have a fetish for four-pillar frameworks, as per usual):

The four pillars of improved data quality

Given my lack of exposure to the event as a whole, I will restrict myself to writing about a comment that came up in the question section of my slot. As per my article on presenting in public, I always try to allow time at the end for questions, as this can often be the most interesting part of the talk, both for delegates and for me. My IRM slot was 45 minutes this time round, so I turned things over to the audience after speaking for half an hour.

There were a number of good questions and I did my best to answer them, based on past experience of both what had worked and what had been less successful. However, one comment stuck in my mind. For obvious reasons, I will not identify either the delegate, or the organisation that she worked for; but I also had a brief follow-up conversation with her afterwards.

She explained that her organisation had in place a formal data governance process and that a lot of time and effort had been put into communicating with the people who actually entered data. In common with my first pillar, this had focused on educating people as to the importance of data quality and how this fed into the organisation’s objectives; a textbook example of how to do things, on which the lady in question should be congratulated. However, she also faced an issue; one that is probably more common than any of us information professionals would care to admit. Her problem was not at the bottom, or in the middle of her organisation, but at the top.

So how many miles per gallon do you get out of that?

In particular, though data governance and a thorough and consistent approach to both the entry of data and its transformation into information were all embedded into the organisation, this did not prevent the leaders of each division having their own people take the resulting information, load it into Excel and “improve” it by “adjusting anomalies”, “smoothing out variations”, “allowing for the impact of exceptional items”, “better reflecting the opinions of field operatives” and the whole panoply of euphemisms for changing figures so that they tell a more convenient story.

In one sense this was rather depressing: someone had got so much right, but still faced challenges. However, it also chimes with another theme that I have stressed many times under the banner of cultural transformation; it is crucially important that any information initiative either has, or works assiduously to establish, the active support of all echelons of the organisation. In some of my most successful BI/DW work, I have had the benefit of the direct support of the CEO. Equally, it is very important to ensure that the highest levels of your organisation buy in before embarking on a step-change to its information capabilities.

I am way overdue employing another sporting analogy – odd, however, how most of my rugby-related ones tend to be non-explicit

My experience is that enhanced information can have enormous payback, but it is risky to embark on an information programme without this being explicitly recognised by the senior management team. If you avoid laying this important foundation, then you are simply storing up trouble for the future. The best BI/DW projects are totally aligned with the strategic goals of the organisation. Given this, explaining their objectives and soliciting executive support should be all the easier. This is something that I would encourage my fellow information professionals to seek without exception.
 


How to use your BI Tool to Highlight Deficiencies in Data

28 January 2011

My interview with Microsoft’s Bruno Aziza (@brunoaziza), which I trailed in Another social media-inspired meeting, was published today on his interesting and entertaining bizintelligence.tv site.

You can take a look at the canonical version here and the YouTube version appears below:

The interview touches on themes that I have discussed in:

 


Thanks to Jim Harris’ OCDQ Blog

12 January 2011

I would like to start 2011 by thanking Jim Harris for selecting one of my articles – Who should be accountable for data quality? – as a Best Data Quality Blog Post Of 2010 on his Obsessive Compulsive Data Quality blog.

I would recommend Jim’s excellent site as a great repository for current thinking and best practice in this crucial area.
 


 


The Business Intelligence / Data Quality symbiosis

22 March 2010

The possible product of endosymbiosis of proteobacteria and eukaryotes

As well as sounding like the title of an episode of The Big Bang Theory, the above phrase is one I just used when commenting on an article from the Data and Process Advantage Blog.

I rather like it and think it encapsulates the points that I have tried to make in my earlier post, Using BI to drive improvements in data quality.
 


 
I’m not sure whether Google evidence would stand up in court, but I may have coined a new phrase here:

Search google.com for “Business Intelligence Data Quality symbiosis”
 


 


Who should be accountable for data quality?

7 March 2010

The cardinality of a countable set - ex-mathematicians are allowed the occasional pun

LinkedIn: CIO Magazine forum

Asking the wrong question

Once more this post is inspired by a conversation on LinkedIn.com, this time the CIO Magazine forum and a thread entitled BI tool[s] can not deliver the expected results unless the company focuses on quality of data posted by Caroline Smith (normal caveat: you must be a member of LinkedIn.com and the group to view the actual thread).

The discussion included the predictable references to GIGO, but conversation then moved on to who has responsibility for data quality, IT or the business.

My view on how IT and The Business should be aligned

As regular readers of this column will know, I view this as an unhelpful distinction. My belief is that IT is a type of business department, with specific skills, but engaged in business work and, in this, essentially no different to say the sales department or the strategy department. Looking at the question through this prism, it becomes tautological. However, if we ignore my peccadillo about this issue, we could instead ask whether responsibility for data quality should reside in IT or not-IT (I will manfully resist the temptation to write ~IT or indeed IT’); with such a change, I accept that this is now a reasonable question.
 
 
Answering a modified version of the question

In information technology, telecommunications, and related fields, handshaking is an automated process of negotiation that dynamically sets parameters of a communications channel established between two entities before normal communication over the channel begins. It follows the physical establishment of the channel and precedes normal information transfer.

My basic answer is that both groups will bring specific skills to the party and a partnership approach is the one that is most likely to end in success. There are however some strong arguments for IT playing a pivotal role and my aim is to expand on these in the rest of this article.

The four pillars of improved data quality

Before I enumerate these, one thing that I think is very important is that data quality is seen as a broad issue that requires a broad approach to remedy it. I laid out what I see as the four pillars of improving data quality in an earlier post: Using BI to drive improvements in data quality. This previous article goes into much more detail about the elements of a successful data quality improvement programme and its title provides a big clue as to what I see as the fourth pillar. More on this later.
 
 
1. The change management angle

Again, as with virtually all IT projects, the aim of a data quality initiative is to drive different behaviours. This means that change management skills are just as important in these types of projects as in the business intelligence work that they complement. This is a factor to consider when taking decisions about who takes the lead in looking to improve data quality; who amongst the available resources has established and honed change management skills? The best IT departments will have a number of individuals who fit this bill; if not-IT has them as well, then the organisation is spoilt for choice.
 
 
2. The pan-organisational angle

Elsewhere I have argued that BI adds greatest value when it is all-pervasive. The same observations apply to data quality. Assume that an organisation has a number of divisions, each with its own systems (due to the nature of its business and maybe also history), but perhaps also sharing some enterprise applications. While it would undeniably be beneficial for Division A to get its customer files in order, it would be of even greater value if all divisions did this at the same time and with a consistent purpose. This would allow the dealings of Customer X across all parts of the business to be calculated and analysed, as sketched below. It could also drive cross-selling opportunities in particular market segments.
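To make this concrete, here is a minimal sketch in Python, using entirely hypothetical record layouts and a made-up shared customer identifier, of the aggregation that a consistent customer master makes possible:

```python
# A minimal sketch, with hypothetical record layouts, of the benefit argued
# above: once each division keys its records to a shared customer identifier,
# Customer X's dealings can be aggregated across the whole business.
from collections import defaultdict

division_a = [{"customer_id": "CUST-X", "revenue": 120.0}]
division_b = [{"customer_id": "CUST-X", "revenue": 80.0},
              {"customer_id": "CUST-W", "revenue": 45.0}]

totals = defaultdict(float)
for record in division_a + division_b:
    totals[record["customer_id"]] += record["revenue"]

print(totals["CUST-X"])  # 200.0 -- a view neither division could produce alone
```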

While it is likely that a number of corporate staff of different sorts will have a very good understanding about the high-level operations of each of the divisions, it is at least probable that only IT staff (specifically those engaged in collating detailed data from each division for BI purposes) will have an in-depth understanding of how transactions and master data are stored in different ways across the enterprise. This knowledge is a by-product of running a best practice BI project and the collateral intellectual property built up can be of substantial business value.
 
 
3. The BI angle

It was this area that formed the backbone of the earlier data quality article that I referenced above. My thesis was that you could turn the good data quality => good BI relationship on its head and use the BI tool to drive data quality improvements. The key here was not to sanitise data problems, but instead to expose them, also leveraging standard BI functionality like drill through to allow people to identify what was causing an issue.

One of the most pernicious data quality issues is the valid, but wrong, entry. For example, a transaction is allocated a category code of X, which is valid, but the business event demands the value Y. Sometimes it is possible to guard against this eventuality with business rules, e.g. Product A can only be sold by Business Unit W, but this will not be possible for all such data. A variant of this issue is data being entered in the wrong field. Having spent a while in the Insurance industry, I can report that it was not atypical for a policy number to be entered as a claim value, for example. Sometimes there is no easy systematic way to detect this type of occurrence, but exposing issues in a well-designed BI system is one way of noticing odd figures and then – crucially – being able to determine what is causing them.
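As an illustration of the kind of checks involved, here is a minimal Python sketch; the rule table, field names and the round-number heuristic are all hypothetical, not drawn from any real insurance system:

```python
# Hypothetical rule table: which business units may sell each product
ALLOWED_UNITS = {"A": {"W"}, "B": {"W", "X", "Y"}}

def audit_transaction(txn):
    """Return reasons why an individually valid entry looks wrong in context."""
    issues = []
    # Valid code, impossible combination: e.g. Product A sold outside Unit W
    if txn["business_unit"] not in ALLOWED_UNITS.get(txn["product"], set()):
        issues.append(f"product {txn['product']} cannot be sold by "
                      f"unit {txn['business_unit']}")
    # Wrong-field entry: a huge, perfectly round "claim amount" looks more
    # like a policy number keyed into the wrong field than a monetary value
    if txn["claim_amount"] >= 10_000_000 and txn["claim_amount"] == int(txn["claim_amount"]):
        issues.append("claim amount resembles a policy number")
    return issues

print(audit_transaction({"product": "A", "business_unit": "X",
                         "claim_amount": 10000123}))
```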
 
 
4. The IT character angle

I was searching round for a way to put this nicely and then realised that Jim Harris had done the job for me in naming his excellent Obsessive-Compulsive Data Quality blog (OCDQ Blog). I’m an IT person, I may have general management experience and a reasonable understanding of many parts of business, but I remain essentially an IT person. Before that, I was a Mathematician. People in both of those lines of work tend to have a certain reputation; to put it positively, the ability to focus extremely hard on something for long periods is a common characteristic.

  Aside: for the avoidance of doubt, as I pointed out in Pigeonholing – A tragedy, the fact that someone is good at the details does not necessarily preclude them from also excelling at seeing the big picture – in fact without a grasp on the details the danger of painting a Daliesque big picture is perhaps all too real!  

Improving data quality is one of the areas where this personality trait pays dividends. I’m sure that there are some marketing people out there who have relentless attention to detail and whose middle name is “thoroughness”; however, I suspect there are rather fewer of them than among the ranks of my IT colleagues. While leadership from the pertinent parts of not-IT is very important, a lot of the hard yards are going to be done by IT people; therefore it makes sense if they have a degree of accountability in this area.
 
 
In closing

Much like most business projects, improving data quality is going to require a cross-functional approach to achieve its goals. While you often hear the platitudinous statement that “the business must be responsible for the quality of its own data”, this ostensible truism hides the fact that one of the best ways for not-IT to improve the quality of an organisation’s data is to get IT heavily involved in all aspects of this work.

IT for its part can leverage both its role as one of the supra-business unit departments and its knowledge of how business transactions are recorded and move from one system to another to become an effective champion of data quality.
 


 


Accuracy

20 July 2009

Micropipette

As might be inferred from my last post, certain sporting matters have been on my mind of late. However, as is becoming rather a theme on this blog, these have also generated some business-related thoughts.
 
 
Introduction

On Friday evening, the Australian cricket team finished the second day of the second Test Match on a score of 156 runs for the loss of 8 (out of 10) first innings wickets. This was still 269 runs behind the England team’s total of 425.

In scanning what I realise must have been a hastily assembled end-of-day report on the web-site of one of the UK’s leading quality newspapers, a couple of glaring errors stood out. First, the Australian number 4 batsman Michael Hussey was described as having “played-on” to a delivery from England’s shy-and-retiring Andrew Flintoff. Second, the journalist wrote that Australia’s number six batsman, Marcus North, had been “clean-bowled” by James Anderson.

I appreciate that not all readers of this blog will be cricket aficionados and also that the mysteries of this most complex of games are unlikely to be made plain by a few brief words from me. However, “played on” means that the ball has hit the batsman’s bat and deflected to break his wicket (or her wicket – as I feel I should mention as a staunch supporter of the all-conquering England Women’s team, a group that I ended up meeting at a motorway service station just recently).

By contrast, “clean-bowled” means that the ball broke the batsman’s wicket without hitting anything else. If you are interested in learning more about the arcane rules of cricket (and let’s face it, how could you not be interested) then I suggest taking a quick look here. The reason for me bothering to go into this level of detail is that, having watched the two dismissals live myself, I immediately thought that the journalist was wrong in both cases.

It may be argued that the camera sometimes lies, but the cricinfo.com caption (whence these images are drawn) hardly ever does. The following two photographs show what actually happened:

Michael Hussey leaves one and is bowled, England v Australia, 2nd Test, Lord's, 2nd day, July 17, 2009

Marcus North drags James Anderson into his stumps, England v Australia, 2nd Test, Lord's, 2nd day, July 17, 2009

As hopefully many readers will be able to ascertain, Hussey raised his bat aloft, a defensive technique employed to avoid edging the ball to surrounding fielders, but misjudged its direction. It would be hard to “play on” from a position such as he adopted. The ball arced in towards him and clipped the top of his wicket. So, in fact he was the one who was “clean-bowled”; a dismissal that was qualified by him having not attempted to play a stroke.

North on the other hand had been at the wicket for some time and had already faced 13 balls without scoring. Perhaps in frustration at this, he played an overly-ambitious attacking shot (one not a million miles from a baseball swing), the ball hit the under-edge of his horizontal bat and deflected down into his wicket. So it was North, not Hussey, who “played on” on this occasion.

So, aside from saying that Hussey had been adjudged out “handled the ball” and North dismissed “obstructed the field” (two of the ten ways in which a batsman’s innings can end – see here for a full explanation), the journalist in question could not have been more wrong.

As I said, the piece was no doubt composed quickly in order to “go to press” shortly after play had stopped for the day. Maybe these are minor slips, but surely the core competency of a sports journalist is to record what happened accurately. If they can bring insights and colour to their writing, so much the better, but at a minimum they should be able to provide a correct description of events.

Everyone makes mistakes. Most of my blog articles contain at least one typographical or grammatical error. Some of them may include errors of fact, though I do my best to avoid these. Where I offer my opinions, it is possible that some of these may be erroneous, or that they may not apply in different situations. However, we tend to expect professionals in certain fields to be held to a higher standard.

Auditors

For a molecular biologist, the difference between a 0.20 micro-molar solution and a 0.19 one may be massive. For a team of experimental physicists, unbelievably small quantities may mean the difference between confirming the existence of the Higgs Boson and just some background noise.

In business, it would be unfortunate (to say the least) if auditors overlooked major assets or liabilities. One would expect that law-enforcement agents did not perjure themselves in court. Equally politicians should never dissemble, prevaricate or mislead. OK, maybe I am a little off track with the last one. But surely it is not unreasonable to expect that a cricket journalist should accurately record how a batsman got out.
 
 
Twitter and Truth

twitter.com

I made something of a leap from these sporting events to the more tragic news of Michael Jackson’s recent demise. I recall first “hearing” rumours of this on twitter.com. At this point, no news sites had much to say about the matter. As the evening progressed, the self-styled celebrity gossip site TMZ was the first to announce Jackson’s death. Other news outlets either said “Jackson taken to hospital” or (perhaps hedging their bets) “US web-site reports Jackson dead”.

By this time the twitterverse was experiencing a cosmic storm of tweets about the “fact” of Jackson’s passing. A comparably large number of comments lamented how slow “old media” was to acknowledge this “fact”. Eventually of course the dinosaurs of traditional news and reporting lumbered to the same conclusion as the more agile mammals of Twitter.

In this case social media was proved to be both quick and accurate, so why am I now going to offer a defence of the world’s news organisations? Well I’ll start with a passage from one of my all-time favourite satires, Yes Minister, together with its sequel Yes Prime Minister.

In the following brief excerpt Sir Geoffrey Hastings (the head of MI5, the British domestic intelligence service) is speaking to The Right Honourable James Hacker (the British Prime Minister). Their topic of conversation is the recently revealed news that a senior British Civil Servant had in fact been a Russian spy:

Yes Prime Minister

Hastings: Things might get out. We don’t want any more irresponsible ill-informed press speculation.
Hacker: Even if it’s accurate?
Hastings: Especially if it’s accurate. There is nothing worse than accurate irresponsible ill-informed press speculation.

Yes Prime Minister, Vol. I by J. Lynn and A. Jay

Was the twitter noise about Jackson’s death simply accurate ill-informed speculation? It is difficult to ask this question as, sadly, the tweets (and TMZ) proved to be correct. However, before we garland new media with too many wreaths, it is perhaps salutary to recall that there was a second rumour of a celebrity death circulating in the febrile atmosphere of Twitter on that day. As far as I am aware, Pittsburgh’s finest – Jeff Goldblum – is alive and well as we speak. Rumours of his death (in an accident on a New Zealand movie set) proved to be greatly exaggerated.

The difference between a reputable news outlet and hordes of twitterers is that the former has a reputation to defend. While the average tweep will simply shrug their shoulders at RTing what they later learn is inaccurate information, misrepresenting the facts is a cardinal sin for the best news organisations. Indeed reputation is the main thing that news outlets have going for them. This inevitably includes annoying and time-consuming things such as checking facts and validating sources before you publish.

With due respect to Mr Jackson, an even more tragic set of events also sparked some similar discussions: the aftermath of the Iranian election. The Economist published an interesting article comparing old and new media responses to this, entitled Twitter 1, CNN 0. Their final comments on this area were:

[...]the much-ballyhooed Twitter swiftly degraded into pointlessness. By deluging threads like Iranelection with cries of support for the protesters, Americans and Britons rendered the site almost useless as a source of information—something that Iran’s government had tried and failed to do. Even at its best the site gave a partial, one-sided view of events. Both Twitter and YouTube are hobbled as sources of news by their clumsy search engines.

Much more impressive were the desk-bound bloggers. Nico Pitney of the Huffington Post, Andrew Sullivan of the Atlantic and Robert Mackey of the New York Times waded into a morass of information and pulled out the most useful bits. Their websites turned into a mish-mash of tweets, psephological studies, videos and links to newspaper and television reports. It was not pretty, and some of it turned out to be inaccurate. But it was by far the most comprehensive coverage available in English. The winner of the Iranian protests was neither old media nor new media, but a hybrid of the two.

Aside from the IT person in me noticing the opportunity to increase the value of Twitter via improved text analytics (see my earlier article, Literary calculus?), these types of issues raise concerns in my mind. To balance this slightly negative perspective it is worth noting that both accurate and informed tweets have preceded several business events, notably the recent closure of BI start-up LucidEra.

Also, mainstream media seem to have swallowed the line that Google has developed its own operating system in Chrome OS (rather than lashing the pre-existing Linux kernel on to its browser); maybe it just makes a better story. Blogs and Twitter were far more incisive in their commentary about this development.

Considering the pros and cons, on balance the author remains something of a doubting Thomas (by name as well as nature) about placing too much reliance on Twitter for news; at least as yet.
 
 
Accuracy and Business Intelligence

A balancing act

Some business thoughts leaked into the final paragraph of the Introduction above, but I am interested more in the concept of accuracy as it pertains to one of my core areas of competence – business intelligence. Here there are different views expressed. Some authorities feel that the most important thing in BI is to be quick with information that is good-enough; the time taken to achieve undue precision being the enemy of crisp decision-making. Others insist that small changes can tip finely-balanced decisions one way or another and so precision is paramount. In a way that is undoubtedly familiar to regular readers, I straddle these two opinions. With my dislike for hard-and-fast recipes for success, I feel that circumstances should generally dictate the approach.

There are of course different types of accuracy. There is that which insists that business information reflects actual business events (often more a case for work in front-end business systems rather than BI). There is also that which dictates that BI systems reconcile to the penny to perhaps less functional, but pre-existing scorecards (e.g. the financial results of an organisation).
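By way of illustration, here is a minimal reconciliation sketch in Python; the account codes, figures and zero tolerance are hypothetical, but they show the “to the penny” test in miniature:

```python
# A minimal reconciliation sketch, assuming hypothetical account codes and a
# BI summary that must tie back to general-ledger totals to the penny.
from decimal import Decimal

def reconcile(bi_totals, ledger_totals, tolerance=Decimal("0.00")):
    """Return the accounts where the BI figure does not match the ledger."""
    breaks = {}
    for account in set(bi_totals) | set(ledger_totals):
        diff = bi_totals.get(account, Decimal(0)) - ledger_totals.get(account, Decimal(0))
        if abs(diff) > tolerance:  # "to the penny" means zero tolerance
            breaks[account] = diff
    return breaks

print(reconcile({"4000": Decimal("1250.00")},
                {"4000": Decimal("1249.99")}))
# {'4000': Decimal('0.01')}
```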

A number of things can impact accuracy, including, but not limited to: how data has been entered into systems; how that data is transformed by interfaces; differences between terminology and calculation methods in different data sources; misunderstandings by IT people about the meaning of business data; errors in the extract transform and load logic that builds BI solutions; and sometimes even the decisions about how information is portrayed in BI tools themselves. I cover some of these in my previous piece Using BI to drive improvements in data quality.

However, one thing that I think differentiates enterprise BI from departmental BI (or indeed predictive models or other types of analytics), is a greater emphasis on accuracy. If enterprise BI is to aspire to becoming the single version of the truth for an organisation, then much more emphasis needs to be placed on accuracy. For information that is intended to be the yardstick by which a business is measured, good enough may fall short of the mark. This is particularly the case where a series of good enough solutions are merged together; the whole may be even less than the sum of its parts.

A focus on accuracy in BI also achieves something else. It stresses an aspiration to excellence in the BI team. Such aspirations tend to be positive for groups of people in business, just as they are for sporting teams. Not everyone who dreams of winning an Olympic gold medal will do so, but trying to make such dreams a reality generally leads to improved performance. If the central goal of BI is to improve corporate performance, then raising the bar for the BI team’s own performance is a great place to start and aiming for accuracy is a great way to move forward.
 


 
A final thought: England went on to beat Australia by precisely 115 runs in the second Test at Lord’s; the final result coming today at precisely 12:42 pm British Summer Time. The accuracy of England’s bowling was a major factor. Maybe there is something to learn here.
 


 


A list of potential DW/BI pitfalls – by someone who has clearly been there

17 February 2009

pitfall

Browsing through my WordPress Tag Surfer today (a really nice feature by the way), I came across an interesting list of problems that can occur in a data warehousing / business intelligence project, together with suggestions for managing these. A link appears below:

Eight Reasons why Data Warehouse and, subsequently, Business Intelligence efforts fail

The author, Raphael Klebanov, has clearly lived the data warehousing process and a lot of what he says chimes closely with my own experience.

Some of his themes around business engagement, the alignment of BI delivery with business needs and the importance of education are echoed throughout my own writing. This article is definitely worth a read in my opinion.
 


 
Yes, I know the illustration ages me.
 


 


Using BI to drive improvements in data quality

11 February 2009
LinkedIn: Business Intelligence Professionals

Introduction

It is often argued that good BI is dependent on good data quality. This is essentially a truism which it is hard to dispute. However, in this piece I will argue that it is helpful to look at this the other way round. My assertion is that good BI can have a major impact on driving improvements in data quality.
 
 
Food for thought from LinkedIn.com

Again this article is inspired by some discussions on a Linkedin.com group, this time Business Intelligence Professionals (as ever you need to be a member of both LinkedIn.com and the group to read the discussions). The specific thread asked about how to prove the source and quality of data that underpins BI and suggested that ISO 8000 could help.

I made what I hope are some pragmatic comments as follows:

My experience is that the best way to implement a data quality programme is to take a warts-and-all approach to your BI delivery. If data errors stand out like a sore thumb on senior management reports, then they tend to get fixed at source. If instead the BI layer massages away all such unpleasantness, then the errors persist. Having an “unknown” or, worse, “all others” category in a report is an abomination. If oranges show up in a report about apples, then this should not be swept under the carpet (apologies for the mixed metaphor).

Of course it is helpful to attack the beast in other ways: training for staff, extra validation in front-end systems, audit reports and so on; but I have found that nothing quite gets data fixed as quickly as the bad entries ending up on something the CEO sees.

Taking a more positive approach; if a report adds value, but has obvious data flaws, then it is clear to all that it is worth investing in fixing these. As data quality improves over time, the report becomes more valuable and we have established a virtuous circle. I have seen this happen many times and I think it is a great approach.

and, in response to a follow-up about exception reports:

Exception reports are an important tool, but I was referring to showing up bad data in actual reports (or cubes, or dashboards) read by executives.

So if product X should only really appear in department Y’s results, but has been miscoded, then erroneous product X entries should still be shown on department Z’s reports, rather than being suppressed in an “all others” or “unknown” line. That way, whatever problem led to its inclusion (user error, a lack of validation in a front-end system, a problem with one or more interfaces, etc.) can get fixed, as opposed to being ignored.

The same comments would apply to missing data (no product code), invalid data (a product code that doesn’t exist) or the lazy person’s approach to data (‘x’ being entered in a descriptive field rather than anything meaningful as the user just wants to get the transaction off their hands).

If someone senior enough wants these problems fixed, they tend to get fixed. If they are kept blissfully unaware, then the problem is perpetuated.

I thought that it was worth trying to lay out more explicitly what I think is the best strategy for improving data quality.
 
 
The four pillars of a data quality improvement programme

I have run a number of programmes specifically targeted at improving data quality that focussed on training and auditing progress. I have also delivered acclaimed BI systems that led to a measurable improvement in data quality. Experience has taught me that there are a number of elements that combine to improve the quality of data:

  1. Improve how the data is entered
  2. Make sure your interfaces aren’t the problem
  3. Check how the data is entered / interfaced
  4. Don’t suppress bad data in your BI

As with any strategy, it is ideal to have the support of all four pillars. However, I have seen greater and quicker improvements through the fourth element than with any of the others. I’ll now touch on each area briefly.

(if you are less interested in my general thoughts on data quality and instead want to cut to the chase, then just click on the link to point 4. above)
 
 
1. Improve how the data is entered

Of course if there are no problems with how data is entered then (taking interface issues to one side) there should be no problems with the information that is generated from it. Problems with data entry can take many forms. Particularly where legacy systems are involved, it can sometimes be harder to get data right than it is to make a mistake. With more modern systems, one would hope that all fields are validated and that many only provide you with valid options (say in a drop down). However, validating each field is only the start; entries that are valid may be nonsensical. A typical example here is with dates. It is unlikely that an order was placed in 1901 for example, or – maybe more typically – that an item was delivered before it was ordered. This leads us into the issue of combinations of fields.

Two valid entries may make no sense whatsoever in combination (e.g. a given product may not be sold in a given territory). Business rules around this area can be quite complex. Ideally, fields that limit the values of other fields should appear first. Drop downs on the later fields should then only show values which work in combination with the earlier ones. This speeds user entry as well as hopefully improving accuracy.
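A minimal sketch of such dependent validation, with hypothetical products, territories and dates, might look like this in Python:

```python
import datetime

# Hypothetical rule table: the territories in which each product may be sold
VALID_TERRITORIES = {"Motor": {"UK", "France"},
                     "Marine": {"UK", "Singapore"}}

def territory_options(product):
    """Drop-down contents for territory, narrowed by the product chosen first."""
    return VALID_TERRITORIES.get(product, set())

def validate_order(product, territory, ordered, delivered):
    """Each field may be individually valid; the combinations must be too."""
    errors = []
    if territory not in territory_options(product):
        errors.append(f"{product} is not sold in {territory}")
    if delivered < ordered:  # two valid dates, nonsensical together
        errors.append("item delivered before it was ordered")
    return errors

print(validate_order("Marine", "France",
                     datetime.date(2009, 2, 11), datetime.date(2009, 2, 1)))
# ['Marine is not sold in France', 'item delivered before it was ordered']
```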

However what no system can achieve, no matter how good its validation, is ensuring that what is recorded 100% reflects the actual event. If some information is not always available but important if known, it is difficult to police that it is entered. If ‘x’ will suffice as a textual description of a business event, do not be surprised if it is used. If there is a default for a field, which will pass validation, then it is likely that a significant percentage of records will have this value. At the point of entry, these types of issues can be best addressed by training. This should emphasise to the people using the system what are the most important fields and why they are important.

Some errors and omissions can also be picked up in audit reports, which is the subject of section 3 below. But valid data in one system can still be mangled before it gets to the next one, and I will deal with this issue next.
 
 
2. Make sure your interfaces aren’t the problem

In many organisations the IT architecture is much more complex than a simple flow of data from a front-end system to BI and other reporting applications (e.g. Accounts). History often means that modern front-end systems wrap older (often mainframe-based) legacy systems that are too expensive to replace, or too embedded into the fabric of an organisation’s infrastructure. Also, there may be a number of different systems dealing with different parts of a business transaction. In Insurance, an industry in which I have worked for the last 12 years, the chain might look like this:

Simplified schematic of selected systems and interfaces within an Insurance company

Of course two or more of the functions that I have shown separately may be supported in a single, integrated system, but it is not likely that all of them will be. Also the use of “System(s)” in the diagram is telling. It is not atypical for each line of business to have its own suite of systems, or for these to change from region to region. Hopefully the accounting system is an area of consistency, but this is not always the case. Even legacy systems may vary, but one of the reasons that interfaces are maintained to these is that they may be one place where data from disparate front-end systems is collated. I have used Insurance here as an example, but you could draw similar diagrams for companies of a reasonable size in most industries.

There are clearly many problems that can occur in such an architecture and simplifying diagrams like the one above has been an aim of many IT departments in recent years. What I want to focus on here is the potential impact on data quality.

Where (as is typical) there is no corporate lexicon defining the naming and validation of fields across all systems, then the same business data and business events will be recorded differently in different systems. This means that data not only has to be passed between systems but mappings have to be made. Often over time (and probably for reasons that were valid at every step along the way) these mappings can become Byzantine in their complexity. This leads to a lack of transparency between what goes in to one end of the “machine” and what comes out of the other. It also creates multiple vulnerabilities to data being corrupted or meaning being lost along the way.

Let’s consider System A which has direct entry of data and System B which receives data from System A by interface. If the team supporting System A have a “fire and forget” attitude to what happens to their data once it leaves their system, this is a recipe for trouble. Equally if the long-suffering and ill-appreciated members of System B’s team lovingly fix the problems they inherit from System A, then this is not getting to the heart of the problem. Also, if System B people lack knowledge of the type of business events supported in System A and essentially guess how to represent these, then this can be a large issue. Things that can help here include: making sure that there is ownership of data and that problems are addressed at source; trying to increase mutual understanding of systems across different teams; and judicious use of exception reports to check that interfaces are working. This final area is the subject of the next section.
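The sketch below illustrates the principle with hypothetical codes: an unmapped value arriving from System A should be surfaced and rejected, never silently defaulted, so that the owning team fixes the problem at source:

```python
import logging

# Hypothetical mapping of System A codes to System B codes
CODE_MAP = {"01": "MOTOR", "02": "MARINE"}

def map_code(source_code):
    """Translate a System A code, refusing to guess when no mapping exists."""
    if source_code not in CODE_MAP:
        # Surface the problem rather than defaulting to "unknown": the record
        # should go to an exception queue and be corrected at source
        logging.error("Unmapped System A code: %r", source_code)
        raise ValueError(f"unmapped code: {source_code!r}")
    return CODE_MAP[source_code]

print(map_code("02"))  # MARINE
```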
 
 
3. Check how the data is entered / interfaced

Exception or audit reports can be a useful tool in picking up on problems with data entry (or indeed interfaces). However, they need to be part of a rigorous process of feedback if they are to lead to improvement (such feedback would go to the people entering data or those building interfaces as is appropriate). If exception reports are simply produced and circulated, it is unlikely that anything much will change. Their content needs to be analysed to identify trends in problems. These in turn need to drive training programmes and systems improvements.

At the end of the day, if problems persist with a particular user (or IT team), then this needs to be taken up with them in a firm manner, supported by their management. Data quality is either important to the organisation, or it is not. There is either a unified approach by all management, or we accept that our data quality will always be poor. In summary, there needs to be a joined-up approach to the policing of data and people need to be made accountable for their own actions in this area.
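To illustrate the analysis step, here is a minimal Python sketch that aggregates raw exceptions into trends by error type and originator; the field names and sample records are hypothetical:

```python
from collections import Counter

def exception_trends(exceptions):
    """Count exceptions by (error type, originator) to reveal patterns."""
    return Counter((e["error_type"], e["entered_by"]) for e in exceptions)

report = exception_trends([
    {"error_type": "missing product code", "entered_by": "user_17"},
    {"error_type": "missing product code", "entered_by": "user_17"},
    {"error_type": "placeholder text in description", "entered_by": "user_23"},
])
for (error, who), count in report.most_common():
    print(f"{count:>3}  {error}  ({who})")  # feeds targeted training or fixes
```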
 
 
4. Don’t suppress bad data in your BI

I have spent some time covering the previous three pillars. In my career I have run data quality improvement programmes that relied essentially on these three approaches alone. While I have generally had success operating in this way, progress has been slow and vast reserves of perseverance have been necessary.

More recently, in BI programmes I have led, improvements in data quality have been quicker, easier and almost just a by-product of the BI work itself. Why has this happened?

The key is to always highlight data quality problems in your BI. The desire can be to deliver a flawless BI product, and data that is less than pristine can compromise this. However, the temptations to sanitise bad data, to exclude it from reports, to incorporate it in an innocuous “all others” line, or to try to guess which category it really should sit in are all to be resisted. As I mention in my LinkedIn.com comments, while this may make your BI system appear less trustworthy (is it built on foundations of sand?), any other approach is guaranteeing that it actually is untrustworthy. If you stop and think about it, the very act of massaging bad source data in a BI system is suppressing the truth. Perhaps it is a lesser crime than doctoring your report and accounts, but it is not far short. You are giving senior management an erroneous impression of what is happening in the company, the precise opposite of what good BI should do.

So the “warts and all” approach is the right one to adopt ethically (if that does not sound too pretentious), but I would argue that it is the right approach practically as well. When data quality issues are evident on the reports and analysis tools that senior management use and when they reduce the value or credibility of these, then there is likely to be greater pressure applied to resolve them. If senior management are deprived of the opportunity to realise that there are problems, how are they meant to focus their attention on resolving them or to lend their public support to remedial efforts?
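The following minimal Python sketch, with hypothetical department codes and transactions, shows the difference in practice: the miscoded row remains a visible line item rather than vanishing into “all others”:

```python
from collections import defaultdict

VALID_DEPTS = {"Y", "Z"}  # hypothetical list of recognised departments

def summarise(transactions):
    """Total by department, labelling bad codes instead of suppressing them."""
    totals = defaultdict(float)
    for t in transactions:
        dept = t["dept"]
        key = dept if dept in VALID_DEPTS else f"DATA ERROR: dept={dept!r}"
        totals[key] += t["amount"]
    return dict(totals)

print(summarise([{"dept": "Y", "amount": 100.0},
                 {"dept": "Q", "amount": 25.0}]))
# {'Y': 100.0, "DATA ERROR: dept='Q'": 25.0}
```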

This is not just a nice theoretical argument. I have seen the quality of data dramatically improve in a matter of weeks when Executives become aware of how it impinges on the information they need to run a business. Of course a prerequisite for this is that senior management places value on the BI they receive and realises the importance of its accuracy. However, if you cannot rely on these two things in your organisation, then your BI project has greater challenges to face than the quality of the data it is based upon.
 


 

