One of an occasional series  highlighting the genius of Randall Munroe. Randall is a prominent member of the international data community and apparently also writes some sort of web-comic as a side line .
In both cases I was highlighting that data-centric work is sometimes more like archaeology than the frequently employed metaphor of construction. To paraphrase myself, you never know what you will find until you start digging. The image suggested the unfortunate results of not making this distinction when approaching data projects.
So, perhaps I am arguing for less Data Architects and more Data Archaeologists; the whip and fedora are optional of course!
Well not that occasional as, to date, the list extends to:
IT people are familiar with a number of metaphors for their projects. The most typical relates to building; IT projects are compared to erecting a skyscraper. The IT literature is suffused with language derived from this metaphor. We build systems. We develop blueprints for them. We design architectures (two-for-one there). This analogy has some strength and there are indeed superficial similarities between the two areas. However, as with most metaphors, if over-extended their applicability often breaks down. I recall one CEO in particular who was obsessed by the “building team” moving on to the next “site”; regardless of the current one requiring further work and dedicated maintenance. One of his predecessors often referred to wanting a “diesel submarine” built, as opposed to a “nuclear one”. Before I fall into the same trap of over-exploiting the metaphor, let’s move hurriedly on.
As I mention above, aside from the occasional misapplication, the building analogy works reasonably well for many IT projects; does it also work for business intelligence? I think that there are some problems in applying the metaphor. Building tends to follow a waterfall project plan (as do many IT projects). Of course there may be some iterations, even many of them, but the idea is that the project is made up of discrete base-level tasks whose duration can be estimated with a degree of accuracy. Examples of such a task might include writing a functional specification, developing a specific code module, or performing integration testing between two sub-systems. Adding up all the base-level tasks and the rates of the people involved gets you a cost estimate. Working out the dependencies between the base-level tasks gets you an overall duration estimate.
The problem with BI projects is that some of the base-level tasks are a bit different. An example might be: develop an understanding of a legacy data table, how it relates to other legacy data sets and to more modern systems (this sits under area two of my model of BI development – see BI implementations are like icebergs). This is not an exercise that is very easy to estimate in advance. Indeed it may not be possible to produce an adequate estimate until a substantial amount of work has been done. Even at a late stage in the task, something may be discovered which expands the work required dramatically; surprises may lurk round every corner.
Why is this? Well with legacy data, the people who developed the system may have done so many years ago. Since then, they may have left the company or moved on to other areas, taking their knowledge with them. Their place may have been taken by successive tranches of new staff. Perhaps poor initial documentation meant that later workers did not fully understand the full nature of the system, but nevertheless did their best to build upon it. Perhaps the documentation was good at first, but has not been kept up-to-date and now describes a system that no longer exists.
By definition, legacy systems will have been around for some time and layers of changes will have accumulated on top of each other. Maybe, as a company has expanded, new data has been interfaced to the system from different business units and territories; perhaps each of these cases has its own dedicated interface code, each subtly different from those of other systems. Even where people exist in an organisation who preserve an “oral tradition” about the system, handed down to them over generations; these people may not appreciate how their data interacts with other data – even if the person who looks after another legacy system sits in the adjacent cubicle.
Although these challenges can also occur when trying to understand the data in more modern systems, they are particularly acute with older ones. For a start, the people who designed these systems are more likely to be around. Also legacy systems often sit at the centre of a Byzantine web of inter-connections, batch-processes, over-night jobs and the occasional more modern service. It can be a real mess and this is a situation with which any data analyst with a reasonable amount of experience will be very familiar.
The difficulty of estimating the duration of tasks such as properly analysing legacy data makes overall estimation of BI projects more of an art than a science. Of course techniques such as time-boxing tasks can be applied, but these are not always 100% appropriate. A 75% analysed data source (even assuming that the estimate that only 25% work is left is accurate) is not an analysed data source. Leaving dark corners of knowledge is likely to be reflected in BI cubes and reports that do not reconcile back to their sources. Probably the best way to deal with this problem is to be extremely open about the challenges up-front with executive sponsors and when submitting estimates. It helps to also stress the level of uncertainty in progress reports. The more honest you are initially, the better you will be able to explain any overruns and the more likely it is that you will be believed.
These types of issues mean that the – hopefully more orderly – process of constructing a building is not a fully accurate way to describe a BI project. That is unless the metaphor is extended to include an occurance that is all too common during construction in The City of London. Given the age of Londinium, whenever ground is broken on a new project, it is more likely than not that a mediaeval, Anglo-Saxon or Roman site is unearthed (often all three). These finds, while of enormous interest to academics, can result in projects being put on hold (sometimes for years) while the dig is fully assessed, artefacts are carefully removed and catalogued by experts and so on. Sometimes the remains are of such importance that a structure preserving and protecting them (and even allowing public viewing) has to be made part of the design of the foundations of new building. Many office blocks in The City have such viewing galleries in their basements. Such eventualities can create massive and unexpected overruns in central London building projects.
So this leads me to suggest a different metaphor for BI projects. Major elements of them are much more like archaeological digs than traditional building. The extent and importance of a dig is very difficult to ascertain before work starts and both may change during the course of a project. It is not atypical that an older site is discovered underneath an initial dig, doubling the amount of work required.
So, my belief is that BI professionals should not be likened to architects or structural engineers. Instead the epithet of archaeologist is much more appropriate. And if the fedora fits, wear it!
Apologies for the initial typo’ in the article heading, which persists in the perma-link. Of course I have the odd error scattered around the place, but in a heading! Maybe I need to employ a proof-reader.