The 23 Most Influential Business Intelligence Blogs

BI Software Insight

I was flattered to be included in the recent list of the 23 most influential BI bloggers published by BI Software Insight. To be 100% honest, I was also a little surprised as, due to other commitments, this blog has received very little of my attention in recent years. Taking a glass half full approach, maybe my content stands the test of time; it would be nice to think so.

BI Software Insight List

It was also good to be in the company of various members of the BI community whose work I respect and several of whom I have got to know on-line or in person. These include (as per the original article, in no particular order):

Blogger – Blog
Augusto Albeghi – Upstream Info
Bruno Aziza* – His blog on Forbes
Howard Dresner – Business Intelligence
Barney Finucane – Business Intelligence Products and Trends
Marcus Borba – Business Analytics News
Cindi Howson – BI Scorecard

* You can see Bruno and me talking on Microsoft’s YouTube channel here.
 


 
BI Software Insight helps organizations make smarter purchasing decisions on Business Intelligence Software. Their team of experts helps organizations find the right BI solution with expert reviews, objective resource guides, and insights on the latest BI news and trends.
 

 

The Kindness of Strangers

Tennessee Williams
“Whoever you are, I have always depended on the kindness of strangers.” – A Streetcar Named Desire by Tennessee Williams

It is so often stated that it has become a truism of sorts that on-line interactions, particularly those via social media, displace what is termed “real world” or “face to face” interactions. My view is that this perspective, rather than being self-evidently true, is actually apocryphal. I am sure that there are examples of people who have become more isolated (in a physical sense) through use of social media; those who are engaged in a zero-sum game where time spent on-line is at the expense of being around other humans. Most communications media can be accused of the same thing, though I am not aware that anyone ever told Jane Austen to stop wasting her time writing letters and instead get out and meet people. It wasn’t so long ago that people, particularly younger people, were berated for spending so much time on the ‘phone; even back when those were connected to a wall socket by a wire. The same barbs were thrown (and still are) at what we now call Video Games; another area which I admit has occupied a lot of my time in other periods of my life.

There is however a different way of looking at this supposed issue. As I explain in my now rather antiquated review of the Twitterverse:

I have been involved in running web-sites and various on-line communities since 1999.

[...]

I think that Twitter.com[1] can be an extremely useful way of interacting with people, expanding your network and coming into contact with interesting new people.

- Taken from New Adventures in Wi-Fi – Track 2: Twitter April 2010

I have indeed come into contact with a wide range of different people through my, admittedly rather intermittent, use of what we now call social media. Importantly, a lot of these people are based in parts of the world, or even parts of my own country, where our paths would have been unlikely to cross. I suppose that a case could be made that any time I spend writing or reading blog articles, or talking to people on Twitter or LinkedIn, could instead have been more profitably employed sitting on a barstool; perhaps in the hope that someone with complementary interests would start talking to me. However, this does seem to be a doubtful assertion to make. As with most things in life (except chocolate of course) balance is the key. If you spend all of your time on social media (or indeed all of your time in bars) you will rule out some social experiences. If instead you spend some time on social media as part of a healthy, balanced diet, then this should lead to a wider range of associates and sometimes even friends. It is also a pretty frictionless way to find people who are passionate about the things that you are passionate about; or indeed to find out why people are passionate about areas that you think might be interesting.

I mention above that – despite the observations I make later in the same paragraph – my own use of social media has been sporadic[2]. Having made some progress in understanding some elements of the area in an earlier stage of its evolution, jumping back in as I am doing now can feel a little daunting. These fears have been somewhat ameliorated by reconnecting with a lot of people, who still seem interested in me and what I have to say[3]. I have also connected with some new people and acknowledging this second occurrence is the actual purpose of this article.
 
 
twelveskip

First, I’d like to offer thanks to Ontario-based Pauline Cabrera (@twelveskip) of twelveskip.com. Pauline describes herself thus on Twitter:

Savvy Digital Strategist / Blogger / Web Designer / Virtual Assistant (http://GeekyVA.com). I dig #SEO, blogging, social media & content marketing.

I found Pauline’s web-site when I was thinking about sprucing up my Twitter header and looking for some advice[4]. Pauline’s observations were clear and helpful, but while I get by OK in creating images (both in a business context and with many of the diagrams on this site), I am not a graphic designer. Given Pauline’s greater experience, I decided to reach out to her. The fruits of this interaction can now be viewed on my Twitter site, @peterjthomas.

Pauline and I reached a commercial arrangement, so I’m not here referring to the kindness of strangers always meaning doing stuff for free. However, while I am sure many other people provide the services that Pauline does, I’m equally confident that very few do it with such speed and professionalism. When you couple these attributes with her being ultra-friendly and displaying an evident delight in doing what she does, you end up with someone it is a pleasure to do business with.

I mentioned that Pauline resides in Canada; I live in the UK. We wouldn’t have bumped into each other without those modern inventions of the Internet, search engines, web-sites and (the subject of the search that allowed me to find Pauline) Twitter.
 
 
Michael Sandberg's Data Visualization Blog

Second, I recently composed an article with a Data Visualisation theme and as part of researching this looked at a number of blogs covering this area. One that stood out was Michael Sandberg’s Data Visualization Blog. Michael describes himself thus:

My main work-related areas of interest are in developing self-service interactive, dynamic reports for Web and Mobile (most notably iPad). I currently develop using MicroStrategy in the Cloud with Netezza.

Michael and I also share a mutual connection in Cindi Howson (@BIScorecard) of BI Scorecard. Despite this, I had not been aware of Michael’s work until recently. I did however connect with him via his web-site. Today he has been kind enough to feature the data visualisation piece I wrote on his blog. It is always gratifying when a fellow professional thinks that your work merits sharing with their network.

In this case, Michael is based in Arizona. The chances of us bumping into each other, except through us both blogging, would have been slim as well.
 
 
Simon Barnes Author

The final person that I would like to mention is Simon Barnes, the award-winning sports and wildlife author and journalist. I based my recent blog article, Ten Million Aliens – More musings on BI-ology, on his book of a similar name. Aside from his articles for various newspapers being published on-line, Simon has not been noted for his social media presence until recently. This has now been remedied via his blog Simon Barnes Author and Twitter account, @SimonBarnesWild; Simon has been using the former to showcase chapters from his book.

The kindness that I wanted to point out here is the diligence with which Simon responds to comments on his site. Of course, on a personal note, there is always a frisson of excitement when someone whose work you admire and who is also something of a public figure in the UK replies to you directly as Simon has to me. Politeness and consideration for others pre-date the Internet of course, but treating people reasonably gets you a long way in social media. As Simon seems to do this naturally, I am sure this characteristic will stand him in good stead.

I can’t claim that Simon lives a long way from me; his home in Norfolk is fairly close to my current one in Cambridge. However, despite having read his articles for years, it was only once Simon established a web presence that the opportunity to correspond opened up.
 
 
So, in the couple of weeks during which I have dipped my toe back into the social media water, I have had the privilege to connect (in a number of different ways) with the three people that I mention above. Each of Pauline, Michael and Simon is on-line for different reasons and each has different things to say about very different areas. However, I am interested in what each of them does, as are many other people around the world. It’s hard to imagine an easier way in which I could have formed connections with these three people, one from Canada, one from the US and one from my native UK, than via the Internet and – in these cases – Twitter and Blogging. I think these are useful facts to remember in the face of accusations that social media makes people insular, closed-off and lonely. It may do that to some people, but this is a million miles away from my own experiences and – I strongly suspect – those of many of the people who are now able to access a wider world through their keyboards or touchscreens.
 
 
Notes

 
[1]
 
The “.com” was still in use back in 2010
 
[2]
 
This is something that I cover in another earlier article: Four [Social Media] Failures and a Success. The section describing the first failure (in this case a personal one) begins:

Failure 1 – Thinking that you can dip in and out of Social Media

Articles per month

 
[3]
 
Probably strongly correlated to me being interested in what they have to say of course.
 
[4]
 
I think that the actual search terms were the rather prosaic “twitter header dimensions”.

 

 

Scienceogram.org’s Infographic to celebrate the Philae landing

Scienceogram.org Rosetta Infographic - Click to view the original in a new tab

© scienceogram.org 2014
Reproduced on this site under a Creative Commons licence
Original post may be viewed here

As a picture is said to paint a thousand words, I’ll (mostly) leave it to Scienceogram’s infographic to deliver the message.

However, The Center for Responsive Politics (they claim to be nonpartisan; I have no idea whether or not they have a political affiliation) estimates the cost of the recent US Congressional elections at around $3.67 bn (€2.93 bn). I found a lower (but still rather astonishing) figure of $1.34 bn (€1.07 bn) at the Federal Election Commission web-site, but suspect that this number excludes Political Action Committees and their like.

To make a European comparison to a European space project, the Common Agricultural Policy cost €57.5 bn ($72.0 bn) in 2013 according to the BBC. Given that Rosetta’s costs were spread over nearly 20 years, it makes sense to move the decimal point rightwards one place in both the euro and dollar figures and then to double the resulting numbers before making comparisons (this is left as an exercise for the reader).

Of course I am well aware that a quick Google could easily produce figures (such as how many meals, or vaccinations, or so on you could get for €1.4 bn) making points that are entirely antipodal to the ones presented. At the end of the day we landed on a comet and will – fingers crossed – begin to understand more about the formation of the Solar System and potentially Life on Earth itself as a result. Whether or not you think that is good value for money probably depends mostly on what sort of person you are. As I relate in a previous article, infographics only get you so far.
 


 
Scienceogram provides précis [correct plural] of UK science spending, giving overviews of how investment in science compares to the size of the problems it’s seeking to solve.
 

 

Ten Million Aliens – More musings on BI-ology

Introduction

Ten Million Aliens by Simon Barnes

This article relates to the book Ten Million Aliens – A Journey Through the Entire Animal Kingdom by British journalist and author Simon Barnes, but is not specifically a book review. My actual review of this entertaining and informative work appears on Amazon and is as follows:

Having enjoyed Simon’s sport journalism (particularly his insightful and amusing commentary on Test Match cricket) for many years, I was interested to learn about this new book via his web-site. As an avid consumer of pop-science literature and already being aware of Simon’s considerable abilities as a writer, I was keen to read Ten Million Aliens. To be brief, I would recommend the book to anyone with an enquiring mind, an interest in the natural world and its endless variety, or just an affection for good science writing. My only sadness was that the number of phyla eventually had to come to an end. I laughed in places, I was better informed than before reading a chapter in others and the autobiographical anecdotes and other general commentary on the state of our stewardship of the planet added further dimensions. I look forward to Simon’s next book.

Instead this piece contains some general musings which came to mind while reading Ten Million Aliens and – as is customary – applies some of these to my own fields of professional endeavour.
 
 
Some Background

David Ivon Gower

Regular readers of this blog will be aware of my affection for Cricket[1] and also my interest in Science[2]. Simon Barnes’s work spans both of these passions. I became familiar with Simon’s journalism when he was Chief Sports Writer for The Times[3], an organ he wrote for over 32 years. Given my own sporting interests, I first read his articles specifically about Cricket and sometimes Rugby Union, but began to appreciate his writing in general and to consume his thoughts on many other sports.

There is something about Simon’s writing which I (and no doubt many others) find very engaging. He manages to be both insightful and amusing and displays both elegance of phrase and erudition without ever seeming to show off, or to descend into the overly-florid prose of which I can sometimes (OK often) be guilty. It also helps that we seem to share a favourite cricketer in the shape of David Gower, who appears above and was the most graceful batsman to have played for England in the last forty years. However, it is not Simon’s peerless sports writing that I am going to focus on here. For several years he also penned a wildlife column for The Times and is a patron of a number of wildlife charities. He has written books on, amongst other topics, birds, horses, his safari experiences and conservation in general.

Green Finch, Great Tit, Lesser Spotted Woodpecker, Tawny Owl, Magpie, Carrion Crow, Eurasian Jay, Jackdaw

My own interest in science merges into an appreciation of the natural world, perhaps partly also related to the amount of time I have spent in remote and wild places rock-climbing and bouldering. As I started to write this piece, some welcome November Cambridge sun threw shadows of the Green Finches and Great Tits on our feeders across the monitor. Earlier in the day, my wife and I managed to catch a Lesser Spotted Woodpecker, helping itself to our peanuts. Last night we stood on our balcony listening to two Tawny Owls serenading each other. Our favourite Corvidae family are also very common around here and we have had each of the birds appearing in the bottom row of the above image on our balcony at some point. My affection for living dinosaurs also extends to their cousins, the herpetiles, but that is perhaps a topic for another day.

Ten Million Aliens has the modest objectives, revealed by its sub-title, of saying something interesting about each of the (at the last count) thirty-five phyla of the Animal Kingdom[4] and of providing some insights into a few of the thousands of families and species that make these up. Simon’s boundless enthusiasm for the life he sees around him (and indeed the life that is often hidden from all bar the most intrepid of researchers), his ability to bring even what might be viewed as ostensibly dull subject matter[5] to life and a seemingly limitless trove of pertinent personal anecdotes, all combine to ensure not only that he achieves these objectives, but that he does so with some élan.
 
 
Classifications and Hierarchies

Biological- Classification

Well having said that this article wasn’t going to be a book review, I guess it has borne a striking resemblance to one so far. Now to take a different tack; one which relates to three of the words that I referenced and provided links to in the last paragraph of the previous section: phylum, family and species. These are all levels in the general classification of life. At least one version of where these three levels fit into the overall scheme of things appears in the image above[6]. Some readers may even be able to recall a related mnemonic from years gone by: Kings Play Chess on Fine Green Sand[7].

The father of modern taxonomy, Carl Linnaeus, founded his original biological classification – not unreasonably – on the shared characteristics of organisms; things that look similar are probably related. Relations mean that like things can be collected together into groups and that the groups can be further consolidated into super-groups. This approach served science well for a long time. However when researchers began to find more and more examples of convergent evolution[8], Linnaeus’s rule of thumb was seen to not always apply and complementary approaches also began to be adopted.

Cladogram

One of these approaches, called Cladistics, focuses on common ancestors rather than shared physical characteristics. Breakthroughs in understanding the genetic code provided impetus to this technique. The above diagram, referred to as a cladogram, represents one school of thought about the relationship between avian dinosaurs, non-avian dinosaurs and various other reptiles that I mentioned above.

It is at this point that the Business Intelligence professional may begin to detect something somewhat familiar[9]. I am of course talking about both dimensions and organising these into hierarchies. Dimensions are the atoms of Business Intelligence and Data Warehousing[10]. In Biological Classification: H. sapiens is part of Homo, which is part of Hominidae, which is part of Primates, which is part of Mammalia, which is part of Chordata, which then gets us back up to Animalia[11]. In Business Intelligence: Individuals make up Teams, which make up Offices, which make up Countries and Regions.

Above I referenced different approaches to Biological Classification, one based on shared attributes, the other on homology of DNA. This also reminds me of the multiple ways to roll-up dimensions. To pick the most obvious, Day rolls up to Month, Quarter, Half-Year and Year; but also in a different manner to Week and then Year. Given that the aforementioned DNA evidence has caused a reappraisal of the connections between many groups of animals, the structures of Biological Classification are not rigid and instead can change over time[12]. Different approaches to grouping living organisms can provide a range of perspectives, each with its own benefits. In a similar way, good BI/DW design practices should account for both dimensions changing and the fact that different insights may well be provided by parallel dimension hierarchies.
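By way of a hedged sketch of these parallel roll-ups (the function and attribute names below are my own invention for illustration, not drawn from any particular warehouse design), a single Day member can carry both hierarchy paths at once:

```python
from datetime import date

def calendar_attributes(d: date) -> dict:
    """Derive both roll-up paths for one Day member:
    Day -> Month -> Quarter -> Half-Year -> Year, and, in parallel,
    Day -> ISO Week -> ISO Year."""
    iso_year, iso_week, _ = d.isocalendar()
    return {
        "day": d.isoformat(),
        # Path 1: the familiar calendar roll-up
        "month": d.strftime("%Y-%m"),
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",
        "half_year": f"{d.year}-H{1 if d.month <= 6 else 2}",
        "year": d.year,
        # Path 2: the ISO week roll-up -- note that a week's year can
        # differ from the calendar year near year boundaries
        "iso_week": f"{iso_year}-W{iso_week:02d}",
        "iso_year": iso_year,
    }

# 30 December 2014 sits in calendar year 2014 but ISO week 2015-W01,
# so the two paths genuinely diverge for the same Day member
attrs = calendar_attributes(date(2014, 12, 30))
```

The point of the sketch is that neither path is "the" hierarchy; a well-designed date dimension carries both, and reports choose whichever roll-up suits the question being asked.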

In summary, I suppose what I am saying is that BI/DW practitioners, as well as studying the works of Inmon and Kimball, might want to consider expanding their horizons to include Barnes; to say nothing of Linnaeus[13]. They might find something instructive in these other taxonomical works.
 


 
Notes

 
[1]
 
Articles from this blog in which I intertwine Cricket and aspects of business, technology and change include (in chronological order):

 
[2]
 
Articles on this site which reference either Science or Mathematics are far too numerous to list in full. A short selection of the ones I enjoyed writing most would include (again in chronological order):

 
[3]
 
Or perhaps The London Times for non-British readers, despite the fact that it was the first newspaper to bear that name.
 
[4]
 
Here “Animal Kingdom” is used in the taxonomical sense and refers to Animalia.
 
[5]
 
For an example of the transformation of initially unpromising material, perhaps check out the chapter of Ten Million Aliens devoted to Entoprocta.
 
[6]
 
With acknowledgment to The Font.
 
[7]
 
Though this elides both Domains and Johnny-come-latelies like super-families, sub-genuses and hyper-orders [I may have made that last one up of course].
 
[8]
 
For example the wings of Pterosaurs, Birds and Bats.
 
[9]
 
No pun intended.
 
[10]
 
This metaphor becomes rather cumbersome when one tries to extend it to cover measures. It’s tempting to perhaps align these with fundamental forces, and thus bosons as opposed to combinations of fermions, but the analogy breaks down pretty quickly, so let’s conveniently forget that multidimensional data structures have fact tables at their hearts for now.
 
[11]
 
Here I am going to strive manfully to avoid getting embroiled in discussions about domains, superregnums, superkingdoms, empires, or regios and instead leave the interested reader to explore these areas themselves if they so desire. Ten Million Aliens itself could be one good starting point, as could the following link.
 
[12]
 
Science is yet to determine whether these slowly changing dimensions are of Type 1, 2, 3 or 4 (it has however been definitively established that they are not Type 6 / Hybrid).
 
[13]
 
Interesting fact of the day: Linnaeus’s seminal work included an entry for The Kraken, under Cephalopoda.

 

 

Data Visualisation – A Scientific Treatment

Introduction

Diagram of the Causes of Mortality in the Army of the East (click to view a larger version in a new tab)

The above diagram was compiled by Florence Nightingale, who was – according to The Font – “a celebrated English social reformer and statistician, and the founder of modern nursing”. It is gratifying to see her less high-profile role as a number-cruncher acknowledged up-front and central; particularly as she died in 1910, eight years before women in the UK were first allowed to vote and eighteen before universal suffrage. This diagram is one of two which are generally cited in any article on Data Visualisation. The other is Charles Minard’s exhibit detailing the advance on, and retreat from, Moscow of Napoleon Bonaparte’s Grande Armée in 1812 (Data Visualisation had a military genesis in common with – amongst many other things – the internet). I’ll leave the reader to look at this second famous diagram if they want to; it’s just a click away.

While there are more elements of numeric information in Minard’s work (what we would now call measures), there is a differentiating point to be made about Nightingale’s diagram. This is that it was specifically produced to aid members of the British parliament in their understanding of conditions during the Crimean War (1853-56); particularly given that such non-specialists had struggled to understand traditional (and technical) statistical reports. Again, rather remarkably, we have here a scenario where the great and the good were listening to the opinions of someone who was barred from voting on the basis of lacking a Y chromosome. Perhaps more pertinently to this blog, this scenario relates to one of the objectives of modern-day Data Visualisation in business; namely explaining complex issues, which don’t leap off of a page of figures, to busy decision makers, some of whom may not be experts in the specific subject area (another is of course allowing the expert to discern less than obvious patterns in large or complex sets of data). Fortunately most business decision makers don’t have to grapple with the progression in number of “deaths from Preventible or Mitigable Zymotic diseases” versus “deaths from wounds” over time, but the point remains.
 
 
Data Visualisation in one branch of Science

von Laue, Bragg Senior & Junior, Crowfoot Hodgkin, Kendrew, Perutz, Crick, Franklin, Watson & Wilkins

Coming much more up to date, I wanted to consider a modern example of Data Visualisation. As with Nightingale’s work, this is not business-focused, but contains some elements which should be pertinent to the professional considering the creation of diagrams in a business context. The specific area I will now consider is Structural Biology. For the incognoscenti (no advert for IBM intended!), this area of science is focussed on determining the three-dimensional shape of biologically relevant macro-molecules, most frequently proteins or protein complexes. The history of Structural Biology is intertwined with the development of X-ray crystallography by Max von Laue and father and son team William Henry and William Lawrence Bragg; its subsequent application to organic molecules by a host of pioneers including Dorothy Crowfoot Hodgkin, John Kendrew and Max Perutz; and – of greatest resonance to the general population – Francis Crick, Rosalind Franklin, James Watson and Maurice Wilkins’s joint determination of the structure of DNA in 1953.

photo-51

X-ray diffraction image of the double helix structure of the DNA molecule, taken 1952 by Raymond Gosling, commonly referred to as “Photo 51”, during work by Rosalind Franklin on the structure of DNA

While the masses of data gathered in modern X-ray crystallography need computer software to extrapolate them to physical structures, things were more accessible in 1953. Indeed, it could be argued that Gosling and Franklin’s famous image, its characteristic “X” suggestive of two helices and thus driving Crick and Watson’s model building, is another notable example of Data Visualisation; at least in the sense of a picture (rather than numbers) suggesting some underlying truth. In this case, the production of Photo 51 led directly to the creation of the even more iconic image below (which was drawn by Francis Crick’s wife Odile and appeared in his and Watson’s seminal Nature paper[1]):

Odile and Francis Crick - structure of DNA

© Nature (1953)
Posted on this site under the non-commercial clause of the right-holder’s licence

It is probably fair to say that the visualisation of data which is displayed above has had something of an impact on humankind in the sixty years since it was first drawn.
 
 
Modern Structural Biology

The X-ray Free Electron Laser at Stanford

Today, X-ray crystallography is one of many tools available to the structural biologist with other approaches including Nuclear Magnetic Resonance Spectroscopy, Electron Microscopy and a range of biophysical techniques which I will not detain the reader by listing. The cutting edge is probably represented by the X-ray Free Electron Laser, a device originally created by repurposing the linear accelerators of the previous generation’s particle physicists. In general Structural Biology has historically sat at an intersection of Physics and Biology.

However, before trips to synchrotrons can be planned, the Structural Biologist often faces the prospect of stabilising their protein of interest, ensuring that they can generate sufficient quantities of it, successfully isolating the protein and finally generating crystals of appropriate quality. This process often consumes years, in some cases decades. As with most forms of human endeavour, there are few short-cuts and the outcome is at least loosely correlated to the amount of time and effort applied (though sadly with no guarantee that hard work will always be rewarded).
 
 
From the general to the specific

The Journal of Molecular Biology (October 2014)

At this point I should declare a personal interest, the example of Data Visualisation which I am going to consider is taken from a paper recently accepted by the Journal of Molecular Biology (JMB) and of which my wife is the first author[2]. Before looking at this exhibit, it’s worth a brief detour to provide some context.

In recent decades, the exponential growth in the breadth and depth of scientific knowledge (plus of course the velocity with which this can be disseminated), coupled with the increase in the range and complexity of techniques and equipment employed, has led to the emergence of specialists. In turn this means that, in a manner analogous to the early production lines, science has become a very collaborative activity; expert in stage one hands over the fruits of their labour to expert in stage two and so on. For this reason the typical scientific paper (and certainly those in Structural Biology) will have several authors, often spread across multiple laboratory groups and frequently in different countries. By way of example the previous paper my wife worked on had 16 authors (including a Nobel Laureate[3]). In this context, the fact the paper I will now reference was authored by just my wife and her group leader is noteworthy.

The reader may at this point be relieved to learn that I am not going to endeavour to explain the subject matter of my wife’s paper, nor the general area of biology to which it pertains (the interested are recommended to Google “membrane proteins” or “G Protein Coupled Receptors” as a starting point). Instead let’s take a look at one of the exhibits.

Click to view a larger version in a new tab

© The Journal of Molecular Biology (2014)
Posted on this site under a Creative Commons licence

The above diagram (in common with Nightingale’s much earlier one) attempts to show a connection between sets of data, rather than just the data itself. I’ll elide the scientific specifics here and focus on more general issues.

First the grey upper section with the darker blots on it – which is labelled (a) – is an image of a biological assay called a Western Blot (for the interested, details can be viewed here); each vertical column (labelled at the top of the diagram) represents a sub-experiment on protein drawn from a specific sample of cells. The vertical position of a blot indicates the size of the molecules found within it (in kilodaltons); the intensity of a given blot indicates how much of the substance is present. Aside from the headings and labels, the upper part of the figure is a photographic image and so essentially analogue data[4]. So, in summary, this upper section represents the findings from one set of experiments.

At the bottom – and labelled (b) – appears an artefact familiar to anyone in business, a bar-graph. This presents results from a parallel experiment on samples of protein from the same cells (for the interested, this set of data relates to the degree to which proteins in the samples bind to a specific radiolabelled ligand). The second set of data is taken from what I might refer to as a “counting machine” and is thus essentially digital. To be 100% clear, the bar chart is not a representation of the data in the upper part of the diagram, it pertains to results from a second experiment on the same samples. As indicated by the labelling, for a given sample, the column in the bar chart (b) is aligned with the column in the Western Blot above (a), connecting the two different sets of results.

Taken together the upper and lower sections[5] establish a relationship between the two sets of data. Again I’ll skip over the specifics, but the general point is that while the Western Blot (a) and the binding assay (b) tell us the same story, the Western Blot is a much more straightforward and speedy procedure. The relationship that the paper establishes means that just the Western Blot can be used to perform a simple new assay which will save significant time and effort for people engaged in the determination of the structures of membrane proteins; a valuable new insight. Clearly the relationships that have been inferred could equally have been presented in a tabular form instead and be just as relevant. It is however testament to the more atavistic side of humans that – in common with many relationships between data – a picture says it more surely and (to mix a metaphor) more viscerally. This is the essence of Data Visualisation.
 
 
What learnings can Scientific Data Visualisation provide to Business?

Scientific presentation (c/o Nature, but looks a lot like PhD Comics IMO)

Using the JMB exhibit above, I wanted to now make some more general observations and consider a few questions which arise out of comparing scientific and business approaches to Data Visualisation. I think that many of these points are pertinent to analysis in general.

Normalisation

Broadly, normalisation[6] consists of defining results in relation to some established yardstick (or set of yardsticks); displaying relative, as opposed to absolute, numbers. In the JMB exhibit above, the amount of protein solubilised in various detergents is shown with reference to the un-solubilised amount found in native membranes; these reference figures appear as 100% columns to the right and left extremes of the diagram.

The most common usage of normalisation in business is growth percentages. Here the fact that London business has grown by 5% can be compared to Copenhagen having grown by 10% despite total London business being 20-times the volume of Copenhagen’s. A related business example, depending on implementation details, could be comparing foreign currency amounts at a fixed exchange rate to remove the impact of currency fluctuation.

Normalised figures are very typical in science, but, aside from the growth example mentioned above, considerably less prevalent in business. In both avenues of human endeavour, the approach should be used with caution; something that increases 200% from a very small starting point may not be relevant, be that the result of an experiment or weekly sales figures. Bearing this in mind, normalisation is often essential when looking to present data of different orders on the same graph[7]; the alternative often being that smaller data is swamped by larger, not always what is desirable.
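To make the London / Copenhagen comparison above concrete, here is a minimal sketch of growth-percentage normalisation; the figures are invented purely for illustration:

```python
# Hypothetical sales figures (in millions) for two offices over two periods.
# London's volume is 20 times Copenhagen's, yet its growth is half as large.
sales = {
    "London":     {"prior": 200.0, "current": 210.0},
    "Copenhagen": {"prior": 10.0,  "current": 11.0},
}

def growth_pct(prior, current):
    """Normalise the change against the prior period, i.e. percentage growth."""
    return 100.0 * (current - prior) / prior

for city, figures in sales.items():
    print(f"{city}: {growth_pct(figures['prior'], figures['current']):.1f}% growth")
```

Expressed this way, the two businesses can share one axis on a graph without the smaller being swamped by the larger; though, as noted above, a large percentage on a tiny base deserves scepticism.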

Controls

I’ll use an anecdote to illustrate this area from a business perspective. Imagine an organisation which (as you would expect) tracks the volume of sales of a product or service it provides via a number of outlets. Imagine further that it launches some sort of promotion, perhaps valid only for a week, and notices an uptick in these sales. It is extremely tempting to state that the promotion has resulted in increased sales[8].

However this cannot always be stated with certainty. Sales may have increased for some totally unrelated reason such as (depending on what is being sold) good or bad weather, a competitor increasing prices or closing one or more of their comparable outlets and so on. Equally perniciously, the promotion may have simply moved sales in time – people may have been going to buy the organisation’s product or service in the weeks following the promotion, but have brought the expenditure forward to take advantage of it. If this is indeed the case, an uptick in sales may well be due to the impact of a promotion, but will be offset by a subsequent decrease.

In science, it is this type of problem that the concept of control tests is designed to combat. As well as testing a result in the presence of substance or condition X, a well-designed scientific experiment will also be carried out in the absence of substance or condition X, the latter being the control. In the JMB exhibit above, the controls appear in the columns with white labels.

There are ways to make the business “experiment” I refer to above more scientific of course. In retail business, the current focus on loyalty cards can help, assuming that these can be associated with the relevant transactions. If the business is on-line then historical records of purchasing behaviour can be similarly referenced. In the above example, the organisation could decide to offer the promotion at only a subset of its outlets, allowing a comparison to those where no promotion applied. This approach may improve rigour somewhat, but of course it does not cater for purchases transferred from a non-promotion outlet to a promotion one (unless a whole raft of assumptions are made). There are entire industries devoted to helping businesses deal with these rather messy scenarios, but it is probably fair to say that it is normally easier to devise and carry out control tests in science.

The general take away here is that a graph which shows some change in a business output (say sales or profit) correlated to some change in a business input (e.g. a promotion, a new product launch, or a price cut) would carry a lot more weight if it also provided some measure of what would have happened without the change in input (not that this is always easy to measure).
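The subset-of-outlets approach can be sketched in a few lines; the outlet names and figures below are invented, and this simple uplift calculation deliberately ignores the cross-outlet transfer problem raised above:

```python
# Weekly sales per outlet: (week before, promotion week). Invented data.
promotion_outlets = {"A": (100, 130), "B": (80, 104)}   # promotion applied
control_outlets   = {"C": (90, 99),  "D": (110, 121)}   # no promotion (control)

def mean_growth(outlets):
    """Average percentage change across a set of outlets."""
    changes = [100.0 * (after - before) / before for before, after in outlets.values()]
    return sum(changes) / len(changes)

promo_growth = mean_growth(promotion_outlets)    # growth where the promotion ran
control_growth = mean_growth(control_outlets)    # background growth without it
uplift = promo_growth - control_growth           # growth attributable to the promotion
print(f"Promotion: {promo_growth:.0f}%, Control: {control_growth:.0f}%, Uplift: {uplift:.0f}%")
```

The control outlets play exactly the role of the absence-of-substance-X experiment: whatever moved sales everywhere (weather, competitors, seasonality) is netted out, leaving an estimate of the promotion’s own effect.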

Rigour and Scrutiny

I mention in the footnotes that the JMB paper in question includes versions of the exhibit presented above for four other membrane proteins, this being in order to firmly establish a connection. Looking at just the figure I have included here, each element of the data presented in the lower bar-graph area is based on duplicated or triplicated tests, with average results (and error bars – see the next section) being shown. When you consider that upwards of three months’ preparatory work could have gone into any of these elements and that a mistake at any stage during this time would have rendered the work useless, some impression of the level of rigour involved emerges. The result of this assiduous work is that the authors can be confident that the exhibits they have developed are accurate and will stand up to external scrutiny. Of course such external scrutiny is a key part of the scientific process and the manuscript of the paper was reviewed extensively by independent experts before being accepted for publication.

In the business world, such external scrutiny tends to apply most frequently to publicly published figures (such as audited Financial Accounts); external financial analysts will of course also look to dig into figures. There may be some internal scrutiny around both the additional numbers used to run the business and the graphical representations of these (and indeed some companies take this area very seriously), but not every internal KPI is vetted the way that the report and accounts are. Particularly in the area of Data Visualisation, there is a tension here. Graphical exhibits can have a lot of impact if they relate to the current situation or present trends; contrariwise if they are substantially out-of-date, people may question their relevance. There is sometimes the expectation that a dashboard is just like its aeronautical counterpart, showing real-time information about what is going on now[9]. However a lot of the value of Data Visualisation is not about the here and now so much as trends and explanations of the factors behind the here and now. A well-thought out graph can tell a very powerful story, more powerful for most people than a table of figures. However a striking graph based on poor quality data, data which has been combined in the wrong way, or even – as sometimes happens – the wrong datasets entirely, can tell a very misleading story and lead to the wrong decisions being taken.

I am not for a moment suggesting here that every exhibit produced using Data Visualisation tools must be subject to months of scrutiny. As referenced above, in the hands of an expert such tools have the value of sometimes quickly uncovering hidden themes or factors. However, I would argue that – as in science – if the analyst involved finds something truly striking, an association which he or she feels will really resonate with senior business people, then double- or even triple-checking the data would be advisable. Asking a colleague to run their eye over the findings and to then probe for any obvious mistakes or weaknesses sounds like an appropriate next step. Internal Data Visualisations are never going to be subject to peer-review, however their value in taking sound business decisions will be increased substantially if their production reflects at least some of the rigour and scrutiny which are staples of the scientific method.

Dealing with Uncertainty

In the previous section I referred to the error bars appearing on the JMB figure above. Error bars are acknowledgements that what is being represented is variable and they indicate the extent of such variability. When dealing with a physical system (be that mechanical or – as in the case above – biological), behaviour is subject to many factors, not all of which can be eliminated or adjusted for and not all of which are predictable. This means that repeating an experiment under ostensibly identical conditions can lead to different results[10]. If the experiment is well-designed and if the experimenter is diligent, then such variability is minimised, but never eliminated. Error bars are a recognition of this fundamental aspect of the universe as we understand it.
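For a flavour of how such error bars are typically derived, the sketch below computes a mean and standard error of the mean from triplicated readings; the figures and condition names are invented, and in practice a charting library (e.g. matplotlib’s `yerr` option) would then draw the bars:

```python
import math

# Three replicate readings for each of two conditions (invented figures).
replicates = {
    "condition_X": [0.92, 0.88, 0.96],
    "condition_Y": [0.41, 0.47, 0.44],
}

def mean_and_sem(values):
    """Mean and standard error of the mean (sample std dev / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

for label, values in replicates.items():
    mean, sem = mean_and_sem(values)
    print(f"{label}: {mean:.2f} ± {sem:.2f}")
```

The bar height is the mean; the error bar extends one standard error either side of it, an honest admission of how much the replicates disagreed.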

While de rigueur in science, error bars seldom make an appearance in business, even – in my experience – in estimates of business measures which emerge from statistical analyses[11]. Even outside the realm of statistically generated figures, more business measures are subject to uncertainty than might initially be thought. An example here might be a comparison (perhaps as part of the externally scrutinised report and accounts) of the current quarter’s sales to the previous one (or the same one last year). In companies where sales may be tied to – for example – the number of outlets, care is taken to make these figures like-for-like. This might include only showing numbers for outlets which were in operation in the prior period and remain in operation now (i.e. excluding sales from both closed and newly opened outlets). However, outside the area of high-volume low-value sales where the Law of Large Numbers[12] rules, other factors could substantially skew a given quarter’s results for many organisations. Something as simple as a key customer delaying a purchase (so that it fell in Q3 this year instead of Q2 last) could have a large impact on quarterly comparisons. Again companies will sometimes look to include adjustments to cater for such timing or related issues, but this cannot be a precise process.

The main point I am making here is that much of the information produced in companies is uncertain. The cash transactions in a quarter are of course the cash transactions in a quarter, but the above scenario suggests that they may not always 100% reflect actual business conditions (and you cannot adjust for everything). Equally, when you get into figures that would be part of most companies’ financial results, such as outstanding receivables and allowances for bad debts, the spectre of uncertainty arises again without a statistical model in sight. In many industries, regulators are pushing for companies to include more forward-looking estimates of future assets and liabilities in their Financials. While this may be a sensible reaction to recent economic crises, the approach inevitably leads to more figures being produced from models. Even when these models are subject to external review, as is the case with most regulatory-focussed ones, they are still models and there will be uncertainty around the numbers that they generate. While companies will often provide a range of estimates for things like guidance on future earnings per share, providing a range of estimates for historical financial exhibits is not really a mainstream activity.

Which perhaps gets me back to the subject of error bars on graphs. In general I think that their presence in Data Visualisations can only add value, not subtract it. In my article entitled Limitations of Business Intelligence I include the following passage which contains an exhibit showing how the Bank of England approaches communicating the uncertainty inevitably associated with its inflation estimates:

Business Intelligence is not a crystal ball, Predictive Analytics is not a crystal ball either. They are extremely useful tools [...] but they are not universal panaceas.

The Old Lady of Threadneedle Street is clearly not a witch

[...] Statistical models will never give you precise answers to what will happen in the future – a range of outcomes, together with probabilities associated with each is the best you can hope for (see above). Predictive Analytics will not make you prescient, instead it can provide you with useful guidance, so long as you remember it is a prediction, not fact.

While I can’t see them figuring in formal financial statements any time soon, perhaps there is a case for more business Data Visualisations to include error bars.
 
 
In Summary

So, as is often the case, I have embarked on a journey. I started with an early example of Data Visualisation, diverted in to a particular branch of science with which I have some familiarity and hopefully returned, again as is often the case, to make some points which I think are pertinent to both the Business Intelligence practitioner and the consumers (and indeed commissioners) of Data Visualisations. Back in “All that glisters is not gold” – some thoughts on dashboards I made some more general comments about the best Data Visualisations having strong informational foundations underpinning them. While this observation remains true, I do see a lot of value in numerically able and intellectually curious people using Data Visualisation tools to quickly make connections which had not been made before and to tease out patterns from large data sets. In addition there can be great value in using Data Visualisation to present more quotidian information in a more easily digestible manner. However I also think that some of the learnings from science which I have presented in this article suggest that – as with all powerful tools – appropriate discretion on the part of the people generating Data Visualisation exhibits and on the part of the people consuming such content would be prudent. In particular the business equivalents of establishing controls, applying suitable rigour to data generation / combination and including information about uncertainty on exhibits where appropriate are all things which can help make Data Visualisation more honest and thus – at least in my opinion – more valuable.
 


 
Notes

 
[1]
 
Watson, J.D., Crick, F.H.C. (1953). Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature.
 
[2]
 
Thomas, J.A., Tate, C.G. (2014). Quality Control in Eukaryotic Membrane Protein Overproduction. J. Mol. Biol. [Epub ahead of print].
 
[3]
 
The list of scientists involved in the development of X-ray Crystallography and Structural Biology which was presented earlier in the text encompasses a further nine such laureates (four of whom worked at my wife’s current research institute), though sadly this number does not include Rosalind Franklin. Over 20 Nobel Prizes have been awarded to people working in the field of Structural Biology, you can view an interactive time line of these here.
 
[4]
 
The intensity, size and position of blots are often digitised by specialist software, but this is an aside for our purposes.
 
[5]
 
Plus four other analogous exhibits which appear in the paper and relate to different proteins.
 
[6]
 
Normalisation has a precise mathematical meaning, actually (somewhat ironically for that most precise of activities) more than one. Here I am using the term more loosely.
 
[7]
 
That’s assuming you don’t want to get into log scales, something I have only come across once in over 25 years in business.
 
[8]
 
The uptick could be as compared to the week before, or to some other week (e.g. the same one last year or last month maybe) or versus an annual weekly average. The change is what is important here, not what the change is with respect to.
 
[9]
 
Of course some element of real-time information is indeed both feasible and desirable; for more analytic work (which encompasses many aspects of Data Visualisation) what is normally more important is sufficient historical data of good enough quality.
 
[10]
 
Anyone interested in some of the reasons for this is directed to my earlier article Patterns patterns everywhere.
 
[11]
 
See my series of three articles on Using historical data to justify BI investments for just one example of these.
 
[12]
 
But then 1=2 for very large values of 1

 

 

Patterns patterns everywhere – The Sequel

Back in 2010 I posted a piece called Patterns patterns everywhere which used as its entry point various articles on a number of web-sites relating to the, then current, Eyjafjallajokull eruption. I went on to reference – amongst other phenomena – the weather.

The incomparable Randall Munroe from xkcd.com has just knocked my earlier work into a cocked hat with his (perhaps unsurprisingly) much more laconic observations from last Friday, which are instead inspired by the recent cold snaps in the US:

You see the same pattern all over. Take Detroit--' 'Hold on. Why do you know all these statistics offhand?' 'Oh, um, no idea. I definitely spend my evenings hanging out with friends, and not curating a REALLY NEAT database of temperature statistics. Because, pshh, who would want to do that, right? Also, snowfall records.
Copyright xkcd.com
This image has been rearranged to fit in to the confines of peterjamesthomas.com

 

 

The Great Divide – Worrying parallels between Windows 8 and the Xbox One

Yosemite Valley

Back in July 2012 in A William Tell Moment? I got a little carried away about the potential convergence between tablets and personal computers. Nearly a year later – and with the Surface Pro only becoming available in my native UK last month – I probably know better. The following is therefore a more balanced piece.

It’s been a while since I put finger-tip to keyboard on this web-site. The occurrence which motivated me to do so was the arrival of my first new home computer since 2008 (yes unfortunately dear reader, the author is that much of a Luddite). The time since 2008 has seen a lot of changes in the technology sphere, notably the rise of the tablet (at probably the third time of asking) and the near ubiquity of end user computing. Certainly in response to the former (and maybe with some influence from the latter) my new laptop (if you can so describe a 17.3” desktop replacement) came with Windows 8 pre-installed.

My new 'laptop'

I am obviously several months too late for my review of Microsoft’s latest OS to have much resonance and my brief comments here have no doubt been offered up by other pundits already. What I want to do instead is perhaps try to tie Windows 8 together with some broader trends and explore just how weird and polarised the technology market has become recently. However, some brief initial commentary on Windows 8 is perhaps pertinent.

The main thrust of Windows 8 is for Microsoft to remain relevant, perhaps not so much in its traditional arena of PC computing, but in the newer world of tablet and mobile computing. I’m sure some tablet fans may take issue with my observation, but my opinion is that Windows 8 is trying to do two, potentially incompatible, things: to be relevant to content creators and to content consumers. I am sure there are all sorts of examples of people creating amazing content on their iPads or Android tablets, however perhaps the surprise here is that it is done at all, rather than done well[1].

Regardless of some content creation doubtless occurring on tablets, I stand by my assertion that they are essentially platforms for the consumption of content; be that web-pages (sometimes masquerading as apps), games, videos, music, or increasingly feedback from the ever increasing range of sensors providing information about everything from the device’s location to its owner’s current heart rate. The content that is consumed on tablets is – in most cases – created on other types of devices; often the quotidian ones which have physical keyboards and pointing devices which allow for precision work.

Fitzgerald demonstrating that you can play two roles

In the past, the dichotomy between content creators and content consumers has been somewhat masked by them employing similar tools. Of course every content creator is also a content consumer, but it has always (“always” of course being an interesting word when what I probably mean is “since the Internet became mainstream”), been the case that there were significantly more of the latter than the former[2]. What was different historically was that both creators and consumers used the same kit; PCs of some flavour[3] (though maybe the former had better processors and more memory on their machines). The split in roles was evident (if it was evident at all) in computers that were only ever used to surf, do e-mail and write the occasional letter; there were probably an awful lot of these. We had a general purpose computing platform (the PC) which was being under-utilised by the majority of people who owned one.

The eventual adoption of tablets has changed this dynamic. Although of course many tablets have processors that previous generations of PCs could only have dreamed of, their focus is firmly on delivering only those elements of a PCs capabilities which most people use and eschewing those which the majority ignore. As always, specialisation and focus leads to superior execution. The author (no fan of Apple products in general) can confirm that an iPad is much more fit for purpose than a laptop when the purpose is watching a film or TV show on a train or plane. Laptops can of course do this, but they are over-engineered for the task and also pretty bulky if all you want is to watch something. Having played Angry Birds on each of Android, iOS and web-versions on a laptop, the experience is best on the smaller, lighter, touch-based devices.

PC and iPad

The reason that the sales of PCs have plummeted while those of tablets soar is not that tablets are better than PCs, nor is it even that they demystify computing in a way that their elder brethren fail to do (more on this later), but simply that tablets are more aligned with what the majority of people want from their computers; as above to be media platforms that allow basic surfing and e-mail. To borrow the phrase from the last paragraph, tablets are more fit for purpose if the purpose is consumption of content.

The flip side of this is what I am currently doing: namely writing this article, sourcing / editing / creating images to illustrate it and cutting some entry-level HTML in the process. I could of course do this on an iPad or Android tablet. However this is much like saying that you can (in extremis) use a foot-pump to re-inflate a car tyre, but why would you if you can make it to a garage / service station and get access to a machine that is dedicated to inflating tyres with greater efficiency. If there was no machine with a keypad to hand, then I might decide to write on an iPad, but it would be a frustrating and sub-optimal experience. PCs are more fit for purpose where the purpose is content creation.

Which market would you rather sell into?

However, we now reach a problem in economics. If we apply the Wikipedia percentages to content creators versus content consumers, then the split is (depending on which side of the fence you place editors) either 1 : 10 or 1 : 100. In either case, someone pitching hardware and software to a content creator is addressing a much smaller part of the marketplace than someone pitching hardware and software to content consumers; aka the mass market. This observation inexorably dictates the types of features and capabilities which will dominate any platform aimed at general computer users; basically content consumers are king and content creators paupers.

Which returns me to Windows 8. The Metro interface is avowedly designed for mobile devices with a touch-based interface. My new machine doesn’t have a touch screen. Why would I need one on a device that supports the much more efficient and precise input provided by a physical keyboard and mouse? Indeed, one of the nice things about my new laptop is its 1920×1080 screen, why would I want to cover this with as many annoying finger smudges as my iPad has when there are much better ways of interacting with the OS which also leave the monitor clean? In fact, on reflection, I guess that the majority of people and not just content creators would prefer a non-smeared screen most of the time.

There seem to be obvious usability snafus in Windows 8 as well. To highlight just one, if you move your mouse (aka finger) to the top right-hand side, one of the “charms” menus appears (I’d really like to know why Microsoft thought “charms” was a great name for this). But what is also at the top right-hand side of any maximised window? The close button of course. I have lost count of how many times I have wanted to close a programme and had the charming blue panel appear instead. I spent the first eight years of my career in commercial software development and fully appreciate that there is no such thing as bug-free code, however this type of glitch seems so avoidable that one has to question both Microsoft’s design and testing process.

An early adopter of Excel 2013

Anyway, enough on the faults of Windows 8. In time I’ll get used to it just as I did with Windows 95, 98, XP and 7. Just as I have got used to each version of Excel being harder to use than the last for anyone that has a track record with the application. Of course I’ll get used to Excel 2013, what choice do I have? But this leads us into another economic dichotomy. Microsoft don’t need to win me over to Excel, I’m going to put up with whatever silly thing they do to it in the latest version because that’s a lower hurdle than learning another spreadsheet; even assuming that something like Google Docs offers the same functionality. The renewal rates for products like Excel must be 95% plus, this means that a vendor like Microsoft focusses instead on getting new business from people who don’t use their applications. If this means making the application “easier” for new users, then who cares if existing users are inconvenienced, it’s not like they are going to stop using the application.

As I alluded to above, a general claim made for tablets (and for the iPad in particular) is that they demystify computing, making it accessible to “regular people” (as an aside here we have the entire cool dude versus nerd advertising encapsulated in “I’m a Mac, he’s a PC”, something which I think Microsoft are to be lauded for lampooning in their later campaign). Instead I would argue that tablets offer a limited slice of what computers can do (the genius being that it is the slice that 90% or 99% of content consumers seem to want). They don’t make computing easier or more accessible, they make it more limited and sell this as a benefit using words like “elegant”, “stripped-down” or “minimalist”.

Tablets clearly fill a large market need, I use them myself. However, my Window-centred gripe is when I have to buy a product (a PC) whose basic operation is dictated by a function (content consumption) for which the machine is over-engineered, whereas the function for which a PC is perfect (content creation) is symmetrically and even systematically compromised.

As things stand, maybe Microsoft should not be so concerned about losing the mobile and tablet market (perhaps for them it is already too late). Instead it could be argued that they should be more worried about, through a lack of attention to the needs of their core users, forfeiting the PC market which they have dominated for so long and in which their products (pre-Windows 8 at least) were the ones best suited to the job at hand.

Brothers in arms?

The recent launch of the Xbox One (whatever happened to sequential numbering by the way?) was roundly condemned by gamers as focussing too much on the new console being a media hub (again attracting new users) rather than a gaming platform (again ignoring the needs of existing users). At least one cannot accuse Microsoft of being inconsistent, but alienating existing customers is seldom a great long-term strategy for a business.


Notes

[1] Let’s glide seamlessly over Samuel Johnson’s original application of this image to comment on women preachers; the 18th Century is certainly a foreign country and I’m rather glad that we now [mostly] do things differently here.
[2] By way of illustration, Wikipedia tends to assume the 90-9-1 rule. 1% of users create content, 9% edit or otherwise modify content, the rest consume.[citation needed]
[3] Although maybe the term PC has become synonymous with Wintel based machines, I include here personal computers running flavours of UNIX such as Mac OS and Linux.