Michael Koploy of the on-line technology consulting company Software Advice recently asked me, together with four other people from the Business Intelligence / Data Warehousing community, to contribute definitions of commonly used technology jargon pertinent to our field. The results can be viewed in his article, BI Buzzword Breakdown. Readers may be interested in the differing, but hopefully complementary, definitions that were offered.
In the jockeying for space with my industry associates, only one of my definitions (the one relating to Data Mining) was used. Here are two others, which were left on the cutting-room floor. Maybe they'll make it to the DVD extras.
| Term | Definition |
|---|---|
| Big Data | Rather than having the entirely obvious meaning, the term has come to be associated with a set of technologies, some of them open source, that emerged from the needs of several major on-line businesses (Google, Yahoo, Facebook and Amazon) to analyse the large amounts of data they held about how people interact with their web-sites. The area is often linked to Apache Hadoop, a low-cost technology that allows commodity servers to be combined to store large amounts of data collectively, particularly where the structure of that data varies considerably and where volumes grow unpredictably. |
| Data Warehouse | A collection of data, generally drawn from a number of different systems, that is combined into a consistent structure suitable for supporting a variety of reporting and analytical needs. Most warehouses store at least some of their data in a multi-dimensional format, i.e. one intended to support pivot-table-like slicing and dicing. This is achieved using specific data structures: fact tables, which hold figures, or measures (such as profit, sales or growth); and dimension tables, which hold business entities, or dimensions (such as countries, weeks, product lines, salesmen, etc.). Dimensions are often nested into hierarchies, such as Region => Country => City => Area. Warehouse data is generally leveraged using traditional reports, On-Line Analytical Processing (OLAP) and more advanced analytical approaches, such as data mining. |
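To make the fact/dimension split in the Data Warehouse definition a little more concrete, here is a minimal sketch in plain Python. It is an illustration only, assuming a tiny in-memory "star schema": every table name, key, attribute and figure below is hypothetical, and ordinary dictionaries stand in for the relational tables a real warehouse would use. The fact table holds only foreign keys and numeric measures. The dimension tables hold the business entities, with the Region => Country => City hierarchy stored as one attribute per level. Slicing, rolling up and pivoting then reduce to filtering and grouping the facts via those attributes.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key.
# The geography dimension encodes the Region => Country => City hierarchy
# as one attribute per level.
GEOGRAPHY_DIM = {
    1: {"region": "Europe", "country": "UK", "city": "London"},
    2: {"region": "Europe", "country": "UK", "city": "Manchester"},
    3: {"region": "Europe", "country": "France", "city": "Paris"},
    4: {"region": "Americas", "country": "USA", "city": "Boston"},
}
PRODUCT_DIM = {10: {"product_line": "Hardware"}, 20: {"product_line": "Software"}}
WEEK_DIM = {100: {"week": "2012-W01"}, 101: {"week": "2012-W02"}}

# Hypothetical fact table: one row per recorded event, holding foreign keys
# into each dimension plus the numeric measures (sales, profit).
SALES_FACT = [
    {"geo_key": 1, "product_key": 10, "week_key": 100, "sales": 500, "profit": 120},
    {"geo_key": 2, "product_key": 20, "week_key": 100, "sales": 300, "profit": 90},
    {"geo_key": 3, "product_key": 10, "week_key": 101, "sales": 200, "profit": 40},
    {"geo_key": 4, "product_key": 20, "week_key": 101, "sales": 700, "profit": 250},
    {"geo_key": 1, "product_key": 20, "week_key": 101, "sales": 150, "profit": 60},
]


def roll_up(facts, level, measure):
    """Total a measure at one level of the geography hierarchy."""
    totals = defaultdict(int)
    for row in facts:
        member = GEOGRAPHY_DIM[row["geo_key"]][level]
        totals[member] += row[measure]
    return dict(totals)


def slice_facts(facts, level, member):
    """Keep only the facts whose geography matches `member` at `level`."""
    return [row for row in facts if GEOGRAPHY_DIM[row["geo_key"]][level] == member]


def pivot(facts, measure):
    """Cross-tabulate a measure: product lines as rows, weeks as columns."""
    table = defaultdict(lambda: defaultdict(int))
    for row in facts:
        line = PRODUCT_DIM[row["product_key"]]["product_line"]
        week = WEEK_DIM[row["week_key"]]["week"]
        table[line][week] += row[measure]
    return {line: dict(cols) for line, cols in table.items()}


if __name__ == "__main__":
    # Roll sales up to region, profit up to country.
    print(roll_up(SALES_FACT, "region", "sales"))
    print(roll_up(SALES_FACT, "country", "profit"))
    # Slice to Europe, then dice by product line and week.
    print(pivot(slice_facts(SALES_FACT, "region", "Europe"), "sales"))
```

Moving up or down the hierarchy is just a change of the `level` argument. That is the practical payoff of keeping the hierarchy as attributes on the dimension rows rather than in the facts. A real warehouse would do the same work with SQL joins and `GROUP BY`, or with an OLAP engine, rather than with Python loops.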
The above comments are perhaps most notable for representing my first reference to the latest hot topic in information management, the rather misleadingly named Big Data. To date I have largely avoided the rampaging herd in this area, perhaps through fear of being crushed in the stampede, but it is probably a topic to which I will return once there is less hype and more substance to comment on.