Approaches to storing Big Data and making it available to consumers

There are two main approaches to a problem every organisation faces: collating, cleansing, standardising and integrating data generated both inside and outside the enterprise.

Enterprise Data Hub (aka a Data Lake or Data Integration Hub)

In this approach, data is pulled from a variety of sources into a central data hub, where it is cleaned and stored in a lightly processed, near-raw form. Data is then integrated on a use-case-by-use-case basis.
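As a rough illustration of the hub pattern, here is a minimal Python sketch. It assumes two hypothetical source extracts (a CRM and a web shop) and hypothetical column names. Each source is lightly cleaned (whitespace trimmed, emails lower-cased, dates normalised to ISO 8601, assuming the source uses either ISO or UK day/month/year dates) and stored in its own area of the hub with its original column names. No integration across sources happens at this stage.

```python
from datetime import datetime

# Hypothetical raw extracts from two source systems (illustrative data only).
crm_rows = [
    {"cust_id": "C1", "email": " Alice@Example.com ", "signup": "2016-03-01"},
    {"cust_id": "C2", "email": "bob@example.com", "signup": "01/04/2016"},
]
shop_rows = [
    {"customer_email": "ALICE@example.com", "order_total": "19.99"},
]

def parse_date(value):
    """Accept the two date formats assumed in this example; return ISO 8601."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):  # assumption: UK day/month/year
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # keep the row but blank the unparseable value

def light_clean(row):
    """Pre-processing only: trim strings, lower-case emails, normalise dates.
    The source's own column names and structure are left untouched."""
    cleaned = {}
    for key, value in row.items():
        if isinstance(value, str):
            value = value.strip()
            if "email" in key:
                value = value.lower()
            elif key == "signup":
                value = parse_date(value)
        cleaned[key] = value
    return cleaned

# One area per source; cross-source integration is deferred to each use case.
hub = {
    "crm": [light_clean(r) for r in crm_rows],
    "shop": [light_clean(r) for r in shop_rows],
}
print(hub["crm"][1])  # {'cust_id': 'C2', 'email': 'bob@example.com', 'signup': '2016-04-01'}
```

Note that `crm` calls the field `email` while `shop` calls it `customer_email`. Reconciling the two is left to whichever use-case script needs both.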

Advantages of this approach

i) Relatively quick to implement, with just pre-processing/cleaning of data required before storage
ii) Can deliver data in near real time to downstream systems
iii) Allows integration on a use-case-by-use-case basis.

Disadvantages of this approach

i) For each use case, data integration scripts need to be created
ii) Because data integration isn’t centralised, separate integration scripts may give different answers to the same question.
iii) The enterprise data hub can rapidly become an “integration hairball”: a tangle of overlapping, use-case-specific integration scripts.
iv) With this method there is a tendency to skip pre-processing, which creates a data swamp rather than a data lake.
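To make the second disadvantage concrete, here is a minimal Python sketch assuming two hypothetical use-case scripts written independently against the same cleaned hub data. Both answer "what was our revenue?", but only one applies the business rule of excluding refunded orders.

```python
# Hypothetical cleaned order data already sitting in the hub (illustrative only).
orders = [
    {"order_id": 1, "email": "alice@example.com", "amount": 100.0, "status": "complete"},
    {"order_id": 2, "email": "bob@example.com", "amount": 50.0, "status": "refunded"},
    {"order_id": 3, "email": "alice@example.com", "amount": 25.0, "status": "complete"},
]

def sales_dashboard_revenue(rows):
    """Use case 1 (hypothetical): the sales team counts every order booked."""
    return sum(r["amount"] for r in rows)

def finance_report_revenue(rows):
    """Use case 2 (hypothetical): written separately, excludes refunded orders."""
    return sum(r["amount"] for r in rows if r["status"] != "refunded")

print(sales_dashboard_revenue(orders))  # 175.0
print(finance_report_revenue(orders))   # 125.0
```

Because each script carries its own business rules, the two reports disagree, and nothing in the hub itself flags the discrepancy.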

Enterprise Data Warehouse (EDW)

EDW can mean different things to different people, but here I’ll use it in its broadest sense: a store that holds transaction-level data (often called an operational data store) as well as aggregated data, and that links transaction data to standardised master and reference data.

I’ll split this into two well-established sub-approaches:

Bill Inmon approach

In this approach, data is typically sourced from an enterprise data hub, and data from multiple sources and business functions is integrated into a single data model, from which business-function-specific data models (aka data marts) can be generated.
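A minimal sketch of the Inmon idea, using Python's built-in `sqlite3` module with hypothetical table names: one integrated, normalised model in which every business function shares the same `customer` table, plus function-specific data marts derived from it. The marts are simple views here only to keep the example short; a real EDW would typically load physical marts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Integrated, normalised enterprise model (hypothetical names). Every business
# function references the single shared customer table.
cur.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT UNIQUE NOT NULL,
    region      TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
CREATE TABLE support_ticket (
    ticket_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

INSERT INTO customer VALUES (1, 'alice@example.com', 'North'),
                            (2, 'bob@example.com', 'South');
INSERT INTO sales_order VALUES (10, 1, 100.0), (11, 1, 25.0), (12, 2, 50.0);
INSERT INTO support_ticket VALUES (100, 2), (101, 2);

-- Business-function-specific data marts generated from the integrated model.
CREATE VIEW sales_mart AS
    SELECT c.region, SUM(o.amount) AS revenue, COUNT(*) AS orders
    FROM sales_order o JOIN customer c ON c.customer_id = o.customer_id
    GROUP BY c.region;
CREATE VIEW service_mart AS
    SELECT c.region, COUNT(*) AS tickets
    FROM support_ticket t JOIN customer c ON c.customer_id = t.customer_id
    GROUP BY c.region;
""")

sales_rows = cur.execute(
    "SELECT region, revenue, orders FROM sales_mart ORDER BY region").fetchall()
service_rows = cur.execute(
    "SELECT region, tickets FROM service_mart ORDER BY region").fetchall()
print(sales_rows)    # [('North', 125.0, 2), ('South', 50.0, 1)]
print(service_rows)  # [('South', 2)]
```

Because both marts take region from the same `customer` table, sales and service figures for a region always agree on which customers belong to it.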

Advantages of this approach

i) Once the EDW is complete, all use cases can be served from one integrated data model, so they produce consistent results.
ii) The EDW can provide integrated data from across the enterprise
iii) No danger of an “integration hairball”

Disadvantages of this approach

i) For a small company, producing an EDW is relatively straightforward, but for large organisations, especially those that are evolving rapidly, producing an integrated data model that covers the entire enterprise is a well-known “money pit”.

ii) Even once the EDW is complete, an integrated data model is often time-consuming to adjust in a rapidly evolving organisation. This leads to a substantial lag between when new data starts being produced at source and when the development work to make it available in the EDW is finished.

Ralph Kimball approach

Ralph Kimball proposed that an information management team should go for quick wins: first pick the part of the business that would yield the most value to the organisation (e.g. sales) and build a data mart for that business function, then move on to the next business function, so that an EDW evolves by joining up multiple data marts.

Advantages of this approach

i) Relatively quick to implement (although slower than an enterprise data hub)
ii) Provides standardised data for a particular business function

Disadvantages of this approach

i) To report on data across multiple business functions, the dimensions (e.g. customer, geography, calendar date) need to be conformed (made identical across data marts). This means reworking dimensions built for the first business function to add the new attributes and data that later business functions turn out to require.
ii) To provide cross-business-function data, results have to be combined from multiple fact tables, which tends to give relatively poor query performance.
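Both Kimball disadvantages can be sketched with Python's `sqlite3`, again with hypothetical names: two data marts whose fact tables (`fact_sales`, `fact_support_calls`) share a conformed `dim_customer` dimension. Reporting across both needs what Kimball calls a "drill-across" query, which aggregates each fact table to the shared grain separately and then joins the small result sets. Joining the fact tables directly multiplies rows and inflates totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Conformed dimension: identical keys and attributes used by every data mart.
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region        TEXT NOT NULL
);
-- Fact table from the first (sales) data mart.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL NOT NULL
);
-- Fact table from a later (customer service) data mart.
CREATE TABLE fact_support_calls (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    minutes      INTEGER NOT NULL
);
INSERT INTO dim_customer VALUES (1, 'Alice', 'North'), (2, 'Bob', 'South');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 25.0), (2, 50.0);
INSERT INTO fact_support_calls VALUES (1, 10), (2, 5), (2, 20);
""")

# Drill-across: one aggregation pass per fact table at the conformed grain
# (region), then a join of the aggregated results. The LEFT JOIN keeps only
# regions that have sales; a FULL OUTER JOIN would also keep call-only regions.
DRILL_ACROSS = """
WITH sales AS (
    SELECT d.region, SUM(f.amount) AS revenue
    FROM fact_sales f JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region
),
support AS (
    SELECT d.region, SUM(f.minutes) AS call_minutes
    FROM fact_support_calls f JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region
)
SELECT s.region, s.revenue, COALESCE(p.call_minutes, 0) AS call_minutes
FROM sales s LEFT JOIN support p ON p.region = s.region
ORDER BY s.region
"""

# Naive alternative: joining fact rows directly pairs every sale with every
# call for the same customer, so sums are counted more than once.
NAIVE = """
SELECT d.region, SUM(s.amount) AS revenue, SUM(c.minutes) AS call_minutes
FROM dim_customer d
JOIN fact_sales s ON s.customer_key = d.customer_key
JOIN fact_support_calls c ON c.customer_key = d.customer_key
GROUP BY d.region
ORDER BY d.region
"""

correct = cur.execute(DRILL_ACROSS).fetchall()
naive = cur.execute(NAIVE).fetchall()
print(correct)  # [('North', 125.0, 10), ('South', 50.0, 25)]
print(naive)    # [('North', 125.0, 20), ('South', 100.0, 25)]
```

The extra aggregation pass per fact table plus the final join is the additional query work behind the performance point above; the naive query shows why that work can't simply be skipped.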

