Poor reasons for adopting “Big Data”

In the article “Examples as to where companies use Big Data technology”, I discussed reasons for adopting a “Big Data” solution. The natural follow-up question is why you wouldn’t simply replace your existing relational database systems with new “Big Data” solutions.

Although “Big Data” is certainly evolving rapidly, at the moment the following situations are better supported by traditional relational database systems than by Big Data/NoSQL systems:

1. Websites that need to update, rather than simply insert and delete data

The problem here is that many NoSQL databases achieve fast writes by relaxing the requirement that data is consistent the moment it is saved. From the application’s perspective the transaction appears to have completed, but the data may only have been written to the database logs in memory, and not yet saved to the database itself. This relaxed method of saving data is known as “eventual consistency”, in contrast to the “immediate consistency” of traditional relational database systems. For some websites, e.g. an ecommerce shopping application, the need for immediate consistency is not so strong: you typically step through the shopping experience, and once you have purchased the contents of your shopping basket you cannot update it. For a hotel reservation system, however, users need to know exactly which rooms are available at any moment, so eventual consistency is not satisfactory in that situation.
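The hotel reservation problem can be illustrated with a minimal Python sketch. It is a toy model rather than the behaviour of any particular NoSQL product: the class `EventuallyConsistentStore`, its methods and the room numbers are all hypothetical, and it assumes reads are served from a copy of the data that lags behind acknowledged writes.

```python
class EventuallyConsistentStore:
    """Toy store (hypothetical): writes are acknowledged by the primary
    and only reach the read replica when replicate() is called."""

    def __init__(self, rooms):
        self.primary = {room: None for room in rooms}  # room -> guest
        self.replica = dict(self.primary)              # copy used for reads
        self.pending = []                              # writes not yet replicated

    def book(self, room, guest):
        # The write is acknowledged as soon as the primary accepts it.
        self.primary[room] = guest
        self.pending.append((room, guest))

    def available_rooms(self):
        # Reads come from the replica, which may be stale.
        return sorted(r for r, g in self.replica.items() if g is None)

    def replicate(self):
        # Apply the outstanding writes to the replica.
        for room, guest in self.pending:
            self.replica[room] = guest
        self.pending.clear()


store = EventuallyConsistentStore(["101", "102"])
store.book("101", "Alice")
stale_view = store.available_rooms()  # replica has not caught up yet
store.replicate()
fresh_view = store.available_rooms()  # replica now reflects the booking
print(stale_view, fresh_view)
```

Until `replicate()` runs, room 101 still appears free to other users even though Alice's booking was acknowledged, which is exactly the window in which a double booking could occur.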

2. Whenever you need a relatively high level of data security

The problem with Hadoop and most NoSQL databases is that they do not yet provide a high level of data security. For example, in a traditional relational database system such as Oracle, you can prevent site administrators from seeing production data, encrypt the data both within the database and in your backup solution, mask data dynamically or permanently, and hide data at row or column level from an end user depending on their privileges. This sophisticated level of data security is not easily available in “Big Data” solutions today. Regulators in a number of countries require that customers’ personal data is secured, so that information such as name, date of birth, social security number, credit/debit card numbers, account information and addresses is not disclosed to anyone who is not authorised to see it.
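To make the idea of privilege-dependent masking concrete, here is a small Python sketch. It does not use Oracle's or any other vendor's API. The roles, the `POLICY` table and the functions `mask` and `view_row` are assumptions made purely for illustration of column-level hiding and dynamic masking.

```python
# Hypothetical customer record.
ROW = {"name": "Jane Doe", "card_number": "4111111111111111", "city": "Leeds"}

# Hypothetical privilege model: the columns each role may see, and how.
# A column missing from a role's entry is hidden from that role entirely.
POLICY = {
    "auditor": {"name": "clear", "card_number": "clear", "city": "clear"},
    "support": {"name": "clear", "card_number": "masked", "city": "clear"},
    "analyst": {"card_number": "masked", "city": "clear"},
}


def mask(value):
    """Replace all but the last four characters with asterisks."""
    return "*" * (len(value) - 4) + value[-4:]


def view_row(row, role):
    """Return the row as the given role is allowed to see it."""
    visible = {}
    for column, mode in POLICY[role].items():
        visible[column] = row[column] if mode == "clear" else mask(row[column])
    return visible


print(view_row(ROW, "support"))
print(view_row(ROW, "analyst"))
```

In a relational database this kind of rule is enforced inside the database engine, so every tool that connects inherits it. In the sketch it lives in application code, which is roughly the position you are in when the data store itself offers no such feature.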

3. Where there is a limited set of additional tools that will work with your chosen “Big Data” technology

Although website developers are accustomed to writing code to bolt together different systems, business users expect to be able to analyse data using the reporting tools that they’re familiar with. In a large organisation, data also has to be distributed to many other systems, so if your data integration software can’t read the data in your “Big Data” store then you have a problem.

4. Where real-time query access is important

This is being worked on, but until recently Hadoop used batch-processing techniques to query data. A request is sent to every server in the cluster that holds data required by the query, each server computes its own partial result, and the partial results are then aggregated into the final query result. This batch-processing technique is referred to as MapReduce. As well as suffering relatively high latency in returning query results, you are limited in the number of concurrent users that can be supported.
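The pattern described above can be sketched in a few lines of plain Python. This is not Hadoop code: the `cluster` list stands in for data spread across three servers, and the names `map_phase` and `reduce_phase` are hypothetical. It only shows the shape of the computation, in which each server produces a partial result that is then merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list is the slice of records held by one server.
cluster = [
    ["uk", "us", "uk"],
    ["us", "de"],
    ["uk", "de", "de"],
]


def map_phase(records):
    """Runs on each server: count the records held locally."""
    return Counter(records)


def reduce_phase(left, right):
    """Merge two partial counts into one."""
    return left + right


# Each server computes its own result...
partials = [map_phase(shard) for shard in cluster]
# ...and the partial results are aggregated into the final answer.
totals = reduce(reduce_phase, partials, Counter())
print(dict(totals))
```

In a real cluster every query pays the cost of scheduling work on each server and waiting for the slowest one to finish, which is where the latency mentioned above comes from.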

5. Where “Big Data” skills are in short supply within your organisation and cannot be sourced at a cost-effective price

At the moment, there is very strong demand for developers who can write MapReduce jobs in Hadoop, use associated Apache tools such as Storm, Kafka, Pig, Hive and Sqoop, or code in Python, R, Scala or MATLAB. With strong demand and limited supply, the development costs of implementing a Big Data solution can increase rapidly.

On a final note, as with every solution, before committing to purchasing a Big Data technology you should always get vendor support to assist in a proof of concept that demonstrates the end-to-end solution works functionally with no integration issues. A proof of concept typically uncovers functional and feature issues that may not have been highlighted by the vendor during the sales demonstration.
