In my earlier article, Examples as to where companies use Big Data technology, I discussed reasons for adopting a “Big Data” solution. The natural question, then, is why you wouldn’t simply replace your existing relational database systems with new “Big Data” solutions.
Although “Big Data” technology is certainly evolving rapidly, at present the following situations are better served by traditional relational database systems than by Big Data/NoSQL systems:
1. Websites that need to update, rather than simply insert and delete data
The problem here is that many NoSQL databases achieve fast writes by relaxing the guarantee that data is consistent the moment it is saved. From the application’s perspective the write appears to have completed, but it may so far have been recorded only in an in-memory log or on some of the database’s replicas, and not yet applied everywhere. Until it is, other reads can return stale data. This relaxed approach is known as being “eventually consistent”, rather than “immediately consistent” as traditional relational database systems are. For some websites, e.g. an ecommerce shopping application, the need for immediate consistency is less pressing: you typically step through the shopping experience, and once you purchase the contents of your shopping basket you cannot update it. For a hotel reservation system, however, users clearly need an accurate view of which rooms are available, so eventual consistency is not satisfactory there; two guests could both be told that the same room is free.
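To make the hotel example concrete, here is a minimal Python sketch of eventual consistency. It is a hypothetical in-process toy, not the API of any real NoSQL product: a write is acknowledged as soon as one replica has it, copying it to the other replica happens later (when `replicate()` is called), and conflicts are resolved by a simple “last writer wins” rule based on a global version counter. That counter is an assumption made for brevity; real systems use timestamps or more sophisticated schemes.

```python
from collections import deque
from itertools import count


class EventuallyConsistentStore:
    """Hypothetical toy model of a replicated store, not any real database's API."""

    def __init__(self, replica_count: int = 2) -> None:
        # Each replica maps key -> (version, value).
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # replication messages not yet delivered
        self.clock = count(1)   # global version counter (a simplification)

    def write(self, replica: int, key: str, value: str) -> None:
        entry = (next(self.clock), value)
        # The write is acknowledged as soon as this one replica has it.
        self.replicas[replica][key] = entry
        # Queue the update for every other replica, to be applied later.
        for other in range(len(self.replicas)):
            if other != replica:
                self.pending.append((other, key, entry))

    def read(self, replica: int, key: str):
        entry = self.replicas[replica].get(key)
        return None if entry is None else entry[1]

    def replicate(self) -> None:
        # Deliver queued updates; the highest version (last writer) wins.
        while self.pending:
            target, key, entry = self.pending.popleft()
            current = self.replicas[target].get(key)
            if current is None or entry[0] > current[0]:
                self.replicas[target][key] = entry


store = EventuallyConsistentStore()
store.write(0, "room-101", "available")
store.replicate()

# Guest A books through replica 0 and is told the booking succeeded.
if store.read(0, "room-101") == "available":
    store.write(0, "room-101", "booked by A")

# Guest B reads replica 1 before replication has caught up: a stale read.
stale = store.read(1, "room-101")
if stale == "available":
    store.write(1, "room-101", "booked by B")

store.replicate()
print(stale)                      # available
print(store.read(0, "room-101"))  # booked by B
print(store.read(1, "room-101"))  # booked by B
```

Both guests are told their booking succeeded, yet once the replicas converge only guest B’s booking survives. A relational database can avoid this by performing the availability check and the update inside a single transaction.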
2. Whenever you need a relatively high level of data security
The problem with Hadoop and most NoSQL databases is that they do not yet provide a high level of data security. For example, in a traditional relational database system such as Oracle, you can prevent site administrators from seeing production data; encrypt data both within the database and in your backup solution; mask data dynamically or permanently; and hide data at row or column level from an end user depending on their privileges. This sophisticated level of data security is not easily available in “Big Data” solutions today. Regulators in a number of countries require that customers’ personal data is secured, so that information such as names, dates of birth, social security numbers, credit/debit card numbers, account information and addresses is not disclosed to anybody who is not authorised to see it.
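The row- and column-level controls described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the roles (`fraud_analyst`, `uk_support`, `dba`), the `Policy` fields and the sample customers are invented, and it does not reproduce Oracle’s (or any other product’s) security features. It only shows the idea of filtering rows and masking or removing columns at read time according to the caller’s privileges.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Customer:
    name: str
    card_number: str
    country: str


@dataclass(frozen=True)
class Policy:
    # Hypothetical per-role policy, not any product's configuration format.
    allowed_countries: Optional[frozenset]  # None means every row is visible
    show_full_card: bool
    show_name: bool


POLICIES = {
    "fraud_analyst": Policy(None, show_full_card=True, show_name=True),
    "uk_support": Policy(frozenset({"UK"}), show_full_card=False, show_name=True),
    # Administrators can work with the data without seeing who it belongs to.
    "dba": Policy(None, show_full_card=False, show_name=False),
}


def mask_card(number: str) -> str:
    # Keep only the last four digits visible.
    digits = number.replace(" ", "")
    return "*" * (len(digits) - 4) + digits[-4:]


def visible_rows(rows, role):
    policy = POLICIES[role]  # an unknown role raises KeyError: deny by default
    for row in rows:
        # Row-level security: skip rows outside the role's allowed countries.
        if (policy.allowed_countries is not None
                and row.country not in policy.allowed_countries):
            continue
        # Column-level masking/removal, applied when the data is read.
        yield replace(
            row,
            card_number=row.card_number if policy.show_full_card
            else mask_card(row.card_number),
            name=row.name if policy.show_name else "[redacted]",
        )


customers = [
    Customer("Alice Smith", "4111 1111 1111 1111", "UK"),
    Customer("Bruno Rossi", "5500 0000 0000 0004", "IT"),
]

for row in visible_rows(customers, "uk_support"):
    print(row)
```

In a relational database these rules would be enforced by the database engine itself, so every application and reporting tool sees the same restricted view; in a sketch like this they are only as strong as the code that calls `visible_rows`.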
3. Where there is a limited set of additional tools that will work with your chosen “Big Data” technology
Although website developers are accustomed to writing code to bolt together different systems, business users expect to be able to analyse data using reporting tools that they’re familiar with. In a large organisation, data also has to be distributed to many other systems, so if your data integration software can’t read the data in your “Big Data” store, then you have a problem.
4. Where real-time query access is important
This is being worked on, but until recently Hadoop used batch-processing techniques to query data. A query was split into tasks sent to every server in the cluster holding data the query needed; each server computed its own partial result (the map step), and these partial results were then grouped and aggregated into the final answer (the reduce step). This batch-processing technique is known as MapReduce. As well as having relatively high latency in returning query results, it limits the number of concurrent users that can be supported.
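The map, shuffle and reduce steps described above can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions: the `partitions` list is hypothetical data standing in for the rows stored on three servers, and the query (total sales per region) is invented for illustration. Real Hadoop jobs run the map tasks in parallel across the cluster and move the intermediate pairs over the network during the shuffle.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical data: each inner list stands in for the rows held on one server.
partitions = [
    [("north", 120), ("south", 80)],
    [("north", 30), ("east", 50)],
    [("south", 20), ("east", 10), ("north", 5)],
]


def map_phase(rows):
    # Runs locally on each server: emit intermediate (key, value) pairs.
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    # Group intermediate pairs by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate all values for one key into the final result.
    return key, sum(values)


mapped = chain.from_iterable(map_phase(p) for p in partitions)
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'north': 155, 'south': 100, 'east': 60}
```

Even for an aggregation this small, every query is a full job: scan all partitions, shuffle, then reduce. That fixed overhead is why this model suits batch reporting better than interactive, real-time queries.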
5. Where “Big Data” skills are in short supply within your organisation and cannot be sourced at a cost-effective price
At the moment, there is very strong demand for developers who are able to write MapReduce jobs in Hadoop, use associated Apache tools such as Storm, Kafka, Pig, Hive and Sqoop, or code in Python, R, Scala or MATLAB. With strong demand and limited supply, the cost of developing a Big Data solution can rise rapidly.
On a final note, as with every solution, before committing to purchase a Big Data technology you should always ask the vendor to support a proof of concept that demonstrates an end-to-end solution works functionally, with no integration issues. A proof of concept typically uncovers functional and feature gaps that the vendor may not highlight during the sales demonstration.