What are NoSQL databases?

NoSQL (“Not Only SQL”) is a term often mentioned alongside “Big Data”. Most NoSQL databases share the distributed, horizontally scalable, high-availability features of the Hadoop/Spark stack, but also allow random access, which makes low-latency querying of individual records possible.

NoSQL databases started to gain popularity when dynamic web content needed to be stored in and retrieved from a data store. They addressed a problem that relational database management systems (RDBMS) had at the time: data models could not be adapted as quickly as a web development team could develop code.

Relational Database Tables

To query all of the information about James Dey, in the example below, you’d type:-

select * from employees e, addresses a where employee_first_name = 'James' and employee_surname='Dey' and a.employee_id = e.employee_id;
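To make the join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the sample rows are assumptions made up for illustration; only the table names and the columns used in the query above come from the example.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        employee_first_name TEXT,
        employee_surname TEXT
    );
    CREATE TABLE addresses (
        address_id INTEGER PRIMARY KEY,
        employee_id INTEGER REFERENCES employees(employee_id),
        city TEXT
    );
    INSERT INTO employees VALUES (1, 'James', 'Dey'), (2, 'Ann', 'Smith');
    INSERT INTO addresses VALUES (10, 1, 'London'), (11, 2, 'Leeds');
""")

# Same join as the query above, written with an explicit JOIN and bound parameters.
rows = conn.execute(
    """
    SELECT e.employee_first_name, e.employee_surname, a.city
    FROM employees e
    JOIN addresses a ON a.employee_id = e.employee_id
    WHERE e.employee_first_name = ? AND e.employee_surname = ?
    """,
    ("James", "Dey"),
).fetchall()

print(rows)
```

The employee's details are split across two tables, so every lookup of the full record pays for a join on employee_id.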

Although a number of databases are collectively referred to as “NoSQL”, there are 4 distinct NoSQL families which have different features and meet different use cases:-

Document stores

MongoDB is the most popular document store. Document stores allow flexible storage of documents, where each record can have the following features:-

  • A record contains a document
  • It is variable length
  • Types of values of a particular column can vary from record to record
  • A column can contain an array
  • A document can have a nested structure

Document Store Collections

To query all of the information about James Dey, you’d type (if using MongoDB):-

db.employees.find({employee_first_name: "James", employee_surname: "Dey"});

Document stores have become very popular amongst web application developers as documents can be generated containing nested hierarchical data using formats such as JSON and XML, which more closely matches the way in which web application developers code than relational databases do.
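The features listed earlier (variable length, varying types, arrays, nesting) and the equality query can be mimicked in plain Python. This is only a stand-in to show the shape of the data, not how MongoDB works internally; the find helper and every field other than employee_first_name and employee_surname are hypothetical.

```python
import json

# Hypothetical employee documents; note the two records have different fields.
employees = [
    {
        "employee_first_name": "James",
        "employee_surname": "Dey",
        "addresses": [{"city": "London", "type": "home"}],  # array of nested documents
        "skills": ["SQL", "Java"],                           # array value
    },
    {
        "employee_first_name": "Ann",
        "employee_surname": "Smith",
        "skills": "Python",  # same field, different type, and no addresses at all
    },
]


def find(collection, query):
    """Return documents whose top-level fields equal every value in query."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


matches = find(employees, {"employee_first_name": "James", "employee_surname": "Dey"})
print(json.dumps(matches, indent=2))
```

Everything about James Dey comes back in one document, with no join against a separate addresses table.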

Wide column stores

Cassandra and HBase are currently the 2 most popular wide column stores.

The term “column store” refers to the fact that data is stored in columns which reference row keys, as opposed to being stored in rows containing columns, which is the way that relational databases usually store data.

A row-based approach is advantageous where you’re selecting individual records for update. For example, if you want to update details for employee “James Dey”, it makes sense to retrieve the entire record for that employee.

Column stores are advantageous where you want to search a large set of records based on a particular column value. For example, if you want to count all employees who live in London, a column store only has to read the city column, pick out the entries whose value is ‘London’ and count the row keys they reference, whereas in an unindexed row-based RDBMS you’d have to retrieve every employee record, check whether the city was London and then increment the count.
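The difference between the two layouts can be sketched in a few lines of Python. The sample data and function names are hypothetical, and in-memory dictionaries only imitate what a real engine does on disk, but the access pattern is the point.

```python
# Hypothetical employee data, invented for illustration.
rows = [
    {"id": 1, "name": "James Dey", "city": "London"},
    {"id": 2, "name": "Ann Smith", "city": "Leeds"},
    {"id": 3, "name": "Raj Patel", "city": "London"},
]


def count_city_row_layout(records, city):
    # Row-oriented: each record is stored whole, so the count touches every record.
    return sum(1 for record in records if record["city"] == city)


# Column-oriented: each column is stored separately, keyed by row key.
columns = {
    "name": {r["id"]: r["name"] for r in rows},
    "city": {r["id"]: r["city"] for r in rows},
}


def count_city_column_layout(cols, city):
    # Only the "city" column is read; the "name" column is never touched.
    return sum(1 for value in cols["city"].values() if value == city)


london_row = count_city_row_layout(rows, "London")
london_col = count_city_column_layout(columns, "London")
print(london_row, london_col)
```

Both functions return the same count; the column layout simply reads less data to get there, which matters once each record has many columns.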

A wide column store has the following characteristics:-

  • Data is stored in columns within column families within a row.
  • A column family is a set of columns which make up all or part of that row.
  • Within each column family, there are 1 or more columns.
  • Data within a column family is physically stored together.
  • Each cell consists of a key-value pair, where the key is a combination of row key, column family and column (qualifier).
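The cell model in the list above can be sketched as a Python dictionary keyed by (row key, column family, qualifier). The "Employee" family and its two qualifiers appear in the HBase example later in the article; the row keys, the "Address" family and the get_row helper are assumptions for illustration.

```python
from collections import defaultdict

# Each cell is a key-value pair; the key combines row key, column family and qualifier.
cells = {
    ("emp-001", "Employee", "Employee First Name"): "James",
    ("emp-001", "Employee", "Employee Surname"): "Dey",
    ("emp-001", "Address", "City"): "London",
    ("emp-002", "Employee", "Employee First Name"): "Ann",
}


def get_row(table, row_key):
    """Group one row's cells by column family, then by qualifier."""
    row = defaultdict(dict)
    for (key, family, qualifier), value in table.items():
        if key == row_key:
            row[family][qualifier] = value
    return dict(row)


james = get_row(cells, "emp-001")
print(james)
```

Note that emp-002 has no surname and no address cells at all: a missing column simply has no entry, rather than a stored NULL.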

HBase table

To query all of the information about James Dey using the HBase Java API, your code would look like this:-

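// Uses the older HTable-based client API; newer HBase releases obtain tables via Connection.getTable instead.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "Employees");

// Match on first name and surname within the "Employee" column family.
SingleColumnValueFilter f1 = new SingleColumnValueFilter(Bytes.toBytes("Employee"), Bytes.toBytes("Employee First Name"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("James"));

SingleColumnValueFilter f2 = new SingleColumnValueFilter(Bytes.toBytes("Employee"), Bytes.toBytes("Employee Surname"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("Dey"));

// A row must pass both filters to be returned.
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filterList.addFilter(f1);
filterList.addFilter(f2);

Scan scan = new Scan();
scan.setFilter(filterList);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println(result); // process each matching row; printing is a placeholder
}
scanner.close();
table.close();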

Wide column stores have become popular amongst the Apache Hadoop community, since HDFS is designed for sequential access to large files, which suits high-latency batch processing rather than lookups of single records. HBase sits on top of HDFS and provides indexing and random access, allowing individual records to be retrieved with low latency. Cassandra offers similar low-latency access to individual records, but manages its own storage rather than running on HDFS.

Key/Value stores

Redis is currently the most popular key/value store. A key/value store is the simplest form of store, consisting simply of a key and a value for every attribute stored.

Key Value Store

Key/value stores are popular mainly due to their simplicity and hence relative speed in persisting data. They’re great for persisting web form data, for example, so as to allow multiple application servers to pick up the data.
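The web form use case can be sketched with a toy in-memory store. This is not Redis's client API; the KeyValueStore class, the key naming scheme and the form fields are all hypothetical, with method names that loosely echo the Redis SET, GET and DEL commands.

```python
import json


class KeyValueStore:
    """Toy in-memory key/value store; values are opaque strings to the store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()

# One application server saves a half-completed form under a session key...
form = {"first_name": "James", "surname": "Dey"}
store.set("form:session-42", json.dumps(form))

# ...and any other server sharing the store can pick it up by the same key.
restored = json.loads(store.get("form:session-42"))
print(restored)
```

The store never inspects the value, which is what keeps the model simple: the only lookup it supports is by key.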

Graph data stores

Currently, the most popular graph data store is Neo4j.

In a relational database, relationships are actually not handled particularly well, with bridging tables required to handle many-to-many relationships. Records (nodes) within graph data stores hold pointers to the records to which they are related. This means that the costly lookup which compares a foreign key column in one entity with a primary key column in a related entity is no longer required, as each node already knows which other nodes in the data model (graph) it is related to.
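The idea of following pointers instead of comparing keys can be sketched with an adjacency list and a breadth-first search. The "follows" graph, the user names and the within_hops function are hypothetical, and this says nothing about how any particular graph database stores its nodes.

```python
from collections import deque

# Hypothetical "follows" graph; each node holds direct references to its neighbours.
follows = {
    "james": ["ann", "raj"],
    "ann": ["raj", "mia"],
    "raj": [],
    "mia": ["james"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start in at most max_hops edges."""
    seen = {start}
    reached = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                reached.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return reached


print(sorted(within_hops(follows, "james", 2)))
```

Each hop is a direct lookup of a node's neighbours; the relational equivalent would need one more self-join on a bridging table for every extra hop.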

Due to the speed at which relationships between entities can be queried, graph data stores are popular in social media sites such as Twitter, Facebook and LinkedIn, where establishing a multitude of complex relationships between users is very important.

Graph Store

