Big Data is a term that refers to the need for information management systems to handle large volumes of structured, semi-structured and unstructured data (typically petabytes, i.e. 1,000,000+ gigabytes) and to deliver that data to its consumers as quickly as possible.
Of the many solutions to this problem, the Apache open source project Hadoop and its associated projects (commonly referred to as the Hadoop ecosystem) are probably the best-known examples.
The way a federated data warehouse or Hadoop addresses the Big Data problem is to partition the data and spread it over a distributed server cluster. With this “Divide & Conquer” approach, when you run a query, a task runs against each partition to retrieve and process its data. The typically much smaller result set from each task is then shipped back to the coordinating node, where the individual result sets are aggregated for presentation back to the client.
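As a rough illustration of the partition-and-aggregate idea described above, here is a minimal single-machine sketch in Python. It is not how Hadoop or any particular data warehouse is implemented: the function names (`partition`, `partial_total`, `query_average`), the round-robin partitioning and the sample sales data are all assumptions made for the example, and threads stand in for the servers of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_partitions):
    """Spread rows over n_partitions (round-robin here; real systems often hash a key)."""
    parts = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

def partial_total(part, region):
    """Task run against one partition: filter locally, return a small (sum, count) pair."""
    amounts = [amount for r, amount in part if r == region]
    return sum(amounts), len(amounts)

def query_average(parts, region):
    """Coordinator: run one task per partition, then aggregate the partial results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(lambda p: partial_total(p, region), parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Hypothetical sample data: (region, sale amount)
sales = [("north", 10.0), ("south", 4.0), ("north", 20.0),
         ("east", 7.0), ("north", 30.0), ("south", 6.0)]

parts = partition(sales, 3)
north_avg = query_average(parts, "north")
print(north_avg)
```

Note that each task returns only a two-number summary rather than its matching rows, which is the "much smaller result set" that keeps traffic to the coordinating node low.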
Associated with “Big Data” are the following common terms:-
NoSQL – refers to a set of database systems which store data in a format that does not require a fixed data model (schema) to be defined up front (e.g. key-value pairs, XML, JSON). Popular NoSQL databases include MongoDB, Cassandra, Redis, HBase, CouchDB and MarkLogic. Although these are collectively called NoSQL databases, each system stores data in its own way.
Data Science – refers to the use of an analytical programming language to build models that pick out the items of data which reveal patterns, and algorithms that attempt to predict consumer behaviour. Common programming languages used by Data Scientists include R, MATLAB and Scala.
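To make the NoSQL entry above a little more concrete, the sketch below shows the general idea of storing JSON documents by key without declaring a data model first. It is only an illustration: the `store` dictionary is a hypothetical in-memory stand-in, not the API of MongoDB, Redis or any other product, and the `put`/`get` helpers and customer records are invented for the example.

```python
import json

# Hypothetical in-memory stand-in for a key-value / document store
store = {}

def put(key, document):
    """Store a document as a JSON string under a key; no schema is declared."""
    store[key] = json.dumps(document)

def get(key):
    """Fetch a document by key and decode it back into Python objects."""
    return json.loads(store[key])

# Two records of the same kind with different fields and nesting
put("customer:1", {"name": "Ada", "email": "ada@example.com"})
put("customer:2", {"name": "Grace", "phones": ["555-0100"],
                   "loyalty": {"tier": "gold"}})

fields_1 = sorted(get("customer:1"))
fields_2 = sorted(get("customer:2"))
print(fields_1)
print(fields_2)
```

In a relational table, the second record would typically need new columns or extra tables before it could be stored; here the shape of each document is decided by whoever writes it.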
Note: Although NoSQL and Data Science are terms commonly associated with Big Data, you can store small volumes of data in a NoSQL database or write data science algorithms against standard relational databases. It is also very common to use Big Data solutions when data volumes remain relatively small.
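As a small example of the point in the note, the following sketch fits a simple predictive model against data held in an ordinary relational database. It assumes an in-memory SQLite database and a made-up `visits` table relating store visits to spend; the table, the data and the `fit_line` helper are hypothetical, and a least-squares line is used only because it is the simplest model to show.

```python
import sqlite3

# Small relational data set (hypothetical): visits per customer and their spend
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (customer_id INTEGER, visits INTEGER, spend REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, 1, 12.0), (2, 2, 22.0), (3, 3, 32.0), (4, 4, 42.0)],
)
rows = conn.execute("SELECT visits, spend FROM visits").fetchall()

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(rows)
predicted_spend = slope * 5 + intercept  # predicted spend for a customer with 5 visits
print(slope, intercept, predicted_spend)
```

Nothing here requires a cluster: four rows in a single table are enough to apply the same modelling idea that would be used at much larger scale.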
I discuss common reasons why organisations choose to incorporate big data solutions in their information management solution architecture in the article “Examples as to where companies use Big Data technology”.