The key benefits of the Hadoop Distributed File System (HDFS) over previous storage solutions are:-
1. Highly scalable
2. Highly available
So how is this achieved? The diagram below shows the steps which take place when a client application requests that a file be written to HDFS.
In the diagram, the key HDFS components are:-
1. The Name node, which keeps a directory (the namespace) recording which data blocks are located on which data node on which rack.
2. The Data nodes, which report to the Name node which data blocks they hold and, via a heartbeat process, let the Name node know that they are still alive.
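As a rough illustration of the two roles above, here is a toy model of the Name node's bookkeeping. The class and method names are my own for illustration, not the real Hadoop API:

```python
import time

class NameNode:
    """Toy model: tracks block locations and data node heartbeats."""

    def __init__(self):
        self.block_locations = {}  # block_id -> set of data node ids
        self.last_heartbeat = {}   # data node id -> timestamp of last heartbeat

    def report_block(self, node_id, block_id):
        """A data node reports that it holds a copy of a block."""
        self.block_locations.setdefault(block_id, set()).add(node_id)

    def heartbeat(self, node_id):
        """A data node signals that it is still alive."""
        self.last_heartbeat[node_id] = time.time()

    def live_nodes(self, timeout=10.0):
        """Data nodes whose last heartbeat arrived within the timeout."""
        now = time.time()
        return {n for n, t in self.last_heartbeat.items() if now - t < timeout}

# A data node checks in and reports its block.
nn = NameNode()
nn.heartbeat("dn1")
nn.report_block("dn1", "blk_001")
```

The real Name node receives periodic block reports and heartbeats over RPC; this sketch only captures the bookkeeping they drive.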
The high scalability is achieved by:-
1. Spreading the data blocks over multiple data nodes
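The block-spreading idea can be sketched as a toy model. The tiny block size, node names, and round-robin placement here are illustrative only; real HDFS uses 128 MB blocks by default and a rack-aware placement policy:

```python
BLOCK_SIZE = 4  # bytes, tiny for demonstration (HDFS defaults to 128 MB)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, data_nodes):
    """Spread blocks over data nodes round-robin (illustrative policy)."""
    placement = {node: [] for node in data_nodes}
    for i, block in enumerate(blocks):
        placement[data_nodes[i % len(data_nodes)]].append(block)
    return placement

blocks = split_into_blocks(b"hello world, hdfs!")
placement = place_blocks(blocks, ["dn1", "dn2", "dn3"])
```

Because each block can live on a different node, adding data nodes increases both total capacity and aggregate read/write bandwidth.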
The high availability is achieved by:-
1. Running a standby name node which holds a copy of the data block directory (namespace) kept on the active name node.
2. Each data block written to a data node is also replicated to other data nodes (by default, HDFS keeps 3 replicas of each block in total).
3. Should a data node fail, it stops sending heartbeat signals to the active name node. If the data blocks on the failed data node are then held by too few other data nodes, the name node tells one of the data nodes holding a copy to replicate it to another data node.
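The re-replication step in point 3 can be sketched as follows: when a data node stops heartbeating, any block whose live replica count falls below the target is copied from a surviving holder to another live node. The function name and the simple placement choice are illustrative, not Hadoop's actual API:

```python
REPLICATION_TARGET = 3  # HDFS's default replication factor

def re_replicate(block_locations, live_nodes, target=REPLICATION_TARGET):
    """block_locations maps block_id -> set of node ids holding a replica.

    Restores each block to the target replica count using only live nodes.
    """
    for block_id, holders in block_locations.items():
        alive = holders & live_nodes             # replicas that survived
        candidates = sorted(live_nodes - alive)  # possible new homes
        while len(alive) < target and candidates:
            destination = candidates.pop(0)
            alive.add(destination)  # simulate copying from a surviving holder
        block_locations[block_id] = alive

# dn2 fails: blk_1 drops to two live replicas, so a copy goes to dn4.
locations = {"blk_1": {"dn1", "dn2", "dn3"}}
re_replicate(locations, live_nodes={"dn1", "dn3", "dn4"})
```

In real HDFS the name node queues under-replicated blocks and instructs a holder to stream the block to the chosen target; this sketch only models the resulting bookkeeping.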
If you want further information, see the Hadoop HDFS Architecture page.