Big Data Analytics with Hadoop 3
High availability
Losing the NameNode can bring down the cluster in both Hadoop 1.x and Hadoop 2.x. Hadoop 1.x offered no easy way to recover, whereas Hadoop 2.x introduced high availability (an active-passive setup with one active and one standby NameNode) to help recover from NameNode failures.
The following diagram shows how high availability works:
[Figure: NameNode high availability architecture]
In Hadoop 3.x, you can run two standby (passive) NameNodes alongside the active NameNode, together with five JournalNodes, to help the cluster recover from catastrophic failures. An HA deployment uses the following machines:
NameNode machines: The machines on which you run the active and standby NameNodes. They should have equivalent hardware to each other and to what would be used in a non-HA cluster.
JournalNode machines: The machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.
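To make the topology above concrete, here is a minimal Python sketch of a setup with three NameNodes and five JournalNodes. It assumes the Quorum Journal Manager, where an edit must reach a majority of JournalNodes before it is committed. The property names (`dfs.nameservices`, `dfs.ha.namenodes.<nameservice>`, `dfs.namenode.shared.edits.dir`), the `qjournal://` URI form, and the default JournalNode port 8485 come from the Apache HDFS HA documentation. The nameservice name, NameNode IDs, hostnames, and the two helper functions are hypothetical and chosen only for illustration.

```python
def journal_quorum(num_journal_nodes: int) -> tuple[int, int]:
    """Return (majority size, tolerated failures) for a JournalNode quorum."""
    if num_journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    majority = num_journal_nodes // 2 + 1       # writes must reach this many nodes
    tolerated = (num_journal_nodes - 1) // 2    # nodes that may fail without losing the quorum
    return majority, tolerated


def shared_edits_uri(journal_hosts: list[str], journal_id: str, port: int = 8485) -> str:
    """Build the qjournal:// URI used as the value of dfs.namenode.shared.edits.dir."""
    authority = ";".join(f"{host}:{port}" for host in journal_hosts)
    return f"qjournal://{authority}/{journal_id}"


# Hypothetical names mirroring the text: one active plus two standby NameNodes,
# and five JournalNodes.
NAMESERVICE = "mycluster"
NAMENODE_IDS = ["nn1", "nn2", "nn3"]
JOURNAL_HOSTS = [f"jn{i}.example.com" for i in range(1, 6)]

# Properties that would go into hdfs-site.xml (shown here as a plain dict).
ha_properties = {
    "dfs.nameservices": NAMESERVICE,
    f"dfs.ha.namenodes.{NAMESERVICE}": ",".join(NAMENODE_IDS),
    "dfs.namenode.shared.edits.dir": shared_edits_uri(JOURNAL_HOSTS, NAMESERVICE),
}

majority, tolerated = journal_quorum(len(JOURNAL_HOSTS))
print(f"{len(JOURNAL_HOSTS)} JournalNodes: majority={majority}, tolerated failures={tolerated}")
for name, value in ha_properties.items():
    print(f"{name} = {value}")
```

Under these assumptions, five JournalNodes keep accepting edits with up to two of them down, whereas a three-node quorum tolerates only one failure. An even count adds no extra tolerance, which is why an odd number of JournalNodes is the usual choice.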