Big Data Analytics with Hadoop 3

Introduction to Hadoop

This chapter introduces the reader to the world of Hadoop and its core components, namely the Hadoop Distributed File System (HDFS) and MapReduce. We will start by introducing the changes and new features in the Hadoop 3 release. In particular, we will talk about the new features of HDFS and Yet Another Resource Negotiator (YARN), and the changes to client applications. Furthermore, we will install a Hadoop cluster locally and demonstrate new features such as erasure coding (EC) and the timeline service. As a quick note, Chapter 10, Visualizing Big Data, shows you how to create a Hadoop cluster in AWS.

In a nutshell, the following topics will be covered throughout this chapter:

  • HDFS
    • High availability
    • Intra-DataNode balancer
    • EC
    • Port mapping
  • MapReduce
    • Task-level optimization
  • YARN
    • Opportunistic containers
    • Timeline service v.2
    • Docker containerization
  • Other changes
  • Installation of Hadoop 3.1
    • HDFS
    • YARN
    • EC
    • Timeline service v.2