Introduction to Hadoop

This chapter introduces Hadoop and its core components, namely the Hadoop Distributed File System (HDFS) and MapReduce. We start with the changes and new features in the Hadoop 3 release, in particular the new features of HDFS and Yet Another Resource Negotiator (YARN), and the changes to client applications. We then install a Hadoop cluster locally and demonstrate new features such as erasure coding (EC) and the timeline service. As a quick note, Chapter 10, Visualizing Big Data, shows you how to create a Hadoop cluster in AWS.

In a nutshell, the following topics will be covered throughout this chapter:

- HDFS
  - High availability
  - Intra-DataNode balancer
  - EC
  - Port mapping
- MapReduce
  - Task-level optimization
- YARN
  - Opportunistic containers
  - Timeline service v.2
  - Docker containerization
- Other changes
- Installation of Hadoop 3.1
  - HDFS
  - YARN
  - EC
  - Timeline service v.2