Last updated: 2021-06-25 21:27:11
Cover
Copyright information
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Hadoop
Hadoop Distributed File System
High availability
Intra-DataNode balancer
Erasure coding
Port numbers
MapReduce framework
Task-level native optimization
YARN
Opportunistic containers
Types of container execution
YARN timeline service v.2
Enhancing scalability and reliability
Usability improvements
Architecture
Other changes
Minimum required Java version
Shell script rewrite
Shaded-client JARs
Installing Hadoop 3
Prerequisites
Downloading
Installation
Setting up password-less SSH
Setting up the NameNode
Starting HDFS
Setting up the YARN service
Erasure coding
Installing YARN timeline service v.2
Setting up the HBase cluster
Simple deployment for HBase
Enabling the co-processor
Enabling timeline service v.2
Running timeline service v.2
Enabling MapReduce to write to timeline service v.2
Summary
Overview of Big Data Analytics
Introduction to data analytics
Inside the data analytics process
Introduction to big data
Variety of data
Velocity of data
Volume of data
Veracity of data
Variability of data
Visualization
Value
Distributed computing using Apache Hadoop
The MapReduce framework
Hive
Downloading and extracting the Hive binaries
Installing Derby
Using Hive
Creating a database
Creating a table
SELECT statement syntax
WHERE clauses
INSERT statement syntax
Primitive types
Complex types
Built-in operators and functions
Built-in operators
Built-in functions
Language capabilities
A cheat sheet on retrieving information
Apache Spark
Visualization using Tableau
Big Data Processing with MapReduce
Dataset
Record reader
Map
Combiner
Partitioner
Shuffle and sort
Reduce
Output format
MapReduce job types
Single mapper job
Single mapper reducer job