Big Data Analytics with Hadoop 3

Last updated: 2021-06-25 21:27:11

Cover

Copyright information

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Introduction to Hadoop

Hadoop Distributed File System

High availability

Intra-DataNode balancer

Erasure coding

Port numbers

MapReduce framework

Task-level native optimization

YARN

Opportunistic containers

Types of container execution

YARN timeline service v.2

Enhancing scalability and reliability

Usability improvements

Architecture

Other changes

Minimum required Java version

Shell script rewrite

Shaded-client JARs

Installing Hadoop 3

Prerequisites

Downloading

Installation

Set up password-less SSH

Setting up the NameNode

Starting HDFS

Setting up the YARN service

Erasure coding

Intra-DataNode balancer

Installing YARN timeline service v.2

Setting up the HBase cluster

Simple deployment for HBase

Enabling the co-processor

Enabling timeline service v.2

Running timeline service v.2

Enabling MapReduce to write to timeline service v.2

Summary

Overview of Big Data Analytics

Introduction to data analytics

Inside the data analytics process

Introduction to big data

Variety of data

Velocity of data

Volume of data

Veracity of data

Variability of data

Visualization

Value

Distributed computing using Apache Hadoop

The MapReduce framework

Hive

Downloading and extracting the Hive binaries

Installing Derby

Using Hive

Creating a database

Creating a table

SELECT statement syntax

WHERE clauses

INSERT statement syntax

Primitive types

Complex types

Built-in operators and functions

Built-in operators

Built-in functions

Language capabilities

A cheat sheet on retrieving information

Apache Spark

Visualization using Tableau

Summary

Big Data Processing with MapReduce

The MapReduce framework

Dataset

Record reader

Map

Combiner

Partitioner

Shuffle and sort

Reduce

Output format

MapReduce job types

Single mapper job

Single mapper reducer job