Big Data Analytics with Hadoop 3

Dataset

The first dataset is a table of cities containing the city ID and the name of the city:

Id,City
1,Boston
2,New York
3,Chicago
4,Philadelphia
5,San Francisco
7,Las Vegas

This file, cities.csv, is available as a download; once downloaded, you can move it into HDFS by running the following command:

hdfs dfs -copyFromLocal cities.csv /user/normal

The second dataset contains daily temperature measurements for the cities, with the date of measurement, the city ID, and the temperature recorded on that date for that city:

Date,Id,Temperature
2018-01-01,1,21
2018-01-01,2,22
2018-01-01,3,23
2018-01-01,4,24
2018-01-01,5,25
2018-01-01,6,22
2018-01-02,1,23
2018-01-02,2,24
2018-01-02,3,25

This file, temperatures.csv, is available as a download; once downloaded, you can move it into HDFS by running the following command:

hdfs dfs -copyFromLocal temperatures.csv /user/normal
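The two files are keyed on the city ID, which is what a join between them would use. Note that ID 6 appears in temperatures.csv but has no matching row in cities.csv, and ID 7 has a city but no measurements. Before turning to the MapReduce program itself, the intended join can be sketched in plain Python (this is only an illustration of the expected result, not the Hadoop implementation):

```python
import csv
from io import StringIO

# Inline copies of the two datasets described above, for illustration.
cities_csv = """Id,City
1,Boston
2,New York
3,Chicago
4,Philadelphia
5,San Francisco
7,Las Vegas
"""

temps_csv = """Date,Id,Temperature
2018-01-01,1,21
2018-01-01,2,22
2018-01-01,3,23
2018-01-01,4,24
2018-01-01,5,25
2018-01-01,6,22
2018-01-02,1,23
2018-01-02,2,24
2018-01-02,3,25
"""

# Build a lookup table from city ID to city name.
cities = {row["Id"]: row["City"] for row in csv.DictReader(StringIO(cities_csv))}

# Join each temperature record to its city name. Records whose ID has no
# matching city (ID 6 here) are dropped, as in an inner join.
joined = [
    (row["Date"], cities[row["Id"]], int(row["Temperature"]))
    for row in csv.DictReader(StringIO(temps_csv))
    if row["Id"] in cities
]

for record in joined:
    print(record)
```

Running this prints eight joined records, starting with `('2018-01-01', 'Boston', 21)`; the single measurement for ID 6 is discarded because it has no city.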

The following are the programming components of a MapReduce program: