
Dataset
The first dataset is a table of cities containing the city ID (Id) and the city name (City):
Id,City
1,Boston
2,New York
3,Chicago
4,Philadelphia
5,San Francisco
7,Las Vegas
This file, cities.csv, is available as a download; once downloaded, you can move it into HDFS by running the following command:
hdfs dfs -copyFromLocal cities.csv /user/normal
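If the copyFromLocal command fails because the target directory does not yet exist, you can create it first and rerun the copy; a quick check, assuming the same /user/normal path used above:
# create the target directory if it is not already there
hdfs dfs -mkdir -p /user/normal
# confirm the file has been uploaded
hdfs dfs -ls /user/normal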
The second dataset contains daily temperature measurements per city: the date of the measurement (Date), the city ID (Id), and the temperature (Temperature) recorded in that city on that date:
Date,Id,Temperature
2018-01-01,1,21
2018-01-01,2,22
2018-01-01,3,23
2018-01-01,4,24
2018-01-01,5,25
2018-01-01,6,22
2018-01-02,1,23
2018-01-02,2,24
2018-01-02,3,25
This file, temperatures.csv, is available as a download; once downloaded, you can move it into HDFS by running the following command:
hdfs dfs -copyFromLocal temperatures.csv /user/normal
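Once both files are in place, one way to sanity-check them is to print their contents directly from HDFS and confirm that the headers and rows survived the copy:
# print both datasets to the console
hdfs dfs -cat /user/normal/cities.csv
hdfs dfs -cat /user/normal/temperatures.csv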
The following are the programming components of a MapReduce program:
