Big Data Analytics with Hadoop 3

Reduce

The reducer takes the grouped data as input and runs a reduce function once per key grouping. The function is passed the key and an iterator over all of the values associated with that key. A wide range of processing can happen in this function, as we'll see in many of our patterns: the data can be aggregated, filtered, and combined in a number of ways. The reduce function emits zero or more key/value pairs, which are sent to the final step, the output format. Like the map function, the reduce function changes from job to job, since it is a core piece of logic in the solution. Where the output ends up is also highly customizable: a job can write to HDFS, to an Elasticsearch index, to an RDBMS, or to a NoSQL store such as Cassandra or HBase.
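The contract described above (one call per key, an iterator of values in, zero or more key/value pairs out) can be illustrated with a minimal sketch in plain Python. This is not the Hadoop API; the names `sum_reducer`, `run_reduce`, and `grouped_input`, the sample data, and the threshold of 2 are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KeyValue = Tuple[str, int]


def sum_reducer(key: str, values: Iterator[int]) -> Iterator[KeyValue]:
    """Aggregate all values for a key; emit nothing if the total is below 2."""
    total = sum(values)  # consumes the iterator in a single pass
    if total >= 2:       # filtering: a reducer may emit zero pairs for a key
        yield key, total


def run_reduce(
    grouped: Dict[str, List[int]],
    reducer: Callable[[str, Iterator[int]], Iterable[KeyValue]],
) -> List[KeyValue]:
    """Call the reducer once per key group and collect its output pairs."""
    output: List[KeyValue] = []
    for key in sorted(grouped):  # keys sorted here for deterministic output
        output.extend(reducer(key, iter(grouped[key])))
    return output


# Hypothetical grouped input, standing in for what the reducer receives.
grouped_input = {"hadoop": [1, 1, 1], "spark": [1], "hdfs": [1, 1]}
result = run_reduce(grouped_input, sum_reducer)
print(result)  # [('hadoop', 3), ('hdfs', 2)]
```

The key `spark` produces no output because its total falls below the assumed threshold, which shows a reduce call emitting zero pairs. In a real job, the collected pairs would be handed to the output format rather than gathered in a list.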