
Task-level native optimization
MapReduce has added support for a native implementation of the map output collector. This new support can result in a performance improvement of about 30% or more, particularly for shuffle-intensive jobs.
The native library will build automatically with Pnative. Users may choose the new collector on a job-by-job basis by setting mapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.
nativetask.NativeMapOutputCollectorDelegator in their job configuration.
The basic idea is to be able to add a NativeMapOutputCollector in order to handle key/value pairs emitted by mapper. As a result of this sort, spill, and IFile serialization can all be done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed promising results as follows:
- sort is about 3-10 times faster than Java (only binary string compare is supported)
- IFile serialization speed is about three times faster than Java: about 500 MB per second. If CRC32C hardware is used, things can get much faster in the range of 1 GB or higher per second
- Merge code is not completed yet, so the test uses enough io.sort.mb to prevent mid-spill