Tuesday, January 20, 2015

Part files in Hadoop MapReduce output

By default, a Hadoop MapReduce job writes two kinds of files to its output directory: part files such as part-r-00000 (one per reducer) and a _SUCCESS marker file.

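To avoid ending up with empty part files, we can use the LazyOutputFormat class:
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
LazyOutputFormat wraps the real output format and creates a part file only when the first record is written to it. A task that writes nothing therefore never creates its part-r-00000 file, so there is nothing to delete afterwards.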
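
The idea of creating the file lazily can be illustrated with a small plain-Java sketch. This is not Hadoop's implementation: the LazyFileWriter class and the demo below are hypothetical names made up for illustration, and they only mimic the "create the file on first write" behaviour using the local file system.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration only; not Hadoop's LazyOutputFormat. */
class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter writer; // stays null until the first record arrives

    LazyFileWriter(Path target) {
        this.target = target;
    }

    void write(String record) throws IOException {
        if (writer == null) {
            // The file is created here, on the first write, not in the constructor.
            writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        writer.write(record);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing was opened if no record was ever written.
        if (writer != null) {
            writer.close();
        }
    }
}

class LazyFileWriterDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path empty = dir.resolve("part-r-00000");
        Path full = dir.resolve("part-r-00001");

        try (LazyFileWriter w = new LazyFileWriter(empty)) {
            // No records written, so no file should appear.
        }
        try (LazyFileWriter w = new LazyFileWriter(full)) {
            w.write("key\tvalue");
        }

        System.out.println(Files.exists(empty)); // false
        System.out.println(Files.exists(full));  // true
    }
}
```

The writer that never receives a record leaves no file behind, which is the same effect LazyOutputFormat has on empty part files.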

If all output is written through MultipleOutputs.write(), you do not need the default part-r-00000 files at all. In that case you can use
job.setOutputFormatClass(NullOutputFormat.class);
Because the job never calls context.write(), nothing goes to the default output, and NullOutputFormat lets you skip those output files entirely.
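
For context, a driver that applies these settings might look roughly like the sketch below. It assumes the new (org.apache.hadoop.mapreduce) API; the class name PartFilesDriver, the named output "events", and the Text key/value types are placeholders chosen for illustration, and the mapper and reducer classes are left out.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver; names and types are assumptions, not from the post.
public class PartFilesDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "part-files-demo");
        job.setJarByClass(PartFilesDriver.class);

        // Set your own mapper and reducer here, e.g. job.setReducerClass(...).

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Named output that the reducer writes to via MultipleOutputs.write().
        MultipleOutputs.addNamedOutput(job, "events",
                TextOutputFormat.class, Text.class, Text.class);

        // Create default part files only if a record is actually written to them.
        // The alternative described in the text is to make NullOutputFormat
        // the job's output format instead of this line.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Whichever option you pick, it is worth checking the output directory after a test run to confirm that the files written through MultipleOutputs are present.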

