A Hadoop MapReduce job normally creates two kinds of output files: part files such as part-r-00000 (one per reducer) and a _SUCCESS marker file.
To avoid empty part files, we can use the LazyOutputFormat class.
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
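LazyOutputFormat does not delete anything after the fact. It wraps the real output format and creates a part-r-00000 style file only when the first record is written, so a task that emits nothing leaves no empty file behind.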
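The lazy-creation idea can be illustrated outside Hadoop with a small plain-Java sketch. LazyFileWriter and LazyOutputSketch are hypothetical names made up for this illustration; this is not Hadoop's implementation, only the same pattern of opening the file on the first write.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration class; not part of Hadoop. */
class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private BufferedWriter out; // stays null until the first record arrives

    LazyFileWriter(Path target) {
        this.target = target;
    }

    void write(String record) throws IOException {
        if (out == null) {
            // The file is created here, on the first write, not in the constructor.
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        out.write(record);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing to close (and nothing on disk) if no record was ever written.
        if (out != null) {
            out.close();
        }
    }
}

public class LazyOutputSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-output-sketch");
        Path empty = dir.resolve("part-r-00000");
        Path used = dir.resolve("part-r-00001");

        try (LazyFileWriter w = new LazyFileWriter(empty)) {
            // This "task" emits no records.
        }
        try (LazyFileWriter w = new LazyFileWriter(used)) {
            w.write("key\tvalue");
        }

        System.out.println("empty task created a file: " + Files.exists(empty));
        System.out.println("non-empty task created a file: " + Files.exists(used));
    }
}
```

A writer that opened its file in the constructor would leave a zero-byte part file for the first task, which is the behaviour you get from a plain TextOutputFormat.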
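If all of your output goes through MultipleOutputs.write and you do not need any part-r-00000 files at all, you can instead use
job.setOutputFormatClass(NullOutputFormat.class);
because you never call context.write(). NullOutputFormat discards whatever would go to the default output, so no part files are created. (With the new mapreduce API the method on Job is setOutputFormatClass; setOutputFormat belongs to the older JobConf API.) Note that NullOutputFormat also installs an output committer that does nothing, so check that the files written by MultipleOutputs still reach the output directory. If they do not, LazyOutputFormat is the safer choice.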
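For context, here is a rough sketch of where these calls sit in a job driver. It is a configuration fragment that needs the Hadoop client libraries on the classpath. The class name LazyOutputDriver, the job name, the named output "side" and the use of args[0]/args[1] as input and output paths are all assumptions made for illustration, and the mapper and reducer setup is left out.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class, for illustration only.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // Set your own mapper, reducer and key/value classes here.

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wrap TextOutputFormat so part files appear only when a record is written.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        // "side" is an assumed named output that a reducer could write to
        // through MultipleOutputs.write("side", key, value).
        MultipleOutputs.addNamedOutput(job, "side",
                TextOutputFormat.class, Text.class, Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Use LazyOutputFormat.setOutputFormatClass in place of job.setOutputFormatClass for the output format, not in addition to it.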