Tuesday, January 20, 2015

Overwriting or recreating an existing output path for Hadoop MapReduce jobs



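If you get the exception below, the job's output path already exists. Hadoop refuses to write into an existing output directory, so FileOutputFormat.checkOutputSpecs fails when the job is submitted: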
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/outputfile already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)

Solutions (any one of the following):
1. Use the MultipleOutputs class to create a custom output path with the current date (sysdate) appended, so every run writes to a new directory.
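2. If the output directory already exists, delete it with the FileSystem API before submitting the job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);      // default file system (fs.defaultFS)
    Path outputDir = new Path(args[1]);
    if (fs.exists(outputDir)) {
        fs.delete(outputDir, true);            // true = delete recursively
    }

   For the local file system you can use:

    FileSystem.getLocal(conf).delete(outputDir, true);

   or, to pick the file system (HDFS or local) from the path's own scheme:

    outputDir.getFileSystem(conf).delete(outputDir, true);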
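3. Write a custom output format class that extends FileOutputFormat and overrides checkOutputSpecs, the method that throws the exception in the trace above, so an existing output directory is no longer rejected. Then register that class as the job's output format (for example with job.setOutputFormatClass(...)). Note that FileOutputFormat.setOutputPath is a static method, so it cannot be overridden.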
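The date-suffix idea from step 1 can be sketched in plain Java. This is a minimal sketch under stated assumptions: it only builds the path string and does not use MultipleOutputs itself; the class OutputPaths and the method timestampedOutputPath are hypothetical names, and the yyyyMMdd_HHmmss suffix format is an arbitrary choice.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: derive a fresh output directory per run by appending the
// date/time ("sysdate") to a base path, so the existence check in
// FileOutputFormat does not find a directory left by an earlier run.
// OutputPaths / timestampedOutputPath are hypothetical, not Hadoop APIs.
final class OutputPaths {
    private static final DateTimeFormatter SUFFIX =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    private OutputPaths() {}

    /** Returns basePath + "_" + timestamp, e.g. "/user/out_20150120_103000". */
    static String timestampedOutputPath(String basePath, LocalDateTime when) {
        // Drop a trailing slash so the suffix attaches to the directory name.
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return base + "_" + when.format(SUFFIX);
    }

    /** Convenience overload that uses the current local date and time. */
    static String timestampedOutputPath(String basePath) {
        return timestampedOutputPath(basePath, LocalDateTime.now());
    }
}
```

In a driver this might be used as FileOutputFormat.setOutputPath(job, new Path(OutputPaths.timestampedOutputPath(args[1]))). With one-second resolution, two runs started within the same second would still collide; a finer pattern or an extra unique suffix would avoid that.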
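When the job runs in local mode, as in the trace above (file:/Users/outputfile), the cleanup from step 2 can also be done with the JDK alone. A minimal sketch, assuming the output is a local directory; LocalOutputCleaner and deleteIfExistsRecursively are hypothetical names, and java.nio.file.Path here is the JDK class, not org.apache.hadoop.fs.Path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-disk analogue of fs.delete(outputDir, true): if the directory
// exists, remove it and everything under it before the job starts.
// Pure JDK; no Hadoop classes are used.
final class LocalOutputCleaner {
    private LocalOutputCleaner() {}

    /** Returns true if something was deleted, false if the path did not exist. */
    static boolean deleteIfExistsRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts every entry before its parent directory.
            toDelete = walk.sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p); // each directory is already empty when reached
        }
        return true;
    }
}
```

A child path always sorts after its parent (the parent is a prefix of it), so reversing the order deletes files and subdirectories first. For HDFS paths, stick with fs.delete(outputDir, true).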
