Overwrite or recreate an existing output path in Hadoop MapReduce jobs
The exception below means that the job's output path already exists:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/outputfile already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
Solution:
1. Use the MultipleOutputs class to write to a custom output path, for example by appending the current date and time to it.
2. If the output path already exists, delete it using FileSystem before submitting the job:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outputDir = new Path(args[1]);
if (fs.exists(outputDir)) {
    // true = delete recursively, including all files under the directory
    fs.delete(outputDir, true);
}
For the local file system you can use:
FileSystem.getLocal(conf).delete(outputDir, true);
Or, for HDFS (the file system is resolved from the path itself):
Path outputDir = new Path(args[2]);
outputDir.getFileSystem(jobConf).delete(outputDir, true);
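3. Write a custom output format class and set it as the job's output format. Note that setOutputPath is a static method on FileOutputFormat and cannot be overridden; the existence check that throws this exception is in checkOutputSpecs (see the stack trace above), so that is the method to override.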
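The first option depends on every run getting an output path that does not exist yet. As a minimal sketch of the "adding sysdate" idea, the helper below appends a date-time suffix to a base path using only the JDK. The class name TimestampedOutputPath, the method withTimestamp, and the yyyyMMdd-HHmmss pattern are assumptions made for illustration; they are not part of Hadoop.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampedOutputPath {

    // Hypothetical suffix pattern: sorts chronologically and avoids ':' in paths.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Returns basePath with a date-time suffix appended, e.g. out-20240115-093005. */
    public static String withTimestamp(String basePath, LocalDateTime when) {
        // Drop trailing slashes so the suffix attaches to the directory name itself.
        String trimmed = basePath.replaceAll("/+$", "");
        return trimmed + "-" + when.format(STAMP);
    }

    public static void main(String[] args) {
        // Falls back to the path from the stack trace when no argument is given.
        String base = args.length > 0 ? args[0] : "/Users/outputfile";
        System.out.println(withTimestamp(base, LocalDateTime.now()));
    }
}
```

The resulting string can then be wrapped in a Path and used as the job's output path in place of args[1]. Unlike option 2, this keeps the output of earlier runs, so old directories accumulate and may need periodic cleanup.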