
Tuesday, January 20, 2015

part files in Hadoop Map Reduce output

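A Hadoop MapReduce job writes two kinds of files to its output directory: part files such as part-r-00000 (one per reduce task: part-r-00000, part-r-00001, ...; map-only jobs produce part-m-nnnnn instead) and a _SUCCESS marker file that signals the job finished successfully.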
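The part file names follow a fixed convention. Each name is a base name ("part" by default), then -m- or -r- depending on whether a map or a reduce task wrote the file, then the task's number (for reducers, the partition number) padded to five digits. The following plain-Java sketch only mimics that convention for illustration. partFileName is a hypothetical helper, not a Hadoop API:

```java
import java.util.Locale;

final class PartFileNames {

    // Mimics the part-file naming convention: <base>-<m|r>-<task number padded to 5 digits>.
    // Hypothetical helper for illustration only; it does not call Hadoop.
    static String partFileName(String base, boolean mapTask, int taskNumber) {
        // Locale.ROOT keeps the digits ASCII regardless of the machine's default locale.
        return String.format(Locale.ROOT, "%s-%s-%05d", base, mapTask ? "m" : "r", taskNumber);
    }

    public static void main(String[] args) {
        System.out.println(partFileName("part", false, 0));  // part-r-00000
        System.out.println(partFileName("part", true, 12));  // part-m-00012
    }
}
```

The same pattern applies when MultipleOutputs supplies its own base name, which is why a named output "Dummy" ends up in files called Dummy-r-nnnnn.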

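To avoid empty part files, use the LazyOutputFormat class:
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
LazyOutputFormat wraps the real output format (here TextOutputFormat) and creates a part-r-nnnnn file only when the first record is written to it. A reducer that never writes a record therefore leaves no empty part file behind. Nothing is deleted afterwards; the empty file is simply never created.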
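The "create on first write" idea behind LazyOutputFormat can be sketched in plain Java with java.nio. This is only an analogy under that assumption, not Hadoop's implementation, and LazyFileWriter is a hypothetical class:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Analogy for LazyOutputFormat: the file is opened (and thus created) only on the first write.
final class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter writer; // stays null until the first record arrives

    LazyFileWriter(Path path) {
        this.path = path;
    }

    void write(String key, String value) throws IOException {
        if (writer == null) {
            // First record: only now does the file come into existence.
            writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        // key<TAB>value<LF>, similar to TextOutputFormat's default line layout
        writer.write(key);
        writer.write('\t');
        writer.write(value);
        writer.write('\n');
    }

    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close(); // nothing to close if no record was ever written
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path unused = dir.resolve("part-r-00000");
        Path used = dir.resolve("part-r-00001");
        try (LazyFileWriter w = new LazyFileWriter(unused)) {
            // no records written
        }
        try (LazyFileWriter w = new LazyFileWriter(used)) {
            w.write("key1", "value1");
        }
        System.out.println(Files.exists(unused)); // false
        System.out.println(Files.exists(used));   // true
    }
}
```

The writer that received no records never touches the file system, which is the behaviour a reducer without output gets when the job uses LazyOutputFormat.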

If the reducer writes only through MultipleOutputs.write() and never calls context.write(), the default part-r-nnnnn files are not needed at all. In that case you can set
job.setOutputFormatClass(NullOutputFormat.class);
NullOutputFormat discards anything written through context.write(), so no default part files are produced.


create dynamic output directory from reducer using Map Reduce

With the MultipleOutputs class a reducer can write into output directories whose names are computed at run time.
For example, the folder path can be built as /<output path>/<System date>/<Namedoutput>.
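A minimal sketch of building the <System date>/<Namedoutput> part, assuming the same yyyyMMdd-HHmmss pattern used in the reducer below and, for reproducible output, the UTC time zone. The helper name basePath and the UTC choice are assumptions for illustration. Because every reduce task runs its own setup(), tasks that start in different seconds would compute different timestamps. One way around that is to compute the string once in the driver and hand it to the tasks, for example with Configuration.set and Configuration.get.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class OutputPaths {

    // Builds "<yyyyMMdd-HHmmss>/<namedOutput>", the baseOutputPath handed to MultipleOutputs.write().
    // Hypothetical helper; UTC is assumed so the result does not depend on the machine's time zone.
    static String basePath(Date runDate, String namedOutput) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        // Locale.ROOT keeps the Gregorian calendar and ASCII digits.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(runDate) + "/" + namedOutput;
    }

    public static void main(String[] args) {
        // Epoch 0 is 1970-01-01 00:00:00 UTC.
        System.out.println(basePath(new Date(0L), "Dummy")); // 19700101-000000/Dummy
    }
}
```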

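In the main driver, set the job's output path and register every named output the reducer writes to. The reducer below writes to "Dummy", so that name must be registered as well; MultipleOutputs rejects a write to an undefined named output:
FileOutputFormat.setOutputPath(job, new Path(outputPath)); // outputPath: the job's base output directory
MultipleOutputs.addNamedOutput(job, "Combined", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, "UnProcessed", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, "Dummy", TextOutputFormat.class, Text.class, Text.class);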

Reducer Class:
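import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// TextArrayWritable: an ArrayWritable subclass with Text elements, defined elsewhere in the job (not shown).
// The class must not be named "Reducer", or it would clash with the Hadoop class it extends.
public class DateDirReducer extends Reducer<Text, TextArrayWritable, Text, Text> {

    private MultipleOutputs<Text, Text> mos;
    private String currentDate = "";

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
        // Current date and time (yyyyMMdd-HHmmss), used as the directory name.
        // Each reduce task evaluates this in its own setup(), so tasks that start
        // in different seconds get different directories.
        DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd-HHmmss");
        currentDate = dateFormat.format(new Date());
    }

    @Override
    protected void reduce(Text key, Iterable<TextArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        ArrayList<Text[]> sortedList = new ArrayList<Text[]>();
        for (TextArrayWritable value : values) {
            Text[] createList = (Text[]) value.toArray();
            // Records whose first field does not contain sixteen zeros
            // go to <output path>/<currentDate>/Dummy-r-nnnnn.
            if (!createList[0].toString().contains("0000000000000000")) {
                mos.write("Dummy", key, new Text(createList[0].toString()), currentDate + "/Dummy");
            }
            sortedList.add(createList); // collected for further processing (not used in this excerpt)
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush and close every writer opened by MultipleOutputs
    }
}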
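To make the filter in reduce() concrete, the following plain-Java sketch applies the same test to string arrays instead of TextArrayWritable values: only records whose first field does not contain sixteen zeros are passed on to the "Dummy" output. DummyFilter and selectForDummy are hypothetical names, and the sample records are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class DummyFilter {

    // Same marker string the reducer checks for.
    static final String ZERO_MARKER = "0000000000000000";

    // Returns the first field of every record the reducer would write to "Dummy".
    static List<String> selectForDummy(List<String[]> records) {
        List<String> selected = new ArrayList<>();
        for (String[] record : records) {
            if (!record[0].contains(ZERO_MARKER)) {
                selected.add(record[0]);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"abc123", "payload"});
        records.add(new String[] {"0000000000000000", "payload"});
        System.out.println(selectForDummy(records)); // [abc123]
    }
}
```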

Here, in the statement
mos.write("Dummy", key, new Text(createList[0].toString()), currentDate + "/Dummy");
the arguments are:
First argument: the named output, "Dummy".
Second argument: the key, key.
Third argument: the value, new Text(createList[0].toString()).
Fourth argument: the baseOutputPath, currentDate + "/Dummy", i.e. <sysdate>/<NamedOutput>, resolved relative to the job's output path.

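Result:
<output path>/<sysdate>/Dummy-r-00000
Each reduce task that writes to "Dummy" creates its own Dummy-r-nnnnn file, numbered after the task.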

Have a nice day...............:-)