Tuesday, October 13, 2015

Canary test failed to establish a connection or a client session to the ZooKeeper


WARN ZooKeeperHiveLockManager: Unexpected ZK exception when creating parent node /hive_zookeeper_namespace_hive
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hive_zookeeper_namespace_hive
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.setContext(ZooKeeperHiveLockManager.java:120)
    at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.getLockManager(DummyTxnManager.java:71)
    at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:100)
    at org.apache.hadoop.hive.ql.Driver.acquireReadWriteLocks(Driver.java:918)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1128)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
    at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
    at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)

Solution: restart the ZooKeeper and Hive services, or restart the whole cluster.
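Before (or after) restarting, you can reproduce what the canary test does with a small standalone client. This is a minimal sketch, assuming the plain Apache ZooKeeper Java client is on the classpath and that localhost:2181 stands in for your real quorum address:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkCanaryCheck {
    public static void main(String[] args) throws Exception {
        // replace with your quorum, e.g. "zk1:2181,zk2:2181,zk3:2181"
        final String connectString = args.length > 0 ? args[0] : "localhost:2181";
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connectString, 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown(); // session established
                }
            }
        });
        // if this times out, expect the same ConnectionLoss error as above
        if (connected.await(30, TimeUnit.SECONDS)) {
            System.out.println("Session established: 0x"
                    + Long.toHexString(zk.getSessionId()));
        } else {
            System.out.println("Could not establish a session to " + connectString);
        }
        zk.close();
    }
}

If this check cannot establish a session, verify the ZooKeeper quorum is up and reachable on port 2181 before restarting Hive.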



Tuesday, January 20, 2015

Part files in Hadoop MapReduce output

A Hadoop MapReduce program creates two output files by default: part-r-00000 (the reducer output) and _SUCCESS (an empty marker file).

To avoid empty part files we can use the LazyOutputFormat class:
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
LazyOutputFormat wraps the real output format and only creates a part-r-00000 file when the first record is written to it, so empty part files are never produced in the first place. A minimal driver wiring is sketched below.
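Here is a minimal driver sketch showing where that call goes; the class name LazyOutputDriver and the reliance on the default identity mapper/reducer are placeholders for your own job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy output example");
        job.setJarByClass(LazyOutputDriver.class);
        // the default identity mapper/reducer are used here;
        // set your own classes as needed
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // instead of job.setOutputFormatClass(TextOutputFormat.class):
        // part files are now created lazily, on the first record written
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}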

You do not need the default part-r-00000 files at all when every record goes through MultipleOutputs.write. In that case you can use:
job.setOutputFormatClass(NullOutputFormat.class);
Because you never call context.write(), the default output can be discarded with NullOutputFormat. A reducer sketch using MultipleOutputs this way follows.
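A sketch of the reducer side, assuming a named output called "filtered" was registered in the driver with MultipleOutputs.addNamedOutput (the class and output names here are examples, not from the original post):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class RoutingReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // all output goes to the named output; context.write() is never called
            mos.write("filtered", key, value);
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // flush and close the named output writers
    }
}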


Overwrite / recreate an existing output path for Hadoop MapReduce jobs

Generally, the exception below means that the job's output path already exists:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/outputfile already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)

Solution:
1. Use the MultipleOutputs class, or simply build a unique output path by appending the sysdate, so each run writes to a fresh directory (see the first sketch after this list).
2. If the output path already exists, delete it with the FileSystem API before submitting the job:

   Configuration conf = new Configuration();
   FileSystem fs = FileSystem.get(conf);
   Path outputPath = new Path(args[1]);
   if (fs.exists(outputPath)) {
       fs.delete(outputPath, true); // true = delete recursively
   }

   For the local file system you can use:
   // FileSystem.getLocal(conf).delete(outputDir, true);
   or resolve the file system from the path itself, which also works for HDFS:
   // Path outputDir = new Path(args[2]);
   // outputDir.getFileSystem(conf).delete(outputDir, true);
3. Write a custom output format class that overrides checkOutputSpecs (the method that throws FileAlreadyExistsException) and set it as the job's output format, so the existing path is reused instead of rejected (see the second sketch after this list).
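For solution 1, a minimal sketch of building the dated path; the helper name, date pattern, and base-path argument are assumptions for illustration:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DatedOutput {
    // call from the driver instead of FileOutputFormat.setOutputPath(job, ...)
    public static void setDatedOutputPath(Job job, String base) {
        String sysdate = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        // e.g. /user/out becomes /user/out-20151020-143015, so runs never collide
        FileOutputFormat.setOutputPath(job, new Path(base + "-" + sysdate));
    }
}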
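For solution 3, a hedged sketch using the new (mapreduce) API: subclass TextOutputFormat and override checkOutputSpecs, which is where FileOutputFormat raises FileAlreadyExistsException. The class name is an example, and note that part files left over from a previous run will remain in the directory:

import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class OverwritingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        // deliberately skip super.checkOutputSpecs(job), which throws
        // FileAlreadyExistsException when the output directory exists;
        // this also skips its other output-path validation
    }
}

Register it in the driver with job.setOutputFormatClass(OverwritingTextOutputFormat.class).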