Monday, May 16, 2016

Flume File Channel Problems

Flume File channel Problems:

Error appending event to channel. Channel might be full. Consider increasing the channel capacity or make sure the sinks perform faster.
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel6]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:616)
        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
        at org.apache.flume.source.jms.JMSSource.doProcess(JMSSource.java:258)
        at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:54)
        at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhaused, only 524054528 bytes remaining, required 524288026 bytes
        at org.apache.flume.channel.file.Log.rollback(Log.java:712)
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:614)
        ... 6 more
15/04/01 07:25:27 ERROR file.Log$BackgroundWorker: Error doing checkpoint
java.io.IOException: Usable space exhaused, only 524054528 bytes remaining, required 524288000 bytes
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:985)
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:968)
        at org.apache.flume.channel.file.Log.access$200(Log.java:75)
        at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1183)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Observations: 
The File Channel deletes the files from all the events which have been removed at the time
of checkpoint. But, the channel will keep 2 log files per data directory even if all its
Events are taken out. Once all events from log-1, log-2 are taken out and then events going to write to log-3, log-1 will be deleted at the next checkpoint. Unless a log-3 is created,
both log-1 and log-2 will not be deleted.
The interval between two consecutive checkpoints is set to 30 seconds by default, though it is configurable.
The File Channel writes out a checkpoint periodically to make restart or recovery faster. The checkpoint is written to the directory specified as the value of the checkpointDir parameter. If the channel is stopped while it is checkpointing, the checkpoint may be incomplete or corrupt. A corrupt or incomplete checkpoint could make the restart of the channel extremely slow, as the channel would need to read and replay all data files.

To avoid this problem, it is recommended that the use DualCheckpoints parameter be set to true and that backupCheckpointDir be set. It is recommended that this directory be on a different disk than the one where the original checkpoint is stored. When these parameters are set, Flume will back up the checkpoint to the backupCheckpointDir as soon as it is completed. This ensures that once the channel has been in operation for a brief period of time (enough time for the first checkpoint to be written out and backed up), it will be able to restart from a checkpoint, even if the newest one is corrupt or incomplete, reducing the restart time drastically. The time period between consecutive checkpoints is controlled by the checkpointInterval parameter.

To ensure that the channel does not write to disks with low disk space, the minimumRequiredSpace parameter can be configured. Once the disk space on a specific disk goes down to the value set by this parameter (or 1 MB, whichever is higher), the channel ceases operation. To conserve system resources and not affect performance, the channel does not check the disk space on every write, but does it periodically, maintaining counters internally to calculate the space available on disk. This makes the assumption that no other channel or process is writing to this disk, which would make the available disk space decrease faster.

The File Channel is implemented as a Write Ahead Log. The channel keeps a reference count of the number of events in a particular data file which needs to be taken by the sink. Once all the events in a file are taken, the file will be deleted after the next checkpoint. If you want the files to get deleted faster you can reduce the maximum size of a data file through the config parameter "maxFileSize"(this is the maximum size you want each individual log file to grow to - in bytes). By default, the maxFileSize is around 1.5G. As an experiment you can reduce the file size and see if the files are getting deleted (each directory will have at least 2 files even if all events have been taken). 


No comments:

Post a Comment