Flume File channel
Problems:
Error appending event to channel. Channel might be full. Consider increasing the channel capacity or make sure the sinks perform faster.
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel6]
    at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:616)
    at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
    at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
    at org.apache.flume.source.jms.JMSSource.doProcess(JMSSource.java:258)
    at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:54)
    at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:139)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhaused, only 524054528 bytes remaining, required 524288026 bytes
    at org.apache.flume.channel.file.Log.rollback(Log.java:712)
    at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:614)
    ... 6 more
15/04/01 07:25:27 ERROR file.Log$BackgroundWorker: Error doing checkpoint
java.io.IOException: Usable space exhaused, only 524054528 bytes remaining, required 524288000 bytes
    at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:985)
    at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:968)
    at org.apache.flume.channel.file.Log.access$200(Log.java:75)
    at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1183)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Observations:
At each checkpoint, the File Channel deletes the data files whose events
have all been taken. However, the channel always keeps at least 2 log files
per data directory, even if all of their events have been taken out.
For example, once all events from log-1 and log-2 have been taken and new
events start being written to log-3, log-1 is deleted at the next checkpoint.
Until log-3 is created, neither log-1 nor log-2 is deleted.
The interval between two consecutive checkpoints is 30 seconds by default
and is configurable.
The File Channel writes out a checkpoint periodically to make
restart or recovery faster. The checkpoint is written to the directory
specified as the value of the checkpointDir parameter. If the channel is stopped while it
is checkpointing, the checkpoint may be incomplete or corrupt. A corrupt or
incomplete checkpoint could make the restart of the channel extremely slow, as
the channel would need to read and replay all data files.
To avoid this problem, it is recommended that the useDualCheckpoints parameter
be set to true and that backupCheckpointDir be set. It is recommended that
this directory be on a different disk than the one where the original
checkpoint is stored. When these parameters are set, Flume will back up the
checkpoint to the backupCheckpointDir as soon as it is completed.
This ensures that once the channel has been in operation for a brief period of
time (enough time for the first checkpoint to be written out and backed up), it
will be able to restart from a checkpoint, even if the newest one is corrupt or
incomplete, reducing the restart time drastically. The time period between
consecutive checkpoints is controlled by the checkpointInterval parameter.
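A minimal configuration sketch of the dual-checkpoint setup described above. The agent name a1, the channel name c1, and every directory path are placeholders chosen for illustration; checkpointInterval is given in milliseconds, and 30000 matches the 30-second default mentioned earlier.

```properties
# Hypothetical agent (a1) and channel (c1); adjust names and paths to your setup.
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /disk1/flume/checkpoint
a1.channels.c1.dataDirs = /disk1/flume/data
# Back up each completed checkpoint, ideally to a different physical disk.
a1.channels.c1.useDualCheckpoints = true
a1.channels.c1.backupCheckpointDir = /disk2/flume/checkpoint-backup
# Time between consecutive checkpoints, in milliseconds.
a1.channels.c1.checkpointInterval = 30000
```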
To ensure that the channel does not write to disks with low disk
space, the minimumRequiredSpace parameter
can be configured. Once the disk space on a specific disk goes down to the
value set by this parameter (or 1 MB, whichever is higher), the channel ceases
operation. To conserve system resources and not affect performance, the channel
does not check the disk space on every write, but does it periodically,
maintaining counters internally to calculate the space available on disk. This
assumes that no other channel or process is writing to the same disk; if one
is, the available disk space decreases faster than the counters predict.
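For reference, the "required 524288000 bytes" in the checkpoint error above equals 500 * 1024 * 1024, i.e. 500 MB, which suggests the channel in that log was running with a minimumRequiredSpace of 500 MB. A sketch of setting the parameter explicitly, assuming a hypothetical agent a1 and channel c1:

```properties
# Hypothetical agent (a1) and channel (c1).
# The channel stops operating once free space on a disk falls to this value (bytes).
# 524288000 bytes = 500 MB, the threshold seen in the error above.
a1.channels.c1.minimumRequiredSpace = 524288000
```

Lowering the value would only postpone the error if the disk really is filling up; freeing space, or pointing checkpointDir and dataDirs at a disk with more room, is likely the more durable fix.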
The File Channel is implemented as a Write Ahead Log. For each data file,
the channel keeps a reference count of the events that still need to be
taken by the sink. Once all the events in a file have been taken, the
file is deleted after the next checkpoint. If you want the files to be
deleted faster, you can reduce the maximum size of a data file through the
config parameter "maxFileSize" (the maximum size, in bytes, that each
individual log file is allowed to grow to). By default, maxFileSize is
around 1.5 GB. As an experiment, you can reduce the file size and check
whether the files are getting deleted (each directory will still have at
least 2 files even if all events have been taken).
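The experiment suggested above might look like the following sketch. The agent name a1 and channel name c1 are placeholders, and the 100 MB value is an arbitrary choice for illustration, not a recommendation.

```properties
# Hypothetical agent (a1) and channel (c1).
# Roll to a new data file after about 100 MB (100 * 1024 * 1024 bytes),
# so fully consumed files become eligible for deletion sooner.
a1.channels.c1.maxFileSize = 104857600
```

With smaller files, each data directory should still retain at least 2 log files, so disk usage per directory would not be expected to drop below roughly twice this size.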