This is a FAQ for common questions that occur when debugging the operations of a running Flume cluster.

I am getting a lot of duplicated event data. Why is this happening and what can I do to make this go away?

tail/multiTail have been reported to restart file reads from the beginning of a file if the file's modification rate exceeds a certain threshold. This is a fundamental limitation of a non-native implementation of tail. One workaround is to use the OS's tail mechanism in an exec source (e.g., exec("tail -n +0 -F filename")). Alternatively, many people have modified their applications to push events directly to a Flume agent listening on an open RPC port, such as syslogTcp, thriftSource, or avroSource.
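As a sketch of the exec-based workaround in old-style Flume node configuration (node names, the collector hostname/port, and the log path below are placeholders, not values from this page):

```
// Hypothetical node names and paths; adjust for your cluster.
agent1 : exec("tail -n +0 -F /var/log/app.log") | agentE2ESink("collector-host", 35853) ;
collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "events-") ;
```

Because the OS tail tracks the file by descriptor rather than re-reading it, this avoids the restart-from-beginning behavior described above.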

In E2E mode, agents will attempt to retransmit data if no acks are received after flume.agent.logdir.retransmit milliseconds have expired (this is a flume-site.xml property). Acks do not return until after the collector's roll time, flume.collector.roll.millis, expires (this can be set in flume-site.xml or as an argument to the collector). Make sure that the retransmit time on the agents is at least 2x the roll time on the collector; otherwise agents can resend data that was in fact delivered but not yet acknowledged, producing duplicates.
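For example, with a 60-second collector roll, a retransmit timeout of at least 120 seconds satisfies the 2x guideline. A flume-site.xml fragment with these illustrative values (not defaults from this page) might look like:

```xml
<!-- Illustrative values only: retransmit >= 2x the collector roll time -->
<property>
  <name>flume.collector.roll.millis</name>
  <value>60000</value>   <!-- collector rolls (and acks) every 60 s -->
</property>
<property>
  <name>flume.agent.logdir.retransmit</name>
  <value>120000</value>  <!-- agents wait 120 s for an ack before resending -->
</property>
```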

If an agent in E2E mode goes down, on restart it will attempt to recover and resend any data for which it did not receive acknowledgements. This may result in some duplicates.
