This is a FAQ for common questions that occur when debugging the operations of a running Flume cluster.

How can I get metrics from a node?

Flume nodes report metrics that you can use to debug and to track progress. You can view a node's status web page by pointing your browser at port 35862 (http://<node>:35862).
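If you would rather pull the status page from a script than from a browser, something like the following sketch may help. It only assumes the port given above; the helper names (status_url, fetch_status_page) and the host name in the usage comment are made up for illustration, and the page body is returned as raw text because its layout is not described here.

```python
from urllib.request import urlopen

STATUS_PORT = 35862  # status web port named in this FAQ


def status_url(node: str, port: int = STATUS_PORT) -> str:
    """Build the URL of a node's status page (hypothetical helper)."""
    return f"http://{node}:{port}"


def fetch_status_page(node: str, timeout: float = 5.0) -> str:
    """Return the raw body of the node's status page as text."""
    with urlopen(status_url(node), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (placeholder host name, needs a reachable node):
#   print(fetch_status_page("collector01"))
```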

How can I tell if data is arriving at the collector?

When events arrive at a collector, the source counters on the node's metric page should increment. For example, if you have a node called foo, you should see the values of the following fields grow each time you refresh the page.

  • LogicalNodeManager.foo.source.CollectorSource.number of bytes
  • LogicalNodeManager.foo.source.CollectorSource.number of events
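Refreshing by hand amounts to comparing two snapshots of these counters. A minimal sketch of that comparison is below; it assumes you have already copied the counter values into plain dicts (how you read them off the metric page is up to you), and the function names are hypothetical.

```python
def collector_source_counters(node: str) -> list:
    """Names of the two CollectorSource counters for a logical node."""
    prefix = f"LogicalNodeManager.{node}.source.CollectorSource."
    return [prefix + "number of bytes", prefix + "number of events"]


def growing_counters(before: dict, after: dict, names: list) -> dict:
    """Map each counter name to True if its value rose between snapshots.

    A counter missing from a snapshot is treated as 0.
    """
    return {name: after.get(name, 0) > before.get(name, 0) for name in names}
```

If both counters stay flat across several refreshes while the agents are sending, data is probably not reaching the collector.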

How can I tell if data is being written to HDFS?

Data doesn't "arrive" in HDFS until the file is closed or certain size thresholds are met. As events are written to HDFS, the sink counters on the collector's metric page should increment. In particular, look for fields that match the following names:

...

*.appendSuccesses counts successful writes. If other values such as appendRetries or appendGiveups are incrementing, they indicate a problem with the attempts to write.
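The same check can be written down as a small script. This is only a sketch: it assumes two snapshots of the collector's metric page held as dicts, matches counters purely by the suffixes mentioned above, and uses made-up names (append_deltas, diagnose) and made-up result labels.

```python
SUFFIXES = ("appendSuccesses", "appendRetries", "appendGiveups")


def append_deltas(before: dict, after: dict) -> dict:
    """Sum, per suffix, how much matching counters grew between snapshots."""
    deltas = {suffix: 0 for suffix in SUFFIXES}
    for name, value in after.items():
        for suffix in SUFFIXES:
            if name.endswith("." + suffix):
                deltas[suffix] += value - before.get(name, 0)
    return deltas


def diagnose(before: dict, after: dict) -> str:
    """Return "problem", "writing", or "idle" (labels are illustrative)."""
    deltas = append_deltas(before, after)
    if deltas["appendRetries"] > 0 or deltas["appendGiveups"] > 0:
        return "problem"  # retries or giveups grew: writes are struggling
    if deltas["appendSuccesses"] > 0:
        return "writing"  # only successful appends grew
    return "idle"  # nothing changed between the snapshots
```

The snapshots are compared rather than read once because a nonzero retry count may simply be left over from an earlier, already resolved incident.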

I am getting a lot of duplicated event data. Why is this happening and what can I do to make this go away?

tail/multiTail have been reported to restart file reads from the beginning of files when the files are modified faster than a certain rate. This is a fundamental problem with a non-native implementation of tail. A workaround is to use the OS's tail mechanism in an exec source (exec("tail -n +0 -F filename")). Alternatively, many people have modified their applications to push to a Flume agent with an open RPC port, such as syslogTcp, thriftSource, or avroSource.

...

If a node that was in E2E mode goes down, on restart it will attempt to recover and resend any data that did not receive acknowledgements. This may result in some duplicates.

I have encountered a "Could not increment version counter" error message.

This is a ZooKeeper issue that seems related to virtual machines or machines that change IP address while running. It should only occur in a development environment; the workaround is to restart the master.

I have encountered an IllegalArgumentException related to checkArgument and EventImpl.

Here's an example stack trace:

...