...

You should use the collectorSink. It is sufficient for most users and greatly simplifies configuration. The sinks mentioned above are "low-level" and are exposed for advanced users. HDFS files are not durable until they are closed or synced, and the low-level sinks do not do either automatically. The collectorSink is smarter and closes files periodically.

Agent Side

I'm generating events from my application and sending them to a Flume agent listening for Thrift/Avro RPCs, and my timestamps seem to be in the 1970s.

Each event is expected to carry its timestamp as Unix time in milliseconds. If the data is generated by an external application, that application must produce timestamps in milliseconds, not seconds.

For example, 1305680461000 results in 5/18/11 01:01:01 GMT, but 1305680461 (the same instant expressed in seconds) is interpreted as something like 1/16/70 02:41:20 GMT.
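
The difference can be checked with a short script. This is only an illustrative sketch in Python, not part of Flume: the helper names millis_to_utc and now_millis are hypothetical, and it assumes the external application is free to compute its own timestamp before building the event.

```python
from datetime import datetime, timedelta, timezone
import time

# Unix epoch as a timezone-aware datetime (GMT/UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(epoch_millis: int) -> datetime:
    """Interpret a value as milliseconds since the Unix epoch.

    Hypothetical helper for illustration; it mirrors how a
    millisecond timestamp is read on the receiving side.
    """
    return EPOCH + timedelta(milliseconds=epoch_millis)


def now_millis() -> int:
    """Current time in milliseconds, suitable for an event timestamp.

    time.time() returns seconds as a float, so it is scaled by 1000.
    """
    return int(time.time() * 1000)


if __name__ == "__main__":
    # Correct: the value is already in milliseconds.
    print(millis_to_utc(1305680461000))  # 2011-05-18 01:01:01+00:00
    # Wrong unit: a seconds value read as milliseconds lands in January 1970.
    print(millis_to_utc(1305680461))     # 1970-01-16 02:41:20.461000+00:00
```

If the application only has a timestamp in seconds, multiplying it by 1000 before sending is enough to avoid the 1970 dates.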

Collector Side

Can I control the level of HDFS replication / block size / other client HDFS property?

...