...
You should use the collectorSink. It is sufficient for most users and greatly simplifies configuration. The sinks mentioned above are "low-level" and exposed for advanced users. HDFS files are not durable until they are closed or synced, and those low-level sinks do not do this automatically. The collectorSink is smarter: it handles the periodic closing of files for you.
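As a sketch, a collector's sink spec might look like the following (the node name, port, and HDFS path here are hypothetical; this assumes the Flume 0.9-style dataflow configuration syntax):

```
collector : collectorSource(35853) | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "events-");
```

The first argument is the directory to write to (escape sequences such as %Y-%m-%d are expanded per event), and the second is a filename prefix; the sink rolls and closes files periodically so that data becomes durable in HDFS.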
Agent Side
I'm generating events from my application and sending them to a Flume agent listening for Thrift/Avro RPCs, and my timestamps seem to be in the 1970s.
Generated events are expected to carry Unix time in milliseconds. If the data is produced by an external application, that application must generate its timestamps in milliseconds.
For example, 1305680461000 results in 5/18/11 01:01:01 GMT, but 1305680461 will be interpreted as something like 1/16/70 2:41:20 GMT.
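The mistake above can be reproduced with a few lines of plain Java (this is an illustration of the seconds-vs-milliseconds confusion, not Flume API code):

```java
import java.time.Instant;

public class TimestampCheck {
    public static void main(String[] args) {
        // Correct: Unix time in milliseconds, as Flume expects.
        System.out.println(Instant.ofEpochMilli(1305680461000L)); // 2011-05-18T01:01:01Z

        // Wrong: a value in seconds passed where milliseconds are expected
        // lands in January 1970.
        System.out.println(Instant.ofEpochMilli(1305680461L));    // 1970-01-16T02:41:20.461Z

        // Fix: multiply seconds by 1000 before stamping the event.
        long seconds = Instant.now().getEpochSecond();
        long millis = seconds * 1000L;
        System.out.println(Instant.ofEpochMilli(millis));
    }
}
```

If your application uses `System.currentTimeMillis()` directly, it is already in the right unit; the 1970s symptom almost always means a seconds-resolution clock value was passed through unconverted.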
Collector Side
Can I control the level of HDFS replication / block size / other client HDFS property?
...