{toc}

h1. General

h3. I'm having problems with feature xxx.


First, check whether this is an experimental feature or a recommended feature on the [Feature Status] page.  We focus on bugs in recommended or beta features before we focus on experimental ones.


h3. I'm having a hard time with the argument syntax in the catalog section of the manual.  Help!


The general form is

{code}
sinkName(reqArg1, reqArg2[, optArg1="default"[, optArg2=0]]{, kwarg1="default", kwarg2=0})
{code}

reqArg1 and reqArg2 are positional arguments and are required in every instance.  \[ \] chars enclose optional positional arguments.  Every optional positional argument has a default value, and they must be supplied in order.  Thus optArg1 and optArg2 are optional positional arguments whose defaults are filled in if they are not present.  \{ \} chars enclose keyword arguments.  Keyword arguments are all optional, have default values, and may be supplied in any order.  Thus kwarg1 and kwarg2 are keyword arguments with defaults.
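
For instance, all of these would be legal calls against that (hypothetical) signature:

{code}
sinkName("a", "b")                        // required args only; optionals and kwargs take their defaults
sinkName("a", "b", "opt")                 // optional positional args are filled left to right
sinkName("a", "b", kwarg2=5)              // kwargs may come in any order, with or without the optionals
sinkName("a", "b", "opt", 7, kwarg1="x")  // everything together
{code}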


Let's take tailDir as an example.  Here's the definition in the manual.

{code}
tailDir("dirname"[, fileregex=".*"[, startFromEnd=false[, recurseDepth=0]]]{,delim="regex", delimMode="exclude|prev|next"}) 
{code}

Here are some valid examples:

{code}
tailDir("/var/log/app")            // all files 
tailDir("/var/log/app",".*\.log")  // all files with names that match the ".*\.log" regex (in shell this is *.log)
tailDir("/var/log/app",".*\.log", false, 1)  // all files with names that match the ".*\.log" regex, starting from beginning of file, with one level of recursion depth.
tailDir("/var/log/app", delim="\n\n", delimMode="exclude")  // all files with names that match the ".*\.log" regex, starting from beginning of file, with one level of recursion depth, that end with double new lines, excluding the double new lines
tailDir("/var/log/app",".*\.log", false, 1, delim="\n\n", delimMode="exclude")  // all files with names that match the ".*\.log" regex, starting from beginning of file, with one level of recursion depth, that end with double new lines, excluding the double new lines
{code}

Here are some invalid examples (should fail):

{code}
tailDir()                                            // must have at least one arg
tailDir("/var/log/app", ".*", startFromEnd=true, 1)  // positional args by default cannot be used as kwargs
{code} 

Here are some examples that are currently valid but likely not what you want:

{code}
tailDir("/var/log/app", ".*", startFromEnd=true, recurseDepth=1)  // currently parses (positional args accepted as kwargs), but don't rely on this
{code}


h3. I'm new and I'm having a problem using {{dfs}}, {{customDfs}}/{{formatDfs}}, or {{escapedCustomDfs}}/{{escapedFormatDfs}} sinks.


You should use the {{collectorSink}}.  It is sufficient for most users and greatly simplifies configuration.  The sinks mentioned above are "low-level" and exposed for advanced users.  HDFS files are not durable until they close or are synced, and these sinks do not automatically do this.  The {{collectorSink}} is smarter and handles periodic closing of files.
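
For example, a minimal collector configuration might look like the following (the HDFS path, file prefix, and 30000 ms roll period here are illustrative, not recommendations):

{code}
// receive events on the default collector port and write them to HDFS,
// closing (and thereby making durable) each output file every 30 seconds
collectorNode : collectorSource(35853) | collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "web-", 30000);
{code}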

h1. Agent Side

h3. I'm generating events from my application and sending them to a Flume agent listening for Thrift/Avro RPCs, and my timestamps seem to be in the 1970s.


Events are expected to carry Unix time in milliseconds.  If the data is generated by an external application, that application must generate timestamps in milliseconds.


For example, 1305680461000 should result in 5/18/11 01:01:01 GMT, but 1305680461 will result in something like 1/16/70 2:41:20 GMT (read as milliseconds, that value is only about 15 days after the epoch).
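
A sketch of the distinction in Java (the variable names are illustrative):

{code}
// System.currentTimeMillis() is already in milliseconds, so it can be used directly.
long tsMillis = System.currentTimeMillis();

// A whole-second Unix timestamp (e.g. from time(2) or an epoch-seconds log field)
// must be scaled up, or it will be read as a date in early 1970.
long fromSeconds = 1305680461L * 1000L;  // 1305680461000 => 5/18/11 01:01:01 GMT
{code}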


h1. Collector Side

h3. I already use syslog/thrift/scribe and want to just have a collector that spools to disk on failure.  Can I do this?

Yes.  The current solution is complex but seems to work.  

{code}
< mask("rolltag") roll(1500) { escapedCustomDfs("hdfs://...", "prefix-%{rolltag}") } ? mask("rolltag") diskFailover insistentAppend stubbornAppend insistentOpen mask("rolltag") roll(1500) { escapedCustomDfs("hdfs://...", "prefix-%{rolltag}") } >
{code}

Roughly, the expression before the {{?}} is the primary path (rolled writes to HDFS); the expression after it is the failover path, which spools events to local disk via {{diskFailover}} and retries the HDFS write using the retry decorators ({{insistentAppend}}, {{stubbornAppend}}, {{insistentOpen}}).


h3. Can I control the level of HDFS replication / block size / other client HDFS property?

Yes.  HDFS block size and replication level are HDFS client parameters, so you should expect them to be set by the client.  The values you are getting probably come from a hadoop-core.*.jar file (it usually contains hdfs-default.xml and friends).  If you want to override the defaults, set dfs.block.size and dfs.replication in your hdfs-site.xml or flume-site.xml file.
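
For example, to override both (the 128 MB block size and replication factor of 2 are illustrative values):

{code}
<!-- hdfs-site.xml or flume-site.xml -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- in bytes: 128 MB -->
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
{code}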


h3. What is a good amount of time for collector rolling?

...


h1. Plugins

h3. I have a plugin that uses version xxx of Thrift and Flume is using version yyy.

Thrift has been wire-compatible from 0.5.0 to 0.6.0, so an application with a Thrift 0.5.0 server should accept data from a Thrift 0.6.0 client and vice versa.  I believe it has been wire-compatible since 0.2.0 (NEEDS verification).  The API of the code generated by the Thrift compiler, and of the Java runtime libraries, does break compatibility, however, so the Thrift-generated code must be regenerated.  We suggest modifying the plugin rather than modifying Flume or the target application.
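
Regenerating means re-running the Thrift compiler that matches the version Flume links against over the plugin's IDL, for example (the IDL path is illustrative):

{code}
thrift --gen java src/main/thrift/myplugin.thrift
{code}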


h1. Trivia


h3. Why do the flume services have crazy port numbers?

The initial Flume ports were the telephone keypad digits corresponding to F-L-U-M-E: F=3, L=5, U=8, M=6, E=3 => 35863.  After that decision we picked arbitrary ports near that number.  Maybe in a future release we'll pick ports that are easier to remember.

h3. Where did the name Flume come from?



The name Flume is the result of a word play.  Flume collects log data.  A log is also a large tree trunk or branch that has been cut down, and a log flume is a narrow stream of water that carries logs.  Get it?  Told you it was bad. :)

Aaron Newton, a Cloudera alum, actually suggested the name for the Flume project, and it just seemed to fit.
