Using Event Serializers

The hdfs and file_roll sinks support pluggable event serializers, which determine how events are written to the output files.

EventSerializer interface (HDFS & FILE_ROLL sinks)

The EventSerializer interface allows arbitrary serialization of an event. You can implement this interface directly, but many people will prefer to use one of the Avro serialization implementations built into Flume.
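If you do implement the interface directly, most of the serializer's work is reacting to file lifecycle calls from the sink. The following sketch is a hypothetical plain-text serializer that writes each event body followed by a newline.

So that it compiles without Flume on the classpath, it declares simplified stand-ins for Flume's Event and EventSerializer types. The method names follow org.apache.flume.serialization.EventSerializer as we understand it. However, the real Builder receives an org.apache.flume.Context rather than a Map. BodyLineSerializer, the stand-in SimpleEvent, SerializerDemo and the appendNewline option are illustrative names, not part of Flume's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Simplified stand-ins so the sketch compiles without Flume. A real plugin would use
// org.apache.flume.Event and org.apache.flume.serialization.EventSerializer instead.
interface Event {
  Map<String, String> getHeaders();
  byte[] getBody();
}

interface EventSerializer {
  void afterCreate() throws IOException;   // called after a new file is opened
  void afterReopen() throws IOException;   // called after an existing file is reopened
  void write(Event event) throws IOException;
  void flush() throws IOException;
  void beforeClose() throws IOException;   // last chance to write a trailer
  boolean supportsReopen();

  interface Builder {
    // Assumption: the real Builder takes an org.apache.flume.Context; a Map stands in.
    EventSerializer build(Map<String, String> context, OutputStream out);
  }
}

// Stand-in event type for this sketch only.
class SimpleEvent implements Event {
  private final Map<String, String> headers;
  private final byte[] body;

  SimpleEvent(Map<String, String> headers, byte[] body) {
    this.headers = headers;
    this.body = body;
  }

  @Override public Map<String, String> getHeaders() { return headers; }
  @Override public byte[] getBody() { return body; }
}

// Hypothetical serializer: writes each event body, optionally followed by a newline.
class BodyLineSerializer implements EventSerializer {
  private final OutputStream out;
  private final boolean appendNewline;

  private BodyLineSerializer(OutputStream out, boolean appendNewline) {
    this.out = out;
    this.appendNewline = appendNewline;
  }

  @Override public void afterCreate() { }   // plain text: no file header to write
  @Override public void afterReopen() { }   // plain text: nothing to restore
  @Override public void write(Event event) throws IOException {
    out.write(event.getBody());
    if (appendNewline) {
      out.write('\n');
    }
  }
  @Override public void flush() throws IOException { out.flush(); }
  @Override public void beforeClose() { }   // plain text: no trailer to write
  @Override public boolean supportsReopen() { return true; }

  // Static nested Builder with a public, no-arg constructor.
  public static class Builder implements EventSerializer.Builder {
    public Builder() { }

    @Override
    public EventSerializer build(Map<String, String> context, OutputStream out) {
      // "appendNewline" is a hypothetical option name used only in this sketch.
      String value = context.getOrDefault("appendNewline", "true");
      return new BodyLineSerializer(out, Boolean.parseBoolean(value));
    }
  }
}

// Drives a serializer through the calls a sink would make for one newly created file.
final class SerializerDemo {
  private SerializerDemo() { }

  static String serializeAll(EventSerializer.Builder builder,
                             Map<String, String> context, String... bodies) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    EventSerializer serializer = builder.build(context, buf);
    try {
      serializer.afterCreate();
      for (String body : bodies) {
        Map<String, String> noHeaders = Collections.emptyMap();
        serializer.write(new SimpleEvent(noHeaders, body.getBytes(StandardCharsets.UTF_8)));
      }
      serializer.flush();
      serializer.beforeClose();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

With default options, serializing the bodies "hello" and "world" produces "hello\nworld\n". Returning true from supportsReopen() seems reasonable for plain text, because appending to an existing file keeps it valid. A format with a file header or trailer may not be safe to reopen.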

The recommended usage is to serialize your data as Avro. Avro is a schema-based binary file format with many advantages over platform- and language-specific serialization formats. For Avro-serialized events, Flume gives you two options: the built-in avro_event serializer, or a custom subclass of AbstractAvroEventSerializer. The first option uses the built-in Flume event schema, while the second lets you specify your own Avro schema.

Config file syntax

Use the following configuration file syntax to configure an event serializer on your HDFS sink:

agent.sinks.svc_7_sink.serializer = avro_event
agent.sinks.svc_7_sink.serializer.compressionCodec = snappy

...
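As we understand the HDFS sink, it passes the serializer only the properties under the sink's serializer. prefix, with that prefix removed. Under that assumption, serializer.compressionCodec reaches avro_event simply as compressionCodec.

The sketch below mimics that scoping with java.util.Properties. The class and helper names (SerializerConfigScope, subProperties, serializerSettings) are hypothetical. Flume's own Context#getSubProperties performs a similar prefix-stripping lookup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class SerializerConfigScope {
  private SerializerConfigScope() { }

  // Hypothetical helper: keeps the keys that start with prefix and strips the prefix.
  static Map<String, String> subProperties(Map<String, String> props, String prefix) {
    Map<String, String> result = new TreeMap<>();
    for (Map.Entry<String, String> e : props.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        result.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return result;
  }

  // Parses agent properties and returns the settings the named sink's serializer would see.
  static Map<String, String> serializerSettings(String propertiesText, String agent, String sink) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, String> all = new TreeMap<>();
    for (String name : props.stringPropertyNames()) {
      all.put(name, props.getProperty(name));
    }
    // Scope to the sink first, then to its "serializer." sub-properties. The bare
    // "serializer" key (the serializer type) has no trailing dot, so it is excluded.
    Map<String, String> sinkContext = subProperties(all, agent + ".sinks." + sink + ".");
    return subProperties(sinkContext, "serializer.");
  }
}
```

Applied to the two lines above, the serializer would see a single setting, compressionCodec = snappy. The serializer type itself (avro_event) is read separately by the sink.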

Example config using avro_event with HDFS sink

agent.sources = svc_src-0_src
agent.channels = svc_chan-0_chan
agent.sinks = svc_sink-0_sink

# Configuration for svc_0
agent.channels.svc_chan-0_chan.type = memory
agent.channels.svc_chan-0_chan.capacity = 100000
agent.channels.svc_chan-0_chan.transactionCapacity = 1000

agent.sources.svc_src-0_src.type = org.apache.flume.source.SyslogTcpSource
agent.sources.svc_src-0_src.port = 10001
agent.sources.svc_src-0_src.channels = svc_chan-0_chan

agent.sinks.svc_sink-0_sink.type = hdfs
agent.sinks.svc_sink-0_sink.hdfs.fileType = DataStream
agent.sinks.svc_sink-0_sink.hdfs.rollInterval = 300
agent.sinks.svc_sink-0_sink.hdfs.rollSize = 0
agent.sinks.svc_sink-0_sink.hdfs.rollCount = 0
agent.sinks.svc_sink-0_sink.hdfs.batchSize = 1000
agent.sinks.svc_sink-0_sink.hdfs.txnEventMax = 1000
agent.sinks.svc_sink-0_sink.hdfs.path = hdfs://xxxxxxxxxx/user/mpercy/logs/20120521
agent.sinks.svc_sink-0_sink.serializer = avro_event
agent.sinks.svc_sink-0_sink.serializer.compressionCodec = snappy
agent.sinks.svc_sink-0_sink.channel = svc_chan-0_chan

Examples for using AbstractAvroEventSerializer to write a custom schema

...

Additional unit tests / examples: https://svn.apache.org/viewvc/incubator/flume/trunk/flume-ng-core/src/test/java/org/apache/flume/serialization/

When you write a custom serializer, such as a subclass of AbstractAvroEventSerializer, you must specify its Builder class as the serializer in the configuration file. For example:

agent.sinks.svc_0_sink.serializer = com.example.flume.MyCustomSerializer$Builder

This assumes that your Builder is a static nested class of your serializer; the $ separates the outer class name from the nested class name. Note that your Builder must have a public, no-arg constructor.
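Both rules follow from the way, as we understand it, Flume creates the Builder: reflectively, from the configured class name. The $ form is the JVM's binary name for a nested class. Reflection also needs a public constructor that takes no arguments. A non-static inner class does not qualify, because its constructor implicitly takes the enclosing instance.

The following sketch illustrates that mechanism; it is not Flume's factory code. SerializerBuilder, MyCustomSerializer and BuilderLoader are hypothetical names, and the classes sit in the default package so the example is self-contained.

```java
// Simplified stand-in for Flume's EventSerializer.Builder; the real build()
// takes an org.apache.flume.Context and an OutputStream (assumption for this sketch).
interface SerializerBuilder {
  String build();
}

// Hypothetical custom serializer, in the default package to keep the sketch
// self-contained (the configuration above uses com.example.flume).
class MyCustomSerializer {
  // Static nested class: its binary name is "MyCustomSerializer$Builder".
  public static class Builder implements SerializerBuilder {
    public Builder() { }  // public and no-arg, so reflection can call it

    @Override
    public String build() {
      return "MyCustomSerializer";
    }
  }
}

final class BuilderLoader {
  private BuilderLoader() { }

  // Resolves a configured class name to a builder: load the class by its binary name,
  // check its type, then call its public no-arg constructor.
  static SerializerBuilder load(String className) {
    try {
      Class<?> c = Class.forName(className);
      if (!SerializerBuilder.class.isAssignableFrom(c)) {
        throw new IllegalArgumentException(className + " is not a serializer builder");
      }
      // getConstructor() only finds public constructors, so a missing public
      // no-arg constructor surfaces as NoSuchMethodException.
      return (SerializerBuilder) c.getConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

Loading "MyCustomSerializer$Builder" succeeds. The dotted form "MyCustomSerializer.Builder" fails with ClassNotFoundException, because Class.forName would read it as a class named Builder in a package named MyCustomSerializer.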