Applies to release: Flume 1.2.0 as of 2012-08-12

Using Event Serializers

The hdfs and file_roll sinks support event serializers. EventSerializer is an interface that allows arbitrary serialization of an event. Although you can implement this interface directly, most users will prefer one of the Avro serialization implementations built into Flume.

The recommended usage is to serialize your data as Avro, a file format with many advantages over platform- and language-specific serialization formats. For Avro-serialized events you have two options in Flume: use the built-in avro_event serializer, or write a custom subclass of AbstractAvroEventSerializer. The first option uses the built-in Flume event schema, while the second lets you specify your own Avro schema.

Config file syntax

Use the following configuration file syntax to specify an event serializer for your HDFS sink:

agent.sinks.svc_7_sink.serializer = avro_event
agent.sinks.svc_7_sink.serializer.compressionCodec = snappy

Examples

Example config using avro_event with HDFS sink

agent.sources = src-0
agent.channels = chan-0
agent.sinks = sink-0

agent.channels.chan-0.type = memory
agent.channels.chan-0.capacity = 100000
agent.channels.chan-0.transactionCapacity = 1000

agent.sources.src-0.type = SYSLOGTCP
agent.sources.src-0.port = 10001
agent.sources.src-0.channels = chan-0

agent.sinks.sink-0.type = hdfs
agent.sinks.sink-0.hdfs.fileType = DataStream
agent.sinks.sink-0.hdfs.rollInterval = 300
agent.sinks.sink-0.hdfs.rollSize = 0
agent.sinks.sink-0.hdfs.rollCount = 0
agent.sinks.sink-0.hdfs.batchSize = 1000
agent.sinks.sink-0.hdfs.txnEventMax = 1000
agent.sinks.sink-0.hdfs.path = hdfs://xxxxxxxxxx/user/mpercy/logs/20120521
agent.sinks.sink-0.serializer = avro_event
agent.sinks.sink-0.serializer.compressionCodec = snappy
agent.sinks.sink-0.channel = chan-0

Examples for using AbstractAvroEventSerializer to write a custom schema

An example is provided as a unit test in Git at flume-ng-core/src/test/java/org/apache/flume/serialization/SyslogAvroEventSerializer.java

Additional unit tests and examples are available in Git at flume-ng-core/src/test/java/org/apache/flume/serialization/

When you use a custom serializer, you must specify the Builder of your type as the serializer in the configuration file. For example:

agent.sinks.svc_0_sink.serializer = com.example.flume.MyCustomSerializer$Builder

This assumes that your Builder is a static nested class of MyCustomSerializer, which is why the class name is written with a $ separator. NOTE that your Builder MUST have a public, no-arg constructor.
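
The naming and constructor requirements make sense if the configured class name is resolved by reflection. The following is a minimal, self-contained sketch of that idea. It does not use Flume's real classes: SketchSerializer, SketchSerializerBuilder, BuilderLookupSketch and loadBuilder are hypothetical stand-ins invented for illustration, and the lookup logic is an assumption about how a name-based loader typically behaves, not a copy of Flume's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for a serializer and its builder; NOT Flume's interfaces.
interface SketchSerializer {
    void write(String eventBody) throws IOException;
}

interface SketchSerializerBuilder {
    SketchSerializer build(OutputStream out);
}

// Plays the role of com.example.flume.MyCustomSerializer from the config example.
class MyCustomSerializer implements SketchSerializer {
    private final OutputStream out;

    private MyCustomSerializer(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(String eventBody) throws IOException {
        // Write the event body followed by a newline.
        out.write((eventBody + "\n").getBytes(StandardCharsets.UTF_8));
    }

    // Static nested class: its binary name ends in "MyCustomSerializer$Builder".
    public static class Builder implements SketchSerializerBuilder {
        // Public no-arg constructor, so a loader can instantiate it by name.
        public Builder() {
        }

        @Override
        public SketchSerializer build(OutputStream out) {
            return new MyCustomSerializer(out);
        }
    }
}

class BuilderLookupSketch {
    // Assumed loader behavior: resolve the class by name, then call the
    // public no-arg constructor. Fails if either requirement is not met.
    static SketchSerializerBuilder loadBuilder(String className)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        return (SketchSerializerBuilder) clazz.getConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // getName() returns the binary name, which uses '$' for nested classes.
        String name = MyCustomSerializer.Builder.class.getName();
        System.out.println("Configured serializer name: " + name);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        loadBuilder(name).build(out).write("hello");
        System.out.print(out.toString("UTF-8"));
    }
}
```

In this sketch, Class.forName only accepts the binary name, so a nested Builder has to be written with $ rather than a dot, and getConstructor() throws NoSuchMethodException if the no-arg constructor is missing or not public.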
