Applies to release: Flume 1.2.0 as of 2012-08-12
Using Event Serializers
The hdfs and file_roll sinks support event serializers.
EventSerializer interface (HDFS & FILE_ROLL sinks)
The EventSerializer interface allows arbitrary serialization of an event. While it is possible to implement this interface directly, most users will prefer to use one of the Avro serialization implementations built into Flume.
The recommended usage is to serialize your data as Avro, a file format with significant advantages over platform- and language-specific serialization formats. For Avro-serialized events you have two options in Flume: the built-in avro_event serializer, or a custom subclass of AbstractAvroEventSerializer. The first uses the built-in Flume event schema, while the second allows you to specify your own Avro schema.
Config file syntax
Use the following configuration file syntax to specify using an event serializer in your HDFS sink:
agent.sinks.svc_7_sink.serializer = avro_event
agent.sinks.svc_7_sink.serializer.compressionCodec = snappy
...
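The file_roll sink accepts the same serializer properties. A hypothetical file_roll configuration might look like the following (the sink name and directory are illustrative):

```
agent.sinks.svc_7_sink.type = file_roll
agent.sinks.svc_7_sink.sink.directory = /var/log/flume
agent.sinks.svc_7_sink.serializer = avro_event
agent.sinks.svc_7_sink.serializer.compressionCodec = snappy
```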
Example config using avro_event with HDFS sink
agent.sources = svc_src-0_src
agent.channels = svc_chan-0_chan
agent.sinks = svc_sink-0_sink

# Configuration for svc_0
agent.channels.svc_chan-0_chan.type = memory
agent.channels.svc_chan-0_chan.capacity = 100000
agent.channels.svc_chan-0_chan.transactionCapacity = 1000

agent.sources.svc_src-0_src.type = SYSLOGTCP
agent.sources.svc_src-0_src.port = 10001
agent.sources.svc_src-0_src.channels = svc_chan-0_chan

agent.sinks.svc_sink-0_sink.type = hdfs
agent.sinks.svc_sink-0_sink.hdfs.fileType = DataStream
agent.sinks.svc_sink-0_sink.hdfs.rollInterval = 300
agent.sinks.svc_sink-0_sink.hdfs.rollSize = 0
agent.sinks.svc_sink-0_sink.hdfs.rollCount = 0
agent.sinks.svc_sink-0_sink.hdfs.batchSize = 1000
agent.sinks.svc_sink-0_sink.hdfs.txnEventMax = 1000
agent.sinks.svc_sink-0_sink.hdfs.path = hdfs://xxxxxxxxxx/user/mpercy/logs/20120521
agent.sinks.svc_sink-0_sink.serializer = avro_event
agent.sinks.svc_sink-0_sink.serializer.compressionCodec = snappy
agent.sinks.svc_sink-0_sink.channel = svc_chan-0_chan
Examples for using AbstractAvroEventSerializer to write a custom schema
An example is provided as a unit test, in Git at flume-ng-core/src/test/java/org/apache/flume/serialization/SyslogAvroEventSerializer.java
Additional unit tests / examples are in Git at flume-ng-core/src/test/java/org/apache/flume/serialization/
In this case, you must specify the Builder of your type as the serializer
in the configuration file. For example:
agent.sinks.svc_0_sink.serializer = com.example.flume.MyCustomSerializer$Builder
This assumes that your Builder is a static inner class. NOTE that your Builder MUST have a public, no-arg constructor.
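For reference, here is a rough sketch of what such a custom serializer might look like, patterned after the SyslogAvroEventSerializer unit test mentioned above. All class, field, and schema names here are illustrative, and the sketch assumes the AbstractAvroEventSerializer abstract methods (getOutputStream, getSchema, convert) and reflection-based serialization of a POJO matching the schema; consult the unit test in Git for the authoritative example. This requires flume-ng-core and Avro on the classpath.

```java
package com.example.flume;

import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.AbstractAvroEventSerializer;
import org.apache.flume.serialization.EventSerializer;

public class MyCustomSerializer
    extends AbstractAvroEventSerializer<MyCustomSerializer.MyRecord> {

  // Avro schema describing the output record; adapt the fields to your data.
  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{ \"type\": \"record\", \"name\": \"MyRecord\", \"fields\": ["
      + "{ \"name\": \"body\", \"type\": \"string\" } ] }");

  private final OutputStream out;

  private MyCustomSerializer(OutputStream out) {
    this.out = out;
  }

  @Override
  protected OutputStream getOutputStream() {
    return out;
  }

  @Override
  protected Schema getSchema() {
    return SCHEMA;
  }

  // Convert each Flume event into an instance of the record type.
  @Override
  protected MyRecord convert(Event event) {
    MyRecord record = new MyRecord();
    record.body = new String(event.getBody());
    return record;
  }

  // POJO whose public fields mirror the Avro schema above.
  public static class MyRecord {
    public String body;
  }

  // The Builder named in the config file; it MUST be public, static,
  // and have a public no-arg constructor.
  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      MyCustomSerializer serializer = new MyCustomSerializer(out);
      serializer.configure(context);
      return serializer;
    }
  }
}
```

The Builder is what Flume instantiates reflectively from the `serializer` property; it hands the serializer both the sink's configuration Context and the stream being written, which is why the constructor itself can stay private.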