...

RecordWriter is the base interface implemented by all writers. A writer is responsible for taking a record in the form of a byte[] containing data in a known format (such as CSV) and writing it out in the format supported by Hive streaming. A RecordWriter may reorder or drop fields from the incoming record if necessary to map them to the corresponding columns in the Hive table. A streaming client instantiates an appropriate RecordWriter type and passes it to the TransactionBatch; the client does not interact with the RecordWriter directly thereafter. The TransactionBatch uses and manages the RecordWriter instance to perform I/O. See the Javadoc for details.
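
For orientation, here is a minimal sketch of that client-side flow, assuming the hive-hcatalog-streaming API and using placeholder metastore URI, database, table, and column names:

    import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    public class StreamingClientSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint: metastore URI, database, table (unpartitioned, so no partition values).
            HiveEndPoint endPt = new HiveEndPoint("thrift://metastore:9083", "testDb", "alerts", null);
            StreamingConnection conn = endPt.newConnection(true);

            // The client instantiates the RecordWriter and hands it to the TransactionBatch;
            // after this point the batch, not the client, drives the writer.
            String[] fieldNames = {"id", "msg"};                 // assumed order of fields in the input
            DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", endPt);
            TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);

            txnBatch.beginNextTransaction();
            txnBatch.write("1,Hello streaming".getBytes());      // raw delimited bytes; the writer does the rest
            txnBatch.commit();
            txnBatch.close();
            conn.close();
        }
    }

The writer instance is handed over once, when the transaction batch is fetched; all subsequent writes go through the TransactionBatch.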

A RecordWriter's primary functions are:

  1. Modify input record: This may involve dropping fields from the input data if they don't have corresponding table columns, adding nulls for fields that are missing for certain columns, and changing the order of incoming fields to match the order of the table's columns. This task requires an understanding of the incoming data format; not all formats need this step (for example JSON, which includes field names in the data). A standalone sketch of this step appears after this list.
  2. Encode modified record: The encoding involves serialization using an appropriate Hive SerDe.
  3. Identify the bucket to which the record belongs.
  4. Write encoded record to Hive using the AcidOutputFormat's record updater for the appropriate bucket.
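
As an illustration of step 1 only (plain Java, independent of the Hive classes; the table column order and input field names are assumptions), here is how a writer might reorder delimited input fields to match the table's column order and insert nulls for missing columns:

    import java.util.Arrays;
    import java.util.List;

    public class FieldReorderSketch {
        public static void main(String[] args) {
            // Assumed table column order and incoming field order (hypothetical names).
            List<String> tableColumns = Arrays.asList("id", "msg", "severity");
            List<String> inputFields  = Arrays.asList("msg", "id");   // order differs; "severity" is absent

            String[] record = "Hello streaming,1".split(",", -1);

            // Reorder input values into table column order; missing columns become null.
            String[] reordered = new String[tableColumns.size()];
            for (int col = 0; col < tableColumns.size(); col++) {
                int idx = inputFields.indexOf(tableColumns.get(col));
                reordered[col] = idx >= 0 ? record[idx] : null;
            }
            System.out.println(Arrays.toString(reordered));   // prints [1, Hello streaming, null]
        }
    }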

DelimitedInputWriter

Class DelimitedInputWriter accepts input records in delimited formats (such as CSV) and writes them to Hive. It reorders the fields if needed and serializes the record into an Object using LazySimpleSerDe, which is then passed on to the underlying AcidOutputFormat's record updater for the appropriate bucket. See the Javadoc.
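
Continuing the earlier sketch (same endPt, hypothetical field names), constructing the writer with an input field order that differs from the table's column order shows the reordering in action; the three-argument constructor takes the input field names, the delimiter, and the endpoint:

    // Incoming records arrive as "msg,id" while the table columns are (id, msg);
    // DelimitedInputWriter maps each named input field to the matching table column.
    String[] inputFieldOrder = {"msg", "id"};
    DelimitedInputWriter writer = new DelimitedInputWriter(inputFieldOrder, ",", endPt);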

...

This is a base class that contains some of the common code needed by RecordWriter objects, such as schema lookup and computing the bucket to which a record belongs.
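
As a conceptual illustration of the bucket computation only (simplified: Hive derives the hash from the bucketing columns via its ObjectInspectorUtils rather than String.hashCode, and the real logic lives in the base class):

    public class BucketSketch {
        // Hive-style bucket selection: clear the sign bit, then take the remainder
        // over the table's bucket count.
        static int bucketFor(int hashCode, int numBuckets) {
            return (hashCode & Integer.MAX_VALUE) % numBuckets;
        }

        public static void main(String[] args) {
            int numBuckets = 4;                          // assumed bucket count of the target table
            int hash = "someBucketingKey".hashCode();    // stand-in for the record's bucketing-column hash
            System.out.println("bucket = " + bucketFor(hash, numBuckets));
        }
    }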

...