Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The input format agnostic task of encoding the modified record is delegated to its abstract base class AbstractLazySimpleRecordWriter. This class provides the bulk of the RecordWriter implementation. It internally uses a LazySimpleSerDe to transform the incoming byte[] into an Object required by the underlying writer APIs.

Error Handling

It's imperative for proper functioning of the system that the client of this API handle errors correctly.  Once a TransactionBatch is obtained, if any exception is thrown from TransactionBatch (except SerializationError) should cause the client to call TransactionBatch.close() and start a new batch to write more data and/or redo the work of the last transaction during which the failure occurred.  Not following this may, in rare cases, cause file corruption.  Furthermore, StreamingException should ideally cause the client to perform exponential back off.  This will help the cluster stabilize since the most likely reason for these failures is HDFS overload.

SerializationError indicates that a given tuple could not be parsed.  The client may choose to throw away such tuples or send them to a dead letter queue.  After seeing this exception, more data can be written to the current transaction and further transactions in the same TransactionBatch.

Example – Non-secure Mode

...