...

Concurrency Note: I/O can be performed on multiple TransactionBatches concurrently. However, the transactions within a transaction batch must be consumed sequentially.

See the Javadoc for HiveEndPoint for more information. Generally, a user will establish the destination info with a HiveEndPoint object and then call newConnection to make a connection and get back a StreamingConnection object.
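
For illustration, a minimal sketch of this step (assuming the org.apache.hive.hcatalog.streaming package; the metastore URI, database, table, and partition values below are placeholders):

    import java.util.Arrays;

    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;

    public class StreamingConnectionSketch {
      public static void main(String[] args) throws Exception {
        // Destination info: metastore URI, database, table, partition values.
        // All values here are placeholders.
        HiveEndPoint endPoint = new HiveEndPoint(
            "thrift://metastore-host:9083", "mydb", "alerts",
            Arrays.asList("2015", "12"));

        // Open a connection; 'true' asks Hive to create the partition
        // if it does not exist yet.
        StreamingConnection connection = endPoint.newConnection(true);
        try {
          // ... fetch transaction batches and write records (see below) ...
        } finally {
          connection.close();
        }
      }
    }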

...

The StreamingConnection class is used to acquire batches of transactions. Once the connection has been provided by HiveEndPoint, the application will generally enter a loop where it calls fetchTransactionBatch and writes a series of transactions. When shutting down, the application should call close. See the Javadoc for more information.
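
Continuing the sketch above, that loop might look as follows (writer is a RecordWriter instance, covered below; the batch size of 10 and the moreRecordsExpected() condition are placeholders):

    // Acquire and drain transaction batches until the application is done.
    while (moreRecordsExpected()) {            // application-defined condition
      TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
      try {
        // ... consume the transactions in this batch sequentially ...
      } finally {
        txnBatch.close();                      // always release the batch
      }
    }
    connection.close();                        // shut down when finished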

TransactionBatch

TransactionBatch is used to write a series of transactions. For each transaction, the application calls beginNextTransaction, then write one or more times, and finally commit or abort as appropriate. See the Javadoc for details. All records in a single TransactionBatch will go to the same bucket. The API randomly picks a bucket for each new TransactionBatch in order to spread the data among buckets.
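
A sketch of consuming one batch, continuing the example above (the record contents are placeholders; each record is passed as a byte[] in the format the chosen RecordWriter expects):

    // Transactions within a batch must be consumed sequentially.
    while (txnBatch.remainingTransactions() > 0) {
      txnBatch.beginNextTransaction();
      try {
        txnBatch.write("1,Hello streaming".getBytes());
        txnBatch.write("2,Welcome to streaming".getBytes());
        txnBatch.commit();                   // make the writes visible
      } catch (StreamingException e) {
        txnBatch.abort();                    // discard the partial transaction
        throw e;
      }
    }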

...

RecordWriter is the base interface implemented by all writers. A writer is responsible for taking a record in the form of a byte[] containing data in a known format (such as CSV) and writing it out in the format supported by Hive streaming. A RecordWriter may reorder or drop fields from the incoming record if necessary to map them to the corresponding columns in the Hive table. A streaming client instantiates an appropriate RecordWriter type and passes it to a TransactionBatch, which thereafter uses and manages the RecordWriter instance to perform all I/O; the client does not interact with the RecordWriter directly after that point. See the Javadoc for details.
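
For example, the API ships with writers such as DelimitedInputWriter (for delimited text) and StrictJsonWriter. A sketch of wiring one up, continuing the earlier example (the column names here are placeholders):

    import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    // Map each delimited field onto the corresponding table column.
    String[] fieldToColMapping = {"id", "msg"};
    DelimitedInputWriter writer =
        new DelimitedInputWriter(fieldToColMapping, ",", endPoint);

    // Hand the writer to fetchTransactionBatch; from this point on the
    // TransactionBatch drives all I/O through it.
    TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);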

A RecordWriter has two primary functions.

...