Reader and Writer Interfaces
Overview
HCatalog provides a data transfer API for parallel input and output without using MapReduce. This API provides a way to read data from a Hadoop cluster or write data into a Hadoop cluster, using a basic storage abstraction of tables and rows.
...
The HCatalog data transfer API is designed to facilitate integration of external systems with Hadoop.
Note: HCatalog is not thread-safe.
HCatReader
Reading is a two-step process in which the first step occurs on the master node of an external system. The second step is done in parallel on multiple slave nodes.
Reads are done on a “ReadEntity”. Before you start to read, you define a ReadEntity through ReadEntity.Builder, specifying a database name, table name, partition, and filter string as needed. For example:
    ReadEntity.Builder builder = new ReadEntity.Builder();
    ReadEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
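As noted above, the builder can also restrict the read to particular partitions. A minimal sketch, assuming ReadEntity.Builder exposes a withFilter method that takes a partition filter string (the partition column "datestamp" and its value are hypothetical):

    // Hedged sketch: read only one partition of a partitioned table.
    // "datestamp" is a hypothetical partition column.
    ReadEntity.Builder builder = new ReadEntity.Builder();
    ReadEntity entity = builder.withDatabase("mydb")
                               .withTable("mytbl")
                               .withFilter("datestamp = '20240101'")
                               .build();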
...
After defining a ReadEntity, you obtain an instance of HCatReader using the ReadEntity and cluster configuration:
    HCatReader reader = DataTransferFactory.getHCatReader(entity, config);
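The config argument carries the cluster configuration, such as where to find the metastore. A minimal sketch, assuming config is a map of Hadoop/Hive property key-value pairs (the metastore host shown is hypothetical):

    // Hedged sketch: supply whatever properties your cluster needs.
    // "hive.metastore.uris" is a standard Hive property; the host is hypothetical.
    Map<String, String> config = new HashMap<String, String>();
    config.put("hive.metastore.uris", "thrift://metastore-host:9083");
    HCatReader reader = DataTransferFactory.getHCatReader(entity, config);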
The next step is to obtain a ReaderContext from the reader as follows:
    ReaderContext cntxt = reader.prepareRead();
All of the above steps occur on the master node. The master node then serializes this ReaderContext object and sends it to all the slave nodes. Slave nodes then use this reader context to read data.
    for (InputSplit split : cntxt.getSplits()) {
        HCatReader reader = DataTransferFactory.getHCatReader(split, cntxt.getConf());
        Iterator<HCatRecord> itr = reader.read();
        while (itr.hasNext()) {
            HCatRecord record = itr.next();
            // Process the record.
        }
    }
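The loop above assumes each slave already holds the deserialized ReaderContext (cntxt). The API leaves the master-to-slave transport up to the external system; one possible sketch, assuming the ReaderContext supports standard Java serialization (the byte-array hand-off is a stand-in for whatever channel your system uses):

    // Master side: serialize the ReaderContext for shipping to slaves.
    byte[] serialize(ReaderContext cntxt) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(cntxt);
        out.close();
        return bytes.toByteArray();   // ship these bytes to each slave node
    }

    // Slave side: reconstruct the ReaderContext used in the read loop above.
    ReaderContext deserialize(byte[] payload) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload));
        return (ReaderContext) in.readObject();
    }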
...
Writes are done on a “WriteEntity”, which can be constructed in a fashion similar to reads:
    WriteEntity.Builder builder = new WriteEntity.Builder();
    WriteEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
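For a partitioned table, the write can target a specific partition. A minimal sketch, assuming WriteEntity.Builder accepts a partition specification as a map of key-value pairs (the partition column "datestamp" is hypothetical):

    // Hedged sketch: direct the write at one partition of a partitioned table.
    Map<String, String> partitionSpec = new HashMap<String, String>();
    partitionSpec.put("datestamp", "20240101");   // hypothetical partition column/value
    WriteEntity entity = new WriteEntity.Builder()
        .withDatabase("mydb")
        .withTable("mytbl")
        .withPartition(partitionSpec)
        .build();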
...
After creating a WriteEntity, the next step is to obtain a WriterContext:
    HCatWriter writer = DataTransferFactory.getHCatWriter(entity, config);
    WriterContext info = writer.prepareWrite();
...
On slave nodes, you need to obtain an HCatWriter using WriterContext as follows:
    HCatWriter writer = DataTransferFactory.getHCatWriter(context);
The writer then takes an iterator as the argument for the write method:
    writer.write(hCatRecordItr);
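The iterator supplies the HCatRecords to be written; their fields must line up with the target table's schema. A minimal sketch, assuming DefaultHCatRecord (HCatalog's list-backed HCatRecord implementation) and a hypothetical two-column table of an int and a string:

    // Hedged sketch: build a small batch of records in memory and write them.
    // Field order and types must match the target table's schema.
    List<HCatRecord> records = new ArrayList<HCatRecord>();
    for (int i = 0; i < 10; i++) {
        List<Object> fields = new ArrayList<Object>(2);
        fields.add(i);               // first column: int
        fields.add("row " + i);      // second column: string
        records.add(new DefaultHCatRecord(fields));
    }
    Iterator<HCatRecord> hCatRecordItr = records.iterator();
    writer.write(hCatRecordItr);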
...
A complete Java program for the reader and writer examples above can be found here: https://svn.apache.org/repos/asf/hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestReaderWriter.java.
Previous: Input and Output Interfaces
General: HCatalog Manual – WebHCat (Templeton) Manual – Hive Home