Reader and Writer Interfaces

Table of Contents

Overview

HCatalog provides a data transfer API for parallel input and output without using MapReduce. This API provides a way to read data from a Hadoop cluster or write data into a Hadoop cluster, using a basic storage abstraction of tables and rows.

...

Reads are done on a “ReadEntity”. Before you start to read, you need to define a ReadEntity to read from; this is done through ReadEntity.Builder, where you can specify a database name, table name, partition, and filter string. For example:

No Format

ReadEntity.Builder builder = new ReadEntity.Builder();
ReadEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
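
For a partitioned table, the read can be narrowed to particular partitions. A minimal sketch, assuming a table partitioned on a region column (the partition key and value shown are illustrative):

No Format

Map<String, String> partitionKVs = new HashMap<String, String>();
partitionKVs.put("region", "us-east");  // illustrative partition key/value
ReadEntity partitionedEntity = new ReadEntity.Builder()
        .withDatabase("mydb")
        .withTable("mytbl")
        .withPartition(partitionKVs)
        .build();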

...

After defining a ReadEntity, you obtain an instance of HCatReader using the ReadEntity and cluster configuration:

No Format

HCatReader reader = DataTransferFactory.getHCatReader(entity, config);
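
Here config is a map of job properties handed to the underlying read; an empty map is enough to use the defaults. A minimal sketch (the property shown is illustrative):

No Format

Map<String, String> config = new HashMap<String, String>();
// Optionally pass properties the read should use, for example:
// config.put("hive.metastore.uris", "thrift://metastore-host:9083");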

The next step is to obtain a ReaderContext from reader as follows:

No Format

ReaderContext cntxt = reader.prepareRead();

All of the above steps occur on the master node. The master node then serializes the ReaderContext object and sends it to all the slave nodes, which use it to read data. For example:

No Format

for (InputSplit split : cntxt.getSplits()) {
    HCatReader reader = DataTransferFactory.getHCatReader(split, cntxt.getConf());
    Iterator<HCatRecord> itr = reader.read();
    while (itr.hasNext()) {
        HCatRecord record = itr.next();
        // process the record
    }
}
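
The ReaderContext carries no reading logic of its own; it only has to reach each slave intact. One way to ship it, assuming standard Java object serialization (how the bytes travel between nodes is up to the application):

No Format

// On the master: serialize the context to bytes.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(cntxt);
oos.close();
byte[] serializedContext = bos.toByteArray();

// On a slave: reconstruct the context from the received bytes.
ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(serializedContext));
ReaderContext cntxt = (ReaderContext) ois.readObject();
ois.close();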

...

Writes are done on a “WriteEntity”, which can be constructed in a fashion similar to reads:

No Format

WriteEntity.Builder builder = new WriteEntity.Builder();
WriteEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
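
As with reads, a write can target a single partition of a partitioned table. A minimal sketch, again assuming a table partitioned on a region column (the key and value shown are illustrative):

No Format

Map<String, String> partitionKVs = new HashMap<String, String>();
partitionKVs.put("region", "us-east");  // illustrative partition key/value
WriteEntity partitionedEntity = new WriteEntity.Builder()
        .withDatabase("mydb")
        .withTable("mytbl")
        .withPartition(partitionKVs)
        .build();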

...

After creating a WriteEntity, the next step is to obtain a WriterContext:

No Format

HCatWriter writer = DataTransferFactory.getHCatWriter(entity, config);
WriterContext info = writer.prepareWrite();

...

On slave nodes, you need to obtain an HCatWriter using WriterContext as follows:

No Format

HCatWriter writer = DataTransferFactory.getHCatWriter(context);

The writer then takes an iterator of HCatRecords as the argument to its write method:

No Format

writer.write(hCatRecordItr);
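
The iterator supplies the HCatRecords to write, and each record's layout must match the target table's schema. A minimal sketch that builds one in memory with DefaultHCatRecord (the field values are illustrative, assuming a two-column table):

No Format

List<HCatRecord> records = new ArrayList<HCatRecord>();
List<Object> fields = new ArrayList<Object>();
fields.add(1);              // first column, e.g. an int
fields.add("first value");  // second column, e.g. a string
records.add(new DefaultHCatRecord(fields));

Iterator<HCatRecord> hCatRecordItr = records.iterator();
writer.write(hCatRecordItr);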

...

A complete Java program for the reader and writer examples above can be found here: https://github.com/apache/hive/blob/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestReaderWriter.java.

 

Navigation Links

Previous: Input and Output Interfaces
Next: Command Line Interface

General: HCatalog Manual, WebHCat Manual, Hive Wiki Home, Hive Project Site
Old version of this document (HCatalog 0.5.0): Reader and Writer Interfaces