Reader and Writer Interfaces
Overview
HCatalog provides a data transfer API for parallel input and output without using MapReduce. This API provides a way to read data from a Hadoop cluster or write data into a Hadoop cluster, using a basic storage abstraction of tables and rows.
...
The HCatalog data transfer API is designed to facilitate integration of external systems with Hadoop.
Note: HCatalog is not thread-safe.
HCatReader
Reading is a two-step process in which the first step occurs on the master node of an external system. The second step is done in parallel on multiple slave nodes.
Reads are done on a “ReadEntity”. Before you start to read, you define a ReadEntity through ReadEntity.Builder, specifying a database name, table name, partition, and filter string as needed. For example:
    ReadEntity.Builder builder = new ReadEntity.Builder();
    ReadEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
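As noted above, the builder can also restrict the read to particular partitions. A minimal sketch, assuming ReadEntity.Builder exposes a withFilter method that takes a partition filter string (the partition column "datestamp" and its value are hypothetical):

    // Hedged sketch: read only one partition of a partitioned table.
    // "datestamp" is a hypothetical partition column.
    ReadEntity.Builder builder = new ReadEntity.Builder();
    ReadEntity entity = builder.withDatabase("mydb")
                               .withTable("mytbl")
                               .withFilter("datestamp = '20240101'")
                               .build();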
...
After defining a ReadEntity, you obtain an instance of HCatReader using the ReadEntity and cluster configuration:
    HCatReader reader = DataTransferFactory.getHCatReader(entity, config);
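The config argument carries the cluster configuration, such as where to find the metastore. A minimal sketch, assuming config is a map of Hadoop/Hive property key-value pairs (the metastore host shown is hypothetical):

    // Hedged sketch: supply whatever properties your cluster needs.
    // "hive.metastore.uris" is a standard Hive property; the host is hypothetical.
    Map<String, String> config = new HashMap<String, String>();
    config.put("hive.metastore.uris", "thrift://metastore-host:9083");
    HCatReader reader = DataTransferFactory.getHCatReader(entity, config);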
The next step is to obtain a ReaderContext from the reader as follows:
    ReaderContext cntxt = reader.prepareRead();
All of the above steps occur on the master node. The master node then serializes this ReaderContext object and sends it to all the slave nodes. Slave nodes then use this reader context to read data.
    for (InputSplit split : cntxt.getSplits()) {
        HCatReader reader = DataTransferFactory.getHCatReader(split, cntxt.getConf());
        Iterator<HCatRecord> itr = reader.read();
        while (itr.hasNext()) {
            HCatRecord record = itr.next();
            // Process the record.
        }
    }
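The loop above assumes each slave already holds the deserialized ReaderContext (cntxt). The API leaves the master-to-slave transport up to the external system; one possible sketch, assuming the ReaderContext supports standard Java serialization (the byte-array hand-off is a stand-in for whatever channel your system uses):

    // Master side: serialize the ReaderContext for shipping to slaves.
    byte[] serialize(ReaderContext cntxt) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(cntxt);
        out.close();
        return bytes.toByteArray();   // ship these bytes to each slave node
    }

    // Slave side: reconstruct the ReaderContext used in the read loop above.
    ReaderContext deserialize(byte[] payload) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload));
        return (ReaderContext) in.readObject();
    }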
...
Writes are done on a “WriteEntity”, which can be constructed in a fashion similar to reads:
    WriteEntity.Builder builder = new WriteEntity.Builder();
    WriteEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
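For a partitioned table, the write can target a specific partition. A minimal sketch, assuming WriteEntity.Builder accepts a partition specification as a map of key-value pairs (the partition column "datestamp" is hypothetical):

    // Hedged sketch: direct the write at one partition of a partitioned table.
    Map<String, String> partitionSpec = new HashMap<String, String>();
    partitionSpec.put("datestamp", "20240101");   // hypothetical partition column/value
    WriteEntity entity = new WriteEntity.Builder()
        .withDatabase("mydb")
        .withTable("mytbl")
        .withPartition(partitionSpec)
        .build();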
...
After creating a WriteEntity, the next step is to obtain a WriterContext:
    HCatWriter writer = DataTransferFactory.getHCatWriter(entity, config);
    WriterContext info = writer.prepareWrite();
...
On slave nodes, you need to obtain an HCatWriter using WriterContext as follows:
    HCatWriter writer = DataTransferFactory.getHCatWriter(context);
The writer then takes an iterator as the argument for the write method:
    writer.write(hCatRecordItr);
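The iterator supplies the HCatRecords to be written; their fields must line up with the target table's schema. A minimal sketch, assuming DefaultHCatRecord (HCatalog's list-backed HCatRecord implementation) and a hypothetical two-column table of an int and a string:

    // Hedged sketch: build a small batch of records in memory and write them.
    // Field order and types must match the target table's schema.
    List<HCatRecord> records = new ArrayList<HCatRecord>();
    for (int i = 0; i < 10; i++) {
        List<Object> fields = new ArrayList<Object>(2);
        fields.add(i);               // first column: int
        fields.add("row " + i);      // second column: string
        records.add(new DefaultHCatRecord(fields));
    }
    Iterator<HCatRecord> hCatRecordItr = records.iterator();
    writer.write(hCatRecordItr);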
...
A complete Java program for the reader and writer examples above can be found here: https://svn.apache.org/repos/asf/hive/trunk/hcatalog/core/src/test/java/org/apache/hive/hcatalog/data/TestReaderWriter.java.
Previous: Input and Output Interfaces
General: HCatalog Manual – WebHCat (Templeton) Manual – Hive Home