...
One requirement for batch loading data into HBase is that all revisions in a batch update must be written with the same revision number, which uniquely identifies that batch. We therefore add a new field to OutputJobInfo that lets us pass implementation-specific parameters to the underlying storage driver. Passing application-specific information this way is non-invasive; we will reevaluate the approach once we have some deliverables running.
```java
private Map<String, String> properties;

/**
 * Property information to be passed down to the StorageDriver implementation.
 * Put implementation-specific storage driver configuration here.
 * @return the map of storage driver properties
 */
public Map<String, String> getProperties() {
    return properties;
}
```
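To make the revision-number requirement concrete, here is a minimal, self-contained sketch of how a job could place the batch revision number in a properties map like this one, and how a storage driver could apply it to every cell of the batch. The key string `hcat.hbase.outputVersion`, the `Cell` type and the `stampBatch` helper are hypothetical simplifications for illustration, not HCatalog or HBase APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchVersionSketch {
    // Hypothetical key string; in the design the real key lives in a constants class.
    public static final String OUTPUT_VERSION_KEY = "hcat.hbase.outputVersion";

    // Simplified stand-in for one HBase cell (not an HBase class).
    public static final class Cell {
        public final String row;
        public final String column;
        public final String value;
        public final long version;

        public Cell(String row, String column, String value, long version) {
            this.row = row;
            this.column = column;
            this.value = value;
            this.version = version;
        }
    }

    // Hypothetical driver-side helper: the revision number is read once from the
    // job's properties map and applied to every cell, so the whole batch shares it.
    public static List<Cell> stampBatch(Map<String, String> properties, List<String[]> rows) {
        String raw = properties.get(OUTPUT_VERSION_KEY);
        if (raw == null) {
            throw new IllegalStateException("Missing property " + OUTPUT_VERSION_KEY);
        }
        long version = Long.parseLong(raw);
        List<Cell> cells = new ArrayList<>();
        for (String[] r : rows) {
            // r = {row, column, value}
            cells.add(new Cell(r[0], r[1], r[2], version));
        }
        return cells;
    }

    public static void main(String[] args) {
        // Job side: choose one revision number for the whole batch.
        Map<String, String> properties = new HashMap<>();
        properties.put(OUTPUT_VERSION_KEY, "1001");

        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"row1", "cf:a", "x"});
        rows.add(new String[] {"row2", "cf:b", "y"});

        for (Cell c : stampBatch(properties, rows)) {
            System.out.println(c.row + " " + c.column + " @" + c.version);
        }
    }
}
```

Reading the version once per batch, rather than computing it per cell, is what makes every write in the batch carry the same revision number.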
Not depicted in the diagram is a constants class that stores the property keys relevant to this storage driver:
```java
public class HBaseConstants {
    public static final String CONF_OUTPUT_VERSION_KEY =
            HCatConstants.HCAT_DEFAULT_TOPIC_PREFIX + ".hbase.outputVersion";
    // ....
}
```
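Because each key is built from a common prefix, a driver can pick out just its own settings from a larger property map. The sketch below assumes the prefix resolves to `hcat` (the actual value of `HCatConstants.HCAT_DEFAULT_TOPIC_PREFIX` is not given here), and the `HBaseKeys` class and `hbaseProperties` helper are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class HBaseKeys {
    // Assumed prefix value; the design derives it from HCatConstants.
    static final String PREFIX = "hcat";
    static final String HBASE_NAMESPACE = PREFIX + ".hbase.";
    static final String CONF_OUTPUT_VERSION_KEY = HBASE_NAMESPACE + "outputVersion";

    private HBaseKeys() {}

    // Hypothetical helper: keeps only entries under the HBase namespace,
    // sorted by key, so settings meant for other drivers are ignored.
    static Map<String, String> hbaseProperties(Map<String, String> all) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (e.getKey().startsWith(HBASE_NAMESPACE)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> all = new TreeMap<>();
        all.put(CONF_OUTPUT_VERSION_KEY, "1001");
        all.put("hcat.other.setting", "ignored");
        System.out.println(hbaseProperties(all)); // prints {hcat.hbase.outputVersion=1001}
    }
}
```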
HBaseDirectStorageDriver itself is a fairly straightforward implementation. HBaseDirectOutputFormat decorates HBase's TableOutputFormat; alternatively, we can implement our own output format that controls the HBase client directly, which gives us more flexibility in tuning, e.g. disabling the write-ahead log (WAL) for higher write rates. This OutputFormat does not use the key, and the value must be either an HBase Put or a Delete.
```java
public class HBaseDirectOutputFormat
        extends OutputFormat<WritableComparable<?>, Writable>
        implements Configurable {
    // ....
}
```
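The key/value contract can be illustrated with a minimal, library-free sketch of a record writer that ignores the key and accepts only Put or Delete values. `Mutation`, `Put`, `Delete` and `SketchRecordWriter` below are simplified stand-ins defined for this example, not the HBase or Hadoop classes; a real implementation would forward the mutations to an HBase table instead of recording them in a list.

```java
import java.util.ArrayList;
import java.util.List;

final class DirectWriterSketch {
    // Simplified stand-ins for HBase's Put and Delete (not the real classes).
    interface Mutation {
        String row();
    }

    static final class Put implements Mutation {
        private final String row;
        Put(String row) { this.row = row; }
        public String row() { return row; }
    }

    static final class Delete implements Mutation {
        private final String row;
        Delete(String row) { this.row = row; }
        public String row() { return row; }
    }

    // Hypothetical record writer: the key is ignored, the value must be a Put or a Delete.
    static final class SketchRecordWriter {
        final List<String> applied = new ArrayList<>();

        void write(Object key, Object value) {
            // key is intentionally unused, mirroring the OutputFormat contract above
            if (value instanceof Put) {
                applied.add("put " + ((Put) value).row());
            } else if (value instanceof Delete) {
                applied.add("delete " + ((Delete) value).row());
            } else {
                throw new IllegalArgumentException("Value must be a Put or a Delete, got "
                        + (value == null ? "null" : value.getClass().getName()));
            }
        }
    }

    public static void main(String[] args) {
        SketchRecordWriter writer = new SketchRecordWriter();
        writer.write(null, new Put("row1"));
        writer.write("ignored", new Delete("row2"));
        System.out.println(writer.applied); // prints [put row1, delete row2]
    }
}
```

Rejecting any other value type at write time surfaces a misconfigured job immediately rather than producing a partial batch.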
...
Like HBaseDirectOutputFormat, HBaseBulkOutputFormat does not use the key, and the value must be either a Put or a Delete. Both classes implement Writable, so no extra work is needed to serialize them.
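The point about Writable can be illustrated with a small round trip: a value that can write itself to a `DataOutput` and read itself back from a `DataInput` needs no extra serialization code in the output format. The `Writable` interface below mirrors the two methods of Hadoop's `org.apache.hadoop.io.Writable`, and `SimplePut` is a hypothetical, heavily simplified stand-in for HBase's Put, defined locally so the example runs with only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

final class WritableRoundTrip {
    // Mirrors the method signatures of org.apache.hadoop.io.Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical, simplified stand-in for an HBase Put (not the real class).
    static final class SimplePut implements Writable {
        String row;
        String column;
        String value;
        long version;

        public void write(DataOutput out) throws IOException {
            out.writeUTF(row);
            out.writeUTF(column);
            out.writeUTF(value);
            out.writeLong(version);
        }

        public void readFields(DataInput in) throws IOException {
            // Fields are read back in exactly the order they were written.
            row = in.readUTF();
            column = in.readUTF();
            value = in.readUTF();
            version = in.readLong();
        }
    }

    // Serializes a Writable to bytes, as a framework would when shipping it between tasks.
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        SimplePut put = new SimplePut();
        put.row = "row1";
        put.column = "cf:a";
        put.value = "x";
        put.version = 1001L;

        SimplePut copy = new SimplePut();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(toBytes(put))));
        System.out.println(copy.row + " " + copy.column + " " + copy.value + " @" + copy.version);
    }
}
```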
...