...

One requirement for batch loading data into HBase is that all revisions in a batch be written with the same revision number, which uniquely identifies that batch update. We therefore add a new field to OutputJobInfo that lets us pass implementation-specific parameters to the underlying storage driver. Passing application-specific information this way is a non-invasive step, which we will reevaluate once we have some deliverables running.

Code Block
titleOutputJobInfo.java
  private Map<String,String> properties;

  /**
   * Returns the properties that are passed down to the *StorageDriver implementation.
   * Put implementation-specific storage driver configuration here.
   * @return the implementation-specific properties
   */
  public Map<String,String> getProperties() {
    return properties;
  }

Not depicted in the diagram is a constants class that stores the property keys relevant to this storage driver:

Code Block
titleHBaseConstants.java

public class HBaseConstants {

  public static final String CONF_OUTPUT_VERSION_KEY = HCatConstants.HCAT_DEFAULT_TOPIC_PREFIX+".hbase.outputVersion";

   ....

}
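
To illustrate how the properties map and the output version key might work together, here is a minimal sketch. It assumes the job setup code stores the batch revision as a string and the storage driver parses it back. `JobInfo`, `setOutputVersion`, `getOutputVersion`, and the `"hcat"` prefix value are hypothetical stand-ins for illustration only; the real key is built from HCatConstants.HCAT_DEFAULT_TOPIC_PREFIX as shown above.

```java
import java.util.HashMap;
import java.util.Map;

class OutputVersionSketch {

    // Assumed placeholder value; the real prefix comes from
    // HCatConstants.HCAT_DEFAULT_TOPIC_PREFIX.
    static final String ASSUMED_PREFIX = "hcat";

    // Mirrors HBaseConstants.CONF_OUTPUT_VERSION_KEY from the listing above.
    static final String CONF_OUTPUT_VERSION_KEY = ASSUMED_PREFIX + ".hbase.outputVersion";

    /** Hypothetical stand-in for OutputJobInfo, reduced to its properties field. */
    static class JobInfo {
        private final Map<String, String> properties = new HashMap<>();

        Map<String, String> getProperties() {
            return properties;
        }
    }

    /** Job setup side: record the revision shared by every write in this batch. */
    static void setOutputVersion(JobInfo info, long revision) {
        info.getProperties().put(CONF_OUTPUT_VERSION_KEY, Long.toString(revision));
    }

    /** Storage driver side: read the revision back, failing loudly if it is missing. */
    static long getOutputVersion(JobInfo info) {
        String value = info.getProperties().get(CONF_OUTPUT_VERSION_KEY);
        if (value == null) {
            throw new IllegalStateException("Missing property: " + CONF_OUTPUT_VERSION_KEY);
        }
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        JobInfo info = new JobInfo();
        setOutputVersion(info, 42L);
        System.out.println(CONF_OUTPUT_VERSION_KEY + " = " + getOutputVersion(info));
    }
}
```

Keeping the value as a string matches the `Map<String,String>` type of the properties field, so the storage driver is responsible for parsing and validating it.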

HBaseDirectStorageDriver itself is a fairly straightforward implementation. HBaseDirectOutputFormat either decorates HBase's TableOutputFormat, or we implement the OutputFormat ourselves and control the client directly, which gives us more flexibility for tuning, e.g. disabling the WAL for higher write rates. This OutputFormat does not use the key, and the value must be either an HBase Put or Delete.

Code Block
titleHBaseDirectOutputFormat.java
public class HBaseDirectOutputFormat extends OutputFormat<WritableComparable<?>,Writable> implements Configurable {
....
}
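
The "key is ignored, value is a Put or Delete" contract described above could be enforced in the record writer roughly as follows. This is only a sketch that runs without HBase on the classpath: `Put`, `Delete`, `RecordingTable`, and `Writer` are hypothetical stand-ins, not the real HBase client classes or the actual HBaseDirectOutputFormat code.

```java
import java.util.ArrayList;
import java.util.List;

class DirectWriteSketch {

    /** Hypothetical stand-in for HBase's Put; only carries a row key. */
    static final class Put {
        final String rowKey;
        Put(String rowKey) { this.rowKey = rowKey; }
    }

    /** Hypothetical stand-in for HBase's Delete; only carries a row key. */
    static final class Delete {
        final String rowKey;
        Delete(String rowKey) { this.rowKey = rowKey; }
    }

    /** Hypothetical stand-in for the table client; records calls instead of writing. */
    static class RecordingTable {
        final List<String> log = new ArrayList<>();
        void put(Put p) { log.add("put:" + p.rowKey); }
        void delete(Delete d) { log.add("delete:" + d.rowKey); }
    }

    /** Sketch of a record writer: the key is ignored, the value type selects the operation. */
    static class Writer {
        private final RecordingTable table;

        Writer(RecordingTable table) { this.table = table; }

        void write(Object ignoredKey, Object value) {
            if (value instanceof Put) {
                table.put((Put) value);
            } else if (value instanceof Delete) {
                table.delete((Delete) value);
            } else {
                // Reject anything else early instead of failing later in the client.
                String type = (value == null) ? "null" : value.getClass().getName();
                throw new IllegalArgumentException("Value must be a Put or a Delete, got " + type);
            }
        }
    }

    public static void main(String[] args) {
        RecordingTable table = new RecordingTable();
        Writer writer = new Writer(table);
        writer.write(null, new Put("row1"));
        writer.write("unused", new Delete("row2"));
        System.out.println(table.log);
    }
}
```

Rejecting unsupported value types in the writer keeps the failure close to the offending record, which tends to be easier to debug than an error surfacing deeper in the client.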

...

HBaseBulkOutputFormat also does not use the key field, and the value must be either a Put or a Delete. Both classes implement Writable, so no extra work is needed to serialize them.

...