
...

Kite TO part (for writing to HDFS via Kite): https://issues.apache.org/jira/browse/SQOOP-1588

UPDATE: A design wiki was added later; see Kite Connector Design.

Requirements

  1. Ability for the user to read from and write to HBase by choosing the Kite connector. Whether we build a standalone Kite-HBase connector or reuse the KiteConnector we have today in some fashion to indicate the data set is an implementation detail.
  2. Ability to indicate the partition strategy and column/counter/key mapping for HBase data sets (see the sketch after this list).
  3. Ability to support delta reads from and writes to HBase.
  4. Integration tests to prove that we can move data from JDBC to HBase and vice versa.
  5. If we can make use of the Avro IDF, it would avoid the unnecessary conversion back and forth between Avro and Sqoop object array types and improve performance.

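For requirement 2, here is a minimal sketch of how the partition strategy and column/counter/key mapping could be expressed with the Kite SDK's DatasetDescriptor, PartitionStrategy, and ColumnMapping builders. The schema, table, and field names (order.avsc, orders, order_id, and so on) are hypothetical placeholders, not a committed design:

Code Block
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.kitesdk.data.ColumnMapping;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.PartitionStrategy;

public class HBaseMappingSketch {

  // Hypothetical "orders" dataset backed by HBase; table and field names are placeholders.
  public static Dataset<GenericRecord> createOrdersDataset() throws IOException {

    // Partition strategy: the HBase row key is derived from the order_id field.
    PartitionStrategy strategy = new PartitionStrategy.Builder()
        .identity("order_id")
        .build();

    // Column mapping: key field, a regular column, and a counter column (requirement 2).
    ColumnMapping mapping = new ColumnMapping.Builder()
        .key("order_id")
        .column("customer", "meta", "customer")
        .counter("update_count", "meta", "updates")
        .build();

    DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
        .schemaUri("resource:order.avsc")   // Avro schema describing the record
        .partitionStrategy(strategy)
        .columnMapping(mapping)
        .build();

    // Kite URI form for HBase-backed datasets: dataset:hbase:<zookeeper-quorum>/<table>
    return Datasets.create("dataset:hbase:zk1,zk2/orders", descriptor, GenericRecord.class);
  }
}

One possible design choice is to let the HBase mapping live entirely in the Kite DatasetDescriptor, so Sqoop would not need to introduce its own mapping format in the job configs.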
...

Overall, there are two ways to implement this functionality using the Kite SDK.

Option 1

Duplicate a lot of the code in KiteConnector and add a new, independent KiteHbaseConnector. The major con is the code duplication and the effort required to support yet another connector.

 

Option 2

  • Use the current KiteConnector and add an enum to select the type of dataset Kite will create underneath, or parse the URI given in the FromJobConfig and ToJobConfig to figure out whether the dataset is Hive, HBase, or HDFS (a URI-parsing sketch follows the code block below).

    Code Block
    public enum DataSetType {
      HDFS,
      HBASE,
      HIVE
    }

    // Use this enum to determine what dataset Kite needs to create underneath
    @Input
    public DataSetType datasetType;

    // or parse this URI to figure out the dataset type
    @Input(size = 255, validators = {@Validator(DatasetURIValidator.class)})
    public String uri;

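If the URI-parsing alternative is chosen, the dispatch could be as simple as the sketch below. It assumes the DataSetType enum above and Kite dataset URIs of the form dataset:hdfs:..., dataset:hive:..., and dataset:hbase:...; the helper name fromUri is hypothetical, not an existing Sqoop or Kite API:

Code Block
// Hypothetical helper: infer the dataset type from a Kite dataset URI such as
// "dataset:hbase:zk1,zk2/orders", "dataset:hive:default/orders", or "dataset:hdfs:/data/orders".
public static DataSetType fromUri(String uri) {
  if (uri == null || !uri.startsWith("dataset:")) {
    throw new IllegalArgumentException("Not a Kite dataset URI: " + uri);
  }
  String rest = uri.substring("dataset:".length());
  if (rest.startsWith("hbase:")) {
    return DataSetType.HBASE;
  } else if (rest.startsWith("hive:")) {
    return DataSetType.HIVE;
  } else if (rest.startsWith("hdfs:")) {
    return DataSetType.HDFS;
  }
  throw new IllegalArgumentException("Unsupported Kite dataset scheme: " + uri);
}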
...