...

  1. Ability for the user to read from and write to HBase by choosing the Kite connector. Whether we build a standalone Kite-HBase connector or reuse the existing KiteConnector in some fashion to indicate the dataset to use is an implementation detail
  2. Ability to indicate the partition strategy and the column/counter/key mapping for HBase datasets
  3. Ability to support delta reads from and writes to HBase
  4. Integration tests to prove that we can move data from JDBC to HBase and vice versa
  5. If we can make use of the Avro IDF, it would avoid the unnecessary back and forth between Avro and Sqoop object array types and improve performance.
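
As a rough illustration of the column/counter/key mapping requirement, the sketch below models a mapping list and checks it before a job would run. All class, method, and field names here (MappingKind, FieldMapping, HBaseMappingConfig, the "meta" and "stats" families) are hypothetical and are not taken from Sqoop or the Kite SDK; the partition strategy is left out. The validation rules (at least one key mapping; a family and qualifier for every column or counter mapping) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mapping kinds, named after the "column/counter/key" wording above. */
enum MappingKind { KEY, COLUMN, COUNTER }

/** One record-field-to-HBase mapping entry (illustrative only). */
final class FieldMapping {
    final String field;
    final MappingKind kind;
    final String family;    // null for KEY mappings
    final String qualifier; // null for KEY mappings

    FieldMapping(String field, MappingKind kind, String family, String qualifier) {
        this.field = field;
        this.kind = kind;
        this.family = family;
        this.qualifier = qualifier;
    }

    static FieldMapping key(String field) {
        return new FieldMapping(field, MappingKind.KEY, null, null);
    }

    static FieldMapping column(String field, String family, String qualifier) {
        return new FieldMapping(field, MappingKind.COLUMN, family, qualifier);
    }

    static FieldMapping counter(String field, String family, String qualifier) {
        return new FieldMapping(field, MappingKind.COUNTER, family, qualifier);
    }
}

final class HBaseMappingConfig {
    /** Returns human-readable problems; an empty list means the mapping looks usable. */
    static List<String> validate(List<FieldMapping> mappings) {
        List<String> problems = new ArrayList<>();
        boolean hasKey = false;
        for (FieldMapping m : mappings) {
            if (m.kind == MappingKind.KEY) {
                hasKey = true;
            } else if (isBlank(m.family) || isBlank(m.qualifier)) {
                // Assumed rule: non-key mappings must name a column family and qualifier.
                problems.add(m.field + ": " + m.kind + " mapping needs a family and a qualifier");
            }
        }
        if (!hasKey) {
            // Assumed rule: some field must contribute to the row key.
            problems.add("at least one KEY mapping is required");
        }
        return problems;
    }

    private static boolean isBlank(String s) {
        return s == null || s.isEmpty();
    }

    public static void main(String[] args) {
        List<FieldMapping> mappings = Arrays.asList(
                FieldMapping.key("id"),
                FieldMapping.column("name", "meta", "name"),
                FieldMapping.counter("visits", "stats", "visits"));
        System.out.println("problems: " + validate(mappings));
    }
}
```

A check like this could run when the job configuration is validated, so that a bad mapping is reported before any data is moved.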

...

  1. No code duplication
  2. No awkward build dependency of KiteHbaseConnector on KiteConnector, which might complicate independent connector upgrades

Implementation Details

...

(With Option #2)

  • Rely on the URI for the HBase dataset to be set up with the required mappings. Rely on Kite-HDFS partitioning for the HBase partitioning strategy setup. Add support for HBase-related configs for column mapping and partitioning
  • KiteExtractor to support creating HBase datasets via the Kite SDK and reading records, piggybacking on the partitioning implementation of Kite-HDFS
  • KiteLoader to support creating HBase datasets via the Kite SDK and writing records (merging temp datasets). This needs more investigation: how will the HBase write happen, and how does it differ from an HDFS or Hive write? Kite-HDFS can create temp datasets and merge them only when the job succeeds (commit phase). With HBase we cannot do that; we have to commit as we write. We are aware that if a job or task failure happens at this point, there can be partial commits and/or duplicates.
  • If we support DFM, add relevant DFM configs and code in KiteConnector  - TBD
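
The difference between the two write paths described above can be sketched with a small in-memory simulation. This is only an illustration under stated assumptions: plain maps stand in for the target dataset and the temp dataset, the class and method names (WriteSemanticsSketch, stagedWrite, directWrite) are hypothetical, and no Kite or HBase API is used. Duplicates are not modeled, since the map is keyed by row key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WriteSemanticsSketch {

    /**
     * Staged write (the Kite-HDFS style described above): rows go to a temp
     * buffer and are merged into the target only if every row was written.
     * Pass failAt = -1 to simulate a job with no failure.
     */
    static Map<String, String> stagedWrite(List<String[]> rows, int failAt) {
        Map<String, String> target = new LinkedHashMap<>();
        Map<String, String> temp = new LinkedHashMap<>();
        try {
            writeAll(rows, temp, failAt);
            target.putAll(temp); // commit phase: merge temp data into the target
        } catch (IllegalStateException e) {
            // Job failed: the temp data is discarded and the target stays untouched.
        }
        return target;
    }

    /**
     * Direct write (the HBase case described above): every row is committed
     * as it is written, so a failure leaves a partial result behind.
     */
    static Map<String, String> directWrite(List<String[]> rows, int failAt) {
        Map<String, String> target = new LinkedHashMap<>();
        try {
            writeAll(rows, target, failAt);
        } catch (IllegalStateException e) {
            // Job failed: rows written before the failure remain in the target.
        }
        return target;
    }

    private static void writeAll(List<String[]> rows, Map<String, String> sink, int failAt) {
        for (int i = 0; i < rows.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated task failure at row " + i);
            }
            sink.put(rows.get(i)[0], rows.get(i)[1]);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"row1", "a"},
                new String[] {"row2", "b"},
                new String[] {"row3", "c"});
        System.out.println("staged, failure at row index 2: " + stagedWrite(rows, 2).keySet());
        System.out.println("direct, failure at row index 2: " + directWrite(rows, 2).keySet());
    }
}
```

In the simulated failure, the staged path leaves the target empty while the direct path leaves the rows written before the failure, which is the partial-commit risk noted for the HBase loader.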

 

Testing 

The integration test suite will be enhanced to cover JDBC to KiteHBaseConnector transfers and vice versa

...