Proposers

zhangminglei

Approvers

Shaofeng Li

Vinoth Chandar

Status

Current state
  • UNDER DISCUSSION ✔
  • IN PROGRESS ✔
  • ABANDONED
  • COMPLETED
  • INACTIVE

Discussion thread: here

JIRA: here

Released: 

Current

In most cases, DML statements are built by concatenating SQL strings and then executed through the Hive Driver. More importantly, multiple versions of Hive are not supported at the moment. Concatenated SQL is hard to maintain and easy to break, and the lack of multi-version Hive support causes a lot of headaches for users.

...

We need a unified interface that covers every interaction with the Hive Metastore (HMS) and no longer uses the Driver to execute DML. To support multiple versions, we can add a shim layer in the middle that adapts to each HMS version and lets users specify the Hive version in use. The work is split into two steps, mainly at the code level: the first (and higher priority) is supporting multiple Hive versions, and the second is refactoring the DML executor. We do not encourage users to configure the Hive version manually, but we still provide a configuration option for users who want to set it explicitly.
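As a rough illustration of the shim layer and the optional version setting described above, the sketch below picks a shim from a user-configured Hive version when one is given and otherwise falls back to a detected version. All names here (HiveShim, HiveShimV1/V2/V3, HiveShimLoader) are hypothetical, the release lines 1.2, 2.3 and 3.1 are only examples, and the step that detects the version from the Hive jars on the classpath is left out.

```java
import java.util.Optional;

// Hypothetical shim interface: one implementation per supported Hive line.
interface HiveShim {
  // The Hive release line this shim targets, for example "2.3".
  String versionLine();
}

final class HiveShimV1 implements HiveShim {
  @Override
  public String versionLine() {
    return "1.2";
  }
}

final class HiveShimV2 implements HiveShim {
  @Override
  public String versionLine() {
    return "2.3";
  }
}

final class HiveShimV3 implements HiveShim {
  @Override
  public String versionLine() {
    return "3.1";
  }
}

final class HiveShimLoader {
  private HiveShimLoader() {}

  // A non-empty configured version wins; otherwise the detected version is used,
  // so users only need to set the version when detection is not what they want.
  static HiveShim load(Optional<String> configuredVersion, String detectedVersion) {
    String version = configuredVersion.filter(v -> !v.isEmpty()).orElse(detectedVersion);
    // Simplification: dispatch on the major version only.
    if (version.startsWith("1.")) {
      return new HiveShimV1();
    }
    if (version.startsWith("2.")) {
      return new HiveShimV2();
    }
    if (version.startsWith("3.")) {
      return new HiveShimV3();
    }
    throw new IllegalArgumentException("Unsupported Hive version: " + version);
  }
}
```

For example, HiveShimLoader.load(Optional.empty(), "2.3.7") returns the 2.3 shim, while a configured "3.1.2" overrides whatever was detected. Matching on the major version alone is a simplification; a real shim layer may need finer-grained matching, since minor Hive releases can differ in the metastore client API.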

Middle layer support

We can use a wrapper around an IMetaStoreClient and a HiveShim, so that a version-specific IMetaStoreClient is created through HiveShim#getHiveMetastoreClient. Other functions that need to be shimmed will also be declared in the HiveShim interface. The pseudocode below illustrates this process.

...
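A minimal sketch of the wrapper idea, using stand-in types so it compiles without Hive on the classpath. MetastoreClient is a hypothetical two-method stand-in for IMetaStoreClient, HiveShim here is a hypothetical interface whose getHiveMetastoreClient takes a plain Map in place of a HiveConf, and the in-memory classes exist only to make the example runnable.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Hive's IMetaStoreClient, reduced to two calls.
interface MetastoreClient {
  boolean tableExists(String db, String table);

  void createTable(String db, String table);
}

// Hypothetical shim: each supported Hive version supplies its own client factory.
interface HiveShim {
  String versionLine();

  // The Map stands in for HiveConf in this sketch.
  MetastoreClient getHiveMetastoreClient(Map<String, String> conf);
}

// In-memory client, only here so the sketch runs without a metastore.
final class InMemoryMetastoreClient implements MetastoreClient {
  private final Set<String> tables = new HashSet<>();

  @Override
  public boolean tableExists(String db, String table) {
    return tables.contains(db + "." + table);
  }

  @Override
  public void createTable(String db, String table) {
    tables.add(db + "." + table);
  }
}

// Shim that hands out the in-memory client; a real shim would build a
// version-specific metastore client from the configuration.
final class InMemoryShim implements HiveShim {
  @Override
  public String versionLine() {
    return "in-memory";
  }

  @Override
  public MetastoreClient getHiveMetastoreClient(Map<String, String> conf) {
    return new InMemoryMetastoreClient();
  }
}

// The wrapper: callers never construct a version-specific client themselves;
// the shim decides which one they get.
final class MetastoreClientWrapper {
  private final HiveShim shim;
  private final MetastoreClient client;

  MetastoreClientWrapper(HiveShim shim, Map<String, String> conf) {
    this.shim = shim;
    this.client = shim.getHiveMetastoreClient(conf);
  }

  boolean tableExists(String db, String table) {
    return client.tableExists(db, table);
  }

  // Goes through the client API instead of running a concatenated DDL string.
  void createTableIfAbsent(String db, String table) {
    if (!client.tableExists(db, table)) {
      client.createTable(db, table);
    }
  }

  String hiveVersionLine() {
    return shim.versionLine();
  }
}
```

Because callers only talk to MetastoreClientWrapper, switching Hive versions would mean passing a different shim rather than changing call sites, and methods like createTableIfAbsent show where a Driver-executed, concatenated statement could be replaced by a typed client call.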