Proposers
Approvers
Status
Current state:
Current State | |
---|---|
UNDER DISCUSSION | |
IN PROGRESS | |
ABANDONED | |
COMPLETED | |
INACTIVE | |
Discussion thread:
JIRA:
Released:
Current
In most cases, DML statements are built by concatenating SQL strings and executed through the Hive Driver. More importantly, multiple versions of Hive are not currently supported. Concatenated SQL is hard to maintain and the code breaks easily, and the lack of multi-version support causes a lot of trouble for users.
For example, the following functions use the Driver to execute SQL:
- HiveSyncTool#syncHoodieTable, to create a database
- HoodieHiveClient#createTable, to create a table
- HoodieHiveClient#addPartitionsToTable, to add partitions
- HoodieHiveClient#updatePartitionsToTable, to update partitions
- HoodieHiveClient#updateTableDefinition, to alter a table
Other metadata operations, such as HoodieHiveClient#updateTableProperties, HoodieHiveClient#scanTablePartitions and HoodieHiveClient#doesTableExist, already use the metastore client API instead of the Driver.
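To make the maintenance cost of concatenated SQL concrete, here is a minimal sketch of the pattern. The class ConcatenatedPartitionSql and its methods are hypothetical, written for illustration only, and are not the code of HoodieHiveClient. A single partition value containing a quote, such as O'Hare, leaves a string literal unterminated, so every call site has to remember to escape values correctly.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only: builds Hive DDL by string concatenation,
// the style used by the Driver-based code path.
final class ConcatenatedPartitionSql {
  private ConcatenatedPartitionSql() {}

  // Splices raw partition values into the statement without any escaping.
  static String addPartition(String table, Map<String, String> spec) {
    StringJoiner parts = new StringJoiner(", ");
    for (Map.Entry<String, String> e : spec.entrySet()) {
      parts.add("`" + e.getKey() + "`='" + e.getValue() + "'");
    }
    return "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION (" + parts + ")";
  }

  // Counts single quotes; in this unescaped output an odd count means
  // a string literal was left unterminated.
  static boolean hasUnbalancedQuotes(String sql) {
    long quotes = sql.chars().filter(c -> c == '\'').count();
    return quotes % 2 != 0;
  }
}
```

With the metastore client API, partition values are passed as fields of a Partition object rather than spliced into SQL text, so this class of bug does not arise there.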
The Improvement Proposal
We need to abstract a single, unified interface for every interaction with the Hive Metastore (HMS), one that no longer uses the Driver to execute DML. To support multiple versions, we can add a shim layer in the middle that adapts to different HMS versions and lets users specify the Hive version in use.
Middle layer support
We can use a wrapper around an IMetaStoreClient and a HiveShim, so that a client for the right Hive version is created through HiveShim#getHiveMetastoreClient. The sketch below illustrates this process.
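```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;

// Wraps a HiveShim and the IMetaStoreClient it creates (wrapper class name is illustrative).
class HiveMetastoreClientWrapper {
  private final HiveShim hiveShim;
  private final IMetaStoreClient client;
  HiveMetastoreClientWrapper(HiveConf hiveConf, String hiveVersion) {
    // Load the shim that matches the Hive version specified by the user.
    this.hiveShim = HiveShimLoader.loadHiveShim(hiveVersion);
    // Let the shim create an IMetaStoreClient compatible with that version.
    this.client = hiveShim.getHiveMetastoreClient(hiveConf);
  }
}
```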
DML executor
The same wrapper can also expose all DML operations, or a subset of them such as create database, create table, add partition and update partition, by delegating them to the client. The sketch below shows the process.
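```java
import java.util.List;
import org.apache.hadoop.hive.metastore.api.AlreadyExistsException;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.metastore.api.InvalidObjectException;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.NoSuchObjectException;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.thrift.TException;

// Methods of the wrapper; each delegates to the version-specific IMetaStoreClient `client`.
public void createDatabase(Database database)
    throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
  client.createDatabase(database);
}

public void createTable(Table table)
    throws AlreadyExistsException, InvalidObjectException, MetaException, NoSuchObjectException, TException {
  client.createTable(table);
}

// Adds a single partition and returns the partition that was added.
public Partition add_partition(Partition partition)
    throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
  return client.add_partition(partition);
}

// Adds several partitions in one metastore call and returns how many were added.
public int add_partitions(List<Partition> partitionList)
    throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
  return client.add_partitions(partitionList);
}
```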
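The wrapper is a thin delegation layer, which also makes the sync logic easy to test. The sketch below assumes a hypothetical, simplified SimpleMetastoreClient interface standing in for IMetaStoreClient (string identifiers instead of Thrift objects) so that it compiles without Hive on the classpath; MetastoreDmlExecutor and InMemoryMetastoreClient are likewise illustrative names, not part of any existing API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for IMetaStoreClient so the sketch compiles without Hive jars.
interface SimpleMetastoreClient {
  void createDatabase(String db);
  void createTable(String db, String table);
  int addPartitions(String db, String table, List<String> partitions);
}

// The executor only delegates; callers never build SQL or touch the Hive Driver.
final class MetastoreDmlExecutor {
  private final SimpleMetastoreClient client;

  MetastoreDmlExecutor(SimpleMetastoreClient client) {
    this.client = client;
  }

  void createDatabase(String db) {
    client.createDatabase(db);
  }

  void createTable(String db, String table) {
    client.createTable(db, table);
  }

  int addPartitions(String db, String table, List<String> partitions) {
    return client.addPartitions(db, table, partitions);
  }
}

// In-memory fake, useful for unit tests that should not need a running HMS.
final class InMemoryMetastoreClient implements SimpleMetastoreClient {
  final Set<String> databases = new HashSet<>();
  final Map<String, List<String>> partitions = new HashMap<>();

  @Override
  public void createDatabase(String db) {
    databases.add(db);
  }

  @Override
  public void createTable(String db, String table) {
    if (!databases.contains(db)) {
      throw new IllegalStateException("No such database: " + db);
    }
    partitions.putIfAbsent(db + "." + table, new ArrayList<>());
  }

  @Override
  public int addPartitions(String db, String table, List<String> newParts) {
    List<String> existing = partitions.get(db + "." + table);
    if (existing == null) {
      throw new IllegalStateException("No such table: " + db + "." + table);
    }
    existing.addAll(newParts);
    return newParts.size();
  }
}
```

Because callers depend only on the executor, a unit test can hand it the in-memory fake instead of a client connected to a live metastore.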
HiveShimLoader
- Added interface HiveShim to define the methods that need shimming.
- Implemented HiveShimV1 and HiveShimV2 for Hive 1.x and 2.x, respectively.
- Added HiveShimLoader to automatically load the matching HiveShim.
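A minimal sketch of how HiveShimLoader might select a shim, assuming it dispatches on the major-version prefix of the user-supplied version string and caches one instance per version. The describe() method is a hypothetical placeholder for the real shimmed methods (such as getHiveMetastoreClient), kept as a plain string so the example runs without Hive dependencies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shim contract; describe() stands in for version-specific methods.
interface HiveShim {
  String describe();
}

final class HiveShimV1 implements HiveShim {
  @Override
  public String describe() {
    return "shim for Hive 1.x";
  }
}

final class HiveShimV2 implements HiveShim {
  @Override
  public String describe() {
    return "shim for Hive 2.x";
  }
}

final class HiveShimLoader {
  // One shim instance per version string, created lazily and reused.
  private static final Map<String, HiveShim> SHIMS = new ConcurrentHashMap<>();

  private HiveShimLoader() {}

  static HiveShim loadHiveShim(String version) {
    // If the mapping function throws, nothing is cached and the exception propagates.
    return SHIMS.computeIfAbsent(version, v -> {
      if (v.startsWith("1.")) {
        return new HiveShimV1();
      }
      if (v.startsWith("2.")) {
        return new HiveShimV2();
      }
      throw new UnsupportedOperationException("Unsupported Hive version: " + v);
    });
  }
}
```

Unsupported versions fail fast with an exception rather than silently falling back to a shim that may not match the metastore being used.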