...

Proposers

zhangminglei

Approvers

Shaofeng Li

Vinoth Chandar

Status

Current state: IN PROGRESS

Discussion thread: here

JIRA: here

Released: 

Current

In most cases, DML statements are executed through the Hive Driver by concatenating SQL strings. More importantly, multiple versions of Hive cannot be supported at the moment. Concatenated SQL is hard to maintain and the code is easy to break, and the lack of multi-version support creates a lot of headaches for users.

...

We need to abstract a unified interface for everything that talks to the HMS (Hive Metastore) and stop using the Driver to execute DML. To support multiple versions, we can add a shim middle layer over the HMS and allow users to specify the Hive version in use. The RFC is divided into two steps (mainly at the code level): the first is to support multiple versions of Hive (first priority), and the second is to refactor the DML executor. We do not encourage users to manually configure the Hive version, but we still provide a configuration option for those who want to set it explicitly.

Middle layer supports

We can use a wrapper that wraps an IMetaStoreClient and a HiveShim, so that different versions of IMetaStoreClient can be created via HiveShim#getHiveMetastoreClient. The other functions to be shimmed will also be declared in this interface. The pseudocode below illustrates this process.

Code Block
languagejava
public interface HiveShim extends Serializable {

    /**
     * Creates a Hive Metastore client based on the given HiveConf object.
     *
     * @param hiveConf HiveConf instance
     * @return an IMetaStoreClient instance
     */
    IMetaStoreClient getHiveMetastoreClient(HiveConf hiveConf);

    /**
     * Alters a Hive table.
     *
     * @param client       the Hive metastore client
     * @param databaseName the name of the database to which the table belongs
     * @param tableName    the name of the table to be altered
     * @param table        the new Hive table
     */
    void alterTable(IMetaStoreClient client, String databaseName, String tableName, Table table)
            throws InvalidOperationException, MetaException, TException;

    /**
     * Alters a partition of a Hive table.
     *
     * @param client       the Hive metastore client
     * @param databaseName the name of the database to which the table belongs
     * @param tableName    the name of the table whose partition is altered
     * @param partition    the new Hive partition
     */
    void alterPartition(IMetaStoreClient client, String databaseName, String tableName, Partition partition)
            throws InvalidOperationException, MetaException, TException;

    ...
}

// Load the shim that matches the Hive version in use, then create the client through it.
HiveShim hiveShim = HiveShimLoader.loadHiveShim(hiveVersion);

IMetaStoreClient createMetastoreClient() {
    return hiveShim.getHiveMetastoreClient(hiveConf);
}

IMetaStoreClient client = createMetastoreClient();

...

Code Block
languagejava
public void createDatabase(Database database)
        throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
    client.createDatabase(database);
}

public void createTable(Table table)
        throws AlreadyExistsException, InvalidObjectException, MetaException, NoSuchObjectException, TException {
    client.createTable(table);
}

public Partition add_partition(Partition partition)
        throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
    return client.add_partition(partition);
}

public int add_partitions(List<Partition> partitionList)
        throws InvalidObjectException, AlreadyExistsException, MetaException, TException {
    return client.add_partitions(partitionList);
}

HiveShimLoader

We add a shim loader to load the appropriate HiveShim implementation:

  • Added interface HiveShim to define methods that need shimming.
  • Implemented HiveShimV1 and HiveShimV2 for 1.x and 2.x respectively.
  • Added HiveShimLoader to automatically load HiveShim.
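A minimal sketch of what HiveShimLoader could look like, dispatching on the major version of the user-specified Hive version string. Everything beyond the names HiveShim, HiveShimV1, HiveShimV2, and HiveShimLoader.loadHiveShim (the default version constant, the shim bodies) is an assumption for illustration; the real loader would carry the metastore methods shown earlier and would likely cache shim instances.

```java
import java.io.Serializable;

// Simplified stand-ins for the shim hierarchy; the real interface carries
// the metastore methods (getHiveMetastoreClient, alterTable, ...) shown above.
interface HiveShim extends Serializable {
}

class HiveShimV1 implements HiveShim {
}

class HiveShimV2 implements HiveShim {
}

class HiveShimLoader {

    // Hypothetical default; the actual default version is a design decision.
    static final String DEFAULT_HIVE_VERSION = "2.3.1";

    /**
     * Picks a shim implementation by inspecting the major version of the
     * user-specified Hive version string (e.g. "1.2.1" or "2.3.4").
     * Falls back to the default version when none is configured.
     */
    static HiveShim loadHiveShim(String hiveVersion) {
        String version = (hiveVersion == null || hiveVersion.isEmpty())
                ? DEFAULT_HIVE_VERSION : hiveVersion;
        if (version.startsWith("1.")) {
            return new HiveShimV1();
        } else if (version.startsWith("2.")) {
            return new HiveShimV2();
        }
        throw new IllegalArgumentException("Unsupported Hive version: " + version);
    }
}
```

This keeps version selection in one place: callers only ever see the HiveShim interface, so adding support for a new Hive line means adding one shim class and one branch here.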


Hudi on Flink refactor

StreamWriteOperatorCoordinator#syncHive needs to be rewritten to fit the new framework.
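As rough pseudocode of the direction (only StreamWriteOperatorCoordinator#syncHive comes from this RFC; the field and method names around it are illustrative assumptions), the rewrite would replace concatenated DML run through the Driver with direct metastore calls through the shim-backed client:

```java
// Pseudocode: syncHive delegating to the shim-backed metastore client.
private void syncHive() {
    HiveShim shim = HiveShimLoader.loadHiveShim(conf.getHiveVersion()); // hypothetical accessor
    IMetaStoreClient client = shim.getHiveMetastoreClient(hiveConf);

    // Before: driver.run("ALTER TABLE " + table + " ADD PARTITION ..."); 
    // After: go through the metastore API directly, no SQL concatenation.
    for (Partition partition : newPartitions) {
        client.add_partition(partition);
    }
}
```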

Hudi on Spark refactor