Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

As an optimization, we can save a db lookup in #1 by cache the writeid of modified tables of the transaction. Every modified table will generate a corresponding ALLOC_WRITE_ID_EVENT associate txnid with table writeid generated. Upon we receive commit message of the transaction, we can get the table writeids for the transaction. Thus we don’t need to do a db lookup to find the same information. However, in the initial commit message after bootstrap, we might miss some ALLOC_WRITE_ID_EVENT for the transaction. To address this issue, we will use this optimization unless we saw the open transaction event as well. Otherwise, HMS will still go to the db to fetch the writeids.

Deal with Rename

When we rename a table, writeids are renamed immediately on HMS (TxnHandler.onRename). However, cache won’t update immediately until it catches up the notification log. It is possible the cached table with the same table name is actually another table which is already dropped. To solve the issue, Hive session will fetch tableid of involved tables from db (must bypass cache) at the beginning of the transaction. It can be combined with HMS request for writeid. In every read request, HMS client need to pass tableid as well. HMS will compare the tableid with cached table. If it does not match, HMS will fetch the table from db instead.

External Table

Write id is not valid for external tables. And Hive won’t fetch ValidWriteIdList if the query source is external tables. Without ValidWriteIdList, HMS won’t able to check the staleness of the cache. To solve this problem, we will use the original eventually consistent model for external tables. That is, if the table is external table, Hive will pass null ValidWriteIdList to metastore API/CachedStore. Metastore cache won’t store ValidWriteIdList alongside the entry. When reading, CachedStore always retrieve the current copy. The original notification update will update the metadata of external tables, so we can eventually get updates from external changes.

...

hive_metastore.thriftOld API

New API

get_table(string dbname,string tbl_name)

get_table(string dbname,string tbl_name,,int tableid,string validWriteIdList)

Actually we don’t need to add the new field into every read request because:

...

HMS read API will remain backward compatible for external table. That is, new server can deal with old client. If the old client issue a create_table call, server side will receive the request of create_table with tableid=0 and validWriteIdList=null, and will cache or retrieve the entry regardless(with eventual consistency model). For managed table, tableid and validWriteIdList will be are required and HMS server will throw an exception if validWriteIdList=null.

...

Old API

New API

getTable(String catName,String dbName,String tableName)

getTable(String catName,String dbName,String tableName,int tableid,String validWriteIdList)

Hive.java

...

  1. Generate new write id for every write operation involving managed tables. Since DbTxnManager cache write id for every transaction, so every query will generate at most one new write id for a single table, even if it consists of multiple Hive.java write API calls
  2. Retrieve table write id from config for every read operation if exists Pass the tableid and writeid to the read request of HMS client. It can be retrieved from config (for managed table, it guarantees to be there in config), and pass the write id to HMS API

Changes in Other Components

All other components invoking HMS API directly (bypass Hive.java) will be changed to invoke the newer HMS API. This includes HCatalog, Hive streaming, etc, and other projects using HMS client such as Impala.

For every read request involving table/partitions, HMS client (HiveMetaStoreClient) need to pass a tableid and validWriteIdList string in addition to the existing arguments. tableid and validWriteIdList can be retrieved with txnMgr.getValidWriteIdsAndTblIds(). validWriteIdList can be null if it is external table, as HMS will return whatever in the cache for external table using eventual consistency. But if validWriteIdList=null for managed table, HMS will throw exception. validWriteIdList is a serialized form of ValidReaderWriteIdList. Usually ValidTxnWriteIdList can be obtained from HiveTxnManager using the following code snippet:

Code Block
languagejava
anager txnMgr = TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf);
ValidTxnList txnIds = txnMgr.getValidTxns(); // get global transaction state
ValidTxnWriteIdList
txnWriteIdsTblIds txnWriteIds = txnMgr.getValidWriteIdsgetValidWriteIdsTableIds(txnTables, txnString); // map global transaction state to table specific write id
int tblId = txnWriteIdsTblIds.getTableId(fullTableName);
ValidWriteIdList writeids = txnWriteIds.getTableValidWriteIdList(fullTableName); // get table specific writeid

...