...
- Many APIs take a request structure rather than individual parameters, so ValidWriteIdList needs to be added to the request structure instead.
- Some APIs already take ValidWriteIdList to invalidate outdated transactional statistics. For these we don’t need to change the API signature; we will reuse the ValidWriteIdList to validate cached entries in CachedStore.
Old API | New API |
create_table(Table tbl) | create_table(Table tbl, string validWriteIdList) |
get_table(string dbname, string tbl_name) | get_table(string dbname, string tbl_name, string validWriteIdList) |
RawStore
ObjectStore will not use the additional field. CachedStore will use it either to store the ValidWriteIdList in TableWrapper/PartitionWrapper (on write) or to compare it with the cached ValidWriteIdList (on read).
Old API | New API |
createTable(Table tbl) | createTable(Table tbl, String validWriteIdList) |
getTable(String catName, String dbName, String tableName) | getTable(String catName, String dbName, String tableName, String validWriteIdList) |
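The division of labor described above can be sketched roughly as follows. This is a minimal illustration, not the Hive implementation: `CachedStoreSketch`, its fields, and the string stand-in for the `Table` object are all hypothetical, and the equivalence check is passed in as a parameter because the real comparison is more permissive than plain string equality.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Hypothetical, simplified stand-in; not the real Hive CachedStore.
final class CachedStoreSketch {
    /** Pairs a cached table with the write-id snapshot it was written under. */
    static final class TableWrapper {
        final String table;            // stand-in for the Thrift Table object
        final String validWriteIdList; // snapshot supplied by the writer

        TableWrapper(String table, String validWriteIdList) {
            this.table = table;
            this.validWriteIdList = validWriteIdList;
        }
    }

    private final Map<String, TableWrapper> cache = new ConcurrentHashMap<>();
    // Plays the role of ObjectStore: it never sees the snapshot.
    private final Map<String, String> backingStore = new ConcurrentHashMap<>();
    private final BiPredicate<String, String> equivalent;
    int backingReads = 0; // counts reads that bypass the cache (single-threaded demo only)

    CachedStoreSketch(BiPredicate<String, String> equivalent) {
        this.equivalent = equivalent;
    }

    /** Write path: persist the table and remember the snapshot next to the cached entry. */
    void createTable(String name, String table, String validWriteIdList) {
        backingStore.put(name, table);
        cache.put(name, new TableWrapper(table, validWriteIdList));
    }

    /** Read path: serve from cache only if the caller's snapshot matches the cached one. */
    String getTable(String name, String validWriteIdList) {
        TableWrapper cached = cache.get(name);
        if (cached != null && equivalent.test(cached.validWriteIdList, validWriteIdList)) {
            return cached.table;
        }
        backingReads++;
        return backingStore.get(name);
    }
}
```

Constructing it as `new CachedStoreSketch(String::equals)` gives the strictest possible check; a real deployment would plug in a snapshot-equivalence test instead.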
Use cases
Write
Hive needs to pass a ValidWriteIdList for every metastore write operation (table/partition). CachedStore will store the ValidWriteIdList along with the entry in the cache. Every Hive query (either DDL or DML) will retrieve a ValidWriteIdList at the beginning of the query. Let’s look at some examples.
...
- At the beginning of the query, Hive will retrieve the global transaction state and store it in the config (ValidTxnList.VALID_TXNS_KEY)
- Hive translates the ValidTxnList into the ValidWriteIdList of the table, [13:7,8,12]
- Metastore compares the ValidWriteIdList [13:7,8,12] with the cached one [12:7,8] using TxnIdUtils.checkEquivalentWriteIds. If no transaction committed between the two states, Metastore returns the cached table entry
- If the cached ValidWriteIdList is [12:7], the comparison fails because write id 8 is committed in the cached state but not in the requested one. Metastore will fetch the table from ObjectStore
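The two comparisons in the list above can be reproduced with a small model. This is a sketch under explicit assumptions, not the logic of TxnIdUtils.checkEquivalentWriteIds: the `Snapshot` class is hypothetical, the first number in `[13:7,8,12]` is treated as an exclusive upper bound, and two snapshots are called equivalent when they see the same set of committed write ids. That reading was chosen because it makes both examples come out as the text states.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model; not Hive's ValidWriteIdList or TxnIdUtils.
final class WriteIdSnapshotSketch {
    static final class Snapshot {
        final long upperBound; // assumption: exclusive bound, ids at or above it are not visible
        final Set<Long> open;  // ids below the bound that are not committed

        Snapshot(long upperBound, Set<Long> open) {
            this.upperBound = upperBound;
            this.open = open;
        }

        /** Every id below the bound that is not listed as open counts as committed. */
        Set<Long> committed() {
            Set<Long> ids = new TreeSet<>();
            for (long id = 1; id < upperBound; id++) {
                if (!open.contains(id)) {
                    ids.add(id);
                }
            }
            return ids;
        }
    }

    /** Equivalent means both snapshots see exactly the same committed write ids. */
    static boolean equivalent(Snapshot a, Snapshot b) {
        return a.committed().equals(b.committed());
    }

    public static void main(String[] args) {
        Snapshot request = new Snapshot(13, Set.of(7L, 8L, 12L));
        // Cached [12:7,8]: same committed ids, so the cached entry can be served.
        System.out.println(equivalent(request, new Snapshot(12, Set.of(7L, 8L))));
        // Cached [12:7]: write id 8 is committed only on the cached side, so the cache is bypassed.
        System.out.println(equivalent(request, new Snapshot(12, Set.of(7L))));
    }
}
```

Materializing the committed set is linear in the bound and only suitable for illustration; a production check would walk the sorted open-id lists instead.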
...
From the previous discussion, we know that if the cache is stale, metastore will serve the request from ObjectStore. However, we still need to bring the cache up to date with the latest changes so that future read requests can be served from the cache. This can be done by the existing notification-log-based cache update mechanism. This mechanism constantly polls the notification log and updates the cache with the entries it finds there. However, there is currently no ValidWriteIdList in the notification log. We need to add the ValidWriteIdList of the query to the notification log. During the update, this ValidWriteIdList will be stored in the cache. We may further optimize the process to put only write ids in the notification log. Metastore can then merge those write ids into the existing ValidWriteIdList in the cache to create a snapshot compatible with the actual ValidWriteIdList.
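As an illustration of the write-id-only optimization, the sketch below merges committed write ids into a cached snapshot. The merge rule is an assumption made for this example, since the text does not specify one: the first number of a snapshot is treated as an exclusive upper bound, and any write id the cache has not heard about is conservatively kept as open. `WriteIdMergeSketch` and `applyCommit` are hypothetical names.

```java
import java.util.TreeSet;

// Hypothetical merge rule for illustration; not taken from Hive.
final class WriteIdMergeSketch {
    static final class Snapshot {
        long upperBound;                      // assumption: exclusive bound
        final TreeSet<Long> open = new TreeSet<>();

        Snapshot(long upperBound, long... openIds) {
            this.upperBound = upperBound;
            for (long id : openIds) {
                open.add(id);
            }
        }
    }

    /** Folds one "write id committed" notification into the cached snapshot. */
    static void applyCommit(Snapshot cached, long writeId) {
        // Ids between the old bound and the new id are unknown to the cache: keep them open.
        for (long id = cached.upperBound; id < writeId; id++) {
            cached.open.add(id);
        }
        if (writeId >= cached.upperBound) {
            cached.upperBound = writeId + 1;
        }
        cached.open.remove(writeId);
    }

    public static void main(String[] args) {
        Snapshot cached = new Snapshot(12, 7, 8); // [12:7,8]
        applyCommit(cached, 8);                   // now [12:7]
        applyCommit(cached, 13);                  // now [14:7,12]
        System.out.println(cached.upperBound + ":" + cached.open);
    }
}
```

Keeping unknown ids open errs on the side of a cache miss rather than serving data the reader should not yet see.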
...