...

  1. Many APIs take a request structure rather than individual parameters. For these, ValidWriteIdList needs to be added to the request structure instead.
  2. Some APIs already take a ValidWriteIdList to invalidate outdated transactional statistics. Their signatures do not need to change; the existing ValidWriteIdList will be reused to validate cached entries in CachedStore.


For an HMS write with validWriteIdList=null, HMS won’t cache the entry at all if the table is managed, and will cache it regardless of validWriteIdList if the table is external. For an HMS read with validWriteIdList=null, HMS will return null if the table is managed, and will return the cached entry regardless if the table is external.
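The null-handling rules above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class WriteIdCacheSketch, its put/get methods, and the string payload are hypothetical and are not the actual CachedStore code. An empty Optional stands in for "return null", and the check that compares a non-null ValidWriteIdList against the cached one is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the validWriteIdList=null rules; not the real CachedStore. */
final class WriteIdCacheSketch {

    /** A cached entry: the payload plus the write-ID list it was written with (may be null). */
    static final class Entry {
        final String payload;
        final String validWriteIdList;

        Entry(String payload, String validWriteIdList) {
            this.payload = payload;
            this.validWriteIdList = validWriteIdList;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    /** Write path. Returns true if the entry was cached. */
    boolean put(String tableName, boolean managed, String payload, String validWriteIdList) {
        if (managed && validWriteIdList == null) {
            // Managed table without a write-ID list: do not cache at all.
            return false;
        }
        // External tables are cached regardless of validWriteIdList.
        cache.put(tableName, new Entry(payload, validWriteIdList));
        return true;
    }

    /** Read path. An empty Optional models the "return null" case. */
    Optional<String> get(String tableName, boolean managed, String validWriteIdList) {
        Entry entry = cache.get(tableName);
        if (entry == null) {
            return Optional.empty();
        }
        if (managed && validWriteIdList == null) {
            // Managed table without a write-ID list: the cache cannot answer.
            return Optional.empty();
        }
        // Assumption: the validity comparison for managed tables with a
        // non-null list is omitted from this sketch.
        return Optional.of(entry.payload);
    }
}
```

In this sketch an old client that sends no ValidWriteIdList is served from the cache only for external tables, which matches the rule described above.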

The Thrift API will remain backward compatible; that is, a new server can handle an old client. If an old client issues a create_table call, the server will receive the create_table request with validWriteIdList=null and won’t cache the entry at all if the table is managed.

hive_metastore.thrift

| Old API | New API |
| create_table(Table tbl) | create_table(Table tbl, string validWriteIdList) |
| get_table(string dbname, string tbl_name) | get_table(string dbname, string tbl_name, string validWriteIdList) |

...

All other components that invoke the HMS API directly (bypassing Hive.java) will be changed to invoke the newer HMS API. This includes HCatalog, Hive streaming, etc., as well as other projects that use the HMS client, such as Impala.

Use cases

Write

Hive needs to pass a ValidWriteIdList for every metastore write operation (table/partition). CachedStore will store ValidWriteIdList along with the entry in cache. Every Hive query (either DDL or DML) will retrieve a ValidWriteIdList at the beginning of the query. Let’s look at some examples.

...