Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. HMS will still compare requested writeid and cached table writeid to decide if the request can serve from cache
  2. Every add/remove/alter/rename partition request will increment the table writeid. HMS will mark cached table entry invalid upon processing the first write message from notification log, and mark it valid and tag with the right writeid upon processing the commit message from notification log

Write

There is no change on HMS write side. HMS will write data into db and also put an entry in notification logEvery write request will advance the write id for the table for both DML/DDL. The writeid will be marked committed locally in HMS client. The next read request is guaranteed to read from db until the notification log catch up to the commit message of the transaction commit, since the writeid is newer than the cache (the writeid for the transaction is committed locally, but is not committed on HMS until notification log catch up).

Commit

When the transaction is committed, HMS will put writeid of the modified tables during the query into notification log. HMS will retrieve the writeid from db. As an optimization, HMS client may also pass a flag to indicate this is a read-only transaction, HMS would not pull writeid from db if it is read-only.

...

The other benefit of this approach is it provide right semantic even if a transaction consists of more than one HMS write request. In that case, there are multiple copies of the cache entry could associate with a single writeid. For example, if there are two alter table request in the transaction, one request route to HMS 1 and the other route to HMS 2. We might not be able to tell which copy is applicable for this writeid if not carefully designed. By using notification log and only make cache available upon commit message, we can make sure we apply all request of the transaction. Once the commit message is processed, we can make sure the copy we have in cache is correct after this transaction.

The other thing to note is this approach does not This approach also provide read your own writes guarantee. Reads within the transaction cannot request newest copy which contains its own write, since the transaction is not committed thus the cached copy may not be correct. It will be correct until HMS apply all writes of the transaction from notification log. So the reads might end up reading the old copy from cache (consider the next read route to the other HMS). I am not sure if there’s a use case read the entry just written in Hive. If it does, we need to cache the written copy in client, so the next time client reads, it will be retrieved from client cache to make sure it contains its own writeswill use a writeid list which marks the current transaction committed, that’s guaranteed to be newer than the cached entry, and thus HMS will go to db to fetch the fresh copy.

API changes

When reading, we need to pass a writeid list so HMS can compare it with the cached version. For every commit, we need to tell HMS what is the writeid for the tables changed during the transaction, so HMS can correctly tag the entry with the writeid. We need to add ValidWriteIdList into metastore thrift API and RawStore interfaces, including all table/partition related read calls and commit transaction calls.

...