Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In the previous discussion, we know if the cache is stale, metastore will serve the request from ObjectStore. However, we still need to catch up the cache with the latest change, in order to serve read requests from cache for future request. This can be done by the existing notification log based cache update mechanism. This mechanism constantly poll from notification log, and update the cache with the data entries in notification log. However, currently there is no ValidWriteIdList in notification log. We need to add ValidWriteIdList of the query to the notification log. During the update, this ValidWriteIdList will be in the cache. We may further optimize the process to put only writeids in notification logs. Metastore can merge writeids into the existing ValidWriteIdList in cache to create a compatible snapshot of the actual ValidWriteIdListHowever, there is another scenario need to consider. If a DML consists of multiple HMS writes, all HMS writes will share the same ValidWriteIdList. Another HMS may catch up only 1 writes by the time the DML is committed. This is not acceptable since HMS cannot return partial changes to the user after the DML is committed, that defeats the purpose of a synchronized cache. To solve this problem, we will maintain a committed flag for every entry. During notification log catch up, we will not set the committed flag for a metastore write. Instead, we will wait for the commit event in the notification log. We will set the committed flag in cache until we receive the commit event from notification log. Also note currently commit event only contains txnid, we need to add ValidWriteIdList to the event. During HMS read, If we find the entry is marked uncommitted, HMS will fetch from db.

Here is a complete flow for a cache update when write happen (and illustrated in the diagram):

  1. The ValidWriteIdList of cached table is initially [13:7,8,12]
  2. Metastore 1 get a alter_table request, the new ValidWriteIdList is [14:7,8,12]. The cache entry in Metastore 1 is updated with the new ValidWriteIdList [14:7,8,12]. Metastore 1 also puts the new table along with ValidWriteIdList [14:7,8,12] to notification log. When the transaction is committed, HMS put the commit event into notification log along with ValidWriteIdList
  3. The ValidWriteIdList entry in Metastore 2 is still [13:7,8,12]. A new query try to fetch the table with the latest write id [14:7,8,12]. It is not compatible with the cached version, so Metastore will fetch the table from ObjectSore
  4. The cache update thread in Metastore 2 will read the alter_table event from notification log, update the cache with [14:7,8,12] which is in the notification log, however, this entry is marked as uncommitted
  5. A read for the entry on Metastore 2 will fetch from db since the entry is marked as uncommitted
  6. The cache update thread will further read commit event from notification log, mark the cache committed
  7. The next read from Metastore 2 will server from cache


External Table

Write id is not valid for external tables. And Hive won’t fetch ValidWriteIdList if the query source is external tables. Without ValidWriteIdList, HMS won’t able to check the staleness of the cache. To solve this problem, we will use the original eventually consistent model for external tables. That is, if the table is external table, Hive will pass null ValidWriteIdList to metastore API/CachedStore. Metastore cache won’t store ValidWriteIdList alongside the entry. When reading, CachedStore always retrieve the current copy. The original notification update will update the metadata of external tables, so we can eventually get updates from external changes.