Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Time based synchronization. Cache will be flushed refreshed periodically from the database. However, we need to make sure cache is consistent during the flushingrefresh. One way is using two caches and do a hot switch once new cache is populated with the latest database image. This needs double memory and during the new cache population, we have to block metastore write operations. We might have some optimization such as table by table population to mitigate these issuesThis is part of the current implementation.

  2. Metastore has an event queue log (currently used for implementing replication purposev2). Object change events will be in the queueThe event log captures all the changes to the metadata object. So we shall be able to monitor the event queue log on every cache instance and invalidate changed entries (

    Jira
    serverASF JIRA
    serverId5aa69414-a9e9-3523-82ec-879b028fb15b
    keyHIVE-18056
    ). This might have a minor lag due to the event propagation, but that should be much shorter than the cache eviction

  3. Maintain a unique id for every object in SQL database (eg, modified timestamp, version id, or md5 signature), which is different every time we change the object in SQL database. We will check the DB if the object is changed for every cache access. However, even check the timestamp in SQL database might take some time if the database latency is high

  4. In addition, we might optionally add a “flush cache” statement in Hive in case user want to enforce a cache flush. However, this should be an admin privilege statement and will complicate our security model.

If the requirements present, we can also work on implementing a cache consistency protocol among multiple metastore instances. Such a protocol will need to replicate changes to all the active metastore before finally committing the change and responding to a client write/update request (perhaps using something similar to a two phase commit protocol). 

Case Study: Presto

Presto has a global metastore client cache in its coordinator (HiveServer 2 equivalent). Note Presto currently only has 1 coordinator in a cluster so it does not suffer cache consistency problem if user only changes objects via Presto. However, if user also changes objects in metastore via Hive, it suffers the same issue.

...

Further, in our design, metastore will read all metastore objects once at startup time (prewarm) and there is no eviction of the metastore objects ever since. The only time we change cache is when user requested a change through metastore client (eg, alter table, alter partition), and upon receiving metastore event of changes made by other metastore server. Note that during prewarm (which can take a long time if the metadata size is large), we will allow the metastore to server requests. If a table has already been cached, the requests for that table (and its partitions and statistics) can be served from the cache. If the table has not been prewarmed yet, the requests for that table will be served from the database (

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyHIVE-18264
).

Currently, the size of the metastore cache can be restricted by a combination of cache whitelist and blacklist patterns (

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyHIVE-18056
). Before a table is cached, it is checked against these filters to decide if it can be cached or not. Similarly, when a table is read, if it does not pass the above filters, it is read from the database and not the cache.

...

For remote metastore updates, we will either use a periodical synchronization (current approach), or monitor event queue log and fetch affected objects from SQL database (

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyHIVE-18661
). Both options are discussed already in “Cache Consistency” section.

...

We already have aggregated stats module in ObjectStore (

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyHIVE-10382
). However, the base column statistics is not cached and needs to fetch from SQL database everytime needed. We plan to port aggregated stats module to CachedStore to use cached column statistics to do the calculation. One design choice yet to make is whether we need to cache aggregated stats, or calculate them on the fly in the CachedStore assuming all column stats are in memory. But in either case, once we turn on aggregate stats in CacheStore, we shall turn off it in ObjectStore (already have a switch) so we don’t do it twice.

...