...

A pool has a unique name and a type (e.g. "cache"), which is defined by the pool implementation. A pool defines total limits for the components of its type that it manages - e.g. the "searcherFieldValueCache" pool knows how to handle components of the SolrCache type: it manages all SolrCache instances responsible for field value caching in all SolrCore-s, and it defines total limits for all searcher field value caches across all SolrCore-s. There can be multiple pools of the same type (e.g. "cache") under different names and with different parameters (total limits, schedule, etc.), each managing a different set of components. Pool configuration specifies the initial limits as well as the interval between management runs - the resource manager is responsible for executing each pool's management at the specified intervals.

Limits are expressed as arbitrary name / value pairs, which make sense for the specific pool implementation - e.g. for a "cache" pool type the supported limits are "maxRamMB" and "maxSize". By convention limits use the same names as the component limits (controlled parameters - see below).

...

  • ResourceManager - base class for resource management. Only one instance of resource manager is created per Solr node (CoreContainer)
    • DefaultResourceManager - default implementation.
  • ResourceManagerPoolFactory - base class for creating type-specific pool instances.
    • DefaultResourceManagerPoolFactory - default implementation, containing the default registry of pool implementations (currently just "cache" → CacheManagerPool).
  • ResourceManagerPool - base class for managing components in a pool.
    • CacheManagerPool - pool implementation specific to cache resource management.
  • ChangeListener - listener interface for component limit changes. Pools report any changes to their managed components' limits via this interface.
  • ManagedComponent - interface for components to be managed
  • ManagedComponentId - hierarchical unique component ID
  • SolrResourceContext - component's context that helps to register and unregister the component from its pool(s)
  • ResourceManagerHandler - public API for pool operations (CRUD) and resource operations (RUD)

CacheManagerPool implementation

This pool implementation manages SolrCache components, and it supports "maxSize" and "maxRamMB" limits.

The control algorithm consists of two phases:

  • hard limit control - applied only when the total monitored resource usage exceeds the pool limit. In this case the controlled parameters of all components are reduced evenly, in proportion to the ratio of actual usage to the total limit.
  • optimization - performed only when the total limit is not exceeded, because optimization may not only shrink but also expand cache sizes, and expanding caches while the limit is already exceeded would make a bad situation worse. Optimization uses the hit ratio to determine whether to shrink or to expand each cache individually, while still staying within the total resource limits.
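The hard limit phase can be illustrated with a minimal sketch. This is not the code from the PR: the class name HardLimitSketch, the method scaleDown and the use of plain maps are assumptions made only for illustration. It shows one way of reducing every controlled limit by the ratio of actual usage to the pool limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the proposed framework API.
final class HardLimitSketch {

    /**
     * Returns new per-component limits. When total usage exceeds the pool
     * limit, every limit is divided by (totalUsage / poolLimit); otherwise
     * the limits are returned unchanged.
     */
    static Map<String, Long> scaleDown(Map<String, Long> currentLimits,
                                       long totalUsage, long poolLimit) {
        Map<String, Long> result = new LinkedHashMap<>(currentLimits);
        if (totalUsage <= poolLimit || totalUsage <= 0) {
            return result; // within the pool limit: this phase does nothing
        }
        double factor = (double) poolLimit / totalUsage; // always < 1.0 here
        for (Map.Entry<String, Long> e : result.entrySet()) {
            // Round down so the scaled limits never add up above the target.
            e.setValue((long) Math.floor(e.getValue() * factor));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> limits = new LinkedHashMap<>();
        limits.put("cacheA", 100L); // hypothetical component names
        limits.put("cacheB", 300L);
        // Usage is twice the pool limit, so every limit is halved.
        System.out.println(scaleDown(limits, 800L, 400L));
    }
}
```

Because all components are scaled by the same factor, their relative sizes are preserved; a real pool would additionally have to push the new values to the components and notify listeners.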

Some background on hitRatio vs. cache size: the relationship between cache size and hit ratio is positive and monotonic, i.e. a larger cache size leads to a higher hit ratio (an extreme example would be an unlimited cache that has perfect recall because it keeps all items). On the other hand, there is a point where increasing the size yields diminishing returns in terms of hit ratio once we also consider the cost of the resources it consumes. So there is a sweet spot in the cache size where the hit ratio is still "good enough" but the resource consumption is minimized. In the proposed PR this hit ratio threshold is 0.6, which is probably too high for realistic loads (should we use something like 0.4?).
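As a rough sketch of how a threshold-driven optimization step could look, the following assumes that a cache below the target hit ratio is expanded and a cache above it is shrunk, each by a fixed step, and that the proposals are scaled back if they would exceed the pool limit. The step size, the class and method names, and the exact decision rule are assumptions; the source only states that the hit ratio decides between shrinking and expanding and that the threshold in the PR is 0.6.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the PR's actual algorithm may differ.
final class HitRatioOptimizerSketch {
    static final double TARGET_HIT_RATIO = 0.6; // threshold quoted in the text
    static final double STEP = 0.1;             // assumed adjustment step

    /** Proposes new maxSize values while keeping their sum within poolLimit. */
    static Map<String, Long> optimize(Map<String, Long> maxSizes,
                                      Map<String, Double> hitRatios,
                                      long poolLimit) {
        Map<String, Long> proposed = new LinkedHashMap<>();
        long total = 0;
        for (Map.Entry<String, Long> e : maxSizes.entrySet()) {
            // Caches without a reported hit ratio are left unchanged.
            double hitRatio = hitRatios.getOrDefault(e.getKey(), TARGET_HIT_RATIO);
            double factor = 1.0;
            if (hitRatio < TARGET_HIT_RATIO) {
                factor = 1.0 + STEP; // too many misses: try a larger cache
            } else if (hitRatio > TARGET_HIT_RATIO) {
                factor = 1.0 - STEP; // good enough: give some resources back
            }
            long size = Math.max(1, Math.round(e.getValue() * factor));
            proposed.put(e.getKey(), size);
            total += size;
        }
        if (total > poolLimit) {
            // Scale all proposals back so the pool limit is still respected.
            double scale = (double) poolLimit / total;
            for (Map.Entry<String, Long> e : proposed.entrySet()) {
                e.setValue((long) Math.floor(e.getValue() * scale));
            }
        }
        return proposed;
    }
}
```

Lowering TARGET_HIT_RATIO (for example to the 0.4 floated above) would make this sketch shrink caches more readily, which is the trade-off the threshold question is about.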

...

Components are always identified by a unique component ID, specific to this instance of a component, because there may be multiple instances of the same component under the same logical path. This is similar to the model that already works well with complex Solr metrics (such as gauges), where the life-cycles of logically identical metrics often overlap. E.g. when re-opening a searcher a new instance of SolrIndexSearcher is created, but the old one still remains open for some time. The new instance proceeds to register its caches as managed components (the respective pools then correctly reflect the fact that there is suddenly a spike in resource usage because the old searcher is not closed yet). After a while the old searcher is closed, at which point it unregisters its old caches from the framework, which again correctly reflects the fact that some resources have been released.

Proposed Changes

Internal changes

Framework and pool bootstraps

CoreContainer creates and initializes a single instance of ResourceManager in its load() method. This instance is configured using a new section in /clusterprops.json/poolConfigs. Several default pools are always created (at the moment they are all related to SolrIndexSearcher caches) but their parameters can be customized using the clusterprops.

SolrIndexSearcher.register() now also registers all its caches in their respective pools and unregisters them on close().

Other changes

  • SolrMetricsContext is now, as a rule, created for each child component, and it also includes the component's metric names and scope. This simplifies managing metrics and obtaining metrics snapshots, and it was needed in order to construct fully-qualified component IDs for the resource API.
  • SolrCache.warm(...) also re-sets the limits (such as maxSize and maxRamMB) using the old cache's limits - this is to preserve custom limits from the old instance when a new instance is a replacement for the old one.

User-level APIs

Config files

The only change in configuration is a new optional section in /clusterprops.json/poolConfigs. This section contains a map of pre-defined pools and their initial limits and properties.

Remote API

There's a new handler ResourceManagerHandler accessible at /admin/resources (v1 API only for now - I plan to add v2 API once the functionality stabilizes).

NOTE: Persisting the changes is not yet implemented in the PR. Also, the handler supports only local operations - it needs to be modified to dispatch operations to all live nodes.

Operations that select named items (pools or resources) all treat the name as a prefix, i.e. the selected items are those whose names start with the prefix provided as the name parameter.
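The prefix semantics can be sketched as follows. This is an illustration, not the handler's code: the class PrefixSelectSketch and the method selectByPrefix are hypothetical, and apart from "searcherFieldValueCache" the pool names in the usage note are made up.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration of prefix-based selection of named items.
final class PrefixSelectSketch {

    /** Returns all entries whose name starts with the given prefix. */
    static <V> SortedMap<String, V> selectByPrefix(Map<String, V> items, String prefix) {
        SortedMap<String, V> selected = new TreeMap<>();
        for (Map.Entry<String, V> e : items.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                selected.put(e.getKey(), e.getValue());
            }
        }
        return selected;
    }
}
```

With pools named "searcherFieldValueCache", "searcherFilterCache" and "other", the prefix "searcher" would select the first two, and an empty prefix would select all three.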

The following operations are supported:

  • Pool operations
    • LIST - list selected pools and their limits and parameters.
    • STATUS - list selected pools and their components, and the current total resource consumption per pool.
    • CREATE - create a new pool with the provided limits (limit.<name>=<value>) and parameters (param.<name>=<value>).
    • DELETE - delete an existing pool (and unregister its components).
    • SETLIMITS - set, modify or delete the limits of existing pool(s).
    • SETPARAMS - set, modify or delete the parameters of existing pool(s).
  • Resource operations
    • LIST - list components in the specified pool(s) and their current resource limits.
    • STATUS - list components in the specified pool(s), their current limits and their current monitored values.
    • GETLIMITS - get the current limits of the specified component(s).
    • SETLIMITS - set the current limits of the specified component(s).
    • DELETE - unregister the specified components from the pool(s).

Compatibility, Deprecation, and Migration Plan

...

Users can migrate to this framework gradually by specifying concrete resource limits in place of the defaults - the default settings create unlimited pools for searcher caches so the back-compat behavior remains the same.

  • When will we remove the existing behavior?

Test Plan

An integration test, TestCacheDynamics, has been created to show the behavior of cache resource management under changing resource constraints. More tests are needed on a real cluster.

Rejected Alternatives
