...

  • ResourceManager - base class for resource management. Only one instance of resource manager is created per Solr node (CoreContainer)
    • DefaultResourceManager - default implementation.
  • ResourceManagerPoolFactory - base class for creating type-specific pool instances.
    • DefaultResourceManagerPoolFactory - default implementation, containing default registry of pool implementations (currently just cache → CacheManagerPool).
  • ResourceManagerPool - base class for managing components in a pool.
    • CacheManagerPool - pool implementation specific to cache resource management.
  • ChangeListener - listener interface for component limit changes. Pools report any changes to their managed components' limits via this interface.
  • ManagedComponent - interface for components to be managed
  • ManagedComponentId - hierarchical unique component ID
  • SolrResourceContext - component's context that helps to register and unregister the component from its pool(s)
  • ResourceManagerAPI - public v2 API for pool operations (CRUD) and component operations (RUD)

CacheManagerPool implementation

...

CoreContainer creates and initializes a single instance of ResourceManager in its load() method. This instance is configured using a new file, /resourceMgr/managerConfig.json. Several default pools are always created (at the moment they are all related to SolrIndexSearcher caches), but their parameters can be customized using /resourceMgr/poolConfigs.json.

SolrIndexSearcher.register() now also registers all its caches in their respective pools and unregisters them on close().

...

  • SolrMetricsContext is now, as a rule, created for each child component, and it also includes the component's metric names and scope. This simplifies managing metrics and obtaining metrics snapshots, and it was needed in order to construct fully-qualified component IDs for the resource API.
  • SolrCache.warm(...) also resets the limits (such as maxSize and maxRamMB) using the old cache's limits. This preserves custom limits from the old instance when a new instance replaces it.
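
The limit hand-over during warming can be pictured with a small model. This is only an illustrative sketch in Python, not the Solr code: the SketchCache class and its fields are hypothetical stand-ins for a SolrCache implementation, and only the limit names maxSize and maxRamMB come from the text above.

```python
class SketchCache:
    """Hypothetical stand-in for a cache with adjustable limits."""

    def __init__(self, max_size, max_ram_mb):
        # Limits as configured at construction time.
        self.limits = {"maxSize": max_size, "maxRamMB": max_ram_mb}

    def warm(self, old):
        # Copy the old instance's limits so that any runtime adjustments
        # survive the replacement of the old cache by this new one.
        self.limits = dict(old.limits)


old_cache = SketchCache(1000, -1)
old_cache.limits["maxSize"] = 200      # limit changed at runtime
new_cache = SketchCache(1000, -1)      # new instance starts from the static config
new_cache.warm(old_cache)
print(new_cache.limits)
```

Copying the dictionary rather than sharing it keeps later changes to one instance from leaking into the other.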

User-level APIs

Config files

The ResourceManager instance that is created and initialized uses the configuration in /resourceMgr/managerConfig.json. This contains the typical Solr plugin info, i.e. the implementation class and its initArgs.

Pool configurations are kept in ZK in /resourceMgr/poolConfigs.json. Changes made to this file via API are watched by all live nodes, and upon change each node refreshes its internal copy of the config and re-configures local pools to match the config.

The content of the pool configurations file is a serialization of ResourceManagerAPI.ResourcePoolConfigs, which is basically a map of pool names to their configurations. Each pool configuration consists of the following:

  • name (required) - unique pool name
  • type (required) - one of the supported pool types (currently only "cache" is supported)
  • poolParams (optional) - a map of arbitrary key-value pairs containing runtime parameters of the pool. Currently supported parameters:
    • scheduleDelaysSeconds - how often the resource manager will invoke the pool's manage() method, which checks and controls the resource usage of its components.
  • poolLimits (optional) - a map of arbitrary key-value pairs containing total resource limits for the pool. Eg. for "cache" type pools these are currently:
    • maxSize - defines the total maximum number of elements in all cache instances in the pool
    • maxRamMB - defines the total maximum memory use of all cache instances in the pool

There are several pre-defined pools, which can be listed using the /cluster/resources API.

Example configuration in /resourceMgr/poolConfigs.json:

{
  "configs": {
    "searcherUserCache": {
      "name": "searcherUserCache",
      "type": "cache",
      "poolParams": {},
      "poolLimits": {
        "maxSize": 1000,
        "maxRamMB": -1
      }
    },
    ...
  }
}
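
To make the field rules concrete, here is a minimal Python sketch that parses such a document and checks the required and optional fields listed above. It is not the actual ResourceManagerAPI.ResourcePoolConfigs deserialization; the function name validate_pool_configs and the error messages are assumptions made for illustration.

```python
import json

SUPPORTED_TYPES = {"cache"}  # the only pool type the text lists as supported


def validate_pool_configs(raw: str) -> dict:
    """Parse a poolConfigs document and check the fields described above."""
    doc = json.loads(raw)
    configs = doc["configs"]
    for key, cfg in configs.items():
        # "name" and "type" are required for every pool.
        for field in ("name", "type"):
            if field not in cfg:
                raise ValueError(f"pool {key!r}: missing required field {field!r}")
        if cfg["type"] not in SUPPORTED_TYPES:
            raise ValueError(f"pool {key!r}: unsupported type {cfg['type']!r}")
        # "poolParams" and "poolLimits" are optional; default them to empty maps.
        cfg.setdefault("poolParams", {})
        cfg.setdefault("poolLimits", {})
    return configs


example = """
{"configs": {"searcherUserCache": {
    "name": "searcherUserCache", "type": "cache",
    "poolParams": {}, "poolLimits": {"maxSize": 1000, "maxRamMB": -1}}}}
"""
pools = validate_pool_configs(example)
print(pools["searcherUserCache"]["poolLimits"])
```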

Currently the PR doesn't use other configuration files or system properties.

Remote API

There's a new v2 ResourceManagerAPI accessible at /cluster/resources for managing cluster-level aspects of the framework (such as pool configurations, their limits and parameters) and at /node/resource for managing node-level parameters (such as directly modifying individual components' limits).

Changes to pool configurations are persisted in ZK. Each node also watches this file for changes; upon a change it reloads the config and re-configures local pools to match it. This may include removing or adding pools, and changing their limits and parameters.
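
One way to picture the re-configuration step is as a comparison between the pools a node currently has and the pools in the freshly loaded config. The Python sketch below is only a model under that assumption, not the PR's implementation; plan_reconfiguration is a hypothetical helper, and the pool names poolA and poolB are made up.

```python
def plan_reconfiguration(local: dict, desired: dict) -> dict:
    """Compare local pools with the desired config, both keyed by pool name."""
    to_add = sorted(desired.keys() - local.keys())
    to_remove = sorted(local.keys() - desired.keys())
    # Pools present on both sides need work only if their config differs.
    to_update = sorted(
        name for name in desired.keys() & local.keys()
        if desired[name] != local[name]
    )
    return {"add": to_add, "remove": to_remove, "update": to_update}


local_pools = {"searcherUserCache": {"maxSize": 1000}, "poolA": {"maxSize": 10}}
new_config = {"searcherUserCache": {"maxSize": 500}, "poolB": {"maxSize": 20}}
plan = plan_reconfiguration(local_pools, new_config)
print(plan)
```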

Per-node (component) operations that select named items all treat the name as a prefix, ie. the selected items are those that match the prefix provided as the name parameter. This is required because of the quickly changing identifiers of the components.
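
The prefix rule can be sketched in a few lines of Python. The component names below are hypothetical, since the exact format of ManagedComponentId is not given here; the point is only that a stable leading part of the identifier selects every component beneath it, even when a trailing part (such as a searcher instance) keeps changing.

```python
def select_by_prefix(names, prefix):
    """Return the names that start with the given prefix, in input order."""
    return [name for name in names if name.startswith(prefix)]


# Hypothetical hierarchical component IDs, for illustration only.
components = [
    "coll1.shard1.replica1.searcher1.filterCache",
    "coll1.shard1.replica1.searcher1.queryResultCache",
    "coll2.shard1.replica1.searcher7.filterCache",
]
selected = select_by_prefix(components, "coll1.shard1")
print(selected)
```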

Update operations that use maps of key-value pairs as payload all use the same "partial update" semantics: new or existing values with the same keys are created or updated, null values cause existing keys to be deleted, and all other existing key-value pairs are left unchanged.
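
The "partial update" semantics can be expressed as a small merge function. This Python sketch illustrates the rule as stated, not the code in the PR; apply_partial_update is a hypothetical name, and Python's None plays the role of a JSON null.

```python
def apply_partial_update(existing: dict, update: dict) -> dict:
    """Merge an update into existing values using partial-update semantics."""
    result = dict(existing)  # keys not mentioned in the update stay unchanged
    for key, value in update.items():
        if value is None:
            # A null value deletes the key if it exists.
            result.pop(key, None)
        else:
            # Any other value creates or overwrites the key.
            result[key] = value
    return result


limits = {"maxSize": 1000, "maxRamMB": -1}
updated = apply_partial_update(limits, {"maxSize": 500, "maxRamMB": None})
print(updated)
```

The function returns a new map and leaves its input untouched, which makes it easy to compare the state before and after an update.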

The following operations are supported:

  • Pool operations
    • Read API (GET):
      • (no payload): lists selected pools and their limits and parameters.
      • Additional boolean request parameters are supported:
        • components - list also all components registered in the pool
        • limits - show limit values for each pool
        • params - show pool parameters
        • values - show current aggregated total values (resource usage) of all components in the pool
    • Write API (POST):
      • create - create a new pool using the provided ResourcePoolConfig configuration, containing the pool name, pool type, and its initial parameters and resource limits.
      • delete - delete an existing pool (and unregister its components). The name of the pool to delete can be obtained from the string payload or from the path (e.g. /cluster/resources/myPool).
      • setlimits - set, modify or delete existing pool(s) limits. The payload is a map of arbitrary key / value pairs.
      • setparams - set, modify or delete existing pool(s) parameters. The payload is a map of arbitrary key / value pairs.
  • Component operations
    • Read API (GET):
      • (no payload): list components in specified pool(s) and their current resource limits.
    • Write API (POST):
      • setlimits - set the current limits of specified component(s). The payload is a map of key / value pairs defining the updated limits.
      • delete - unregister specified components from the pool(s).

Compatibility, Deprecation, and Migration Plan

...

Users can migrate to this framework gradually by specifying concrete resource limits in place of the defaults - the default settings create unlimited pools for searcher caches so the back-compat behavior remains the same.

  • When will we remove the existing behavior?

Test Plan

An integration test TestCacheDynamics has been created to show the behavior of cache resource management under changing resource constraints. Obviously more tests are needed on a real cluster.

An integration test TestResourceManagerIntegration exercises the REST API. 


Ref Guide content

Most of the material in this SIP, plus example configurations, will become a new section in the Ref Guide. 

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.