Status

Current state: Implemented

Discussion thread: #solr-scaling Slack channel (https://the-asf.slack.com/archives/CEKUCUNE9/p1585240648004600)

JIRA:

  • SOLR-14275
  • SOLR-14409
  • SOLR-14613

Released: (hopefully targeting 9.0.0?)

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.

...

Chris Hostetter suggested that in light of Solr being more and more often used in containerized environments (Docker, Kubernetes), which already provide their own support for up- / down-scaling, the built-in V2 framework in Solr should offer only a bare minimum to support the most common cases (e.g. equal placement of replicas across nodes, by # of cores & freedisk). The scope of the Solr autoscaling would be to adequately support the basic needs of standalone (non-containerized) Solr clusters. All other more sophisticated scenarios should be left out, but Solr should provide API hooks to make it easier for external frameworks to react, optimize the layout, and resolve resource constraints (e.g. too many / too few nodes for the # of replicas).
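
As a rough illustration of the "bare minimum plus hooks" idea (all names and signatures below are invented for this page, not an actual Solr API):

    // Hypothetical sketch only -- not an actual Solr API.
    // The bare minimum a pluggable placement framework could expose:
    // Solr asks the plugin where to put new replicas, and notifies external
    // frameworks (e.g. a Kubernetes operator) about node changes.
    import java.util.List;
    import java.util.Map;

    public interface PlacementPlugin {

      // Decide which live node should host each new replica of a shard.
      // 'coresPerNode' is a snapshot of the current core count per live node.
      List<String> placeReplicas(String collection, String shard, int replicaCount,
                                 Map<String, Integer> coresPerNode);

      // Hook for external frameworks: called when nodes join or leave, so the
      // framework can react, rebalance, or resize the cluster.
      default void onNodesChanged(List<String> addedNodes, List<String> lostNodes) {}
    }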

Clean-cut pluggable APIs

Concerns were raised that the current autoscaling implementation is too intrusive, regardless of its strengths and deficiencies. Ilan Ginzburg, Noble Paul, and Andrzej Bialecki are investigating what a minimal set of APIs could look like. Others proposed a spike to investigate how much effort it would take to remove the autoscaling completely, clean up the existing APIs, and add it back as a plugin (using the Plugins framework).

Requirements for the V2 policy engine

...

  1. Use-case #1 is for fault tolerance. Putting more than one replica on the same node does not help with redundancy.
  2. Use-cases #2 and #3 are for fault tolerance and cost minimization:
    1. You want to survive one out of three zones failing so you need to distribute shards equally among at least two zones.
    2. You want to have (approximately) equal capacity for each shard in these zones so a zone outage doesn't eliminate a majority of your capacity.
    3. You want to have at least one replica of each shard in a given zone so that you can minimize cross-AZ traffic for searching (which is chargeable in AWS).
    4. Taking all the above scenarios into account, either all shards of a collection must be hosted in the same two zones, or all shards must be hosted equally in all three zones, to provide both fault tolerance and minimal inter-AZ cost (see the sketch after this list).
  3. Use-case #4 is useful for workload partitioning between writes and reads, e.g. you might want to pin TLOG replicas to a node type optimized for indexing and PULL replicas to nodes optimized for searching.
  4. Use-case #5 is for workload partitioning between analytics, search and .system collections so you can have collections specific to those workloads on nodes optimized for those use-cases.
  5. Use-case #6 is useful for implementing autoscaling node groups such that a specific number of nodes is always available and the rest come and go, without causing data loss or moving data each time we scale down. It is also useful for workload partitioning between analytics and search use-cases, e.g. we might want to dedicate a few replicas to streaming expressions and Spark jobs on nodes optimized for those, and keep the other replicas for search only.
  6. Use-case #7 is for balanced utilization of all nodes. This is tricky with disk usage or heavily/lightly loaded collections.
    Multi-tenant use-cases (think a collection per tenant) are trickier because now you want to take care of blast radius as well in case a node or zone goes down.
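
As a small illustration of the zone-spread arithmetic behind use-cases #2 and #3 (hypothetical helper code, not part of any engine): distributing replicas as evenly as possible across zones guarantees that losing any single zone takes out at most ceil(replicas / zones) replicas of a shard.

    // Hypothetical helper: spread 'replicaCount' replicas of one shard as
    // evenly as possible across the given zones.
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class ZoneSpread {

      public static Map<String, Integer> replicasPerZone(List<String> zones, int replicaCount) {
        Map<String, Integer> assignment = new LinkedHashMap<>();
        int base = replicaCount / zones.size();   // every zone gets at least this many
        int extra = replicaCount % zones.size();  // the first 'extra' zones get one more
        for (int i = 0; i < zones.size(); i++) {
          assignment.put(zones.get(i), base + (i < extra ? 1 : 0));
        }
        return assignment;
      }

      public static void main(String[] args) {
        // 4 replicas over 3 AZs -> {us-east-1a=2, us-east-1b=1, us-east-1c=1}
        System.out.println(replicasPerZone(List.of("us-east-1a", "us-east-1b", "us-east-1c"), 4));
      }
    }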

...

From Ilan:

A minimalistic autoscaling would offer the following properties, expressed in very vague terms:

  • Prevent multiple replicas of a shard from being placed on the same node,
  • Try to spread replicas across all nodes (random placement is ok),
  • Try to spread replicas of different shards of the same collection across all nodes (so that concurrent execution of queries on multiple replicas makes use of the available compute power),
  • Try to spread leaders across all nodes.

Then, a periodic task/trigger that moves replicas and leaders around would correct any imbalance resulting from the above operations, which can therefore be imperfect (hence the use of "try" in the descriptions). Such a task would also cover adding a new, empty node, for example.

On top of the above, being able to spread replicas not only across nodes but across groups of nodes (for example, groups representing AZs) would be helpful.
Automatically adding a replica when a node goes down would also be nice to have.
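
A minimal sketch of a placement routine with the properties above (illustrative code only, every name is invented): never co-locate replicas of the same shard, and otherwise prefer the least-loaded node, which approximates the "spread" rules.

    // Hypothetical placement helper, not the actual engine.
    import java.util.Map;
    import java.util.Set;

    public class MinimalPlacement {

      // Pick a node for one new replica of a shard: skip nodes that already
      // host a replica of that shard, then take the node with the fewest cores.
      public static String pickNode(Map<String, Integer> coresPerNode,
                                    Set<String> nodesWithShard) {
        return coresPerNode.entrySet().stream()
            .filter(e -> !nodesWithShard.contains(e.getKey())) // no two replicas per node
            .min(Map.Entry.comparingByValue())                 // spread by core count
            .map(Map.Entry::getKey)
            .orElseThrow(() -> new IllegalStateException("not enough nodes for this shard"));
      }

      public static void main(String[] args) {
        Map<String, Integer> cores = Map.of("node1", 5, "node2", 3, "node3", 3);
        // node2 already hosts the shard -> the replica goes to node3
        System.out.println(pickNode(cores, Set.of("node2")));
      }
    }

A periodic rebalancing task as described above would then compensate for the imperfect, purely greedy choices this kind of routine makes.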

...

Please add your user stories in the following sections...

...

Additionally, due to the performance issues with the V1 policy engine, the new one should be the default for clusters larger than N nodes (where N > 100?). It should still be possible to opt out and fall back to the current engine.

Phase 1 of the migration: we can implement a cluster- and collection-level property that defines which assignment strategy to use (with the collection-level property overriding the cluster-level one, or the default if both are missing). This property would select one of the existing AssignStrategy implementations or a user-provided custom one. This effectively allows users to switch policy engines on a per-collection basis.
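
A sketch of that lookup (the property name, the engine identifiers, and reusing the N > 100 threshold from the previous paragraph are all assumptions, not a committed API):

    // Hypothetical resolver for the per-collection assignment strategy.
    import java.util.Map;

    public class AssignStrategyResolver {

      static final String PROP = "assignStrategy"; // assumed property name

      public static String resolve(Map<String, String> collectionProps,
                                   Map<String, String> clusterProps,
                                   int liveNodeCount) {
        String strategy = collectionProps.get(PROP);             // collection-level wins
        if (strategy == null) strategy = clusterProps.get(PROP); // then cluster-level
        if (strategy == null) {
          // assumed default: the new engine above the node-count threshold
          strategy = liveNodeCount > 100 ? "policyV2" : "legacy";
        }
        return strategy; // e.g. "legacy", "policyV2", or a custom class name
      }
    }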

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

...