Status

Current state: Implemented

Discussion thread: #solr-scaling Slack channel (https://the-asf.slack.com/archives/CEKUCUNE9/p1585240648004600)

JIRA:

  • SOLR-14275 (https://issues.apache.org/jira/browse/SOLR-14275)
  • SOLR-14409 (https://issues.apache.org/jira/browse/SOLR-14409)
  • SOLR-14613 (https://issues.apache.org/jira/browse/SOLR-14613)

Released: targeting 9.0.0 (tentative)

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.

...

Chris Hostetter suggested that, in light of Solr being used more and more often in containerized environments (Docker, Kubernetes) which already provide their own support for up- / down-scaling, the built-in V2 framework in Solr should offer only a bare minimum to support the most common cases (e.g. equal placement of replicas across nodes, by # of cores & free disk). The scope of Solr autoscaling would be to adequately support the basic needs of standalone (non-containerized) Solr clusters. All other, more sophisticated scenarios should be left out, but Solr should provide API hooks to make it easier for external frameworks to react, optimize the layout, and resolve resource constraints (e.g. too many / too few nodes for the # of replicas).
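As a rough illustration only (not actual Solr code; all class and field names below are made up), the "bare minimum" placement described above could be as simple as sorting candidate nodes by core count and free disk space:

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    // Hypothetical snapshot of a node's load; field names are illustrative only.
    class NodeMetrics {
        final String nodeName;
        final int coreCount;
        final long freeDiskBytes;

        NodeMetrics(String nodeName, int coreCount, long freeDiskBytes) {
            this.nodeName = nodeName;
            this.coreCount = coreCount;
            this.freeDiskBytes = freeDiskBytes;
        }
    }

    public class BasicPlacementSketch {

        // Spread replicas evenly: fewest cores first, most free disk as tie-breaker.
        static List<String> pickNodes(List<NodeMetrics> nodes, int replicaCount) {
            return nodes.stream()
                    .sorted(Comparator.comparingInt((NodeMetrics n) -> n.coreCount)
                            .thenComparingLong(n -> -n.freeDiskBytes))
                    .limit(replicaCount)
                    .map(n -> n.nodeName)
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<NodeMetrics> cluster = List.of(
                    new NodeMetrics("node1:8983_solr", 12, 500L << 30),
                    new NodeMetrics("node2:8983_solr", 8, 200L << 30),
                    new NodeMetrics("node3:8983_solr", 8, 800L << 30));
            // Prints [node3:8983_solr, node2:8983_solr]: the two least-loaded nodes.
            System.out.println(pickNodes(cluster, 2));
        }
    }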

Clean-cut pluggable APIs

Concerns were raised that the current autoscaling implementation is too intrusive, regardless of its strengths and deficiencies. Ilan Ginzburg, Noble Paul, and Andrzej Bialecki are investigating what a minimal set of APIs could look like. Others proposed a spike to investigate how much effort it would take to remove the autoscaling completely, clean up the existing APIs, and add it back as a plugin (using the Plugins framework).
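To make the discussion more concrete, a "clean-cut" pluggable API could be reduced to a single read-only "compute placements" call. The following Java sketch is purely hypothetical, the interface and method names are not the ones being proposed; it only illustrates the size of the surface under discussion:

    import java.util.Collection;
    import java.util.Map;

    // Hypothetical minimal placement API: request in, decisions out, no side effects.
    // None of these names are real Solr types.
    interface PlacementRequest {
        String getCollection();
        String getShard();
        int getReplicaCount();
        Collection<String> getLiveNodes();
    }

    interface PlacementDecision {
        String getNode();
        String getShard();
    }

    interface PlacementPlugin {
        // Reads the supplied cluster view and returns placement decisions; it never
        // mutates cluster state itself, which keeps implementations easy to swap.
        Collection<PlacementDecision> computePlacements(
                PlacementRequest request,
                Map<String, Map<String, Object>> nodeMetrics) throws Exception;
    }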

Requirements for the V2 policy engine

...

Additionally, due to the performance issues with the V1 policy engine, the new one should be the default for clusters larger than N nodes (where N > 100?). It should still be possible to opt out and fall back to the current engine.

Phase 1 of the migration: we can implement a cluster- and collection-level property that defines which assignment strategy to use (the collection-level property overriding the cluster-level one, or the default if neither is set). This property would select one of the existing AssignStrategy implementations or a user-provided custom one. This effectively allows users to switch policy engines on a per-collection basis.
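A minimal sketch of that property lookup is shown below. The property name, strategy names, and class names are hypothetical; only the precedence rule (collection-level overrides cluster-level, which overrides the default) comes from the proposal:

    import java.util.Map;
    import java.util.Optional;

    public class AssignStrategyResolverSketch {

        static final String PROP = "assignStrategy";     // hypothetical property name
        static final String DEFAULT_STRATEGY = "legacy"; // hypothetical default

        // Collection-level value wins, then cluster-level, then the default.
        static String resolveStrategyName(Map<String, Object> clusterProps,
                                          Map<String, Object> collectionProps) {
            return Optional.ofNullable((String) collectionProps.get(PROP))
                    .or(() -> Optional.ofNullable((String) clusterProps.get(PROP)))
                    .orElse(DEFAULT_STRATEGY);
        }

        public static void main(String[] args) {
            Map<String, Object> cluster = Map.of(PROP, "policyV2");
            Map<String, Object> collectionA = Map.of(PROP, "com.example.MyAssignStrategy");
            Map<String, Object> collectionB = Map.of(); // no collection-level override

            System.out.println(resolveStrategyName(cluster, collectionA)); // com.example.MyAssignStrategy
            System.out.println(resolveStrategyName(cluster, collectionB)); // policyV2
        }
    }

The resolved name would then be mapped to an AssignStrategy implementation (built-in or user-provided), which is what allows per-collection switching between policy engines.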

  • What impact (if any) will there be on existing users?
  • If we are changing behavior how will we phase out the older behavior?
  • If we need special migration tools, describe them here.
  • When will we remove the existing behavior?

...