Status
Current state: Under DiscussionImplemented
Discussion thread: https://the-asf.slack.com/archives/CEKUCUNE9/p1585240648004600, #solr-scaling Slack channel
JIRA:
Released: (tentatively targeting 9.0.0)
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.
...
Chris Hostetter suggested that, since Solr is increasingly used in containerized environments (Docker, Kubernetes) that already provide their own support for scaling up and down, the built-in V2 framework in Solr should offer only the bare minimum needed to support the most common cases (e.g. equal placement of replicas across nodes, by number of cores and freedisk). The scope of Solr autoscaling would be to adequately support the basic needs of standalone (non-containerized) Solr clusters. All other, more sophisticated scenarios should be left out, but Solr should provide API hooks that make it easier for external frameworks to react, optimize the layout, and resolve resource constraints (e.g. too many or too few nodes for the number of replicas).
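As a rough illustration of the "bare minimum" placement described above, the sketch below spreads new replicas across nodes by picking the node with the fewest cores first and breaking ties by the most free disk. All names here (SimplePlacementSketch, Node, pickNodes) are hypothetical and exist only for this example; they are not part of Solr's API, and a real implementation would read these values from node metrics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: none of these types exist in Solr.
final class SimplePlacementSketch {

    /** Minimal view of a node: name, current core count, free disk in GB. */
    static final class Node {
        final String name;
        int cores;
        final double freeDiskGb;

        Node(String name, int cores, double freeDiskGb) {
            this.name = name;
            this.cores = cores;
            this.freeDiskGb = freeDiskGb;
        }
    }

    /**
     * Picks one node per new replica: fewest cores first, ties broken by
     * most free disk. The chosen node's core count is incremented after
     * each pick (mutating the input) so that replicas spread out.
     */
    static List<String> pickNodes(List<Node> nodes, int replicaCount) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        Comparator<Node> byLoad = Comparator
            .comparingInt((Node n) -> n.cores)
            .thenComparing(
                Comparator.comparingDouble((Node n) -> n.freeDiskGb).reversed());
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replicaCount; i++) {
            Node best = nodes.stream().min(byLoad).get();
            best.cores++;
            chosen.add(best.name);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("A", 2, 100.0));
        nodes.add(new Node("B", 0, 50.0));
        nodes.add(new Node("C", 0, 80.0));
        System.out.println(pickNodes(nodes, 3));
    }
}
```

This sketch deliberately ignores everything beyond core count and free disk (for example, shard or replica-type constraints), which is in line with the suggestion to leave more sophisticated scenarios to external frameworks.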
Clean-cut pluggable APIs
Concerns were raised that the current autoscaling implementation is too intrusive, regardless of its strengths and deficiencies. Ilan Ginzburg, Noble Paul and Andrzej Bialecki are investigating what a minimal set of APIs could look like. Others proposed a spike to investigate how much effort it would take to remove autoscaling completely, clean up the existing APIs, and add it back as a plugin (using the Plugins framework).
Requirements for the V2 policy engine
...
Additionally, due to the performance issues with the V1 policy engine, the new one should be the default for clusters larger than N nodes (where N > 100 ?). It should still be possible to opt out and fall back to the current engine.
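The size-based default with an opt-out could be expressed along the following lines. This is only a sketch: the class, the enum, and the parameter names are hypothetical, and the threshold is passed in as a parameter because the value of N is still undecided in this proposal.

```java
// Hypothetical illustration only: these names are not part of Solr.
final class DefaultEngineSketch {

    enum Engine { V1, V2 }

    /**
     * @param liveNodes      number of nodes currently in the cluster
     * @param threshold      the "N" from the text (assumed configurable here)
     * @param explicitChoice the user's explicit setting, or null when unset
     */
    static Engine choose(int liveNodes, int threshold, Engine explicitChoice) {
        if (explicitChoice != null) {
            // An explicit setting always wins, which covers the opt-out case.
            return explicitChoice;
        }
        // Without an explicit setting, large clusters default to the new engine.
        return liveNodes > threshold ? Engine.V2 : Engine.V1;
    }

    public static void main(String[] args) {
        System.out.println(choose(150, 100, null));
        System.out.println(choose(150, 100, Engine.V1));
    }
}
```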
Phase 1 of the migration: we can implement a cluster-level and a collection-level property that defines which assignment strategy to use (the collection-level property overrides the cluster-level property, and a default applies if neither is set). This property would select one of the existing AssignStrategy implementations or a user-provided custom one. This effectively allows users to switch policy engines on a per-collection basis.
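The lookup order for such a property might look like the sketch below. The property name "assignStrategy" and the default value "legacy" are assumptions made for this example only; the proposal does not define them, and the class shown here is hypothetical rather than an existing Solr API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: names and values are assumptions.
final class AssignStrategyPropertySketch {

    // Assumed property name; not defined by the proposal.
    static final String PROP = "assignStrategy";
    // Assumed fallback used when neither level sets the property.
    static final String DEFAULT_STRATEGY = "legacy";

    /** Resolves the strategy name: collection level, then cluster level, then default. */
    static String resolve(Map<String, String> clusterProps,
                          Map<String, String> collectionProps) {
        String value = collectionProps.get(PROP);
        if (value == null) {
            value = clusterProps.get(PROP);
        }
        return value != null ? value : DEFAULT_STRATEGY;
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new HashMap<>();
        cluster.put(PROP, "v2");
        Map<String, String> collection = new HashMap<>();
        System.out.println(resolve(cluster, collection));
    }
}
```

The resolved name would then be mapped to an AssignStrategy implementation, either a built-in one or a user-provided class; that mapping is left out here because the proposal does not specify it.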
- What impact (if any) will there be on existing users?
- If we are changing behavior, how will we phase out the older behavior?
- If we need special migration tools, describe them here.
- When will we remove the existing behavior?
...