...

Two new public interfaces are needed, one API addition, one API change and one ReplicaPlacementPlugin method.

 API

Balance Replicas:

v1: (If we want a v1 API)

...

A new superclass, "OrderedNodePlacementPlugin", will hold the balancing and placement logic for all built-in PlacementPlugins.
Built-in PlacementPlugins will now extend OrderedNodePlacementPlugin, which requires one method:

protected abstract Map<Node, WeightedNode> getBaseWeightedNodes(
  PlacementContext placementContext,
  Set<Node> nodes,
  Iterable<SolrCollection> relevantCollections,
  boolean skipNodesWithErrors) throws PlacementException;

This method lets each PlacementPlugin return a mapping of Node to the relevant WeightedNode for that implementation.

WeightedNode is an abstract class that every PlacementPlugin extending OrderedNodePlacementPlugin must implement. It determines how replicas are placed for that PlacementPlugin: Nodes with lower weights will have replicas placed on them, and Nodes with higher weights will have replicas taken off of them.
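As a rough illustration of the weighting idea, the sketch below always places the next replica on the node with the lowest current weight. SimpleWeightedNode, its replica-count weight, and LowestWeightPlacement are hypothetical stand-ins invented for this example; they are not the WeightedNode or OrderedNodePlacementPlugin classes proposed here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a weighted node: weight = number of replicas hosted.
class SimpleWeightedNode {
  final String name;
  int replicaCount;

  SimpleWeightedNode(String name, int replicaCount) {
    this.name = name;
    this.replicaCount = replicaCount;
  }

  int weight() {
    return replicaCount;
  }
}

class LowestWeightPlacement {
  // Places `replicas` new replicas one at a time, always on the node with the
  // lowest current weight. Ties are broken by node name for determinism.
  static List<String> place(List<SimpleWeightedNode> nodes, int replicas) {
    PriorityQueue<SimpleWeightedNode> queue = new PriorityQueue<>(
        Comparator.comparingInt(SimpleWeightedNode::weight)
            .thenComparing((SimpleWeightedNode n) -> n.name));
    queue.addAll(nodes);
    List<String> placements = new ArrayList<>();
    for (int i = 0; i < replicas && !queue.isEmpty(); i++) {
      SimpleWeightedNode lightest = queue.poll();
      placements.add(lightest.name);
      lightest.replicaCount++; // the node gets heavier after receiving a replica
      queue.add(lightest);     // re-insert so the ordering reflects the new weight
    }
    return placements;
  }
}
```

With nodes a (2 replicas), b (0) and c (1), placing three replicas selects b, b, then c. A real plugin would likely fold other factors into the weight; replica count is used here only to keep the example small.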

API

Balance Replicas:

v2:

PUT /api/cluster/replicas/balance
{
  "nodes": [], (Optional)
  "waitForFinalState": false,
  "async": "async"
}
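For illustration, the request could be built from Java with the JDK's built-in HTTP client types. This is only a sketch: the path is read as /api/cluster/replicas/balance, and the base URL, class name, and JSON body passed in are assumptions made for the example. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class BalanceReplicasRequest {
  // Builds (but does not send) a PUT request for the balance-replicas endpoint.
  // solrBaseUrl is an assumed value such as "http://localhost:8983".
  static HttpRequest build(String solrBaseUrl, String jsonBody) {
    return HttpRequest.newBuilder()
        .uri(URI.create(solrBaseUrl + "/api/cluster/replicas/balance"))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }
}
```

Sending it would be a matter of passing the result to java.net.http.HttpClient#send with a body handler of choice.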

Migrate Replicas: (An extension of the existing ReplaceNode command)

v2:

POST /api/cluster/replicas/migrate
{
  "sourceNodes": [],
  "targetNodes": [], (Optional) // defaults to liveNodes that are not sourceNodes
  "waitForFinalState": false,
  "async": "async"
}
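The default for targetNodes can be sketched as a small set operation: when no targets are requested, use every live node that is not a source node. The class and method names below are hypothetical and exist only for this example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class MigrateTargets {
  // Returns the requested targets if any were given; otherwise falls back to
  // all live nodes that are not being vacated.
  static Set<String> resolveTargets(
      Set<String> liveNodes, Set<String> sourceNodes, Set<String> requestedTargets) {
    if (requestedTargets != null && !requestedTargets.isEmpty()) {
      return new LinkedHashSet<>(requestedTargets);
    }
    Set<String> targets = new LinkedHashSet<>(liveNodes);
    targets.removeAll(sourceNodes); // a source node must not receive replicas
    return targets;
  }
}
```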

...

Solr Operator Interfaces

SolrCloud CRD:

spec:
  ...
  scaling:
    populatePodsOnScaleUp: true
    vacatePodsOnScaleDown: true

If users want the Solr Operator to move replicas around when scaling up or down, they will use the "scaling.populatePodsOnScaleUp" and "scaling.vacatePodsOnScaleDown" options.


The Solr Operator managing the HPA for users will not be a part of this SIP. HPAs are very custom to each user's usage of Solr, and Kubernetes makes it easy to point an HPA at a SolrCloud.

We can add an optional HPA to the Solr helm chart.

Since the Operator will not control the HPA, it cannot disable it during rolling restarts and other maintenance operations. However, due to the new LockedCluster operations logic, we can make sure that scaling does not take place during other cluster maintenance.

Proposed Changes

This feature will require changes to both Solr and the Solr Operator. Since the Solr Operator supports a range of Solr versions, this will not be available for Solr Operator users until they upgrade to a version of Solr that implements this SIP.

...

  • Move replicas off of node, because it will no longer be in use
    • This already exists, and has been improved in SOLR-15803 to optimally place all replicas from a node across the cluster.
    • A new option, MigrateReplicas, will be added that the Solr Operator can use in the future to move replicas off of multiple nodes at once. (Jira: SOLR-16855)
  • Move replicas onto node, because it is now a part of the cluster
    • This will be a NEW API, and needs to be implemented. Ideally it will work similarly to the above command, but opposite.
    • Instead of just moving replicas onto a node, we will introduce an API to balance replicas across a set of nodes, or a whole SolrCloud. (Jira: SOLR-16806)

In order to implement this logic, we would need new interfaces and methods in the placement package, as described above. Since there are 4 built-in PlacementPlugins, this feature would otherwise need to be implemented separately for each of them. Instead, we will rewrite the existing PlacementPlugins to extend OrderedNodePlacementPlugin, which implements computePlacements and computeBalancing. Each PlacementPlugin will then implement a node weighting that determines where replicas should be placed or moved.
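To make the balancing half of this concrete, here is a minimal sketch of weight-driven balancing: replicas are moved one at a time from the heaviest node to the lightest until their weights differ by at most one. This is an assumed greedy strategy for illustration, not the computeBalancing implementation; GreedyBalancer and the use of replica counts as weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GreedyBalancer {
  // Returns the planned moves as "from->to" strings. The input map is copied,
  // so the caller's replica counts are left untouched.
  static List<String> balance(Map<String, Integer> replicasPerNode) {
    Map<String, Integer> counts = new TreeMap<>(replicasPerNode);
    List<String> moves = new ArrayList<>();
    if (counts.size() < 2) {
      return moves; // nothing to balance against
    }
    while (true) {
      String heaviest = null;
      String lightest = null;
      for (String node : counts.keySet()) {
        if (heaviest == null || counts.get(node) > counts.get(heaviest)) heaviest = node;
        if (lightest == null || counts.get(node) < counts.get(lightest)) lightest = node;
      }
      if (counts.get(heaviest) - counts.get(lightest) <= 1) {
        break; // already as even as whole replicas allow
      }
      counts.merge(heaviest, -1, Integer::sum);
      counts.merge(lightest, 1, Integer::sum);
      moves.add(heaviest + "->" + lightest);
    }
    return moves;
  }
}
```

For counts of a=4, b=0, c=2 this plans two moves, both from a to b, leaving two replicas on every node.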

Solr Operator Changes

The Solr Operator would need four changes:

  • If enabled, on scale-down of the StatefulSet, first move replicas off of the pods that will be deleted.
  • If enabled, on scale-up of the StatefulSet, afterwards move replicas onto the pods that have been created.
  • During Cloud maintenance, disable scaling activity.
  • In the helm chart, manage an HPA for users.


Compatibility, Deprecation, and Migration Plan

  • This feature will require changes to both Solr and the Solr Operator. Since the Solr Operator supports a range of Solr versions, this will not be available for Solr Operator users until they upgrade to a version of Solr that implements this SIP.
  • Existing users of the Solr Operator will see these new features used by default.
    • The populatePods option requires a new Solr version (9.3), so the Operator will skip the logic if it receives an error that indicates the SolrCloud does not support the command.
  • The new MigrateReplicas command is only available in Solr 9.3+, so the ReplaceNode command will be used for now. Later, the Operator can try to use MigrateReplicas and fall back on ReplaceNode if necessary.
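The try-then-fall-back behavior could look roughly like the sketch below. It is written in Java only to match the other examples here; the ReplicaMoveClient interface, the UnsupportedCommandException type, and the way an unsupported command is detected are all hypothetical and not part of Solr or the Solr Operator.

```java
import java.util.List;

// Hypothetical client interface, invented for this example.
interface ReplicaMoveClient {
  void migrateReplicas(List<String> sourceNodes) throws UnsupportedCommandException;

  void replaceNode(String sourceNode);
}

// Hypothetical signal that the SolrCloud does not know the command.
class UnsupportedCommandException extends Exception {
  UnsupportedCommandException(String message) {
    super(message);
  }
}

class VacateNodes {
  // Tries MigrateReplicas first; if the cluster reports the command as
  // unsupported, falls back to one ReplaceNode call per source node.
  // Returns the name of the command that was ultimately used.
  static String vacate(ReplicaMoveClient client, List<String> sourceNodes) {
    try {
      client.migrateReplicas(sourceNodes);
      return "MigrateReplicas";
    } catch (UnsupportedCommandException e) {
      for (String node : sourceNodes) {
        client.replaceNode(node);
      }
      return "ReplaceNode";
    }
  }
}
```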

Security considerations

No Security Concerns

...

For the Solr APIs, we can use unit tests to test for dispersion for BalanceReplicas and MigrateReplicas, like the current ReplaceNode tests.

...