So far, Solr only has a single type of node, one that is capable of assuming all kinds of tasks. There are use cases where one would like dedicated nodes for specific types of workloads: for example, a dedicated overseer node, separate data and query nodes, or a node with no data hosted on it that can be used for administrative tasks or for running plugins. Elasticsearch, Vespa, and others have first-class support for node roles.

Going forward, once SOLR-15715 is introduced, there would be a distinct role for coordinator nodes. These nodes can be used as query aggregators for distributed requests or streaming expressions and possibly also (later) for distributed indexing. This provides a clean mechanism for users to specify which nodes are data nodes (stateful) and which are coordinator nodes (stateless), and hence to employ heterogeneous deployment strategies.

Proposal

Every node in Solr has one or more “roles”. The following roles are proposed:

  1. “data” role: A node with this role can host data, i.e. replicas. By default, this is the case for all nodes.
  2. “overseer” role: This role indicates that the node is a preferred overseer. When one or more such nodes are live, Solr guarantees that one of those nodes becomes the overseer.
  3. “coordinator” role [UPCOMING FEATURE]: This role can be associated with a node to which requests can be sent; the node then sends out remote calls to data hosting nodes, aggregates the results and sends them back to the user. This will be useful for dealing with distributed query requests, bulk indexing and streaming expression based queries. See SOLR-15715. This is very similar in concept to Elasticsearch's coordinating nodes. A coordinator node would be assumed to have no data hosted on it.
  4. “zookeeper” role [UPCOMING FEATURE]: This role can be associated with nodes that can have embedded ZK nodes. See: https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper
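
The "preferred overseer" guarantee described for the “overseer” role can be illustrated with a minimal sketch. The function name `pick_overseer` and its inputs are hypothetical, not part of Solr; which of several preferred nodes wins is not specified in this proposal, so the sketch simply takes the first one in the order given.

```python
def pick_overseer(live_nodes, roles_by_node):
    """Return the node that should act as overseer, or None if no node is live.

    live_nodes: list of node names currently live (hypothetical input).
    roles_by_node: dict mapping node name -> list of roles; a node missing
    from the dict is treated as a plain data node (assumption based on the
    proposal's default).
    """
    preferred = [
        node for node in live_nodes
        if "overseer" in roles_by_node.get(node, ["data"])
    ]
    # Fall back to any live node only when no preferred overseer is live.
    candidates = preferred if preferred else list(live_nodes)
    return candidates[0] if candidates else None
```

With `live_nodes = ["node1", "node2", "node3"]` and only `node2` carrying the overseer role, the sketch selects `node2`; when no live node carries the role, any live node remains eligible.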


Notes:

  1. If the "-Dsolr.node.roles" parameter is not passed, it is implicitly assumed to be "-Dsolr.node.roles=data" (for backward compatibility, and so that those who don't use the role feature don't need any extra parameters).

Public Interfaces

There will be just one supported way to use the roles functionality:

...

  1. Data node that can act as preferred overseer too: -Dsolr.node.roles=overseer,data
  2. Preferred overseer node with no data (dedicated overseer): -Dsolr.node.roles=overseer
  3. Preferred overseer with data: -Dsolr.node.roles=overseer,data
  4. Coordinator node: -Dsolr.node.roles=coordinator
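
As a rough illustration of how the comma-separated roles value in the examples above could be interpreted, including the implicit "data" default, here is a minimal sketch. The function `parse_roles` is hypothetical and not Solr's actual implementation; treating an empty value the same as a missing one is an assumption.

```python
def parse_roles(value):
    """Parse the value of the roles parameter into a set of role names.

    value: the raw string after '=' (e.g. "overseer,data"), or None when the
    parameter was not passed at all.
    """
    if value is None:
        # Parameter not passed: the proposal says this implies the data role.
        return {"data"}
    roles = {part.strip() for part in value.split(",") if part.strip()}
    # Assumption: an empty value behaves like a missing parameter.
    return roles if roles else {"data"}
```

For example, `parse_roles("overseer,data")` yields both roles, while `parse_roles(None)` yields only the data role.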

Cluster API

...

Proposing the roles as:
* Layer1 nodes are the "data nodes" and hence get either no role defined for them or -Dsolr.node.roles=data.
* Layer2 nodes are "overseer nodes" (though only one of them can be the overseer at a time). They get -Dsolr.node.roles=overseer.
* Layer3 nodes are "coordinator nodes"; no data may be hosted on these nodes, and they are started with -Dsolr.node.roles=coordinator.

How to Retrieve Roles?

...

To read the values, use HTTP GET:

GET /api/cluster/roles

{
     "node1": ["overseer"],
     "node2": ["overseer", "data"],
     "node3": ["data"]
}


GET /api/cluster/roles?node=/nodes/node1

["overseer"]


GET /api/cluster/roles?role=/data

...

["node2", "node3"]
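
The relationship between the three responses above can be sketched as simple lookups over the node-to-roles map. This is only an illustration of the response shapes; the helper names `roles_of` and `nodes_with_role` are hypothetical and do not correspond to any Solr API.

```python
# Sample data copied from the full GET /api/cluster/roles response above.
CLUSTER_ROLES = {
    "node1": ["overseer"],
    "node2": ["overseer", "data"],
    "node3": ["data"],
}


def roles_of(cluster_roles, node):
    """Roles of a single node, as in the per-node query."""
    return cluster_roles.get(node, [])


def nodes_with_role(cluster_roles, role):
    """Names of all nodes that have the given role, as in the per-role query."""
    return sorted(node for node, roles in cluster_roles.items() if role in roles)
```

Here `roles_of(CLUSTER_ROLES, "node1")` gives the overseer role only, and `nodes_with_role(CLUSTER_ROLES, "data")` gives node2 and node3, matching the sample responses.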


Internal representation in ZK

TBD

  • All nodes join live_nodes, as is the case today
  • ZK structure for roles:
    • /roles
        - overseer
             - solr1:8983 (ephemeral nodes)
             - solr2:8983 ( .. )
        - <rolename>
             - <nodename>
  • Implementation details like these can be fleshed out in the PR
  • "roles.json" already exists for this purpose; we can also consider adding roles as metadata for each entry in the live_nodes list.
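
Since the ZK layout is still marked TBD, the following is only a sketch of the /roles/<rolename>/<nodename> structure listed above. The function `role_znode_paths` is hypothetical; it just computes the paths a node would publish and does not talk to ZooKeeper.

```python
def role_znode_paths(node_name, roles):
    """Return the znode paths a node would publish, one per role.

    Assumes the tentative layout /roles/<rolename>/<nodename>; in the
    proposal these would be ephemeral nodes.
    """
    return [f"/roles/{role}/{node_name}" for role in sorted(roles)]
```

For a node named solr1:8983 with the overseer role, this produces the single path /roles/overseer/solr1:8983, matching the structure in the list above.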

Other notes

  • Every time a node starts up with specified roles, the node assumes those are the correct roles for it and publishes them in ZK after successful startup.
  • If a node is started with a -Dsolr.node.roles parameter that doesn't include the data role, but the node already hosts replicas, the startup fails with an error (and a hint indicating how to move replicas away from this node).
  • If a coordinator node is started with the "data" role as well, it fails to start up, with a message indicating that a node cannot be both a coordinator and a data node.
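
The two startup failure rules above can be sketched as a single check. The function `startup_error` and its arguments are hypothetical, and the error texts are illustrative rather than Solr's actual messages.

```python
def startup_error(roles, hosted_replicas):
    """Return an error message if the node must refuse to start, else None.

    roles: set of role names the node was started with.
    hosted_replicas: number of replicas already present on the node
    (hypothetical input; how Solr would count them is not specified here).
    """
    if "coordinator" in roles and "data" in roles:
        return "a node cannot be both a coordinator and a data node"
    if "data" not in roles and hosted_replicas > 0:
        return (
            "node has no data role but already hosts replicas; "
            "move the replicas to another node before restarting"
        )
    return None
```

A dedicated overseer with no replicas passes the check, while the same node with leftover replicas, or a node started as both coordinator and data, is rejected.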

...

  • Deprecate the ADDROLE and REMOVEROLE APIs (so that the ability to change node roles at runtime is removed).
  • New V2 API for GET /api/cluster/roles to have nodes as key (deprecating/replacing the current one)

Major Risks

...

Security considerations

None

...

Testing should mainly focus on how the nodes behave when roles are added to and removed from the nodes. Also, the API would be tested.

Discussions

Here's the mail thread, including a summary at the end. Gmail - First class support for node roles.pdf

Rejected Alternatives

There is no proper alternative today. There are awkward ways to achieve similar functionality:

  • Use autoscaling to stop replicas from being placed on nodes. However, that framework has been rewritten between Solr 8.x and 9.x, so there is no consistent way to recommend to users. Also, the 9.x autoscaling framework doesn't support placement plugin chaining, so placement plugins shouldn't be used to provide first-class support for node roles.
  • The OVERSEER role is already available today; it indicates a "preferred" overseer.