...

Proposal

Every node in Solr has to have one or more “roles”.

What is a role?

A role is a designation of a node that indicates that the node may perform a certain functionality that is governed by the role. A node that doesn't have a role may not perform the functionality associated with the role.

For example:
- Nodes with "data" role MAY host replicas (i.e. nodes without the role MAY NOT)
- Nodes with (FUTURE ROLE) "zk" role MAY run zk (i.e. nodes without the role MAY NOT)
- Nodes with (IMAGINARY EXAMPLE) "worker" role MAY execute streaming map/reduce work
- Nodes with (IMAGINARY EXAMPLE) "ingest" role MAY run Tika parsing, OCR, data prepping etc


Modes:

  • Every role also has a list of modes that a node can be in. For certain roles (e.g. overseer), modes allow finer-grained control over how strictly or loosely the role applies to a node.
  • Most roles would just have two modes (on, off)
  • In special cases a role might have more modes, e.g. "overseer" role to have (allowed, disallowed, preferred) modes.
  • For every role, one of the modes is designated as the defaultIfAbsent (see the supported GET call in the roles API section below), i.e. the mode that is assumed for that role on a node that doesn't specify the role.
    • For example, if a node starts with "-Dsolr.node.roles=data:on", then the node is assumed to have the overseer role in mode "disallowed" (i.e. the defaultIfAbsent mode of the overseer role).
    • Note: Users don't need to bother about this concept much. This is for tighter representation of the roles and modes in our system for implementation purposes, and for developers implementing new roles.
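The defaultIfAbsent rule can be illustrated with a minimal sketch. The names `SUPPORTED_ROLES`, `DEFAULT_ROLES` and `resolve_roles` are hypothetical and not part of Solr; the roles, modes and defaults are the ones listed in this proposal.

```python
# Hypothetical registry mirroring the roles proposed in this SIP (not a Solr API).
SUPPORTED_ROLES = {
    "overseer": {"modes": ["preferred", "allowed", "disallowed"],
                 "defaultIfAbsent": "disallowed"},
    "data": {"modes": ["on", "off"], "defaultIfAbsent": "off"},
}

# Value assumed by this SIP when solr.node.roles is not passed at all.
DEFAULT_ROLES = "data:on,overseer:allowed"


def resolve_roles(sysprop=None):
    """Return {role: mode} for a node, filling unspecified roles with defaultIfAbsent."""
    spec = DEFAULT_ROLES if sysprop is None else sysprop
    resolved = {}
    for entry in spec.split(","):
        role, sep, mode = entry.strip().partition(":")
        if not sep:
            raise ValueError(f"expected <role>:<mode>, got {entry!r}")
        if role not in SUPPORTED_ROLES:
            raise ValueError(f"unknown role {role!r}")
        if mode not in SUPPORTED_ROLES[role]["modes"]:
            raise ValueError(f"unknown mode {mode!r} for role {role!r}")
        resolved[role] = mode
    # Roles the node did not mention fall back to their defaultIfAbsent mode.
    for role, info in SUPPORTED_ROLES.items():
        resolved.setdefault(role, info["defaultIfAbsent"])
    return resolved
```

With this sketch, `resolve_roles("data:on")` yields the overseer role in mode "disallowed", matching the example above.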


The following roles are proposed (based on existing functionality):

  1. “data” role: A node with this role can host replicas. By default, this is the case for all nodes. There are two modes (on, off), i.e. a node with role "data:on" can host replicas, whereas nodes with "data:off" cannot host replicas.
  2. “overseer” role: A node with this role can act as an overseer. The modes supported are (allowed, disallowed, preferred). (1) Nodes with "overseer:preferred" will be favoured to function as the overseer leader, (2) nodes with "overseer:allowed" can become the overseer leader if no "overseer:preferred" node is live, and (3) nodes with "overseer:disallowed" will not run overseer functionality.


Roles that might be introduced in future (specifics are outside the scope of this SIP, except for examples):

  1. “coordinator” role [UPCOMING FEATURE]: This role (modes: on/off) can be associated with a node to which requests can be sent; this node sends out remote calls to data hosting nodes, aggregates the results and sends them back to the user. This will be useful for dealing with distributed query requests, bulk indexing & streaming expressions based queries. See SOLR-15715. This is very similar in concept to ElasticSearch's coordinating nodes. A coordinator node would be assumed to have no data hosted on it.
  2. “zk” role [UPCOMING FEATURE]: This role can be associated with nodes that can have embedded ZK nodes. See: https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper

...

  1. If the "-Dsolr.node.roles" parameter is not passed, it is implicitly assumed to be "-Dsolr.node.roles=data:on,overseer:allowed" (for backcompat reasons and also so that those who don't use the role feature don't need any extra parameters).
  2. Roles are static and immutable for the entire life cycle of a node. Once a node starts up with a role, it registers the role in ZK and that sticks around until the node is stopped/restarted.
  3. The bar for adding new roles in future should be high so it is not abused as any other tag or label for any tiny feature. It should be reserved for functionality that may benefit from a dedicated set of nodes.

...

There will be just one supported way to use the roles functionality:

Startup

...

parameter (sysprop)

Parameter: solr.node.roles
Value: Comma separated list of roles (in the format: <role>:<mode>) for this node, e.g. "data:on,overseer:allowed" or "overseer:preferred"
Required?: No
Default: data:on,overseer:allowed (assumed when the parameter is not specified. A subsequent Solr release might have the ability to add a new role here that's turned on by default)


Examples:

  1. Preferred overseer node with no data (dedicated overseer):
    -Dsolr.node.roles=overseer:preferred or -Dsolr.node.roles=overseer:preferred,data:off
  2. Data node that can act as preferred overseer too:
    -Dsolr.node.roles=overseer:preferred,data:on
  3. Regular data node that can also act as an overseer:
    Either specify no solr.node.roles param or explicitly specify "-Dsolr.node.roles=data:on,overseer:allowed".
  4. Coordinator node (preview for upcoming feature) that doesn't host data, nor does any overseer duty:
    -Dsolr.node.roles=coordinator:on

Cluster API

As of today, there are ADDROLE and REMOVEROLE APIs to add/remove roles on nodes at run time. They support only OVERSEERROLE, which designates a preferred overseer. We propose to deprecate these APIs and recommend that users use startup params to achieve the same. Supporting both ways (API and startup params) is tricky and will lead to a lot of confusion among users.

...

Proposing the roles as:
* Layer1 nodes are the "data nodes" and hence get either no role defined for them or -Dsolr.node.roles=data:on,overseer:allowed.
* Layer2 nodes are "overseer nodes" (though, only one of them can be an overseer at a time). They get -Dsolr.node.roles=overseer:preferred
* Layer3 nodes are "coordinator nodes", no data must be hosted on these nodes and they are started with -Dsolr.node.roles=coordinator:on

Note: In this configuration, the actual overseer leader will be one of the nodes in layer2. However, if all nodes in layer2 are down, then one of the layer1 nodes (with overseer:allowed) will become the overseer (until a layer2 node is back up).
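The fallback described in the note above can be sketched as a small selection function. This is only an illustration of the preference order under the proposed modes; `overseer_candidates` and its inputs are hypothetical and do not describe how Solr's overseer election is implemented.

```python
def overseer_candidates(overseer_modes, live_nodes):
    """Return the live nodes eligible to become overseer leader.

    overseer_modes: {node_name: "preferred" | "allowed" | "disallowed"}
    live_nodes: iterable of node names that are currently up.
    """
    live = [n for n in sorted(live_nodes) if n in overseer_modes]
    preferred = [n for n in live if overseer_modes[n] == "preferred"]
    if preferred:
        # At least one overseer:preferred node is live, so only those qualify.
        return preferred
    # Otherwise fall back to overseer:allowed nodes; disallowed nodes never qualify.
    return [n for n in live if overseer_modes[n] == "allowed"]


# Hypothetical three-layer cluster matching the layout above.
modes = {
    "data1": "allowed", "data2": "allowed",        # layer1
    "ovr1": "preferred", "ovr2": "preferred",      # layer2
    "coord1": "disallowed",                        # layer3
}
```

Here, `overseer_candidates(modes, modes)` returns only the layer2 nodes, while `overseer_candidates(modes, ["data1", "data2", "coord1"])` falls back to the layer1 nodes.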

How to Retrieve Roles?

Public API

To read the values, use HTTP GET

GET /api/cluster/roles

Sample output:
{
     "node1": ["overseer:preferred"],
     "node2": ["overseer:allowed", "data:on"],
     "node3": ["data:on"]
}


GET /api/cluster/roles/supported

Sample output:
{
"overseer": {"modes": ["preferred", "allowed", "disallowed"], "defaultIfAbsent": "disallowed"},
"data": {"modes": ["on", "off"], "defaultIfAbsent": "off"}
}

Description: Which roles (and their corresponding modes) does this Solr cluster support?


GET /api/cluster/roles/nodes/${nodename}

Sample output: ["overseer:preferred"]


GET /api/cluster/roles/${rolename}

Sample output: {"node2": "preferred", "node3": "allowed"}
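To show how the per-role view relates to the per-node view, here is a minimal sketch that derives a node-to-mode mapping for one role from a response shaped like the GET /api/cluster/roles sample. The function name `nodes_by_role` is hypothetical, and the input is a hand-written dictionary, not a real response.

```python
def nodes_by_role(cluster_roles, role):
    """Map each node that explicitly lists `role` to its mode.

    cluster_roles: {node_name: ["<role>:<mode>", ...]}
    """
    result = {}
    for node, entries in cluster_roles.items():
        for entry in entries:
            name, _, mode = entry.partition(":")
            if name == role:
                result[node] = mode
    return result


# Hand-written input in the shape of the GET /api/cluster/roles sample output.
cluster = {
    "node1": ["overseer:preferred"],
    "node2": ["overseer:allowed", "data:on"],
    "node3": ["data:on"],
}
```

Note that this sketch only reports roles a node lists explicitly; it does not apply defaultIfAbsent modes.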


Internal representation in ZK

  • All nodes join live_nodes, as is the case today
  • ZK structure for roles:
      • /node_roles
        • overseer
            znode data: { .. /* some configs for overseer role */ ..}
            • preferred
              • nodes
                • solr1_8983 (ephemeral node)
                • solr2_8983 (ephemeral node)
            • allowed
              • nodes
                • solr3_8983 (ephemeral node)
            • disallowed
              • nodes
                • solr4_8983 (ephemeral node)
                • solr5_8983 (ephemeral node)
                • solrcoord1_8983 (ephemeral node)
        • data
            znode data: { .. /* some configs for data role */ ..}
            • on
              • nodes
                • solr4_8983 (ephemeral node)
                • solr5_8983 (ephemeral node)
            • off
              • nodes
                • solr1_8983 (ephemeral node)
                • solr2_8983 (ephemeral node)
                • solr3_8983 (ephemeral node)
                • solrcoord1_8983 (ephemeral node)
        • coordinator (example of a future role)
            znode data: {.. /* configs.. */}
            • on
              • nodes
                • solrcoord1_8983 (ephemeral node)
            • off
              • nodes
                • solr1_8983 (ephemeral node)
                • solr2_8983 (ephemeral node)
                • solr3_8983 (ephemeral node)
                • solr4_8983 (ephemeral node)
                • solr5_8983 (ephemeral node)
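As a rough illustration of the layout above, the sketch below computes the znode paths a node would register under, assuming the path pattern /node_roles/<role>/<mode>/nodes/<node name> as read from the tree. The helper `role_znode_paths` is hypothetical and does not talk to ZooKeeper.

```python
def role_znode_paths(node_name, modes):
    """Return the ephemeral znode paths for a node, one per role.

    modes: {role: mode}, with absent roles already resolved to a mode.
    The path pattern is an assumption read off the tree in this section.
    """
    return sorted(
        f"/node_roles/{role}/{mode}/nodes/{node_name}"
        for role, mode in modes.items()
    )


# solr1_8983 from the tree: preferred overseer, no data, not a coordinator.
paths = role_znode_paths(
    "solr1_8983",
    {"overseer": "preferred", "data": "off", "coordinator": "off"},
)
```

Since the znodes are ephemeral, they would disappear when the node's ZK session ends, which fits the rule that roles are fixed for the life cycle of a node.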

      Roles During Application Lifecycle:

      ...

      2) If at startup, sysprops are present:

      ...

      • Yes:

      ...

      • Role is published as ephemeral nodes in ZK.
      • No: Roles are configured to export the default set of roles (at the time of this SIP, that’s [

      ...

      • data:on,overseer:allowed])

      4) Node completes any other necessary startup and publishes itself in live_nodes.

      ...

      1) Roles will be checked in publicly published configuration (i.e. roles API, ZK), and watches can be set to detect any change.

      2) Roles will not be checked by loading config from disk (except for sysprops in bin/solr.in.sh). ZK is the ONLY source of truth.

      Guidance on adding a new role

      • Do you have new or existing functionality that you want users to be able to turn on/off on certain nodes, especially from the point of view of functional (role based) isolation of nodes? Yes: good candidate. No: you might not need a separate role.
      • Do you want the functionality associated with the role to be turned on for any user (not already using roles functionality) upgrading to this new Solr version (without having to explicitly turn it on)?
        • Yes: Change the current default value for "solr.node.roles" from "data:on,overseer:allowed" to "data:on,overseer:allowed,myrole:on"
        • No: Either don't change the default of "solr.node.roles" or change it from "data:on,overseer:allowed" to "data:on,overseer:allowed,myrole:off"
      • How do you tell users who are already using some roles on their nodes how to turn on this functionality?
        • In upgrade notes and/or in the ref guide, instruct the users with language similar to this: "If you're already explicitly using roles (i.e. you are using "solr.node.roles" for your nodes), then you should append ",myrole:on" to all nodes where you wish to enable this functionality (introduced by myrole)"
      • Designate one of the modes as the defaultIfAbsent. Most likely that's going to be "off", "disallowed" or similar. This affects only those nodes where some roles are explicitly or implicitly configured, but this new role is not present.

      Other notes

      • Every time a node starts up with specified roles, the node assumes those are the correct roles for it and publishes them in ZK after successful startup.
      • If a node is started with a -Dsolr.node.roles parameter that doesn't have a data role (or with data:off), but it already hosts replicas, the startup fails with an error (and a hint indicating how to move replicas away from this node).
      • If a coordinator node is started with the "data" role as well, it fails to start up with a message indicating that a node cannot be both a coordinator and a data node.
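The startup checks in the notes above could look roughly like the following sketch. `RoleStartupError` and `check_startup` are hypothetical names, the error messages are illustrative, and the coordinator role is still only a proposed future role.

```python
class RoleStartupError(Exception):
    """Raised when a node's roles conflict with its state (hypothetical)."""


def check_startup(modes, hosted_replicas):
    """Validate resolved roles before a node publishes them.

    modes: {role: mode}; hosted_replicas: number of replicas already on the node.
    """
    data_on = modes.get("data", "off") == "on"
    coordinator_on = modes.get("coordinator", "off") == "on"
    if coordinator_on and data_on:
        raise RoleStartupError(
            "a node cannot be both a coordinator and a data node")
    if not data_on and hosted_replicas > 0:
        raise RoleStartupError(
            f"node has no data role but hosts {hosted_replicas} replica(s); "
            "move the replicas to another node before starting")
```

For example, `check_startup({"data": "off", "overseer": "preferred"}, 0)` passes, while the same roles with `hosted_replicas=3` raise `RoleStartupError`.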

      Compatibility, Deprecation, and Migration Plan

      ...

      Discussions

      Here's the mail thread, including a summary at the end. Gmail - First class support for node roles.pdf. roles discussion - 1.pdf (first 100 mails in the thread) and roles discussion - 2.pdf (next 29 mails in the thread).

      Rejected Alternatives

      There is no proper alternative today. There are awkward ways to achieve similar functionality:

      • Use autoscaling to stop data (replicas) from being placed on nodes. But that framework itself has been re-written from Solr 8x to 9x, hence we don’t have a recommendation for users for a consistent way to achieve this. Also, the 9x autoscaling framework doesn't support placement plugin chaining, and hence placement plugins shouldn't be used for first class support of node roles. Autoscaling placement rules may be helpful in avoiding replicas getting placed on a certain node. But that does not mean other nodes can discover who is performing what functionality, or tell a node to start with some feature enabled/disabled.
      • The OVERSEER role is already available today; it indicates a "preferred" overseer.

      Discussions (summary)

      • No negative roles

      There shouldn’t be a concept of “not data” or “not overseer” etc.

      Everyone agrees

      • Roles on/off by default?

      Jason, Ilan, Houston, Jan: All roles should be on by default. Having all roles on by default is less complicated for users, instead of “treating data role differently from other roles”.

      Ishan, Noble, ?Gus?: Only those roles that are needed for backcompat should be on by default, so that we don’t make a premature decision for any roles introduced later. When a new role is introduced, whether it should be enabled by default can be decided then.

      • Which branch to target?

      Jan, Ishan, Noble: New feature to be added to 9x branch

      • Need for roles?

      ...

      • overseer

      ...

      • Roles for collections?

      ...


      ...

      Ishan: Role aware collections can be implemented separately later using node roles and placement plugins. As for user extensible roles, a separate concept of user defined node labels (as a separate feature) makes more sense. This SIP is more about first class roles (that come pre-defined with Solr).

      • Configuration

      Sysprops vs solr.xml+sysprops vs envvars:

      Shawn: Solr.xml and/or envvars

      Houston,Ilan: Sysprops and/or envvars

      Ishan,Noble: Sysprops

      ...