Versions Compared

Comment: Introduced role modes, removed discussion summary

...

For example:
- Nodes with "data" role MAY host replicas (i.e. nodes without the role MAY NOT)
- Nodes with (FUTURE ROLE) "zk" role MAY run zk (i.e. nodes without the role MAY NOT)
- Nodes with (IMAGINARY EXAMPLE) "worker" role MAY execute streaming map/reduce work
- Nodes with (IMAGINARY EXAMPLE) "ingest" role MAY run Tika parsing, OCR, data prepping etc


Modes:

  • Every role has a list of modes that a node can be in. For certain roles (e.g. overseer), modes give finer-grained control over how strictly or loosely the role applies to that node.
  • Most roles would just have two modes (on, off).
  • In special cases a role might have more modes, e.g. the "overseer" role has (allowed, disallowed, preferred) modes.


The following roles are proposed (based on existing functionality):

  1. “data” role: A node with this role can host replicas. By default, this is the case for all nodes.
    There are two modes (on, off), i.e. a node with role "data:on" can host replicas, whereas a node with "data:off" cannot.
  2. “overseer” role: A node with this role can act as an overseer. The supported modes are (allowed, disallowed, preferred). (1) Nodes with "overseer:preferred" are favoured to function as the overseer leader, (2) nodes with "overseer:allowed" can become the overseer leader if no "overseer:preferred" node is live, and (3) overseer functionality won't run on nodes with "overseer:disallowed".
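
The overseer mode rules above can be sketched as a small selection function. This is only an illustration of the stated precedence, not Solr's election code; the function name and the input shape (a map of live node name to overseer mode) are assumptions made for the example.

```python
from typing import Dict, List


def overseer_candidates(live_nodes: Dict[str, str]) -> List[str]:
    """Return the live nodes eligible to become overseer leader.

    live_nodes maps a node name to its overseer mode
    ("preferred", "allowed" or "disallowed"). This input shape is hypothetical.
    """
    # "preferred" nodes win whenever at least one of them is live.
    preferred = sorted(n for n, mode in live_nodes.items() if mode == "preferred")
    if preferred:
        return preferred
    # Otherwise fall back to "allowed" nodes; "disallowed" nodes are never eligible.
    return sorted(n for n, mode in live_nodes.items() if mode == "allowed")


live = {"node1": "preferred", "node2": "allowed", "node3": "disallowed"}
print(overseer_candidates(live))  # ['node1']
```

If node1 goes down, the same call returns only node2, and a cluster containing nothing but "overseer:disallowed" nodes yields an empty list.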


Roles that might be introduced in future (specifics are outside the scope of this SIP, except for examples):

  1. “coordinator” role [UPCOMING FEATURE]: This role (modes: on/off) can be associated with a node to which requests can be sent; the node sends out remote calls to data hosting nodes, aggregates the results and sends them back to the user. This will be useful for dealing with distributed query requests, bulk indexing & streaming expressions based queries. See SOLR-15715 (ASF JIRA). This is very similar in concept to Elasticsearch's coordinating nodes. A coordinator node would be assumed to have no data hosted on it.
  2. “zk” role [UPCOMING FEATURE]: This role can be associated with nodes that can have embedded ZK nodes. See: https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper

...

  1. If "-Dsolr.node.roles" parameter is not passed, it is implicitly assumed to be "-Dsolr.node.roles=data:on,overseer:allowed" (for backcompat reasons and also so that those who don't use the role feature don't need any extra parameters).
  2. Roles are static and immutable for the entire life cycle of a node. Once a node starts up with a role, it registers the role in ZK and that sticks around until the node is stopped/restarted.
  3. The bar for adding new roles in future should be high, so that roles are not abused as generic tags or labels for any tiny feature. A role should be reserved for functionality that may benefit from a dedicated set of nodes.

...

Parameter: solr.node.roles
Value: Comma separated list of roles (in the format: <role>:<mode>) for this node, with each role in lowercase, and no repetitions in the list, e.g. "data:on,overseer:allowed" or "overseer:preferred"
Required?: No
Default: data:on,overseer:allowed (assumed when parameter is not specified. A subsequent Solr release might have the ability to add a new role here that's turned on by default)
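
A minimal sketch of how a value for solr.node.roles could be parsed and checked against the constraints listed above (the <role>:<mode> format, lowercase role names, no repeated roles, and the default when the parameter is absent). The function name and error messages are hypothetical, and the sketch deliberately does not check modes against a list of supported roles.

```python
from typing import Dict, Optional

# Default taken from this document; used only when the parameter is not passed.
DEFAULT_NODE_ROLES = "data:on,overseer:allowed"


def parse_node_roles(value: Optional[str] = None) -> Dict[str, str]:
    """Parse a solr.node.roles style string into a {role: mode} dict."""
    if value is None:
        value = DEFAULT_NODE_ROLES
    roles: Dict[str, str] = {}
    for entry in value.split(","):
        role, sep, mode = entry.strip().partition(":")
        if not sep or not role or not mode:
            raise ValueError(f"expected <role>:<mode>, got {entry!r}")
        if role != role.lower():
            raise ValueError(f"role must be lowercase: {role!r}")
        if role in roles:
            raise ValueError(f"role listed more than once: {role!r}")
        roles[role] = mode
    return roles


print(parse_node_roles())                      # {'data': 'on', 'overseer': 'allowed'}
print(parse_node_roles("overseer:preferred"))  # {'overseer': 'preferred'}
```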


Examples:

  1. Preferred overseer node with no data (dedicated overseer):
    -Dsolr.node.roles=overseer:preferred or -Dsolr.node.roles=overseer:preferred,data:off
  2. Preferred overseer with data:
    -Dsolr.node.roles=overseer:preferred,data:on
  3. Regular data node that can also act as an overseer:
    Either specify no solr.node.roles param or explicitly specify "-Dsolr.node.roles=data:on,overseer:allowed".
  4. Coordinator node (preview for upcoming feature) that neither hosts data nor performs any overseer duty:
    -Dsolr.node.roles=coordinator:on
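
The examples above suggest that, once the parameter is given explicitly, any role left out behaves as if it were switched off (examples 1 and 4). The sketch below makes that reading concrete. It is an inference from the examples rather than a stated rule; the function name and the table of "off" modes are assumptions, and "coordinator" is included only because it appears as a preview in example 4.

```python
from typing import Dict, Optional

DEFAULT_NODE_ROLES = "data:on,overseer:allowed"

# Assumed "off" mode per role; used for roles omitted from an explicit parameter.
OFF_MODES = {"data": "off", "overseer": "disallowed", "coordinator": "off"}


def effective_modes(param: Optional[str] = None) -> Dict[str, str]:
    """Resolve every known role to a mode, filling omitted roles with 'off' modes."""
    # A missing (or empty) parameter is treated as the default in this sketch.
    explicit = dict(entry.split(":", 1) for entry in (param or DEFAULT_NODE_ROLES).split(","))
    return {role: explicit.get(role, off) for role, off in OFF_MODES.items()}


# Example 1: both spellings of a dedicated overseer resolve to the same modes.
assert effective_modes("overseer:preferred") == effective_modes("overseer:preferred,data:off")
print(effective_modes("coordinator:on"))
# {'data': 'off', 'overseer': 'disallowed', 'coordinator': 'on'}
```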

Cluster API

As of today, there are ADDROLE and REMOVEROLE APIs to add/remove roles on nodes at run time. They support only the OVERSEER role, which designates a preferred overseer. We propose to deprecate these APIs and recommend that users use startup params to achieve the same. Supporting both ways (API and startup params) is tricky and would lead to a lot of confusion among users.

...

Proposing the roles as:
* Layer1 nodes are the "data nodes" and hence get either no role defined for them or -Dsolr.node.roles=data:on,overseer:allowed.
* Layer2 nodes are "overseer nodes" (though, only one of them can be an overseer at a time). They get -Dsolr.node.roles=overseer:preferred
* Layer3 nodes are "coordinator nodes", no data must be hosted on these nodes and they are started with -Dsolr.node.roles=coordinator:on

Note: In this configuration, the actual overseer leader will be one of the nodes in layer2. However, if all nodes in layer2 are down, then one of the layer1 nodes (with overseer:allowed) will become the overseer (until a layer2 node comes back up).

How to Retrieve Roles?

Public API

...

Sample output: {
     "node1": ["overseer:preferred"],
     "node2": ["overseer:allowed", "data:on"],
     "node3": ["data:on"]
}


GET /api/cluster/roles/supported

Sample output:
{
"overseer": {"modes": ["preferred", "allowed", "disallowed"]},
"data": {"modes": ["on", "off"]}
}

Description: Which roles (and their corresponding modes) does this Solr cluster support?

...

GET /api/cluster/roles/nodes/${nodename}

Sample output: ["overseer:preferred"]


GET /api/cluster/roles/${rolename}

Sample output: {"node2": "preferred", "node3": "allowed"}
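
The per-node and per-role views returned by these endpoints are two projections of the same data. As a rough illustration, assuming the per-node listing maps each node name to a list of "<role>:<mode>" strings, the per-role view could be derived as follows (the function name and sample node names are made up for the example):

```python
from typing import Dict, List


def nodes_by_role(node_roles: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Invert {node: ["role:mode", ...]} into {role: {node: mode}}."""
    by_role: Dict[str, Dict[str, str]] = {}
    for node, entries in node_roles.items():
        for entry in entries:
            role, mode = entry.split(":", 1)
            by_role.setdefault(role, {})[node] = mode
    return by_role


per_node = {
    "node1": ["overseer:preferred"],
    "node2": ["overseer:allowed", "data:on"],
    "node3": ["data:on"],
}
print(nodes_by_role(per_node)["overseer"])  # {'node1': 'preferred', 'node2': 'allowed'}
```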


Internal representation in ZK

  • All nodes join live_nodes, as is the case today
  • ZK structure for roles:
      • /node_roles
        • overseer
          • nodes
            •  solr1_8983 (ephemeral node) [zdata: {"mode": "preferred"}]
            •  solr2_8983 (ephemeral node) [zdata: {"mode": "preferred"}]
            •  solr3_8983 (ephemeral node) [zdata: {"mode": "preferred"}]
        • data
          • nodes
            •  solr4_8983 (ephemeral node) [zdata: {"mode": "on"}]
            •  solr5_8983 (ephemeral node) [zdata: {"mode": "on"}]
            •  solr6_8983 (ephemeral node) [zdata: {"mode": "on"}]
            •  solr7_8983 (ephemeral node) [zdata: {"mode": "on"}]
            • ...
        • coordinator (example of a future role)
          • nodes
            • solrcoord1_8983 (ephemeral node) [zdata: {"mode": "on"}]
            • ...
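
To make the layout above concrete, here is a small sketch that computes the znode paths and payloads a node would register for its roles. It only builds strings and does not talk to ZooKeeper; the function name is hypothetical, and whether roles in an "off" mode are published at all is not specified here.

```python
import json
from typing import Dict


def role_znodes(node_name: str, roles: Dict[str, str]) -> Dict[str, str]:
    """Map each role of a node to its (ephemeral) znode path and JSON payload."""
    return {
        f"/node_roles/{role}/nodes/{node_name}": json.dumps({"mode": mode})
        for role, mode in roles.items()
    }


for path, data in role_znodes("solr1_8983", {"overseer": "preferred"}).items():
    print(path, data)
# /node_roles/overseer/nodes/solr1_8983 {"mode": "preferred"}
```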

Roles During Application Lifecycle:

...

  • Yes: Roles are published as ephemeral nodes in ZK.
  • No: The node is configured to publish the default set of roles (at the time of this SIP, that’s [data:on,overseer:allowed])

4) Node completes any other necessary startup and publishes itself in live_nodes.

...

2) Roles will not be checked by loading config from disk. (ZK is the only source of truth)

Guidance on adding a new role

  • Do you have new or existing functionality that you want users to be able to turn on/off on certain nodes, especially from the point of view of functional (role based) isolation of nodes? Yes: good candidate, No: you might not need a separate role
  • Do you want the functionality associated with the role to be turned on for any user upgrading to this new Solr version (without having to explicitly turn it on)?
    • Yes: Change the current default value for "solr.node.roles" from "data:on,overseer:allowed" to "data:on,overseer:allowed,myrole:on"
    • No: Either don't change the default of "solr.node.roles" or change it from "data:on,overseer:allowed" to "data:on,overseer:allowed,myrole:off"
  • How should users who are already using some roles on their nodes be told how to turn on this functionality?
    • In upgrade notes and/or in ref guide, instruct the users with language similar to this: "If you're already explicitly using roles (i.e. you are using "solr.node.roles" for your nodes), then you should append ",myrole:on" to all nodes where you wish to enable this functionality (introduced by myrole)"

Other notes

  • Every time a node starts up with specified roles, the node assumes those are the correct roles for that node and publishes them in ZK after successful startup.
  • If a node is started with a -Dsolr.node.roles parameter that doesn't have a data role (or with data:off), but it already hosts replicas, the startup fails with an error (and a hint indicating how to move replicas away from this node).
  • If a coordinator node is started with the "data" role as well, it fails to start up with a message indicating that a node cannot be both a coordinator and a data node.
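
The startup failures described in these notes could be checked along the following lines. This is a sketch under stated assumptions, not Solr's startup code: the exception class, the function name, and the idea of passing in a replica count are all hypothetical.

```python
from typing import Dict


class RoleStartupError(Exception):
    """Raised when a node's configured roles conflict with its state (hypothetical)."""


def check_startup_roles(roles: Dict[str, str], hosted_replicas: int) -> None:
    """Validate a node's {role: mode} map before it publishes its roles."""
    data_on = roles.get("data") == "on"
    # A node cannot be both a coordinator and a data node.
    if roles.get("coordinator") == "on" and data_on:
        raise RoleStartupError("a node cannot be both a coordinator and a data node")
    # A node without the data role (or with data:off) must not already host replicas.
    if not data_on and hosted_replicas > 0:
        raise RoleStartupError(
            "node hosts replicas but has no data role; move the replicas to another node first"
        )


check_startup_roles({"data": "on", "overseer": "allowed"}, hosted_replicas=3)  # passes
```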

Compatibility, Deprecation, and Migration Plan

...

  • Use autoscaling to stop data (replicas) from being placed on nodes. Autoscaling placement rules may be helpful in avoiding replicas getting placed in a certain node. But, that does not mean other nodes can discover who is performing what functionality or tell a node to start with some feature enabled/disabled
  • The OVERSEER role is already available today; it indicates a "preferred" overseer.

Discussions (summary)

  • No negative roles

There shouldn’t be a concept of “not data” or “not overseer” etc.

Everyone agrees

  • Roles on/off by default?

Jason, Ilan, Houston, Jan: All roles should be on by default. Having all roles on by default is less complicated for users, instead of “treating data role differently from other roles”.

Ishan, Noble, ?Gus?: Only those roles that are needed for backcompat should be on by default, so that we don’t make a premature decision for any roles introduced later. When a new role is introduced, whether it should be enabled by default can be decided then.

  • Which branch to target?

Jan, Ishan, Noble: New feature to be added to 9x branch

  • Need for roles?

Tim, Ilan: new concept of nodes unnecessary since everything that's proposed can be achieved using changes to new autoscaling framework and replica placement plugins. “This proposal in its current form (data and overseer roles) doesn't offer much that can't be reasonably achieved by other means” -- Ilan

...

  • Roles for collections?

Ilan: Role aware collections. “If we make collections role-aware for example (replicas of that collection can only be placed on nodes with a specific role, in addition to the other role based constraints), the set of roles should be user extensible and not fixed.”

Ishan: Role aware collections can be implemented separately later using node roles and placement plugins. As for user extensible roles, a separate concept of user defined node labels (as a separate feature) makes more sense. This SIP is more about first class roles (that come pre-defined with Solr).

  • Configuration

Sysprops vs solr.xml+sysprops vs envvars:

Shawn: Solr.xml and/or envvars

Houston,Ilan: Sysprops and/or envvars

Ishan,Noble: Sysprops

Jan: SIP-11

  • "Capable vs currently providing"
    Gus: Exact quote: "... a way in which roles could be made to express both "I can perform this role if required" and "I am presently performing this role" in zookeeper. The basic idea was that some config in zookeeper that lists the nodes and information about each node has an attribute declaring that it capable of a given set of roles"
    Ishan: No need to complicate this current design, we can take this up later, if needed.