...

  • All nodes join live_nodes, as is the case today
  • ZK structure for roles:
      • /node_roles
        • solr1_8983
          {"roles": ["data", "overseer"]}
        • solr2_8983
          {"roles": ["overseer"]}
        • overseer
          znode data: { .. /* some configs for overseer role */ ..}
          •  solr1_8983
          •  solr2_8983
          •  solr3_8983
        • data
          znode data: { .. /* some configs for data role */ ..}
          •  solr4_8983
          •  solr5_8983
          •  solr6_8983
          •  solr7_8983
          • ...
        • coordinator (example of a future role)
          znode data: {.. /* configs.. */}
          • solrcoord1_8983
          • ...

Roles During Application Lifecycle:

1) Roles are configured for a node when the node is started (via sysprops).

2) At startup, check whether role sysprops are present:

  a) If sysprops are present: if configured roles are found in ZK, overwrite them with the roles specified via sysprops. If no configured roles are present, just add the roles to ZK.
  b) If no sysprops are present: the node is configured with the default set of roles (at the time of this SIP, that's ["data"]).

3) Node completes any other necessary startup and publishes itself in live_nodes.
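
The startup steps above can be sketched as follows. This is a minimal illustration, not Solr's implementation: the class NodeRoleResolver, its methods, the comma-separated sysprop format, and the in-memory map standing in for the /node_roles data in ZK are all assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper illustrating role resolution at startup; not part of Solr. */
final class NodeRoleResolver {
  /** Default role set named in this SIP. */
  static final Set<String> DEFAULT_ROLES = Set.of("data");

  /**
   * Resolves the roles for a node from the raw sysprop value.
   * Assumption: the value is a comma-separated list such as "data,overseer".
   * A missing or empty value falls back to the default roles.
   */
  static Set<String> resolve(String syspropValue) {
    Set<String> roles = new LinkedHashSet<>();
    if (syspropValue != null) {
      for (String token : syspropValue.split(",")) {
        String role = token.trim();
        if (!role.isEmpty()) {
          roles.add(role);
        }
      }
    }
    return roles.isEmpty() ? DEFAULT_ROLES : roles;
  }

  /**
   * Publishes the resolved roles for a node. The map is an in-memory stand-in
   * for the role data in ZK; put() overwrites any roles stored earlier.
   */
  static void publish(Map<String, Set<String>> zkRoles, String nodeName, String syspropValue) {
    zkRoles.put(nodeName, resolve(syspropValue));
  }
}
```

In this sketch, roles given via sysprops always replace whatever was stored for the node before, and a node started without sysprops ends up with the default roles regardless of earlier state.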

Usage of roles in code:

1) Roles will be checked in publicly published configuration (i.e. roles API, ZK), and watches can be set to detect any change.

2) Roles will not be checked by loading config from disk (except for sysprops in bin/solr.in.sh). ZK is the only source of truth.

Other notes

  • Every time a node starts up with specified roles, the node assumes those are the correct roles for it and publishes them in ZK after successful startup.
  • If a node is started with a -Dsolr.node.roles parameter that does not include the data role, but the node already hosts replicas, the startup fails with an error (and a hint indicating how to move replicas away from this node).
  • If a coordinator node is also started with the "data" role, it fails to start up with a message indicating that a node cannot be both a coordinator and a data node.
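
The two failure conditions in the notes above could be checked at startup along these lines. This is a sketch under stated assumptions: NodeRoleValidator, its validate method, the replica-count parameter, and the error messages are hypothetical and only illustrate the rules described here.

```java
import java.util.Set;

/** Hypothetical startup check for role combinations; not part of Solr. */
final class NodeRoleValidator {
  /**
   * Throws IllegalStateException if the requested roles violate either rule:
   * a node cannot be both coordinator and data, and a node that already
   * hosts replicas cannot start without the data role.
   *
   * @param roles roles requested for this node (e.g. from -Dsolr.node.roles)
   * @param hostedReplicaCount number of replicas already on this node (assumed input)
   */
  static void validate(Set<String> roles, int hostedReplicaCount) {
    if (roles.contains("coordinator") && roles.contains("data")) {
      throw new IllegalStateException(
          "A node cannot be both a coordinator and a data node");
    }
    if (!roles.contains("data") && hostedReplicaCount > 0) {
      throw new IllegalStateException(
          "Node hosts " + hostedReplicaCount
              + " replica(s) but was started without the data role;"
              + " move the replicas to another node first");
    }
  }
}
```

Failing fast here keeps a misconfigured node from publishing roles to ZK that contradict the replicas it actually holds.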

...