ID: IEP-44
Author:
Sponsor:
Created:
Status: COMPLETED


Motivation

Thin clients connect to one or more known servers. The set of endpoints is defined in ClientConfiguration and cannot change after the client starts. However, Ignite clusters can be dynamic: nodes start and stop, and IP addresses change. This is especially true in cloud environments, Kubernetes, and so on. In particular, Best Effort Affinity requires connections to all nodes to function efficiently.

Thin clients should be able to discover all server nodes automatically once connected to any of them, and maintain an up-to-date list of servers at all times. This behavior should be enabled when the ClientConfiguration.PartitionAwarenessEnabled property is true.

Description

Clients can already track topology changes: IEP-23 (Best Effort Affinity) changed the response header so that the server sends the topology version whenever it changes.

All that is needed is a new client operation that retrieves server node endpoints: the IP:port combinations that each server listens on.

OP_CLUSTER_GROUP_GET_NODE_ENDPOINTS = 5102

Request
long    startTopologyVersion
long    endTopologyVersion

To avoid transferring the full topology every time, the client can retrieve a topology diff (joined nodes + disconnected nodes) by providing startTopologyVersion and endTopologyVersion.

startTopologyVersion and/or endTopologyVersion can be -1, which means min and max, respectively. The first request should pass -1 for both to retrieve the initial topology with endpoints. Subsequent requests retrieve only the diff.
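
As a rough illustration of the request described above, the two fields can be packed as signed 64-bit integers. This is a minimal sketch, not the reference implementation: the function name `encode_endpoints_request` is hypothetical, little-endian byte order is an assumption, and the surrounding request header (length, operation code, request id) is left out.

```python
import struct

# Operation code taken from the text above.
OP_CLUSTER_GROUP_GET_NODE_ENDPOINTS = 5102

# -1 means "min" for the start version and "max" for the end version.
UNBOUNDED = -1


def encode_endpoints_request(start_ver=UNBOUNDED, end_ver=UNBOUNDED):
    """Pack the two request fields as signed 64-bit integers.

    Assumption: little-endian byte order; the request header is not included.
    """
    return struct.pack("<qq", start_ver, end_ver)


# First request: both versions are -1, so the full topology is returned.
initial = encode_endpoints_request()

# Later request: only the changes between topology versions 5 and 8.
diff = encode_endpoints_request(5, 8)
```

Both payloads are 16 bytes long; only the version bounds differ between the initial request and a diff request.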

Response
long                topologyVersion (actual topology version in this response)
int                 Joined node count
JoinedNode * count  Joined nodes
int                 Disconnected node count
UUID * count        Disconnected node ids

JoinedNode
UUID                Node id
int                 Client listener port
int                 Address count
String * count      IP or host name

The server must return endpoints according to the IgniteConfiguration.Localhost setting, which defines the network interfaces that Ignite binds to.
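
The response layout above could be read along the following lines. This sketch rests on a simplified encoding that is an assumption rather than something the proposal specifies: little-endian integers, UUIDs as 16 raw bytes, and strings as an int length followed by UTF-8 bytes, with no type-code prefixes. The actual wire format may differ. `Reader`, `JoinedNode` and `parse_endpoints_response` are hypothetical names.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import List


@dataclass
class JoinedNode:
    node_id: uuid.UUID
    port: int              # client listener port
    addresses: List[str]   # IPs or host names


class Reader:
    """Sequential reader over a bytes buffer (little-endian is an assumption)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def _take(self, fmt: str):
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def read_int(self) -> int:
        return self._take("<i")

    def read_long(self) -> int:
        return self._take("<q")

    def read_uuid(self) -> uuid.UUID:
        # Assumption: 16 raw bytes, no type-code prefix.
        raw = self.data[self.pos:self.pos + 16]
        self.pos += 16
        return uuid.UUID(bytes=raw)

    def read_string(self) -> str:
        # Assumption: int length followed by UTF-8 bytes.
        length = self.read_int()
        raw = self.data[self.pos:self.pos + length]
        self.pos += length
        return raw.decode("utf-8")


def parse_endpoints_response(data: bytes):
    """Return (topology_version, joined_nodes, disconnected_node_ids)."""
    r = Reader(data)
    topology_version = r.read_long()
    joined = []
    for _ in range(r.read_int()):          # joined node count
        node_id = r.read_uuid()
        port = r.read_int()
        addresses = [r.read_string() for _ in range(r.read_int())]
        joined.append(JoinedNode(node_id, port, addresses))
    disconnected = [r.read_uuid() for _ in range(r.read_int())]
    return topology_version, joined, disconnected
```

Fields are read in exactly the order the response lists them, so a change to the layout only requires reordering the reads.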

Client Public API Changes

None.

When ClientConfiguration.PartitionAwarenessEnabled is true, discovery is enabled as well.

Client Logic

  1. Connect to one or more endpoints from ClientConfiguration
  2. Perform OP_CLUSTER_GROUP_GET_NODE_ENDPOINTS(-1, -1) to retrieve current topology.
  3. Connect to every server from (2) that is not yet connected.
    1. Use node UUID to make sure we don't connect to the same node twice
    2. Verify that the node UUID from the handshake equals the node UUID from the OP_CLUSTER_GROUP_GET_NODE_ENDPOINTS response. When the client and server are in different subnets, there can be a mismatch; the node can even belong to a different cluster. Drop the connection in case of a mismatch.
  4. For every response to any operation, compare topology version from the header to the topology version from (2). When it changes, request updated topology with OP_CLUSTER_GROUP_GET_NODE_ENDPOINTS(oldVer, newVer).
  5. Connect to newly discovered servers, remove connections to removed servers
  6. GOTO (4)
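
The steps above can be sketched as a small piece of bookkeeping around the set of open connections. This is an illustration under stated assumptions, not the actual client code: `DiscoveryState` and the `connect` callable are hypothetical, `connect(host, port)` is assumed to return the node UUID reported by the handshake or raise OSError, and trying each advertised address in turn is a choice made for this sketch that the proposal does not prescribe.

```python
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical signature: connect(host, port) returns the node UUID reported
# by the handshake, or raises OSError if the endpoint is unreachable.
Connect = Callable[[str, int], uuid.UUID]


class DiscoveryState:
    def __init__(self, connect: Connect):
        self.connect = connect
        self.topology_version = -1
        # One entry per connected node, keyed by node UUID.
        self.connected: Dict[uuid.UUID, Tuple[str, int]] = {}

    def apply(self, version, joined, disconnected):
        """Apply one OP_CLUSTER_GROUP_GET_NODE_ENDPOINTS response.

        joined: iterable of (node_id, port, addresses)
        disconnected: iterable of node ids
        """
        for node_id in disconnected:
            # Step 5: remove connections to removed servers.
            self.connected.pop(node_id, None)
        for node_id, port, addresses in joined:
            if node_id in self.connected:
                continue  # Step 3.1: never connect to the same node twice.
            for host in addresses:
                try:
                    actual = self.connect(host, port)
                except OSError:
                    continue  # Unreachable address: try the next one.
                if actual == node_id:
                    # Step 3.2: handshake UUID matches the advertised UUID.
                    self.connected[node_id] = (host, port)
                    break
                # Mismatch: another node answered. A real client would close
                # the socket here; the sketch just moves to the next address.
        self.topology_version = version

    def needs_refresh(self, header_version: int) -> bool:
        """Step 4: a different version in a response header triggers a refresh."""
        return header_version != self.topology_version
```

A caller would check `needs_refresh` on every response header and, when it returns true, request the diff between the stored version and the new one, then pass the result to `apply`.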

Risks and Assumptions

We assume that clients and servers share a common subnet. Other use cases are not supported.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/IEP-44-Thin-Client-Discovery-td47129.html
