ID IEP-90
Author
Sponsor Igor Sapego 
Created

  

Status DRAFT


Motivation

This proposal defines the client lifecycle as well as the core algorithms and mechanisms used by clients. It can be used as a reference when implementing a new client for Ignite and dealing with such problems as:

  • Resolving user-provided addresses;
  • Initial connection to a cluster;
  • Maintaining cluster connection;
  • Connection recovery;
  • Connection break handling.

Definitions

An address here is either a transport level address or a hostname that can be resolved to at least one transport level address using the system API.

A transport level address is a valid TCP/IP address, i.e. a pair of IP address and TCP port.

A transport level connection is a connection between a client and a server node that can be used to send client protocol messages. Currently, this is either a TCP connection or a TLS connection.

A node connection is a single transport level connection between a client and a server node on which the handshake or logical connection restoration procedure was completed successfully. A node connection is considered to be closed when the underlying transport level connection is closed.

A logical connection is a logical link between a node and a client that is established on a successful handshake but is not bound to a single node connection and can outlive it. A logical connection is considered to be closed when the underlying node connection is closed and the logical connection is not restored within the connection restore timeout.

A handshake is the procedure of establishing a logical connection between a client and a server node using an established transport level connection.

A cluster connection is a logical connection between a client and the entire cluster. The client is considered connected to the cluster as long as there is at least one logical connection between the client and a node of the cluster.

A Connection ID is a UUID uniquely identifying a logical connection.

A Node ID is a unique identifier of a server node.

Description

Initial connection

The client gets the initial list of cluster node addresses from the client configuration provided by the user. The list should contain at least one valid address.

To establish an initial cluster connection, the client should establish at least one node connection using the list of addresses provided by the user. To do this, the client sequentially tries to establish a node connection until one of the following happens:

  • A node connection has been established successfully;
  • A node connection establishment failed with the handshake error. Note that the transport level connection error does not lead to the failure of the entire procedure of initial connection;
  • The connection timeout has been reached. If the client was not able to establish a single node connection within the timeout, an appropriate error is returned to the user.

Once a node connection has been established successfully, the cluster connection is considered established and the initial connection procedure is considered complete.
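
The procedure above can be sketched as follows. This is a minimal illustration, not a reference implementation: connect_node, TransportError, HandshakeError and retry_pause are hypothetical names, and cycling over the address list with a short pause until the timeout expires is an assumption about how the sequential attempts repeat.

```python
import time


class TransportError(Exception):
    """Hypothetical: the address could not be resolved or connected to."""


class HandshakeError(Exception):
    """Hypothetical: the node answered but the handshake failed."""


def initial_connect(addresses, connect_node, timeout, retry_pause=0.1,
                    clock=time.monotonic, sleep=time.sleep):
    """Return the first node connection that can be established.

    connect_node(address) is a hypothetical callable that returns a node
    connection, raises TransportError, or raises HandshakeError.
    """
    if not addresses:
        raise ValueError("at least one address is required")
    deadline = clock() + timeout
    while True:
        for address in addresses:
            if clock() >= deadline:
                raise TimeoutError("no node connection within the connection timeout")
            try:
                return connect_node(address)
            except TransportError:
                # A transport level error does not fail the whole procedure.
                continue
            # HandshakeError is deliberately not caught: it ends the procedure.
        # Assumption: pause briefly before walking the list again.
        sleep(retry_pause)
```

The clock and sleep parameters exist only so the timeout branch can be exercised without real waiting.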

Node connection establishment

To try and establish a node connection, a client should be given exactly one valid address of the cluster node.

There are two phases of node connection establishment: transport level connection establishment and handshake.

Transport level connection establishment

The address of the node is only used to establish transport level connection. Here are the steps for this procedure:

  1. The client tries to resolve the address to a set of transport level addresses. If the address cannot be resolved, node connection establishment fails;
  2. The client picks addresses from the resolved set of transport level addresses one by one in random order and tries to establish a transport level connection with each of them. Alternatively, a round robin approach can be used as long as the first picked address is random. The main idea here is to change the connection order every time the same address is used, because some users rely on hostnames that resolve to multiple transport level addresses for load balancing;
  3. If none of the transport level addresses is connectable, node connection establishment fails.

Once a transport level connection is established, the client initiates the handshake procedure. The result of the handshake procedure at this point is the result of node connection establishment. Note that if the handshake procedure has failed, the client does not try other addresses from the set of resolved transport level addresses, as they are all considered to belong to the same node.
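
The transport level steps above could look roughly like this with the Python standard library. It is a sketch under assumptions: resolve and connect_transport are hypothetical names, plain TCP is used (no TLS), and IPv6 scope information is dropped for brevity.

```python
import random
import socket


def resolve(host, port):
    """Resolve an address to a de-duplicated list of (ip, port) pairs."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # sockaddr is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id) for IPv6.
    return list(dict.fromkeys(info[4][:2] for info in infos))


def connect_transport(host, port, timeout=5.0, rng=random):
    """Try every resolved transport level address once, in random order."""
    try:
        candidates = resolve(host, port)
    except socket.gaierror as err:
        raise ConnectionError(f"cannot resolve {host}") from err
    # A fresh random order on every call spreads load across the addresses
    # behind a single hostname.
    rng.shuffle(candidates)
    last_error = None
    for ip, tcp_port in candidates:
        try:
            return socket.create_connection((ip, tcp_port), timeout=timeout)
        except OSError as err:
            last_error = err
    raise ConnectionError(f"no transport level address of {host} is connectable") from last_error
```

The handshake would then run on the returned socket; if it fails, the remaining candidates are intentionally not tried.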

Handshake

A handshake procedure serves several purposes:

  • Ensures the client and the server are both using the Ignite 3 Client Protocol;
  • Ensures the client and the server are both using the same protocol version;
  • Lets the client and the server negotiate a subset of features they support;
  • Lets the client and the server establish a logical connection;
  • Exchanges initial data, e.g. authentication and attributes.

Those capabilities are provided by the following mechanisms:

  • Magic bytes at the very beginning of the very first message. This helps the server fail quickly when an invalid or non-Ignite client establishes a transport level connection with the node;
  • Protocol versioning. It is needed to let the user run clients and servers of different versions at the same time. Ordinarily, clients and servers would support several versions of the protocol and negotiate a version during the handshake procedure. The protocol version is in major.minor.revision format, 2 bytes for each part. The major version changes on breaking changes (basically, a new protocol), the minor version changes when new features are added, and the revision changes on bug fixes;
  • Protocol features bitset. This mechanism lets different clients support different subsets of features and allows for simpler implementation of new clients. It may also serve as a mechanism for providing backward compatibility. The bitset is encoded as a variable length array to support any number of features in the future;
  • Cluster tag. This identifier is used to make sure a client won’t connect to nodes from different clusters;
  • Connection ID. This identifier is used to identify a logical connection established during a handshake. The Connection ID is a UUID;
  • Extensions map, which consists of keys and values (string -> any). Both the client and the server may skip keys that are unknown to them, which also adds to the protocol’s backward compatibility.

A detailed description of the handshake protocol messages can be found in IEP-76 Thin Client Protocol for Ignite 3.0: Handshake

A handshake procedure is conducted as follows:

  1. The client sends a HandshakeRequest using the latest supported version of the protocol and its full set of supported features and waits for the response;
  2. If the client does not receive a HandshakeResponse within the timeout, the handshake is considered failed;
  3. If the HandshakeResponse contains a different protocol version:
    1. If the version is not supported by the client, the handshake is considered failed;
    2. If the version is supported by the client, it should try conducting the handshake again from step 1 using the provided version;
  4. If the HandshakeResponse contains a non-zero error code, the handshake is considered failed;
  5. If the HandshakeResponse is successful, the client should save the protocol version and the resulting feature bitmask for future use. It may also process extension data from the server. At this point the handshake is considered successful and the node connection is considered established.
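
The version negotiation in the steps above can be illustrated with a small sketch. It does not reproduce the wire format from IEP-76: send_request, the fields of HandshakeResponse and HandshakeFailed are hypothetical, the resulting feature mask is assumed to be the intersection of client and server features, and the guard against repeating a version is an addition not described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandshakeResponse:
    version: tuple    # (major, minor, revision)
    error_code: int   # zero means success
    features: int     # feature bitmask reported by the server


class HandshakeFailed(Exception):
    pass


def handshake(send_request, supported_versions, client_features):
    """send_request(version, features) is a hypothetical callable that returns
    a HandshakeResponse or raises TimeoutError."""
    version = max(supported_versions)           # step 1: latest supported version
    attempted = set()
    while True:
        attempted.add(version)
        try:
            response = send_request(version, client_features)
        except TimeoutError as err:             # step 2
            raise HandshakeFailed("no HandshakeResponse within timeout") from err
        if response.version != version:         # step 3
            if response.version not in supported_versions:
                raise HandshakeFailed(f"unsupported version {response.version}")
            if response.version in attempted:   # assumption: avoid endless retries
                raise HandshakeFailed("server keeps proposing a rejected version")
            version = response.version
            continue
        if response.error_code != 0:            # step 4
            raise HandshakeFailed(f"error code {response.error_code}")
        # step 5: keep the version and the resulting feature mask for later use
        return response.version, response.features & client_features
```

Tuples compare element by element, so max picks the highest major.minor.revision triple.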

Logical connection

A logical connection is the mechanism used to handle temporary disconnections without failing ongoing user operations. Physically, it is a context associated with a client and stored on the server side, which can outlive a node connection for some span of time and allows the client to restore the connection without losing the results of some operations. The current approach implies a locally stored logical connection context. In this case, a client has multiple unrelated logical connections: a single logical connection per node.

The following mechanism is used on servers and clients to support logical connections:

  1. If a client does not have a Connection ID associated with the server it tries to connect to, it initiates a handshake procedure as usual;
  2. Upon successful completion of the handshake, the server node creates a connection context object and an associated Connection ID and stores them locally while the node connection is alive. The Connection ID should be generated using a proper secure algorithm (additional research is required here) to make sure an intruder cannot generate an existing Connection ID. The ID is returned to the client in the HandshakeResponse;
  3. The client saves the Connection ID on its side and associates it with the server. A proper way to identify a server is by using a Node ID;
  4. If the node connection is broken, the server starts a logical connection restore timer (the timeout is configurable). The result of any ongoing operation should be saved in the connection context object during this period;
  5. If a new node connection is established successfully before the timer expires and the client sends a ConnectionRestoreReq with a valid Connection ID, the timer is stopped, the logical connection is considered restored and all pending operation results are reported to the client;
  6. If the logical connection was not restored within the timeout, the connection context object is released and the logical connection is considered closed. In this case, even if the client establishes a new node connection successfully and issues a ConnectionRestoreReq, the server node should issue a reject response;
  7. The client, in turn, tries to establish a new node connection and restore the logical connection using a ConnectionRestoreReq and the saved Connection ID. It can get either an “ok” or a “connection unknown” response. The latter means that the client was not able to restore the logical connection within the timeout, so the client should clear the data associated with the logical connection and consider it closed. It should also fail any ongoing operations associated with the logical connection and report errors to the user when appropriate.
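
The server side of this mechanism can be sketched as a small registry of connection contexts. Everything here is hypothetical naming for illustration: ConnectionRegistry and its methods are not part of any Ignite API, expiry is checked lazily against an injected clock rather than with a real timer, and uuid.uuid4() is only a placeholder, since the text above leaves the choice of a secure ID generation algorithm open.

```python
import time
import uuid


class ConnectionRegistry:
    """Node-local store of logical connection contexts (illustrative only)."""

    def __init__(self, restore_timeout, clock=time.monotonic):
        self._restore_timeout = restore_timeout
        self._clock = clock
        self._contexts = {}  # Connection ID -> {"deadline": ..., "results": [...]}

    def on_handshake(self):
        # Placeholder ID generation; a vetted secure scheme is still to be chosen.
        connection_id = uuid.uuid4()
        self._contexts[connection_id] = {"deadline": None, "results": []}
        return connection_id

    def on_node_connection_broken(self, connection_id):
        # Start the restore timer by recording when the context expires.
        context = self._contexts[connection_id]
        context["deadline"] = self._clock() + self._restore_timeout

    def on_operation_result(self, connection_id, result):
        # Keep results that complete while the node connection is down.
        context = self._contexts.get(connection_id)
        if context is not None and context["deadline"] is not None:
            context["results"].append(result)

    def on_restore_request(self, connection_id):
        self._release_expired()
        context = self._contexts.get(connection_id)
        if context is None:
            return "connection unknown", []
        context["deadline"] = None  # stop the timer
        pending, context["results"] = context["results"], []
        return "ok", pending

    def _release_expired(self):
        now = self._clock()
        expired = [cid for cid, ctx in self._contexts.items()
                   if ctx["deadline"] is not None and ctx["deadline"] <= now]
        for cid in expired:
            del self._contexts[cid]
```

On an "ok" response the client would receive the pending results; on "connection unknown" it would drop its saved Connection ID and fail the operations tied to it.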

Maintaining cluster connection

Once initial connection is completed, the client should, if possible, increase the number of node connections as this increases stability of cluster connection, improves load balancing among cluster nodes and improves overall client performance.

There are two aspects of maintaining stable cluster connection:

Getting an up-to-date list of active cluster nodes

The more up to date this list is, the faster the client can establish connections to the nodes. Currently, there is no way for the client to find out addresses of cluster nodes other than the initial list provided by the user in the configuration. However, we should consider adding such a mechanism (Cluster Discovery), as it would improve user experience (e.g. no need to change the client configuration every time new cluster nodes are added) and improve performance when we introduce IEP-23: Best Effort Affinity for Thin Clients.

Establishing new and restoring logical connections

This may be done in several ways:

  • Synchronous approach. In this case all provided addresses are tested and as many node connections as possible are established during the initial connection. Pros: easier to implement. Cons: initial connection may take a long time on big clusters, it’s harder to restore logical connections;
  • Asynchronous approach. In this case only one node connection is established during the initial connection, and the other connections are established in the background by a separate task/thread. Pros: fast initial connection, more robust. Cons: harder to implement, an additional thread is required (bad for some users).

Overall, the asynchronous approach seems preferable, but the synchronous approach may be adopted when a client is at an early stage of development, as it is simpler.
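
A minimal sketch of the asynchronous approach, assuming a hypothetical connect_node(address) callable that returns a node connection or raises OSError. ConnectionMaintainer and its methods are illustrative names; restoring logical connections and discovering new addresses are left out.

```python
import threading


class ConnectionMaintainer:
    """Keeps trying to hold one node connection per configured address."""

    def __init__(self, addresses, connect_node):
        self._addresses = list(addresses)
        self._connect_node = connect_node  # hypothetical connect function
        self._connections = {}             # address -> node connection
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def connections(self):
        with self._lock:
            return dict(self._connections)

    def mark_broken(self, address):
        # Called when a connection is found dead; the next pass reconnects it.
        with self._lock:
            self._connections.pop(address, None)

    def connect_missing(self):
        """One pass over the addresses that currently have no connection."""
        with self._lock:
            missing = [a for a in self._addresses if a not in self._connections]
        for address in missing:
            try:
                connection = self._connect_node(address)
            except OSError:
                continue  # try this address again on the next pass
            with self._lock:
                self._connections[address] = connection

    def run(self, interval):
        while not self._stop.is_set():
            self.connect_missing()
            self._stop.wait(interval)

    def start(self, interval=1.0):
        thread = threading.Thread(target=self.run, args=(interval,), daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

Calling connect_missing() once from the user thread gives the synchronous behaviour, so the same class can back both approaches while a client matures.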

Logical connection break

There are two perspectives on this event: client and server.

Server side

All resources allocated for the client on the server as well as the Connection ID are released in this case.

Client side

The main goal of the client is to make a logical connection failure as invisible to the user as possible. There are two basic cases:

  • There are no active or pending operations on the logical connection. In this case the client should silently mark the address for reconnection. The IEP-83 Thin Client Keepalive mechanism helps detect dead connections early on the client side and lowers the number of errors seen by the user.
  • There are active operations on the logical connection. In this case the client fails the operations and reports an error to the user. A user may use the IEP-82 Thin Client Retry Policy to automatically retry all or some operations on the failed logical connection.
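
Both cases can be handled by one routine, sketched below. LogicalConnection, its pending map of error callbacks and reconnect_queue are hypothetical structures chosen for illustration; retry policies and keepalive are out of scope.

```python
class LogicalConnection:
    def __init__(self, address):
        self.address = address
        self.pending = {}  # request id -> callback that receives an exception


def on_logical_connection_closed(connection, reconnect_queue):
    """Mark the address for reconnection and fail whatever was in flight.

    Returns the number of operations that were failed.
    """
    reconnect_queue.append(connection.address)
    pending, connection.pending = connection.pending, {}
    for on_error in pending.values():
        on_error(ConnectionError(f"logical connection to {connection.address} closed"))
    return len(pending)
```

With no pending operations the call only queues the address, so the user sees nothing.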

Cluster connection break

This is a special case of a logical connection break, in which the broken logical connection is the last or the only one. There are two main scenarios in this case:

  • If this happens with no user activity, the client should continue with the procedures described in the “Maintaining cluster connection” section;
  • If the user tries to perform an operation when there is no cluster connection, or the cluster connection is found to be broken while performing a user operation, the operation should be postponed and the initial connection procedure should be performed again.

There is also a chance that the recovered connection leads to a different cluster. If this happens, the client should be able to detect it using the previously received and saved cluster tag. In this case, the client should report an error to the user and stop trying to recover the connection. Further handling of the case is up to the user.
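
The cluster tag check can be as small as the following sketch. ClusterTagGuard and ClusterChangedError are hypothetical names, and the tag is treated as an opaque value received during the handshake.

```python
class ClusterChangedError(Exception):
    """Raised when a handshake reports a tag of a different cluster."""


class ClusterTagGuard:
    def __init__(self):
        self._tag = None  # tag saved from the first successful handshake

    def check(self, tag_from_handshake):
        if self._tag is None:
            self._tag = tag_from_handshake
        elif tag_from_handshake != self._tag:
            # The client should stop recovery and surface this to the user.
            raise ClusterChangedError("connected to a different cluster")
```

The guard would be consulted after every successful handshake, before the new node connection is put to use.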

Discussion Links

https://lists.apache.org/thread/38fxj1f2yg3kxcm0k84ln61l3yg1ybbr

Reference Links

Tickets
