ID	IEP-95
Author	Pavel Tupitsyn
Sponsor	Pavel Tupitsyn
Created	14 Sep 2022
Status	DRAFT

Motivation

Ignite distributes table data across cluster nodes. Every row belongs to a specific node. Currently, clients are not aware of this distribution, which may result in additional network calls. For example, client calls server A to read a row, server A has to call server B where the row is stored.

In an optimal scenario, the client knows that the row is stored on the server B and makes a direct call there.

Description

Currently, clients can already establish connections to multiple server nodes. Handhsake response includes node id and name.

Update client implementation:

Retrieve and maintain up-to-date partition assignment - an array of node ids, where Nth element is the leader node ID for partition N.
Use HashCalculator to compute colocation key hash (see Row#colocationHash).
Calculate partition number as rowKeyHash % partitionCount.
Get node id from partition assignment (p1).
If a connection to the resulting node exists, perform a direct call. Otherwise, use default connection.

Exceptions:

Transaction belongs to a specific node and client connection. When a non-null transaction is provided by the user, partition awareness logic is skipped.

Protocol Changes

1. Add PARTITION_ASSIGNMENT_GET operation

Request
UUID	table ID

Response
array	Array of node ids, where array index is the partition number

.

2. Update standard response message to include flags

Response
int	Type = 0
int	Request id
int	Flags
int	Error code (0 for success)
string	Error message (when error code is not 0)
...	Operation-specific data

3. Include colocation flag in SCHEMAS_GET response.

Tracking Assignment Changes

There are three potential ways to keep partition assignment up-to-date on the client:

Response flag. All server responses include flags field, and server sets a flag when the assignment has changed since the last response. It is up to the client to retrieve updated assignment when needed. This mechanism is used in Ignite 2.x.
Pros: Low overhead, no extra network traffic.
Cons: Unlike Ignite 2.x, there is no concept of TopologyVersion in Ignite 3. So when there are multiple server connections, a notification for the same assignment update will come from all of them, and it is not possible to tell whether it was the same update or a new one. This may cause unnecessary assignment update requests. A workaround is to use only one connection to track assignment changes. Even if there is no activity on this connection, heartbeats (IEP-83) will trigger the update.
Server → client notification. As soon as assignment changes, server sends a message to all clients.
Pros: Immediate update for all clients.
Cons: Increased network traffic and server load. Some clients may not need the update at all (not all APIs require this).
PrimaryReplicaMissException (suggested in Unable to render Jira issues macro, execution error. comments).
Pros: No protocol changes.
Cons: Retry is required on replica miss (complicated & inefficient). Using exceptions for control flow.

The first approach (response flag) is battle tested and seems to be the most optimal.

Discussion Links

Tickets

key	summary	type	created	updated	due	assignee	reporter	priority	status	resolution
JQL and issue key arguments for this macro require at least one Jira application link to be configured

Page tree

IEP-95 Client Partition Awareness