ID

IEP-95

Author

Pavel Tupitsyn

Sponsor

Pavel Tupitsyn

Created

14 Sep 2022

Status


colour	Grey
title	DRAFT

Table of Contents

Motivation

Ignite distributes table data across cluster nodes. Every row belongs to a specific node. Currently, clients are not aware of this distribution, which may result in additional network calls. For example, client calls server A to read a row, server A has to call server B where the row is stored.

In optimal scenario, client knows that the row is stored on server B and makes a direct call there.

Description

Currently, clients can already establish connections to multiple server nodes. Handhsake response includes node id and name.

Update client implementation:

Retrieve and maintain up-to-date partition assignment - an array of node ids, where Nth element is the leader node ID for partition N.
Use the same logic as server to compute row key hash (see Row#colocationHash).
Calculate partition number as rowKeyHash % partitionCount.
Get node id from partition assignment (p1).
If a connection to the resulting node exists, perform a direct call. Otherwise, use default connection.

Exceptions:

Transaction belongs to a specific node and client connection. When a non-null transaction is provided by the user, partition awareness logic is skipped.

Protocol Changes

Add PARTITION_ASSIGNMENT_GET operation
Request
UUID table ID

Response
array Array of node ids, where array index is the partition number
Update standard response message to include flags
Response
int Type = 0
int Request id
int Flags
int Error code (0 for success)
string Error message (when error code is not 0)
... Operation-specific data
Include colocation flag in SCHEMAS_GET response.

Tracking Assignment Changes

There are three potential ways to keep partition assignment up-to-date on the client:

Response flag. All server responses include flags field, and server sets a flag when the assignment has changed since the last response. It is up to the client to retrieve updated assignment when needed. This mechanism is used in Ignite 2.x.
Pros: Low overhead, no extra network traffic.
Cons: Idle clients do not get the update.
Server → client notification. As soon as assignment changes, server sends a message to all clients.
Pros: Immediate update for all clients.
Cons: Increased network traffic and server load. Some clients may not need the update at all (not all APIs require this).
PrimaryReplicaMissException (suggested in
Jira
server ASF JIRA
serverId 5aa69414-a9e9-3523-82ec-879b028fb15b
key IGNITE-17394
comments).
Pros: No protocol changes.
Cons: Retry is required on replica miss (complicated & inefficient). Using exceptions for control flow.

The first approach is battle tested and seems to be the most optimal.

Discussion Links

Tickets

Jira

server	ASF JIRA
columnIds	issuekey,summary,issuetype,created,updated,duedate,assignee,reporter,priority,status,resolution
columns	key,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
maximumIssues	20
jqlQuery	labels=iep-95
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b

Page tree

Versions Compared

Old Version 12

New Version 13

Key

Motivation

Description

Protocol Changes

Tracking Assignment Changes

Discussion Links

Tickets

Response
int	Type = 0
int	Request id
int	Flags
int	Error code (0 for success)
string	Error message (when error code is not 0)
...	Operation-specific data

Page tree

Page History

Versions Compared

Old Version 12

New Version 13

Key

Motivation

Description

Protocol Changes

Tracking Assignment Changes

Discussion Links

Tickets