...

IDIEP-101
Author
Sponsor

Alexander  

Mirza Aliev 

Sergey Utsel 

Created

 

Status

IN PROGRESS


Table of Contents

Motivation

Generally speaking, Apache Ignite replicates data over multiple cluster nodes in order to achieve high availability and performance. Data partitioning is controlled by the affinity function, which determines both the mapping between keys and partitions and the mapping between partitions and nodes. Please consider the article on data distribution [1] as a prerequisite for this proposal. Specifying an affinity function along with a replica factor and a partition count is sometimes not enough, meaning that explicit fine-grained tuning is required in order to control what data goes where. Distribution zones provide the aforementioned configuration possibilities, which eventually makes it possible to achieve the following goals:

  • The ability to trigger data rebalance upon adding and removing cluster nodes.
  • The ability to delay rebalance until the topology stabilizes.
  • The ability to conveniently specify data distribution adjustment rules for tables.

Description

Let’s start by introducing the data nodes term, a concept that defines the list of cluster nodes that is subsequently passed to the affinity function to calculate assignments:

...

Such a dataNodes attribute is evaluated according to a set of data nodes configuration rules; let’s go over them.

Distribution zones configuration syntax

Code Block
languagesql
titleTable
CREATE ZONE 
	{ database_name.schema_name.distribution_zone_name | schema_name.distribution_zone_name | distribution_zone_name }
	[WITH 
		[
			<data_nodes_auto_adjust> |
			DATA_NODES_FILTER = filter |
			(<data_nodes_auto_adjust>, DATA_NODES_FILTER = filter)
		],
		[PARTITIONS = partitions],
		[REPLICAS = replicas],
		[AFFINITY_FUNCTION = function]
	]
[;]

<data_nodes_auto_adjust> ::= [
	DATA_NODES_AUTO_ADJUST_SCALE_UP = scale_up_value |
	DATA_NODES_AUTO_ADJUST_SCALE_DOWN = scale_down_value |
	(DATA_NODES_AUTO_ADJUST_SCALE_UP = scale_up_value & DATA_NODES_AUTO_ADJUST_SCALE_DOWN = scale_down_value) |
	DATA_NODES_AUTO_ADJUST = auto_adjust_value
]
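
For illustration, a concrete statement that follows the grammar above might look as shown below. The zone name and all values are examples only, and the assumption here is that the auto adjust value is a timeout in seconds.

Code Block
languagesql
titleExample (illustrative)
-- Creates a distribution zone with 1024 partitions, 3 replicas and a scale-up
-- auto adjust timeout; the name and values are illustrative only.
CREATE ZONE my_zone WITH
	DATA_NODES_AUTO_ADJUST_SCALE_UP = 300,
	PARTITIONS = 1024,
	REPLICAS = 3;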

...

Similar syntax is expected for altering a table in order to modify its primary zone.

Distribution zones configuration internals

Let's now take a closer look at the two lexemes that were introduced:

...

Let’s check the AUTO_ADJUST rules in more detail.

DATA_NODES_AUTO_ADJUST rules

The easiest way to understand the auto_adjust semantics is to take a look at a few examples. Let’s start with the simplest one, a scale_up example:

...

At this point we’ve covered DATA_NODES_AUTO_ADJUST_SCALE_UP, DATA_NODES_AUTO_ADJUST_SCALE_DOWN and their combination. Let's take a look at the last remaining auto adjust property, DATA_NODES_AUTO_ADJUST. The reason we have it is to eliminate excessive rebalancing when the user intends to use the same value for both scale_up and scale_down. As we saw in the example above, the events of adding and removing nodes fall into separate frames, each with a dedicated timer: one for expanding the topology (adding nodes) and another for narrowing it (removing nodes), which in turn leads to two rebalances, one per frame. If the user instead wants to put both types of events (adding and removing nodes) into a single frame with only one dataNodes recalculation and one rebalance, they should use the DATA_NODES_AUTO_ADJUST property.
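
A minimal sketch of the difference follows, using the syntax above. The zone names and timeout values are illustrative, the values are assumed to be timeouts in seconds, and the "(... & ...)" combination form is taken literally from the grammar above, which is also an assumption.

Code Block
languagesql
titleAuto adjust sketch (illustrative)
-- Dedicated timer per event type: additions are tracked by SCALE_UP, removals
-- by SCALE_DOWN, so a mixed burst of joins and leaves may cause two rebalances.
CREATE ZONE zone_two_frames WITH
	(DATA_NODES_AUTO_ADJUST_SCALE_UP = 300 & DATA_NODES_AUTO_ADJUST_SCALE_DOWN = 300);

-- One shared timer: joins and leaves that fall into the same frame are merged
-- into a single dataNodes recalculation and a single rebalance.
CREATE ZONE zone_one_frame WITH
	DATA_NODES_AUTO_ADJUST = 300;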

DATA_NODES_FILTER rule

Often, a user may want to distribute table data over a specific set of nodes that satisfy a particular requirement. E.g., in the case of a geo-distributed cluster, a user may want to place the table closer to the business logic applications in a particular region, or, on the contrary, stretch it over several regions. For hot tables, a user may want to spread data across “fast” nodes, for example nodes with SSD disks. An illustrative filter sketch is given after the operator list below.

...

  • Regexp.
  • Simple predicate rules
    • lists, e.g. “US”, “EU”
    • groups ()
    • or ||
    • and &&
    • not !
    • contains()
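
To make the options above more concrete, here is a purely illustrative filter sketch. The node attribute names (region, storage) are hypothetical, the string quoting is an assumption, and the exact filter grammar is not finalized; the expression only combines the operators listed above.

Code Block
languagesql
titleFilter sketch (illustrative)
-- Keeps nodes whose hypothetical "region" attribute contains US or EU and
-- whose "storage" attribute does not contain HDD.
CREATE ZONE geo_zone WITH
	DATA_NODES_FILTER = '(contains(region, "US") || contains(region, "EU")) && !contains(storage, "HDD")';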

Data nodes evaluation rules

Compatibility Matrix

                         | AUTO_ADJUST_SCALE_UP | AUTO_ADJUST_SCALE_DOWN | AUTO_ADJUST  | FILTER
AUTO_ADJUST_SCALE_UP     | -                    | compatible             | incompatible | compatible
AUTO_ADJUST_SCALE_DOWN   | compatible           | -                      | incompatible | compatible
AUTO_ADJUST              | incompatible         | incompatible           | -            | compatible
FILTER                   | compatible           | compatible             | compatible   | -

As you can see, the only incompatible combination is DATA_NODES_AUTO_ADJUST together with DATA_NODES_AUTO_ADJUST_SCALE_UP, DATA_NODES_AUTO_ADJUST_SCALE_DOWN, or both.

Override rules

In addition to specifying DATA_NODES properties on table creation, it’s also possible to update them with alter table/tableGroup queries. The set of properties specified in a subsequent request overrides the corresponding previously specified properties.

...

Will add the filter without affecting the DATA_NODES_AUTO_ADJUST properties.
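
Conversely, assuming that the altering statement mirrors the creation syntax above (the exact ALTER form is not spelled out in this section, so the statement below is a hypothetical sketch), a later request that specifies only an auto adjust property would override just that property and leave a previously configured filter intact:

Code Block
languagesql
titleOverride sketch (hypothetical)
-- DATA_NODES_AUTO_ADJUST_SCALE_UP is replaced with the new value, while the
-- previously specified DATA_NODES_FILTER stays untouched.
ALTER ZONE my_zone WITH DATA_NODES_AUTO_ADJUST_SCALE_UP = 600;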

Retrieve distribution zones configuration

It should be possible to retrieve a distribution zone configuration both through table views and through a new SQL command, DESCRIBE:

  • DESCRIBE TABLE <tableName>;
  • DESCRIBE TABLE GROUP <tableGroupName>;

Risks and Assumptions

Some aspects of node attribute configuration are not fully designed yet. Besides that, manual setting of data nodes is intentionally left out of scope. There is a corresponding extension point in the aforementioned design that would allow a user to specify the set of nodes explicitly using a special APPLY keyword; however, it is currently unclear whether it is really needed.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

[1] https://ignite.apache.org/docs/latest/data-modeling/data-partitioning

[2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-77%3A+Node+Join+Protocol+and+Initialization+for+Ignite+3

Tickets

Jira: project = IGNITE AND (labels = iep-101)
// Links or report with relevant JIRA tickets.