ID

IEP-[97]

Author

Sponsor

Created

01 Nov 2022

Status

title	DRAFT

Table of Contents

Motivation

...

Code Block

language	java
title	Table

CREATE ZONE 
	{ database_name.schema_name.distribution_zone_name | schema_name.distribution_zone_name | distribution_zone_name }
	[WITH 
		[
			<data_nodes_auto_adjust> |
			DATA_NODES_FILTER = filter |
			(<data_nodes_auto_adjust>, DATA_NODES_FILTER = filter)
		],
		[PARTITIONS = partitions],
    	[REPLICAS = replicas],
		[AFFINITY_FUNCTION = function]
	]
[;]

<data_nodes_auto_adjust> ::= [
	DATA_NODES_AUTO_ADJUST_SCALE_UP = scale_up_value |
	DATA_NODES_AUTO_ADJUST_SCALE_DOWN = scale_down_value |
	(DATA_NODES_AUTO_ADJUST_SCALE_UP = scale_up_value & DATA_NODES_AUTO_ADJUST_SCALE_DOWN = scale_down_value) | DATA_NODES_AUTO_ADJUST  = auto_adjust_value
]

...

The easiest way to understand auto_adjust semantics is to take a look at a few examples. Let’s start with a simplest one - scale_up example:

-1	start Node A; start Node B; start Node C; CREATE ZONE zone1 WITH DATA_NODES_AUTO_ADJUST_SCALE_UP = 300; CREATE TABLE Accounts … WITH PRIMARY_ZONE = zone1	User starts three Ignite nodes A, B, C and creates table Accounts specifying scale up auto adjust timeout as 300 seconds. Accounts table is created on current topology, meaning that <Transaction>.dataNodes = [A,B,C]
0	start Node D -> Node D join/validation -> D enters logical topology -> logicalTopology.onNodeAdded(Node D) -> scale_up_auto_adjust(300) timer is scheduled for the <Accounts> table.	At time 0 seconds the user starts one more Ignite node D, that joins the cluster. On entering logical topology the onNodeAdded event is fired. This event schedules a timer of 300 seconds for table Accounts after which the dataNodes of that table transitively through the distribution zone will be recalculated from [A,B,C] to [A,B,C,D]
250	start Node E -> scale_up_auto_adjust(300) is rescheduled for the <Accounts> table.	At 250 seconds one more node is added, that action reschedules scale_up_auto_adjust timer for another 300 seconds.
550	scale_up_auto_adjust fired -> set table.<Accounts>.dataNodes = [NodeA, NodeB, NodeC, Node D, Node E]	At 550 seconds scale_up_time is fired, that leads to <Transaction>dataNodes recalculation by attaching the nodes that were added to logical topology - Nodes D and E in the given example.
600	start Node F -> <Accounts> table schedules scale_up_auto_adjust(300);	At 600 seconds one more node is added, there are no active scale_up_auto_adjust timers, so given events schedules new one.

Now it’s time to expand the example above with node that exits the cluster topology:

-1	start Node A; start Node B; start Node C; CREATE ZONE zone1 WITH DATA_NODES_AUTO_ADJUST_SCALE_UP = 300, DATA_NODES_AUTO_ADJUST_SCALE_DOWN= 300_000; CREATE TABLE Accounts … WITH PRIMARY_ZONE = zone1	User starts three Ignite nodes A, B, C and creates table Accounts specifying scale up auto adjust timeout as 300 seconds. Accounts table is created on current topology, meaning that <Transaction>.dataNodes = [A,B,C]
0	start Node D -> Node D join/validation -> D enters logical topology -> logicalTopology.onNodeAdded(Node D) -> scale_up_auto_adjust(300) timer is scheduled for the <Accounts> table.	At time 0 seconds the user starts one more Ignite node D, that joins the cluster. On entering logical topology the onNodeAdded event is fired. This event, schedules a timer of 300 seconds for table Accounts after which the dataNodes of that table will be recalculated from [A,B,C] to [A,B,C,D]
100	stop Node C -> scale_down_auto_adjust(300_000) timer is scheduled for the <Accounts> table.	At 100 seconds the user stops Node C (or it painfully dies). TableManager detects onNodeLeft(Node C) event and starts scale_down time for 300_000 seconds for table <Accounts>. Please pay attention that the node left doesn’t affect the scale_up timer.
250	start Node E -> scale_up_auto_adjust(300) timer is re-scheduled for the <Accounts> table.	At 250 seconds Node E is added, that re-schedules scale_up_auto_adjust timer for another 300 seconds. The important part here is that adding the node doesn’t change scale_down time only scale_up one.
550	scale_up_auto_adjust fired -> set table.<Accounts>.dataNodes = [NodeA, NodeB, NodeC, Node D, Node E]	At 550 seconds scale_up_time is fired, that leads to <Transaction>dataNodes recalculation by attaching the nodes that were added to logical topology - Nodes D and E in the given example. Please pay attention that despite the fact there's no Node C in logical topology it still takes its place in <Transaction>.dataNodes.
300100	scale_down_auto_adjust fired -> set table.<Accounts>.dataNodes = [NodeA, NodeB, Node D, Node E]	At 300_100 seconds scale_down_auto_adjust timer is fired, that leads to removing Node C from <Transaction>.dataNodes.

At this point we’ve covered DATA_NODES_AUTO_ADJUST_SCALE_UP DATA_NODES_AUTO_ADJUST_SCALE_DOWNand its combination. Let's take a look at the last remaining auto adjust property - DATA_NODES_AUTO_ADJUST. The reason we have one is to eliminate excessive rebalance in case of users intention on having the same value for both scale_up and scale_down. As we saw in the example above, the events of adding and removing nodes fall into the corresponding frames with a dedicated timer each: one for expanding the topology (adding nodes), and another for narrowing it (removing nodes), which in turn leads to two rebalances - one per each frame. If the user however wants to put both types of events (adding and removing nodes) in one frame with only one dataNodes recalculation and one rebalance, he should use the DATA_NODES_AUTO_ADJUST property.

...

Data nodes evaluation rules

Compatibility Matrix

	AUTO_ADJUST_SCALE_UP	AUTO_ADJUST_SCALE_DOWN	AUTO_ADJUST	FILTER
AUTO_ADJUST_SCALE_UP
AUTO_ADJUST_SCALE_DOWN
AUTO_ADJUST
FILTER

As you can see, the only properties that are incompatible with each other are DATA_NODES_AUTO_ADJUST with any of the properties DATA_NODES_AUTO_ADJUST_SCALE_UP or DATA_NODES_AUTO_ADJUST_SCALE_DOWN or a combination of them.

...

Page tree

Versions Compared

Old Version 1

New Version 2

Key

Motivation

Data nodes evaluation rules

Compatibility Matrix

Page tree

Page History

Versions Compared

Old Version 1

New Version 2

Key

Motivation

Data nodes evaluation rules

Compatibility Matrix