
A schema update U is visible to a node N at moment Top if N received a message M containing this update at a moment T ≤ Top. Schema update messages are distributed via the Meta Storage RAFT group (consuming a message means applying it to a state machine), so the messages are delivered in order (hence, if a message is not delivered, none of the messages that follow it will be delivered) and are never duplicated. This is due to the fact that there is exactly one Meta Storage RAFT state machine on each node.

...

  1. The client issues a statement that modifies the database schema (CREATE/ALTER/DROP TABLE, CREATE/DROP INDEX, etc.)
  2. The new schema version (with its Tu) is propagated to all cluster nodes using the Meta Storage (as we have a Meta Storage RAFT group member, a learner or not, on every cluster node)
  3. Each node interprets Tu independently of the other nodes, so no cluster-wide coordination is needed when processing transactions. Schema version activation happens "automatically" once the node is sure it has all the updates it needs for the transaction it is processing. To make sure it has all the updates, a node might need to wait.
  4. Before an update is reported to the client as installed, an additional wait happens to make sure that a subsequent client request on any node will see the installed update. The required wait lasts until the schema update activation becomes non-future on all nodes (taking clock skew into account), so the client must wait until Now ≥ Tu + CSmax.
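The final wait in step 4 can be sketched as follows. This is a minimal illustration, not Ignite code: the value of CS_MAX and the function name are assumptions.

```python
import time

CS_MAX = 0.5  # assumed upper bound on clock skew between any two nodes, in seconds

def ack_schema_update(tu, now=time.time, sleep=time.sleep):
    """Block until the update's activation moment Tu is non-future on every
    node. Once the local clock reads Tu + CS_MAX, every other node's clock
    reads at least Tu, so any later client request observes the update."""
    wait = tu + CS_MAX - now()
    if wait > 0:
        sleep(wait)
```

Injecting `now` and `sleep` keeps the sketch testable; a real implementation would use the node's own clock.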

...

Making sure a node does not miss any schema messages that are relevant to a reference timestamp

The clocks may be skewed and the network is asynchronous (so message delivery might take an arbitrarily long time). This means that we cannot simply install the update in the future and start processing transactions on the nodes right away.

Briefly, the idea is to do the following:

  1. Use the fact that Meta Storage is available on all cluster nodes (thanks to learners) to distribute the schema updates
  2. Wait for the Meta Storage safeTime on a node to reach the required value, to make sure the node sees all the updates it needs to process the transaction in question
  3. Actually wait for a safeTime that is a Delay Duration (DD) earlier than needed (to avoid waiting in most cases), and compensate by installing each update DD in the future, which makes the timestamp math on the nodes easier
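Numerically, the DD trick from steps 2 and 3 works out as follows. This is an illustrative sketch; the value of DD and the helper names are assumptions.

```python
DD = 0.2  # assumed Delay Duration, in seconds

def activation_ts(tm):
    """An update written to Meta Storage at timestamp Tm is installed 'into
    the future': it activates at Tu = Tm + DD."""
    return tm + DD

def safe_time_target(top):
    """To process a transaction at reference timestamp Top, a node only needs
    safeTime >= Top - DD: any update the node has not yet seen at that
    safeTime has Tm > Top - DD, hence Tu = Tm + DD > Top, so it cannot
    affect a transaction at Top."""
    return top - DD
```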

...

We must prevent a node from processing a transaction at moment Now before it has received all the schema updates with an activation moment not later than Now. We are going to use the Meta Storage for storing the schema updates (note that this implies that we use HLCs from now on, not the physical clocks we used in the warm-up section). We are going to read schema updates from the Meta Storage on other nodes. To make sure we never read stale data (which would break AP), we can use the same approach that we use for reads (in RO) from non-primary replicas: Safe Time.

A message M saying ‘Starting at moment Tu, schema is S’ is sent with timestamp Tm (which moves safeTime forward). A schema storage client that wants to obtain the schema active at moment Top waits for the Meta Storage’s safeTime ≥ Top. Tu must be ≥ Tm (because we construct messages this way), so, if Tm < Top, then after waiting the client sees M (as Tm < Top ≤ safeTime, so Tm < safeTime, hence M is visible to the client’s node) and knows that S was activated at Tu before it obtains the schema for moment Top.
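The safeTime reasoning above can be sketched as a tiny per-node schema store. The names and shapes here are illustrative assumptions, not Ignite's actual classes.

```python
class LocalSchemaStore:
    """Per-node view of schema updates replicated through Meta Storage."""

    def __init__(self):
        self.safe_time = 0.0
        self.updates = []  # (Tu, schema) pairs, appended in delivery order

    def apply(self, tm, tu, schema):
        # Consuming message M ('starting at Tu, schema is S') with timestamp
        # Tm advances safeTime; delivery is ordered, so Tm only grows.
        assert tu >= tm, "messages are constructed so that Tu >= Tm"
        self.updates.append((tu, schema))
        self.safe_time = tm

    def schema_at(self, top):
        # Precondition: safeTime >= Top, so every message with Tm < Top has
        # already been applied and no relevant update can still be in flight.
        if self.safe_time < top:
            raise RuntimeError("wait for safeTime >= Top before reading")
        active = None
        for tu, schema in self.updates:
            if tu <= top:
                active = schema
        return active
```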

...

This approach guarantees correctness (AP is always maintained), but every node (other than the Meta Storage leader), when obtaining a schema for the moment Top = Now, has to wait for the safeTime visible on that node to reach Top, which takes PD (propagation duration). So each transaction is delayed by PD on average. This creates a performance problem: latency increases.

...

Schema updates are not visible right away

Even on the Meta Storage leader, a just-installed schema update is not visible right away, because it activates DD in the future. This might be confusing for a user who has just issued an ALTER TABLE query, because ‘read your own writes’ is violated. This can be alleviated by completing the future used to install the schema update only after DD passes, but if a client fails before the wait elapses and then reconnects, it might still expect to see the update.
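One way to restore ‘read your own writes’ for the issuing client is to hold the DDL call open until the update activates. A sketch under assumed names: `write_to_meta_storage` is hypothetical and is taken to return the Meta Storage timestamp Tm of the write.

```python
import time

DD = 0.2  # assumed Delay Duration, in seconds

def install_schema_update(write_to_meta_storage, now=time.time, sleep=time.sleep):
    """Write the update, then keep the client's DDL future open until the
    activation moment Tu = Tm + DD has passed, so a follow-up query from the
    same client sees the new schema. Caveat from the text: a client that
    disconnects before the wait elapses and reconnects may still see the
    old schema."""
    tm = write_to_meta_storage()
    tu = tm + DD
    remaining = tu - now()
    if remaining > 0:
        sleep(remaining)
    return tu
```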

A node not receiving Meta Storage updates freezes processing of transactions in which the node participates

A primary replica needs to wait for safeTime to reach Top − DD before processing a transaction at reference timestamp Top. If the node does not get any Meta Storage updates (i.e., it lags behind the Meta Storage leader), it cannot proceed.
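A bounded version of that wait might look like this. The polling approach and all parameter names are illustrative assumptions.

```python
import time

def wait_for_safe_time(get_safe_time, target, timeout,
                       poll=0.05, sleep=time.sleep, clock=time.monotonic):
    """Wait until the local Meta Storage safeTime reaches `target`
    (Top - DD for a transaction at reference timestamp Top). If the node
    receives no Meta Storage updates, safeTime never advances, the wait
    times out, and transaction processing on this node stays frozen."""
    deadline = clock() + timeout
    while get_safe_time() < target:
        if clock() >= deadline:
            return False  # node lags behind: cannot process the transaction
        sleep(poll)
    return True
```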

...

  1. The primary replica actually lags behind the whole cluster. In this situation, we need to promote another replica to become the primary. This case is not caused by the new mechanism introduced here; we should already have measures to handle it.
  2. The whole cluster lags behind the Meta Storage leader. This means that the whole cluster is in an uncontrollable state; in this situation, we become unable to process transactions.

...

An unpleasant consequence is that a node lagging too far behind the Meta Storage freezes transaction processing even in the absence of any schema updates.

...

Reference Links

// N/A

Tickets


ASF JIRA filter: labels in (iep-98) order by Key