Here we add the following requirements:

A DDL operation must be linearized wrt any DML operation (in other words, if we treat a DDL as a ‘transaction’, it is strictly serializable). This means that a DDL operation immediately affects any transactions that started before it, and new transactions will start using the new schema. No transaction can have reads/writes run both on the old schema and the new schema AND have the possibility to detect that they were executed on different schemas.
After a CREATE TABLE, the created table must be available for any transaction started before the CREATE TABLE was executed.
If an ALTER TABLE happens after a transaction was started and only later (after the DDL) the table gets enlisted in the transaction, the transaction should not be aborted due to the schema change.

Handling of a transborder transaction should not produce any weird (from the point of view of the user) behavior. An example of a weirdness is if a transaction is run on more than one schema during its life, or it looks (to other transactions) that it did so.
After a CREATE TABLE, the created table must be available for any transaction started before the CREATE TABLE was executed.
If an ALTER TABLE happens after a transaction was started and only later (after the DDL) the table gets enlisted in the transaction, the transaction should not be aborted due to the schema change.

Overall flow

A client comes and tries to execute an operation in a transaction (it sends its schema version clientSV in each tuple it sends in the request)
The tx coordinator gets the request, takes baseTs=tableEnlistTs(table, tx), then takes a schema version tableEnlistSV corresponding to the ts and, if clientSV does not match tableEnlistSV, returns an error to the client (the error contains tableEnlistSV); by getting the error, the client retries its request using tableEnlistSV. The coordinator caches tableEnlistSV for each table in each transaction.
If the client request validation succeeds, the coordinator sends tableEnlistSV to the primary (either implicitly [in binary tuples of the request] or explicitly)
(As a result, for the same table in the same tx, there will be the same tableEnlistSV)
The primary uses tableEnlistSV as I (initial schema) when validating schema compatibility for each transaction read/write operation
The coordinator uses tableEnlistSV as I (initial schema) when validating schema compatibility during commit

The diagram illustrates the following:

If the transaction is aborted due to schema validation failure (items 5 and 6), an error code is returned to the client that says that the transaction should be retried
If the client gets a retriable error code, it retries the transaction

The diagram illustrates the following:

Client schema version validation and Client schema version validation and refresh
Successful writes
Schema validation at commit

Image Removed

When schema changes are validated

Retry of an aborted transaction

Image Added

Weirdness and DDL linearization wrt DML

We are not going to implement transactional DDL for now, also we don’t want to block transactions while executing DDL. This means that transborder transactions (TB) are possible (those transactions that start on one schema, but commit on another one).

There are 3 options:

We allow a TB to read/write on the new schema (if the change is compatible) as soon as a DDL executes (so the TB sees the effect of the DDL right away wrt reads/writes). The effect of such TBs could look weird from the POV of transactions reading tuples written by the TB: even though they seem to appear at the same time (commitTs), some of them would adhere to one set of constraints (corresponding to the old schema), but some would adhere to another set of constraints (corresponding to the new schema).
We allow a TB to read/write after a DDL happens (if it’s compatible), but it still reads/writes on the old schema. This might look weird from the POV of the user running the TB: the DDL is not transactional, but the effect of a DDL is not kicking in right away.
We don’t allow a TB to read/write after a DDL happens (so it can only commit [if the schema change is compatible])

We choose option 3 to avoid the weirdness of the first two options at the expense of a slightly increased probability of rollbacks. To make sure it holds, we require that a DDL operation is linearized wrt any DML operation (in other words, if we treat a DDL as a ‘transaction’, it is strictly serializable). This means that a DDL operation immediately affects any transactions that started before it, and new transactions will start using the new schema. No transaction can have reads/writes run both on the old schema and the new schema AND have the possibility to detect that they were executed on different schemas.

When schema changes are validated

When committing, we validate that I (the initial schema at tableEnlistTs) is compatible with C (the commit schema at commitTs)
At each read/write operation in a transaction (that has its opTs), if schema at opTs is different than I, we fail the operation and
When committing, we validate that I (the initial schema at tableEnlistTs) is compatible with C (the commit schema at commitTs)
At each read/write operation in a transaction (that has its opTs), if schema at opTs is different than I, we fail the operation and abort the transaction (we do so to make it impossible for a transaction to observe that it runs on a different schema)
When a schema change happens, we find all transactions that touched this table (and started on an earlier schema) and mark all transactions for which the schema change is incompatible as rollback-only. This is not needed for correctness, it’s an optimization to make sure we abort transactions as soon as possible if they will never be able to commit.

...

(Note: there is a table in the end of the document summarizing the table-related rules)

Creation of a table does not need to be considered because before the table was created it did not exist, so a write to the table in the initial schema is not possible in the first placeis forward-compatible
Drop of a table is not forward-compatible because the table does not exist in the commit schema
Rename of a table is not forward-compatible because the table identified with the original name (seen by the transaction) does not exist in the commit schema

...

We don’t allow any read/write operation on a transaction after an ALTER TABLE happened on an enlisted table, even for the white-listed compatible DDLs (like ADD COLUMN or dropping a constraint) to make sure that even a transborder transaction is not able to see any weirdness due to working on 2 schemas. Such weirdness can only be observed when looking at table metadata (this way, the transaction might find out that a new column had appeared) because the client-side schema is fixed for the duration of the transaction due to the validation on the coordinator. We might consider such a level of weirdness negligible and allow transborder transactions read/write even after a compatible DDL.
For now, we consider an addition of a NOT NULL column without a default value as impossible, but we could allow it for empty tables. This requires an additional design.
For now, we treat RENAME TABLE as incompatible, this is simple and consistent, but it will make transborder transactions touching the renamed table to be aborted. It is possible to use a two-stage mechanism to have the old name as an alias along with the new one for some time to allow the older transactions to finish.
For safety, we consider all changes as backward incompatible, but it might be possible to make some of them backward compatible later to reduce abort rate due to schema changes

Risks and Assumptions

// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets

Risks and Assumptions

N/A

Discussion Links

N/A

Reference Links

Tickets

Jira

server	ASF JIRA
columnIds	issuekey,summary,issuetype,created,updated,assignee,reporter,priority,status,resolution
columns	key,summary,type,created,updated,assignee,reporter,priority,status,resolution
maximumIssues	20
jqlQuery	labels in (iep-110) order by Key
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b

// Links or report with relevant JIRA tickets.

Page tree

Versions Compared

Old Version 5

New Version Current

Key

Overall flow

When schema changes are validated

Weirdness and DDL linearization wrt DML

When schema changes are validated

Risks and Assumptions

Discussion Links

Reference Links

Tickets

Risks and Assumptions

Discussion Links

Reference Links

Tickets

Page tree

Page History

Versions Compared

Old Version 5

New Version Current

Key

Overall flow

When schema changes are validated

Weirdness and DDL linearization wrt DML

When schema changes are validated

Risks and Assumptions

Discussion Links

Reference Links

Tickets

Risks and Assumptions

Discussion Links

Reference Links

Tickets