...
The document also describes how basic schema changes (changes that do not involve indices and do not require validation of existing data) are handled under the Schema Synchronization framework, including the compatibility issues they raise.
This is a dependent design: some requirements are defined in the Schema Synchronization design. As a highlight, we should avoid blocking both transaction processing (due to DDLs) and DDLs (due to running transactions); if we must choose between blocking and aborting some transactions, aborting is the preferred approach.
Here we add the following requirements:
The diagram illustrates the following:
We are not going to implement transactional DDL for now, and we do not want to block transactions while executing DDL. This means that transborder transactions (TB) are possible: transactions that start on one schema version but commit on another.
There are 3 options:
...
We choose option 3 to avoid the weirdness of the first two options, at the expense of a slightly increased probability of rollbacks. To make sure it holds, we require that a DDL operation is linearized with respect to any DML operation (in other words, if we treat a DDL as a ‘transaction’, it is strictly serializable). This means that a DDL operation immediately affects any transaction that started before it, and new transactions start using the new schema. No transaction can have reads/writes executed both on the old schema and the new schema AND be able to detect that they were executed on different schemas.
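To make the implication concrete, here is a minimal Java sketch of the commit-time validation this rule leads to; Catalog, Schema and CompatibilityRules are hypothetical types introduced only for this illustration (they are not the actual API), and timestamps are simplified to plain longs.

```java
import java.util.UUID;

final class CommitTimeSchemaCheck {
    /** Hypothetical schema descriptor; only the version matters for this sketch. */
    record Schema(int version) {}

    /** Hypothetical catalog lookup: the schema of a table as of a given timestamp. */
    interface Catalog {
        Schema schemaAt(UUID tableId, long timestamp);
    }

    /** Hypothetical oracle encapsulating the forward-compatibility rules from this document. */
    interface CompatibilityRules {
        boolean isForwardCompatible(Schema initial, Schema commit);
    }

    static boolean canCommit(Catalog catalog, CompatibilityRules rules,
                             UUID tableId, long beginTs, long commitTs) {
        Schema initial = catalog.schemaAt(tableId, beginTs);  // schema I the tx started on
        Schema commit = catalog.schemaAt(tableId, commitTs);  // schema C at commit time
        // No DDL happened in between: not a transborder transaction, commit is allowed.
        if (initial.version() == commit.version()) {
            return true;
        }
        // Transborder transaction: allowed only if the change from I to C is forward-compatible.
        return rules.isForwardCompatible(initial, commit);
    }
}
```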
The result is that the ability to commit transborder transactions for forward-compatible schema changes is a minor optimization. It might have a noticeable effect only for very short transactions (implicit ones and explicit transactions having just a couple of operations each). This optimization might be postponed.
All objects in a database are either top-level (defined directly in the Catalog) or belong to some other object. Please note that the notion of being a top-level object or a dependent object is specific to this design and might be different from usual considerations (for instance, here we consider a table to be a top-level object, even though one might say that a table belongs to a database).
Top-level:
Belonging to a table:
When a schema change happens, the change is compatible from the point of view of any transaction whose enlisted tables do not include the top-level object of the change in question. So, for a transaction that only touches table A, a change in table B or in any of table B’s indices is compatible in both directions.
When a schema change from I to C actually consists of a few steps (for instance, the table was altered a few times since I), each of the steps (each being a schema change in its own right) is validated in sequence. If any of them is not compatible, the whole change is not compatible. It works the same for forward and backward compatibility.
When a schema change is compound (that is, it consists of a few simple schema changes), it is represented for validation purposes as a sequence of these simple changes and then each simple change is validated, in order. The compound change is compatible if and only if all the simple changes are compatible.
An example of such a compound change is a mixed ALTER TABLE statement like the following: ALTER TABLE t1 DROP COLUMN c1, ADD COLUMN c2 VARCHAR(20), which will be represented as ‘drop column c1, then add column c2’.
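A possible shape of this validation, sketched in Java purely for illustration (SimpleChange is a hypothetical type, not the actual API): the compound change is forward-compatible if and only if every simple change in the sequence is.

```java
import java.util.List;

final class CompoundChangeValidation {
    /** Hypothetical marker for a simple change: drop column, add column, widen a type, etc. */
    interface SimpleChange {
        boolean isForwardCompatible();
    }

    /**
     * A compound change such as 'ALTER TABLE t1 DROP COLUMN c1, ADD COLUMN c2 VARCHAR(20)'
     * is represented as the sequence [drop c1, add c2] and validated step by step, in order.
     */
    static boolean isForwardCompatible(List<SimpleChange> compoundChange) {
        for (SimpleChange step : compoundChange) {
            if (!step.isForwardCompatible()) {
                return false; // one incompatible step makes the whole change incompatible
            }
        }
        return true;
    }
}
```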
Please note that columns are identified by their IDs, not names, so after a column is dropped and readded with the same name, the result is actually a column different from the original column.
(It seems that for GA we are only going to support dropping a few columns at once, so for GA the support for compound schema changes will be very limited.)
Initial schema I is forward-compatible with commit schema C if all of the following requirements are met:
Imagine the following scenario.
If a schema change at step 2 did not happen, such a scenario would be perfectly valid according to the Transaction protocol. But, as the schema change has happened, we need to represent it as a schema change transaction Ts (a virtual transaction, just for the sake of argument, because the current design does not represent schema changes as ‘real’ transactions). Let’s look at the serialization ordering:
We have a cycle, so serializability is broken and the three transactions cannot all commit.
The notion of backward compatibility is about the cases where we can pretend that no schema update has happened (because whether the transaction runs on the old schema I or the new schema C does not affect the transaction’s effects). For example, if the schema change is the addition of a nullable column, then, when T1 reads the row from T2, the row gets automatically downgraded to I, so T1 has no chance of noticing that anything has changed; such a change could be called backward-compatible, as it would allow us to pretend that no schema change has happened, so there is no Ts, serializability is not broken, and T1 can still commit.
Currently, the described scenario still seems suspicious, so, for the sake of safety, we always forbid T1 from committing (that is, in our terms, any schema change is treated as backward-incompatible).
Note that, when determining forward compatibility, only the ‘type’ of the change is considered, not the actual writes made by the transaction (to avoid scanning all the transaction’s writes to the table during the commit prepare phase). In contrast, if we supported backward compatibility, we would check it for each read, so the rules could consider not just the type of the schema change but also the tuple’s contents.
(Note: there is a table at the end of the document summarizing the table-related rules.)
When a table is dropped, it does not make sense to pretend that it is still there until commit and only abort the transaction at commit time. Each RW transaction operation has a timestamp associated with it (this timestamp moves the corresponding partition forward). Before an operation is executed, the schema must be obtained for the operation timestamp; if the table does not exist in this schema, the transaction should be aborted and the operation should fail.
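A minimal sketch of this pre-operation check, assuming a hypothetical Catalog lookup and an abort callback (the names are illustrative, not the actual API):

```java
import java.util.UUID;

final class DroppedTableCheck {
    /** Hypothetical catalog lookup used only in this sketch. */
    interface Catalog {
        boolean tableExistsAt(UUID tableId, long timestamp);
    }

    /**
     * Before an RW operation executes, the schema is resolved at the operation timestamp.
     * If the table no longer exists at that point, the operation fails and the transaction
     * is aborted right away, instead of pretending the table is still there until commit.
     */
    static void ensureTableExists(Catalog catalog, UUID tableId, long opTs, Runnable abortTx) {
        if (!catalog.tableExistsAt(tableId, opTs)) {
            abortTx.run();
            throw new IllegalStateException("Table " + tableId + " does not exist at " + opTs);
        }
    }
}
```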
Columns are identified by their IDs, not names, for schema compatibility calculations. This means that, if a column X exists, then gets dropped, and then a column named X gets added again, the second X has a different ID, so it is a different column from the original X.
Always forward-compatible (because the value written in a narrow type in the old schema can always be read in a wide type in the new schema).
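For instance, widening INT to BIGINT is lossless in this sense; the tiny Java illustration below (not tied to any actual API) shows a value stored under the narrow type reading back unchanged under the wide type.

```java
final class TypeWidening {
    public static void main(String[] args) {
        int writtenUnderOldSchema = 42;                   // column was INT when the value was written
        long readUnderNewSchema = writtenUnderOldSchema;  // column read as BIGINT after the ALTER
        System.out.println(readUnderNewSchema);           // 42: no loss, no reinterpretation
    }
}
```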
Dropping any constraint is forward-compatible.
Creation of constraints is planned to be defined in another IEP.
ALTER TABLE syntax allows changing column type, nullability and default value all at once.
When validating any table schema change, the rules above are applied in order; every rule that matches the change triggers the corresponding compatibility validation. As a result, a complex change (like INT NOT NULL -> LONG DEFAULT 42) will trigger several rules, and all of them will be applied, in order.
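A possible way to express ‘apply all matching rules, in order’ is sketched below in Java; Rule and ColumnChange are hypothetical types introduced only for this illustration.

```java
import java.util.List;
import java.util.function.Predicate;

final class ColumnChangeRules {
    /** Hypothetical description of a single column change, reduced to a few flags. */
    record ColumnChange(boolean typeChanged, boolean typeWidened,
                        boolean nullabilityRelaxed, boolean defaultChanged) {}

    /** A rule applies only to the changes it matches and then gives a compatibility verdict. */
    record Rule(Predicate<ColumnChange> matches, Predicate<ColumnChange> forwardCompatible) {}

    /**
     * All matching rules are applied in order; the change is forward-compatible only if
     * every matching rule accepts it. A complex change like INT NOT NULL -> LONG DEFAULT 42
     * changes the type, the nullability and the default at once, so it matches several rules.
     */
    static boolean isForwardCompatible(List<Rule> rules, ColumnChange change) {
        return rules.stream()
                .filter(rule -> rule.matches().test(change))
                .allMatch(rule -> rule.forwardCompatible().test(change));
    }
}
```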
This is out of scope of this design and planned for another IEP.
CREATE/DROP/RENAME are analogous to the corresponding table changes.
Creation of a distribution zone precedes creation of a table, so no special handling is required.
Drop of a distribution zone follows drop of a table, so, again, no special handling is required.
Alteration of a distribution zone that does not change the partition count does not affect the tuples already written, so no special handling is needed here either. Changing the partition count is not supported for now.
The design has a quirk. Namely, the following scenario is possible:
The validation will fail because the schema version corresponding to opTs is different from the one corresponding to tableEnlistTs, even though, logically, tableEnlistTs corresponds to the same operation.
This seems to be acceptable as:
One of the basic requirements sounds like this:
If an ALTER TABLE happens after a transaction was started and only later (after the DDL) the table gets enlisted in the transaction, the transaction should not be aborted due to the schema change.
This forces us to maintain tableEnlistTs for each enlisted table in each transaction both on the coordinator and primaries; this complicates the code.
A possible simplification is to give up this requirement and simply abort a transaction that tries to enlist a table that was ALTERed after the transaction started. This would allow using max(beginTs, tableCreationTs) as baseTs instead of tableEnlistTs; both beginTs and tableCreationTs are readily available.
The problem here is that a table might be renamed. If we resolve a table by the same name ‘A’ a minute earlier and a minute later, we might get different tables. To remain consistent with how we work with tables within the same transaction, we still have to track either tableEnlistTs or a mapping from table names to IDs on the coordinator. On primaries, this is not needed (they are chosen by table ID).
The suggestion is to start the implementation without the mentioned requirement and only support it later.
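For reference, the simplified base-timestamp selection discussed above boils down to the following sketch; the names are taken from the text and the timestamps are simplified to plain longs.

```java
final class BaseTsSimplification {
    /**
     * With the requirement dropped, the base timestamp used for schema compatibility
     * validation no longer has to be the per-table tableEnlistTs; both inputs below
     * are readily available on the coordinator and on primaries.
     */
    static long baseTs(long beginTs, long tableCreationTs) {
        return Math.max(beginTs, tableCreationTs);
    }
}
```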
The following possibilities were evaluated, but postponed for a possible later implementation:
N/A
N/A