ID | IEP-108 |
Author | |
Sponsor | |
Created | |
Status | DRAFT |
This proposal introduces a new feature that allows users to change a column's type in a database table. It lets users adapt their database schema to evolving business requirements without extensive downtime or manual actions.
The column type change feature is supported by most vendors; PostgreSQL, Oracle, MS SQL Server, and MySQL are among them. In old-school databases, changing a column type takes the table offline for some period, whose duration depends on the operation's complexity. It may be quite short, e.g. for NOT NULL -> NULLABLE, which requires no data validation or conversion, or long, e.g. for NULLABLE -> NOT NULL, which requires validation of the existing data.
By leveraging versioned schema functionality, we can change a column type without taking the table offline.
The main limitation here is that the underlying storage must support versioned schemas. To understand how a new schema is applied and which schema version is used in a transaction, please refer to IEP-98 Schema synchronization design.
In the general case, changing a column type poses the following challenges.
Some operations, which relax constraints or switch the column type to a type with a wider value range, don't require data validation. In that case, we can skip the validation phase.
For other operations we can start a background validation process and validate ongoing transactions to ensure the column change operation is applicable.
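As an illustration, the widening rule above can be captured by a small lookup table. This is only a sketch: the type names and the `requiresValidation` helper below are hypothetical, not part of any existing API.

```java
import java.util.Map;
import java.util.Set;

public class TypeChangeSketch {
    // Widening transitions (the target type covers the source's whole value
    // range) can skip the validation phase; everything else is assumed to
    // need a background validation pass.
    private static final Map<String, Set<String>> WIDENING = Map.of(
            "INT8", Set.of("INT16", "INT32", "INT64"),
            "INT16", Set.of("INT32", "INT64"),
            "INT32", Set.of("INT64"),
            "FLOAT", Set.of("DOUBLE"));

    static boolean requiresValidation(String from, String to) {
        return !WIDENING.getOrDefault(from, Set.of()).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(requiresValidation("INT8", "INT64"));  // false
        System.out.println(requiresValidation("INT64", "INT32")); // true
    }
}
```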
<TBD: describe background validation details, limitation and guarantees here>
Leveraging versioned schema functionality allows us to postpone row upgrading. The schema history can hold all the information required for upgrading a row to a new schema version.
Thus, any valid data type change is always forward compatible. The backward conversion may be impossible in some cases, so backward compatibility cannot be guaranteed.
A row is stored in the schema that was actual at the time the row was created. Thus, any further transaction can upgrade the row using the schema history, and any time-travelling transaction can read the row using the correct schema.
This also introduces a limitation: a row can't be replaced with its newer version while the row is visible to any time-travelling transaction.
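The upgrade-on-read mechanism described above can be sketched roughly as follows. `SchemaHistorySketch` and its conversion list are illustrative names under the assumption that each schema change records a value conversion; they are not the actual storage API.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class SchemaHistorySketch {
    // conversions.get(i) upgrades a value from schema version i to version i + 1.
    // A row written under rowVersion is brought up to targetVersion by replaying
    // every conversion recorded in the schema history between the two versions.
    static Object upgrade(Object value, int rowVersion, int targetVersion,
                          List<UnaryOperator<Object>> conversions) {
        for (int v = rowVersion; v < targetVersion; v++) {
            value = conversions.get(v).apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Version 0 -> 1: an INT32 -> INT64 widening recorded in the history.
        List<UnaryOperator<Object>> history =
                List.of(v -> ((Number) v).longValue());

        System.out.println(upgrade(7, 0, 1, history)); // 7 (now a Long)
    }
}
```

A time-travelling transaction reading at version 0 simply replays no conversions, so it sees the row in its original schema.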
Some operations, like changing FLOAT -> DOUBLE, may produce different results, because these types are not exact. This is acceptable, but may be non-obvious and should be well documented.
Float: 3.6
Double: 3.5999999046325684
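The values above can be reproduced in plain Java: widening a float to double keeps exactly the same (inexact) binary value, but printing it exposes more digits of the representation.

```java
public class FloatWidening {
    public static void main(String[] args) {
        float f = 3.6f;   // nearest binary float to 3.6
        double d = f;     // implicit widening: same value, no conversion error

        System.out.println(f); // 3.6
        System.out.println(d); // 3.5999999046325684
    }
}
```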
<TBD: describe operation limitation, when data conversion is required or impossible. E.g. changing type for indexed column>
For starters, let's focus on the operations listed below:
INT8 -> INT16 -> INT32 -> INT64
FLOAT -> DOUBLE
DECIMAL(10,5) -> DECIMAL(15, 5) // increasing precision only
VARCHAR(10) -> VARCHAR(20)
VARBINARY(10) -> VARBINARY(20)
NOT NULL -> NULLABLE
These operations don't require validation of the existing data beforehand.
The current BinaryTuple implementation stores integer numbers in a compact format and does implicit conversion between these types transparently. So, no actual conversion is required for INT8 -> INT16 -> INT32 -> INT64, or for FLOAT -> DOUBLE.
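The reason no physical rewrite is needed can be illustrated in plain Java (the BinaryTuple internals are out of scope here): widening integer conversions are always value-preserving.

```java
public class IntegerWidening {
    public static void main(String[] args) {
        byte asInt8 = (byte) 42;  // value originally written as INT8
        short asInt16 = asInt8;   // each step is an implicit widening
        int asInt32 = asInt16;    // conversion that can never lose
        long asInt64 = asInt32;   // or distort the stored value

        System.out.println(asInt64); // 42
    }
}
```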
Changing max length, precision and nullability doesn't require any data conversion.
In general, all these operations are not backward compatible. For starters, we can assume that a column type change operation breaks backward compatibility.
Further, we can apply more sophisticated techniques to detect whether compatibility was really broken or not, and try to verify values against the previous schema, e.g. check whether a value of INT64 type fits into the INT32 type.
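For example, after an INT32 -> INT64 change, backward compatibility could be verified per value with a simple range check. `fitsInt32` below is a hypothetical helper, not an existing API.

```java
public class BackwardCompatCheck {
    // After INT32 -> INT64, a stored value is still readable under the
    // previous schema version only if it fits the narrower type's range.
    static boolean fitsInt32(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInt32(1_000L));          // true
        System.out.println(fitsInt32(10_000_000_000L)); // false
    }
}
```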
However, FLOAT -> DOUBLE is always backward incompatible, because casting double to float leads to accuracy loss, and such a cast is forbidden by the BinaryTuple design.
Changing the data type of an indexed column should be forbidden. Further, we could relax this limitation for cases when the operation is safe.
The most popular vendors' dialects:
MySQL: ALTER TABLE <table_name> MODIFY COLUMN <col_name> <new_type>;
PostgreSQL: ALTER TABLE <table_name> ALTER COLUMN <col_name> TYPE <new_type> [USING <expression>];
MS SQL Server: ALTER TABLE <table_name> ALTER COLUMN <col_name> <new_type>;
These examples differ from the DDL syntax described in the SQL:2011 standard. The PostgreSQL syntax is the closest to the standard, and extends it with the 'USING' clause.
The original DDL syntax assumes a single action per statement and offers different actions for changing data type, column default value and column nullability.
ALTER TABLE <table_name> ALTER COLUMN <col_name> [SET DATA TYPE <new_type> | SET DEFAULT <new_default> | DROP DEFAULT | SET NOT NULL | DROP NOT NULL];
* Dropping a column default means setting the default to null.
Let’s support syntax compatible with the standard, and also allow changing all the properties within the same statement:
ALTER TABLE <table_name> ALTER COLUMN <col_name> SET [DATA TYPE <new_type>] [NOT NULL] [DEFAULT <default value>];
ALTER TABLE <table_name> ALTER COLUMN <col_name> DROP [NOT NULL] [DEFAULT];
Examples of the syntax usage:
ALTER TABLE tbl ALTER COLUMN mycolumn SET DATA TYPE LONG;
ALTER TABLE tbl ALTER COLUMN mycolumn SET DEFAULT 1;
ALTER TABLE tbl ALTER COLUMN mycolumn DROP DEFAULT;
ALTER TABLE tbl ALTER COLUMN mycolumn SET NOT NULL;
ALTER TABLE tbl ALTER COLUMN mycolumn DROP NOT NULL;
ALTER TABLE tbl ALTER COLUMN mycolumn SET DATA TYPE LONG NOT NULL DEFAULT -1;
// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.
// Links to discussions on the devlist, if applicable.
// Links to various reference documents, if applicable.
// Links or report with relevant JIRA tickets.