ID | IEP-108 |
Author | |
Sponsor | |
Created | |
Status | DRAFT |
This proposal introduces a new feature that allows users to change a column's type in a database table. It lets users adapt their database schema to evolving business requirements without extensive downtime or manual actions.
The column type change feature is supported by most vendors; PostgreSQL, Oracle, MS SQL Server, and MySQL are among them. In old-school databases, changing a column type takes the table offline for some period, whose duration depends on the operation's complexity. It may be quite short, e.g. for NOT NULL -> NULLABLE, which requires no data validation or conversion, or long, e.g. for NULLABLE -> NOT NULL, which requires validation of the existing data.
By leveraging versioned schema functionality, we can change a column type without taking the table offline.
The main limitation here is that the underlying storage must support versioned schemas. To understand how a new schema is applied and which schema version is used in a transaction, please refer to IEP-98 Schema synchronization design.
In the general case, changing a column type poses the following challenges.
Some operations, which relax constraints or switch the column type to a type with a wider value range, don't require data validation. In that case, we can skip the validation phase.
For other operations we can start a background validation process and validate ongoing transactions to ensure the column change operation is applicable.
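As an illustration, the widening rule above can be captured by a small lookup table. This is only a sketch: the type names and the `requiresValidation` helper below are hypothetical, not part of any existing API.

```java
import java.util.Map;
import java.util.Set;

public class TypeChangeSketch {
    // Widening transitions (the target type covers the source's whole value
    // range) can skip the validation phase; everything else is assumed to
    // need a background validation pass.
    private static final Map<String, Set<String>> WIDENING = Map.of(
            "INT8", Set.of("INT16", "INT32", "INT64"),
            "INT16", Set.of("INT32", "INT64"),
            "INT32", Set.of("INT64"),
            "FLOAT", Set.of("DOUBLE"));

    static boolean requiresValidation(String from, String to) {
        return !WIDENING.getOrDefault(from, Set.of()).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(requiresValidation("INT8", "INT64"));  // false
        System.out.println(requiresValidation("INT64", "INT32")); // true
    }
}
```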
<TBD: describe background validation details, limitation and guarantees here>
Leveraging versioned schema functionality allows us to postpone row upgrading. The schema history can hold all the information required for upgrading a row to a new schema version.
Thus, any valid data type change is always forward compatible. The backward conversion may be impossible in some cases, so backward compatibility cannot be guaranteed.
A row is stored in the schema that was actual at the time the row was created. Thus, any further transaction can upgrade the row using the schema history, and any time-travelling transaction can read the row using the correct schema.
This also introduces a limitation: a row can't be replaced with its newer version while the row is visible to any time-travelling transaction.
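The upgrade-on-read mechanism described above can be sketched roughly as follows. `SchemaHistorySketch` and its conversion list are illustrative names under the assumption that each schema change records a value conversion; they are not the actual storage API.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class SchemaHistorySketch {
    // conversions.get(i) upgrades a value from schema version i to version i + 1.
    // A row written under rowVersion is brought up to targetVersion by replaying
    // every conversion recorded in the schema history between the two versions.
    static Object upgrade(Object value, int rowVersion, int targetVersion,
                          List<UnaryOperator<Object>> conversions) {
        for (int v = rowVersion; v < targetVersion; v++) {
            value = conversions.get(v).apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Version 0 -> 1: an INT32 -> INT64 widening recorded in the history.
        List<UnaryOperator<Object>> history =
                List.of(v -> ((Number) v).longValue());

        System.out.println(upgrade(7, 0, 1, history)); // 7 (now a Long)
    }
}
```

A time-travelling transaction reading at version 0 simply replays no conversions, so it sees the row in its original schema.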
Some operations, like changing FLOAT -> DOUBLE, may produce different results, because these types are not exact. This is acceptable, but may be non-obvious and should be well documented.
Float: 3.6
Double: 3.5999999046325684
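The values above can be reproduced in plain Java: widening a float to double keeps exactly the same (inexact) binary value, but printing it exposes more digits of the representation.

```java
public class FloatWidening {
    public static void main(String[] args) {
        float f = 3.6f;   // nearest binary float to 3.6
        double d = f;     // implicit widening: same value, no conversion error

        System.out.println(f); // 3.6
        System.out.println(d); // 3.5999999046325684
    }
}
```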
<TBD: describe operation limitation, when data conversion is required or impossible. E.g. changing type for indexed column>
For starters, let's focus on the operations listed below:
INT8 -> INT16 -> INT32 -> INT64
FLOAT -> DOUBLE
DECIMAL(10,5) -> DECIMAL(15, 5) // increasing precision only
VARCHAR(10) -> VARCHAR(20)
VARBINARY(10) -> VARBINARY(20)
NOT NULL -> NULLABLE
These operations don't require validation of the existing data beforehand.
The current BinaryTuple implementation stores integer numbers in a compact format and does implicit conversion between these types transparently. So, no actual conversion is required for INT8 -> INT16 -> INT32 -> INT64, or for FLOAT -> DOUBLE.
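The reason no physical rewrite is needed can be illustrated in plain Java (the BinaryTuple internals are out of scope here): widening integer conversions are always value-preserving.

```java
public class IntegerWidening {
    public static void main(String[] args) {
        byte asInt8 = (byte) 42;  // value originally written as INT8
        short asInt16 = asInt8;   // each step is an implicit widening
        int asInt32 = asInt16;    // conversion that can never lose
        long asInt64 = asInt32;   // or distort the stored value

        System.out.println(asInt64); // 42
    }
}
```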
Changing max length, precision and nullability doesn't require any data conversion.
In general, all these operations are not backward compatible. For starters, we can assume that a column type change operation breaks backward compatibility.
Further, we can apply more sophisticated techniques to detect whether compatibility was really broken or not, and try to verify values against the previous schema, e.g. check whether a value of INT64 type fits into the INT32 type.
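For example, after an INT32 -> INT64 change, backward compatibility could be verified per value with a simple range check. `fitsInt32` below is a hypothetical helper, not an existing API.

```java
public class BackwardCompatCheck {
    // After INT32 -> INT64, a stored value is still readable under the
    // previous schema version only if it fits the narrower type's range.
    static boolean fitsInt32(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInt32(1_000L));          // true
        System.out.println(fitsInt32(10_000_000_000L)); // false
    }
}
```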
However, FLOAT -> DOUBLE is always backward incompatible, because casting double to float leads to accuracy loss, and such a cast is forbidden by the BinaryTuple design.
Changing the data type of an indexed column should be forbidden. Further, we could relax this limitation for cases when the operation is safe.
The most popular vendors' dialects:
MySQL: ALTER TABLE <table_name> MODIFY COLUMN <col_name> <new_type>;
PostgreSQL: ALTER TABLE <table_name> ALTER COLUMN <col_name> TYPE <new_type> [USING <expression>];
MS SQL Server: ALTER TABLE <table_name> ALTER COLUMN <col_name> <new_type>;
These examples differ from the DDL syntax described in the SQL:2011 standard. The PostgreSQL syntax is the closest to the standard, and extends it with the 'USING' clause.
The original DDL syntax assumes a single action per statement and offers different actions for changing data type, column default value and column nullability.
ALTER TABLE <table_name> ALTER COLUMN <col_name> [SET DATA TYPE <new_type> | SET DEFAULT <new_default> | DROP DEFAULT | SET NOT NULL | DROP NOT NULL];
* Dropping a column default means setting the default to null.
Let’s support syntax compatible with the standard, and also allow changing all the properties within the same statement:
ALTER TABLE <table_name> ALTER COLUMN <col_name> SET [DATA TYPE <new_type>] [NOT NULL] [DEFAULT <default value>];
ALTER TABLE <table_name> ALTER COLUMN <col_name> DROP [NOT NULL] [DEFAULT];
Examples of the syntax usage:
ALTER TABLE tbl ALTER COLUMN mycolumn SET DATA TYPE LONG;
ALTER TABLE tbl ALTER COLUMN mycolumn SET DEFAULT 1;
ALTER TABLE tbl ALTER COLUMN mycolumn DROP DEFAULT;
ALTER TABLE tbl ALTER COLUMN mycolumn SET NOT NULL;
ALTER TABLE tbl ALTER COLUMN mycolumn DROP NOT NULL;
ALTER TABLE tbl ALTER COLUMN mycolumn SET DATA TYPE LONG NOT NULL DEFAULT -1;
// Describe project risks, such as API or binary compatibility issues, major protocol changes, etc.
// Links to discussions on the devlist, if applicable.
// Links to various reference documents, if applicable.
// Links or report with relevant JIRA tickets.