Status

Current state: Under Discussion

Discussion thread: https://mail-archives.apache.org/mod_mbox/flink-dev/

JIRA:

Released: 1.16

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The current syntax and features of Flink SQL are already quite complete in both streaming and batch mode.

However, there is still room to improve usability.

For example, if the user wants to insert data into a new table, two steps are required:

First, write the DDL statement for the table, named t1;

Second, insert the data into t1.

These two steps seem normal, but if the table has many fields, writing the DDL statement is tedious,

and the same columns have to be spelled out again in the subsequent INSERT statement.

Therefore, we propose supporting CTAS (CREATE TABLE AS SELECT), as MySQL, Oracle, Microsoft SQL Server, Hive, Spark, etc. already do.

This is more user friendly. In addition, the Hive dialect already has some support for CTAS.

Proposed Changes

First of all, to be clear: the table created by a CTAS command must be created through a catalog.

Syntax

I suggest introducing a CTAS clause with the following syntax:

CREATE TABLE [ IF NOT EXISTS ] table_name 
[ WITH ( table_properties ) ]
[ AS query_expression ]

Example:

CREATE TABLE ctas_hudi
 WITH ('connector' = 'hudi')
 AS SELECT id, name, age FROM hive_catalog.`default`.test WHERE mod(id, 10) = 0;

The resulting table is equivalent to:

CREATE TABLE ctas_hudi
 (
 	id BIGINT,
 	name STRING,
 	age INT
 )
 WITH ('connector' = 'hudi');

INSERT INTO ctas_hudi SELECT id, name, age FROM hive_catalog.`default`.test WHERE mod(id, 10) = 0;
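As a rough illustration of this equivalence, the following sketch spells out the DDL and the INSERT statement from the query's result schema. `QueryColumn` and `CtasRewriter` are hypothetical; the real planner works on resolved schemas and operations rather than on SQL strings, so this only shows what information CTAS derives and what it copies verbatim.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** One column of the query result schema (hypothetical stand-in for the planner's resolved schema). */
final class QueryColumn {
    final String name;
    final String type;

    QueryColumn(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

/** Hypothetical helper that spells out the DDL + INSERT pair a CTAS statement is equivalent to. */
final class CtasRewriter {

    /** CREATE TABLE with columns taken from the query schema and options copied from WITH (...). */
    static String toCreateTable(String table, List<QueryColumn> schema, Map<String, String> options) {
        String columns = schema.stream()
                .map(c -> c.name + " " + c.type)
                .collect(Collectors.joining(", "));
        // Options keep the map's iteration order; pass a LinkedHashMap for stable output.
        String with = options.entrySet().stream()
                .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(", "));
        return "CREATE TABLE " + table + " (" + columns + ") WITH (" + with + ")";
    }

    /** INSERT INTO the new table, reusing the CTAS query text unchanged. */
    static String toInsert(String table, String query) {
        return "INSERT INTO " + table + " " + query;
    }
}
```

The user only writes the query and the options; the column list, which is the error-prone part, comes from the query itself.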

Program research

I investigated how other big data engines, such as Hive and Spark, implement CTAS:

Hive (MR): atomic

Hive on MR runs in client mode: the client is responsible for parsing, compiling, optimizing, executing, and finally cleaning up.

Hive executes the CTAS command as follows:

Execute the query first and write the query result to a temporary directory.
If all MR tasks succeed, create the table and load the data.
If the execution fails, the table will not be created.
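A toy model of the Hive ordering described above, assuming a map stands in for the Metastore plus warehouse data and a thrown exception stands in for a failed MR task. `HiveStyleCtas` is hypothetical; the point is that the table is only registered after the query has succeeded, so a failure leaves nothing to roll back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of Hive (MR) CTAS: the table only appears after the query succeeded. */
final class HiveStyleCtas {
    /** Table name -> rows; stands in for the Hive Metastore plus warehouse data. */
    final Map<String, List<String>> catalog = new HashMap<>();

    /**
     * @param query produces the result rows; throwing an exception models a failed MR job
     * @return true if the table was created and loaded
     */
    boolean execute(String table, Supplier<List<String>> query) {
        List<String> tempDir;
        try {
            // 1. Run the query and write the result to a temporary directory.
            tempDir = new ArrayList<>(query.get());
        } catch (RuntimeException jobFailed) {
            // 3. On failure nothing was registered, so the table simply does not exist.
            return false;
        }
        // 2. All tasks succeeded: create the table and load the data from the temporary directory.
        catalog.put(table, tempDir);
        return true;
    }
}
```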

Spark (DataSource v1): non-atomic

Spark has a role called the driver, which is responsible for compiling tasks, requesting resources, scheduling task execution, tracking task progress, etc.

Spark executes CTAS as follows:

Create a sink table based on the schema of the query result.
Execute the Spark tasks and write the result to a temporary directory.
If all Spark tasks are executed successfully, use the Hive API to load data into the sink table created in the first step.
If the execution fails, the driver drops the sink table created in the first step.

Spark (DataSource v2, not yet complete; the Hive catalog is not supported yet): optionally atomic

Non-atomic

The non-atomic implementation follows the same logic as DataSource v1. For details, see CreateTableAsSelectExec.

Atomic

The atomic implementation (for details, see AtomicCreateTableAsSelectExec) is supported by StagingTableCatalog and StagedTable.

StagedTable supports commit and abort.

StagingTableCatalog is in memory; CTAS is executed as follows:

Create a StagedTable based on the schema of the query result; it is not yet visible in the catalog.
Execute the Spark tasks and write the result into the StagedTable.
If all Spark tasks are executed successfully, call StagedTable#commitStagedChanges(); the table then becomes visible in the catalog.
If the execution fails, call StagedTable#abortStagedChanges().
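A toy model of the staging flow above. `ToyStagingCatalog` and `ToyStagedTable` are hypothetical simplifications that only reuse the method names from the text; they are not Spark's actual `StagingTableCatalog`/`StagedTable` interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy staging catalog: staged tables are invisible until committed. */
final class ToyStagingCatalog {
    /** Only committed tables live here; this is what catalog lookups see. */
    private final Map<String, List<String>> visibleTables = new HashMap<>();

    /** A table that buffers writes and only becomes visible on commit. */
    final class ToyStagedTable {
        private final String name;
        private final List<String> stagedRows = new ArrayList<>();

        private ToyStagedTable(String name) {
            this.name = name;
        }

        void write(String row) {
            stagedRows.add(row);
        }

        /** Publish the table and its data to the catalog in one step. */
        void commitStagedChanges() {
            visibleTables.put(name, new ArrayList<>(stagedRows));
        }

        /** Discard everything; the catalog never saw the table. */
        void abortStagedChanges() {
            stagedRows.clear();
        }
    }

    /** Step 1: create the staged table from the query schema without registering it. */
    ToyStagedTable stageCreate(String name) {
        return new ToyStagedTable(name);
    }

    boolean tableExists(String name) {
        return visibleTables.containsKey(name);
    }
}
```

A CTAS driver would call `stageCreate`, route all task output through the staged table, and finally call exactly one of `commitStagedChanges()` or `abortStagedChanges()`.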

Implementation Plan

Supported Job Mode

Both streaming and batch mode are supported.

To guarantee atomicity where possible, the implementation details differ between the two modes.

Streaming

Since streaming jobs are long-running, the table needs to be created first.

Create the sink table in the catalog based on the schema of the query result.
Start the job and write the result to the sink table.

Batch

A batch job terminates, so to guarantee atomicity we write the results to a temporary directory first.

We will follow the Spark DataSource v1 implementation.

Steps:

Create the sink table in the catalog based on the schema of the query result.
Start the job and write the result to a temporary directory.
If the job succeeds, load the data into the sink table.
If the job fails, drop the sink table. (This capability requires support from the runtime module, such as a hook, and the SQL layer must pass the relevant parameters to the runtime module.)
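The planned control flow for both modes can be sketched as follows. `FlinkCtasFlow` is hypothetical, the job is modeled as a `Supplier` that returns all output rows at once (or throws to signal failure), and the drop-on-failure branch assumes the runtime hook mentioned above exists. Failure handling for streaming jobs is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of the planned CTAS flow for both runtime modes. */
final class FlinkCtasFlow {
    enum Mode { STREAMING, BATCH }

    /** Table name -> visible rows; stands in for the catalog plus the sink's storage. */
    final Map<String, List<String>> catalog = new HashMap<>();

    void run(Mode mode, String table, Supplier<List<String>> job) {
        // Both modes: create the sink table from the query's schema before the job starts.
        catalog.put(table, new ArrayList<>());
        if (mode == Mode.STREAMING) {
            // Long-running job: results go straight into the already created sink table.
            catalog.get(table).addAll(job.get());
            return;
        }
        List<String> tempDir;
        try {
            // Batch: the job writes to a temporary directory first.
            tempDir = job.get();
        } catch (RuntimeException failed) {
            // Job failed: drop the table created in the first step, then propagate the error.
            catalog.remove(table);
            throw failed;
        }
        // Job succeeded: load the temporary data into the sink table.
        catalog.get(table).addAll(tempDir);
    }
}
```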

Dropping the table when the job fails requires some additional support:

TableSink needs to provide a CleanUp API, which developers implement as needed; by default it does nothing. If an exception occurs, this API can be used to drop the table, delete the temporary directory, etc.
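The proposal does not fix the name or signature of the CleanUp API. A minimal sketch, assuming a hypothetical `SupportsCleanUp` interface whose `cleanUp()` is a no-op default method, and a hypothetical `CtasHook` standing in for the runtime hook that invokes it when the job fails:

```java
/** Hypothetical ability interface: a TableSink may implement it to undo a failed CTAS. */
interface SupportsCleanUp {
    /** Called when the CTAS job does not finish successfully. Does nothing by default. */
    default void cleanUp() {}
}

/** A sink that keeps the default behavior (nothing to undo). */
final class PlainSink implements SupportsCleanUp {}

/** Hypothetical runtime-side hook: run the job and, on failure, let the sink clean up. */
final class CtasHook {
    static void runWithCleanUp(SupportsCleanUp sink, Runnable job) {
        try {
            job.run();
        } catch (RuntimeException e) {
            // Sink-specific undo, e.g. drop the table or delete the temporary directory.
            sink.cleanUp();
            throw e;
        }
    }
}
```

A default method keeps existing sinks source-compatible: only sinks that actually need cleanup have to override it.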

Precautions

When the table needs to be dropped:

The user manually cancels the job.
The job ends in the FAILED status, for example after exceeding the maximum number of task failovers.

Dropping the table is tightly bound to the TableSink:

The framework does not drop tables itself. Dropping is implemented in each TableSink according to its specific needs, because the required operations differ from sink to sink.

For example, HiveTableSink needs to delete the temporary directory and drop the metadata in the Metastore, whereas FileSystemTableSink only needs to delete the temporary directory.

For some sinks, no operation is required at all.
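To illustrate this split of responsibilities (the framework decides when to clean up, each sink decides how), here is a sketch in which `JobState`, `SupportsCleanUp`, `CleanUpTrigger` and the `*Sketch` sinks are all hypothetical. The recorded actions are plain strings standing in for real file system and Metastore calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified terminal job states (stand-in for the runtime's job status). */
enum JobState { FINISHED, CANCELED, FAILED }

/** Hypothetical cleanup contract; no-op unless a sink overrides it. */
interface SupportsCleanUp {
    default void cleanUp() {}
}

/** Hive: remove the staged files and the table's metadata in the Metastore. */
final class HiveTableSinkSketch implements SupportsCleanUp {
    final List<String> actions = new ArrayList<>();

    @Override
    public void cleanUp() {
        actions.add("delete temporary directory");
        actions.add("drop table in Hive Metastore");
    }
}

/** Filesystem: only the staged files need to be removed. */
final class FileSystemTableSinkSketch implements SupportsCleanUp {
    final List<String> actions = new ArrayList<>();

    @Override
    public void cleanUp() {
        actions.add("delete temporary directory");
    }
}

/** A sink with nothing to undo keeps the no-op default. */
final class NoOpSinkSketch implements SupportsCleanUp {}

/** Framework side: decides only *when* to clean up, never *how*. */
final class CleanUpTrigger {
    static boolean needsCleanUp(JobState state) {
        // Cancelled by the user, or FAILED after exhausting the allowed failovers.
        return state == JobState.CANCELED || state == JobState.FAILED;
    }

    static void onJobTerminated(JobState state, SupportsCleanUp sink) {
        if (needsCleanUp(state)) {
            sink.cleanUp();
        }
    }
}
```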

Atomicity & Data Visibility

Atomicity

CTAS does not provide strict atomicity: the table is created first, and the final atomicity is determined by the TableSink's cleanUp implementation.

This requires runtime module support, as described for the batch mode implementation.

Data Visibility

Data visibility is determined by the TableSink and the runtime mode:

Stream mode:

If the external storage system supports transactions or two-phase commit, data visibility is tied to the checkpoint interval;

otherwise, data is visible in real time, which is consistent with current Flink behavior.
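The transactional case can be pictured with a toy sink that buffers writes and publishes them only when a checkpoint completes. `ToyTransactionalSink` and its methods are hypothetical; the sketch only borrows the two-phase idea (pre-commit when the checkpoint barrier arrives, commit when the checkpoint completes) and assumes at most one checkpoint in flight.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy transactional sink: rows become visible only when a checkpoint completes. */
final class ToyTransactionalSink {
    private final List<String> visible = new ArrayList<>();
    /** Rows written since the last checkpoint barrier (the open transaction). */
    private final List<String> pending = new ArrayList<>();
    /** Rows frozen by the last barrier, waiting for the checkpoint to complete. */
    private final List<String> preCommitted = new ArrayList<>();

    void write(String row) {
        pending.add(row);
    }

    /** Phase one: the checkpoint barrier arrives, so freeze the current transaction. */
    void onCheckpointBarrier() {
        preCommitted.addAll(pending);
        pending.clear();
    }

    /** Phase two: the checkpoint completed everywhere, so publish the frozen rows. */
    void onCheckpointComplete() {
        visible.addAll(preCommitted);
        preCommitted.clear();
    }

    /** What a downstream reader of the external system would see. */
    List<String> readVisible() {
        return List.copyOf(visible);
    }
}
```

Rows written after a barrier stay invisible until the next checkpoint completes, which is why visibility latency roughly follows the checkpoint interval.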

Batch mode:

Data is written to a temporary directory first and becomes visible only after the job finishes successfully.

Public API Changes

Table Environment

Provide a method for Table API users to execute CTAS.

@PublicEvolving
public interface TableEnvironment {

    /**
     * Registers the given {@link Table}'s result as a catalog table with {@link TableDescriptor}'s options.
     *
     * <p>CTAS for Table API.
     *
     * <p>Examples:
     *
     * <pre>{@code
     * Table query = tEnv.from("SourceTable").select($("id"), $("name"));
     * tEnv.createTable("MyTable", TableDescriptor.forConnector("hive").build(), query);
     * }</pre>
     *
     * @param path The path under which the table will be registered. See also the {@link
     *     TableEnvironment} class description for the format of the path.
     * @param descriptor Template for creating a {@link CatalogTable} instance.
     * @param query The {@link Table} whose schema defines the new table and whose result is
     *     inserted into it.
     */
    void createTable(String path, TableDescriptor descriptor, Table query);
}

Catalog

Catalogs in Flink can be divided into two types, in-memory catalogs and external catalogs:

In-memory catalog:

The metadata is a copy of the external system's metadata, and the user must ensure that the entity exists in the external system and that the metadata is consistent; otherwise, an exception is thrown at runtime. Since CTAS creates the table first, it is hard to ensure that the entity exists in the external system and that the metadata is consistent.
The user needs to configure the parameters of the external system through the WITH clause, since Flink cannot obtain them from the in-memory catalog.

For example, for a Kafka table the user must provide the Kafka server address, the topic name, and the data serialization format; otherwise, the Flink job will fail.

External catalog:

The metadata directly refers to the external system, so there is no consistency problem. Creating a table also directly calls the external system, which naturally guarantees that the entity exists there.
The WITH clause parameters are optional, since Flink can obtain the necessary parameters through the external catalog.

For example, for a Hive table, the table information required by the Flink engine can be obtained through HiveCatalog.

Both in-memory and external catalogs will support CTAS. If the CTAS command is executed in an in-memory catalog and the target storage does not exist in the external system, the Flink job will fail, which is consistent with current Flink behavior.

For in-memory catalogs, we should check the table's options to catch configuration parameters that the user did not set.

We provide a table creation method for CTAS that can check the table's options:

/**
 * Creates a new table for CTAS.
 *
 * <p>The framework will make sure to call this method with a fully validated {@link
 * CatalogTable}. Those instances are easy to serialize for a durable catalog implementation.
 *
 * @param tablePath path of the table to be created
 * @param table the table definition
 * @param dropIfExists flag to specify behavior when a table already exists at the
 *     given path: if set to false, a TableAlreadyExistException is thrown; if set to true,
 *     the existing table is dropped first and then the new table is created.
 * @param checkTableOptions flag to specify whether to check the options of {@code table}.
 * @throws TableAlreadyExistException if the table already exists and dropIfExists is false
 * @throws DatabaseNotExistException if the database in tablePath doesn't exist
 * @throws CatalogException in case of any runtime exception
 */
void createTable(ObjectPath tablePath, CatalogTable table, boolean dropIfExists,
                 boolean checkTableOptions)
        throws TableAlreadyExistException, DatabaseNotExistException, CatalogException;
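A minimal sketch of the semantics described by this Javadoc, assuming a toy in-memory catalog keyed by string paths in place of `ObjectPath`, an options map in place of `CatalogTable`, and standard exceptions in place of `TableAlreadyExistException`. `ToyInMemoryCatalog` is hypothetical, and the set of required Kafka options is an assumption derived from the Kafka example above (server address, topic, format), expressed with the Flink Kafka connector's `properties.bootstrap.servers`, `topic` and `format` keys.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory catalog illustrating createTable(..., dropIfExists, checkTableOptions). */
final class ToyInMemoryCatalog {
    /** Options a Kafka-backed table is assumed to need; other connectors would register their own. */
    static final List<String> REQUIRED_KAFKA_OPTIONS =
            List.of("topic", "properties.bootstrap.servers", "format");

    private final Map<String, Map<String, String>> tables = new HashMap<>();

    void createTable(String path, Map<String, String> options,
                     boolean dropIfExists, boolean checkTableOptions) {
        // Validate first, so an invalid CTAS never drops an existing table.
        if (checkTableOptions) {
            checkOptions(path, options);
        }
        if (tables.containsKey(path) && !dropIfExists) {
            // Stand-in for TableAlreadyExistException.
            throw new IllegalStateException("Table already exists: " + path);
        }
        // With dropIfExists = true an existing table is dropped, then recreated.
        tables.remove(path);
        tables.put(path, Map.copyOf(options));
    }

    boolean tableExists(String path) {
        return tables.containsKey(path);
    }

    /** Fails at planning time instead of when the job runs. */
    private static void checkOptions(String path, Map<String, String> options) {
        if (!options.containsKey("connector")) {
            throw new IllegalArgumentException("Missing option 'connector' for " + path);
        }
        if ("kafka".equals(options.get("connector"))) {
            for (String key : REQUIRED_KAFKA_OPTIONS) {
                if (!options.containsKey(key)) {
                    throw new IllegalArgumentException("Missing option '" + key + "' for " + path);
                }
            }
        }
    }
}
```

An external catalog such as HiveCatalog would typically pass `checkTableOptions = false`, since it can supply the missing parameters itself.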

Compatibility, Deprecation, and Migration Plan

It is a new feature with no implication for backwards compatibility.

Test Plan

Changes will be verified by unit tests.

Rejected Alternatives

N/A


FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
