...

These two steps seem normal, but when a table has many fields, writing out the DDL statement is tedious and error-prone, and the same columns must then be written out again in the subsequent INSERT statement. We can therefore support CTAS (CREATE TABLE AS SELECT), as MySQL, Oracle, Microsoft SQL Server, Hive, Spark, and others do, which would be more user-friendly. In addition, the Hive dialect already has some support for CTAS. My suggestion would be to support a variation of the optional Feature T172, “AS subquery clause in table definition”, of the SQL standard.
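To make the difference concrete, here is a minimal sketch that contrasts the two-step approach with a single CTAS statement. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in engine, because it runs anywhere and supports CREATE TABLE ... AS SELECT. It is not Flink SQL, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine (assumption: not Flink).
conn = sqlite3.connect(":memory:")

# Hypothetical source table with a few rows.
conn.execute("CREATE TABLE source_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 25.5), (3, "carol", 7.25)],
)

# Two-step approach: spell out every column in the DDL, then again in the INSERT.
conn.execute(
    "CREATE TABLE big_orders_two_step (id INTEGER, customer TEXT, amount REAL)"
)
conn.execute(
    "INSERT INTO big_orders_two_step (id, customer, amount) "
    "SELECT id, customer, amount FROM source_orders WHERE amount >= 10"
)

# CTAS: the schema is derived from the query, so no column list is repeated.
conn.execute(
    "CREATE TABLE big_orders_ctas AS "
    "SELECT id, customer, amount FROM source_orders WHERE amount >= 10"
)

query = "SELECT id, customer, amount FROM {} ORDER BY id"
two_step_rows = conn.execute(query.format("big_orders_two_step")).fetchall()
ctas_rows = conn.execute(query.format("big_orders_ctas")).fetchall()
```

Both target tables end up with the same rows. The CTAS form simply avoids maintaining the column list in two places.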

Based on the research summary and analysis in the appendix, the current status of CREATE TABLE AS SELECT (CTAS) in the big data field is:

...

  • Flink: the Hive dialect already supports CTAS but does not guarantee atomicity (cannot roll back). ==> LEVEL-1
  • Spark DataSource v1: atomic (can roll back), but not isolated. ==> LEVEL-2
  • Spark DataSource v2: guarantees atomicity and isolation. ==> LEVEL-3
  • Hive MR: guarantees atomicity and isolation. ==> LEVEL-3
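The difference between a CTAS that cannot roll back and one that can may be illustrated with a small sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in sink and simulates a job failure partway through the write. The function names and the failure point are hypothetical, and this is not how Flink, Spark, or Hive implement CTAS.

```python
import sqlite3


def failing_ctas(conn, atomic):
    """Copy `source` into a new `target` table, failing before the third row."""
    rows = conn.execute("SELECT id FROM source ORDER BY id").fetchall()
    if atomic:
        # Group table creation and all writes into one transaction.
        conn.execute("BEGIN")
    try:
        conn.execute("CREATE TABLE target (id INTEGER)")
        for index, (row_id,) in enumerate(rows):
            if index == 2:
                raise RuntimeError("simulated job failure")
            conn.execute("INSERT INTO target VALUES (?)", (row_id,))
        if atomic:
            conn.execute("COMMIT")
    except RuntimeError:
        if atomic:
            # Atomic variant: undo the table creation and the partial writes.
            conn.execute("ROLLBACK")
        # Non-atomic variant: nothing is cleaned up.


def table_exists(conn, name):
    found = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()[0]
    return found == 1


def demo(atomic):
    """Return the row count left in `target`, or None if the table is gone."""
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions exist only where BEGIN is issued explicitly above.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE source (id INTEGER)")
    conn.executemany("INSERT INTO source VALUES (?)", [(1,), (2,), (3,)])
    failing_ctas(conn, atomic)
    if not table_exists(conn, "target"):
        return None
    return conn.execute("SELECT count(*) FROM target").fetchone()[0]
```

In this sketch, `demo(atomic=False)` leaves a half-written `target` table behind after the failure, while `demo(atomic=True)` leaves no `target` table at all.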

Hive SQL and Spark SQL are mainly used in offline (batch mode) scenarios, whereas Flink SQL is suitable for both real-time (streaming mode) and offline (batch mode) scenarios. In a real-time scenario, we assume that the job runs continuously without stopping and that data is written and becomes visible in real time, so there is no need to provide atomicity.

To keep Flink SQL semantically consistent between streaming mode and batch mode, and considering both the current state of Flink and our business needs, we choose LEVEL-1 as the default behavior for Flink streaming and batch mode. If the user requires LEVEL-2 atomicity, it can be achieved by enabling the table.cor-atomicity.enabled option. In general, batch mode usually requires LEVEL-2 atomicity. In a nutshell, Flink provides two levels of atomicity guarantee, with LEVEL-1 as the default behavior.

Public API Changes

Syntax

We propose the CREATE TABLE AS SELECT (CTAS) clause as follows:

...