Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Hive SQL and Spark SQL are mainly used in offline(batch mode) scenarios; Flink SQL is suitable for both real-time(streaming mode) and offline(batch mode) scenarios. In a real-time scenario, we believe that the job is always running and does not stop, and the data is written in real time and visible in real time, so we do not think it is necessary to provide atomicity in a real-time scenario..

To ensure that Flink SQL is semantically consistent in Streaming mode and Batch mode, combining Combining the current situation of Flink and the needs of our business, choosing LEVEL-2 atomicity as the default behavior for Flink streaming and batch. If the user requires atomicity, allowing users to enable LEVEL-3 atomicity support atomicity with an option. 

Syntax

We proposing the CREATE TABLE AS SELECT(CTAS) clause as following:

...

Add configuration options to allow users to enable atomicity.

111111
// placeholder

Implementation Plan

We provide two semantics for Flink CTAS: Non-atomic and Atomic. Non-atomic implementations are the default behavior of Streaming and Batch modes. 

...

The non-atomic implementation is basically the same as the existing Insert data process, except that the sink table is first created on the Client side via Catalog before performing the insert.

Compile the SQL, parse the schema of the sink table based on the query, then create the table, and finally submit the job to write data to the sink table. No need for too much introduction.

...

The atomicity implementation of Flink CTAS requires two parts:

  •  Add Add the Enable atomicity support option.
  • Catalog can be serialized, ensuring atomicity by performing created/dropped table on the JM side.

...