...

  • Flink: Flink dialect does not support CTAS. ==> LEVEL-1
  • Flink: Hive dialect already supports CTAS but does not guarantee atomicity (cannot roll back). ==> LEVEL-2
  • Spark DataSource v1: is atomic (can roll back), but is not isolated. ==> LEVEL-23
  • Spark DataSource v2: Guaranteed atomicity and isolation. ==> LEVEL-34
  • Hive MR: Guaranteed atomicity and isolation. ==> LEVEL-35
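
To make the difference between "can roll back" and "cannot roll back" in the list above concrete, here is a minimal sketch. It is a toy model, not Flink, Spark, or Hive code: a plain dict stands in for the catalog, and the names `ctas`, `run`, and `catalog` are hypothetical. It shows only that a CTAS without rollback leaves a partially filled table behind when the insert fails, while an atomic CTAS drops it.

```python
# Toy model only: a dict stands in for the catalog; nothing here is a Flink API.

def ctas(catalog, name, rows, atomic):
    """Create table `name` and fill it from `rows`; roll back on failure if `atomic`."""
    catalog[name] = []              # step 1: create the table
    try:
        for row in rows:            # step 2: insert the query result
            if row is None:
                raise ValueError("bad row")   # simulated job failure
            catalog[name].append(row)
    except ValueError:
        if atomic:
            del catalog[name]       # rollback: drop the table created in step 1
        raise


def run(atomic):
    """Run a CTAS that fails on its second row and return the resulting catalog."""
    catalog = {}
    try:
        ctas(catalog, "t", [1, None, 3], atomic)
    except ValueError:
        pass
    return catalog


non_atomic_result = run(atomic=False)   # table "t" remains, holding partial data
atomic_result = run(atomic=True)        # table "t" has been dropped again
```

Isolation, which this sketch does not model, would additionally hide the partially written rows from concurrent readers until the job finishes.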

Hive SQL and Spark SQL are mainly used in offline (batch mode) scenarios; Flink SQL is suitable for both real-time (streaming mode) and offline (batch mode) scenarios. In a real-time scenario, the job is expected to run continuously without stopping and the data is visible in real time, so we do not think it is necessary to provide atomicity there.

Given the current state of Flink and the needs of our business, we choose a LEVEL-2 atomicity implementation for Flink in batch execution mode. In streaming mode, however, we do not provide atomicity guarantees because the job is long-running. Moreover, at the moment there is no strong need to guarantee atomicity in streaming mode.

Syntax

We propose the CREATE TABLE AS SELECT (CTAS) clause as follows:

...