Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The batch job will end. In order to guarantee atomicity, we usually write the results in a temporary directory.

I investigated other engine implementations:

Hive(MR) :atomic

Hive MR is client mode, the client is responsible for parsing, compiling, optimizing, executing, and finally cleaning up.

Therefore, when Hive executes the CTAS command, it can execute Query first, and write the Query result to the temporary directory.

After all MR tasks are executed successfully, create a Table and load the data.

If the execution fails, the table will not be created.

Spark(v1) : non-atomic

There is a role called driver in Spark, 





After the job is successfully executed, move the data to the official directory and create a table, like hive/spark.

...

we finally need to perform the operations of moving data and creating tables in JM.

Support in Table API

...

The executeSql method will be reused

Code Block
languagejava
titleTableEnvironment
collapsetrue
    /**
     * Executes the given single statement and returns the execution result.
     *
     * <p>The statement can be DDL/DML/DQL/SHOW/DESCRIBE/EXPLAIN/USE. For DML and DQL, this method
     * returns {@link TableResult} once the job has been submitted. For DDL and DCL statements,
     * {@link TableResult} is returned once the operation has finished.
     *
     * <p>If multiple pipelines should insert data into one or more sink tables as part of a single
     * execution, use a {@link StatementSet} (see {@link TableEnvironment#createStatementSet()}).
     *
     * <p>By default, all DML operations are executed asynchronously. Use {@link
     * TableResult#await()} or {@link TableResult#getJobClient()} to monitor the execution. Set
     * {@link TableConfigOptions#TABLE_DML_SYNC} for always synchronous execution.
     *
     * @return content for DQL/SHOW/DESCRIBE/EXPLAIN, the affected row count for `DML` (-1 means
     *     unknown), or a string message ("OK") for other statements.
     */
    TableResult executeSql(String statement);


Compatibility, Deprecation, and Migration Plan

...