...

  1. Inconsistent execution semantics for `sqlUpdate()`. Currently, a DDL statement passed to this method is executed immediately, while an `INSERT INTO` statement is only executed when the `execute()` method is called, which confuses users a lot.
  2. No support for obtaining the result of SQL execution. FLIP-69[1] introduces many common DDLs such as `SHOW TABLES`, which require that TableEnvironment have an interface for obtaining the execution result of a DDL. SQL CLI also has a strong demand for this feature, so that we can easily unify the execution paths of SQL CLI and TableEnvironment. Besides, the method name `sqlUpdate` does not fit operations like `SHOW TABLES`.
  3. Unclear and buggy support for buffered SQL/Table execution[2]. The Blink planner provides the ability to optimize multiple sinks together, but TableEnvironment offers no clear API to control the whole flow.
  4. Unclear trigger point for a Flink table program. Both `TableEnvironment.execute()` and `StreamExecutionEnvironment.execute()` can trigger a Flink table program execution. However, if you use TableEnvironment to build a Flink table program, you must use `TableEnvironment.execute()` to trigger execution, because you can’t get the StreamExecutionEnvironment instance. If you use StreamTableEnvironment to build a Flink table program, you can use either to trigger execution. If you convert a table program to a DataStream program (using `StreamTableEnvironment.toAppendStream`/`toRetractStream`), you can also use either to trigger execution. So it’s hard to explain which `execute` method should be used. BatchTableEnvironment has the same problem as StreamTableEnvironment.
  5. No support for executing Flink table jobs asynchronously. In streaming mode, an `INSERT INTO xxx` statement may never end. It’s also possible that an ETL task takes too long in a batch environment. So it’s very natural and necessary to support executing Flink table jobs asynchronously.
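The inconsistency described in item 1 can be illustrated with a toy model. This is NOT real Flink code: `ToyTableEnvironment` is a simplified stand-in whose behavior mimics what the text criticizes, namely that DDL takes effect inside `sqlUpdate()` itself while `INSERT INTO` is silently buffered until `execute()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Flink) of the current sqlUpdate() semantics:
// DDL runs immediately, INSERT INTO is deferred until execute().
class ToyTableEnvironment {
    private final List<String> buffered = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    void sqlUpdate(String sql) {
        if (sql.trim().toUpperCase().startsWith("INSERT")) {
            buffered.add(sql);      // only buffered; nothing happens yet
        } else {
            executed.add(sql);      // DDL takes effect right away
        }
    }

    void execute(String jobName) {
        executed.addAll(buffered);  // only now do the INSERTs actually run
        buffered.clear();
    }

    List<String> executedStatements() {
        return executed;
    }
}
```

A user calling `sqlUpdate("CREATE TABLE ...")` and then `sqlUpdate("INSERT INTO ...")` sees the table appear immediately but no data written until `execute()` — two different semantics behind one method name, which is exactly the confusion the proposal removes.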

...

  1. We propose to deprecate the following methods in TableEnvironment:

    • void TableEnvironment.sqlUpdate(String sql)
    • void TableEnvironment.insertInto(String targetPath, Table table)
    • void TableEnvironment.execute(String jobName)
    • String TableEnvironment.explain(boolean extended)
    • Table TableEnvironment.fromTableSource(TableSource<?> source)
    • Table.insertInto(String)

  2. Meanwhile, we propose to introduce the following new methods:

    New methods in TableEnvironment:

    • TableResult executeSql(String statement);
      synchronously execute the given single statement immediately, and return the execution result.
    • String explainSql(String statement, ExplainDetail... extraDetails);
      get the AST and the execution plan for the given single statement (DQL, DML)
    • StatementSet createStatementSet();
      create a StatementSet instance which can add DML statements or Tables to the set and explain or execute them as a batch.

      interface StatementSet {
          void addInsert(String insert);
          void addInsert(String targetPath, Table table);
          TableResult execute() throws Exception;
          String explain(boolean extended);
      }

      interface TableResult {
          TableSchema getResultSchema();
          Iterable<Row> getResultRows();
      }
  3. For the current messy Flink table program trigger point, we propose: for TableEnvironment and StreamTableEnvironment, you must use `TableEnvironment.execute()` to trigger table program execution. Once you convert the table program to a DataStream program (through the `toAppendStream` or `toRetractStream` method), you must use `StreamExecutionEnvironment.execute()` to trigger the DataStream program.
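The batching behavior proposed for the statement set can also be sketched with a toy model. Again this is NOT real Flink code: `ToyStatementSet` is a hypothetical stand-in that only shows the intended contract — DML statements are collected, then explained or executed together as one batch, so the planner can optimize multiple sinks in a single job.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Flink) of the proposed statement set: addInsert() only
// buffers a DML statement; explain()/execute() operate on the whole batch.
class ToyStatementSet {
    private final List<String> statements = new ArrayList<>();

    ToyStatementSet addInsert(String insert) {
        statements.add(insert);   // buffered; nothing is submitted yet
        return this;
    }

    String explain() {
        // In the proposal this returns one combined plan for all statements.
        return "plan for " + statements.size() + " statements";
    }

    List<String> execute() {
        // In the proposal this submits one job covering every buffered
        // statement and returns the execution result; here we just hand
        // back the batch and reset the set.
        List<String> batch = new ArrayList<>(statements);
        statements.clear();
        return batch;
    }
}
```

Unlike today's implicit buffering inside `sqlUpdate()`, the buffering here is explicit: nothing runs until the user asks for the whole batch, which is the clear control over the flow that item 3 of the motivation asks for.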

...