...

  1. Inconsistent execution semantics for `sqlUpdate()`. Currently, a DDL statement passed to this method is executed immediately, while an `INSERT INTO` statement is only executed when the `execute()` method is called, which confuses users a lot.
  2. No support for obtaining the result of SQL execution. FLIP-69[1] introduces many common DDLs such as `SHOW TABLES`, which require that TableEnvironment have an interface for obtaining the execution result of a DDL. SQL CLI also has a strong demand for this feature, so that we can easily unify the execution paths of SQL CLI and TableEnvironment. Besides, the method name `sqlUpdate` does not fit operations like `SHOW TABLES`.
  3. Unclear and buggy support for buffered SQL/Table execution[2]. The Blink planner provides the ability to optimize multiple sinks together, but TableEnvironment offers no clear API to control the whole flow.
  4. Unclear trigger point for a Flink table program. Both `TableEnvironment.execute()` and `StreamExecutionEnvironment.execute()` can trigger a Flink table program execution. However, if you use TableEnvironment to build a Flink table program, you must use `TableEnvironment.execute()` to trigger execution, because you can’t get the StreamExecutionEnvironment instance. If you use StreamTableEnvironment to build a Flink table program, you can use either to trigger execution. If you convert a table program to a DataStream program (using `StreamTableEnvironment.toAppendStream`/`toRetractStream`), you can also use either to trigger execution. So it’s hard to explain which `execute` method should be used. BatchTableEnvironment has the same problem as StreamTableEnvironment.
  5. No support for executing Flink table jobs asynchronously. In streaming mode, an `INSERT INTO xxx` statement may never end. It’s also possible that an ETL task takes too long in a batch environment. So it’s very natural and necessary to support executing Flink table jobs asynchronously.
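The inconsistency described in item 1 can be illustrated with a toy model. This is NOT real Flink code: `ToyTableEnvironment` is a simplified stand-in whose behavior mimics what the text criticizes, namely that DDL takes effect inside `sqlUpdate()` itself while `INSERT INTO` is silently buffered until `execute()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Flink) of the current sqlUpdate() semantics:
// DDL runs immediately, INSERT INTO is deferred until execute().
class ToyTableEnvironment {
    private final List<String> buffered = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    void sqlUpdate(String sql) {
        if (sql.trim().toUpperCase().startsWith("INSERT")) {
            buffered.add(sql);      // only buffered; nothing happens yet
        } else {
            executed.add(sql);      // DDL takes effect right away
        }
    }

    void execute(String jobName) {
        executed.addAll(buffered);  // only now do the INSERTs actually run
        buffered.clear();
    }

    List<String> executedStatements() {
        return executed;
    }
}
```

A user calling `sqlUpdate("CREATE TABLE ...")` and then `sqlUpdate("INSERT INTO ...")` sees the table appear immediately but no data written until `execute()` — two different semantics behind one method name, which is exactly the confusion the proposal removes.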

...

  1. We propose to deprecate the following methods in TableEnvironment:

    • void TableEnvironment.sqlUpdate(String sql)
    • void TableEnvironment.insertInto(String targetPath, Table table)
    • void TableEnvironment.execute(String jobName)
    • String TableEnvironment.explain(boolean extended)
    • Table TableEnvironment.fromTableSource(TableSource<?> source)
    • Table.insertInto(String)

  2. Meanwhile, we propose to introduce the following new methods:

    New methods in TableEnvironment:

    • TableResult executeSql(String statement);
      synchronously execute the given single statement immediately, and return the execution result.
    • String explainSql(String statement, ExplainDetail... extraDetails);
      get the AST and the execution plan for the given single statement (DQL, DML)
    • StatementSet createStatementSet();
      create a StatementSet instance which can add DML statements or Tables to the set and explain or execute them as a batch.

      interface StatementSet {
          void addInsert(String insert);
          void addInsert(String targetPath, Table table);
          TableResult execute() throws Exception;
          String explain(boolean extended);
      }

      interface TableResult {
          TableSchema getResultSchema();
          Iterable<Row> getResultRows();
      }
  3. For the current messy Flink table program trigger point, we propose: for TableEnvironment and StreamTableEnvironment, you must use `TableEnvironment.execute()` to trigger table program execution. Once you convert the table program to a DataStream program (through the `toAppendStream` or `toRetractStream` method), you must use `StreamExecutionEnvironment.execute()` to trigger the DataStream program.
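The batching behavior proposed for the statement set can also be sketched with a toy model. Again this is NOT real Flink code: `ToyStatementSet` is a hypothetical stand-in that only shows the intended contract — DML statements are collected, then explained or executed together as one batch, so the planner can optimize multiple sinks in a single job.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Flink) of the proposed statement set: addInsert() only
// buffers a DML statement; explain()/execute() operate on the whole batch.
class ToyStatementSet {
    private final List<String> statements = new ArrayList<>();

    ToyStatementSet addInsert(String insert) {
        statements.add(insert);   // buffered; nothing is submitted yet
        return this;
    }

    String explain() {
        // In the proposal this returns one combined plan for all statements.
        return "plan for " + statements.size() + " statements";
    }

    List<String> execute() {
        // In the proposal this submits one job covering every buffered
        // statement and returns the execution result; here we just hand
        // back the batch and reset the set.
        List<String> batch = new ArrayList<>(statements);
        statements.clear();
        return batch;
    }
}
```

Unlike today's implicit buffering inside `sqlUpdate()`, the buffering here is explicit: nothing runs until the user asks for the whole batch, which is the clear control over the flow that item 3 of the motivation asks for.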

...