...
- Inconsistent execution semantics for `sqlUpdate()`. Currently, a DDL statement passed to this method is executed immediately, while an `INSERT INTO` statement is not actually executed until the `execute()` method is called, which confuses users a lot.
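The inconsistency can be illustrated with a minimal mock in plain Java (the `MockTableEnvironment` class below is hypothetical, not the real `org.apache.flink.table.api.TableEnvironment`): DDL takes effect as soon as `sqlUpdate()` returns, while DML is only buffered until `execute()` is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mock that mimics the current sqlUpdate() semantics;
// not the real Flink TableEnvironment.
class MockTableEnvironment {
    final List<String> catalogTables = new ArrayList<>(); // DDL is applied eagerly
    final List<String> bufferedDml = new ArrayList<>();   // DML is deferred

    void sqlUpdate(String sql) {
        if (sql.trim().toUpperCase().startsWith("CREATE TABLE")) {
            // DDL: executed immediately, table is visible right away
            catalogTables.add(sql);
        } else {
            // DML: only buffered; nothing runs until execute() is called
            bufferedDml.add(sql);
        }
    }

    void execute(String jobName) {
        // the buffered INSERTs are compiled into one job and submitted here
        bufferedDml.clear();
    }
}

public class SqlUpdateSemantics {
    public static void main(String[] args) {
        MockTableEnvironment tEnv = new MockTableEnvironment();
        tEnv.sqlUpdate("CREATE TABLE t1 (a INT)");        // takes effect now
        tEnv.sqlUpdate("INSERT INTO t1 SELECT * FROM s"); // deferred
        System.out.println(tEnv.catalogTables.size());    // 1
        System.out.println(tEnv.bufferedDml.size());      // 1
        tEnv.execute("job");                              // INSERT actually runs here
        System.out.println(tEnv.bufferedDml.size());      // 0
    }
}
```

A user calling `sqlUpdate` twice with statements of different kinds thus gets two different execution behaviors from the same method, which is exactly the confusion described above.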
- No support for obtaining the return value of SQL execution. FLIP-69 [1] introduces many common DDLs, such as `SHOW TABLES`, which require that TableEnvironment provide an interface for obtaining the execution result of a DDL. SQL CLI also has a strong demand for this feature, so that we can easily unify the execution path of SQL CLI and TableEnvironment. Besides, the method name `sqlUpdate` is not consistent with statements like `SHOW TABLES`.
- Unclear and buggy support for buffered execution of SQL statements and Tables [2]. The Blink planner provides the ability to optimize multiple sinks, but we don't have a clear mechanism in the TableEnvironment API to control the whole flow.
- Unclear trigger point of a Flink table program. Both `TableEnvironment.execute()` and `StreamExecutionEnvironment.execute()` can trigger the execution of a Flink table program. However, if you use TableEnvironment to build a Flink table program, you must use `TableEnvironment.execute()` to trigger execution, because you can't get the StreamExecutionEnvironment instance. If you use StreamTableEnvironment to build a Flink table program, you can use either to trigger execution. If you convert a table program to a DataStream program (using `StreamTableEnvironment.toAppendStream`/`toRetractStream`), you can also use either to trigger execution. So it's hard to explain which `execute` method should be used. BatchTableEnvironment has the same problem.
- No support for executing Flink table jobs asynchronously. In streaming mode, an `INSERT INTO xxx` statement may never end. It's also possible that an ETL task takes too long to finish in a batch environment. So it's natural and necessary to support executing Flink table jobs asynchronously.
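What asynchronous submission could look like can be sketched with `java.util.concurrent` alone (the `executeAsync` method and the string result below are hypothetical; a real implementation would return a job-client-style handle rather than a `CompletableFuture<String>`):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: submitting a statement without blocking the caller.
// The CompletableFuture stands in for a job handle; this is not the Flink API.
public class AsyncExecutionSketch {
    // submit the (potentially long-running) statement and return immediately
    static CompletableFuture<String> executeAsync(String statement) {
        return CompletableFuture.supplyAsync(() -> {
            // stands in for a long-running INSERT INTO job
            return "FINISHED: " + statement;
        });
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> handle =
                executeAsync("INSERT INTO sink SELECT * FROM src");
        // the caller is not blocked; it can poll, attach callbacks,
        // or wait with a timeout
        String status = handle.get(10, TimeUnit.SECONDS);
        System.out.println(status);
    }
}
```

The key point is that submission returns a handle immediately, so a never-ending streaming `INSERT INTO` or a long batch ETL job does not block the submitting thread.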
...
We propose to deprecate the following methods:
- void TableEnvironment.sqlUpdate(String sql)
- void TableEnvironment.insertInto(String targetPath, Table table)
- void TableEnvironment.execute(String jobName)
- String TableEnvironment.explain(boolean extended)
- Table TableEnvironment.fromTableSource(TableSource<?> source)
- Table.insertInto(String)
Meanwhile, we propose to introduce the following new methods:

New methods in TableEnvironment:

- ResultTable executeStatement(String statement) throws Exception;
  synchronously execute the given single statement immediately, and return the execution result.

  public interface ResultTable {
      TableSchema getResultSchema();
      Iterable<Row> getResultRows();
  }

- String explainSql(String statement, ExplainDetail... extraDetails);
  get the AST and the execution plan for the given single statement (DQL, DML).

- DmlBatch createDmlBatch();
  create a DmlBatch instance which can add DML statements or Tables to the batch and explain or execute them as a batch.

  interface DmlBatch {
      void addInsert(String insert);
      void addInsert(String targetPath, Table table);
      ResultTable execute() throws Exception;
      String explain(boolean extended);
  }
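The intended call pattern of the batching API can be sketched with a minimal in-memory stub (`InMemoryDmlBatch` and the stub `TableSchema`/`Row`/`Table` types below are hypothetical stand-ins so the sketch compiles stand-alone; the real types live in flink-table):

```java
import java.util.ArrayList;
import java.util.List;

// Stub types so the sketch is self-contained (the real ones live in flink-table).
class TableSchema {}
class Row {}
class Table {}

// Interfaces as proposed above.
interface ResultTable {
    TableSchema getResultSchema();
    Iterable<Row> getResultRows();
}

interface DmlBatch {
    void addInsert(String insert);
    void addInsert(String targetPath, Table table);
    ResultTable execute() throws Exception;
    String explain(boolean extended);
}

// Hypothetical in-memory implementation, only to show the intended call pattern.
class InMemoryDmlBatch implements DmlBatch {
    private final List<String> statements = new ArrayList<>();

    public void addInsert(String insert) {
        statements.add(insert);
    }

    public void addInsert(String targetPath, Table table) {
        statements.add("INSERT INTO " + targetPath);
    }

    public ResultTable execute() {
        // all buffered statements would be optimized and submitted as one job,
        // which gives the planner the chance to share common sub-plans
        return new ResultTable() {
            public TableSchema getResultSchema() { return new TableSchema(); }
            public Iterable<Row> getResultRows() { return new ArrayList<>(); }
        };
    }

    public String explain(boolean extended) {
        return "plan for " + statements.size() + " statements";
    }
}

public class DmlBatchUsage {
    public static void main(String[] args) throws Exception {
        DmlBatch batch = new InMemoryDmlBatch(); // really: tEnv.createDmlBatch()
        batch.addInsert("INSERT INTO sink1 SELECT * FROM src");
        batch.addInsert("INSERT INTO sink2 SELECT * FROM src");
        System.out.println(batch.explain(false));
        batch.execute(); // one job covering both INSERTs
    }
}
```

Compared with today's implicit buffering inside `sqlUpdate`, the batch object makes the buffering explicit: statements are collected on the batch, and nothing runs until `execute()` is called on it.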
- For the currently messy trigger point of Flink table programs, we propose that: for TableEnvironment and StreamTableEnvironment, you must use `TableEnvironment.execute()` to trigger table program execution; once you convert the table program to a DataStream program (through the `toAppendStream` or `toRetractStream` method), you must use `StreamExecutionEnvironment.execute()` to trigger execution of the DataStream program.
...