...
Page properties |
---|
Current state: "Under Discussion",
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-64-Support-for-Temporary-Objects-in-Table-module-td32684.html
JIRA:
Released: <Flink Version>
|
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
In other words the TableSource & TableSink are to too physical to be exposed for the inline declaration. They are meant for predefined connectors. If a user wants to read from some inline source, it can be done with from/toDataStream.
...
As a result of the rework of the type system in UDFs, we will be able to merge the three methods into a single one. Moreover for permanent functions we need users to register a class instead of an instance. To keep this method in sync with SQL DDL we should encourage users to use a class name for temporary functions as well.
We use the “create” prefix rather than “register” to We use the “create” prefix rather than “register” to be closer to SQL DDL.
Suggested methods:
void
...
createTemporaryFunction(String
...
path,
...
Class<? extends UserDefinedFunction> functionClass);
There were concerns raised if we need a variant that registers instances. This FLIP does not make any assumptions if we should or not introduce that method. We will revisit that after FLIP-65
The method to be discussed:
void createTemporaryFunction(String path, UserDefinedFunction function);
ConnectTableDescriptor#registerTableSinkConnectTableDescriptor#registerTableSink,registerTableSource,registerTableSourceAndSink
...
Current call | Replacement | Comment |
registerTable | createTemporaryView | For the non temporary part we need to make `QueryOperation` string serializable. |
registerTableSink | (Deprecate) → to be removed | |
registerTableSource | (Deprecate) → to be removed | |
registerDataStream | createTemporaryView | |
registerScalarFunction/ registerAggergateFunction/ registerTableFunction | createTemporaryFunction | We can unify the 3 methods into one once we rework type inference for UDFs |
...
Current call | Replacement | SQL equivalent | Comment |
registerTableSource | (Deprecate) → to be removed | - | |
registerTableSink | (Deprecate) → to be removed | - | |
registerTableSourceAndSink | createTemporaryTable | CREATE TEMPORARY TABLE | We should not support CREATE TEMPORARY TABLE AS SELECT = CREATE TEMPORARY VIEW |
...
Call to add | SQL equivalent | Comment |
createView(Table) | CREATE VIEW | We need to make `QueryOperation`s string serializable. |
createTable(TableDescriptor) | CREATE TABLE | We need to rework TableDescriptor. We can temporary temporarily use ConnectTableDescriptor#connect |
createFunction | CREATE FUNCTION | We need serializable function representation |
...
createTemporaryView("temp", ...) → registers function with an identifier `current_cat`.`current_db`.`temp`
The same logic applies for looking up objects:
tEnv.scanfrom("cat.db.temp") → scans a view/table with an identifier `cat`.`db`.`temp`
tEnv.scanfrom("db.temp") → scans a view/table with an identifier `current_cat`.`db`.`temp`
tEnv.scanfrom("temp") → scans a view/table with an identifier `current_cat`.`current_db`.`temp`
...
createTemporaryFunction("temp", new Function().class) → registers function with an identifier `temp`
...
createTemporaryFunction("cat.db.temp", new Function().class) → registers function with an identifier `cat`.`db`.`temp`
createTemporaryFunction("db.temp", new Function(.class)) → registers function with an identifier `current_cat`.`db`.`temp`
Temporary objects should be stored in memory in a CatalogManager/FunctionCatalog
...
- 1-part path
- no other system has such semantics, all systems assign temporary tables & views to some schema (either with the same rules as regular objects or special temporary schema)
- Require special names for temporary objects, e.g. (#name as in SQL Server, or PTT_nam as in ORACLE)
- Register temporary objects in a special DB (as in SQL Server, Oracle, Postgres)
- Always assign temporary functions to some namespace (see FLIP-57).
References:
How other systems handle temporary objects:
MySQL:
MySQL allows creating temporary tables in any schema (even if the target database does not exist). Those temporary tables take precedence over permanent tables with the same name (and schema). Therefore user cannot access a permanent table unless the temporary table is dropped.
MySQL adds also a special syntax for dropping temporary tables (DROP TEMPORARY TABLE) in order to prohibit dropping permanent tables when the intention was to drop a temporary one.
https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html
http://www.mysqltutorial.org/mysql-temporary-table/
Hive:
Hive implements similar behavior to MySQL. The difference is that the database must exist. Hive also does not add the DROP TEMPORARY TABLE syntax.
In hive temporary functions must not have a database.
SQL Server:
SQL server reserves a special schema for temporary tables (dbo). It also forces user to prefix table names with ‘#’ character. This needed to differentiate if the table should be temporary or permanent. Therefore, it is not possible to override a permanent table.
If you user includes a schema_name when he/she creates or accesses a temporary table, it is ignored. All temporary tables are created in the dbo schema.
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017
Oracle:
Oracle implements similar behavior to SQL Server.
https://oracle-base.com/articles/18c/private-temporary-tables-18c
Postgres:
Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Objects identifiers consist of only a single object name.
https://www.postgresql.org/docs/9.3/sql-createtable.html
Compatibility, Deprecation, and Migration Plan
- We deprecate all methods mentioned above and remove them in the next release after the one when we deprecated them
Test Plan
Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
Compatibility, Deprecation, and Migration Plan
- Methods of TableEnvironment to be deprecated:
- void registerTable(String name, Table table)
- void registerTableSource(String name, TableSource<?> tableSource);
- void registerTableSink(String name, TableSink<?> configuredSink);
- Table scan(String... tablePath)
- void registerFunction(String name, ScalarFunction function)
- <T> void registerFunction(String name, TableFunction<T> tableFunction)
- <T, ACC> void registerFunction(String name, AggregateFunction<T, ACC> aggregateFunction)
- <T, ACC> void registerFunction(String name, TableAggregateFunction<T, ACC> tableAggregateFunction)
- <T> void registerDataStream(String name, DataStream<T> dataStream)
- <T> void registerDataStream(String name, DataStream<T> dataStream, String fields)
- <T> void registerDataSet(String name, DataStream<T> dataStream)
- <T> void registerDataSet(String name, DataStream<T> dataStream, String fields)
- Methods of ConnectTableDescriptor to be deprecated:
- public void registerTableSource(String name)
- public void registerTableSink(String name)
- public void registerTableSourceAndSink(String name)
- Methods of TableEnvironment to be dropped:
- void insertInto(Table table, StreamQueryConfig queryConfig, String sinkPath, String... sinkPathContinued)
- void insertInto(Table table, BatchQueryConfig queryConfig, String sinkPath, String... sinkPathContinued)
- void insertInto(Table table, String sinkPath, String... sinkPathContinued)
- Methods of Table to be dropped
- void insertInto(QueryConfig conf, String tablePath, String... tablePathContinued)
- void insertInto(String tablePath, String... tablePathContinued)
Implementation plan
The implementation of changes described for functions of this FLIP has to be postponed after type inference is exposed for UserDefinedFunctions.
References:
How other systems handle temporary objects:
MySQL:
MySQL allows creating temporary tables in any schema (even if the target database does not exist). Those temporary tables take precedence over permanent tables with the same name (and schema). Therefore user cannot access a permanent table unless the temporary table is dropped.
MySQL adds also a special syntax for dropping temporary tables (DROP TEMPORARY TABLE) in order to prohibit dropping permanent tables when the intention was to drop a temporary one.
https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html
http://www.mysqltutorial.org/mysql-temporary-table/
Hive:
Hive implements similar behavior to MySQL. The difference is that the database must exist. Hive also does not add the DROP TEMPORARY TABLE syntax.
In hive temporary functions must not have a database.
SQL Server:
SQL server reserves a special schema for temporary tables (dbo). It also forces user to prefix table names with ‘#’ character. This needed to differentiate if the table should be temporary or permanent. Therefore, it is not possible to override a permanent table.
If you user includes a schema_name when he/she creates or accesses a temporary table, it is ignored. All temporary tables are created in the dbo schema.
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017
Oracle:
Oracle implements similar behavior to SQL Server.
https://oracle-base.com/articles/18c/private-temporary-tables-18c
Postgres:
Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Objects identifiers consist of only a single object name.
https://www.postgresql.org/docs/9.3/sql-createtable.htmlIf there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.