Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.


...

Page properties

Current state: "Under Discussion",

Discussion threadhttp://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-64-Support-for-Temporary-Objects-in-Table-module-td32684.html

JIRA:

Released: <Flink Version>


Discussion thread
Vote thread
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-14485

Release1.11


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

In other words the TableSource & TableSink are to too physical to be exposed for the inline declaration. They are meant for predefined connectors. If a user wants to read from some inline source, it can be done with from/toDataStream.

...

As a result of the rework of the type system in UDFs, we will be able to merge the three methods into a single one.  Moreover for permanent functions we need users to register a class instead of an instance. To keep this method in sync with SQL DDL we should encourage users to use a class name for temporary functions as well.

We use the “create” prefix rather than “register” to We use the “create” prefix rather than “register” to be closer to SQL DDL.

Suggested methods:

void

...

createTemporaryFunction(String

...

path,

...

Class<? extends UserDefinedFunction> functionClass);

There were concerns raised if we need a variant that registers instances. This FLIP does not make any assumptions if we should or not introduce that method. We will revisit that after FLIP-65

The method to be discussed:

void createTemporaryFunction(String path, UserDefinedFunction function);

ConnectTableDescriptor#registerTableSinkConnectTableDescriptor#registerTableSink,registerTableSource,registerTableSourceAndSink

...

Current call

Replacement

Comment

registerTable

createTemporaryView

For the non temporary part we need to make `QueryOperation` string serializable.

registerTableSink

(Deprecate) → to be removed


registerTableSource

(Deprecate) → to be removed


registerDataStream

createTemporaryView


registerScalarFunction/

registerAggergateFunction/

registerTableFunction

createTemporaryFunction

We can unify the 3 methods into one once we rework type inference for UDFs

...

Current call

Replacement

SQL equivalent

Comment

registerTableSource

(Deprecate) → to be removed

-


registerTableSink

(Deprecate) → to be removed

-


registerTableSourceAndSink

createTemporaryTable

CREATE TEMPORARY TABLE

We should not support CREATE TEMPORARY TABLE AS SELECT

 = CREATE TEMPORARY VIEW

...

Call to add

SQL equivalent

Comment

createView(Table)

CREATE VIEW

We need to make `QueryOperation`s string serializable.

createTable(TableDescriptor)

CREATE TABLE

We need to rework TableDescriptor. We can temporary temporarily use ConnectTableDescriptor#connect

createFunction

CREATE FUNCTION

We need serializable function representation

...

createTemporaryView("temp", ...) → registers function with an identifier `current_cat`.`current_db`.`temp`

The same logic applies for looking up objects:
tEnv.scanfrom("cat.db.temp") → scans a view/table with an identifier `cat`.`db`.`temp`
tEnv.scanfrom("db.temp") → scans a view/table with an identifier `current_cat`.`db`.`temp`
tEnv.scanfrom("temp") → scans a view/table with an identifier `current_cat`.`current_db`.`temp`

...

createTemporaryFunction("temp", new Function().class) → registers function with an identifier `temp`

...

createTemporaryFunction("cat.db.temp", new Function().class) → registers function with an identifier `cat`.`db`.`temp`
createTemporaryFunction("db.temp", new Function(.class)) → registers function with an identifier `current_cat`.`db`.`temp`

Temporary objects should be stored in memory in a CatalogManager/FunctionCatalog

...

  1. 1-part path
    1. no other system has such semantics, all systems assign temporary tables & views to some schema (either with the same rules as regular objects or special temporary schema)
  2. Require special names for temporary objects, e.g. (#name as in SQL Server, or PTT_nam as in ORACLE)
  3. Register temporary objects in a special DB (as in SQL Server, Oracle, Postgres)
  4. Always assign temporary functions to some namespace (see FLIP-57).

References:

How other systems handle temporary objects:

MySQL:

MySQL allows creating temporary tables in any schema (even if the target database does not exist). Those temporary tables take precedence over permanent tables with the same name (and schema). Therefore user cannot access a permanent table unless the temporary table is dropped.

MySQL adds also a special syntax for dropping temporary tables (DROP TEMPORARY TABLE) in order to prohibit dropping permanent tables when the intention was to drop a temporary one.

https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html

http://www.mysqltutorial.org/mysql-temporary-table/

Hive:

Hive implements similar behavior to MySQL. The difference is that the database must exist. Hive also does not add the DROP TEMPORARY TABLE syntax.

In hive temporary functions must not have a database.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TemporaryTables

SQL Server:

SQL server reserves a special schema for temporary tables (dbo). It also forces user to prefix table names with ‘#’ character. This needed to differentiate if the table should be temporary or permanent. Therefore, it is not possible to override a permanent table.

If you user includes a schema_name when he/she creates or accesses a temporary table, it is ignored. All temporary tables are created in the dbo schema.

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017

https://docs.microsoft.com/de-de/sql/relational-databases/databases/tempdb-database?view=sql-server-2017

Oracle:

Oracle implements similar behavior to SQL Server.

https://oracle-base.com/articles/18c/private-temporary-tables-18c

Postgres:

Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Objects identifiers consist of only a single object name.

https://www.postgresql.org/docs/9.3/sql-createtable.html

Compatibility, Deprecation, and Migration Plan

  • We deprecate all methods mentioned above and remove them in the next release after the one when we deprecated them

Test Plan

Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

Compatibility, Deprecation, and Migration Plan

  • Methods of TableEnvironment to be deprecated:
    • void registerTable(String name, Table table)
    • void registerTableSource(String name, TableSource<?> tableSource);
    • void registerTableSink(String name, TableSink<?> configuredSink);
    • Table scan(String... tablePath)
    • void registerFunction(String name, ScalarFunction function)
    • <T> void registerFunction(String name, TableFunction<T> tableFunction)
    • <T, ACC> void registerFunction(String name, AggregateFunction<T, ACC> aggregateFunction)
    • <T, ACC> void registerFunction(String name, TableAggregateFunction<T, ACC> tableAggregateFunction)
    • <T> void registerDataStream(String name, DataStream<T> dataStream)
    • <T> void registerDataStream(String name, DataStream<T> dataStream, String fields)
    • <T> void registerDataSet(String name, DataStream<T> dataStream)
    • <T> void registerDataSet(String name, DataStream<T> dataStream, String fields)
  • Methods of ConnectTableDescriptor to be deprecated:
    • public void registerTableSource(String name) 
    • public void registerTableSink(String name) 
    • public void registerTableSourceAndSink(String name)
  • Methods of TableEnvironment to be dropped:
    • void insertInto(Table table, StreamQueryConfig queryConfig, String sinkPath, String... sinkPathContinued)
    • void insertInto(Table table, BatchQueryConfig queryConfig, String sinkPath, String... sinkPathContinued)
    • void insertInto(Table table, String sinkPath, String... sinkPathContinued)
  • Methods of Table to be dropped
    • void insertInto(QueryConfig conf, String tablePath, String... tablePathContinued)
    • void insertInto(String tablePath, String... tablePathContinued)

Implementation plan

The implementation of changes described for functions of this FLIP has to be postponed after type inference is exposed for UserDefinedFunctions.

References:

How other systems handle temporary objects:

MySQL:

MySQL allows creating temporary tables in any schema (even if the target database does not exist). Those temporary tables take precedence over permanent tables with the same name (and schema). Therefore user cannot access a permanent table unless the temporary table is dropped.

MySQL adds also a special syntax for dropping temporary tables (DROP TEMPORARY TABLE) in order to prohibit dropping permanent tables when the intention was to drop a temporary one.

https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html

http://www.mysqltutorial.org/mysql-temporary-table/

Hive:

Hive implements similar behavior to MySQL. The difference is that the database must exist. Hive also does not add the DROP TEMPORARY TABLE syntax.

In hive temporary functions must not have a database.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TemporaryTables

SQL Server:

SQL server reserves a special schema for temporary tables (dbo). It also forces user to prefix table names with ‘#’ character. This needed to differentiate if the table should be temporary or permanent. Therefore, it is not possible to override a permanent table.

If you user includes a schema_name when he/she creates or accesses a temporary table, it is ignored. All temporary tables are created in the dbo schema.

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017

https://docs.microsoft.com/de-de/sql/relational-databases/databases/tempdb-database?view=sql-server-2017

Oracle:

Oracle implements similar behavior to SQL Server.

https://oracle-base.com/articles/18c/private-temporary-tables-18c

Postgres:

Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Objects identifiers consist of only a single object name.

https://www.postgresql.org/docs/9.3/sql-createtable.htmlIf there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.