...

As a result of the rework of the type system in UDFs, we will be able to merge the three methods into a single one. Moreover, for permanent functions users need to register a class instead of an instance. To keep this method in sync with SQL DDL, we should encourage users to use a class name for temporary functions as well.

We use the “create” prefix rather than “register” to be closer to SQL DDL.

...

void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass);
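The following is a minimal sketch of why a class-based signature can serve both temporary and permanent functions: a class can be referred to by name, as SQL DDL does, and instantiated on demand. `UserDefinedFunction` here is a simplified stand-in interface, and `UpperCase` and `FunctionRegistrySketch` are hypothetical names used only for illustration; none of this is Flink's actual implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Simplified stand-in for Flink's UserDefinedFunction (assumption, not the real class).
interface UserDefinedFunction {
    String eval(String input);
}

// Hypothetical function with a public no-arg constructor, which class-based registration relies on.
final class UpperCase implements UserDefinedFunction {
    public UpperCase() {}

    @Override
    public String eval(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}

// Hypothetical registry that stores classes rather than instances.
final class FunctionRegistrySketch {
    private final Map<String, Class<? extends UserDefinedFunction>> functions = new HashMap<>();

    void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass) {
        functions.put(path, functionClass);
    }

    // A class name is a plain string, so it could be persisted or written in a DDL statement.
    String classNameOf(String path) {
        return functions.get(path).getName();
    }

    // Instances are created lazily from the registered class.
    UserDefinedFunction instantiate(String path) {
        try {
            return functions.get(path).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate function " + path, e);
        }
    }

    public static void main(String[] args) {
        FunctionRegistrySketch registry = new FunctionRegistrySketch();
        registry.createTemporaryFunction("upper", UpperCase.class);
        System.out.println(registry.classNameOf("upper"));
        System.out.println(registry.instantiate("upper").eval("flink"));
    }
}
```

Registering an instance would not allow this, because an arbitrary object cannot in general be turned back into a string representation.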

ConnectTableDescriptor#registerTableSink, registerTableSource, registerTableSourceAndSink

...

| Current call | Replacement | Comment |
| --- | --- | --- |
| registerTable | createTemporaryView | For the non-temporary part we need to make `QueryOperation` string serializable. |
| registerTableSink | (Deprecate) → to be removed | |
| registerTableSource | (Deprecate) → to be removed | |
| registerDataStream | createTemporaryView | |
| registerScalarFunction / registerAggregateFunction / registerTableFunction | createTemporaryFunction | We can unify the three methods into one once we rework type inference for UDFs. |

...

| Current call | Replacement | SQL equivalent | Comment |
| --- | --- | --- | --- |
| registerTableSource | (Deprecate) → to be removed | - | |
| registerTableSink | (Deprecate) → to be removed | - | |
| registerTableSourceAndSink | createTemporaryTable | CREATE TEMPORARY TABLE | We should not support CREATE TEMPORARY TABLE AS SELECT (= CREATE TEMPORARY VIEW). |

...

| Call to add | SQL equivalent | Comment |
| --- | --- | --- |
| createView(Table) | CREATE VIEW | We need to make `QueryOperation`s string serializable. |
| createTable(TableDescriptor) | CREATE TABLE | We need to rework TableDescriptor. We can temporarily use ConnectTableDescriptor#connect. |
| createFunction | CREATE FUNCTION | We need a serializable function representation. |

...

createTemporaryView("temp", ...) → registers a view with an identifier `current_cat`.`current_db`.`temp`

The same logic applies for looking up objects:
tEnv.from("cat.db.temp") → scans a view/table with an identifier `cat`.`db`.`temp`
tEnv.from("db.temp") → scans a view/table with an identifier `current_cat`.`db`.`temp`
tEnv.from("temp") → scans a view/table with an identifier `current_cat`.`current_db`.`temp`
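The lookup rules above can be sketched as a small resolver that fills in missing path parts from the current catalog and database. `TablePathResolverSketch` is a hypothetical name, and splitting on `.` is a simplifying assumption: a real parser would also have to handle escaped identifiers that contain dots.

```java
// Hypothetical helper illustrating how 1-, 2- and 3-part table/view paths are expanded.
final class TablePathResolverSketch {
    private final String currentCatalog;
    private final String currentDatabase;

    TablePathResolverSketch(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    // Returns the fully qualified identifier in `catalog`.`database`.`object` form.
    String resolve(String path) {
        // Simplification: no support for backtick-escaped parts containing dots.
        String[] parts = path.split("\\.");
        switch (parts.length) {
            case 1:
                return format(currentCatalog, currentDatabase, parts[0]);
            case 2:
                return format(currentCatalog, parts[0], parts[1]);
            case 3:
                return format(parts[0], parts[1], parts[2]);
            default:
                throw new IllegalArgumentException("Expected 1 to 3 path parts: " + path);
        }
    }

    private static String format(String catalog, String database, String object) {
        return String.format("`%s`.`%s`.`%s`", catalog, database, object);
    }

    public static void main(String[] args) {
        TablePathResolverSketch resolver =
                new TablePathResolverSketch("current_cat", "current_db");
        System.out.println(resolver.resolve("cat.db.temp"));
        System.out.println(resolver.resolve("db.temp"));
        System.out.println(resolver.resolve("temp"));
    }
}
```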

...

createTemporaryFunction("temp", Function.class) → registers function with an identifier `temp`

...

createTemporaryFunction("cat.db.temp", Function.class) → registers function with an identifier `cat`.`db`.`temp`
createTemporaryFunction("db.temp", Function.class) → registers function with an identifier `current_cat`.`db`.`temp`

Temporary objects should be stored in memory in a CatalogManager/FunctionCatalog
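As a rough illustration of the function examples and the in-memory storage requirement, the sketch below keeps temporary functions in a plain map keyed by their resolved identifier. It assumes, based only on the examples shown here, that a 1-part function path is stored unqualified while 2- and 3-part paths are expanded to a full identifier. `TemporaryFunctionCatalogSketch` and `MyFunction` are hypothetical names, not Flink classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical placeholder for a user-defined function class.
final class MyFunction {}

// Hypothetical in-memory store for temporary functions.
final class TemporaryFunctionCatalogSketch {
    private final String currentCatalog;
    // Temporary objects live only in this map and are never written to a catalog.
    private final Map<String, Class<?>> temporaryFunctions = new LinkedHashMap<>();

    TemporaryFunctionCatalogSketch(String currentCatalog) {
        this.currentCatalog = currentCatalog;
    }

    // Stores the class under its resolved identifier and returns that identifier.
    String createTemporaryFunction(String path, Class<?> functionClass) {
        String identifier = toIdentifier(path);
        temporaryFunctions.put(identifier, functionClass);
        return identifier;
    }

    boolean contains(String identifier) {
        return temporaryFunctions.containsKey(identifier);
    }

    private String toIdentifier(String path) {
        // Simplification: escaped identifiers containing dots are not handled.
        String[] parts = path.split("\\.");
        switch (parts.length) {
            case 1:
                // Assumption: a 1-part function path is not qualified with catalog/database.
                return quote(parts[0]);
            case 2:
                return quote(currentCatalog) + "." + quote(parts[0]) + "." + quote(parts[1]);
            case 3:
                return quote(parts[0]) + "." + quote(parts[1]) + "." + quote(parts[2]);
            default:
                throw new IllegalArgumentException("Expected 1 to 3 path parts: " + path);
        }
    }

    private static String quote(String part) {
        return "`" + part + "`";
    }

    public static void main(String[] args) {
        TemporaryFunctionCatalogSketch catalog = new TemporaryFunctionCatalogSketch("current_cat");
        System.out.println(catalog.createTemporaryFunction("temp", MyFunction.class));
        System.out.println(catalog.createTemporaryFunction("db.temp", MyFunction.class));
        System.out.println(catalog.createTemporaryFunction("cat.db.temp", MyFunction.class));
    }
}
```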

...

  1. 1-part path
    1. No other system has such semantics. All systems assign temporary tables & views to some schema (either with the same rules as regular objects or a special temporary schema)
  2. Require special names for temporary objects, e.g. #name as in SQL Server, or PTT_name as in Oracle
  3. Register temporary objects in a special DB (as in SQL Server, Oracle, Postgres)
  4. Always assign temporary functions to some namespace (see FLIP-57).

Compatibility, Deprecation, and Migration Plan

  • Methods of TableEnvironment to be deprecated:
    • void registerTable(String name, Table table)
    • void registerTableSource(String name, TableSource<?> tableSource);
    • void registerTableSink(String name, TableSink<?> configuredSink);
    • Table scan(String... tablePath)
    • void registerFunction(String name, ScalarFunction function)
    • <T> void registerFunction(String name, TableFunction<T> tableFunction)
    • <T, ACC> void registerFunction(String name, AggregateFunction<T, ACC> aggregateFunction)
    • <T, ACC> void registerFunction(String name, TableAggregateFunction<T, ACC> tableAggregateFunction)
    • <T> void registerDataStream(String name, DataStream<T> dataStream)
    • <T> void registerDataStream(String name, DataStream<T> dataStream, String fields)
    • <T> void registerDataSet(String name, DataSet<T> dataSet)
    • <T> void registerDataSet(String name, DataSet<T> dataSet, String fields)
  • Methods of ConnectTableDescriptor to be deprecated:
    • public void registerTableSource(String name) 
    • public void registerTableSink(String name) 
    • public void registerTableSourceAndSink(String name)
  • Methods of TableEnvironment to be dropped:
    • void insertInto(Table table, StreamQueryConfig queryConfig, String sinkPath, String... sinkPathContinued)
    • void insertInto(Table table, BatchQueryConfig queryConfig, String sinkPath, String... sinkPathContinued)
    • void insertInto(Table table, String sinkPath, String... sinkPathContinued)
  • Methods of Table to be dropped:
    • void insertInto(QueryConfig conf, String tablePath, String... tablePathContinued)
    • void insertInto(String tablePath, String... tablePathContinued)
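One way the deprecated registration methods could coexist with their replacements during the deprecation window is for the old entry point to forward to the new one. The sketch below shows this for `registerTable` and `createTemporaryView`. Whether the actual implementation delegates like this is an assumption; `TableEnvironmentSketch` and `InMemoryTableEnvironmentSketch` are hypothetical, heavily simplified stand-ins in which tables are plain `Object`s.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for TableEnvironment.
interface TableEnvironmentSketch {
    // Replacement method proposed in this document.
    void createTemporaryView(String path, Object table);

    /** @deprecated Use {@link #createTemporaryView(String, Object)} instead. */
    @Deprecated
    default void registerTable(String name, Object table) {
        // Assumption: the old method simply forwards to its replacement.
        createTemporaryView(name, table);
    }
}

// Hypothetical implementation that keeps temporary views in memory.
final class InMemoryTableEnvironmentSketch implements TableEnvironmentSketch {
    private final Map<String, Object> temporaryViews = new HashMap<>();

    @Override
    public void createTemporaryView(String path, Object table) {
        temporaryViews.put(path, table);
    }

    Object lookup(String path) {
        return temporaryViews.get(path);
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        InMemoryTableEnvironmentSketch tEnv = new InMemoryTableEnvironmentSketch();
        Object table = new Object();
        tEnv.registerTable("legacy", table);       // deprecated call still works
        tEnv.createTemporaryView("modern", table); // preferred call
        System.out.println(tEnv.lookup("legacy") == tEnv.lookup("modern"));
    }
}
```

With this shape, existing user code keeps compiling (with a deprecation warning) until the old methods are removed.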

Implementation plan

The implementation of the function-related changes in this FLIP must be postponed until type inference is exposed for UserDefinedFunctions.

References:

How other systems handle temporary objects:

...

Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Object identifiers consist of only a single object name.

https://www.postgresql.org/docs/9.3/sql-createtable.html

Compatibility, Deprecation, and Migration Plan

  • We deprecate all methods mentioned above and remove them in the release following the one in which they were deprecated.
