Status

Current state: "Under Discussion"

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. This FLIP aims to clarify the creation of meta objects (tables, views, functions) in a unified way.

...

We should choose one approach and unify it across all APIs.

Public Interfaces

registerTable & registerDataStream

The naming of Table objects is actually quite misleading. The Table object represents a relational query, which is a view rather than a table. The primary difference is that a table is a physical storage of data, whereas a view is a virtual table on top of tables that does not materialize data. Thus the Flink org.apache.flink.table.api.Table object is actually a SQL view. The same applies to a DataStream, which is also a way to extract data from persistent storage and apply transformations on top of it.

...

void createTemporaryView(String path, Table view);
void createTemporaryView(String path, DataStream view);

registerTableSink & registerTableSource

We suggest dropping those methods entirely, as their names are misleading about what they actually do. In the long term, TableSources and TableSinks are meant to support only the physical representation of the data, without the logical part such as computed columns (watermarks etc.). The logical part will belong to the CatalogTable abstraction.

...

This should be replaced with the properties approach (DDL, descriptor).

registerScalarFunction, registerAggregateFunction & registerTableFunction

As a result of the rework of the type system in UDFs, we will be able to merge the three methods into a single one.

...

void createTemporaryFunction(String path, UserDefinedFunction function);

ConnectTableDescriptor#registerTableSink,registerTableSource,registerTableSourceAndSink

The table descriptor describes the properties of an external system, the physical data format, and the logical type of the data. Therefore it represents a Table concept. For queries that should not modify the metastore permanently, it makes sense to introduce a temporary table concept.

...

NOTE: We should not support the CREATE TEMPORARY TABLE … AS SELECT syntax. As mentioned above, Flink does not own the data, so this statement should not be supported. In Flink, such a query can be expressed with CREATE TEMPORARY VIEW.

Dropping temporary objects

Temporary objects can shadow permanent objects. Therefore it is vital to enable dropping them, so that users can switch from temporary objects (usually used for experimentation) to permanent ones. We suggest introducing separate methods for temporary objects to make it clear which objects are dropped. The dropTemporary* methods would remove only temporary objects and would not take permanent objects into consideration. Conversely, the regular drop methods should apply only to permanent objects, but should throw an exception if a temporary object with the same identifier exists. The methods would return true if an object existed under the given path and was removed.

...

boolean dropTemporaryView(String path);
boolean dropTemporaryFunction(String path);

Summary:

Methods of TableEnvironment

| Current call | Replacement | Comment |
|---|---|---|
| registerTable | createTemporaryView | For the non-temporary part we need to make `QueryOperation` string serializable. |
| registerTableSink | (Deprecate) | |
| registerTableSource | (Deprecate) | |
| registerDataStream | createTemporaryView | |
| registerScalarFunction / registerAggregateFunction / registerTableFunction | createTemporaryFunction | We can unify the 3 methods into one once we rework type inference for UDFs. |

New suggested methods:

| New call | Comment |
|---|---|
| dropTemporaryTable | |
| dropTemporaryView | |
| dropTemporaryFunction | |

Methods of ConnectTableDescriptor

| Current call | Replacement | SQL equivalent | Comment |
|---|---|---|---|
| registerTableSource | (Deprecate) | - | |
| registerTableSink | (Deprecate) | - | |
| registerTableSourceAndSink | createTemporaryTable | CREATE TEMPORARY TABLE | We should not support CREATE TEMPORARY TABLE AS SELECT (= CREATE TEMPORARY VIEW) |

Persistent API - not part of the FLIP

Neither the implementation nor the design of those API calls is part of this FLIP. They are listed only to show that the permanent API is a separate concept that requires further work.

| Call to add | SQL equivalent | Comment |
|---|---|---|
| createView(Table) | CREATE VIEW | We need to make `QueryOperation`s string serializable. |
| createTable(TableDescriptor) | CREATE TABLE | We need to rework TableDescriptor. We can temporarily use ConnectTableDescriptor#connect. |
| createFunction | CREATE FUNCTION | We need a serializable function representation. |

Referencing objects

I suggest changing the way we address objects in the API to unify it across SQL, the Table API, and the different object types. We should always specify a path as a single string and parse it into catalog, database, and object-name parts.

Affected APIs

We should deprecate:

  • TableEnvironment
    • Table scan(String... tablePath);
    • void insertInto(Table table, String sinkPath, String... sinkPathContinued);
  • Table
    • void insertInto(QueryConfig conf, String tablePath, String... tablePathContinued);
    • void insertInto(String tablePath, String... tablePathContinued);

...

  • TableEnvironment
    • Table from(String tablePath)
    • void insertInto(String sinkPath, Table table);
  • Table
    • void insertInto(String sinkPath)
      • we need to immediately drop “void insertInto(String tablePath, String... tablePathContinued);” for this to work. Otherwise this call would be ambiguous:

        Table t = …
        t.insertInto("db.sink");


Parsing logic

Parsing logic should follow the SQL standard rules for identifiers:

  • An identifier should consist of one to three parts
  • Parts should be delimited with a dot (.)
  • Users can escape parts of an identifier with backticks (`)
  • Users can escape a backtick inside an escaped part by doubling it
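The rules above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: IdentifierParser and its parse method are hypothetical names, whitespace is treated as an ordinary character, and malformed input is rejected with an IllegalArgumentException, none of which is specified in this FLIP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an actual Flink class.
final class IdentifierParser {

    private IdentifierParser() {}

    /** Splits a path such as "cat.`my.db`.tbl" into its one to three parts. */
    static List<String> parse(String path) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;  // currently inside a backtick-quoted part
        boolean wasQuoted = false; // the current part started with a backtick

        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (inQuotes) {
                if (c != '`') {
                    current.append(c); // dots inside quotes are literal
                } else if (i + 1 < path.length() && path.charAt(i + 1) == '`') {
                    current.append('`'); // doubled backtick is an escaped backtick
                    i++;
                } else {
                    inQuotes = false; // closing backtick
                }
            } else if (c == '.') {
                parts.add(finishPart(current, path));
                wasQuoted = false;
            } else if (c == '`') {
                if (wasQuoted || current.length() > 0) {
                    throw new IllegalArgumentException("Unexpected backtick in: " + path);
                }
                inQuotes = true;
                wasQuoted = true;
            } else {
                if (wasQuoted) {
                    throw new IllegalArgumentException("Text after closing backtick in: " + path);
                }
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated backtick in: " + path);
        }
        parts.add(finishPart(current, path));
        if (parts.size() > 3) {
            throw new IllegalArgumentException("More than three parts in: " + path);
        }
        return parts;
    }

    // Returns the accumulated part and resets the buffer; empty parts are rejected.
    private static String finishPart(StringBuilder current, String path) {
        if (current.length() == 0) {
            throw new IllegalArgumentException("Empty identifier part in: " + path);
        }
        String part = current.toString();
        current.setLength(0);
        return part;
    }
}
```

With this sketch, IdentifierParser.parse("`my.cat`.db.tbl") yields the three parts my.cat, db and tbl, while a four-part path such as a.b.c.d is rejected.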

Proposed Changes   

Assumption:

All objects are identified with 3 part identifiers (catalog name, database name, object name).

...

private final Map<ObjectIdentifier, CatalogView> temporaryViews;
private final Map<ObjectIdentifier, CatalogTable> temporaryTables;
private final Map<ObjectIdentifier, CatalogFunction> temporaryFunctions;
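To illustrate how such maps could interact with the drop semantics described under "Dropping temporary objects", here is a minimal sketch for views only. ObjectId and ViewRegistry are hypothetical stand-ins (the real ObjectIdentifier and CatalogView types are not reproduced), view definitions are plain strings, and a second map stands in for the permanent catalog. The boolean return value of the regular drop and the overwrite-on-duplicate behavior are assumptions, not statements of this FLIP.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for a three-part object identifier.
final class ObjectId {
    final String catalog;
    final String database;
    final String name;

    ObjectId(String catalog, String database, String name) {
        this.catalog = catalog;
        this.database = database;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ObjectId)) {
            return false;
        }
        ObjectId other = (ObjectId) o;
        return catalog.equals(other.catalog)
                && database.equals(other.database)
                && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(catalog, database, name);
    }
}

// Hypothetical registry; views are represented by their query string.
final class ViewRegistry {
    private final Map<ObjectId, String> temporaryViews = new HashMap<>();
    private final Map<ObjectId, String> permanentViews = new HashMap<>(); // stand-in for a catalog

    // Assumption: an existing temporary view under the same identifier is overwritten.
    void createTemporaryView(ObjectId id, String query) {
        temporaryViews.put(id, query);
    }

    void createView(ObjectId id, String query) {
        permanentViews.put(id, query);
    }

    /** Temporary views shadow permanent views with the same identifier. */
    Optional<String> lookup(ObjectId id) {
        String temporary = temporaryViews.get(id);
        return Optional.ofNullable(temporary != null ? temporary : permanentViews.get(id));
    }

    /** Removes only a temporary view; returns true if one existed and was removed. */
    boolean dropTemporaryView(ObjectId id) {
        return temporaryViews.remove(id) != null;
    }

    /** Removes only a permanent view, but fails while a temporary view shadows it. */
    boolean dropView(ObjectId id) {
        if (temporaryViews.containsKey(id)) {
            throw new IllegalStateException("A temporary view with the same identifier exists");
        }
        return permanentViews.remove(id) != null;
    }
}
```

In this sketch a permanent view becomes visible again as soon as the temporary view that shadows it is dropped, which matches the intended switch from temporary to permanent objects.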

References:

How other systems handle temporary objects:

MySQL:

MySQL allows creating temporary tables in any schema (even if the target database does not exist). Those temporary tables take precedence over permanent tables with the same name (and schema). Therefore, a user cannot access a permanent table until the temporary table with the same name is dropped.

...

http://www.mysqltutorial.org/mysql-temporary-table/

Hive:

Hive implements similar behavior to MySQL. The difference is that the database must exist. Hive also does not add the DROP TEMPORARY TABLE syntax.

...

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TemporaryTables

SQL Server:

SQL Server reserves a special schema for temporary tables (dbo). It also forces users to prefix temporary table names with the ‘#’ character, which is needed to distinguish temporary tables from permanent ones. Therefore, it is not possible to override a permanent table.

...

https://docs.microsoft.com/de-de/sql/relational-databases/databases/tempdb-database?view=sql-server-2017

Oracle:

Oracle implements similar behavior to SQL Server.

https://oracle-base.com/articles/18c/private-temporary-tables-18c

Postgres:

Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Object identifiers consist of only a single object name.

https://www.postgresql.org/docs/9.3/sql-createtable.html

Compatibility, Deprecation, and Migration Plan

  • We deprecate all methods mentioned above and remove them in the release following the one in which they were deprecated.

Test Plan

Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?

Rejected Alternatives

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.

...