Status
Current state: "Under Discussion"
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. This FLIP aims to clarify the creation of meta objects (tables, views, functions) in a unified way.
...
We should choose one approach and unify it across all APIs.
Public Interfaces
registerTable & registerDataStream
The naming of Table objects is misleading. A Table object represents a relational query, which makes it a view rather than a table. The primary difference is that a table is physical storage of data, whereas a view is a virtual table on top of tables that does not materialize data. Thus the Flink org.apache.flink.table.api.Table object is actually a SQL view. The same applies to a DataStream, which is also a way to extract data from persistent storage and apply transformations on top of it.
...
void createTemporaryView(String path, Table view);
void createTemporaryView(String path, DataStream view);
registerTableSink & registerTableSource
We suggest dropping those methods entirely, as their names are misleading about what they actually do. In the long term, TableSources and TableSinks are meant to cover only the physical representation of the data, without the logical part such as computed columns (watermarks etc.). Those will be part of the CatalogTable abstraction.
...
This should be replaced with the properties approach (DDL, descriptor).
registerScalarFunction, registerAggregateFunction & registerTableFunction
As a result of the rework of the type system in UDFs, we will be able to merge the three methods into a single one.
...
void createTemporaryFunction(String path, UserDefinedFunction function);
ConnectTableDescriptor#registerTableSink,registerTableSource,registerTableSourceAndSink
The table descriptor describes the properties of an external system, the physical data format, and the logical type of the data. Therefore it represents a table concept. For users who do not want to modify the metastore permanently, it makes sense to introduce a temporary table concept.
...
NOTE: We should not support the CREATE TEMPORARY TABLE … AS SELECT syntax. As mentioned above, Flink does not own the data, so this statement has no meaningful semantics in Flink. In Flink's terms, such a query can be expressed with CREATE TEMPORARY VIEW.
Dropping temporary objects
Temporary objects can shadow permanent objects. Therefore it is vital to allow dropping them, so that users can switch from temporary objects (usually used for experimentation) to permanent ones. We suggest introducing separate methods for temporary objects to make it clear which objects are dropped. The dropTemporary* methods would remove only temporary objects and would not take permanent objects into consideration. Conversely, the regular drop methods should apply only to permanent objects, but should throw an exception if a temporary object with the same identifier exists. The methods would return true if an object existed under the given path and was removed.
...
boolean dropTemporaryView(String path);
boolean dropTemporaryFunction(String path);
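The drop semantics described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyCatalog`, its String-keyed maps, the `lookup` helper, and the permanent-side `createView`/`dropView` methods are hypothetical names chosen for illustration and are not Flink classes or part of this FLIP's API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Flink code. All names here are assumptions for illustration.
final class ToyCatalog {
    // View definitions are plain strings to keep the sketch self-contained.
    private final Map<String, String> temporaryViews = new HashMap<>();
    private final Map<String, String> permanentViews = new HashMap<>();

    void createTemporaryView(String path, String definition) {
        temporaryViews.put(path, definition);
    }

    // Hypothetical permanent counterpart (the permanent API is out of scope of the FLIP).
    void createView(String path, String definition) {
        permanentViews.put(path, definition);
    }

    // Removes only the temporary object; permanent objects are never touched.
    // Returns true if a temporary object existed under the path and was removed.
    boolean dropTemporaryView(String path) {
        return temporaryViews.remove(path) != null;
    }

    // Removes only the permanent object, but refuses to act while a temporary
    // object with the same identifier exists.
    boolean dropView(String path) {
        if (temporaryViews.containsKey(path)) {
            throw new IllegalStateException(
                    "Temporary view with identifier '" + path + "' exists; drop it first.");
        }
        return permanentViews.remove(path) != null;
    }

    // Temporary objects shadow permanent ones on lookup.
    String lookup(String path) {
        String temporary = temporaryViews.get(path);
        return temporary != null ? temporary : permanentViews.get(path);
    }
}
```

In this sketch, a user who experimented with a temporary view first calls `dropTemporaryView` and only then sees (or can drop) the permanent view registered under the same path.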
Summary:
Methods of TableEnvironment
Current call | Replacement | Comment |
registerTable | createTemporaryView | For the non-temporary part we need to make `QueryOperation` string serializable. |
registerTableSink | (Deprecate) | |
registerTableSource | (Deprecate) | |
registerDataStream | createTemporaryView | |
registerScalarFunction/ registerAggregateFunction/ registerTableFunction | createTemporaryFunction | We can unify the 3 methods into one once we rework type inference for UDFs |
New suggested methods:
New call | Comment |
dropTemporaryTable | |
dropTemporaryView | |
dropTemporaryFunction | |
Methods of ConnectTableDescriptor
Current call | Replacement | SQL equivalent | Comment |
registerTableSource | (Deprecate) | - | |
registerTableSink | (Deprecate) | - | |
registerTableSourceAndSink | createTemporaryTable | CREATE TEMPORARY TABLE | We should not support CREATE TEMPORARY TABLE AS SELECT; it is equivalent to CREATE TEMPORARY VIEW |
Persistent API - not part of the FLIP
Neither the implementation nor the design of these API calls is part of this FLIP. They are listed only to show that the permanent API is a separate concept that requires further work.
Call to add | SQL equivalent | Comment |
createView(Table) | CREATE VIEW | We need to make `QueryOperation`s string serializable. |
createTable(TableDescriptor) | CREATE TABLE | We need to rework TableDescriptor. We can temporarily use ConnectTableDescriptor#connect |
createFunction | CREATE FUNCTION | We need a serializable function representation |
Referencing objects
I suggest changing the way we address objects in the API to unify it across SQL, the Table API, and the different object types. A path should always be specified as a single string and parsed into catalog, database, and object-name parts.
Affected APIs
We should deprecate:
- TableEnvironment
- Table scan(String... tablePath);
- void insertInto(Table table, String sinkPath, String... sinkPathContinued);
- Table
- void insertInto(QueryConfig conf, String tablePath, String... tablePathContinued);
- void insertInto(String tablePath, String... tablePathContinued);
...
- TableEnvironment
- Table from(String tablePath)
- void insertInto(String sinkPath, Table table);
- Table
- void insertInto(String sinkPath)
- We need to immediately drop "void insertInto(String tablePath, String... tablePathContinued);" for this to work. Otherwise this call would be ambiguous:
Table t = …
t.insertInto("db.sink")
Parsing logic
Parsing logic should follow the SQL standard logic for identifiers:
- An identifier should consist of 1 to 3 parts
- Parts should be delimited with a dot (.)
- Users can escape parts of an identifier with backticks (`)
- Users can escape a backtick by doubling it
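The rules above could be sketched roughly as follows. This is a minimal illustration, not the parser Flink uses: `IdentifierParser` and `parse` are hypothetical names, and the sketch is lenient about where a backtick-quoted section may start within a part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Flink: splits a path such as
// `my.catalog`.db.tbl into its 1 to 3 parts.
final class IdentifierParser {

    static List<String> parse(String path) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (quoted) {
                if (c == '`') {
                    if (i + 1 < path.length() && path.charAt(i + 1) == '`') {
                        current.append('`'); // doubled backtick is a literal backtick
                        i += 2;
                        continue;
                    }
                    quoted = false; // closing backtick
                } else {
                    current.append(c); // dots inside backticks belong to the part
                }
            } else if (c == '`') {
                quoted = true; // opening backtick
            } else if (c == '.') {
                addPart(parts, current); // an unquoted dot ends the current part
            } else {
                current.append(c);
            }
            i++;
        }
        if (quoted) {
            throw new IllegalArgumentException("Unclosed backtick in: " + path);
        }
        addPart(parts, current);
        if (parts.size() > 3) {
            throw new IllegalArgumentException("Identifier has more than 3 parts: " + path);
        }
        return parts;
    }

    private static void addPart(List<String> parts, StringBuilder current) {
        if (current.length() == 0) {
            throw new IllegalArgumentException("Empty identifier part");
        }
        parts.add(current.toString());
        current.setLength(0);
    }
}
```

With this sketch, a catalog name containing a dot has to be escaped, for example `` `my.catalog`.db.tbl ``, while a plain `db.sink` is split into a database part and an object-name part.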
Proposed Changes
Assumption:
All objects are identified with 3 part identifiers (catalog name, database name, object name).
...
private final Map<ObjectIdentifier, CatalogView> temporaryViews;
private final Map<ObjectIdentifier, CatalogTable> temporaryTables;
private final Map<ObjectIdentifier, CatalogFunction> temporaryFunctions;
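To illustrate how maps keyed by 3-part identifiers might be used, here is a simplified sketch. It assumes that a 1- or 2-part path is completed with the current catalog and database before it is used as a key; that expansion rule, the class `TemporaryViewRegistry`, and the use of `List<String>` in place of `ObjectIdentifier` and `String` in place of `CatalogView` are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in, NOT Flink code. A List<String> of exactly three
// elements plays the role of ObjectIdentifier; a String plays CatalogView.
final class TemporaryViewRegistry {
    private final Map<List<String>, String> temporaryViews = new HashMap<>();
    private final String currentCatalog;
    private final String currentDatabase;

    TemporaryViewRegistry(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    // Expands a 1 to 3 part path into (catalog, database, object name),
    // filling missing leading parts from the current catalog and database.
    List<String> qualify(List<String> parts) {
        switch (parts.size()) {
            case 1:
                return Arrays.asList(currentCatalog, currentDatabase, parts.get(0));
            case 2:
                return Arrays.asList(currentCatalog, parts.get(0), parts.get(1));
            case 3:
                return Arrays.asList(parts.get(0), parts.get(1), parts.get(2));
            default:
                throw new IllegalArgumentException("Expected 1 to 3 parts, got " + parts.size());
        }
    }

    void createTemporaryView(List<String> parts, String definition) {
        temporaryViews.put(qualify(parts), definition);
    }

    // Returns the definition, or null if no temporary view is registered.
    String getTemporaryView(List<String> parts) {
        return temporaryViews.get(qualify(parts));
    }
}
```

Because every key is fully qualified, a view registered as `v` and later referenced as `db.v` or `cat.db.v` resolves to the same entry, as long as `cat` and `db` are the current catalog and database.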
References:
How other systems handle temporary objects:
MySQL:
MySQL allows creating temporary tables in any schema (even if the target database does not exist). Those temporary tables take precedence over permanent tables with the same name (and schema). Therefore a user cannot access a permanent table until the temporary table is dropped.
...
http://www.mysqltutorial.org/mysql-temporary-table/
Hive:
Hive implements similar behavior to MySQL. The difference is that the database must exist. Hive also does not add the DROP TEMPORARY TABLE syntax.
...
SQL Server:
SQL Server reserves a special schema for temporary tables (dbo). It also forces users to prefix temporary table names with the '#' character, which is needed to distinguish temporary tables from permanent ones. Therefore, it is not possible to override a permanent table.
...
Oracle:
Oracle implements similar behavior to SQL Server.
https://oracle-base.com/articles/18c/private-temporary-tables-18c
Postgres:
Postgres allows overriding permanent tables with temporary objects, but does not allow arbitrary schemas. Object identifiers consist of only a single object name.
https://www.postgresql.org/docs/9.3/sql-createtable.html
Compatibility, Deprecation, and Migration Plan
- We deprecate all methods mentioned above and remove them in the release following the one in which they were deprecated.
Test Plan
Describe in a few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
...