...

It will be more user friendly. In addition, the Hive dialect already has some support for CTAS.

Proposed Changes

First of all, to make it clear: the table created by a CTAS command must go through a catalog.

We can think of Flink catalogs as falling into two types, in-memory catalogs and external catalogs:

...

  1. Metadata is a copy of the metadata in the external system; the user must ensure that the entity exists in the external system and that the metadata is consistent, otherwise an exception is thrown at runtime. CTAS needs to create the table first, so it is hard to guarantee that the entity exists in the external system and that the metadata is consistent.
  2. The user needs to configure the parameters of the external system through the WITH clause, because Flink cannot obtain them from the in-memory catalog.

For example, for a Kafka table, the user must tell Flink the address of the Kafka servers, the topic name, and the data serialization format; otherwise the Flink job will fail.
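As an illustration (the table name, topic, and server address below are made up), a Kafka table definition must carry these parameters in its WITH clause, because the in-memory catalog has no other source for them:

```sql
CREATE TABLE kafka_orders (
  order_id STRING,
  amount   DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);
```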

External catalog:

  1. Metadata directly refers to the external system, so there is no consistency problem. CREATE TABLE also calls the external system directly, so it is naturally guaranteed that the entity exists in the external system.
  2. The WITH clause parameters are optional; Flink can obtain the necessary parameters through the external catalog.

For example, for a Hive table, Flink can obtain the table information required by the engine through the HiveCatalog.
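As a sketch (the catalog name, configuration path, and table names are illustrative), a Hive catalog can be registered and then used without any WITH clause, because HiveCatalog supplies the connection and format details from the Metastore:

```sql
-- Register a Hive catalog; name and config path are illustrative.
CREATE CATALOG myhive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);
USE CATALOG myhive;

-- No WITH clause needed: table details come from the Hive Metastore.
CREATE TABLE daily_totals AS
SELECT order_date, SUM(amount) AS total
FROM orders
GROUP BY order_date;
```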


Both in-memory catalogs and external catalogs will support CTAS. If the CTAS command is executed against an in-memory catalog and the target storage does not exist in the external system, the Flink job will fail, which is consistent with current Flink behavior.

Syntax

I suggest introducing a CTAS clause with the following syntax:

...
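The exact grammar is elided above; purely as an illustration of the general shape of a SQL CTAS statement (table, column, and connector names below are made up, not the proposed grammar):

```sql
CREATE TABLE target_table
WITH (
  'connector' = '...'
)
AS SELECT id, name FROM source_table;
```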

The framework does not perform the DROP TABLE operation itself; dropping the table is implemented in the TableSink according to the needs of the specific sink, because different sinks require different operations.

For example, HiveTableSink needs to delete the temporary directory and drop the metadata in the Metastore, while FileSystemTableSink only needs to delete the temporary directory; for some sinks no operation is required at all.
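One way to model this per-sink cleanup is a hook that each sink implements for itself. The interface name and method below are hypothetical illustrations, not the API actually proposed by this design:

```java
import java.util.List;

// Hypothetical cleanup hook; the name and signature are illustrative only,
// not the Flink API proposed by this design.
interface SinkCleanup {
    // Returns the cleanup actions this sink must perform when a CTAS job fails.
    List<String> cleanupActions();
}

// A Hive-style sink must drop the Metastore metadata and the staging directory.
class HiveStyleCleanup implements SinkCleanup {
    @Override
    public List<String> cleanupActions() {
        return List.of("drop metastore table", "delete temporary directory");
    }
}

// A filesystem-style sink only needs to remove the staging directory.
class FileSystemCleanup implements SinkCleanup {
    @Override
    public List<String> cleanupActions() {
        return List.of("delete temporary directory");
    }
}
```

The point of the hook is that the framework can invoke one method on failure while each sink decides what, if anything, to undo.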

Public API Changes

Table Environment


Support in Table API

The existing executeSql method will be reused.
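As an illustration, a CTAS statement would be passed to `TableEnvironment.executeSql` like any other statement string; the table and column names below are made up:

```sql
CREATE TABLE target_table AS
SELECT id, name
FROM source_table
WHERE name IS NOT NULL;
```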

...