Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Status

Current state: "Under Discussion"

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Primary-keys-in-Table-API-td35138.html

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

...

Page properties


Discussion thread
Vote thread
JIRA

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-14977

Release1.11


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Code Block
languagesql
CREATE TABLE [catalog_name.][db_name.]table_name
  [(col_name1 col_type1 [IN_LINE_CONSTRAINT] [COMMENT col_comment1], 
    ...,
   [OUT_OF_LINE_CONSTRAINT])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name1, col_name2, ...)]

[OUT_OF_LINE_CONSTRAINT]:=
  [CONSTRAINT constraint_name] ((PRIMARY KEY | UNIQUE) (column, ...)) [[NOT] ENFORCED]

[IN_LINE_CONSTRAINT]:=
  [CONSTRAINT constraint_name] (PRIMARY KEY | UNIQUE) [[NOT] ENFORCED]

-- possible extension, not part of the FLIP, as we do not have alter statements
ALTER TABLE [catalog_name.][db_name.]table_name
  ADD (OUT_OF_LINE_CONSTRAINT) |
  DROP CONSTRAINT constraint_name

...

In a CREATE TABLE statement, creating a primary key constraint will alter the columns nullability according to

If the <unique specification> specifies PRIMARY KEY, then for each <column name> in the explicit or implicit <unique column list> for which NOT NULL is not specified, NOT NULL is implicit in the <column definition>.

In alter statements if a users says to create a primary key on a nullable column the operation should fail.

DDL default value

If a user does not provide the enforcement flag the ENFORCED will be set implicitly. This mode will be unsupported in the initial version and it will throw an exception.

Usage of primary key information for sources and sinks

This FLIP does not discuss runtime handling of the primary key information by sources and sinks. This additional information will not change the way how upsert sources and sinks work (We do not support UPSERT sources yet. For sinks we will stick to the current validation implementation). Usage of the primary key for upsert sources and sinks shall be discussed in a separate FLIP. This will not alter any UpsertSink implementations. This is merely a metadata information for optimization. We might use it in the future for query validation, e.g. validate that we perform upserts on the whole key, or use autogenerated values for missing fields etc.

...

This change introduces a new feature that does not implies any compatibility concerns.

NOTE:

For the time being insert/upsert into tables with primary key defined will not be supported. The support should be discussed in a separate FLIP.

Implementation Plan

  1. Add the primary key information to CatalogBaseTable
  2. Use the primary key information in optimizer
  3. Modify HiveCatalog connector to provide the primary key information
  4. Introduce DDL (side note: this is not aimed for 1.10)

Test Plan

We want to make sure, we can leverage the primary key information stored in Hive. This might require changes to the Hive catalog connector in regards to the produced TableSchema.

...