...

DESCRIBE TABLE EXTENDED without a partition definition outputs the above columns too, except the partition one.

Configuration

Session Options

In every table environment, the TableConfig obtained via `TableEnvironment.getConfig` offers options for configuring the current session.

We put the necessary configurations in the global session configuration to avoid the need for users to configure each individual table.

If users need to configure a table separately, they can also do so through table options.

Session options, all prefixed with "table-storage.":

Key: table-storage.log.system
Default: kafka
Type: String
Description: Log system. Only Kafka is supported in the MVP.

Key: table-storage.log.kafka.bootstrap.servers
Default: (none)
Type: Map
Description: Kafka brokers, e.g. localhost:9092.

Key: table-storage.log.retention
Default: (none)
Type: Duration
Description: How long the change log is kept. The default value comes from the log system cluster.

Key: table-storage.log.scan
Default: full
Type: String
Description: Specifies the scan startup mode for the log consumer.
  • full: Performs a snapshot on the table upon first startup, and continues to read the latest changes. (Using HybridSource, the switch between snapshot and changes is exactly-once consistent because we store the offset of the corresponding log into the snapshot when writing data.)
  • latest: Start from the latest changes.
  • from-timestamp: Start from a user-supplied timestamp.

Key: table-storage.log.pk.consistency
Default: transactional
Type: String
Description: Specifies the log consistency mode for tables with a primary key.
  • transactional: Only data that has been checkpointed can be seen by readers; the latency depends on the checkpoint interval.
  • eventual: Immediate data visibility; you may see some intermediate states, but eventually the right results will be produced. Only works for tables with a primary key.

Key: table-storage.log.pk.changelog-mode
Default: upsert
Type: String
Description: Specifies the log changelog mode for tables with a primary key.
  • auto: upsert for tables with a primary key, all for tables without a primary key.
  • upsert: The log system does not store UPDATE_BEFORE changes; the consuming job automatically adds a normalize node, relying on state to generate the required UPDATE_BEFORE messages.
  • all: The log system stores all changes, including UPDATE_BEFORE.

Key: table-storage.log.pk.key-format
Default: json
Type: String
Description: Specifies the key message format of the log system for tables with a primary key.

Key: table-storage.log.format
Default: debezium-json
Type: String
Description: Specifies the message format of the log system.

Key: table-storage.file.root-path
Default: (none)
Type: String
Description: Root file path.

Key: table-storage.file.format
Default: parquet
Type: String
Description: Format name for files.

Key: table-storage.bucket
Default: 1
Type: Integer
Description: Bucket number for files and partition number for Kafka.

Key: table-storage.change-tracking
Default: true
Type: Boolean
Description: If users do not need to consume changes from the table, they can disable change tracking. This can reduce resource consumption.
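
For example, the session-level defaults could be set once up front. The following is only a sketch using the Flink SQL client's standard SET syntax; the root path and bucket count are illustrative values:

SQL:
SET 'table-storage.log.system' = 'kafka';
SET 'table-storage.file.root-path' = '/tmp/table-storage';
SET 'table-storage.bucket' = '4';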

If users need to configure a table separately, they can also configure it through table options without the "table-storage." prefix, for example:

SQL:
CREATE TABLE T (...) WITH ('log.pk.consistency'='eventual');

Table Options

In addition to the session options, which can be applied to an individual table by removing the "table-storage." prefix, there are also some options that can only be configured per table: the options that affect reading and writing (an example follows the table):

Key: log.scan
Default: full
Type: String
Description: Specifies the scan startup mode for the log consumer.
  • full: Performs a snapshot on the table upon first startup, and continues to read the latest changes. (Using HybridSource, the switch between snapshot and changes is exactly-once consistent because we store the offset of the corresponding log into the snapshot when writing data.)
  • latest: Start from the latest changes.
  • from-timestamp: Start from a user-supplied timestamp.

Key: change-tracking
Default: true
Type: Boolean
Description: If users do not need to consume changes from the table, they can disable change tracking. This can reduce resource consumption.
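
As a sketch, both of these options could also be set in a table's WITH clause when the table is created; the table name and columns below are hypothetical:

SQL:
-- Hypothetical managed table that starts log consumers from the latest
-- changes and disables change tracking for downstream consumption.
CREATE TABLE page_view_stats (
  page STRING,
  view_count BIGINT
) WITH (
  'log.scan' = 'latest',
  'change-tracking' = 'false'
);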

Bucket

Records are hashed into different buckets according to the primary key (if present) or the whole row (if there is no primary key):

...

  • Declaring the primary key in the table definition
  • 'log.pk.consistency' = 'eventual'
  • 'log.pk.changelog-mode' = 'upsert' – this is the default mode for tables with a primary key

When using upsert mode, a normalize node is generated in the downstream consuming job; it generates UPDATE_BEFORE messages for old data based on the primary key, so that duplicate data is corrected to an eventually consistent state.
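
Taken together, a table using eventual consistency with the default upsert mode might be declared as in this sketch (the table name and columns are hypothetical; the 'log.pk.changelog-mode' line could be omitted since upsert is the default):

SQL:
-- Hypothetical table with a primary key, eventual log consistency,
-- and the default upsert changelog mode.
CREATE TABLE user_balances (
  user_id BIGINT,
  balance DECIMAL(18, 2),
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'log.pk.consistency' = 'eventual',
  'log.pk.changelog-mode' = 'upsert'
);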

...

If the user wants to see all changes of this table or remove the downstream normalize node, they can configure:

  • 'log.pk.changelog-mode' = 'all'

This requires the following (a combined example follows the list):

  • 'log.pk.consistency' = 'transactional'
  • The sink query produces changes with UPDATE_BEFORE. If it does not, we can:
    • Throw an unsupported exception in the MVP
    • In the future, automatically add a normalize node before the sink to generate the required UPDATE_BEFORE messages
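
A combined sketch under those requirements (the table name and columns are hypothetical):

SQL:
-- Hypothetical table that keeps all changes, including UPDATE_BEFORE,
-- so downstream consumers do not need a normalize node.
CREATE TABLE user_balances_full_log (
  user_id BIGINT,
  balance DECIMAL(18, 2),
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'log.pk.changelog-mode' = 'all',
  'log.pk.consistency' = 'transactional'
);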

...