...


Motivation

Many watermark-related features, such as watermark alignment, have already been implemented in the DataStream API, where they are convenient and flexible to configure and use. However, there is currently no way to use these features through SQL.

This FLIP proposes to make these watermark options available in SQL.

Proposed Change

Current capabilities of watermark in SQL layer

The event time attribute is defined using a WATERMARK statement in the CREATE TABLE DDL. A watermark statement defines a watermark generation expression on an existing event time field, which marks that field as the event time attribute. Some examples:

...

Note that although the SQL syntax for defining a watermark is always the same, the place where the watermark is generated may differ. For a source that implements the `SupportsWatermarkPushDown` interface, the watermark is generated in the source operator. For a source that does not implement it, the watermark is generated in a downstream operator named 'WatermarkAssigner'. If the watermark is generated in the downstream 'WatermarkAssigner' operator, many watermark-related features, such as watermark alignment, cannot be implemented. The features that this FLIP intends to support are therefore available only for sources that implement the `SupportsWatermarkPushDown` interface.

Watermark-related features that this FLIP intends to support

In the SQL layer, the watermark is closely tied to each source table, so we plan to expose these features through the dynamic table options and the 'OPTIONS' hint. If the user has configured these options both in the dynamic table options and in the 'OPTIONS' hint, the options in the 'OPTIONS' hint take precedence. If the user uses the 'OPTIONS' hint for the same source table in multiple places, the first hint will be used.
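The precedence rule can be illustrated with a minimal sketch. It uses the 'watermark.idle-timeout' option proposed later in this FLIP; the `user_name` column and the concrete timeout values are assumptions chosen for illustration, and the connector-specific options are left out.

```sql
-- Table-level setting: idle timeout of 1 minute.
-- Connector-specific options are omitted in this sketch.
CREATE TABLE user_actions (
  user_name STRING,  -- hypothetical column, for illustration only
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'watermark.idle-timeout'='1min'
);

-- The same option is set again in the 'OPTIONS' hint.
-- Under the rule above, the hint value (30s) is expected to win for this query.
SELECT user_name, user_action_time
FROM user_actions /*+ OPTIONS('watermark.idle-timeout'='30s') */;
```

A query against `user_actions` without the hint would keep the table-level value of 1 minute.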

1. Configurable watermark emit strategy

With the DataStream API, whether a watermark is emitted periodically or for each event is decided in code, by the implementation of the WatermarkGenerator interface:

...

In SQL, the default watermark emit strategy is 'ON_PERIODIC'. It can also be set explicitly in the table options or via the 'OPTIONS' hint:

Code Block
languagesql
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'watermark.emit.strategy'='ON_PERIODIC',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('watermark.emit.strategy'='ON_PERIODIC') */

...

To use the 'ON_EVENT' strategy, the user can configure it like this:

Code Block
languagesql
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'watermark.emit.strategy'='ON_EVENT',
  'watermark.emit.gap.on-event'='10000',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('watermark.emit.strategy'='ON_EVENT', 'watermark.emit.gap.on-event'='10000') */


Note that the option 'watermark.emit.gap.on-event', which configures how many events pass between emitted watermarks, only takes effect with the 'ON_EVENT' strategy. This option is not required; its default value is 1. We will also add a global parameter 'table.exec.watermark-emit.gap' to achieve the same goal. It applies to every source and eases the user's configuration to some extent.
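As a rough sketch, the proposed global parameter could be set once per session with Flink SQL's SET statement instead of repeating the gap on every table. The value format is an assumption here: the text only says the parameter serves the same goal as 'watermark.emit.gap.on-event', so an event count is assumed.

```sql
-- Assumption: the value is an event count, like 'watermark.emit.gap.on-event'.
-- 'table.exec.watermark-emit.gap' is the parameter proposed in this FLIP.
SET 'table.exec.watermark-emit.gap' = '100';
```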

2. Dealing with idle sources

With the DataStream API, we can configure an idle timeout to handle idle sources in the following way:

...

In the SQL layer, however, we can only configure a global idle-timeout value through the parameter 'table.exec.source.idle-timeout', which means that all sources share the same value. We would like to configure 'idle-timeout' for each source separately, and it is expected to be configured in the following way:

Code Block
languagesql
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'watermark.idle-timeout'='1min',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('watermark.idle-timeout'='1min') */

3. Watermark alignment

With the DataStream API, we can use the watermark alignment feature in the following way:

...

However, watermark alignment is not currently supported in the SQL layer. This FLIP aims to support it. It is expected to be configured in the following way:

Code Block
languagesql
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'watermark.alignment.group'='alignment-group-1',
  'watermark.alignment.max-drift'='1min',
  'watermark.alignment.update-interval'='1s',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('watermark.alignment.group'='alignment-group-1', 'watermark.alignment.max-drift'='1min', 'watermark.alignment.update-interval'='1s') */

The option 'watermark.alignment.update-interval' is not required; its default value is 1s.
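The options from the three sections above can presumably be combined on a single table. The following is a minimal sketch under that assumption: the option names and values are the ones used in this FLIP, the `user_name` column is hypothetical, and the connector-specific options are left out.

```sql
CREATE TABLE user_actions (
  user_name STRING,  -- hypothetical column, for illustration only
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  -- connector-specific options are omitted in this sketch
  -- 1. emit strategy: one watermark every 10000 events
  'watermark.emit.strategy'='ON_EVENT',
  'watermark.emit.gap.on-event'='10000',
  -- 2. idle source handling
  'watermark.idle-timeout'='1min',
  -- 3. watermark alignment
  'watermark.alignment.group'='alignment-group-1',
  'watermark.alignment.max-drift'='1min',
  'watermark.alignment.update-interval'='1s'
);
```

Any subset of these options could equally be passed per query through the 'OPTIONS' hint, in which case the hint values take precedence over the table options.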


Migration Plan and Compatibility

This feature only adds new capabilities to the SQL layer, so there are no compatibility issues.

If a feature is not supported by a connector, such as watermark alignment on the Kinesis connector, it will behave as designed in FLIP-182 [2] and FLIP-217 [3].

 

Rejected Alternatives

Adding watermark related options in the SQL DDL of watermark column

This idea amounts to extending FLIP-66 [1]. However, since we already have many options for watermark-related features, it would make the DDL complex and lengthy.

Adding watermark related options with a new table-scan hint named 'WATERMARK_PARAMS'

We should be cautious about adding new SQL syntax, and 'WATERMARK_PARAMS' is also SQL syntax to some extent.


Reference

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+Time+Attribute+in+SQL+DDL

...