Discussion thread | https://lists.apache.org/thread/d681bx4t935c30zl750gy6d41tfypbph |
---|---|
Vote thread | https://lists.apache.org/thread/79thsvkfpgsqnktodj2901jp538js19j |
JIRA | |
Release | 1.18.0 |
Motivation
Many watermark-related features, such as watermark alignment, are already implemented in the DataStream API, where they are convenient and flexible to configure and use. However, there is currently no way to use these features through SQL.
...
Code Block | ||||
---|---|---|---|---|
| ||||
CREATE TABLE user_actions (
  user_name STRING,
  `data` STRING,
  current_time AS proctime(),
  WATERMARK FOR current_time AS current_time
) WITH (
  ...
); |
...
In the SQL layer, the watermark is closely tied to each source table, so we plan to expose these features through the dynamic table options and the 'OPTIONS' hint. If the user configures the same options both in the dynamic table options and in the 'OPTIONS' hint, the options in the 'OPTIONS' hint take precedence. If the user applies the 'OPTIONS' hint to the same source table in multiple places, the first hint is used.
We provide a proof of concept (PoC); see [1] for the code details.
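The precedence rules above can be illustrated with a small sketch. It is not taken from the PoC: the class and method names are hypothetical, and it assumes that a hint overrides only the keys it sets while the remaining table options stay in effect.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: this class and its method are hypothetical, not part of the FLIP. */
final class WatermarkOptionPrecedence {

    /**
     * Resolves the effective options for one source table.
     *
     * @param tableOptions options from the WITH clause of the DDL
     * @param hints 'OPTIONS' hints for the same table, in the order they appear in the query
     */
    static Map<String, String> resolve(
            Map<String, String> tableOptions, List<Map<String, String>> hints) {
        Map<String, String> effective = new LinkedHashMap<>(tableOptions);
        if (!hints.isEmpty()) {
            // Only the first hint is considered; its entries override the table options.
            // Assumption: keys not mentioned in the hint keep their DDL value.
            effective.putAll(hints.get(0));
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> ddl = Map.of("scan.watermark.idle-timeout", "1min");
        List<Map<String, String>> hints = List.of(
                Map.of("scan.watermark.idle-timeout", "30s"),
                Map.of("scan.watermark.idle-timeout", "5s"));
        System.out.println(resolve(ddl, hints));
    }
}
```

In this example the effective idle timeout is 30s: the first hint overrides the DDL value, and the second hint is ignored.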
1. Configurable watermark emit strategy
...
For SQL, the default watermark emit strategy is 'on-periodic'. It can be set explicitly via table options or the 'OPTIONS' hint:
Code Block | ||
---|---|---|
| ||
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'scan.watermark.emit.strategy'='on-periodic',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('scan.watermark.emit.strategy'='on-periodic') */ |
To configure the 'on-event' strategy, users can set table options or the 'OPTIONS' hint like this:
Code Block | ||
---|---|---|
| ||
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'scan.watermark.emit.strategy'='on-event',
  'watermark.emit.gap.on-event'='10000',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('scan.watermark.emit.strategy'='on-event', 'watermark.emit.gap.on-event'='10000') */ |
...
2. Dealing with idle sources
...
Code Block | ||
---|---|---|
| ||
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'scan.watermark.idle-timeout'='1min',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('scan.watermark.idle-timeout'='1min') */ |
...
Code Block | ||
---|---|---|
| ||
-- configure in table options
CREATE TABLE user_actions (
  ...
  user_action_time TIMESTAMP(3),
  WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' SECOND
) WITH (
  'scan.watermark.alignment.group'='alignment-group-1',
  'scan.watermark.alignment.max-drift'='1min',
  'scan.watermark.alignment.update-interval'='1s',
  ...
);

-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('scan.watermark.alignment.group'='alignment-group-1', 'scan.watermark.alignment.max-drift'='1min', 'scan.watermark.alignment.update-interval'='1s') */ |
The option 'scan.watermark.alignment.update-interval' is optional; its default value is 1s.
Note: according to FLIP-217 [2], since 1.17 source connectors have to implement watermark alignment of source splits in order to use the watermark alignment feature. If a source connector does not implement FLIP-217, the task fails with an error. Users can set 'pipeline.watermark-alignment.allow-unaligned-source-splits'='true' to disable watermark alignment of source splits; in that case, watermark alignment works properly only when the number of splits equals the parallelism of the source operator.
All the options described above:
Code Block | ||
---|---|---|
| ||
'scan.watermark.emit.strategy'='on-event',
'scan.watermark.idle-timeout'='1min',
'scan.watermark.alignment.group'='alignment-group-1',
'scan.watermark.alignment.max-drift'='1min',
'scan.watermark.alignment.update-interval'='1s' |
Users can configure any subset of these options according to their needs.
Public Interface
To implement the above features, we propose adding a wrapper class for the watermark parameters, like this:
Code Block | ||
---|---|---|
| ||
public class WatermarkParams implements Serializable {
    private static final long serialVersionUID = 1L;
    private WatermarkEmitStrategy emitStrategy;
    private String alignGroupName;
    private Duration alignMaxDrift;
    private Duration alignUpdateInterval;
    private long sourceIdleTimeout;
} |
To describe the watermark emit strategy, we need to add an enumeration class:
Code Block | ||
---|---|---|
| ||
public enum WatermarkEmitStrategy {
    ON_EVENT("on-event"),
    ON_PERIODIC("on-periodic")
} |
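The enum is shown only in outline. A compilable version could look like the following minimal sketch; the `alias` field, the `getAlias` accessor, and the `fromAlias` lookup are assumptions made for illustration and are not taken from the PoC.

```java
import java.util.Arrays;

/** Hypothetical completion of the enum; field and method names are assumptions. */
enum WatermarkEmitStrategy {
    ON_EVENT("on-event"),
    ON_PERIODIC("on-periodic");

    // The string users write in table options or the 'OPTIONS' hint.
    private final String alias;

    WatermarkEmitStrategy(String alias) {
        this.alias = alias;
    }

    String getAlias() {
        return alias;
    }

    /** Maps an option value such as "on-event" to the matching constant. */
    static WatermarkEmitStrategy fromAlias(String value) {
        return Arrays.stream(values())
                .filter(strategy -> strategy.alias.equals(value))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        "Unknown watermark emit strategy: " + value));
    }

    public static void main(String[] args) {
        System.out.println(fromAlias("on-periodic"));
    }
}
```

Keeping the user-facing string separate from the constant name lets the option value stay lowercase ('on-event') while the compiled plan can still store the constant name (ON_EVENT).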
The biggest change is probably the constructor of WatermarkPushDownSpec, which needs to be modified to accept the params wrapper class:
Code Block | ||
---|---|---|
| ||
@JsonCreator
public WatermarkPushDownSpec(
        @JsonProperty(FIELD_NAME_WATERMARK_EXPR) RexNode watermarkExpr,
        @JsonProperty(FIELD_NAME_IDLE_TIMEOUT_MILLIS) long idleTimeoutMillis,
        @JsonProperty(FIELD_NAME_PRODUCED_TYPE) RowType producedType,
        @JsonProperty(FIELD_NAME_WATERMARK_PARAMS) WatermarkParams watermarkParams) {
    super(producedType);
    this.watermarkExpr = checkNotNull(watermarkExpr);
    this.idleTimeoutMillis = idleTimeoutMillis;
    this.watermarkParams = watermarkParams;
} |
The same applies to GeneratedWatermarkGeneratorSupplier:
Code Block | ||
---|---|---|
| ||
public GeneratedWatermarkGeneratorSupplier(
        GeneratedWatermarkGenerator generatedWatermarkGenerator,
        WatermarkParams watermarkParams) {
    this.generatedWatermarkGenerator = generatedWatermarkGenerator;
    this.watermarkParams = watermarkParams;
} |
Compiled Plan
After implementing the above features, the following content will be added to the WatermarkPushDown module in the JSON plan:
Code Block |
---|
"watermarkParams":{
"emitStrategy" : "ON_PERIODIC",
"alignGroupName" : null,
"alignMaxDrift" : "PT0S",
"alignUpdateInterval" : "PT1S",
"sourceIdleTimeout" : -1
} |
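The duration values in the plan ("PT0S", "PT1S") match the ISO-8601 form produced by java.time.Duration. The sketch below shows that round trip using only the JDK. It assumes the plan serializer relies on this standard form; the helper class and its methods are hypothetical and do not describe the planner's actual serializer.

```java
import java.time.Duration;

/** Illustrative only: helper names are hypothetical, not part of the planner. */
final class PlanDurationFormat {

    /** Renders a duration the way it appears in the plan snippet, e.g. "PT1S". */
    static String toPlanString(Duration duration) {
        return duration.toString();
    }

    /** Parses an ISO-8601 duration string back into a Duration. */
    static Duration fromPlanString(String text) {
        return Duration.parse(text);
    }

    public static void main(String[] args) {
        System.out.println(toPlanString(Duration.ZERO));          // PT0S
        System.out.println(toPlanString(Duration.ofSeconds(1)));  // PT1S
        System.out.println(fromPlanString("PT1M").getSeconds());  // 60
    }
}
```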
This does not cause compatibility issues: CompilePlan and ExecutePlan remain backward compatible. We will add unit tests to validate the compatibility.
Migration Plan and Compatibility
...
If a feature is not supported by a connector, such as watermark alignment on the Kinesis connector, it will behave as designed in FLIP-182 [3] and FLIP-217 [2].
Rejected Alternatives
...
This idea is similar to extending FLIP-66 [4]. However, since there are already many options for watermark-related features, this would make the DDL complex and lengthy.
...
We should be cautious about adding SQL syntax; WATERMARK_PARAMS is also SQL syntax to some extent.
Reference
[1] github.com/yuchengxin/flink/commits/yuankui/watermark_params
[2] FLIP-217: Support watermark alignment of source splits, https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits
[3] FLIP-182: Support watermark alignment of FLIP-27 Sources
[4] FLIP-66: Support Time Attribute in SQL DDL, https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+Time+Attribute+in+SQL+DDL