...

Probe Side: The left side of the stream in a temporal join, sometimes referred to as the Fact Table. Usually, the data on the Probe Side doesn't need to be retained after processing.

Build Side: The right side of the stream in a temporal join, which can be a versioned table. It is also known as the Dimension Table. Typically, the Build Side has the latest data for each key.

Watermark: Informs operators that no elements with a timestamp older than or equal to the watermark timestamp should arrive at the operator.

...

  • When receiving a watermark with `useProcessingTime` set to false, the behavior remains the same as before.

  • When receiving a watermark with `useProcessingTime` set to true, the operator starts the `ScheduledThreadPoolExecutor` that triggers the event time timers as the system time progresses.
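The two cases above can be illustrated with a minimal sketch. It is not the operator's actual implementation: `SketchTimerDriver`, its methods, the injected clock, and the 10 ms polling period are all assumptions made for illustration. Only the use of a `ScheduledThreadPoolExecutor` to fire event time timers as system time progresses comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the operator's timer handling; not a Flink class. */
final class SketchTimerDriver implements AutoCloseable {
    private final PriorityQueue<Long> eventTimeTimers = new PriorityQueue<>();
    private final List<Long> fired = new ArrayList<>();
    private final LongSupplier clock;             // injected system clock (assumption, eases testing)
    private ScheduledThreadPoolExecutor executor; // started on the first processing-time watermark

    SketchTimerDriver(LongSupplier clock) {
        this.clock = clock;
    }

    synchronized void registerEventTimeTimer(long timestamp) {
        eventTimeTimers.add(timestamp);
    }

    /** Fires every registered timer whose timestamp is less than or equal to the given time. */
    synchronized void advanceTo(long time) {
        while (!eventTimeTimers.isEmpty() && eventTimeTimers.peek() <= time) {
            fired.add(eventTimeTimers.poll());
        }
    }

    synchronized void onWatermark(long timestamp, boolean useProcessingTime) {
        if (!useProcessingTime) {
            // Unchanged behavior: timers are driven by the watermark timestamp.
            advanceTo(timestamp);
            return;
        }
        if (executor == null) {
            // New behavior: timers are driven by the system clock from now on.
            // The 10 ms period is an arbitrary value chosen for this sketch.
            executor = new ScheduledThreadPoolExecutor(1);
            executor.scheduleAtFixedRate(
                    () -> advanceTo(clock.getAsLong()), 0, 10, TimeUnit.MILLISECONDS);
        }
    }

    synchronized List<Long> firedTimers() {
        return new ArrayList<>(fired);
    }

    @Override
    public synchronized void close() {
        if (executor != null) {
            executor.shutdownNow();
        }
    }
}
```

Injecting the clock keeps the sketch deterministic in tests; a real operator would presumably read the system time directly.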

...

WatermarkCharacteristic

```java
/**
 * WatermarkCharacteristic is used by the source implementation to describe the characteristic of
 * the watermark based on the WatermarkStrategy.
 */
@PublicEvolving
public enum WatermarkCharacteristic {

    /**
     * This implies that only watermarks with useProcessingTime set to false can be sent and no
     * watermark will be sent in case of processing time. This is the default for all the source
     * implementations.
     */
    UNDEFINED,

    /**
     * This implies that the source will send watermarks with useProcessingTime set to false in
     * case of event time and with useProcessingTime set to true when processing time is required.
     */
    ANY_WATERMARK,
}
```


4) Update `Source` interface, introducing `getWatermarkCharacteristic` method

...

5) Update Source Implementations


KafkaSource

  • Update the `KafkaSourceReader` to emit a watermark with `useProcessingTime` set to true at the beginning if `NoWatermarkGenerator` is used.

  • Update `KafkaSource` to override the `getWatermarkCharacteristic` method to return `ANY_WATERMARK`.
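As a rough illustration of the override described above, the sketch below uses a stripped-down, hypothetical `SketchSource` interface in place of Flink's real `Source` interface. The default method returning `UNDEFINED` is an assumption based on the enum's Javadoc, which calls `UNDEFINED` the default for all source implementations. `SketchLegacySource` and `SketchKafkaLikeSource` are invented names.

```java
/** Mirrors the constants of the proposed enum, reduced for this sketch. */
enum WatermarkCharacteristic {
    UNDEFINED,
    ANY_WATERMARK
}

/** Hypothetical, simplified stand-in for the Source interface. */
interface SketchSource {
    // Assumed default so that existing sources keep their current behavior.
    default WatermarkCharacteristic getWatermarkCharacteristic() {
        return WatermarkCharacteristic.UNDEFINED;
    }
}

/** A source that is not updated and inherits the default. */
final class SketchLegacySource implements SketchSource {}

/** A source that, like the updated KafkaSource, may send both kinds of watermark. */
final class SketchKafkaLikeSource implements SketchSource {
    @Override
    public WatermarkCharacteristic getWatermarkCharacteristic() {
        return WatermarkCharacteristic.ANY_WATERMARK;
    }
}
```

With a default in place, only sources that can emit watermarks with `useProcessingTime` set to true need to change.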


MySqlSource

The update to the `MySqlSource` is documented in the appendix.


Proposed Changes

...

WatermarkToDataOutput


`WatermarkToDataOutput` is responsible for sending the watermark from the `SourceReader` downstream and ensuring that the watermark timestamps sent are monotonically increasing.


Original Behavior:

  • When the `WatermarkToDataOutput` receives a watermark, it checks the timestamp of the watermark. It only sends the watermark downstream if the timestamp of the watermark is greater than the timestamp of the most recently sent watermark.


Updated Behavior:

  • When the `WatermarkToDataOutput` receives a watermark, it checks the timestamp and `useProcessingTime` field of the watermark. It only sends the watermark downstream if the timestamp of the watermark is greater than the timestamp of the most recently sent watermark or if the `useProcessingTime` field is set to true.

  • If the watermark to be sent has `useProcessingTime` set to true, and the current system time is less than the timestamp of the most recently sent watermark, an exception is thrown.

  • If a watermark with `useProcessingTime` set to true has been previously sent, and the watermark to be sent has `useProcessingTime` set to false, an exception is thrown.


The updated `WatermarkToDataOutput` ensures that when `useProcessingTime` is false, the watermark's timestamp is monotonically increasing. It also ensures that a watermark with `useProcessingTime` set to true is always sent downstream and that `useProcessingTime` is never set back to false once it has been set to true.
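The three rules of the updated behavior can be sketched as follows. This is one reading of the description, not the actual `WatermarkToDataOutput` code: `SketchWatermark`, `SketchWatermarkOutput`, the injected clock, and the choice of `IllegalStateException` are assumptions. The sketch also assumes that a watermark with `useProcessingTime` set to true does not update the last event time timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical, simplified watermark carrying the proposed flag. */
final class SketchWatermark {
    final long timestamp;
    final boolean useProcessingTime;

    SketchWatermark(long timestamp, boolean useProcessingTime) {
        this.timestamp = timestamp;
        this.useProcessingTime = useProcessingTime;
    }
}

/** Illustrative stand-in for the updated checks; not the real class. */
final class SketchWatermarkOutput {
    private final LongSupplier clock; // injected system clock (assumption, eases testing)
    private final List<SketchWatermark> downstream = new ArrayList<>();
    private long lastTimestamp = Long.MIN_VALUE;
    private boolean processingTimeSent = false;

    SketchWatermarkOutput(LongSupplier clock) {
        this.clock = clock;
    }

    void emitWatermark(SketchWatermark watermark) {
        if (watermark.useProcessingTime) {
            // System time must not be behind the most recently sent watermark.
            if (clock.getAsLong() < lastTimestamp) {
                throw new IllegalStateException(
                        "System time is behind the most recently sent watermark");
            }
            processingTimeSent = true;
            downstream.add(watermark); // always forwarded
            return;
        }
        // Once processing time is in use, switching back to event time is rejected.
        if (processingTimeSent) {
            throw new IllegalStateException(
                    "useProcessingTime cannot go back to false after it was true");
        }
        // Event time watermarks are forwarded only if they advance the timestamp.
        if (watermark.timestamp > lastTimestamp) {
            lastTimestamp = watermark.timestamp;
            downstream.add(watermark);
        }
    }

    List<SketchWatermark> sent() {
        return downstream;
    }
}
```

In this sketch a stale event time watermark is dropped silently, as in the original behavior, while the two new violations raise an exception.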

Compatibility, Deprecation, and Migration Plan

...