Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

3. We add four APIs to PartitionWindowedStream,  including mapPartition, sortPartition, aggregate and reduce.

Proposed Changes

...

Support full window processing on

...

DataStream

If DataStream supports arbitrary window types for processing data on each subtask, it would not only support full window but also other window types such as We propose to only support full window processing on DataStream. It's difficult to support arbitrary types of windows on DataStream, including count windows, sliding windows, and session windows. However, we have chosen to support only full window processing on DataStream. The main reason The main reason is that the DataStream is non-keyed and does not support keyed statebackend and keyed raw state. This issue results in two conflicts on the usage of various windows:

1. The storage of window state relies on the InternalKvState provided by KeyedStateBackend. The storage of timers in time service also relies on keyed raw state.

2. The recovery of window state and timers relies on the assumption that every record has a key. The window state and timers will be stored and distinguished by the key. In fault tolerance and rescaling scenarios, each subtask must be assigned a key range to recover the window state and timers correctly.

Furthermore, based on community feedbacks, there is currently no demand for arbitrary window processing on DataStream. However, it is worth noting that the DataSet API already offers full window processing capabilities. Integrating this existing capability into DataStream would enhance its functionality and better meet the needs of users.

Therefore, we propose to only support only full window processing on DataStream. This feature is designed to work only when RuntimeExecutionMode=BATCH and does not support checkpoint. If user specifies RuntimeExecutionMode=STREAMING, the job will be failed to submit. Without the requirement for checkpoint, the underlying implementation can no longer rely on state and avoid the aforementioned conflict issues.

The definition of PartitionWindowedStream

...