Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

However, full window processing is not supported directly by non-keyed DataStream. This means the non-keyed DataStream cannot collect all records of each subtask (these records have no keys) separately into a full window and process them at the end of inputs. DataSet API already supports processing and sorting all records within each subtask through mapPartition API and sortPartition API.  As DataSet API has been deprecated in Flink 1.18, it is necessary to enhance the non-keyed DataStream to support handling full window processing on individual subtasks.

In this FLIP, we propose two main enhancements. Firstly, we propose enabling non-keyed DataStream to directly transform into a PartitionWindowedStream. The PartitionWindowedStream represents collecting all records of each subtask separately into a full window. Secondly, we propose supporting four APIs on PartitionWindowedStream, including mapPartition, sortPartition, aggregate and reduce.

...

3. We add four APIs to PartitionWindowedStream,  including mapPartition, sortPartition, aggregate and reduce.

Proposed Changes

...

Only support full window processing on non-keyed DataStream

We propose to only support full window processing on non-keyed DataStream. There are challenges in supporting arbitrary types of windows, such as count windows, sliding windows, and session windows. These challenges arise from the fact that the DataStream is non-keyed and does not support keyed statebackend and keyed raw state. This issue results in two conflicts on the usage of various windows:

...

Furthermore, based on community feedbacks, there is currently no demand for we have observed no users suggesting a need for support of arbitrary window processing on the non-keyed DataStreamDataStreams.

Full window processing has unique characteristics and is primarily applicable to batch processing scenarios. As such, it can be designed to work only when RuntimeExecutionMode=BATCH and does not support checkpoint. When specifying RuntimeExecutionMode=STREAMING, the job will fail to submit. Without the requirement for checkpoint, the underlying implementation can no longer rely on state and avoid the aforementioned conflict issues.

Importantly, the DataSet API already offers full window processing capabilities. Integrating this existing capability of DataSet into non-keyed DataStream will enhance its functionality and better meet the needs of users. Therefore, we propose to only support full window processing on non-keyed DataStream.

Introduce the PartitionWindowedStream

...