Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Next, we will propose the operator implementation for the mapPartition API . 

In our implementationIn our implementation, the operator will execute the MapPartitionFunction while receiving records, rather than waiting for the entire window of records to be collected. To achieve this, we add a seperate UDFExecutionThread inside the operator.

...

The TaskMainThread will invoke ReduceFunction#reduce for each record in the window and only send the final result to output at the end of inputs.


Rejected Alternatives

1. In the implementation of mapPartition API, make operator do not cache records. The UDFExecutionThread must execute synchronously with TaskMainThread to get the next record and process it in MapPartitionFunction. We choose not to use this approach because it will result in inefficient execution due to frequent thread context switching between TaskMainThread and UDFExecutionThread. Caching records enables concurrent execution of the two threads and reduces the frequency of context switching Caching records enables concurrent execution of the two threads and reduces the frequency of context switching.

Compatibility, Deprecation, and Migration Plan

...