...
Current state: Under Discussion
Discussion thread: here (<- link to https://lists.apache.org/thread.html/r11750db945277d944f408eaebbbdc9d595d587fcfb67b015c716404e%40%3Cdev.flink.apache.org%3E)
JIRA:
...
Several new config options will be added to control the behavior of the sort-merge based blocking shuffle. Because sort-merge based blocking shuffle is disabled by default, the default behavior of blocking shuffle stays unchanged.
Config Option | Description |
---|---|
taskmanager.network.sort-merge-blocking-shuffle.max-files-per-partition | The maximum number of files that each sort-merge blocking partition can produce. Files over this threshold will be merged. |
taskmanager.network.sort-merge-blocking-shuffle.buffers-per-partition | Number of network buffers required for each sort-merge blocking result partition. A larger value can reduce the number of shuffle files and improve performance. |
taskmanager.network.sort-merge-blocking-shuffle.min-parallelism | Parallelism threshold for choosing the shuffle implementation. Hash-based blocking shuffle is used for small parallelism, and sort-merge based blocking shuffle is used for large parallelism. |
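The effect of the min-parallelism option can be sketched as a simple threshold check. The following is a minimal illustration, not the actual Flink implementation: the class `ShuffleSelection`, the enum `ShuffleType`, and the method `select` are hypothetical names, and treating the threshold as inclusive is an assumption. Using a very large threshold such as `Integer.MAX_VALUE` to represent "disabled by default" is also an assumption made for the example.

```java
// Hypothetical sketch of how a min-parallelism threshold could select the
// blocking shuffle implementation. Names and the inclusive comparison are
// assumptions for illustration, not taken from the Flink code base.
final class ShuffleSelection {

    enum ShuffleType { HASH_BASED, SORT_MERGE }

    // Returns SORT_MERGE when the parallelism reaches the threshold
    // (assumed inclusive), otherwise HASH_BASED.
    static ShuffleType select(int parallelism, int minParallelism) {
        return parallelism >= minParallelism
                ? ShuffleType.SORT_MERGE
                : ShuffleType.HASH_BASED;
    }

    public static void main(String[] args) {
        int threshold = 128; // illustrative value for min-parallelism
        for (int parallelism : new int[] {8, 128, 2000}) {
            System.out.println(parallelism + " -> " + select(parallelism, threshold));
        }
        // A very large threshold keeps every realistic job on the hash-based path,
        // which is one way the feature could be effectively disabled.
        System.out.println(select(2000, Integer.MAX_VALUE));
    }
}
```

With a threshold chosen this way, small jobs keep the existing hash-based path and only large jobs switch to the sort-merge path.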
A fixed number of network buffers per result partition decouples memory consumption from parallelism, which is friendlier to large-scale batch jobs.
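To make the decoupling concrete, the arithmetic below compares the two approaches under a deliberately simplified cost model. Everything here is an assumption for illustration: the class `BufferBudget` and its methods are hypothetical, the hash-based side is modeled as one buffer per subpartition, and the buffer size (32 KiB) and buffers-per-partition value (512) are example numbers rather than values stated in this proposal.

```java
// Hypothetical back-of-the-envelope model of network buffer memory per
// result partition. The "one buffer per subpartition" model for hash-based
// shuffle and all numeric values are assumptions for illustration only.
final class BufferBudget {

    // Sort-merge: a fixed number of buffers, independent of parallelism.
    static long sortMergeBytes(int buffersPerPartition, int bufferSizeBytes) {
        return (long) buffersPerPartition * bufferSizeBytes;
    }

    // Hash-based (assumed model): at least one buffer per subpartition,
    // so memory grows linearly with the downstream parallelism.
    static long hashBasedBytes(int parallelism, int bufferSizeBytes) {
        return (long) parallelism * bufferSizeBytes;
    }

    public static void main(String[] args) {
        int bufferSize = 32 * 1024;    // example buffer size in bytes
        int buffersPerPartition = 512; // example buffers-per-partition setting
        for (int parallelism : new int[] {100, 1_000, 10_000}) {
            System.out.println("parallelism=" + parallelism
                    + " sortMerge=" + sortMergeBytes(buffersPerPartition, bufferSize)
                    + " hashBased=" + hashBasedBytes(parallelism, bufferSize));
        }
    }
}
```

Under this model the sort-merge figure stays constant as parallelism grows, while the hash-based figure scales with it.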
...
The default behavior of Flink stays unchanged. Nothing needs to be done when migrating to the new Flink version.
Appendix
Sort by Subpartition Index
...