...
Discussion thread: https://lists.apache.org/thread/pt2b1f17x2l5rlvggwxs6m265lo4ly7p
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX) Jira server ASF JIRA serverId 5aa69414-a9e9-3523-82ec-879b028fb15b key FLINK-25636
Released: 1.15
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
This FLIP only changes the default value of 4 config options related to blocking shuffle, including taskmanager.network.blocking-shuffle.compression.enabled, taskmanager.network.sort-shuffle.min-parallelism, taskmanager.memory.framework.off-heap.batch-shuffle.size, taskmanager.network.sort-shuffle.min-buffers. No new public interface will be added and pipeline streaming mode is not influenced.
Proposed Changes
Key | Current Value | Target Value | Reason |
---|---|---|---|
taskmanager.network.blocking-shuffle.compression.enabled | false | true | Usually, data compression can reduce both disk and network IO which is good for performance. At the same time, it can save storage space. |
taskmanager.network.sort-shuffle.min-parallelism | Integer.MAX_VALUE | 1 | For high parallelism batch jobs, sort-shuffle is better for both stability and performance. (We tested setting this value to 1, 128, 256, 512 and 1024 with TPC-DS and the result showed that 1 is the best one.) |
taskmanager.memory.framework.off-heap.batch-shuffle.size | 32m | 64m | Aside from the improvement in FLINK-24954, increasing this value can also help to solve the read buffer request issue. (Previously, when choosing the default value, both ‘32M' and '64M' are OK for tests and we chose the smaller one in a cautious way.) |
taskmanager.network.sort-shuffle.min-buffers | 64 | 512 | The current default value is quite modest and the performance can be influenced especially after we enable sort shuffle for large parallelism batch jobs by default. |
Compatibility, Deprecation, and Migration Plan
...