...
8) ExecutionOptions.BATCH_SHUFFLE_MODE
Benchmark results
...
Data
number of records and emit results after both inputs have ended.
...
With this PR, when DataStream#coGroup is invoked with EndOfStreamWindows as the window assigner, EOFAggregationOperator
is applied: it first sorts the two inputs and then outputs the coGroup results.
    data1.coGroup(data2)
        .where(tuple -> tuple.f0)
        .equalTo(tuple -> tuple.f0)
        .window(EndOfStreamWindows.get())
        .apply(new CustomCoGroupFunction())
        .addSink(...);
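To illustrate the "sort both inputs, then output the coGroup results" idea, here is a minimal sketch in plain Java, with no Flink dependencies. It is not the EOFAggregationOperator implementation. The class name `SortMergeCoGroupSketch`, the `coGroup` helper, and the string output format are hypothetical and chosen only for this example. It assumes both inputs are bounded and fit in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Flink code.
public class SortMergeCoGroupSketch {

    /** Sorts both bounded inputs by key, then emits one group per key in a single merge pass. */
    public static List<String> coGroup(List<Map.Entry<Integer, String>> left,
                                       List<Map.Entry<Integer, String>> right) {
        // Copy before sorting so the caller's lists are not modified.
        List<Map.Entry<Integer, String>> l = new ArrayList<>(left);
        List<Map.Entry<Integer, String>> r = new ArrayList<>(right);
        // List.sort is stable, so values keep their input order within a key.
        l.sort(Map.Entry.comparingByKey());
        r.sort(Map.Entry.comparingByKey());

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < l.size() || j < r.size()) {
            // The next key to emit is the smallest key not yet consumed on either side.
            int key;
            if (j >= r.size()) {
                key = l.get(i).getKey();
            } else if (i >= l.size()) {
                key = r.get(j).getKey();
            } else {
                key = Math.min(l.get(i).getKey(), r.get(j).getKey());
            }

            // Collect every value with this key from each sorted input.
            List<String> leftGroup = new ArrayList<>();
            while (i < l.size() && l.get(i).getKey() == key) {
                leftGroup.add(l.get(i++).getValue());
            }
            List<String> rightGroup = new ArrayList<>();
            while (j < r.size() && r.get(j).getKey() == key) {
                rightGroup.add(r.get(j++).getValue());
            }

            // A real CoGroupFunction would receive the two groups here.
            out.add(key + " -> " + leftGroup + " | " + rightGroup);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> left =
                List.of(Map.entry(2, "a"), Map.entry(1, "b"), Map.entry(2, "c"));
        List<Map.Entry<Integer, String>> right =
                List.of(Map.entry(3, "x"), Map.entry(2, "y"));
        coGroup(left, right).forEach(System.out::println);
    }
}
```

Because both inputs are sorted, each key's group is complete as soon as the merge moves past that key, so a group can be emitted and then discarded before the next key is read.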
We run
...
Here are the benchmark results:
...
each benchmark 5 times. Here are the throughput benchmark results:
Data Count | Key Count | Current | BATCH | Optimized | Hybrid Shuffle |
---|---|---|---|---|---|
2e6 | 2e6 | 61 ± 1 (115%) | 502 ± 75 (947%) | 868 ± 74 (1637%) | 899 ± 82 (1696%) |
5e6 | 5e6 | 60 ± 1 (113%) | 528 ± 28 (996%) | 1176 ± 39 (2218%) | 1262 ± 50 (2381%) |
2e7 | 2e7 | 56 ± 0 (105%) | 564 ± 34 (1064%) | 1492 ± 28 (2815%) | 1692 ± 14 (3192%) |
5e7 | 5e7 | 53 ± 1 (100%) | 602 ± 16 (1135%) | 1508 ± 18 (2845%) | 1712 ± 77 (3230%) |
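The percentages in parentheses appear to be relative to the Current value at 5e7 records (53, listed as 100%). The sketch below assumes that baseline and uses the rounded throughput values shown in the table. Under that assumption, truncating integer division reproduces the listed percentages. The class and method names are hypothetical.

```java
// Hypothetical helper for reading the table; not part of the benchmark code.
public class RelativeThroughput {

    /** Throughput relative to the baseline, in percent, truncated toward zero. */
    public static int relativePercent(int throughput, int baseline) {
        return throughput * 100 / baseline;
    }

    public static void main(String[] args) {
        int baseline = 53; // assumption: "Current" at 5e7 records, listed as 100%
        String[] columns = {"Current", "BATCH", "Optimized", "Hybrid Shuffle"};
        int[] row5e7 = {53, 602, 1508, 1712}; // mean throughputs from the 5e7 row
        for (int k = 0; k < columns.length; k++) {
            System.out.println(columns[k] + ": " + relativePercent(row5e7[k], baseline) + "%");
        }
    }
}
```

Read this way, the 5e7 row puts BATCH at roughly 11x the Current throughput, Optimized at roughly 28x, and Hybrid Shuffle at roughly 32x.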
...
...
Compatibility, Deprecation, and Migration Plan
...