Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Status

...

Page properties


Discussion thread

...

...

...

JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)

...


Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Currently, one Flink cluster can only use one shuffle service plugin configured by 'shuffle-service-factory.class'. This is not a problem for per-job cluster, but for session cluster, this is not flexible enough and cannot support use cases like selecting different shuffle service for different workloads (e.g. batch vs. streaming). As discussed in https://lists.apache.org/thread/k4owttq9q3cq4knoobrzc31bghf7vc0o, the community has considered to implement this feature when introducing the pluggable shuffle service abstraction, there are also some relevant followup issues:

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-19551
. We'd like to implement this feature in this FLIP.

Public Interfaces

To Implement this feature, we need to support job level shuffle service configuration. Currently, only cluster level shuffle service configuration is supported by configuring 'shuffle-service-factory.class'. If a user first starts a Flink cluster and then he/she tries to change the shuffle service by reconfiguring 'shuffle-service-factory.class', he/she will fail because per-job level configuration will not passed to runtime so does not take effect currently.

...

4. Share the same network memory pool (NetworkBufferPool) among different shuffle services. It is compatible with the Flink memory model which treats the network memory as one dimension of resource. In fact, we need a public NetworkMemoryProvider interface to expose the network resource to shuffle service plugin. We can leave this together with other user interface relevant work (see

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyFLINK-12873
) as future work.


Here is a PoC implementation: https://github.com/wsry/flink/commit/bf6b56b1cd6591e7b3fad6fddd3f701a06c64868.

Compatibility, Deprecation, and Migration Plan

1. If not configured as default, the shuffle service must be implemented as Flink plugin and which can be loaded by Flink PluginManager. 

2. If a custom shuffle service plugin already creates the NetworkBufferPool (not a public interface) instance itself, it must switch to use the shared NetworkBufferPool.

...