Status
Current state: Under Discussion
Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)
JIRA:
Released: <Flink Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
The pluggable shuffle framework was Introduced to Flink by FLIP-31 and based on it we Implement our own remote shuffle service for Flink. During the implementation process, we found that the current interface incurs some limitations which can influence the usability of the
Public Interfaces
ShuffleMaster: the methods start, close, registerJob, unregisterJob are newly added, the signature of registerPartitionWithProducer is changed (adding a JobID). Other methods are kept unchanged.
JobShuffleContext: it is newly added interface which contains the job level context which can provide some basic capabilities and proxy to other components of the Flink cluster like JobMaster.
public interface JobShuffleContext {
/** Returns the corresponding {@link JobID}. */
JobID getJobID();
/**
* Stops tracking the target result partitions, which means these partitions will be removed and
* will be reproduced if used afterwards.
*/
CompletableFuture<?> stopTrackingPartitions(Collection<ResultPartitionID> partitionIDS);
}
ShuffleMasterContext:
ShuffleServiceFactory:
/**
* Job level shuffle context which can offer some job information like job ID and through it, the
* shuffle plugin can stop tracking the lost result partitions.
*/
public interface JobShuffleContext {
/** Returns the corresponding {@link JobID}. */
JobID getJobID();
/**
* Stops tracking the target result partitions, which means these partitions will be removed and
* will be reproduced if used afterwards.
*/
CompletableFuture<?> stopTrackingPartitions(Collection<ResultPartitionID> partitionIDS);
}
Proposed Changes
There are four main parts of the proposed changes:
Compatibility, Deprecation, and Migration Plan
This change to the ShuffleMaster#registerPartitionWithProducer and
Test Plan
The proposed changes will be tested by a real Flink shuffle plugin based on it with test jobs like TPC-DS benchmark suit.
Rejected Alternatives
No rejected alternatives.
Future Works
This FLIP only contains some minimal changes needed toward the final . More opti