...
Discussion thread: tba
JIRA:
Released: 1.15
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
The main limitation of the RJR is that we clear it once the job reaches a globally terminal state. This may lead to re-execution of the job by the failed-over Dispatcher, because there is no persistent record of the successful job execution that would outlive the cluster. Introducing a component that persists the job state after the job has finished, and possibly even after the Flink cluster is gone, could solve the following issues:
- Any standby Dispatcher would know about successfully finished jobs, which do not need to be re-executed after a failover that happens while the job was already terminated but the JobGraph was still around (FLINK-11813).
- The RJR does not provide access to the JobResult of a completed job. We have to return UNKNOWN as a result when we fail over in application mode (FLINK-21928).
- Having access to a JobResult after the job has completed would also pave the road for supporting multi-stage jobs in ApplicationMode and highly available job drivers in general.
Public Interfaces
- RunningJobsRegistry will be replaced by JobResultStore
- k8s- and ZK-specific implementations of RunningJobsRegistry will be replaced by a file-based approach
- Users that use a customized HA implementation might be affected by this change because the HAServices interface is going to be modified
- New configuration parameters are going to be introduced to make the file-based JobResultStore implementation configurable
- REST API for the JobResult might be extended to also include the cleanup state of this Job
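
To make the file-based approach more concrete, the following is a minimal sketch of how such a store could behave. It is an illustration under stated assumptions, not the proposed implementation: the class name `FileJobResultStoreSketch`, all method names, the `_DIRTY` file suffix, and the use of plain strings for job IDs and results are hypothetical and do not come from this proposal. The sketch assumes one file per job in a directory that outlives the cluster, with a rename marking the transition from "result stored, cleanup pending" to "cleanup done".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/** Hypothetical sketch only; this is NOT the JobResultStore API from the proposal. */
final class FileJobResultStoreSketch {

    // Assumed to point at storage that survives the Flink cluster.
    private final Path baseDir;

    FileJobResultStoreSketch(Path baseDir) throws IOException {
        this.baseDir = Files.createDirectories(baseDir);
    }

    // Assumed naming scheme: "<jobId>_DIRTY.txt" until cleanup is done, then "<jobId>.txt".
    private Path dirtyFile(String jobId) {
        return baseDir.resolve(jobId + "_DIRTY.txt");
    }

    private Path cleanFile(String jobId) {
        return baseDir.resolve(jobId + ".txt");
    }

    /** Persists the result of a globally terminated job whose cleanup is still pending. */
    void createDirtyResult(String jobId, String result) throws IOException {
        if (hasEntry(jobId)) {
            throw new IllegalStateException("Result already stored for job " + jobId);
        }
        Files.write(dirtyFile(jobId), result.getBytes(StandardCharsets.UTF_8));
    }

    /** Marks the cleanup as finished by renaming the dirty entry. */
    void markResultAsClean(String jobId) throws IOException {
        Files.move(dirtyFile(jobId), cleanFile(jobId), StandardCopyOption.ATOMIC_MOVE);
    }

    /** True if any result (dirty or clean) is stored for the job. */
    boolean hasEntry(String jobId) {
        return Files.exists(dirtyFile(jobId)) || Files.exists(cleanFile(jobId));
    }

    /** True only if the result is stored and cleanup was marked as done. */
    boolean isClean(String jobId) {
        return Files.exists(cleanFile(jobId));
    }

    /** Returns the stored result, if any, regardless of its cleanup state. */
    Optional<String> getResult(String jobId) throws IOException {
        for (Path candidate : new Path[] {cleanFile(jobId), dirtyFile(jobId)}) {
            if (Files.exists(candidate)) {
                byte[] bytes = Files.readAllBytes(candidate);
                return Optional.of(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    /** A recovering Dispatcher would re-run only jobs that have no stored result. */
    boolean shouldReExecute(String jobId) {
        return !hasEntry(jobId);
    }
}
```

In this sketch, a failed-over Dispatcher that still finds a JobGraph would first call `shouldReExecute` and skip the job if a result exists. An entry that is present but not yet clean signals that only the cleanup has to be retried, which is the kind of cleanup state the REST API could expose.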
...