...

Discussion thread: tba

JIRA: FLINK-11813

Released: 1.15

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

The main limitation of the RJR is that we clear it once the job reaches a globally terminal state. This may lead to re-execution of the job by the failed-over dispatcher, because there is no persistent record of the successful job execution that would outlive the cluster. Some of the issues that could be solved by introducing a component that persists the job state after the job has finished, and probably even after the Flink cluster is gone, are:

  • Any standby Dispatcher would know about successfully finished jobs that do not need to be re-executed after a failover that happened while the job was already terminated but the JobGraph was still around (FLINK-11813).
  • The RJR does not provide access to the JobResult of a completed job. We have to return UNKNOWN as a result when we fail over in application mode (FLINK-21928).
  • Having access to a JobResult after the job has completed would also pave the way for supporting multi-stage jobs in application mode and highly available job drivers in general.

Public Interfaces

  • RunningJobsRegistry will be replaced by JobResultStore
  • k8s- and ZK-specific implementations of RunningJobsRegistry will be replaced by a file-based approach
    • Users that use a customized HA implementation might be affected by this change because the HAServices interface is going to be modified
  • New configuration parameters are going to be introduced to make the file-based JobResultStore implementation configurable
  • The REST API for the JobResult might be extended to also include the cleanup state of the job
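To make the idea of a file-based store more concrete, here is a minimal sketch of how such a component could persist a job's terminal state and its cleanup state on a file system. Everything in it is an assumption made for illustration: the class name FileJobResultStoreSketch, its method names, the use of plain strings for job IDs and results, and the "_DIRTY" file naming are hypothetical and are not taken from this proposal or from the Flink code base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical file-based job result store; NOT the actual Flink interface. */
final class FileJobResultStoreSketch {

    // Assumed naming scheme: "<jobId>_DIRTY.txt" while cleanup is pending,
    // "<jobId>.txt" once the job's resources have been cleaned up.
    private static final String DIRTY_SUFFIX = "_DIRTY.txt";
    private static final String CLEAN_SUFFIX = ".txt";

    private final Path baseDir;

    FileJobResultStoreSketch(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Convenience factory for experiments; a real setup would use durable storage. */
    static FileJobResultStoreSketch inTempDir() {
        try {
            return new FileJobResultStoreSketch(Files.createTempDirectory("job-result-store"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Persists the terminal state before cleanup starts (entry is "dirty"). */
    void createDirtyResult(String jobId, String terminalState) {
        try {
            Files.write(dirtyPath(jobId), terminalState.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Marks the entry as "clean" after the job's resources were cleaned up. */
    void markResultAsClean(String jobId) {
        try {
            Files.move(dirtyPath(jobId), cleanPath(jobId), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasDirtyResult(String jobId) {
        return Files.exists(dirtyPath(jobId));
    }

    boolean hasResult(String jobId) {
        return hasDirtyResult(jobId) || Files.exists(cleanPath(jobId));
    }

    /** Returns the stored terminal state, or "UNKNOWN" if no entry exists. */
    String terminalStateOf(String jobId) {
        Path path = hasDirtyResult(jobId) ? dirtyPath(jobId) : cleanPath(jobId);
        if (!Files.exists(path)) {
            return "UNKNOWN";
        }
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A failed-over dispatcher would only re-execute jobs without a stored result. */
    boolean needsExecution(String jobId) {
        return !hasResult(jobId);
    }

    private Path dirtyPath(String jobId) {
        return baseDir.resolve(jobId + DIRTY_SUFFIX);
    }

    private Path cleanPath(String jobId) {
        return baseDir.resolve(jobId + CLEAN_SUFFIX);
    }

    public static void main(String[] args) {
        FileJobResultStoreSketch store = FileJobResultStoreSketch.inTempDir();
        store.createDirtyResult("job-1", "FINISHED");
        System.out.println("re-execute job-1 after failover? " + store.needsExecution("job-1"));
        store.markResultAsClean("job-1");
        System.out.println("cleanup pending for job-1? " + store.hasDirtyResult("job-1"));
    }
}
```

In this sketch the result is written before cleanup begins and only renamed afterwards, so a dispatcher that takes over in between can see both that the job must not be re-executed and that its cleanup still has to be finished. Because the entry lives on a file system rather than in k8s or ZK, it can outlive the cluster, provided the directory is on durable storage.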

...