Status

Current state: Under Discussion

...

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, Flink provides the high-availability setup in an "all or nothing" manner. In highly available setups, Flink offers two mechanisms: leader election/retrieval services for the JobManager and persistent services for job metadata. The interfaces for both mechanisms are defined in the HighAvailabilityServices interface. At runtime, Flink constructs a concrete implementation of HighAvailabilityServices based on the user configuration, e.g. KubernetesLeaderElectionHaServices or ZooKeeperLeaderElectionHaServices. As a result, the two mechanisms can only be enabled or disabled together.
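For context, a simplified sketch of the current interface is shown below. It is abbreviated: the real HighAvailabilityServices declares more methods, and the exact signatures vary across Flink versions.

```java
import java.io.IOException;

import org.apache.flink.runtime.blob.BlobStore;
import org.apache.flink.runtime.checkpoint.CheckpointRecoveryFactory;
import org.apache.flink.runtime.jobmanager.JobGraphStore;
import org.apache.flink.runtime.leaderelection.LeaderElectionService;
import org.apache.flink.runtime.leaderretrieval.LeaderRetrievalService;

// Abbreviated sketch of the current interface; both mechanisms live
// side by side, so they can only be enabled or disabled together.
public interface HighAvailabilityServices {

    // Mechanism 1: leader election/retrieval for JobManager components.
    LeaderElectionService getResourceManagerLeaderElectionService();
    LeaderElectionService getDispatcherLeaderElectionService();
    LeaderRetrievalService getResourceManagerLeaderRetriever();
    LeaderRetrievalService getDispatcherLeaderRetriever();

    // Mechanism 2: persistent services for job metadata.
    CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception;
    JobGraphStore getJobGraphStore() throws Exception;
    BlobStore createBlobStore() throws IOException;
}
```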
However, in OLAP scenarios we only need the leader election/retrieval services for the components in the JobManager. In our production environment, users submit many short queries through the SQL Gateway. These jobs typically complete within a few seconds, and when an error occurs during job execution, users simply resubmit the job. Therefore, there is no need to recover a job from a JobManager failure, persist its state for recovery, or perform leader election for an individual job. Worse, persisting job state reduces the cluster's throughput for short queries, as demonstrated by the HighAvailabilityServiceBenchmark (258 qps without HA vs. 69 qps with ZooKeeper HA). At the same time, in this scenario we treat the Flink session cluster as a service. To meet the SLA (Service Level Agreement), we rely on the JobManager's failover mechanism to minimize service downtime. Therefore, we still need leader election for the components in the JobManager process, e.g. the ResourceManager and Dispatcher.
In this FLIP, we propose to split HighAvailabilityServices into LeaderServices and PersistentServices, and to allow users to independently adjust the job-related high-availability strategies through configuration.

Public Interfaces

  • Introduce high-availability.enable-job-recovery to control the implementation of the leader services and persistent services for the JobMaster. This config option only takes effect in session mode and defaults to true (see the example below).

Note: I don't mention HighAvailabilityServices here because it is not labeled as public, even though we do expose configuration options that allow users to plug in their own implementations.
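For illustration, an OLAP-style session cluster might combine existing HA options with the proposed key as follows. This is a sketch of flink-conf.yaml: only high-availability.enable-job-recovery is new in this FLIP; the remaining keys are existing Flink options.

```yaml
# Existing options: enable HA backed by ZooKeeper.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha

# Proposed by this FLIP: keep leader election for JobManager components,
# but disable job-level leader election and job metadata persistence.
high-availability.enable-job-recovery: false
```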

Proposed Changes

We propose to separate HighAvailabilityServices into LeaderServices and PersistentServices, and to introduce high-availability.enable-job-recovery to control the job-recovery behavior when HA is enabled.

Refactoring of HighAvailabilityServices


The following class diagram illustrates the relationship between the classes and interfaces related to the HighAvailabilityServices:

...

Please note that this is a pure refactoring and should not affect the functionality of the current HA mechanism.
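To make the split concrete, a minimal sketch of one plausible shape of the refactoring is given below. It simply regroups what HighAvailabilityServices exposes today; the exact method sets and signatures follow the class diagram above and may differ in the final implementation.

```java
// Each interface lives in its own file; shown together here for brevity.

// Leader election/retrieval concerns, grouped in one interface.
public interface LeaderServices {
    LeaderElectionService getResourceManagerLeaderElectionService();
    LeaderElectionService getDispatcherLeaderElectionService();
    LeaderRetrievalService getResourceManagerLeaderRetriever();
    LeaderRetrievalService getDispatcherLeaderRetriever();
}

// Persistence concerns for job metadata, grouped in another.
public interface PersistentServices {
    CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception;
    JobGraphStore getJobGraphStore() throws Exception;
    BlobStore createBlobStore() throws IOException;
}

// HighAvailabilityServices becomes a composition of the two, so each side
// can be implemented and configured independently.
public interface HighAvailabilityServices {
    LeaderServices getLeaderServices();
    PersistentServices getPersistentServices();
}
```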

HighAvailability for OLAP Scenarios

As mentioned above, in OLAP scenarios we only require leader election services for the Dispatcher, ResourceManager, and RestEndpoint in the JobManager process. Leader election services and persistent services for individual jobs are redundant and may impact cluster performance.
To generate HA services suitable for OLAP scenarios, we introduce the high-availability.enable-job-recovery option. When users enable HA with Kubernetes or ZooKeeper and set this option to false, we select the combination of DefaultLeaderServices and EmbeddedPersistentServices. Additionally, we set the JobMaster's LeaderElectionService and LeaderRetrieverService to the Standalone versions, as sketched below.
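The following sketch illustrates the resulting selection logic. The helper and class names (createHaServices, createDefaultLeaderServices, createZkOrK8sPersistentServices, DefaultHighAvailabilityServices) are hypothetical and only serve the illustration; the actual wiring in the POC may differ.

```java
import org.apache.flink.configuration.Configuration;

final class HaServicesFactorySketch {

    static HighAvailabilityServices createHaServices(Configuration config) {
        // Proposed option; defaults to true so existing setups keep full HA.
        boolean jobRecovery =
                config.getBoolean("high-availability.enable-job-recovery", true);

        // Leader election/retrieval for the Dispatcher, ResourceManager and
        // RestEndpoint stays ZooKeeper/Kubernetes-backed either way.
        LeaderServices leaderServices = createDefaultLeaderServices(config);

        // Job metadata persistence is the part that gets switched off:
        // EmbeddedPersistentServices keeps everything in memory, so short
        // queries skip the persistence overhead but jobs are not recoverable.
        PersistentServices persistentServices = jobRecovery
                ? createZkOrK8sPersistentServices(config)
                : new EmbeddedPersistentServices();

        // With job recovery disabled, the JobMaster additionally uses the
        // Standalone leader election/retrieval services (not shown here).
        return new DefaultHighAvailabilityServices(leaderServices, persistentServices);
    }
}
```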

You can find the POC at https://github.com/KarmaGYZ/flink/tree/FLIP-403.

Compatibility, Deprecation, and Migration Plan

As mentioned before, the functionality of the HA services does not change. In addition, the refactoring should improve readability and testability.

Test Plan

The existing tests need to be refactored according to the new hierarchy.

Rejected Alternatives