Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Flink uses Apache Curator to interact with ZooKeeper in HA mode (link to the code). The current setup does not expose several Curator configuration options that could be useful in certain Flink deployments.
We want to make sure that the relevant Apache Curator options are configurable for Flink users, so that they have every mechanism they need to control how Flink interacts with ZooKeeper.
Public Interfaces
The following new options should be exposed under the high-availability.zookeeper configuration.
Proposed option | Configuration type | Motivation |
high-availability.zookeeper.client.authorization | Map<String, String> | Lets Flink make full use of the ZooKeeper setup of a given environment. In certain cases ZooKeeper requires additional authentication information, for example a list of valid names for the ensemble in order to prevent accidentally connecting to the wrong ensemble. |
high-availability.zookeeper.client.maxCloseWaitMs | Integer | Lets the user adjust the close wait time to different network speeds. |
high-availability.zookeeper.client.simulatedSessionExpirationPercent | Integer | Additional check for session expiration on top of what ZooKeeper provides. |
The remaining options provided by the Curator framework are not considered useful to expose:
canBeReadOnly - allowing reads from a stale ZooKeeper could lead to an inconsistent state on the Flink side, e.g. two active JobManagers
compressionProvider - since Flink doesn’t store a lot of information in ZooKeeper, there is no need to provide any compression
defaultData - could be useful for debugging the Curator framework, but seems unnecessary for Flink
dontUseContainerParents/useContainerParentsIfAvailable - this sounds like a property that is useful for Flink's leader election cleanup, but I don't see extra value in exposing the property to the user.
namespace - this one is already in use (see high-availability.zookeeper.path.root)
runSafeService - this seems to be a feature that's Flink-specific and shouldn't be handled by the user.
schemaSet - it seems that this shouldn’t be exposed to the end user
waitForShutdownTimeoutMs - considered internal Flink logic and shouldn’t be exposed.
Proposed Changes
We should incorporate the aforementioned options and translate configuration values into the corresponding Curator builder calls.
An issue arises due to a type mismatch between the Flink configuration parameter high-availability.zookeeper.client.authorization, which is a Map<String, String>, and the corresponding Curator method call, which anticipates an array of AuthInfo (see method javadoc). To address this, each map entry can be turned into an AuthInfo, converting the String value to byte[] with the getBytes() method.
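The conversion described above could look roughly like the following sketch. It assumes that each map key is the authentication scheme and each value is the credential string, which this proposal does not spell out. `AuthInfo` here is a local stand-in that mirrors the shape of Curator's `org.apache.curator.framework.AuthInfo` (a scheme plus a `byte[]`) so that the example compiles on its own, and `ZooKeeperAuthConverter` is a hypothetical helper name, not an existing Flink class. The sketch returns a list, which can be turned into an array if the target method requires one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in mirroring the shape of org.apache.curator.framework.AuthInfo
// (scheme + byte[] auth) so that this sketch compiles without Curator.
final class AuthInfo {
    private final String scheme;
    private final byte[] auth;

    AuthInfo(String scheme, byte[] auth) {
        this.scheme = scheme;
        this.auth = auth;
    }

    String getScheme() {
        return scheme;
    }

    byte[] getAuth() {
        return auth;
    }
}

// Hypothetical helper; the name is not part of Flink or Curator.
final class ZooKeeperAuthConverter {
    private ZooKeeperAuthConverter() {}

    // Assumption: map key = scheme, map value = credentials.
    // The credentials are encoded with an explicit charset instead of the
    // platform default used by the no-argument getBytes().
    static List<AuthInfo> toAuthInfos(Map<String, String> authorization) {
        List<AuthInfo> result = new ArrayList<>(authorization.size());
        for (Map.Entry<String, String> entry : authorization.entrySet()) {
            byte[] auth = entry.getValue().getBytes(StandardCharsets.UTF_8);
            result.add(new AuthInfo(entry.getKey(), auth));
        }
        return result;
    }
}
```

Passing an explicit charset is a design choice of this sketch: it keeps the resulting bytes independent of the JVM's default encoding.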
Compatibility, Deprecation, and Migration Plan
N/A
Test Plan
Simple manual tests are sufficient to verify that the given options are applied correctly.
For high-availability.zookeeper.client.authorization we can add a unit test which validates the conversion between the Map<String, String> and AuthInfo[].
Rejected Alternatives
Generic configuration for all Apache Curator options via namespaces
We could think about utilising namespaces. The FLIP could propose adding namespace support for Apache Curator, e.g. an option high-availability.zookeeper.client.<config_option> could be translated into the appropriate <config_option> of the Curator configuration. That would allow loading any parameter supported by Curator.
Unfortunately, the Curator connection is configured via the Builder pattern, where each configuration option has to be translated into the proper call on the Builder object, so arbitrary options cannot be forwarded generically.
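A minimal sketch of this limitation follows. `ClientBuilder` is a hypothetical stand-in for a builder-style client configuration (it is not Curator's builder class), and `PrefixedOptionsApplier` is an illustrative name. Every option suffix needs a hand-written entry that points at a specific builder method, so an option without such an entry cannot simply be passed through.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a builder-style client configuration.
// The setter names follow options from this proposal, but this is not Curator's class.
final class ClientBuilder {
    int maxCloseWaitMs = -1;
    int simulatedSessionExpirationPercent = -1;

    ClientBuilder maxCloseWaitMs(int value) {
        this.maxCloseWaitMs = value;
        return this;
    }

    ClientBuilder simulatedSessionExpirationPercent(int value) {
        this.simulatedSessionExpirationPercent = value;
        return this;
    }
}

// Illustrative helper showing that each prefixed option needs an explicit mapping.
final class PrefixedOptionsApplier {
    static final String PREFIX = "high-availability.zookeeper.client.";

    // One entry per supported suffix, each bound to a concrete builder method.
    private static final Map<String, BiConsumer<ClientBuilder, String>> SETTERS = new HashMap<>();

    static {
        SETTERS.put("maxCloseWaitMs",
                (b, v) -> b.maxCloseWaitMs(Integer.parseInt(v)));
        SETTERS.put("simulatedSessionExpirationPercent",
                (b, v) -> b.simulatedSessionExpirationPercent(Integer.parseInt(v)));
    }

    private PrefixedOptionsApplier() {}

    // Applies all prefixed options that have a mapping and returns how many were applied.
    // A prefixed option without a mapping is rejected, since there is no builder call for it.
    static int apply(Map<String, String> config, ClientBuilder builder) {
        int applied = 0;
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (!entry.getKey().startsWith(PREFIX)) {
                continue; // not a client option, ignore
            }
            String suffix = entry.getKey().substring(PREFIX.length());
            BiConsumer<ClientBuilder, String> setter = SETTERS.get(suffix);
            if (setter == null) {
                throw new IllegalArgumentException(
                        "No builder method mapped for option: " + entry.getKey());
            }
            setter.accept(builder, entry.getValue());
            applied++;
        }
        return applied;
    }
}
```

In this sketch, supporting one more option means adding one more entry to the table, which is the same per-option work as exposing the option explicitly.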