Status
Current state: Discarded
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-161-Configuration-through-environment-variables-td48094.html (discussion prior to FLIP)
JIRA: –
Released: –
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Flink currently requires configuration to be written to a file. Allowing environment variables to override this configuration makes it much more flexible. This is particularly useful in Kubernetes scenarios, where some of the configuration, e.g. access keys, can be defined through secrets exposed as environment variables. Flink can also benefit from this internally, since the mechanism provides an easy way to randomize end-to-end test configuration, see FLINK-19520.
Public Interfaces
No public interfaces apart from configuration are affected. The Flink configuration itself is not affected directly, only indirectly, because environment variables are allowed to override entries in the Flink configuration.
Proposed Changes
After the Flink configuration has been parsed from the file, environment variables (following a clearly defined naming schema, see below) are taken into consideration and are allowed to amend or even replace the parsed configuration. Allowing environment variables to take precedence over the configured values is a deliberate and important choice.
For example, with an environment
Code Block |
---|
FLINK_CONFIG_KEY_A="Environment=A"
FLINK_CONFIG_KEY_B="Environment=B" |
and a configuration file
Code Block |
---|
key.a: File=A
key.c: File=C |
the resulting, effective configuration would be equivalent to a configuration of
Code Block |
---|
key.a: Environment=A
key.c: File=C
key.b: Environment=B |
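The precedence rule in this example can be sketched in a few lines of Python. This is only an illustration, not Flink code: the function name `apply_environment_overrides` is hypothetical, and the environment entries are assumed to have already been translated into configuration keys (the naming schema is described in the next section).

```python
def apply_environment_overrides(file_config, env_config):
    """Return the effective configuration; entries from the environment win."""
    effective = dict(file_config)   # start from the values parsed from the file
    effective.update(env_config)    # the environment replaces or adds entries
    return effective


# Values taken from the example above.
file_config = {"key.a": "File=A", "key.c": "File=C"}
env_config = {"key.a": "Environment=A", "key.b": "Environment=B"}

effective = apply_environment_overrides(file_config, env_config)
```

The input dictionaries are left untouched, so the file-based configuration remains available if it is needed for diagnostics.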
Naming of environment variables
A key limitation is that environment variables cannot be required to be named exactly like their configuration key counterpart. This stems from two reasons:
- Some systems / shells do not support "." or "-" in variable names.
- Environment variables are idiomatically named using uppercase (e.g., $HOME and not $Home or $home), and are actually case-sensitive.
Due to this, a convention needs to be established on how configuration keys are looked up in the environment. We propose that environment variables for the Flink configuration follow the schema below, illustrated for the configuration key key.A-b:
- Prefix "FLINK_CONFIG_" →
FLINK_CONFIG_key.A-b
- Replace "." (period) with "_" (underscore) →
FLINK_CONFIG_key_A-b
- Replace "-" (dash) with "__" (double underscore) →
FLINK_CONFIG_key_A__b
- Uppercase →
FLINK_CONFIG_KEY_A__B
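The four steps above can be sketched as a small Python function. The name `config_key_to_env_var` is hypothetical and chosen for illustration only; the proposal itself does not prescribe an implementation.

```python
ENV_PREFIX = "FLINK_CONFIG_"


def config_key_to_env_var(key: str) -> str:
    """Map a configuration key to its environment variable name."""
    name = ENV_PREFIX + key          # 1. prefix
    name = name.replace(".", "_")    # 2. period -> underscore
    name = name.replace("-", "__")   # 3. dash -> double underscore
    return name.upper()              # 4. uppercase


env_name = config_key_to_env_var("key.A-b")
```

For the key used in the steps above, `env_name` is `FLINK_CONFIG_KEY_A__B`.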
This provides a (semi-)bijective function between environment variable name and configuration key. More specifically, it allows parsing configuration keys from the environment without having to have prior knowledge of available configuration keys. Given an environment, we can look for all environment variables starting with the FLINK_CONFIG_ prefix and map them to their configuration key counterpart by following the inverse procedure:
- Remove FLINK_CONFIG_ prefix →
KEY_A__B
- Replace "__" (double underscore) with "-" (dash) →
KEY_A-B
- Replace "_" (underscore) with "." (period) →
KEY.A-B
- Lowercase →
key.a-b
As we can see, this yields the original (intended) configuration key, with the only difference being the casing. Configuration currently treats keys case-sensitively, but we propose to relax this requirement and treat them case-insensitively during the lookup of a specific key.
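A minimal Python sketch of the inverse procedure, including the scan for all variables that carry the prefix, could look as follows. The function names are hypothetical, and the environment is passed in as a plain mapping so that the sketch does not depend on the real process environment.

```python
ENV_PREFIX = "FLINK_CONFIG_"


def env_var_to_config_key(name: str) -> str:
    """Map an environment variable name back to a (lowercased) configuration key."""
    key = name[len(ENV_PREFIX):]     # 1. remove the prefix
    key = key.replace("__", "-")     # 2. double underscore -> dash
    key = key.replace("_", ".")      # 3. underscore -> period
    return key.lower()               # 4. lowercase


def config_from_environment(environ):
    """Collect all prefixed variables without prior knowledge of valid keys."""
    return {
        env_var_to_config_key(name): value
        for name, value in environ.items()
        if name.startswith(ENV_PREFIX)
    }


environ = {"HOME": "/root", "FLINK_CONFIG_KEY_A__B": "Environment=A"}
overrides = config_from_environment(environ)
```

Here `overrides` contains the single entry `key.a-b`; unrelated variables such as `HOME` are ignored. In a real process, `os.environ` could be passed as the mapping.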
This mapping is not strictly bijective, but cases with consecutive periods or dashes in the key name are not considered here and should not (reasonably) be allowed. The implementation should therefore enforce this restriction as well, to prevent future development from running into such scenarios.
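To illustrate why such keys are problematic, the following Python sketch shows two different keys that map to the same environment variable, together with one possible validation check. The helper names are hypothetical, and the check rejects any run of two or more adjacent "." or "-" characters, which is an assumption about how the restriction might be enforced rather than part of the proposal.

```python
import re

ENV_PREFIX = "FLINK_CONFIG_"

# Two or more adjacent separators, e.g. "..", "--", ".-" or "-."
_ADJACENT_SEPARATORS = re.compile(r"[.-]{2,}")


def config_key_to_env_var(key: str) -> str:
    """Forward mapping as described in the naming schema."""
    return (ENV_PREFIX + key).replace(".", "_").replace("-", "__").upper()


def is_unambiguous_key(key: str) -> bool:
    """Return True if the key contains no adjacent separators."""
    return _ADJACENT_SEPARATORS.search(key) is None


# "a..b" and "a-b" collide: both become FLINK_CONFIG_A__B.
collision = config_key_to_env_var("a..b") == config_key_to_env_var("a-b")
```

With such a check in place, `is_unambiguous_key("a..b")` is false, so the colliding key would be rejected before it can be registered.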
Implementation Notes
This proposal affects two code paths:
- In GlobalConfiguration, the environment is parsed following the procedure above. Values found there override any values loaded from the configuration file.
- In Configuration, looking up a key in the internal data structure is changed to become case-insensitive.
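The case-insensitive lookup can be sketched as follows. This is a Python illustration only, not the Flink `Configuration` class: the class name `CaseInsensitiveConfig` is hypothetical, and normalizing keys to lowercase when they are stored is just one assumed way to achieve a case-insensitive lookup.

```python
class CaseInsensitiveConfig:
    """Key-value store whose lookups ignore the casing of the key."""

    def __init__(self, entries=None):
        self._entries = {}
        for key, value in (entries or {}).items():
            self.set(key, value)

    def set(self, key, value):
        # Normalize on write so that every lookup is a single dictionary access.
        self._entries[key.lower()] = value

    def get(self, key, default=None):
        return self._entries.get(key.lower(), default)


config = CaseInsensitiveConfig({"key.A-b": "File=A"})
config.set("key.a-b", "Environment=A")  # as parsed from FLINK_CONFIG_KEY_A__B
```

Because both spellings normalize to the same entry, `config.get("key.A-b")` returns the value from the environment. A drawback of this particular approach is that the original casing of the key is lost.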
Compatibility, Deprecation, and Migration Plan
No impact on existing users is expected. Hypothetically, if the environment happens to include a variable with a matching name, this change would cause a change in behavior. We consider this risk to be negligibly low due to the chosen naming schema, however.
Test Plan
These changes can be covered entirely through unit tests against Configuration and GlobalConfiguration.
Rejected Alternatives
Substitution
Initially, we also discussed a substitution solution where users modify their Flink configuration to use an environment variable substitution syntax such as
Code Block |
---|
key.a: ${ENV_VAR_A}
key.b: ${ENV_BAR_B} ms |
This approach has been rejected for several reasons:
- It changes the syntax of the configuration and would require additional details to be added to the syntax, i.e. to define default/fallback values and to escape variables so that they are not replaced.
- It requires the introduction of a new set of environment variables users have to memorize.
- If users want full flexibility to override any value, a configuration file would have to be maintained which simply maps all keys to some environment variable.
Lazy Evaluation
We initially proposed a naming schema for environment variables that closely follows how Spring does it, which includes several different alternatives per configuration key. However, this approach requires either complete knowledge of all configuration keys upfront, or lazy evaluation of the environment when a configuration key is looked up. During the discussion it was decided that neither approach seems favorable or very feasible.