Status

Current state: Under Discussion

Discussion thread: – (discussion prior to FLIP)

JIRA: –

Released: 

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Flink currently requires configuration to be written to a file. Allowing environment variables to override this configuration makes it much more flexible. This is particularly useful in Kubernetes scenarios, where some of the configuration, e.g. access keys, can be defined through secrets exposed as environment variables. Flink also benefits internally, as this mechanism provides an easy way to randomize end-to-end test configuration, see FLINK-19520.

The specific approach proposed here is inspired by, and largely follows, the design of the equivalent feature in the Spring framework. This provides confidence, as the feature has already been used extensively, and familiarity for developers who know both frameworks.

Public Interfaces

No public interfaces apart from configuration are affected. The Flink configuration itself is not affected directly, only indirectly, in that environment variables may override its entries.

Proposed Changes

After the Flink configuration has been parsed from the file, environment variables (following a clearly defined naming schema, see below) are taken into consideration and are allowed to amend or even replace the parsed configuration. Allowing environment variables to take precedence over the configured values is a deliberate and important choice.

For example, with an environment

KEY_A="Environment=A"
KEY_B="Environment=B"

and a configuration file

key.a: File=A
key.c: File=C

the resulting, effective configuration would be equivalent to a configuration of 

key.a: Environment=A
key.c: File=C
key.b: Environment=B
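
The precedence rule in this example can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class PrecedenceExample and its methods are hypothetical and not part of Flink, the environment is passed in as a plain map rather than read from the process, and the name mapping is reduced to the single uppercase-with-underscores form (the full lookup order is described under "Naming of environment variables").

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of Flink.
public class PrecedenceExample {

    /** Simplified name mapping used only in this example: "key.a" becomes "KEY_A". */
    static String envName(String key) {
        return key.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    /** The environment value wins; otherwise the file value is used (null if neither exists). */
    static String resolve(String key, Map<String, String> file, Map<String, String> env) {
        String fromEnv = env.get(envName(key));
        return fromEnv != null ? fromEnv : file.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("KEY_A", "Environment=A");
        env.put("KEY_B", "Environment=B");

        Map<String, String> file = new HashMap<>();
        file.put("key.a", "File=A");
        file.put("key.c", "File=C");

        // Prints:
        // key.a: Environment=A
        // key.b: Environment=B
        // key.c: File=C
        for (String key : new String[] {"key.a", "key.b", "key.c"}) {
            System.out.println(key + ": " + resolve(key, file, env));
        }
    }
}
```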

Naming of environment variables

A key limitation is that environment variables cannot be required to be named exactly like their configuration key counterpart. This stems from two reasons:

  1. Some systems / shells do not support "." or "-" in variable names.
  2. Environment variables are idiomatically named using uppercase (e.g., $HOME and not $Home or $home), and are actually case-sensitive.

Due to this, a convention needs to be established for how configuration keys are looked up in the environment. We propose that each configuration key (example: key.A-b) is looked up in the following ways, in this order, stopping at the first match:

  1. key.A-b  (no change)
  2. key_A-b  (periods → underscores)
  3. key.A_b  (hyphens → underscores)
  4. key_A_b  (periods + hyphens → underscores)
  5. KEY.A-B  (uppercase)
  6. KEY_A-B  (uppercase, periods → underscores)
  7. KEY.A_B  (uppercase, hyphens → underscores)
  8. KEY_A_B  (uppercase, periods + hyphens → underscores)

As motivated earlier, this follows the same specification as Spring.
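
The lookup order listed above could be generated along the following lines. This is a minimal sketch, not the proposed implementation: the class EnvKeyCandidates and its methods are hypothetical names, and the environment is passed in as a map so the logic can be exercised without touching the real process environment. Duplicate candidates (for keys without periods or hyphens) are dropped while preserving order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; not part of Flink.
public class EnvKeyCandidates {

    /** Returns the environment variable names to try for a key, in lookup order. */
    static List<String> candidates(String key) {
        // LinkedHashSet keeps insertion order and removes duplicate names.
        Set<String> names = new LinkedHashSet<>();
        for (String base : new String[] {key, key.toUpperCase(Locale.ROOT)}) {
            names.add(base);                                     // no change
            names.add(base.replace('.', '_'));                   // periods -> underscores
            names.add(base.replace('-', '_'));                   // hyphens -> underscores
            names.add(base.replace('.', '_').replace('-', '_')); // both
        }
        return new ArrayList<>(names);
    }

    /** Returns the value of the first candidate name present in the given environment. */
    static Optional<String> lookup(String key, Map<String, String> env) {
        for (String name : candidates(key)) {
            String value = env.get(name);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

For the key key.A-b, candidates returns the eight names listed above in the same order; calling lookup(key, System.getenv()) would apply them to the actual environment.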

Implementation Notes

Environment variables are evaluated lazily when the configuration option is requested. This is necessary as during parsing of the file there is no global knowledge of supported keys, and eagerly looking up all of them would likely lead to many unnecessary lookups. It is thus proposed to make use of Configuration#getRawValue to intercept querying a configuration parameter and perform the lookups described earlier. If a match is found, it should be cached such that further queries against the same key do not cause additional lookups.

Since Configuration is used for various configurations and not just the Flink configuration, it should receive a flag that enables environment lookups (disabled by default, enabled only in GlobalConfiguration).
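
To illustrate the lazy lookup, caching, and opt-in flag described in these notes, here is a minimal standalone sketch. It deliberately does not use or extend Flink's Configuration class: LazyEnvConfig, its constructor parameters, and its getRawValue method are hypothetical stand-ins, and the environment is injected as a function (System::getenv in production use) so that the behavior can be unit-tested.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a configuration class; not Flink's Configuration.
public class LazyEnvConfig {
    private final Map<String, String> fileValues;
    private final boolean envLookupEnabled;     // assumed flag, disabled by default in the proposal
    private final Function<String, String> env; // e.g. System::getenv
    private final Map<String, String> envCache = new ConcurrentHashMap<>();

    LazyEnvConfig(Map<String, String> fileValues, boolean envLookupEnabled,
                  Function<String, String> env) {
        this.fileValues = fileValues;
        this.envLookupEnabled = envLookupEnabled;
        this.env = env;
    }

    /** Resolves a key on request: cached environment match, then environment, then file. */
    Optional<String> getRawValue(String key) {
        if (envLookupEnabled) {
            String cached = envCache.get(key);
            if (cached != null) {
                return Optional.of(cached);
            }
            String fromEnv = lookupInEnv(key);
            if (fromEnv != null) {
                envCache.put(key, fromEnv); // only matches are cached
                return Optional.of(fromEnv);
            }
        }
        return Optional.ofNullable(fileValues.get(key));
    }

    /** Tries the candidate names in the proposed order and returns the first match, or null. */
    private String lookupInEnv(String key) {
        Set<String> names = new LinkedHashSet<>();
        for (String base : new String[] {key, key.toUpperCase(Locale.ROOT)}) {
            names.add(base);
            names.add(base.replace('.', '_'));
            names.add(base.replace('-', '_'));
            names.add(base.replace('.', '_').replace('-', '_'));
        }
        for (String name : names) {
            String value = env.apply(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

In this sketch, keys without an environment match are looked up again on every request; whether misses should be cached as well is a design choice the notes above leave open.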

Compatibility, Deprecation, and Migration Plan

No impact on existing users is expected. Hypothetically, if the environment happens to include a variable with a matching name, this change would cause a change in behavior. We consider this risk to be negligibly low, however.

Test Plan

This feature can be covered entirely through unit tests against Configuration.

Rejected Alternatives

Initially, we also discussed a substitution solution where users modify their Flink configuration to use an environment variable substitution syntax such as 

key.a: ${ENV_VAR_A}
key.b: ${ENV_VAR_B} ms


This approach has been rejected for several reasons:

  1. It changes the syntax of the configuration and would require additional details to be added to the syntax, i.e. to define default/fallback values and to escape variables so that they are not replaced.
  2. It requires the introduction of a new set of environment variables users have to memorize.
  3. If users want full flexibility to override any value, a configuration file would have to be maintained which simply maps all keys to some environment variable.