Reason: Out of scope of Flink.

Status

Current state: Rejected

Discussion thread: https://lists.apache.org/thread/n8omkpjf1mk9jphx38b8tfrs4h3nxo3z


Released: <Flink Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

As requested by many Flink users in the community, it would be very helpful to be able to change the level of specific loggers dynamically at runtime, without a restart. Issues and bugs that cannot easily be reproduced locally could then be diagnosed with the help of additional logs.

The level can already be changed if the cluster runs in an environment where log4j2 is used and log4j.properties can be modified, for example on Kubernetes with the log4j.properties file backed by a ConfigMap. However, there are still cases where the log configuration file cannot be changed; a YARN deployment is a good example.

Public Interfaces

REST APIs

Change the logging level of a logger

URL: /logconfig

Verb: POST

Request Payload Example:

Set the level of the logger "org.apache.flink.runtime" to DEBUG.

request.json
{
	"loggerName": "org.apache.flink.runtime",
	"level": "DEBUG"
}

Or reset the level to its original one.

request.json
{
	"loggerName": "org.apache.flink.runtime",
	"level": null
}

Response Code: 200 OK

Response Payload: empty
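A client could call the endpoint roughly as follows. This is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11+). It assumes the REST endpoint is reachable at http://localhost:8081 (Flink's default rest.port); the class and method names are illustrative only, and the /logconfig endpoint itself was only proposed here, not shipped.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LogConfigClientSketch {

    /** Builds the request.json payload; a null level means "revert to the original level". */
    public static String payload(String loggerName, String level) {
        String levelJson = level == null ? "null" : "\"" + escape(level) + "\"";
        return "{\"loggerName\":\"" + escape(loggerName) + "\",\"level\":" + levelJson + "}";
    }

    // Minimal JSON string escaping; sufficient for typical logger and level names.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a POST /logconfig request against the given REST endpoint (assumed address). */
    public static HttpRequest request(URI restEndpoint, String loggerName, String level) {
        return HttpRequest.newBuilder(restEndpoint.resolve("/logconfig"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload(loggerName, level)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = request(URI.create("http://localhost:8081"), "org.apache.flink.runtime", "DEBUG");
        HttpResponse<String> resp =
                HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        // The proposal specifies 200 OK with an empty response payload.
        System.out.println(resp.statusCode());
    }
}
```

Passing null as the level produces the second request.json shown above, which reverts the logger to its original level.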

Proposed Changes

In addition to the public REST API described above, the following two RPC methods are introduced to support this feature.

RPCs

ResourceManagerGateway

ResourceManagerGateway.java
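import java.util.concurrent.CompletableFuture;

import javax.annotation.Nullable;

// LogLevel is the log level type introduced by this proposal (see the level mapping table).
public interface ResourceManagerGateway {

    // ... existing methods omitted ...
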
    /**
     * Changes the level of the logger at runtime.
     *
     * <p>By providing a {@code null} LogLevel, the previously-changed level is reverted to its
     * original value.
     *
     * @param loggerName the name of the logger
     * @param level the log level
     * @return future which is completed exceptionally if the operation fails
     */
    CompletableFuture<Void> changeLogLevel(String loggerName, @Nullable LogLevel level);
}

This RPC method is called by the handler that serves the /logconfig request. When it is called, it does two things:

  • It changes the logging level on this job manager.
  • It broadcasts the change to all task managers currently registered with this job manager by calling the RPC method TaskExecutorGateway.changeLogLevel described below.
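The two steps above could be sketched as follows. This is a hedged illustration, not Flink code: TaskExecutorLogGateway and LocalLogLevelSetter are hypothetical stand-ins for TaskExecutorGateway and the local logging abstraction, and the level is simplified to a String where null means "revert". The returned future fails if the local change or any of the remote calls fails, matching the documented contract of changeLogLevel.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class LogLevelBroadcastSketch {

    /** Hypothetical stand-in for the proposed TaskExecutorGateway#changeLogLevel RPC. */
    public interface TaskExecutorLogGateway {
        CompletableFuture<Void> changeLogLevel(String loggerName, String level);
    }

    /** Hypothetical stand-in for the logging abstraction on the job manager. */
    public interface LocalLogLevelSetter {
        void setLogLevel(String loggerName, String level);
    }

    /** Applies the change locally, then forwards it to every registered task manager. */
    public static CompletableFuture<Void> changeLogLevel(
            String loggerName,
            String level, // null means "revert to the original level"
            LocalLogLevelSetter local,
            Collection<? extends TaskExecutorLogGateway> registeredTaskExecutors) {
        try {
            // Step 1: change the level on this job manager.
            local.setLogLevel(loggerName, level);
        } catch (RuntimeException e) {
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed;
        }
        // Step 2: broadcast to all currently registered task managers.
        List<CompletableFuture<Void>> remoteCalls = new ArrayList<>();
        for (TaskExecutorLogGateway gateway : registeredTaskExecutors) {
            remoteCalls.add(gateway.changeLogLevel(loggerName, level));
        }
        // Completes when all calls complete; completes exceptionally if any of them fails.
        return CompletableFuture.allOf(remoteCalls.toArray(new CompletableFuture[0]));
    }
}
```

Task managers that register after the call are not updated, which is consistent with the limitations listed later in this document.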

TaskExecutorGateway

TaskExecutorGateway.java
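import java.util.concurrent.CompletableFuture;

import javax.annotation.Nullable;

public interface TaskExecutorGateway {

    // ... existing methods omitted ...

    /**
     * Changes the level of the logger at runtime.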
     *
     * <p>By providing a {@code null} LogLevel, the previously-changed level is reverted to its
     * original value.
     *
     * @param loggerName the name of the logger
     * @param level the log level
     * @return future which is completed exceptionally if the operation fails
     */
    CompletableFuture<Void> changeLogLevel(String loggerName, @Nullable LogLevel level);
}

When this method is called by the ResourceManager, it only changes the logging level on this task manager.

Logging Abstraction

The remaining question is how the logging level is actually changed on the job manager and the task managers. Although log4j2 is the default logging implementation included in the distribution, other logging frameworks, including log4j1 and logback, are also covered by the "How to use logging" documentation. An abstraction is therefore needed so that Flink does not depend directly on a particular logging implementation. The following interface and classes are introduced so that they not only cover the currently supported frameworks but also allow other logging frameworks, including user-defined ones, to be plugged in.

[Figure: logging abstraction class diagram]

As illustrated by the class diagram above, the interface LoggingProvider is introduced, whose setLogLevel() method is implemented differently for each logging implementation. The isEnabled() method is invoked during initialization: all LoggingProvider implementations registered via the Java service loading facility are tested in turn, and only the first enabled one is used. A subclass of Slf4jLoggingProvider is considered enabled if the factory class name returned by StaticLoggerBinder.getSingleton().getLoggerFactoryClassStr() matches the logger factory class of its logging implementation. If none is enabled, NoOpLoggingProvider is used, and a warning is printed both during initialization and whenever setLogLevel() is called.
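The selection rule can be sketched with java.util.ServiceLoader. This is a simplified illustration under stated assumptions: the nested LoggingProvider interface only mirrors the two methods named above, the level is reduced to a String, and the warning is printed to stderr rather than through a logger. The Slf4jLoggingProvider factory-name check is left to each provider's isEnabled() and is not shown.

```java
import java.util.ServiceLoader;

public class LoggingProviderSelection {

    /** Simplified mirror of the proposed LoggingProvider interface. */
    public interface LoggingProvider {
        /** Whether this provider matches the logging backend found at runtime. */
        boolean isEnabled();

        /** Changes (or, for a null level, reverts) the level of the named logger. */
        void setLogLevel(String loggerName, String level);
    }

    /** Fallback used only when no registered provider is enabled. */
    public static final class NoOpLoggingProvider implements LoggingProvider {
        @Override
        public boolean isEnabled() {
            return false; // never chosen by discovery; used only as the explicit fallback
        }

        @Override
        public void setLogLevel(String loggerName, String level) {
            System.err.println("WARN: no enabled LoggingProvider; ignoring level change for " + loggerName);
        }
    }

    /** Returns the first enabled provider in iteration order, or a NoOpLoggingProvider. */
    public static LoggingProvider select(Iterable<? extends LoggingProvider> candidates) {
        for (LoggingProvider provider : candidates) {
            if (provider.isEnabled()) {
                return provider;
            }
        }
        System.err.println("WARN: no enabled LoggingProvider found; log level changes are disabled");
        return new NoOpLoggingProvider();
    }

    /** Discovers providers registered under META-INF/services and selects one. */
    public static LoggingProvider loadFromServiceLoader() {
        return select(ServiceLoader.load(LoggingProvider.class));
    }
}
```

Separating select() from the ServiceLoader call keeps the "first enabled wins" rule testable without registering service files.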

Supported Log Levels and Their Mappings

The following table maps each supported level to the native level of each supported logging implementation.

Level | Log4j 2 | Logback | java.util.logging
------|---------|---------|------------------
TRACE | org.apache.logging.log4j.Level.TRACE | ch.qos.logback.classic.Level.TRACE | java.util.logging.Level.FINEST
DEBUG | org.apache.logging.log4j.Level.DEBUG | ch.qos.logback.classic.Level.DEBUG | java.util.logging.Level.FINE
INFO | org.apache.logging.log4j.Level.INFO | ch.qos.logback.classic.Level.INFO | java.util.logging.Level.INFO
WARN | org.apache.logging.log4j.Level.WARN | ch.qos.logback.classic.Level.WARN | java.util.logging.Level.WARNING
ERROR | org.apache.logging.log4j.Level.ERROR | ch.qos.logback.classic.Level.ERROR | java.util.logging.Level.SEVERE
SEVERE | org.apache.logging.log4j.Level.ERROR | ch.qos.logback.classic.Level.ERROR | java.util.logging.Level.SEVERE
OFF | org.apache.logging.log4j.Level.OFF | ch.qos.logback.classic.Level.OFF | java.util.logging.Level.OFF
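As an illustration of one column of the table, a java.util.logging provider might look like the sketch below. java.util.logging is chosen only because it needs no external dependency; the LogLevel enum and class name are hypothetical. The sketch also shows the "null reverts" semantics by remembering each logger's original level. (A Log4j 2 provider could presumably use org.apache.logging.log4j.core.config.Configurator.setLevel(String, Level) instead, but that is not shown.)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JulLoggingProviderSketch {

    /** Hypothetical level enum, one constant per row of the mapping table. */
    public enum LogLevel { TRACE, DEBUG, INFO, WARN, ERROR, SEVERE, OFF }

    /** Maps a LogLevel to its java.util.logging counterpart, as listed in the table. */
    public static Level toJulLevel(LogLevel level) {
        switch (level) {
            case TRACE:
                return Level.FINEST;
            case DEBUG:
                return Level.FINE;
            case INFO:
                return Level.INFO;
            case WARN:
                return Level.WARNING;
            case ERROR:
            case SEVERE:
                return Level.SEVERE;
            case OFF:
                return Level.OFF;
            default:
                throw new IllegalArgumentException("Unsupported level: " + level);
        }
    }

    // Strong references keep changed loggers alive; JUL only holds weak references,
    // so an unreferenced logger (and its level) could otherwise be garbage-collected.
    private final Map<String, Logger> changedLoggers = new HashMap<>();
    // Level each logger had before its first change; null means "inherited from parent".
    private final Map<String, Level> originalLevels = new HashMap<>();

    /** Sets the level of the named logger, or reverts it to its original level if level is null. */
    public synchronized void setLogLevel(String loggerName, LogLevel level) {
        if (level == null) {
            Logger logger = changedLoggers.remove(loggerName);
            if (logger != null) {
                logger.setLevel(originalLevels.remove(loggerName));
            }
            return;
        }
        Logger logger = changedLoggers.computeIfAbsent(loggerName, name -> {
            Logger created = Logger.getLogger(name);
            originalLevels.put(name, created.getLevel()); // remember the original level once
            return created;
        });
        logger.setLevel(toJulLevel(level));
    }
}
```

Recording the original level only on the first change means that repeated changes followed by a single null call still restore the level the logger had before any runtime change.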

Limitations

As an MVP (Minimum Viable Product), this improvement does not support features such as limiting the scope (e.g. to the job manager or the task managers only) or setting a timer. The level of the target logger is changed cluster-wide, on the leading job manager and all currently registered task managers. To revert a previously changed level, pass a null level.

Another limitation is that, for simplicity, log level changes are not persisted. Because the settings are not stored in the HighAvailabilityServices, the level cannot be changed on standby (follower) job managers either. If a task manager (re-)joins or a standby job manager becomes the leader, the log levels on these processes remain unchanged and are only updated by the next call.

Compatibility, Deprecation, and Migration Plan

This change does not affect compatibility and requires no deprecation or migration.

Test Plan

Besides unit tests, system tests will be included to cover the cases where different logging providers are used or no logging provider is enabled.

Rejected Alternatives

N/A