...

  1. Start a Connect cluster with three workers
    1. Ensure that the last modified timestamp for all reported logging namespaces is null
  2. Modify the logging level for a specific namespace for a single worker
    1. Ensure that the last modified timestamp for that namespace on the affected worker is non-null and at least as recent as the time at which the request was issued (some margin of error may be necessary in the highly unlikely but technically possible event that the clocks of the node running the tests and the node running the worker are skewed)
    2. Ensure that the logging level for that namespace on the affected worker is reported (via the admin REST API) with the correct level
    3. Ensure that the last modified timestamp for that namespace on all other workers is still null
    4. Ensure that the logging level for that namespace on all other workers is unchanged
  3. Modify the logging level for a specific namespace for all workers (using scope=cluster)
    1. Ensure that, after a reasonable timeout, the logging level for that namespace on all workers is reported with the correct level
    2. Ensure that the last modified timestamp for that namespace on all workers is non-null and at least as recent as the time at which the request was issued
  4. Modify the logging level for the root namespace for all workers (using scope=cluster)
    1. Ensure that, after a reasonable timeout, the logging level for all reported namespaces on all workers is reported with the correct level
    2. Ensure that the last modified timestamp for all namespaces on all workers is non-null and at least as recent as the time at which the request was issued
  5. Modify the logging level for a specific namespace for a single worker (again)
    1. Ensure that the last modified timestamp for that namespace on the affected worker is at least as recent as the time at which the request was issued
    2. Ensure that the logging level for that namespace on the affected worker is reported with the correct level
    3. Ensure that the last modified timestamps for all namespaces except the modified namespace on the affected worker, and for all namespaces on all other workers, are unchanged since the root level was modified for all workers*
    4. Ensure that the logging levels for all namespaces except the modified namespace on the affected worker, and for all namespaces on all other workers, are unchanged since the root level was modified for all workers*
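The two kinds of check that recur in the plan above ("at least as recent as the time at which the request was issued", with a margin for clock skew, and "unchanged since the root level was modified") could be expressed as small test helpers. The following is a minimal Java sketch under stated assumptions: the class name, the use of epoch milliseconds for timestamps, and the idea of comparing per-worker snapshots taken as maps from namespace to level or timestamp are all hypothetical, not part of this proposal.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helpers for the checks in the test plan; not part of Kafka Connect.
final class LoggingChecks {

    private LoggingChecks() { }

    /**
     * True if the worker-reported timestamp (assumed to be epoch millis) is non-null
     * and no earlier than the time the request was issued, minus a margin for clock
     * skew between the node running the tests and the node running the worker.
     */
    static boolean isRecentEnough(Long lastModifiedMs, long requestIssuedMs, Duration skewMargin) {
        if (lastModifiedMs == null) {
            return false; // namespace has never been modified on this worker
        }
        return lastModifiedMs >= requestIssuedMs - skewMargin.toMillis();
    }

    /**
     * True if every namespace other than {@code modified} maps to the same value
     * (level or last modified timestamp) in both snapshots. Passing null for
     * {@code modified} requires every namespace to be unchanged.
     */
    static <V> boolean unchangedExcept(Map<String, V> before, Map<String, V> after, String modified) {
        Set<String> namespaces = new HashSet<>(before.keySet());
        namespaces.addAll(after.keySet());
        namespaces.remove(modified);
        for (String namespace : namespaces) {
            if (!Objects.equals(before.get(namespace), after.get(namespace))) {
                return false;
            }
        }
        return true;
    }
}
```

For step 5, a test would snapshot every worker after step 4 and again after step 5, then call `unchangedExcept` with the modified namespace for the affected worker and with `null` for every other worker.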

...

No efforts will be made to verify the actual contents of the logs for any workers. KIP-495 was published several years ago and has proven to be effective; since we anticipate that the logic for reading/writing log levels will be largely preserved, it should be enough to rely on the API for querying the Kafka Connect-reported levels of logging namespaces.
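Querying and modifying levels goes through the admin REST API from KIP-495. As a hedged sketch, the requests a test might issue could be built with the JDK's `java.net.http` client as follows. The worker URL, the helper class name, and the exact JSON body are illustrative assumptions; the `scope=cluster` query parameter is the one named in the test plan. The requests are only constructed here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only builds requests and does not send them.
final class LoggerRequests {

    private LoggerRequests() { }

    /** GET /admin/loggers/{namespace}: the reported level of a single namespace. */
    static HttpRequest getLevel(URI worker, String namespace) {
        return HttpRequest.newBuilder(worker.resolve("/admin/loggers/" + encode(namespace)))
                .GET()
                .build();
    }

    /** PUT /admin/loggers/{namespace}, optionally with scope=cluster to target every worker. */
    static HttpRequest setLevel(URI worker, String namespace, String level, boolean clusterScope) {
        String path = "/admin/loggers/" + encode(namespace) + (clusterScope ? "?scope=cluster" : "");
        String body = "{\"level\": \"" + level + "\"}"; // assumed body shape
        return HttpRequest.newBuilder(worker.resolve(path))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // URLEncoder performs form encoding; this is adequate for typical dotted logger names.
    private static String encode(String namespace) {
        return URLEncoder.encode(namespace, StandardCharsets.UTF_8);
    }
}
```

Sending one of these would then be a matter of `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.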

A system test is used here instead of one or more integration tests because the latter colocate workers within the same JVM, making it difficult to distinguish between changes to the logging levels of a single worker and the whole cluster.

Rejected Alternatives

Request-time modified tracking

Instead of tracking the last modified timestamp for a logging namespace based on when it was applied by a worker, we could track it by when the request was received, or when it was written to the config topic. This would provide at least one advantage: assuming all workers are caught up on the config topic, every worker would give the exact same response for requests to view the levels of loggers. However, it would also be less accurate: users may be dismayed to see that the logging level for a given namespace had a last modified time of T, but that the actual level of logs emitted by that worker for that namespace was different until time T+n, for some non-negative number n.
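The difference between the two approaches comes down to which timestamp a worker stores when it applies a change. The Java sketch below is hypothetical: `LevelChange` and `WorkerLoggingState` are illustrative names, and the actual reconfiguration of the logging framework is omitted. It stamps each namespace with the worker's own clock at apply time, as described for this proposal, and notes where the rejected request-time alternative would differ.

```java
import java.time.Clock;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: a level change as a worker might read it from the config topic,
// carrying the time at which the request was received.
record LevelChange(String namespace, String level, long requestedAtMs) { }

// Hypothetical per-worker view of logging levels and last modified timestamps.
final class WorkerLoggingState {
    private final Clock clock; // the worker's local clock
    private final Map<String, String> levels = new HashMap<>();
    private final Map<String, Long> lastModifiedMs = new HashMap<>();

    WorkerLoggingState(Clock clock) {
        this.clock = clock;
    }

    void apply(LevelChange change) {
        // A real worker would reconfigure the logger here (omitted).
        levels.put(change.namespace(), change.level());
        // Apply-time tracking: stamp the namespace once the new level is in effect.
        // The rejected alternative would store change.requestedAtMs() instead, which
        // is identical across workers but can predate the actual level change.
        lastModifiedMs.put(change.namespace(), clock.millis());
    }

    String level(String namespace) {
        return levels.get(namespace);
    }

    /** Null if the namespace has never been modified on this worker. */
    Long lastModified(String namespace) {
        return lastModifiedMs.get(namespace);
    }
}
```

With a change requested at time T and applied at T+n, this sketch reports T+n. That timestamp is accurate for the worker, but it can differ between workers.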

Versioned request format

In order to work better with tools that use strict deserialization, we could add either opt-out or opt-in logic for receiving responses in the newly-proposed format (i.e., with the last modified timestamp) from endpoints that provide levels for logging namespaces. This could come, for example, in the form of a new request header that dictates which version of the API the client expects.

This change may be slightly smoother for users, but would come with some significant costs:

  • Higher maintenance burden: we would have to be able to serve requests that expect both kinds of response format
  • Setting an expensive precedent for the Kafka Connect REST API: unless absolutely necessary, we should encourage consumers of the API to tolerate unknown fields, which gives us the flexibility to make future changes that only add new fields
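The behavior we would encourage from consumers of the API can be sketched without any JSON library. In the hypothetical Java example below, a parsed response is modeled as a `Map<String, Object>`. The field name `last_modified` and the class name are assumptions for illustration. A strict reader rejects the added field, and a tolerant reader ignores it.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical client-side readers for a logger-level response.
final class LoggerLevelReader {

    private LoggerLevelReader() { }

    // The only field this (older) client knows about.
    private static final Set<String> KNOWN_FIELDS = Set.of("level");

    /** Strict: fails on any unrecognized field, so it breaks when new fields are added. */
    static String readStrict(Map<String, Object> json) {
        for (String field : json.keySet()) {
            if (!KNOWN_FIELDS.contains(field)) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
        }
        return readTolerant(json);
    }

    /** Tolerant: reads the fields it understands and ignores everything else. */
    static String readTolerant(Map<String, Object> json) {
        Object level = json.get("level");
        if (!(level instanceof String)) {
            throw new IllegalArgumentException("Missing or non-string 'level' field");
        }
        return (String) level;
    }
}
```

Clients that use Jackson can get the tolerant behavior by disabling `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES` or by annotating the target class with `@JsonIgnoreProperties(ignoreUnknown = true)`.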

...

This proposal introduces a second kind of record to the config topic that's used for cluster-wide communication and is meant to be ignored by any workers brought up after it has been written (the first kind being the one added in KIP-745). Records of this kind run the risk of flooding the config topic with many records that, due to the compacted nature of the topic, will never be discarded, leading to a monotonically growing topic.

...

However, we may still want to invest some time in cleanup logic for the config topic, where "control records" like the ones proposed here and introduced in KIP-745 are followed up with corresponding tombstone records after enough time has elapsed, so that when compaction takes place, they are effectively removed from the topic. These tombstones could be emitted after a fixed delay has elapsed, or after a rebalance has taken place (since every worker reports its current offset in the config topic).
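The two tombstone policies mentioned above differ only in how they decide which control records are safe to remove. The Java sketch below is hypothetical: `ControlRecord` and `ControlRecordCleanup` are illustrative names, and it assumes that each worker's reported config topic offset is the next offset that worker will read. Under that assumption, any record strictly below the minimum reported offset has been read by every current worker.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical view of a control record in the config topic.
record ControlRecord(String key, long offset, long timestampMs) { }

final class ControlRecordCleanup {

    private ControlRecordCleanup() { }

    /** Fixed-delay policy: keys of control records older than the given delay. */
    static List<String> keysPastDelay(Collection<ControlRecord> records, long nowMs, long delayMs) {
        List<String> keys = new ArrayList<>();
        for (ControlRecord r : records) {
            if (nowMs - r.timestampMs() >= delayMs) {
                keys.add(r.key());
            }
        }
        return keys;
    }

    /**
     * Rebalance-based policy: keys of control records below the minimum offset reported
     * by the workers (assumed to be each worker's next offset to read).
     */
    static List<String> keysReadByAllWorkers(Collection<ControlRecord> records,
                                             Collection<Long> reportedOffsets) {
        List<String> keys = new ArrayList<>();
        if (reportedOffsets.isEmpty()) {
            return keys; // no information about worker progress; tombstone nothing
        }
        long minOffset = Collections.min(reportedOffsets);
        for (ControlRecord r : records) {
            if (r.offset() < minOffset) {
                keys.add(r.key());
            }
        }
        return keys;
    }
}
```

A tombstone (a record with the same key and a null value) would then be written for each returned key, and compaction would eventually remove both.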