Status
Current state: Accepted
Discussion thread: here
Voting thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
Modification times will be tracked in-memory and determined by when they are applied by the worker, as opposed to when they are requested by the user or persisted to the config topic (details below). If no modifications to the namespace have been made since the worker finished startup, the timestamp will be null.
The `GET /admin/loggers` endpoint will have this new response format, where `${last_modified}` is the last modified timestamp:
...
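As a purely illustrative sketch (the namespace and timestamp values here are invented, and the real KIP response body is elided above), an entry carrying the new field might look like:

```json
{
  "org.apache.kafka.connect.runtime": {
    "level": "INFO",
    "last_modified": 1689096197039
  }
}
```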
Workers that have not yet completed startup will ignore these records, including if the worker reads one during the read-to-end of the config topic that all workers perform during startup.
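The record-handling rule above can be sketched in a few lines. This is a hypothetical model, not Connect's actual implementation: the class and method names are invented, and the timestamp bookkeeping simply mirrors the "applied by the worker" semantics described earlier.

```python
import time

class WorkerLoggingState:
    """Toy model of how a worker might treat cluster-wide logging records."""

    def __init__(self):
        self.completed_startup = False
        self.levels = {}          # namespace -> logging level
        self.last_modified = {}   # namespace -> epoch millis; absent means null

    def finish_startup(self):
        self.completed_startup = True

    def on_logging_record(self, namespace, level):
        # Workers that have not yet completed startup ignore these records,
        # including any read during the startup read-to-end of the config topic.
        if not self.completed_startup:
            return False
        # The timestamp reflects when the worker *applies* the change,
        # not when the user requested it or when it was persisted.
        self.levels[namespace] = level
        self.last_modified[namespace] = int(time.time() * 1000)
        return True

worker = WorkerLoggingState()
worker.on_logging_record("org.apache.kafka.connect", "DEBUG")  # ignored
worker.finish_startup()
worker.on_logging_record("org.apache.kafka.connect", "DEBUG")  # applied
```

Note that a record ignored during startup is simply dropped; the worker does not replay it later, which is why the timestamp stays null until a post-startup record is applied.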
Restarting a worker will cause it to discard all cluster-wide dynamic log level adjustments, and revert to the levels specified in its Log4j configuration. This mirrors the current behavior with per-worker dynamic log level adjustments.
There may be some delay between when a REST request with `scope=cluster` is received and when all workers have read the corresponding record from the config topic. The last modified timestamp (details above) can serve as a rudimentary tool to provide insight into the propagation of a cluster-wide log level adjustment.
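To illustrate the propagation check described above: compare each worker's reported `last_modified` for the namespace against the time the cluster-scoped request was issued. The helper below is a sketch only; the worker IDs are invented, and fetching each worker's `GET /admin/loggers` response is assumed to happen elsewhere.

```python
def unpropagated_workers(reported, namespace, request_time_ms):
    """Return the workers whose last_modified for `namespace` predates the request.

    `reported` maps worker id -> the parsed JSON body of that worker's
    GET /admin/loggers response (namespace -> {"level": ..., "last_modified": ...}).
    """
    lagging = []
    for worker_id, loggers in reported.items():
        entry = loggers.get(namespace, {})
        ts = entry.get("last_modified")  # null until the worker applies the change
        if ts is None or ts < request_time_ms:
            lagging.append(worker_id)
    return lagging

responses = {
    "worker-1": {"org.apache.kafka.connect": {"level": "DEBUG", "last_modified": 1700000000500}},
    "worker-2": {"org.apache.kafka.connect": {"level": "INFO", "last_modified": None}},
}
print(unpropagated_workers(responses, "org.apache.kafka.connect", 1700000000000))
# -> ['worker-2']
```

Because worker clocks may be skewed relative to the client issuing the request, this is a rough signal rather than an exact propagation barrier.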
Standalone mode
Given that standalone mode by definition only supports one worker, this feature does not seem applicable on the surface, and no changes will be made to the underlying dynamic log adjustment logic. However, for the sake of consistency with distributed mode, the `scope` query parameter will still be recognized and, if set to `cluster`, will cause a 204 response with no body to be returned.
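The scope-routing behavior described in this KIP can be summarized in a small sketch. The function name, return shape, and namespace below are invented for illustration; only the status-code behavior (200 with a body for worker scope, 204 with no body for cluster scope, and a warning plus worker-scoped behavior for unrecognized scopes) comes from the proposal.

```python
import logging

logger = logging.getLogger(__name__)

def handle_put_logger(scope):
    """Hypothetical dispatch for PUT /admin/loggers/{logger}; returns (status, body)."""
    if scope in (None, "worker"):
        # Pre-KIP behavior: apply locally and return the affected loggers.
        return 200, ["org.apache.kafka.connect"]
    if scope == "cluster":
        # Recognized for consistency with distributed mode; responds with
        # 204 and no body.
        return 204, None
    # Unrecognized scopes fall back to worker-scoped behavior, with a warning.
    logger.warning("Ignoring unrecognized scope %r", scope)
    return 200, ["org.apache.kafka.connect"]
```

For example, `handle_put_logger("cluster")` yields `(204, None)`, while an unknown scope such as `"datacenter"` behaves like a worker-scoped request.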
Compatibility, Deprecation, and Migration Plan
Setting logging levels
Existing behavior is preserved as the default for this API. The proposed feature is only available on an opt-in basis.
Getting logging levels
By adding the new `last_modified` field to the response format for these endpoints, we introduce some risk of breaking existing tooling that works with the Kafka Connect REST API. If these tools perform strict deserialization of JSON responses, the new field (which will be unrecognized) will cause failures. These tools will need to be updated to either ignore unrecognized fields or account for the new field.
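One way such tooling can tolerate the new field is to project each response entry onto the set of fields it already understands. This is a sketch of the mitigation, not code from any existing tool; the field names match this KIP, but the parsing helper is hypothetical.

```python
import json

KNOWN_FIELDS = {"level"}  # the fields the tool understood before this KIP

def parse_logger_entry(raw):
    """Parse one logger entry, silently dropping unrecognized fields."""
    entry = json.loads(raw)
    # Keeping only known keys means a newly added field like last_modified
    # is ignored instead of causing a strict-deserialization failure.
    return {k: v for k, v in entry.items() if k in KNOWN_FIELDS}

body = '{"level": "INFO", "last_modified": 1689096197039}'
print(parse_logger_entry(body))  # {'level': 'INFO'}
```

Tools that instead want to surface the new information would add `last_modified` to their schema rather than filter it out.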
Worker downgrades
If a worker is downgraded to an earlier version of Kafka Connect that does not recognize dynamic log adjustment records in the config topic, it will log an error message in response to reading a record from that topic with an invalid key. There will be no other impact (for example, the worker won't fail and the availability of its REST API and the connectors/tasks it's assigned will not be compromised).
Test Plan
Unit tests
- Ensure that records produced to the config topic have the expected format
- Ensure that updates to a logging level are reported with the correct last modified timestamp
- Ensure that logging levels that have not been updated have a null last modified timestamp
- Ensure that distributed workers that have completed startup correctly handle logging adjustment config topic records
- Ensure that distributed workers that have not completed startup ignore logging adjustment config topic records
- Ensure that requests to the existing `PUT /admin/loggers/{logger}` endpoint with no `scope` query parameter, and with `scope=worker`, result in the same herder-level behavior as before (most likely accomplished by verifying that no interactions with the `Herder` object have taken place)
- Ensure that requests to the existing `PUT /admin/loggers/{logger}` endpoint with an unrecognized value for the `scope` query parameter result in the same herder-level behavior as before, but also cause a warning log message to be emitted
- Ensure that cluster-scoped requests with invalid logging levels are rejected with a 404 response
- Ensure that repeated requests to set the same logging level for a namespace do not cause its last modified timestamp to be updated

Integration tests
A new integration test will be added for standalone mode, which will run through this series of scenarios and assertions:
...
1. Start a distributed Connect cluster with three workers
    - Ensure that the last modified timestamp for all reported logging namespaces is null
2. Modify the logging level for a specific namespace for a single worker
    - Ensure that the response body is non-empty and matches the same format it had prior to this KIP
    - Ensure that the last modified timestamp for that namespace on the affected worker is non-null and at least as recent as the time at which the request was issued (some margin of error may be necessary in the highly unlikely but technically possible event that the node responsible for running tests and the one running the worker have skewed clocks)
    - Ensure that the logging level for that namespace on the affected worker is reported (via the admin REST API) with the correct level
    - Ensure that the last modified timestamp for that namespace on all other workers is still null
    - Ensure that the logging level for that namespace on all other workers is unchanged
3. Modify the logging level for the root namespace for all workers (using `scope=cluster`)
    - Ensure that the response body is empty
    - Ensure that, after a reasonable timeout, the logging level for all reported namespaces on all workers is reported with the correct level
    - Ensure that the last modified timestamp for all namespaces on all workers is non-null and at least as recent as the time at which the request was issued
4. Modify the logging level for a specific namespace for all workers (using `scope=cluster`)
    - Ensure that the response body is empty
    - Ensure that, after a reasonable timeout, the logging level for that namespace on all workers is reported with the correct level
    - Ensure that the last modified timestamp for that namespace on all workers is non-null and at least as recent as the time at which the request was issued
5. Issue a second request to set the same logging level for the same namespace for all workers (using `scope=cluster`)
    - No assertions will be made for this step
6. Modify the logging level for a different specific namespace for all workers (using `scope=cluster`)
    - Ensure that, after a reasonable timeout, the logging level for that namespace on all workers is reported with the correct level
    - Ensure that the last modified timestamp for that namespace on all workers is non-null and at least as recent as the time at which the request was issued
    - Ensure that the last modified timestamp for the namespace affected in steps 4 and 5 is unchanged from when it was tested in step 4 (i.e., the second request in step 5 did not affect it)
7. Modify the logging level for the root namespace for all workers (using `scope=cluster`)
    - No assertions will be made for this step
8. Modify the logging level for a specific namespace for a single worker (again)
    - Ensure that the response body is non-empty and matches the same format it had prior to this KIP
    - Ensure that the last modified timestamp for that namespace on the affected worker is at least as recent as the time at which the request was issued
    - Ensure that the logging level for that namespace on the affected worker is reported with the correct level
    - Ensure that the last modified timestamp for all namespaces except the modified namespace on the affected worker, and all namespaces for all other workers, is unchanged since the root level was modified for all workers*
    - Ensure that the logging levels for all namespaces except the modified namespace on the affected worker, and all namespaces for all other workers, are unchanged since the root level was modified for all workers*
...
- Higher maintenance burden: we would have to be able to serve requests that expect both kinds of response format
- Setting an expensive precedent for the Kafka Connect REST API: unless absolutely necessary, we should encourage consumers of the API to tolerate unknown fields in order to permit flexibility in future changes we may opt to make that would only involve adding new fields
Persistent logging level updates
Both the new cluster-wide API proposed in this KIP and the existing worker-local API added in KIP-495 only support ephemeral updates: any dynamic logging level changes will be discarded if a worker restarts, and the worker will revert to the levels specified in its Log4j configuration.
The rationale for keeping these updates ephemeral is to continue to give priority to workers' Log4j configuration files, with the underlying philosophy that this endpoint is still only intended for debugging purposes, as opposed to cluster-wide configuration. Permanent changes can already be made by tweaking the Log4j file for a worker and then restarting it. If a restart is too expensive for a permanent change, then the change can be applied immediately via the REST API, and staged via the Log4j configuration file (which will then be used the next time the worker is restarted, whenever that happens).
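For instance, staging a permanent change might look like the following Log4j 1.x properties fragment in the worker's Log4j configuration file (the namespace is chosen purely for illustration); the same level change can be applied immediately via the REST API while this waits for the next restart:

```properties
# Staged change: takes effect the next time the worker is restarted
log4j.logger.org.apache.kafka.connect.runtime=DEBUG
```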
Future work
More scope types
...