Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

If a task becomes blocked, the results are less severe. The framework gives each task its own dedicated thread, and all interactions with the task (such as invoking start , stop , and poll / put) take place on that thread. If a task becomes blocked during any of these methods, the impact is limited to the task itself. If the worker has to shut down a task, the task is given a graceful shutdown period to allow it to de-allocate resources, finish committing data, etc. If the shutdown timeout is exceeded, the task is abandoned and the framework moves on. No more calls to put or poll are made on that task, so the only major problem posed by this now-abandoned task is that it may still hold onto resources such as . Most of these resources are task-specific and include things like file descriptors, memory, network connections, or resource locks. However, there are also some worker resources that are also still retained, including the thread for the task itself, the memory allocated for it, and the producer or consumer set up by the framework on its behalf.

Motivation

The goal of this KIP is to add some lightweight improvements to the Connect framework that should make it easier to deal with these abandoned connectors and tasks. These improvements include:

...

A new configuration property will be introduced that will allow users to fine-tune how long connectors are allowed to take during shutdown.

NameTypeDefaultImportanceDoc
connector.shutdown.graceful.timeout.msLONG5000LOW

Amount of time to wait for connectors and tasks to shutdown gracefully. This is the total amount of time, not per connector or task. All connectors and tasks have shutdown triggered, then they are waited on sequentially. If specified, will take priority over the now-deprecated task.shutdown.graceful.timeout.ms property

Note that the default value for this property matches the current default for the task.shutdown.graceful.timeout.ms property.

Additionally, the task.shutdown.graceful.timeout.ms property will be deprecated and removed in a later release.

Logging Additions

Tracking method calls

...

Rejected because: there is no clear advantage to being able to specify separate timeout periods for connectors and tasks, and it'd be a configuration burden to have to specify timeouts for both even if a single timeout is all that is desired.

Co-Opting task.shutdown.graceful.time.ms

Summary: instead of deprecating the task.shutdown.graceful.timeout.ms property and introducing the connector.shutdown.graceful.timeout.ms property, keep the task.shutdown.graceful.timeout.ms property and apply it to both connectors and tasks.

Rejected because: "task" has a very specific meaning with Connect and it's important to respect that definition. "Task" only ever refers to a connector task and not the connector itself, whereas "connector" can be used to refer to a connector and all of its tasks (the "logical" connector that the user configures, as opposed to the "physical" Java Connector object that the framework brings up).

Replacing "worker"

...

Log Context Scope with "connector"

Summary: because connectors may be given their own dedicated threads if/when work on 

Jira
serverASF JIRA
serverId5aa69414-a9e9-3523-82ec-879b028fb15b
keyKAFKA-9374
 is complete, the scope "worker" is a bit of a misnomer, and it would probably be more intuitive to users to replace it with "connector".

Rejected because: breaks backwards compatibility and doesn't add any extra information to the logs emitted by workers since the new term would be used in the exact same places the old one currently is. Originally, the term "connector" may have been preferable, but it's not worth it now to change things and risk breaking users' log collection tools.

Using Existing Connector and Task Metrics to Report on Abandoned Instances

Summary: instead of adding the new (METRICS GO HERE) MBeans to report on abandoned instances of specific connectors and tasks, use the existing kafka.connect:type=connector-metrics,connector=([-.\w]+) and kafka.connect:type=connector-task-metrics,connector=([-.\w]+),task=([\d]+) MBeans.

Rejected because: the lifetimes of the existing MBeans are directly tied to the lifetimes of the connectors and tasks that they report on. Retaining those MBeans even after a connector or task has apparently been deleted (but has in reality only been abandoned by the worker after failing to shut down in time) may break existing tooling that relies on that correlation. Also, a new namespace provides more room to add new connector- and task-specific metrics for abandoned instances if we'd like to.