...

The 202 ACCEPTED response signifies that the “restart request” has been durably written to the config topic and that all the workers in the Connect cluster will (eventually) see it. If a worker reads the restart request as part of worker startup, it can ignore the request, since the worker will subsequently attempt to start all of its assigned Connector and Task instances, effectively achieving the goal of restarting the instances assigned to that worker. If the worker reads the restart request after worker startup, then the DistributedHerder will enqueue the request to be processed within the herder's main thread loop. During this main thread loop, the herder will dequeue all pending restart requests and, for each request, use the current connector status and the herder’s current assignments to determine which of its Connector and Task instances are to be restarted, and will then stop and restart them. Note that because this is done within the main thread loop, the herder will not concurrently process any assignment changes while it is executing the restart requests.
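The flow described above can be illustrated with a minimal sketch. This is not the actual DistributedHerder code: the class `RestartSketch`, its methods, and the map of instance IDs to connector names are all hypothetical names chosen for illustration, and filtering by connector status is omitted for brevity. It only shows the two behaviors described: requests read during startup are ignored, and requests read afterwards are queued and drained by a single main-loop call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified model of one worker; not the real DistributedHerder.
class RestartSketch {
    // Current assignments: instance id -> name of the connector it belongs to.
    // A TreeMap keeps iteration order deterministic for the example.
    private final Map<String, String> assignments;
    // Restart requests read from the config topic after startup, awaiting the main loop.
    private final Queue<String> pendingRestarts = new ConcurrentLinkedQueue<>();
    private volatile boolean startupComplete = false;

    RestartSketch(Map<String, String> assignments) {
        this.assignments = new TreeMap<>(assignments);
    }

    // Called once the worker has started all of its assigned instances.
    void completeStartup() {
        startupComplete = true;
    }

    // Called when a restart request record is read from the config topic.
    void onRestartRequest(String connectorName) {
        if (!startupComplete) {
            // Startup will (re)start every assigned instance anyway, so the request is ignored.
            return;
        }
        pendingRestarts.add(connectorName);
    }

    // One iteration of the main thread loop: drain the queue and restart matching instances.
    // Returns the ids of the instances that were stopped and restarted.
    List<String> tick() {
        List<String> restarted = new ArrayList<>();
        String connectorName;
        while ((connectorName = pendingRestarts.poll()) != null) {
            for (Map.Entry<String, String> entry : assignments.entrySet()) {
                if (entry.getValue().equals(connectorName)) {
                    // A real worker would stop and then start the instance here.
                    restarted.add(entry.getKey());
                }
            }
        }
        return restarted;
    }
}
```

Because the queue is drained only inside `tick()`, restarts and assignment changes handled by the same loop cannot interleave, which mirrors the guarantee noted above.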

The “restart request” is written to the config topic, which is already where the connector and task config records, task state change records, and session key records are written. This topic also makes sense because all records related to restarts and configuration changes are totally ordered and are all processed within the herder's main thread loop. The "restart request" records will not conflict with any other types of config records, will be compatible with the compacted topic, and will look like:

...

On the other hand, the current approach is more reliable, since once the restart request is written to the config topic it will eventually be consumed by all workers. The current proposal also builds upon and reuses much more of the existing functionality in the worker, making the overall implementation more straightforward. There is also no chance for changing worker assignments to interfere with the restarts, since the current approach performs the restarts during the same herder thread loop that reacts to all rebalance changes. Finally, the new approach is more efficient, as some restart requests can be ignored if the worker will subsequently (re)start its assigned instances. For example, if a restart for a connector is requested but one of the workers is itself restarted (or joins the cluster), that worker will start all of its assigned Connector and Task instances as part of startup, making the restart unnecessary.

...