...

  • This doesn't work for cases where a source partition is reassigned across tasks. For example, if task T1 of a connector is responsible for reading table A, and then the connector reassigns A to task T2, task T1 needs to be shut down completely before task T2 starts up. This can be referred to informally as the "source partition reshuffling problem".
  • This doesn't work for cases where the number of tasks in a connector is reduced. For example, if the number of tasks is brought down from 8 to 5, then tasks 6, 7, and 8 will never be fenced out by successors since none will be brought up.
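The task-count case above can be sketched in a few lines. This is only a toy model, assuming that a task is fenced solely by a successor with the same task ID and that IDs are numbered from 1 as in the example; the function name is hypothetical and not part of Kafka Connect.

```python
def fenced_by_successors(old_task_count, new_task_count):
    """Split the old generation's task IDs into those a same-ID successor
    would fence and those left running as potential zombies.

    Assumption: task IDs are numbered from 1, matching the example above.
    """
    old_ids = range(1, old_task_count + 1)
    new_ids = set(range(1, new_task_count + 1))
    fenced = [task_id for task_id in old_ids if task_id in new_ids]
    unfenced = [task_id for task_id in old_ids if task_id not in new_ids]
    return fenced, unfenced


fenced, unfenced = fenced_by_successors(8, 5)
print(unfenced)  # [6, 7, 8]
```

When the task count grows instead, every old task has a successor and the second list is empty, which is why only reductions are a problem for this approach.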

Fencing during rebalance

Summary: instead of exposing an internal REST endpoint, perform zombie fencing automatically during rebalances.

Rejected because: tightly coupling zombie fencing and rebalancing makes fencing fairly heavyweight and complicates the rebalancing process. On top of that, if fencing fails and the user wants to retry the operation, requiring a rebalance for that retry attempt is unnecessarily costly.

Connector owners performing fencing and writing directly to config topic

Summary: instead of only allowing the leader to perform zombie fencing and write task count records and task configs to the config topic, let individual workers handle this responsibility. The owner of the Connector object for a connector would be given all three of these responsibilities.

Rejected because: in order to maintain the integrity of the config topic, it's imperative that only a single worker be able to access it at a time for a given connector. This could be accomplished by allowing each worker to write to the config topic with a transactional producer whose transactional ID is mapped in a 1:1 fashion from the name of the connector. However, if a rebalancing bug occurs and two non-zombie workers believe they both own the same Connector object, it's unclear how the cluster could gracefully recover from this, and it's likely that manual intervention by the user would be required.
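To illustrate the failure mode described above, here is a minimal sketch that models transactional-ID fencing as a per-ID epoch counter. It is a toy model rather than the Kafka client API: the `ToyTransactionCoordinator` class, the `connect-config-` ID prefix, and the connector name are all hypothetical and chosen only for illustration.

```python
class ToyTransactionCoordinator:
    """Toy stand-in for a transaction coordinator: one epoch per transactional ID."""

    def __init__(self):
        self._epochs = {}

    def init_producer(self, transactional_id):
        # Registering a producer bumps the epoch, which fences every earlier
        # producer that registered with the same transactional ID.
        epoch = self._epochs.get(transactional_id, -1) + 1
        self._epochs[transactional_id] = epoch
        return epoch

    def can_write(self, transactional_id, epoch):
        # Only the holder of the latest epoch may write.
        return self._epochs.get(transactional_id) == epoch


def transactional_id_for(connector_name):
    # Hypothetical 1:1 mapping from connector name to transactional ID.
    return "connect-config-" + connector_name


coordinator = ToyTransactionCoordinator()
tid = transactional_id_for("my-connector")

# Workers A and B both believe they own the same Connector object.
epoch_a = coordinator.init_producer(tid)
epoch_b = coordinator.init_producer(tid)
a_fenced_by_b = not coordinator.can_write(tid, epoch_a)

# A is not a zombie, so it re-registers and fences B in turn.
epoch_a = coordinator.init_producer(tid)
b_fenced_by_a = not coordinator.can_write(tid, epoch_b)

print(a_fenced_by_b, b_fenced_by_a)  # True True
```

In this model neither worker is a zombie, so each can keep re-registering and fencing the other indefinitely. Nothing in the fencing mechanism itself decides which worker should win, which is the recovery problem the rejection rationale points to.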

Connector-defined transaction boundaries

...