Versions Compared

Comment: Remove (now-addressed) open question

...

Rejected because: Achieving consensus in a Connect cluster about whether to begin using this new topic-based protocol would require either reworking the Connect group coordination protocol or adding several new configurations and a multi-stage rolling upgrade to enable it. Requiring new configurations and a multi-stage rolling upgrade for the default use case of a simple version bump would be a much worse user experience, and if the group coordination protocol is going to be reworked anyway, we might as well use it to distribute keys instead. Additionally, switching from a synchronous to an asynchronous means of relaying task configurations to the leader would complicate the implementation enough that reworking the group coordination protocol might even be the simpler approach, with smaller changes required.

Open Questions

  • Will it be necessary to support multiple keys at once, in the event that a follower worker makes a request to the internal endpoint during a rebalance (in which case the follower and the leader would be using different keys)? Is this scenario even possible?
    • The DistributedHerder class appears to retry indefinitely when it encounters failures in task reconfiguration. If these retries happen on a separate thread from the rebalance logic (which would be responsible for updating the key used by the herder), or otherwise do not block it, then this may be fine. However, if the retries happen on the same thread as the rebalance logic and effectively block it, the result will be a deadlock: the worker would have to complete the task reconfiguration request successfully before receiving its new key, and it would have to receive its new key before it could complete that request successfully.
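
If supporting multiple keys at once does turn out to be necessary, one possible shape for it is sketched below: the receiving worker accepts a request whose signature matches any key in a small set (for example, the current key and the one it replaced). This is only an illustration under stated assumptions, not the proposed implementation. The class name MultiKeyVerifier, its sign and verify methods, and the choice of HmacSHA256 are all hypothetical and are not taken from the Connect codebase.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of Kafka Connect: verifies a request body
// against several candidate keys so that a key rotated during a rebalance
// does not immediately invalidate in-flight requests.
final class MultiKeyVerifier {
    // Assumption: an HMAC-based signature; the algorithm is illustrative only.
    private static final String ALGORITHM = "HmacSHA256";

    // Computes the signature of the request body under a single key.
    static byte[] sign(byte[] key, byte[] body) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(key, ALGORITHM));
            return mac.doFinal(body);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException("Unable to compute signature", e);
        }
    }

    // Returns true if the signature matches the body under any accepted key.
    // MessageDigest.isEqual is used for the comparison instead of Arrays.equals
    // to avoid leaking information through comparison timing.
    static boolean verify(List<byte[]> acceptedKeys, byte[] body, byte[] signature) {
        for (byte[] key : acceptedKeys) {
            if (MessageDigest.isEqual(sign(key, body), signature)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] previousKey = "previous-key".getBytes(StandardCharsets.UTF_8);
        byte[] currentKey = "current-key".getBytes(StandardCharsets.UTF_8);
        byte[] body = "task-configs".getBytes(StandardCharsets.UTF_8);

        // A follower that has not yet received the new key signs with the old one.
        byte[] signature = sign(previousKey, body);

        // The receiving worker accepts both keys while the rebalance settles.
        List<byte[]> accepted = Arrays.asList(currentKey, previousKey);
        System.out.println("accepted with both keys: " + verify(accepted, body, signature));
        System.out.println("accepted with new key only: "
                + verify(Arrays.asList(currentKey), body, signature));
    }
}
```

Accepting more than one key would sidestep the deadlock described above, since a request signed with the old key could still succeed, at the cost of keeping the old key valid for some window after rotation. Whether that trade-off is acceptable is part of the open question.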