...

When the active controller decides that a standby controller should start a snapshot, it communicates that decision in its response to the periodic heartbeat sent by that node. When the active controller decides that it should create a snapshot itself, it first tries to give up leadership of the Raft quorum, so that writing the snapshot does not cause unnecessary delays.

Because snapshots are centrally coordinated by the active controller, we can avoid initiating more than one snapshot at a time. The controller will also snapshot less frequently when too many members of the quorum have fallen behind. Specifically, if losing a node would probably impact availability, the controller uses a separate set of configurations to determine when to snapshot.
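The coordination rules above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual implementation: the class name, the record-count thresholds, and the "losing one node would drop below a majority" test are all assumptions made for the example, since the text does not specify how the two sets of configurations are defined.

```java
/** Hypothetical sketch of centrally coordinated snapshots; all names are illustrative. */
final class SnapshotCoordinatorSketch {
    private final int quorumSize;
    // Assumed configuration: records accumulated before a snapshot is worthwhile.
    private final long normalThreshold;
    // Assumed configuration: a higher bar used when the quorum is at risk.
    private final long conservativeThreshold;
    // Node currently snapshotting, or null if no snapshot is in flight.
    private Integer snapshottingNode = null;

    SnapshotCoordinatorSketch(int quorumSize, long normalThreshold, long conservativeThreshold) {
        this.quorumSize = quorumSize;
        this.normalThreshold = normalThreshold;
        this.conservativeThreshold = conservativeThreshold;
    }

    /** Assumption: availability is at risk if losing one caught-up node breaks the majority. */
    static boolean losingOneNodeHurts(int quorumSize, int caughtUpNodes) {
        int majority = quorumSize / 2 + 1;
        return caughtUpNodes - 1 < majority;
    }

    /** Returns true if the given node should start a snapshot now. */
    boolean maybeStartSnapshot(int nodeId, long recordsSinceLastSnapshot, int caughtUpNodes) {
        if (snapshottingNode != null) {
            return false; // never run more than one snapshot at a time
        }
        long threshold = losingOneNodeHurts(quorumSize, caughtUpNodes)
                ? conservativeThreshold
                : normalThreshold;
        if (recordsSinceLastSnapshot < threshold) {
            return false;
        }
        snapshottingNode = nodeId;
        return true;
    }

    /** Clears the in-flight marker once the snapshotting node reports completion. */
    void onSnapshotFinished(int nodeId) {
        if (snapshottingNode != null && snapshottingNode == nodeId) {
            snapshottingNode = null;
        }
    }

    public static void main(String[] args) {
        SnapshotCoordinatorSketch c = new SnapshotCoordinatorSketch(3, 1000L, 5000L);
        System.out.println(c.maybeStartSnapshot(1, 1500L, 3));
        System.out.println(c.maybeStartSnapshot(2, 9999L, 3));
    }
}
```

Keeping the in-flight marker in one place is what makes the "at most one snapshot" guarantee easy to enforce; a decentralized scheme would need extra coordination to achieve the same thing.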

...

When the controller accepts the registration, it grants or renews a broker ID lease associating the broker process with its ID. Leases are time-bounded. The length of the lease is 10 times the configured broker heartbeat interval, which puts it at 30 seconds by default.

A broker cannot continue using a lease indefinitely after sending a single heartbeat. When brokers are rejected by the controller, or otherwise unable to renew their lease before it expires, they enter the "fenced" state.
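The lease arithmetic and expiry rule can be illustrated with a small sketch. The class and method names here are hypothetical, and the 3-second heartbeat interval is inferred from the stated 30-second default rather than taken from a named configuration. The sketch models only expiry; explicit rejection by the controller, which also leads to fencing, is not shown.

```java
import java.time.Duration;

/** Hypothetical sketch of a time-bounded broker ID lease; not Kafka's actual classes. */
final class BrokerLeaseSketch {
    // Assumed default, implied by "10 times the heartbeat interval = 30 seconds".
    static final Duration HEARTBEAT_INTERVAL = Duration.ofSeconds(3);
    static final int LEASE_MULTIPLIER = 10;

    private final long leaseLengthMs;
    private long expiresAtMs;

    /** The lease length is a fixed multiple of the heartbeat interval. */
    static Duration leaseLength(Duration heartbeatInterval) {
        return heartbeatInterval.multipliedBy(LEASE_MULTIPLIER);
    }

    /** Grants a lease starting at the registration time. */
    BrokerLeaseSketch(Duration heartbeatInterval, long nowMs) {
        this.leaseLengthMs = leaseLength(heartbeatInterval).toMillis();
        this.expiresAtMs = nowMs + leaseLengthMs;
    }

    /** Each accepted heartbeat renews the lease from the time it was accepted. */
    void onHeartbeatAccepted(long nowMs) {
        expiresAtMs = nowMs + leaseLengthMs;
    }

    /** A broker whose lease has run out without renewal is treated as fenced. */
    boolean isFenced(long nowMs) {
        return nowMs >= expiresAtMs;
    }

    public static void main(String[] args) {
        BrokerLeaseSketch lease = new BrokerLeaseSketch(HEARTBEAT_INTERVAL, 0L);
        System.out.println("lease length: " + leaseLength(HEARTBEAT_INTERVAL));
        System.out.println("fenced after 31s of silence: " + lease.isFenced(31_000L));
    }
}
```

With a lease ten times longer than the heartbeat interval, a broker can miss several consecutive heartbeats before it is fenced, but it cannot keep the lease alive on the strength of one heartbeat alone.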

...