...

When a new broker is added, we will automatically move some partitions from existing brokers to the new one. Our goal is to minimize the amount of data movement while maintaining a balanced load on each broker. We use a standalone coordinator process to do the rebalance, and the algorithm is given below.
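To make the goal concrete, here is a minimal sketch of one way to move as few partitions as possible onto a new broker. It is not the coordinator's algorithm. It assumes a single replica per partition, and the function name `rebalance_onto_new_broker` and the data shapes are hypothetical.

```python
from collections import defaultdict


def rebalance_onto_new_broker(assignment, new_broker):
    """Move partitions from the most loaded brokers onto new_broker.

    assignment maps a partition name to the broker id hosting it
    (one replica per partition, a simplification for this sketch).
    Returns (new_assignment, moves), where each move is a tuple
    (partition, from_broker, to_broker).
    """
    by_broker = defaultdict(list)
    for partition, broker in sorted(assignment.items()):
        by_broker[broker].append(partition)
    by_broker.setdefault(new_broker, [])

    # The new broker should end up with its fair share, rounded down.
    # Moving exactly that many partitions is the least movement that
    # brings it up to the average load.
    target = len(assignment) // len(by_broker)

    result = dict(assignment)
    moves = []
    while len(by_broker[new_broker]) < target:
        # Always take from the currently most loaded existing broker.
        donor = max(
            (b for b in by_broker if b != new_broker),
            key=lambda b: (len(by_broker[b]), str(b)),
        )
        partition = by_broker[donor].pop()
        by_broker[new_broker].append(partition)
        result[partition] = new_broker
        moves.append((partition, donor, new_broker))
    return result, moves
```

Only partitions that end up on the new broker are moved; partitions never shuffle between existing brokers, which keeps data movement low.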

Take brokers offline

We’d also like to support shrinking a cluster by taking an existing set of brokers offline using an administrative command like the following.

Code Block

alter cluster remove brokers broker-1

This will start the reassignment process for partitions currently hosted on broker-1. Once that is complete, broker-1 will be taken offline. This command will also delete the state change path for broker-1.
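The reassignment step can be sketched as follows. This is an illustration under stated assumptions, not the actual admin command: the function `plan_broker_removal`, its parameters, and the least-loaded placement rule are all hypothetical. It only computes the new replica lists; taking the broker offline and deleting its state change path would follow once the moves complete.

```python
def plan_broker_removal(replicas, live_brokers, broker_to_remove):
    """Compute new replica lists for partitions hosted on broker_to_remove.

    replicas maps a partition name to its ordered list of broker ids.
    Returns a dict containing only the partitions that must change.
    """
    remaining = [b for b in live_brokers if b != broker_to_remove]

    # Count replicas already hosted on each remaining broker.
    load = {b: 0 for b in remaining}
    for brokers in replicas.values():
        for b in brokers:
            if b in load:
                load[b] += 1

    plan = {}
    for partition in sorted(replicas):
        brokers = replicas[partition]
        if broker_to_remove not in brokers:
            continue
        # A broker holds at most one replica of a given partition.
        candidates = [b for b in remaining if b not in brokers]
        if not candidates:
            raise ValueError(f"no spare broker for {partition}")
        target = min(candidates, key=lambda b: load[b])
        load[target] += 1
        # Keep the replica order so the preferred replica slot is stable.
        plan[partition] = [target if b == broker_to_remove else b for b in brokers]
    return plan
```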

Data replication

We’d like to allow a client to choose either asynchronous or synchronous replication. In the former case, a message to be published is acknowledged as soon as it reaches 1 replica. In the latter case, we will make our best effort to make sure that a message is only acknowledged after it reaches multiple replicas. When a client tries to publish a message to a partition of a topic, we need to propagate the message to all replicas. We have to decide:

...

  1. How can atomicity be guaranteed in the 2nd type of leader failure?
  2. How can we avoid the problem of multiple leaders for the same partition at the same time?
  3. If the brokers are in multiple racks, how can we guarantee that at least one replica goes to a different rack?

...

  1. Stores the information of all live brokers.
    Code Block
     /brokers/ids/[broker_id] --> host:port (ephemeral; created by admin) 
  2. Stores, for each partition, a list of the currently assigned replicas. For each replica, we store the id of the broker to which the replica is assigned. The first replica is the preferred replica. Note that for a given partition, there is at most 1 replica on a broker. Therefore, the broker id can be used as the replica id.
    Code Block
     /brokers/topics/[topic]/[partition_id]/replicas --> {broker_id …}  (created by admin)
  3. Stores the id of the replica that’s the current leader of this partition
    Code Block
     /brokers/topics/[topic]/[partition_id]/leader --> broker_id (ephemeral) (created by leader) 
  4. Stores the id of the set of replicas that are in-sync with the leader
    Code Block
     /brokers/topics/[topic]/[partition_id]/ISR --> {broker_id, …} (created by leader)
  5. This path is used when we want to reassign some partitions to a different set of brokers. For each partition to be reassigned, it stores a list of new replicas and their corresponding assigned brokers. This path is created by an administrative process and is automatically removed once the partition has been moved successfully.
    Code Block
     /brokers/partitions_reassigned/[topic]/[partition_id] --> {broker_id, …} (created by admin)
  6. This path is used by the leader of a partition to enqueue state change requests to the follower replicas. The state change requests include start replica, close replica, and become follower. This path is created during the create topic admin command and is only deleted by the remove brokers admin command. The purpose of making this path persistent is to cleanly handle state changes like delete topic and reassign partitions even when a replica is temporarily unavailable (for example, being bounced).
    Code Block
     /brokers/topics/[topic]/[partition_id]/state/[broker_id] --> { state change requests ... } (created by admin)
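As an illustration of how the first three paths fit together, the sketch below builds them with small helpers and reads the preferred replica from an in-memory dict that stands in for Zookeeper. The helper names, the dict, and the sample hosts are hypothetical; no real Zookeeper client is used.

```python
def broker_path(broker_id):
    """Path holding host:port for a live broker."""
    return f"/brokers/ids/{broker_id}"


def replicas_path(topic, partition_id):
    """Path holding the assigned replica list for a partition."""
    return f"/brokers/topics/{topic}/{partition_id}/replicas"


def leader_path(topic, partition_id):
    """Path holding the broker id of the current leader."""
    return f"/brokers/topics/{topic}/{partition_id}/leader"


def preferred_replica(znodes, topic, partition_id):
    """The first replica in the list is the preferred replica."""
    return znodes[replicas_path(topic, partition_id)][0]


# A dict standing in for the Zookeeper tree (sample data, not real hosts).
znodes = {
    broker_path(1): "host1:9092",
    broker_path(2): "host2:9092",
    replicas_path("events", 0): [2, 1],
    leader_path("events", 0): 1,
}

# Because a broker holds at most one replica of a partition,
# the broker id doubles as the replica id.
leader_is_preferred = (
    znodes[leader_path("events", 0)] == preferred_replica(znodes, "events", 0)
)
```

Here the leader (broker 1) differs from the preferred replica (broker 2), which is the kind of state a comparison of the leader path and the replicas path would reveal.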

Key data structures

...