...

  • AdminUtils.assignReplicasToBrokers will be updated to create a broker-rack mapping from ZooKeeper data before doing replica assignment. If none of the brokers have rack information, the algorithm will create the same assignment as the current implementation. If some brokers have rack information and some do not, the algorithm will throw an exception. This is to prevent incorrect assignment caused by user error.
  • When making the rack aware assignment, the algorithm tries to keep the following properties:
    • Even distribution of replicas among brokers
    • When the number of partitions is N times the number of brokers (where N is a positive integer):
      • If each rack has the same broker count, each broker will have the same leader count and replica count.
      • If racks have different broker counts, each broker will have the same leader count, but may have a different replica count.
    • Even distribution of partition leadership among brokers
    • Assign to as many racks as possible. That means if the number of racks is greater than or equal to the number of replicas, each rack will have at most one replica. On the other hand, if the number of racks is less than the number of replicas (which should happen very infrequently), each rack should have at least one replica, and no other guarantees are made on how the replicas will be distributed among racks. For example, if there are 2 racks and 4 replicas, one rack can have 3 replicas, 2 replicas or 1 replica. This keeps the algorithm simple while still preserving the other replica distribution properties and the fault tolerance provided by the racks.
  • Implementation detail of the rack aware assignment (see more in the pull request https://github.com/apache/kafka/pull/132):
    • Before doing the rack aware assignment, sort the broker list so that the brokers are interlaced according to rack. In other words, adjacent brokers in the sorted list should not be in the same rack if possible. For example, assuming 6 brokers mapping to 3 racks: 0 -> "rack1", 1 -> "rack1", 2 -> "rack2", 3 -> "rack2", 4 -> "rack3", 5 -> "rack3", the sorted broker list could be (0, 2, 4, 1, 3, 5).
    • Apply the same assignment algorithm to assign replicas, with the addition of skipping a broker if its rack is already used for the same partition
  • If one or more brokers do NOT have rack information:
    • For auto topic creation, AdminUtils.assignReplicasToBrokers will create the same assignment as the current implementation (as if no broker has the rack information) and continue with topic creation. This allows auto topic creation to work when doing rolling upgrade.
    • For command line tools (TopicCommand and ReassignPartitionsCommand), an exception will be thrown. This will alert the user that a broker may be misconfigured. An additional command line argument --ignore-racks can be supplied to suppress this error and continue with topic creation, ignoring all rack information.
  • UpdateMetadataRequest should be updated to correctly handle rack for both controller protocol version 0 and version 1.
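
The interlaced sort and the rack-skipping assignment described above can be illustrated with a small Python sketch. This is not the code from the pull request: the function names (use_rack_aware, interlace_by_rack, assign_replicas) and the strict and start_index parameters are hypothetical, and the sketch leaves out details of the real assignment such as a randomized starting broker. It only shows how skipping brokers in already-used racks yields the properties listed above.

```python
from collections import defaultdict
from itertools import zip_longest


def use_rack_aware(broker_rack, strict):
    """Decide whether to do a rack aware assignment (hypothetical helper).

    broker_rack maps broker id -> rack name, or None when the rack is unknown.
    strict=True models the command line tools (partial rack info is an error);
    strict=False models auto topic creation (fall back to ignoring racks).
    """
    known = [rack for rack in broker_rack.values() if rack is not None]
    if len(known) == len(broker_rack):
        return True
    if known and strict:
        raise ValueError("some brokers have no rack information")
    return False


def interlace_by_rack(broker_rack):
    """Order brokers so adjacent entries are in different racks where possible."""
    by_rack = defaultdict(list)
    for broker in sorted(broker_rack):
        by_rack[broker_rack[broker]].append(broker)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    # Take one broker from each rack in turn; shorter racks simply run out.
    return [b for row in zip_longest(*columns) for b in row if b is not None]


def assign_replicas(broker_rack, num_partitions, replication_factor, start_index=0):
    """Return {partition: [leader, follower, ...]} using a simplified scheme."""
    brokers = interlace_by_rack(broker_rack)
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor exceeds the number of brokers")
    num_racks = len(set(broker_rack.values()))
    assignment = {}
    for partition in range(num_partitions):
        leader = brokers[(start_index + partition) % n]
        replicas = [leader]
        used_racks = {broker_rack[leader]}
        offset = 1
        while len(replicas) < replication_factor:
            candidate = brokers[(start_index + partition + offset) % n]
            offset += 1
            if candidate in replicas:
                continue
            rack = broker_rack[candidate]
            # Skip a broker whose rack is already used, unless every rack
            # already holds a replica of this partition.
            if rack in used_racks and len(used_racks) < num_racks:
                continue
            replicas.append(candidate)
            used_racks.add(rack)
        assignment[partition] = replicas
    return assignment


racks = {0: "rack1", 1: "rack1", 2: "rack2", 3: "rack2", 4: "rack3", 5: "rack3"}
if use_rack_aware(racks, strict=True):
    for partition, replicas in assign_replicas(racks, 6, 3).items():
        print(partition, replicas)
```

With the 6-broker, 3-rack example from the text, interlace_by_rack returns the order (0, 2, 4, 1, 3, 5), and each of the 6 partitions ends up with one replica per rack. When use_rack_aware returns False, a caller would fall back to the existing non-rack-aware assignment, which is not shown here.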

...