You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 13 Next »

 

Status

Current state: Under Discussion

Discussion thread: here [Change the link from the KIP proposal email archive to your own email thread]

JIRA: Unable to render Jira issues macro, execution error.

Github Pull Request: https://github.com/apache/kafka/pull/132

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Machines in data center are sometimes grouped in racks. Racks provide isolation as each rack may be in a different physical location and has its own power source. When resources are properly replicated across racks, it provides fault tolerance in that if a rack goes down,  the remaining racks can continue to serve traffic.

In Kafka, if there are more than one replica for a partition, it would be nice to have replicas placed in as many different racks as possible so that the partition can continue to function if a rack goes down. In addition, it makes maintenance of  Kafka cluster easier as you can take down the whole rack at a time.

In AWS, racks are usually mapped to the concept of availability zones

Public Interfaces

An optional broker property will be added 

case class Broker(id: Int, endPoints: Map[SecurityProtocol, EndPoint], rack: Option[String] = None)

rack will be an optional field of version 2 JSON schema for a broker. e.g.:

{"version":2,
  "host","localhost",
  "port",9092
  "jmx_port":9999,
  "timestamp":"2233345666",
  "endpoints": ["PLAINTEXT://host1:9092",
                "SSL://host1:9093"],
  "rack": "dc1"
}

If no rack information is specified, the field will not be included in JSON.

Rack can be specified in the broker property file as

rack=<rack ID as string>

For example, 

rack=dc1

rack=us-east-1c

The rack property will be part of meta data response. If there is no rack, the rack property will not be written to the wire.

Proposed Changes

  • AdminUtils.assignReplicasToBrokers will be updated to create broker-rack mapping from ZooKeeper data before doing replica assignment. If none of the brokers has rack, it will produce the same assignment as the current implementation without using rack. If some brokers have rack, and some do not have, the API will throw AdminOperationException. This is to prevent none rack aware assignment being created during version upgrade and human error where the rack information is missing. When making the rack aware assignment, it tries to keep the following properties:
    • Even distribution of replicas among brokers
    • Even distribution of partition leadership among brokers
    • Assign to as many racks as possible. That means if the number of racks are more than or equal to the number of replicas, each rack will have at most one replica. On the other hand, if the number of racks is less than the the number of replicas (which should happen very infrequently), each rack should have at least one replica and no other guarantees are made on how the replicas will be distributed among racks. For example, if there are 2 racks and 4 replicas, one rack can have 3 replicas, 2 replicas or 1 replica. This is to keep the algorithm simple while still keeping other replica distribution properties and fault tolerance from the racks.
  • Implementation detail of the rack aware assignment (see more in the pull request https://github.com/apache/kafka/pull/132):
    • Before doing the rack aware assignment, sort the broker list such that they are interlaced according to the rack. In other words, adjacent brokers in the sorted list should not be in the same rack if possible . For example, assuming 6 brokers mapping to 3 racks: 0 -> "rack1", 1 -> "rack1", 2 -> "rack2", 3 -> "rack2", 4 -> "rack3", 5 -> "rack3", the sorted broker list could be (0, 2, 4, 1, 3, 5)
    • Apply the same assignment algorithm to assign replicas, with the addition of skipping a broker if its rack is already used for the same partition

Compatibility, Deprecation, and Migration Plan

  • Rack property will be included in version 2 broker JSON schema and version 1 of UpdateMetaDataRequest. 
  • No labels