...

When the number of available brokers is sufficient and no assignment is specified, creating an under-replicated topic/partition may still be preferred: e.g. replication factor 3, in a 3-zone region with 2 brokers per zone, but a zone is down.
In such a case the create request could make use of all observed brokers rather than only the currently online ones.

We propose a protocol upgrade to CreateTopicsRequest and CreatePartitionsRequest that lets the caller specify the minimum number of replicas that must be available at creation for the request to succeed.
A new error code will be added to indicate that a topic/partition creation was completed without the full replication factor.
A broker configuration will be added to select the desired behavior when handling a create request that accepts under-replication.
Topic policies could be used to add more complex validations, for example to allow creation with min-isr factor but not less.
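The min-isr validation mentioned above could look roughly like the following. This is a minimal, self-contained sketch: `MinIsrCreationRule` and `allows` are hypothetical names that are not part of Kafka, and a real policy would need the request's minimum required replication factor to be exposed to it, which this proposal does not spell out.

```java
// Hypothetical helper, not part of Kafka: accepts a creation request only if
// the requested minimum replication is at least min.insync.replicas.
final class MinIsrCreationRule {
    private MinIsrCreationRule() {}

    /**
     * @param replicationFactor            full replication factor requested
     * @param minRequiredReplicationFactor replicas that must be available at creation
     * @param minInsyncReplicas            the topic's min.insync.replicas value
     * @return true if the request may proceed
     */
    static boolean allows(short replicationFactor,
                          short minRequiredReplicationFactor,
                          int minInsyncReplicas) {
        if (minRequiredReplicationFactor > replicationFactor) {
            return false; // the minimum cannot exceed the full replication factor
        }
        // Allow under-replication down to min-isr, but not below it.
        return minRequiredReplicationFactor >= minInsyncReplicas;
    }
}
```

With a replication factor of 3 and min.insync.replicas of 2, a minimum of 2 would be accepted and a minimum of 1 would be refused.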

Public Interfaces

Network protocol and api behavior

CreateTopicsRequest
CreateTopics Request (Version: 4) => [create_topic_requests] timeout validate_only 
  create_topic_requests => topic num_partitions replication_factor [replica_assignment] [config_entries] min_required_replication_factor
    topic => STRING
    num_partitions => INT32
    replication_factor => INT16
    replica_assignment => partition [replicas] 
      partition => INT32
      replicas => INT32
    config_entries => config_name config_value 
      config_name => STRING
      config_value => NULLABLE_STRING
    min_required_replication_factor => INT16  //new field
  timeout => INT32
  validate_only => BOOLEAN

CreateTopicsResponse (Version: 4) //same schema as version 3

CreatePartitionsRequest
CreatePartitions Request (Version: 2) => [topic_partitions] timeout validate_only 
  topic_partitions => topic new_partitions 
    topic => STRING
    new_partitions => count [assignment] min_required_replication_factor
      count => INT32
      assignment => ARRAY(INT32)
      min_required_replication_factor => INT16  //new field
  timeout => INT32
  validate_only => BOOLEAN

CreatePartitionsResponse (Version: 2) //same schema as version 1
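To illustrate how the new min_required_replication_factor field might drive the result of a create request, here is a small sketch. `CreationCheck`, `Outcome` and `evaluate` are hypothetical names introduced for illustration only; the sketch ignores rack placement and the broker-side policy, and assumes the request has already been validated so that the minimum does not exceed the replication factor.

```java
// Hypothetical sketch, not the broker implementation.
final class CreationCheck {
    /** Possible results of a create request under this proposal. */
    enum Outcome { CREATED, CREATED_UNDER_REPLICATED, REJECTED }

    private CreationCheck() {}

    static Outcome evaluate(int availableBrokers,
                            short replicationFactor,
                            short minRequiredReplicationFactor) {
        if (availableBrokers >= replicationFactor) {
            return Outcome.CREATED; // full replication factor can be met
        }
        if (availableBrokers >= minRequiredReplicationFactor) {
            // Enough brokers for the requested minimum: the request succeeds,
            // and the new error code signals the missing replicas.
            return Outcome.CREATED_UNDER_REPLICATED;
        }
        return Outcome.REJECTED; // not even the minimum can be satisfied
    }
}
```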

Error Codes

...

ERROR | CODE | RETRIABLE | DESCRIPTION
CREATED_UNDER_REPLICATED | 79 | False | ...

will only be created if there are at least "min.insync.replicas" brokers available, i.e. a topic can be created under-replicated, but at creation it will always have at least min.insync.replicas replicas.

Public Interfaces

...

Errors
package org.apache.kafka.common.protocol;
public enum Errors {
...
    CREATED_UNDER_REPLICATED(79, "The server created a topic or partitions without the full set of replicas being active.",
        CreatedUnderReplicatedException::new);
}

AdminClient APIs

The AdminClient will be updated to expose the new fields. 

NewTopic
// Keep existing constructors but add a getter/setter pair 
// to specify the minimum required replication factor if different from full 
package org.apache.kafka.clients.admin;
public class NewTopic {
    ....   
    // return `this` for chaining 
    public NewTopic minRequiredReplicationFactor(short requiredReplicationFactor) {}
    public short minRequiredReplicationFactor() {}
}

NewPartitions
// Keep existing factory methods and add a getter/setter pair (like NewTopic)
// to specify the minimum required replication factor if different from full
package org.apache.kafka.clients.admin;
public class NewPartitions {
    ....
    // return `this` for chaining
    public NewPartitions minRequiredReplicationFactor(short requiredReplicationFactor) {}
    public short minRequiredReplicationFactor() {}
}
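The accessor pair outlined above could behave as in the following stand-in. `NewTopicSketch` is a hypothetical, simplified class written only for illustration; it is not the real `org.apache.kafka.clients.admin.NewTopic`. Two assumptions are made here that the proposal does not state explicitly: the minimum defaults to the full replication factor when it is not set, and values outside the range 1..replicationFactor are rejected.

```java
// Stand-in for illustration only; not the real AdminClient class.
final class NewTopicSketch {
    private final String name;
    private final short replicationFactor;
    // Assumption: defaults to the full replication factor when not set.
    private short minRequiredReplicationFactor;

    NewTopicSketch(String name, short replicationFactor) {
        this.name = name;
        this.replicationFactor = replicationFactor;
        this.minRequiredReplicationFactor = replicationFactor;
    }

    String name() {
        return name;
    }

    // Setter: returns `this` so that calls can be chained.
    NewTopicSketch minRequiredReplicationFactor(short requiredReplicationFactor) {
        // Assumption: the minimum must lie between 1 and the full replication factor.
        if (requiredReplicationFactor < 1 || requiredReplicationFactor > replicationFactor) {
            throw new IllegalArgumentException(
                "minRequiredReplicationFactor must be between 1 and " + replicationFactor);
        }
        this.minRequiredReplicationFactor = requiredReplicationFactor;
        return this;
    }

    // Getter.
    short minRequiredReplicationFactor() {
        return minRequiredReplicationFactor;
    }
}
```

A caller would then write something like `new NewTopicSketch("events", (short) 3).minRequiredReplicationFactor((short) 2)` to accept creation with two of three replicas available.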

Broker Configuration

NAME | DESCRIPTION | TYPE | DEFAULT | VALID VALUES | IMPORTANCE | DYNAMIC UPDATE MODE
enable.under.replicated.creation.policy | Behavior on topic or partition creation without the full set of replicas being active. | STRING | disabled | disabled, enabled, enabled_prefer_observed | Medium | cluster-wide

Administrators can disable (disabled) or enable (enabled) the creation of topics/partitions without the full replication factor. When enabled, a sub-option (enabled_prefer_observed) is to always prefer using all known brokers and their racks, accepting some under-replication, rather than using only the currently online brokers.
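As a rough illustration of how the policy values disabled, enabled and enabled_prefer_observed might influence replica placement, the sketch below picks the set of brokers considered for assignment. `UnderReplicatedCreationPolicy`, `Mode`, `parse` and `candidateBrokers` are hypothetical names, and the sketch leaves out rack awareness and the replication-factor check itself.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the broker implementation.
final class UnderReplicatedCreationPolicy {
    enum Mode { DISABLED, ENABLED, ENABLED_PREFER_OBSERVED }

    private UnderReplicatedCreationPolicy() {}

    /** Maps the configuration string to a mode. */
    static Mode parse(String value) {
        switch (value) {
            case "disabled": return Mode.DISABLED;
            case "enabled": return Mode.ENABLED;
            case "enabled_prefer_observed": return Mode.ENABLED_PREFER_OBSERVED;
            default: throw new IllegalArgumentException("Unknown policy: " + value);
        }
    }

    /**
     * Brokers considered for replica assignment: every observed broker when
     * observed brokers are preferred, otherwise only the online ones.
     */
    static Set<Integer> candidateBrokers(Mode mode, Set<Integer> online, Set<Integer> observed) {
        if (mode == Mode.ENABLED_PREFER_OBSERVED) {
            Set<Integer> all = new TreeSet<>(observed);
            all.addAll(online);
            return all;
        }
        return new TreeSet<>(online);
    }
}
```

In the 3-zone example, with brokers 5 and 6 offline but observed, enabled_prefer_observed would still consider all six brokers, so replicas could be assigned to the missing zone and become active once it returns.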

Command line tools and arguments

New optional argument to the kafka-topics script:

...


...


Zookeeper changes:

A new non-ephemeral node called observed_ids will be added under /brokers.
Upon starting, brokers will create or update the non-ephemeral node for their current id, containing the same information as the existing ephemeral /brokers/ids node.

...

  • For compatibility the default behavior remains unaltered: a number of active brokers equal to or greater than the full replication factor is required for successful creation.

  • No migration is necessary.

  • Updates to the Decommissioning brokers section in the documentation:
    • It will mention that if a broker id is never to be reused, its corresponding node under /brokers/observed_ids in zookeeper will need to be removed manually.

Rejected Alternatives

  • Async replica assignment: Instead of relying on previously known brokers, we considered creating replicas on the available brokers at topic/partition create time and saving some information in zookeeper to asynchronously create the missing replicas on the next suitable brokers to join the cluster. Such a process would involve the controller checking whether missing replicas exist whenever a broker joins the cluster. This alternative felt too complicated. Also, the next joining broker may not match the preferred rack assignment.
  • Instead of specifying a minimum replication factor on creation, we could have used a boolean to allow creation at min-isr size only. This approach looked a bit less flexible but not significantly simpler.
  • Update CreateTopicsRequest and CreatePartitionsRequest to allow users to disable creation of under-replicated topics. This approach felt too complicated.

  • Update schema of CreateTopicsResponse and CreatePartitionsResponse to contain the actual replication factor at creation. Like assignments, it's not data the client can act on. We think the error message associated with the new error code can suffice.

...