
Status

Current state: Drafting

Discussion thread: Not ready yet

JIRA: KAFKA-7904

Motivation


Today, a topic partition may be categorized as:

(1) Fully in sync (inSyncReplicas = allReplicasMap)

(2) UnderReplicated

(3) UnderMinIsr

(4) Offline


(3) and (4) are failure scenarios in which clients face unavailability: producers with acks=all fail to produce when the ISR count drops below the configured "min.insync.replicas", and an offline partition has no leader to serve requests at all.

(2) Under-replicated partitions occur whenever the inSyncReplicas set is not equal to the allReplicasMap, which can happen when:

  • Repartitioning
  • Broker restarts
  • Transient network issues
  • Broker failure


The current categorization of topic partitions has a gap: an UnderReplicatedPartition does not tell operators whether the reduced ISR set is intentional (repartitioning, restarts) or whether something is wrong, such as a broker having failed completely. This makes alerting hard. An alert on UnderReplicatedPartitions may be too noisy to be effective, and increasing the number of samples needed to trigger the alert increases the time to detect real failures.

This KIP aims to fill this gap by proposing a new partition category, AtMinIsr, consisting of partitions whose number of in-sync replicas is exactly the configured minimum ("min.insync.replicas").

If a partition is AtMinIsr, something serious may have happened; more importantly, one more failure can result in unavailability, so some action should be taken (e.g. repartitioning).
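The categories above can be modelled as a pure function of the replica count, the ISR size and min.insync.replicas. This is an illustrative sketch, not Kafka's implementation: PartitionCategory, PartitionCategories.classify and the rule that an empty ISR means "offline" are assumptions made here (in the broker a partition is offline when it has no leader, and the per-partition states are only reported by the leader).

```scala
// Hypothetical model of the partition categories; not Kafka's Partition class.
sealed trait PartitionCategory
case object UnderReplicated extends PartitionCategory
case object AtMinIsr extends PartitionCategory
case object UnderMinIsr extends PartitionCategory
case object Offline extends PartitionCategory

object PartitionCategories {
  /** Returns every category a partition falls into; an empty set means fully in sync.
    *
    * @param replicaCount size of the assigned replica set (allReplicasMap)
    * @param isrSize      current number of in-sync replicas
    * @param minIsr       the topic's min.insync.replicas
    */
  def classify(replicaCount: Int, isrSize: Int, minIsr: Int): Set[PartitionCategory] = {
    // Simplification: an empty ISR stands in for "no leader". With no leader,
    // the leader-reported categories below are not reported, as in the examples.
    if (isrSize == 0) Set[PartitionCategory](Offline)
    else {
      val categories = Set.newBuilder[PartitionCategory]
      if (isrSize < replicaCount) categories += UnderReplicated
      if (isrSize == minIsr) categories += AtMinIsr
      if (isrSize < minIsr) categories += UnderMinIsr
      categories.result()
    }
  }
}
```

For instance, the first failure in Example 1 below corresponds to classify(3, 2, 2), which yields both UnderReplicated and AtMinIsr.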


Example 1:

1 partition

minIsrCount=2

ISR=[0,1,2]


1. Broker 0 fails

  • ISR=[1,2]
  • partition is UnderReplicatedPartition and AtMinIsr

2. Broker 1 fails

  • ISR=[2]
  • partition is UnderReplicatedPartition and UnderMinIsr

In this example, AtMinIsr triggers at the same time as UnderReplicatedPartition and tells us that one more failure will make producers with acks=all unavailable.


Example 2: 

1 partition

minIsrCount=1

ISR=[0,1,2]


1. Broker 0 fails

  • ISR=[1,2]
  • partition is UnderReplicatedPartition


2. Broker 1 fails

  • ISR=[2]
  • partition is UnderReplicatedPartition and AtMinIsr


3. Broker 2 fails

  • ISR=[]
  • partition is OfflinePartition


In this example, AtMinIsr triggers only when a single in-sync replica remains, and tells us that one more failure will take the partition completely offline!
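Both examples come down to simple arithmetic on the remaining "failure headroom". The sketch below is illustrative only; FailureHeadroom and its methods are hypothetical names, and it assumes each failure removes exactly one replica from the ISR and that unclean leader election is disabled.

```scala
// Hypothetical helper: how many more ISR losses a partition can absorb.
object FailureHeadroom {
  /** Further ISR losses before the ISR drops below min.insync.replicas,
    * i.e. before producers with acks=all start failing. Zero if already below. */
  def untilUnderMinIsr(isrSize: Int, minIsr: Int): Int =
    math.max(isrSize - minIsr + 1, 0)

  /** Further ISR losses before no in-sync replica remains. */
  def untilOffline(isrSize: Int): Int =
    math.max(isrSize, 0)

  /** AtMinIsr is exactly the state with a headroom of one failure. */
  def isAtMinIsr(isrSize: Int, minIsr: Int): Boolean =
    untilUnderMinIsr(isrSize, minIsr) == 1
}
```

With minIsrCount=2 (Example 1), an ISR of [1,2] leaves untilUnderMinIsr(2, 2) = 1. With minIsrCount=1 (Example 2), an ISR of [2] leaves untilUnderMinIsr(1, 1) = untilOffline(1) = 1, which is why the next failure in that example takes the partition offline.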

Public Interfaces

We will introduce a new metric and a new TopicCommand option to identify AtMinIsr partitions.

Code Block
ReplicaManager.AtMinIsrPartitions


--at-min-isr-partitions


Proposed Changes

We will add the gauge to Partition.scala:

Code Block
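// Per-partition gauge: 1 while this broker leads the partition and its ISR
// size equals min.insync.replicas, otherwise 0.
newGauge("AtMinIsr",
  new Gauge[Int] {
    def value: Int = if (isAtMinIsr) 1 else 0
  },
  tags
)


...


// Only the local leader reports AtMinIsr; followers, and partitions without
// a local leader, return false.
def isAtMinIsr: Boolean = {
  leaderReplicaIfLocal match {
    case Some(leaderReplica) =>
      // minInSyncReplicas is the topic's min.insync.replicas from the log config
      inSyncReplicas.size == leaderReplica.log.get.config.minInSyncReplicas
    case None =>
      false
  }
}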
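The proposal names a broker-level metric, ReplicaManager.AtMinIsrPartitions, but does not show its code. A plausible reading is that it counts the partitions this broker leads whose ISR is at the minimum. The sketch below assumes that reading; HostedPartition and AtMinIsrCount are hypothetical stand-ins, not Kafka classes.

```scala
// Hypothetical, simplified view of a partition hosted on this broker.
final case class HostedPartition(
    topic: String,
    partition: Int,
    isLeaderLocal: Boolean,
    isrSize: Int,
    minIsr: Int) {
  // Mirrors isAtMinIsr above: only the local leader reports the state.
  def isAtMinIsr: Boolean = isLeaderLocal && isrSize == minIsr
}

object AtMinIsrCount {
  // Value a broker-wide AtMinIsrPartitions gauge could report: the number of
  // partitions led by this broker that are currently at min ISR.
  def atMinIsrPartitions(partitions: Iterable[HostedPartition]): Int =
    partitions.count(_.isAtMinIsr)
}
```

Counting only led partitions means each partition is counted once, so summing this value across brokers gives a cluster-wide total, just as the per-partition gauge is only 1 on the leader.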


And to TopicCommand.scala:

Code Block
// New describe filter: --at-min-isr-partitions
private val reportAtMinIsrPartitionsOpt = parser.accepts("at-min-isr-partitions",
  "if set when describing topics, only show partitions whose isr count is equal to the configured minimum. Not supported with the --zookeeper option.")


// A partition is AtMinIsr when its ISR size equals the topic's min.insync.replicas
private def hasAtMinIsrPartitions(partitionDescription: PartitionDescription): Boolean = {
  partitionDescription.isr.size == partitionDescription.minIsrCount
}


// Print the partition only if --at-min-isr-partitions was given and it is AtMinIsr
private def shouldPrintAtMinIsrPartitions(partitionDescription: PartitionDescription): Boolean = {
  opts.reportAtMinIsrPartitions && hasAtMinIsrPartitions(partitionDescription)
}
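The filtering behind --at-min-isr-partitions can be sketched outside TopicCommand as follows. PartitionInfoView and AtMinIsrFilter are hypothetical simplifications (the real PartitionDescription carries more fields), and the sketch assumes that without the flag every partition is printed, as with a plain --describe.

```scala
// Hypothetical, simplified stand-in for TopicCommand's PartitionDescription.
final case class PartitionInfoView(topic: String, partition: Int, isr: Seq[Int], minIsrCount: Int)

object AtMinIsrFilter {
  // Same test as hasAtMinIsrPartitions above.
  def isAtMinIsr(p: PartitionInfoView): Boolean = p.isr.size == p.minIsrCount

  // With the flag set, keep only AtMinIsr partitions; otherwise keep everything.
  def partitionsToPrint(all: Seq[PartitionInfoView], reportAtMinIsr: Boolean): Seq[PartitionInfoView] =
    if (reportAtMinIsr) all.filter(p => isAtMinIsr(p)) else all
}
```

On the command line, the option would presumably be combined with --describe and --bootstrap-server, since it is not supported with --zookeeper.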


Compatibility, Deprecation, and Migration Plan


The new TopicCommand option requires AdminClient, so it will not be available with the --zookeeper option.

Rejected Alternatives

None so far.