Status

Current state: Under Discussion

Discussion thread: here

JIRA: KAFKA-7236

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

A topic partition can be in one of four states (assuming replication factor of 3):


(ISR = in-sync replica)


3/3 ISRs: OK

2/3 ISRs: WARNING (under-replicated partition)

1/3 ISRs: CRITICAL (under-replicated partition)

0/3 ISRs: FATAL (offline/unavailable partition)


TopicCommand already has the --under-replicated-partitions and --unavailable-partitions flags, but it would be beneficial to add a --critical-partitions option that lists only the partitions in CRITICAL state (exactly one ISR remaining).

With this option, Kafka users can identify the exact topic partitions that are critical and need immediate repartitioning. They can also set up critical alerts that trigger whenever the output of this command contains any partitions.

A couple cases where identifying this CRITICAL state is useful in alerting:

  • Users who have a large number of topics in a single cluster, which makes it impractical to manually repartition every topic that has under-replicated partitions, so they only take action when a partition reaches CRITICAL state
  • Users with a high replication factor who can tolerate some broker failures and only take action when a partition reaches CRITICAL state

My group is currently one of these cases: we created a script that periodically runs the describe-topics command and extracts any CRITICAL topic partitions from the output to alert us. The new option in this KIP would allow us to bypass the parsing.
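For illustration, the kind of parsing such a script has to do might look like the following Python sketch. It is not the script referred to above. The function name critical_partitions is hypothetical, and the regular expression assumes partition lines of the form "Topic: t  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1", which may vary between Kafka versions.

```python
import re

# Assumed shape of one partition line in the describe output; the exact
# layout is an assumption and may differ between Kafka versions.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*\S+\s+Replicas:\s*(?P<replicas>[\d,]*)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def critical_partitions(describe_output):
    """Return (topic, partition) pairs that have exactly one ISR and RF > 1."""
    found = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # topic summary lines and blank lines carry no partition data
        replicas = [r for r in match.group("replicas").split(",") if r]
        isr = [r for r in match.group("isr").split(",") if r]
        # CRITICAL: one in-sync replica left on a partition meant to have more
        if len(replicas) > 1 and len(isr) == 1:
            found.append((match.group("topic"), int(match.group("partition"))))
    return found
```

An alerting job could run the describe command, pass its standard output to this function, and raise an alert when the returned list is non-empty. This is the step the proposed option would make unnecessary.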

Public Interfaces

This change would only modify the kafka-topics.sh script to include an additional option whose output follows the existing format of the --under-replicated-partitions and --unavailable-partitions options.

Proposed Changes

When a user specifies the --critical-partitions option, TopicCommand will print only those topic partitions whose ISR count is exactly 1 and whose topic has a replication factor greater than 1.

Topic partitions with a replication factor of 1 will not be included: they are intended to be single-replica partitions, so listing them in this command would not be useful.

The output will be in exactly the same format as that of the --under-replicated-partitions and --unavailable-partitions options.
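The selection rule described above can be sketched as a small predicate. This is a Python illustration only, not the TopicCommand implementation; the names is_critical, replica_count and isr_count are hypothetical.

```python
def is_critical(replica_count, isr_count):
    """True when a partition has exactly one ISR and was meant to have more.

    replica_count: number of assigned replicas (the replication factor)
    isr_count: number of replicas currently in sync
    """
    return replica_count > 1 and isr_count == 1


# Hypothetical partitions as (topic, partition, replica_count, isr_count)
partitions = [
    ("orders", 0, 3, 3),    # OK
    ("orders", 1, 3, 1),    # CRITICAL
    ("orders", 2, 3, 0),    # offline, already covered by the existing option
    ("scratch", 0, 1, 1),   # single replica by design, excluded
]

critical = [(t, p) for (t, p, rf, isr) in partitions if is_critical(rf, isr)]
```

Under these assumptions only partition 1 of the hypothetical topic "orders" is selected; the single-replica topic is skipped even though its ISR count is also 1.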

Compatibility, Deprecation, and Migration Plan

As this change adds a new option instead of modifying existing ones, there will be no compatibility issues and no migration is required.

Rejected Alternatives

single-replica-partitions option

We could instead add an option that lists all topic partitions that have only one in-sync replica, regardless of replication factor (RF >= 1).

This is not ideal, however, as the Kafka user would still have to parse the output and extract the critical partitions (where RF > 1) that need repartitioning. It is better to provide an option that lists only partitions with RF > 1 so that the user does not have to do any filtering.

