Status

Current state:  "Under Discussion"

Discussion thread: here

JIRA: KAFKA-7362

Motivation

When partition reassignment removes topic partitions from an offline broker, those removed partitions become orphan partitions on that broker. When the offline broker comes back online, it is not able to clean up the data and folders that belong to orphan partitions. The log manager scans all log dirs during startup, but the time-based retention policy on a topic partition does not take effect until the broker becomes either the leader or a follower of the partition (i.e., until "replicaHighWatermark" is set). Orphan partitions never become leader or follower partitions and thus never have retention applied. In addition, there is currently no logic to delete the folders that belong to orphan partitions. This KIP provides a mechanism for brokers to remove orphan partitions automatically.

Public Interfaces

  • Add a broker config "auto.orphan.partition.removal.delay.ms": the delay after which orphan partitions start to be removed. The default value is -1, which indicates that orphan partition removal is disabled.

  • Add two metrics: 
    1) kafka.log:type=LogManager,name=OrphanLogPartitionCount  
    type = gauge
    value = the number of orphan partitions
    2) kafka.log:type=LogManager,name=OrphanLogPartitionSize
    type = gauge
    value = the size of orphan partitions
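
As a rough illustration of what the two gauges would report, the following minimal sketch computes both values from a map of orphan partitions to their on-disk sizes. The class `OrphanPartitionGauges` and its methods are hypothetical and are not part of Kafka's LogManager; treating the size as a total in bytes across all orphan partitions is also an assumption, since the KIP does not state the unit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Kafka's LogManager: derives the values the two
// proposed gauges would report from a map of orphan partition -> size.
final class OrphanPartitionGauges {
    private final Map<String, Long> orphanSizesBytes;

    OrphanPartitionGauges(Map<String, Long> orphanSizesBytes) {
        this.orphanSizesBytes = orphanSizesBytes;
    }

    // Value for OrphanLogPartitionCount: the number of orphan partitions.
    int orphanLogPartitionCount() {
        return orphanSizesBytes.size();
    }

    // Value for OrphanLogPartitionSize: summed size of all orphan partitions
    // (assumed here to be bytes; the KIP does not specify the unit).
    long orphanLogPartitionSize() {
        long total = 0L;
        for (long size : orphanSizesBytes.values()) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> orphans = new LinkedHashMap<>();
        orphans.put("topicA-0", 1_024L);
        orphans.put("topicB-3", 2_048L);
        OrphanPartitionGauges gauges = new OrphanPartitionGauges(orphans);
        System.out.println("count=" + gauges.orphanLogPartitionCount());
        System.out.println("size=" + gauges.orphanLogPartitionSize());
    }
}
```

Because both methods read the map each time they are called, the reported values would follow the orphan set as partitions are revived or deleted.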

Proposed Changes

a) Provide a mechanism to remove orphan partitions automatically. Topic partitions that are present on the broker but not in the first LeaderAndIsr request the broker receives after startup are treated as orphan partitions. Removal is delayed rather than immediate because a partition reassignment might arrive as a separate LeaderAndIsr request; the delay makes it possible for the broker to re-host the same topic partition without fetching the entire topic partition data from the partition leader. The orphan partition removal works in three phases.

  1. Initialize phase
    During broker startup, the broker calculates the initial set of orphan partitions based on the partition information from the first LeaderAndIsr request.
  2. Timeout/correction phase (such as a 24-hour timeout, defined by "auto.orphan.partition.removal.delay.ms")
    The timeout phase serves two purposes:
    2-a) Update the broker’s knowledge about partitions over time. The first LeaderAndIsr request the broker receives might be outdated (due to dual controllers, outdated requests, etc.). During the timeout phase, however, the broker receives more LeaderAndIsr requests and uses their partition information to remove the partitions it is responsible for from the initial orphan partition set.
    2-b) Serve as a grace period for reusing the orphan partitions. If the broker receives a request during this period to reassign an orphan partition to itself, the broker removes that partition from its orphan partition set.
  3. Deletion phase
    The broker removes orphan partitions (including partition folders) whose log segments are all older than the broker default retention period. The broker does not distinguish between log-compacted topics and time-retention topics for partitions in the orphan partition set; the broker default retention period is used for all orphan partitions. This ensures that the broker will not delete new data. If some orphan partitions cannot be removed immediately because the retention period has not been reached, a new deletion is scheduled at a future time (defined by "auto.orphan.partition.removal.delay.ms").
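
The three phases can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed broker implementation: the class `OrphanPartitionTracker`, its method names, and the use of plain strings for topic partitions are all hypothetical, the timer that would trigger the deletion phase is omitted, and treating every negative delay (not only -1) as "disabled" is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three phases; names are illustrative only.
final class OrphanPartitionTracker {
    private final long removalDelayMs;     // stands in for auto.orphan.partition.removal.delay.ms
    private final long defaultRetentionMs; // stands in for the broker default retention period
    private final Set<String> orphans = new HashSet<>();

    OrphanPartitionTracker(long removalDelayMs, long defaultRetentionMs) {
        this.removalDelayMs = removalDelayMs;
        this.defaultRetentionMs = defaultRetentionMs;
    }

    // The KIP says -1 disables removal; treating all negatives as disabled is an assumption.
    boolean enabled() {
        return removalDelayMs >= 0;
    }

    // Phase 1: partitions on disk that are absent from the first LeaderAndIsr request.
    void initialize(Set<String> partitionsOnDisk, Set<String> firstRequestPartitions) {
        orphans.clear();
        if (!enabled()) {
            return;
        }
        orphans.addAll(partitionsOnDisk);
        orphans.removeAll(firstRequestPartitions);
    }

    // Phase 2: a later LeaderAndIsr request naming a partition takes it out of the orphan set.
    void onLeaderAndIsr(Set<String> requestPartitions) {
        orphans.removeAll(requestPartitions);
    }

    // Phase 3: return (and stop tracking) orphans whose newest segment is older than the
    // default retention period; the rest stay tracked for the next scheduled attempt.
    Set<String> collectDeletable(Map<String, Long> newestSegmentTimestampMs, long nowMs) {
        Set<String> deletable = new HashSet<>();
        for (String partition : orphans) {
            Long newest = newestSegmentTimestampMs.get(partition);
            if (newest != null && nowMs - newest > defaultRetentionMs) {
                deletable.add(partition);
            }
        }
        orphans.removeAll(deletable);
        return deletable;
    }

    // When the next deletion attempt would run if some orphans were not yet deletable.
    long nextAttemptAtMs(long nowMs) {
        return nowMs + removalDelayMs;
    }

    Set<String> orphans() {
        return new HashSet<>(orphans);
    }

    public static void main(String[] args) {
        OrphanPartitionTracker tracker = new OrphanPartitionTracker(1_000L, 100L);
        tracker.initialize(
            new HashSet<>(Arrays.asList("a-0", "b-0", "c-0")),
            new HashSet<>(Arrays.asList("a-0")));
        tracker.onLeaderAndIsr(new HashSet<>(Arrays.asList("b-0"))); // b-0 is reassigned back
        Map<String, Long> newest = new HashMap<>();
        newest.put("c-0", 0L);
        System.out.println(tracker.collectDeletable(newest, 500L));
    }
}
```

Checking only the newest segment is equivalent to requiring that all log segments are older than the retention period, which is the condition the deletion phase describes.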

b) Add metrics to keep track of the number of orphan partitions and the size of these orphan partitions, as described in the Public Interfaces section.

Compatibility, Deprecation, and Migration Plan

  • There is no compatibility issue.  By default, this feature is disabled.   

Rejected Alternatives

1) Manual deletion of orphan partitions via a provided API. Kafka provides an API through which the user specifies which topic partitions to delete and which time retention rule to apply. Kafka only removes a partition if all of the following conditions are met:

...