...

  • Add a broker config "auto.orphan.partition.removal.delay.ms": the delay, in milliseconds, after which orphan partitions start to be removed. The timer starts when the broker receives its first LeaderAndIsr request. The default value is -1, which disables orphan partition removal.
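As an illustration of the semantics described above, the following is a minimal sketch of how a broker might interpret this setting. The class and method names are hypothetical and are not part of Kafka. Treating every negative value (not only -1) as "disabled" is an assumption of this sketch, as is reading the value from a plain java.util.Properties object.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper; illustrates the proposed config semantics only. */
class OrphanRemovalConfig {
    static final String DELAY_MS_CONFIG = "auto.orphan.partition.removal.delay.ms";

    /**
     * Returns the removal delay, or empty when removal is disabled.
     * Assumption: any negative value disables removal (the proposal only names -1).
     */
    static Optional<Duration> removalDelay(Properties props) {
        // Fall back to the proposed default of -1 when the key is absent.
        long ms = Long.parseLong(props.getProperty(DELAY_MS_CONFIG, "-1").trim());
        return ms < 0 ? Optional.empty() : Optional.of(Duration.ofMillis(ms));
    }
}
```

With no value set, removalDelay returns an empty Optional, which matches the proposed default of leaving the feature off.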


Proposed Changes

a) Provide a mechanism to remove orphan partitions automatically.

Orphan partition removal works in three phases.

  1. Initialization phase
    During startup, the broker computes the initial set of orphan partitions from the partition information in the first LeaderAndIsr request it receives.
  2. Timeout/correction phase (for example, a 24-hour timeout, defined by "auto.orphan.partition.removal.delay.ms")
    The timeout phase serves two purposes:
    2-a) Update the broker’s knowledge about partitions over time. The first LeaderAndIsr request the broker receives might be outdated (due to dual controllers, stale requests, etc.). During the timeout phase, however, the broker receives more LeaderAndIsr requests and uses their partition information to remove any partition it is responsible for from the initial orphan partition set.
    2-b) Serve as a grace period for reusing orphan partitions. If, during this period, the broker receives a request that reassigns an orphan partition to it, the broker removes that partition from its orphan partition set.
  3. Deletion phase
    The broker removes orphan partitions (including their partition folders) whose log segments are all older than the broker's default retention period. The broker does not distinguish between log-compacted topics and time-retention topics for partitions in the orphan partition set; the broker's default retention period applies to all of them. This ensures that the broker does not delete new data. If some orphan partitions cannot be removed immediately because the retention period has not yet elapsed, another deletion is scheduled for a future time (defined by "auto.orphan.partition.removal.delay.ms").
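The three phases above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the proposed implementation: the class OrphanPartitionTracker and its methods are hypothetical, partitions are represented as plain strings, the initial orphan set is assumed to be "partitions found on disk that the first LeaderAndIsr request does not mention", and the age of a partition is assumed to be judged by the timestamp of its newest log segment.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the three-phase removal; these are not Kafka classes. */
class OrphanPartitionTracker {
    private final Set<String> orphans = new HashSet<>();
    private final long removalDelayMs;      // auto.orphan.partition.removal.delay.ms
    private final long defaultRetentionMs;  // broker default retention period
    private long firstRequestTimeMs = -1L;  // stays -1 until the first request arrives

    OrphanPartitionTracker(long removalDelayMs, long defaultRetentionMs) {
        this.removalDelayMs = removalDelayMs;
        this.defaultRetentionMs = defaultRetentionMs;
    }

    /** Phases 1 and 2: call for every LeaderAndIsr (or reassignment) request received. */
    void onLeaderAndIsr(Set<String> partitionsOnDisk, Set<String> partitionsInRequest, long nowMs) {
        if (removalDelayMs < 0) return;          // feature disabled
        if (firstRequestTimeMs < 0) {            // phase 1: first request starts the timer
            firstRequestTimeMs = nowMs;
            orphans.addAll(partitionsOnDisk);
        }
        // Phase 2: anything the broker is told it hosts is not an orphan.
        orphans.removeAll(partitionsInRequest);
    }

    /**
     * Phase 3: returns the orphans that may be deleted now and forgets them.
     * Orphans whose newest segment is still within retention stay tracked,
     * so the caller can schedule another pass later.
     */
    Set<String> deletable(Map<String, Long> newestSegmentTimestampMs, long nowMs) {
        Set<String> result = new TreeSet<>();
        boolean timerRunning = removalDelayMs >= 0 && firstRequestTimeMs >= 0;
        if (!timerRunning || nowMs - firstRequestTimeMs < removalDelayMs) return result;
        for (String partition : orphans) {
            Long newest = newestSegmentTimestampMs.get(partition);
            // All segments are older than retention iff the newest one is.
            if (newest != null && nowMs - newest > defaultRetentionMs) result.add(partition);
        }
        orphans.removeAll(result);
        return result;
    }

    /** Sorted snapshot of the current orphan set. */
    Set<String> orphans() { return new TreeSet<>(orphans); }
}
```

In this sketch the caller owns the scheduling: it would invoke deletable once the delay has elapsed and, if orphans() is still non-empty afterwards, schedule the next pass after another "auto.orphan.partition.removal.delay.ms" interval.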

b) Add metrics that track the number of orphan partitions and the size of these orphan partitions.
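A minimal sketch of the two values such metrics could expose is shown below. The class name, method names, and the choice to report size as a single total in bytes are assumptions made for illustration; this excerpt of the proposal does not name the metrics or specify how they are registered.

```java
import java.util.Map;

/** Hypothetical holder for the two proposed values; not wired to any metrics registry. */
class OrphanPartitionMetrics {
    private volatile int orphanCount = 0;
    private volatile long orphanBytes = 0L;

    /** Recomputes both values from a map of orphan partition name to on-disk size in bytes. */
    void update(Map<String, Long> orphanSizesBytes) {
        long total = 0L;
        for (long size : orphanSizesBytes.values()) {
            total += size;
        }
        orphanCount = orphanSizesBytes.size();
        orphanBytes = total;
    }

    int orphanPartitionCount() { return orphanCount; }

    long orphanPartitionSizeBytes() { return orphanBytes; }
}
```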

Compatibility, Deprecation, and Migration Plan

...