...


Status

Current state: Under Discussion

Discussion thread: here

JIRA: If the idea is approved, I will create the Jira ticket.


Motivation

The main motivation is to have a clear metric, regardless of the OS, that shows when produce requests become "async". In a normal situation, produce requests are written to disk via the lib -> syscall -> etc. path; as we know, the data first ends up in a memory page (a "dirty page" from now on).

...

Public Interfaces

  • Monitoring

Proposed Changes


...

Code Block
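import java.util.concurrent.TimeUnit
import com.yammer.metrics.core.Timer
import org.apache.kafka.server.metrics.KafkaMetricsGroup
object SegmentAppendStats {
  private val metricsGroup = new KafkaMetricsGroup(SegmentAppendStats.getClass)
  // Duration of each segment append in milliseconds; rate reported per second.
  val SegmentAppendTimer: Timer = metricsGroup.newTimer("SegmentAppendRateAndTimeMs", TimeUnit.MILLISECONDS, TimeUnit.SECONDS)
}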
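
As a rough illustration of how such a timer might be fed, the sketch below measures the wall-clock time of an arbitrary block and passes the elapsed nanoseconds to a callback. The names `AppendTiming`, `timed` and `record` are hypothetical and not part of Kafka. In the broker, the callback could forward to `SegmentAppendStats.SegmentAppendTimer.update(nanos, TimeUnit.NANOSECONDS)`, but where exactly the append would be wrapped is an assumption left open by this KIP.

```scala
object AppendTiming {
  // Runs `body`, then hands the elapsed time in nanoseconds to `record`.
  // `record` is a hypothetical hook standing in for the real timer update.
  def timed[T](record: Long => Unit)(body: => T): T = {
    val start = System.nanoTime()
    try body
    finally record(System.nanoTime() - start) // recorded even if `body` throws
  }
}
```

For example, `AppendTiming.timed(nanos => println(nanos)) { /* append to segment */ }` would print the duration of the wrapped call. Using a callback keeps the measurement independent of the metrics library.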



Compatibility, Deprecation, and Migration Plan

  • I need confirmation on whether tracking this metric could have a performance impact (thanks in advance).

Test Plan

If the KIP is accepted, I can easily test the scenario by producing records and checking the new metric before and after (sync vs. async writes).

I can experiment with the vm.dirty_ratio and vm.dirty_background_ratio values.


Rejected Alternatives

The best alternative, IMHO, would be to get the information before "the disaster happens": at the OS level we can check nr_dirty and nr_dirty_threshold.

nr_dirty is the current number of dirty pages, and nr_dirty_threshold is the limit at which the OS blocks writes to pages until some are flushed.

Tracking this relation could give us a hint when we are getting close to the limit, so that we can add more resources or tune the OS settings.
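
To make the relation concrete, here is a minimal sketch of an "in house" check, assuming a Linux host where both counters are exposed in /proc/vmstat. The object and method names (`DirtyPageStats`, `parse`, `dirtyUsage`, `current`) are made up for illustration, and this is not something the KIP proposes to add to Kafka.

```scala
import scala.io.Source
import scala.util.Using

object DirtyPageStats {
  // Parses "name value" lines, as found in /proc/vmstat, into a map.
  def parse(lines: Iterator[String]): Map[String, Long] =
    lines.map(_.trim.split("\\s+")).collect {
      case Array(name, value) if value.nonEmpty && value.forall(_.isDigit) =>
        name -> value.toLong
    }.toMap

  // Fraction of the blocking threshold currently used by dirty pages,
  // or None if either counter is missing or the threshold is zero.
  def dirtyUsage(stats: Map[String, Long]): Option[Double] =
    for {
      dirty <- stats.get("nr_dirty")
      threshold <- stats.get("nr_dirty_threshold") if threshold > 0
    } yield dirty.toDouble / threshold

  // Linux only: reads the live counters; None if the file is unavailable.
  def current(): Option[Double] =
    Using(Source.fromFile("/proc/vmstat"))(src => dirtyUsage(parse(src.getLines())))
      .toOption.flatten
}
```

A value approaching 1.0 would suggest that writers are about to be throttled; what ratio should trigger an alert is a judgement call for the operator.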

This is possible as an "in-house" metric, but not within Kafka itself, since Kafka runs in the JVM and we cannot know which OS it will be deployed on.