...

The first step to addressing the issues above is to make metadata fetching within the producer asynchronous. This directly fixes (3) and opens a path to resolving (1), because metadata requests can then be batched together. The producer's interface is already asynchronous, and the producer inherently batches the records it sends to each partition. Subjecting metadata fetching to a portion of that batching delay therefore doesn't change how clients interact with the producer or what they can expect from it. This change alone should be enough to restore acceptable performance, pending verification.
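As a rough illustration of how asynchronous fetching enables batching, the sketch below coalesces metadata interest for many topics into one request instead of one blocking lookup per topic. It is a minimal sketch under stated assumptions: `MetadataBatcher`, `request`, `flush`, and the `fetcher` function are hypothetical names, not Kafka APIs. The `fetcher` stands in for the network round trip, and `flush()` would be driven by something like the linger delay.

```java
import java.util.*;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/**
 * Hypothetical sketch: callers register interest in a topic without blocking,
 * and all pending topics are resolved together by a single batched fetch.
 */
public class MetadataBatcher {
    // Topics awaiting metadata, each mapped to the future its callers wait on.
    private final Map<String, CompletableFuture<Void>> pending = new LinkedHashMap<>();
    // Stand-in for the metadata request: given topics, returns those it resolved.
    private final Function<Set<String>, Set<String>> fetcher;
    private int fetchCount = 0;

    public MetadataBatcher(Function<Set<String>, Set<String>> fetcher) {
        this.fetcher = fetcher;
    }

    /** Non-blocking; repeated requests for the same topic share one future. */
    public synchronized CompletableFuture<Void> request(String topic) {
        return pending.computeIfAbsent(topic, t -> new CompletableFuture<>());
    }

    /** Sends every pending topic in one request, e.g. when a linger-style delay expires. */
    public void flush() {
        Map<String, CompletableFuture<Void>> batch;
        synchronized (this) {
            if (pending.isEmpty()) return;
            batch = new LinkedHashMap<>(pending);
            pending.clear();
            fetchCount++;
        }
        Set<String> resolved = fetcher.apply(Collections.unmodifiableSet(batch.keySet()));
        batch.forEach((topic, future) -> {
            if (resolved.contains(topic)) {
                future.complete(null);
            } else {
                future.completeExceptionally(new IllegalStateException("no metadata for " + topic));
            }
        });
    }

    /** Number of batched fetches issued so far. */
    public synchronized int fetchCount() {
        return fetchCount;
    }
}
```

Here three topics requested before a flush cost a single fetch rather than three. The future per topic is what lets `send()` return immediately and continue once metadata arrives.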

Specifically, KafkaProducer#waitOnMetadata would be made asynchronous in the cases where it currently must block. The client would maintain a queue of records destined for uncached topics to preserve the order of submission and callback invocation; once metadata for a topic is resolved, its records would flow back into the current execution logic. Care must be taken to handle batch sizing and to continue honoring the linger timeout.
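A minimal sketch of the ordering queue described above follows, assuming a single client-wide FIFO and a `downstream` callback that stands in for the existing append-to-accumulator path. `PendingRecordQueue`, `send`, and `onMetadata` are hypothetical names, and batch sizing and linger handling are deliberately left out.

```java
import java.util.*;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: records for topics without cached metadata wait in one
 * FIFO queue instead of blocking send(), and are released in submission order.
 */
public class PendingRecordQueue<R> {
    /** A queued record tagged with its destination topic. */
    private static final class Entry<T> {
        final String topic;
        final T record;

        Entry(String topic, T record) {
            this.topic = topic;
            this.record = record;
        }
    }

    // Invariant: the queue only holds records for topics whose metadata is unknown.
    private final Deque<Entry<R>> queue = new ArrayDeque<>();
    private final Set<String> knownTopics = new HashSet<>();
    // Stand-in for the existing execution path (e.g. appending to a batch).
    private final Consumer<R> downstream;

    public PendingRecordQueue(Consumer<R> downstream) {
        this.downstream = downstream;
    }

    /** Never blocks: forwards immediately if metadata is known, otherwise enqueues. */
    public synchronized void send(String topic, R record) {
        if (knownTopics.contains(topic)) {
            downstream.accept(record);
        } else {
            queue.addLast(new Entry<>(topic, record));
        }
    }

    /** Called when metadata for a topic resolves; releases its records in order. */
    public synchronized void onMetadata(String topic) {
        knownTopics.add(topic);
        Iterator<Entry<R>> it = queue.iterator();
        while (it.hasNext()) {
            Entry<R> e = it.next();
            if (e.topic.equals(topic)) {
                downstream.accept(e.record);
                it.remove();
            }
        }
    }

    public synchronized int pendingCount() {
        return queue.size();
    }
}
```

Because queued records for a topic are drained before its metadata becomes visible to later `send()` calls (both under the same lock), a record submitted after metadata arrives can never overtake an earlier one for the same topic. Invoking `downstream` while holding the lock keeps the sketch simple. A real implementation would likely want to avoid doing so.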

To address (2): the producer maintains a staleness duration threshold for every topic, but it does not act on this threshold when fetching metadata and instead falls back to fetching information about all topics in the cluster. A further optimization would be to request metadata updates only for topics whose staleness thresholds have been exceeded. A soft threshold could also be added, so that best-effort fetching is performed on a subset of topics and metadata updates are staggered over time in smaller batches.
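One way to combine the hard and soft staleness thresholds is sketched below. This is a minimal sketch under stated assumptions: `StaleTopicSelector`, `hardMs`, `softMs`, and `maxSoftTopics` are hypothetical names and parameters, not existing Kafka configs. Topics past the hard threshold are always included. Topics past only the soft threshold are added oldest-first, up to a cap, so that refreshes spread out over time.

```java
import java.util.*;

/**
 * Hypothetical sketch: choose which topics to include in the next metadata
 * request instead of requesting every topic in the cluster.
 */
public class StaleTopicSelector {
    private final Map<String, Long> lastRefreshMs = new HashMap<>();
    private final long hardMs;        // must refresh once this old
    private final long softMs;        // may refresh (best effort) once this old
    private final int maxSoftTopics;  // cap on best-effort topics per request

    public StaleTopicSelector(long hardMs, long softMs, int maxSoftTopics) {
        if (softMs > hardMs) {
            throw new IllegalArgumentException("soft threshold must not exceed hard threshold");
        }
        this.hardMs = hardMs;
        this.softMs = softMs;
        this.maxSoftTopics = maxSoftTopics;
    }

    /** Records that metadata for a topic was refreshed at nowMs. */
    public void recordRefresh(String topic, long nowMs) {
        lastRefreshMs.put(topic, nowMs);
    }

    /** Topics to include in the next metadata request. */
    public Set<String> topicsToRefresh(long nowMs) {
        Set<String> result = new TreeSet<>();
        List<Map.Entry<String, Long>> softCandidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastRefreshMs.entrySet()) {
            long age = nowMs - e.getValue();
            if (age >= hardMs) {
                result.add(e.getKey());      // past the hard threshold: required
            } else if (age >= softMs) {
                softCandidates.add(e);       // past the soft threshold: optional
            }
        }
        // Oldest soft-stale topics first, capped so updates are staggered over time.
        softCandidates.sort(Map.Entry.comparingByValue());
        for (int i = 0; i < Math.min(maxSoftTopics, softCandidates.size()); i++) {
            result.add(softCandidates.get(i).getKey());
        }
        return result;
    }
}
```

With `maxSoftTopics` small relative to the number of topics, each request stays bounded. Soft-stale topics are then picked up gradually before they reach the hard threshold, rather than all expiring at once.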

...

Rejected Alternatives

None were seriously considered.