

Status

Current state: "Under Discussion"

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

The current `Consumer#poll(Duration)` method is designed to block until data is available or the provided poll timeout expires. This implies that if fetch requests fail, the consumer retries them internally and eventually returns an empty set of records. Thus, from a user's point of view, an empty set of records can mean either that no data is available on the broker side or that the broker cannot be reached.
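To make the ambiguity concrete, a minimal sketch (assuming `consumer` is an already-subscribed `KafkaConsumer<String, String>`):

// With the current API, both outcomes look identical to the caller:
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
if (records.isEmpty()) {
    // either no records were available on the broker side, or the broker
    // could not be reached and internal retries were exhausted; the caller
    // cannot tell these two cases apart
}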

Besides, we sometimes want to "peek" at incoming records, e.g. for testing, without affecting the offsets, similar to the "peek" method provided by many data structures. This "peek" method will not advance the position offset in the partition. That means that under `enable.auto.commit = true` (the default setting), the committed offsets will not be incremented, and the next "poll" will still return the records that "peek" returned. (Of course, if the user manually commits the offsets, the offsets will be incremented.)


So, we should have a `Consumer#peek()` to allow consumers to:

  1. peek at what records exist on the broker side without advancing the position offsets.
  2. test whether there is a connection error between the consumer and the broker.

A usage sketch for the first point follows below.
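For illustration, a minimal sketch of the intended semantics, assuming the proposed `peek(Duration)` method exists on `KafkaConsumer` (configuration values and topic names are illustrative only):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PeekDemo {
    public static void main(String[] args) throws java.io.IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "peek-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // enable.auto.commit defaults to true

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));

            // peek returns the available records but does NOT advance the position offsets
            ConsumerRecords<String, String> peeked = consumer.peek(Duration.ofSeconds(1));

            // because the position (and thus the auto-committed offset) is unchanged,
            // the subsequent poll returns the records seen by peek again
            ConsumerRecords<String, String> polled = consumer.poll(Duration.ofSeconds(1));
            System.out.printf("peeked=%d records, polled=%d records%n", peeked.count(), polled.count());
        }
    }
}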

Public Interfaces

Add a `peek` method to the `Consumer` interface:

/**
 * @see KafkaConsumer#poll(Duration)
 */
ConsumerRecords<K, V> peek(Duration timeout) throws java.io.IOException;


Implement it in `KafkaConsumer` as follows:

/**
 * Peek at data for the topics or partitions specified using one of the subscribe/assign APIs.
 * It is an error to not have subscribed to any topics or partitions before peeking for data.
 *
 * <p>
 * On each peek, the consumer will use the last consumed offset as the starting offset and fetch sequentially. The last
 * consumed offset can be manually set through {@link #seek(TopicPartition, long)} or automatically set as the last committed
 * offset for the subscribed list of partitions.
 *
 * <p>
 * This method returns immediately if there are records available or an exception is thrown.
 * Otherwise, it will await the passed timeout.
 * If the timeout expires and no exception has been thrown, an empty record set will be returned.
 * Note that this method may block beyond the timeout in order to execute custom
 * {@link ConsumerRebalanceListener} callbacks.
 *
 * @param timeout The maximum time to block (must not be greater than {@link Long#MAX_VALUE} milliseconds)
 *
 * @return map of topic to records since the last fetch for the subscribed list of topics and partitions
 *
 * @throws java.io.IOException if an unexpected error occurs during I/O (note: this differs from {@link #poll(Duration)})
 * @throws org.apache.kafka.clients.consumer.InvalidOffsetException if the offset for a partition or set of
 *             partitions is undefined or out of range and no offset reset policy has been configured
 * @throws org.apache.kafka.common.errors.WakeupException if {@link #wakeup()} is called before or while this
 *             function is called
 * @throws org.apache.kafka.common.errors.InterruptException if the calling thread is interrupted before or while
 *             this function is called
 * @throws org.apache.kafka.common.errors.AuthenticationException if authentication fails. See the exception for more details
 * @throws org.apache.kafka.common.errors.AuthorizationException if caller lacks Read access to any of the subscribed
 *             topics or to the configured groupId. See the exception for more details
 * @throws org.apache.kafka.common.KafkaException for any other unrecoverable errors (e.g. invalid groupId or
 *             session timeout, errors deserializing key/value pairs, your rebalance callback thrown exceptions,
 *             or any new error cases in future versions)
 * @throws java.lang.IllegalArgumentException if the timeout value is negative
 * @throws java.lang.IllegalStateException if the consumer is not subscribed to any topics or manually assigned any
 *             partitions to consume from
 * @throws java.lang.ArithmeticException if the timeout is greater than {@link Long#MAX_VALUE} milliseconds
 * @throws org.apache.kafka.common.errors.InvalidTopicException if the current subscription contains any invalid
 *             topic (per {@link org.apache.kafka.common.internals.Topic#validate(String)})
 * @throws org.apache.kafka.common.errors.UnsupportedVersionException if the consumer attempts to fetch stable offsets
 *             when the broker doesn't support this feature
 * @throws org.apache.kafka.common.errors.FencedInstanceIdException if this consumer instance gets fenced by the broker
 */
@Override
public ConsumerRecords<K, V> peek(final Duration timeout) throws java.io.IOException { ... }
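For illustration, a sketch of the second use case from the Motivation section: since `peek` surfaces unexpected I/O errors as `java.io.IOException` (unlike `poll`, which retries internally and eventually returns an empty record set), a caller can distinguish "no data available" from "broker unreachable". This assumes an already-subscribed `consumer` and the `throws` clause shown above:

try {
    ConsumerRecords<String, String> records = consumer.peek(Duration.ofSeconds(5));
    if (records.isEmpty()) {
        // the broker is reachable, but no records are currently available
    }
} catch (java.io.IOException e) {
    // the broker could not be reached; poll(Duration) would have returned
    // an empty record set here instead of throwing
}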


Proposed Changes

Provide a new method `peek(timeout)` in `Consumer` to allow users to:

  1. peek at what records exist on the broker side without advancing the position offsets.
  2. test whether there is a connection error between the consumer and the broker.

Compatibility, Deprecation, and Migration Plan

This is a newly added method in the `Consumer` interface. There will be no impact on existing users.

Rejected Alternatives

