Table of Contents

Status

Current state: Under DiscussionWithdrawn, because the committers do not seem to be convinced that you cannot control on what thread code runs with an asyn runtime.

Discussion thread: here discussion thread, though the discussion was mostly on the vote thread

JIRA: KAFKA-14972

Proposed implementation: pull request 13914

...

Examples of affected async runtimes are Kotlin co-routines (see KAFKA-7143) and Zio.

Here follows a condensed example of how we'd like to use ZIO in the rebalance listener callback from the zio-kafka library.

Code Block

language	scala
firstline	1
title	onRevoked callback
linenumbers	true

def onRevoked(revokedTopicPartitions: Set[TopicPartition], consumer: KafkaConsumer) = {
  for {
    _            <- ZIO.logDebug(s"${revokedTps.size} partitions are revoked")
    state        <- currentStateRef.get
    streamsToEnd = state.assignedStreams.filter(control => revokedTps.contains(control.tp)) // Note, we run 1 stream per partition.
    _            <- ZIO.foreachDiscard(streamsToEnd)(_.end(consumer))    // <== Streams will commit not yet committed offsets
    _            <- awaitCommitsCompleted(consumer).timeout(15.seconds)
    _            <- ZIO.logTrace("onRevoked done")
  } yield ()
}

This code is run using the ZIO-runtime as follows from the {{ConsumerRebalanceListener::onPartitionsRevoked}} method:

Code Block

language	scala
title	Running ZIO code from callback
linenumbers	true

def onPartitionsRevoked(partitions: java.util.Collection[TopicPartition]): Unit = {
  Unsafe.unsafe { implicit u =>
    runtime.unsafe
           .run(onRevoked(partitions.asScala.toSet, consumer))
           .getOrThrowFiberFailure()
   ()
  }
}

(Note that this code is complex on purpose, starting a ZIO workflow from scratch is not something you would normally do.)

Look at line 6 of the first code block. In method end the stream will try to call consumer::commitAsync(offsets, callback). In awaitCommitsCompleted() we call consumer::commitSync(Collections.emptyMap) to wait untill all callbacks are invoked.

Since this code is running in the rebalance listener callback, KafkaConsumer enforces that the commit methods must be invoked from the same thread as the thread that invoked onPartitionsRevoked. Unfortunately, the ZIO runtime is inherently multi-threaded; tasks can be executed from any thread. There is no way Zio could support this limitation without a major rewrite.

Why can this code not run on a single thread?

We want to use the ZIO runtime. ZIO cannot support this (same argument applies to Cats-effects, a similar and also popular Scala library). To understand why, you first need to know how these libraries work.

In both libraries one creates effects (aka workflows) which are descriptions of a computation. For example, when executing the Scala code val effect = ZIO.attempt(println("Hello world!")) one creates only a description; it does not print anything yet. The language to describe these effects is very rich, enough to describe entire applications. Things like concurrency, resource management, timeouts, retries, etc. can all be expressed in an effect. Then to execute the effect, one gives it to the runtime. The runtime then schedules the work on one of the threads in its thread-pool. Zio, nor Cats-effects supports running an effect on the thread that manages the thread-pool. Nor is it possible to do so; for example, how would one implement a timeout?

Another reason can be read in

Jira

server	ASF JIRA
serverId	5aa69414-a9e9-3523-82ec-879b028fb15b
key	KAFKA-7143

which talks about Kotlin coroutines. For more information about those: https://kotlinlang.org/docs/coroutines-overview.html

Public Interfaces

Two new methods will be added to org.apache.kafka.clients.consumer.KafkaConsumer: getThreadAccessKey and setThreadAccessKey.

...

When one of the described checks in acquire or release fail, we throw a ConcurrentModificationException similar to current behavior of acquire and release.

Thread safety

Methods acquire and release need to make sure that memory writes from all threads involved are visible for each other.

The proposed implementation accomplishes this by using a synchronized block on a shared variable. This is sufficient as can be read in the JSR-133 FAQ:

But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor. After we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads. Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.

For reference, here follows a copy of the proposed implementation of acquire and release.

Code Block

language	java
firstline	597
title	Class members
linenumbers	true

    // Holds the key that this thread needs to access the consumer, it is used to prevent multi-threaded access.
    private final ThreadLocal<ThreadAccessKey> threadAccessKeyHolder = new ThreadLocal<>();
    // The stack of allowed thread access keys. The top of the stack contains the access key of the thread that is
    // currently allowed to use the consumer. When the stack is empty, any thread is allowed. Access is synchronized on
    // the instance.
    private final Deque<ThreadAccessKey> threadAccessStack = new ArrayDeque<>(4);

Code Block

language	java
firstline	2582
title	acquire
linenumbers	true

    private void acquire() {
        final ThreadAccessKey threadAccessKey = threadAccessKeyHolder.get();
        final ThreadAccessKey nextKey = new ThreadAccessKey();

        synchronized (threadAccessStack) {
            // Access is granted when threadAccess is empty (consumer is currently not used), or
            // when the top value is the same as current key (consumer is used from callback)
            if (threadAccessStack.isEmpty() || threadAccessStack.getFirst() == threadAccessKey) {
                threadAccessKeyHolder.set(nextKey);
                threadAccessStack.addFirst(nextKey);
            } else {
                final Thread thread = Thread.currentThread();
                throw new ConcurrentModificationException("KafkaConsumer is not safe for multi-threaded access. " +
                        "currentThread(name: " + thread.getName() + ", id: " + thread.getId() + ")" +
                        " could not provide access key (" + threadAccessStack.getFirst() + ")"
                );
            }
        }
    }

Code Block

language	java
firstline	2605
title	release
linenumbers	true

    private void release() {
        final ThreadAccessKey threadAccessKey = threadAccessKeyHolder.get();

        synchronized (threadAccessStack) {
            if (threadAccessStack.isEmpty()) {
                throw new AssertionError("KafkaConsumer invariant violated: `release` invoked without `acquire`");
            } else if (threadAccessStack.getFirst() == threadAccessKey) {
                threadAccessStack.removeFirst();
                if (threadAccessStack.isEmpty()) {
                    threadAccessKeyHolder.set(null);
                } else {
                    threadAccessKeyHolder.set(threadAccessStack.getFirst());
                }
            } else {
                final Thread thread = Thread.currentThread();
                throw new ConcurrentModificationException("KafkaConsumer is not safe for multi-threaded access. " +
                        "currentThread(name: " + thread.getName() + ", id: " + thread.getId() + ")" +
                        " returned from callback but not provide access key (" + threadAccessStack.getFirst() + ")"
                );
            }
        }
    }

Performance impact of the synchronized block is minimal because there will be no contention. Contention can only be caused by a badly written client and always results in a ConcurrentModificationException.

Compatibility, Deprecation, and Migration Plan

...

Space shortcuts

Child pages

Versions Compared

Old Version 32

New Version Current

Key

Status

Why can this code not run on a single thread?

Public Interfaces

Thread safety

Compatibility, Deprecation, and Migration Plan

Space shortcuts

Child pages

Page History

Versions Compared

Old Version 32

New Version Current

Key

Status

Why can this code not run on a single thread?

Public Interfaces

Thread safety

Compatibility, Deprecation, and Migration Plan