...

Currently, Kafka Streams uses the consumer membership protocol to coordinate stream task assignment. When we scale up a stream application, the KStream group will attempt to revoke active tasks and let the newly spun-up hosts take them over. New hosts need to restore the state of their assigned tasks before transitioning to "running". For state-heavy applications, it is not ideal to give up tasks immediately once a new member joins the group. Instead, we should allow some buffer time so that the new member can accept a fair amount of restoring tasks and finish state reconstruction before officially taking over the active tasks. Ideally, we could achieve a zero-downtime transition during cluster scaling.
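The handoff rule described above (keep the active task on its current owner until the new host has finished restoring state) can be sketched as follows. This is only an illustration under stated assumptions: the class `LearnerHandoff`, its methods, and the `lagThreshold` parameter are hypothetical names and are not part of Kafka Streams.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of learner-task handoff bookkeeping; not a Kafka Streams API. */
public class LearnerHandoff {
    // Changelog offset restored so far, per learner task id (assumed to be reported by the new host).
    private final Map<String, Long> restoredOffsets = new LinkedHashMap<>();
    // Changelog end offset per learner task id, captured when learning starts.
    private final Map<String, Long> endOffsets = new LinkedHashMap<>();
    // Maximum remaining lag at which a learner counts as caught up (assumed tunable).
    private final long lagThreshold;

    public LearnerHandoff(long lagThreshold) {
        this.lagThreshold = lagThreshold;
    }

    /** Registers a task that a new host has started restoring. */
    public void startLearning(String taskId, long changelogEndOffset) {
        restoredOffsets.put(taskId, 0L);
        endOffsets.put(taskId, changelogEndOffset);
    }

    /** Records restoration progress for a task that is already being learned. */
    public void reportProgress(String taskId, long restoredOffset) {
        if (restoredOffsets.containsKey(taskId)) {
            restoredOffsets.put(taskId, restoredOffset);
        }
    }

    /** Tasks whose learner has caught up; only these would be revoked from the old owner. */
    public List<String> readyForTakeover() {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Long> e : restoredOffsets.entrySet()) {
            long lag = endOffsets.get(e.getKey()) - e.getValue();
            if (lag <= lagThreshold) {
                ready.add(e.getKey());
            }
        }
        return ready;
    }
}
```

In this sketch the old owner keeps processing every task that is not yet returned by `readyForTakeover()`, which is how the scale-up avoids downtime for tasks still being restored.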

...

Version 1.0

Delivery goal: Scale up support, conservative rebalance, new instance only

The goal of the first version is to lay the foundation of the learner algorithm by answering the following questions:

  • Which tasks are qualified to be learned? This requires us to implement tagging for stateful tasks.
  • Which worker/instance should take learner tasks? We shall only assign learner tasks to newly spun-up hosts.

A brief explanation of the 1.0 goal:

We want to implement the learner task logic only on new hosts because there is a potential edge case that could break the existing naive learner assignment: when the number of tasks is much smaller than the total cluster capacity, we could fall into endless resource shuffling.

For stage one, we care more about a smooth transition than about resource balance. This is because there has been historical discussion on assigning weights to different types of tasks. If we aim for task balance too early, we risk over-optimizing.

So we will assign learner tasks only to "newcomers", which means every stream worker will denote itself as a "new member" when it has no local information. The assignment, however, will be on the task .
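A minimal sketch of the version 1.0 rule, assuming each member reports the set of tasks it holds local state for: members with an empty set are treated as new, and only they receive learner tasks, drawn from the tasks tagged as stateful. The class and method names are hypothetical, and the even-split "fair share" and the first-come task selection are simplifying assumptions of this example rather than the proposed algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of "learner tasks go to new members only"; not a Kafka Streams API. */
public class NewMemberLearnerAssignor {

    /**
     * @param statefulTasks      task ids tagged as stateful, i.e. eligible to be learned
     * @param localTasksByMember for each member id, the task ids it reports local state for
     * @return learner task ids per new member; existing members receive none
     */
    public static Map<String, List<String>> assignLearners(
            List<String> statefulTasks, Map<String, Set<String>> localTasksByMember) {
        // A member with no local information denotes itself as a "new member".
        Map<String, List<String>> learners = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : localTasksByMember.entrySet()) {
            if (e.getValue().isEmpty()) {
                learners.put(e.getKey(), new ArrayList<>());
            }
        }
        if (learners.isEmpty()) {
            return learners;
        }

        // Assumption: a fair share is an even split of stateful tasks over all members.
        int fairShare = statefulTasks.size() / localTasksByMember.size();
        int next = 0;
        for (List<String> assigned : learners.values()) {
            for (int i = 0; i < fairShare && next < statefulTasks.size(); i++) {
                assigned.add(statefulTasks.get(next++));
            }
        }
        return learners;
    }
}
```

For example, with six stateful tasks, two existing members holding three tasks each, and one new member with no local state, the new member receives two learner tasks and the existing members receive none. When there are fewer stateful tasks than members, the fair share in this sketch rounds down to zero and no learner tasks are handed out.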

...

Delivery goal: Scale down support

Specifically

  1. Tag for leaving members
  2. Create new tooling for marking instances as ready to scale down
  3. Scale down timeout implementation

...

We want detailed analysis and benchmark support before fully committing to this feature. Intuitively, most applications should be able to tolerate minor discrepancies in task replay time, whereas the cost of extra rebalances and increased debugging complexity is something we do not favor.

...

Compatibility, Deprecation, and Migration Plan

...

Minimum Version

...

Requirement

This change requires Kafka broker version >= 0.9, where the broker reacts with a rebalance when a normal consumer changes its encoded metadata. Client applications need to upgrade to at least the earliest version that includes the KIP-429 version 1.0 change.

Switching Protocol Type

...