Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Sometimes end user wants to reach a sweet spot between ongoing task transfer and streaming resource free-up. So we want to take a similar approach as KIP-415, where we shall introduce a client config to make sure the scale down is time-bounded. If the time takes to migrate tasks outperforms this config, the leaving member will shut down itself immediately instead of waiting for the final confirmation. And we could simply transfer learner tasks to active because they are now the best shot to own new tasks.

Task Tagging

...

Note that to make sure the above resource shuffling could happen as expected, we need to have the following task status indicators to be provided:

...

scale.down.timeout.ms

Default: infinity

Timeout in milliseconds to force terminate the stream worker when informed to be scaled down.


stream.worker.balancing.factor

Default: 2

The tolerance of task imbalance factor between hosts to trigger rebalance.


stream.rebalancing.mode

Default: incremental

The setting to help ensure no downtime upgrade of online application.

Options : upgrading, incremental

to help user define their customized strategy.

...

Only client side upgrade is required .  just need to change the encoded metadata so that the server could as long as the Kafka broker version >= 0.9, where broker will react with a rebalance based on:

...

when a normal consumer changes the encoded metadata.

Switching Protocol Type

As we have mentioned above, a new protocol type shall be created. To ensure smooth transition, we need to make sure the existing job doesn't fail. So we need to explicitly set the `stream.rebalancing.mode` to `upgrading`, where we

shall apply "consumer" as protocolType for existing stream jobs. Otherwise the job upgrade would fail due to incompatible protocol type.

Stream Instance Replacement

...

The right approach for a global application instance replacement is

  • Increase the capacity of the current stream job to 2
  • Mark existing stream instances as leaving
  • Learner tasks finished on new hosts, shutting down old ones.

FAQ

Why do we call stream workers?

...