...

There are basically two somewhat separate areas that we're talking about within the scope of this KIP.

Client discovery

The first area is discovery on the client side. How do clients of Kafka become aware of broker capabilities? For example, how does the client find out whether the broker supports a new type of event delivery semantics, or a new type of broker storage functionality? Note that the "client" here may be a producer, a consumer, a command-line tool, a graphical interface, a third-party application built on top of Kafka (for purposes such as administration or monitoring), or may even be the broker itself trying to communicate with other brokers via RPCs.

Today, the client relies on the ApiVersions{Request|Response} exchange to determine, at a per-request level, which version of a request should be sent to a broker. This is sufficient for client-side logic that depends only on specific request versions; however, for logic that requires the client's behavior to change as well, such a request-level ApiVersions check is not sufficient. Detailed examples can be found below.
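
For illustration, here is a minimal sketch of that per-request version negotiation. The version numbers are hypothetical; a real client performs this check per ApiKey for every broker connection, based on the ranges returned in ApiVersionsResponse.

    // Minimal sketch of per-request version negotiation (hypothetical numbers).
    public class ApiVersionNegotiationSketch {
        public static void main(String[] args) {
            short clientMin = 0, clientMax = 7;   // versions of one API this client implements
            short brokerMin = 3, brokerMax = 5;   // range the broker advertised for the same API

            if (brokerMin > clientMax || brokerMax < clientMin) {
                throw new UnsupportedOperationException("No common version for this API");
            }
            // Use the highest version both sides understand.
            short usable = (short) Math.min(clientMax, brokerMax);
            System.out.println("Sending this API at version " + usable);
        }
    }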

Feature gating

Development in Kafka (from a high level) is organized into features, each of which is tracked by a separate version number. Feature gating tries to answer the question: how does the broker decide which features to enable? For example, when is it safe for the brokers to begin handling EOS traffic? Should the controller send some additional information when it sends out LeaderAndIsrRequest? Should the broker write its log data in the "new" message format or one of the "old" ones?

...

The above areas translate into separate problems within the scope of this KIP, as described below.

Client discovery

The ApiVersionsRequest enables the client to ask a broker which APIs it supports. For each API that the broker enumerates in the response, it also returns the version range it supports. This response can be insufficient when a client needs to enable a feature only if all brokers in the cluster support it. Consider the following problem. Users may upgrade their client apps before or after upgrading their brokers. If the clients are upgraded first, any new client-side features that depend on certain broker-side support (from all brokers in the cluster) must stay disabled until all the brokers have been upgraded to the required version. The question then becomes: at what point in time can these new features be safely enabled on the client side? Today, this has to be decided manually by a human, which is error-prone. Clients don't have a way to programmatically learn which version of a given feature is guaranteed to be supported by all brokers (i.e. the cluster-wide finalized feature versions) and make decisions accordingly.
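
As a rough sketch of what such programmatic discovery could look like, the snippet below assumes the Admin#describeFeatures() API that this KIP proposes lands with the class and method names shown here; the feature name "group_coordinator" and the required version level are purely illustrative.

    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.FeatureMetadata;
    import org.apache.kafka.clients.admin.FinalizedVersionRange;

    public class FinalizedFeatureCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // Fetch the cluster-wide finalized feature versions.
                FeatureMetadata metadata = admin.describeFeatures().featureMetadata().get();
                Map<String, FinalizedVersionRange> finalized = metadata.finalizedFeatures();

                // Enable the client-side feature only if the whole cluster has finalized
                // a high enough version of it (feature name and level are illustrative).
                FinalizedVersionRange range = finalized.get("group_coordinator");
                boolean enableNewClientPath = range != null && range.maxVersionLevel() >= 1;
                System.out.println("New client-side feature path enabled: " + enableNewClientPath);
            }
        }
    }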

For example, consider the Kafka Streams use case. For KIP-447, Streams would need to switch its clients to the new thread-level producer model only after all brokers support it. Switching to the new model while some brokers still lack support can break EOS semantics. Note that if only some brokers support the new model, Streams can simply continue to use the old per-task producer model without switching; that in itself is not a problem.

Feature gating

There is a configuration key in Kafka called inter.broker.protocol.version (IBP). Currently, the IBP can only be set by changing the static configuration file supplied to the broker during startup. Although the IBP started out as a mechanism to control the version of the RPCs that brokers use to communicate with each other, it has since grown to gate a huge number of other features. For example, the IBP controls whether the broker will enable static consumer group membership, whether it will enable exactly-once functionality, and whether it will allow incremental fetch requests.
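
For illustration, the IBP is a static entry in each broker's server.properties today, so changing it means editing the file and restarting (rolling) the brokers; the value below is just an example.

    inter.broker.protocol.version=2.5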

...

As such, we don't support downgrades in this versioning scheme (see Non-goals and Future work). So there is a danger if the cluster operator tries to downgrade the group_coordinator feature flag from v1 to v0: the Streams client side will not have been downgraded, and this would crash the EOS applications. A safe practice is to downgrade the Streams clients first, before downgrading the group_coordinator feature flag.

Potential features in Kafka

Many features could benefit from the above system; the following are a selected few high-level examples:

  1. group_coordinator: As described in the above section.

  2. transaction_coordinator: This feature flag could cover changes to the message format for the transaction state internal topic schema, and protocol changes related to the transaction coordinator. The advantages are similar to those of #1 (group_coordinator).

  3. consumer_offsets_topic_schema: Currently the message format used in the internal consumer offsets topic is tied to the IBP. It is a candidate for a feature because we would be able to dynamically finalize a new version of the topic schema without having to roll the brokers twice.

  4. inter_broker_protocol: For transitional purposes, the inter.broker.protocol.version itself can be made a coarse-grained feature. This could be a way to operationally migrate away from the IBP while avoiding the double roll during IBP bumps.

Compatibility, deprecation and migration plan

Migrating from an IBP-based setup to the versioning scheme

...

Support for downgrades: As part of future work, we may consider adding constraints to the controller to restrict the downgrade of specific flags. Certain downgrades may be disallowed unless forced. For example, the group_coordinator feature is tied to data schemas and EOS processing semantics that may not be forward compatible.

Rejected alternatives

  1. We considered the idea of using the existing AlterConfigsRequest instead of introducing a new API to update features. Reasons we decided against it:
    1. The AlterConfigsRequest format is not well equipped to elegantly express the notion of Set<FeatureUpdate> operations. The request format is more favorable towards expressing a list of key-value pairs, where each new value replaces the existing value for the specified key. Furthermore, the response format doesn't let us conveniently return the list of finalized feature versions to the caller.
    2. AlterConfigsRequest can be issued to any broker, which is not our requirement. For the future, anything that modifies ZooKeeper needs to go through the controller. This is part of the work that we are doing to build the bridge release for KIP-500.
  2. As opposed to fine-grained features, we considered the idea of exposing the notion of the IBP directly to the client for discovery. There are a few issues here:

    1. We are introducing a leaky abstraction here. Users/clients (excluding clients that are brokers) would now have to operate with the notion of a "broker protocol version", which is merely a detail internal to a cluster of brokers behind an API.
    2. This still doesn't solve the "double roll" problem related to the IBP; that problem would still need to be solved.

...