
Status

Current state: Under Discussion

Discussion thread: here

JIRA: KAFKA-4208

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

This KIP tries to address the following issues in Kafka.

In most message systems (JMS, QPID, etc.), streaming systems, and transport systems (HTTP, TCP), it is typical to have a concept of headers and payload.

The payload is traditionally for the business object, while headers are traditionally used for transport routing, filtering, etc. Headers are most typically key=value pairs of strings.

In its current state, Kafka does not natively support headers in its message/record format.

Examples where having separately supported custom headers becomes useful (this is not an exhaustive list):

  • Automated routing of messages between clusters based on header information
  • Enterprise APM tools (e.g. AppDynamics, Dynatrace) need to stitch in 'magic' transaction ids to provide end-to-end transaction flow monitoring.
  • Audit metadata to be recorded with the message, e.g. the clientId that produced the record, a unique message id, or the originating clusterId the message was first produced into for multi-cluster routing.
  • The business payload needs to be end-to-end encrypted and signed against tampering, but ecosystem components need access to metadata to achieve their tasks.

Kafka currently has a Record<K, V> structure which originally could have been used to follow this semantic, whereby K could contain the headers information and V the payload.

  • Since the message compaction feature it is no longer possible to add metadata to K; compaction would otherwise treat each message as a differently keyed message.
  • It is also not possible to use the value part with some form of wrapper, e.g. Message<H, V>: for compaction to perform a delete, a record is sent with a NULL value, so a delete record carrying its metadata in a message wrapper would not work, as the value would technically no longer be null.

This issue has been repeatedly flagged by many people in forums over time.

Further details and a more detailed case for headers can be seen here: A Case for Kafka Headers

Public Interfaces

This KIP has the following public interface changes:

  1. Add a new headers length and value (byte[]) to the core message format.
  2. Add a headers field (Map<int, byte[]>) to ProducerRecord and ConsumerRecord. A producer/interceptor will be able to set headers on a ProducerRecord. A consumer/interceptor will see the message headers when it receives the message.
  3. Add accessor methods void setHeader(int, byte[]) and byte[] getHeader(int) on ProducerRecord/ConsumerRecord (see the usage sketch after this list).
  4. Add ProduceRequest/ProduceResponse V3 which uses the new message format.
  5. Add FetchRequest/FetchResponse V3 which uses the new message format.
  6. The serialization of the [int, byte[]] header set will be on the wire in a strict format.
  7. Each header value will be custom-serializable by the interceptors/plugins that use the header.
  8. As headers are int-keyed for compactness, ranges will be reserved for different usages:
    1. client headers adopted by the Kafka open source project (not necessarily owned by it)
    2. broker headers adopted by the Kafka open source project (not necessarily owned by it)
    3. commercial vendor space
    4. custom in-house
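
As a hedged sketch of the proposed accessors (the method names follow point 3 above; the topic, the key value and the class name are invented purely for illustration):

import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.producer.ProducerRecord;

public class HeaderAccessorSketch {
    public static void main(String[] args) {
        // Existing constructor is unchanged; headers are set via the new accessor.
        ProducerRecord<String, String> record =
                new ProducerRecord<>("my-topic", "my-key", "my-value");

        // Proposed accessor: attach a raw header value under an int key.
        // Key 1 is illustrative only (see Key Allocation below).
        record.setHeader(1, "my-client-id".getBytes(StandardCharsets.UTF_8));

        // Consumer side (proposed): byte[] clientId = consumerRecord.getHeader(1);
    }
}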

For more detailed information on the above changes, please refer to the Proposed Changes section.

Proposed Changes


Four options were considered. This section details the proposed solution, Option 1. The other options (Option 2, Option 3 and Option 4) are described in the Rejected Alternatives section.

The advantages of this proposal are:
  • Adds the ability for producers to set standard header key=value pairs
  • No incompatible client API change (only new methods)
  • Allows users to specify the serialization of the header value per header
  • Compact key space
  • Provides a standardized interface so that an ecosystem of tools can grow around the feature

The disadvantage of this proposal is:

  • Change to the message object

Add a headers field Map<int, byte[]> to both ProducerRecord and ConsumerRecord

  • Accessor methods void setHeader(int, byte[]) and byte[] getHeader(int) added to the interface of both (an interceptor sketch follows below).
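
For example, an APM-style producer interceptor (as motivated above) could stamp a transaction id header on every record. This is a minimal sketch assuming the proposed setHeader accessor; the class name and the key 260 are invented for illustration:

import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class TracingProducerInterceptor implements ProducerInterceptor<String, String> {

    // Illustrative key from the proposed vendor range (256 and above).
    private static final int TRACE_ID_KEY = 260;

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Proposed accessor: attach a transaction-linking guid as a header.
        record.setHeader(TRACE_ID_KEY,
                UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}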

Wire protocol change - use attributes bit 4 as a flag for whether headers are present, and add an (optional) headers length (variable length int) and headers field to the message format

The below is the core message wire protocol change needed to fit headers into the message.

Key to this is ensuring that headers add as little overhead as possible, and none at all when not present.

MessageAndOffset => Offset MessageSize Message
  Offset => int64  
  MessageSize => int32
  
  Message => Crc MagicByte Attributes Timestamp KeyLength Key HeadersLength Headers ValueLength Value
    Crc => int32
    MagicByte => int8  <---------------------- Bump up magic byte to 2
    Attributes => int8 <---------------------- Use Bit 4 as boolean flag for if headers present
    Timestamp => int64
    KeyLength => int32
    Key => bytes
    (optional) HeadersLength => variable int32 <------------------ NEW [optional] size of the serialized headers byte[], if headers present
    (optional) Headers => bytes <--------------------------------- NEW [optional] serialized form of the headers Map<int, byte[]>
    ValueLength => int32
    Value => bytes
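
A minimal sketch of parsing the optional fields, assuming the buffer is positioned just after the Key field. All names are invented for illustration, and the variable-length int encoding is an assumption (matching the serialization sketch further below), as this KIP does not pin the encoding down:

import java.nio.ByteBuffer;

public class HeadersFieldSketch {

    // Proposed flag: attributes bit 4 marks the presence of the headers fields.
    static final int HEADERS_FLAG = 1 << 4;

    // Returns the serialized headers bytes, or null when the flag is unset
    // (in which case the HeadersLength/Headers fields are absent entirely).
    static byte[] readHeadersIfPresent(ByteBuffer buf, byte attributes) {
        if ((attributes & HEADERS_FLAG) == 0)
            return null;
        int headersLength = readVarint(buf);   // HeadersLength (variable int32)
        byte[] headers = new byte[headersLength];
        buf.get(headers);                      // Headers (serialized Map<int, byte[]>)
        return headers;
    }

    // Assumed encoding: base-128 varint (7 data bits per byte, high bit =
    // continuation) with zig-zag decoding, matching the writer sketch below.
    static int readVarint(ByteBuffer buf) {
        int raw = 0, shift = 0, b;
        do {
            b = buf.get() & 0xFF;
            raw |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);       // zig-zag decode
    }
}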


Wire protocol of the headers bytes (present only when the above-mentioned attributes bit flag is set)

Headers (bytes) => Count Set(Key, ValueLength, Value)
  Count => variable length encoded int32
  Set =>
    Key => variable length encoded int32
    ValueLength => variable length encoded int32
    Value => bytes
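
A hedged sketch of writing this layout (the class and method names are invented; the exact variable-length int encoding is not pinned down by this KIP, so a zig-zag base-128 varint is assumed, matching the reader sketch above):

import java.io.ByteArrayOutputStream;
import java.util.Map;

public class HeaderSetSerializer {

    // Serializes the header set as: Count, then per header Key, ValueLength, Value.
    public static byte[] serialize(Map<Integer, byte[]> headers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, headers.size());                     // Count
        for (Map.Entry<Integer, byte[]> h : headers.entrySet()) {
            writeVarint(out, h.getKey());                     // Key (variable length int)
            writeVarint(out, h.getValue().length);            // ValueLength
            out.write(h.getValue(), 0, h.getValue().length);  // Value (opaque bytes)
        }
        return out.toByteArray();
    }

    // Zig-zag encode so small negative keys stay small, then emit as a
    // base-128 varint (7 data bits per byte, high bit = continuation).
    private static void writeVarint(ByteArrayOutputStream out, int value) {
        int v = (value << 1) ^ (value >> 31);                 // zig-zag
        while ((v & 0xFFFFFF80) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }
}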
 

 

Key Allocation

As mentioned above, ranges of keys will be reserved for different usages; variable-length encoded int keys are used to reduce the key size overhead.

Whilst the open header space may bring many possible keys to fruition, the likelihood of a cluster/broker using hundreds of them is low, so we should assign/reserve key space for the most heavily used areas.

Where int keys fall in the below ranges, we need fewer bytes than the standard 4-byte allocation:

 

3 bytes :: -129 and below
2 bytes ::  -33 -> -128
1 byte  ::   -1 -> -32
1 byte  ::    0 -> 127
2 bytes ::  128 -> 255
3 bytes ::  256 and above

As such we propose that:

+ve ints 0 -> 255 are reserved for Apache Kafka open registered headers; these are likely to be the most commonly used, so this saves the most space

-ve ints -1 -> -128 are reserved for in-house registered headers, as these are the next most likely to be heavily used

-ve ints -129 and below can be used as scratch space, either where more in-house header space is required or for development

+ve ints 256 and above can be used for other vendor-released headers or as spill-over for open source headers.
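
These ranges amount to a simple classification, sketched here (the class and method names are invented for illustration):

public class HeaderKeyRanges {

    // Maps a header key to its proposed reserved range, per the rules above.
    public static String rangeOf(int key) {
        if (key >= 0 && key <= 255)  return "Apache Kafka open registered";
        if (key >= 256)              return "vendor / open source spill-over";
        if (key >= -128)             return "in-house registered";   // -1 -> -128
        return "scratch / development";                              // -129 and below
    }
}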

Sample register of keys that would end up in the open source space:

key | name           | description                                                                   | by           | url
1   | client.id      | producer's client id                                                          | Apache Kafka | some url to a document about it
2   | cluster.id     | cluster id of where the message first originated                              | Apache Kafka | some url to a document about it
3   | correlation.id | correlation id for matching a response to a request                           | Apache Kafka | some url to a document about it
260 | new.relic      | stores the transaction linking guid for transaction stitching by New Relic   | New Relic    | some url to a document about it
451 | appdynamics    | stores the transaction linking guid for transaction stitching by AppDynamics | AppDynamics  | some url to a document about it

 

To assist, and to help ensure ease of use and uniformity, a constants class should be kept and updated with the above (similar to how Java SQL codes work), e.g.

package org.apache.kafka.common.config;

// Registered open source header keys (see the sample register above).
public class KafkaHeaderKeys {

    public static final int CLIENT_ID_KEY = 1;
    public static final int CLUSTER_ID_KEY = 2;
    public static final int CORRELATION_ID_KEY = 3;
}
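
Usage would then look something like this (a fragment, assuming record is a ProducerRecord and the proposed setHeader accessor):

// Attach the producer's client id under the registered constant key.
record.setHeader(KafkaHeaderKeys.CLIENT_ID_KEY,
        "my-client".getBytes(java.nio.charset.StandardCharsets.UTF_8));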

 

Sample register of keys that would end up being in-house custom per user/company:

key | name       | description                             | by | notes
-5  | app.name   | IG's unique app name for the producer   | IG | some internal document about it
-10 | charge.tag | tag to make per-message chargebacks for | IG | some internal document about it

Compatibility, Deprecation, and Migration Plan

  • Current client users should not be affected; these are new API methods being added to the client APIs
  • The message version bump means clients expecting an older message version will not be impacted
    • older producers will simply produce messages without headers
      • a new consumer will simply get a message with empty headers
    • older consumers will simply consume messages oblivious to there being any headers
  • Message version migration would be handled as in KIP-32

Out of Scope

Some additional features/benefits were noticed and discussed during the above, but they are deemed out of scope and should be tackled by further KIPs.

  • Core message size reduction
    • remove the 4-byte KeyLength overhead when no key is present, using an attributes bit
    • reduce the 4-byte KeyLength overhead by using a variable length encoded int
    • reduce the 4-byte ValueLength overhead by using a variable length encoded int
  • Broker-side interceptors
    • with headers we could start introducing broker-side message interceptors to append metadata or handle messages
  • Single-record consumer API
    • There are many use cases where a single-record consumer/listener API is more user friendly; this is evident from the fact that Spring Kafka has already created a wrapper. It would be good to support this natively.

Rejected Alternatives


Map<String, String> Headers added to the ProducerRecord

The concept is similar to the proposal above, but with a few more disadvantages.

The advantages of this proposal are:

  • Adds the ability for producers to set standard header key=value string pairs
  • No incompatible client API change (only new methods)
  • Allows users to specify the serialization of the key=value map (String(&=), JSON, Avro)
  • Provides a standardized interface so that an ecosystem of tools can grow around the feature

The disadvantage of this proposal is:

  • Change to the message object
  • String keys are large; they can cause a message payload of 60 bytes to be dwarfed by its headers
  • String values likewise are not compact, and restrict a header value to being a String
  • Not able to use headers on the broker side with custom serialization
Value Message Wrapper - Message<H, P>

This concept is the current de facto way many users temporarily deal with the situation, but it has some core issues that it does not resolve.

  • Benefits
    • This requires no broker-side changes and no message format changes
  • Disadvantages
    • This would not work with compaction, where headers need to be sent on a delete record; it would therefore not deliver on many of the requirements (see the sketch below).
    • Couples serialization and deserialization of the value for both the header and the payload.
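
A minimal sketch of that wrapper pattern (the class is hypothetical) makes the compaction problem concrete: a compaction delete requires the record value to be literally null, which leaves nowhere to put the headers:

// De facto workaround: headers ride inside the value itself.
public class Message<H, V> {
    public final H headers;
    public final V payload;

    public Message(H headers, V payload) {
        this.headers = headers;
        this.payload = payload;
    }
}

// A delete must be produced with a null record value; new Message<>(headers, null)
// is still a non-null value, so headers can never accompany a delete record.
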
ProducerRecord<K, H, V>, ConsumerRecord<K, H, V>

The alternative above proposes headers as Map<String, String> only; this alternative allows headers to be of any type, denoted by H.

  • Benefits
    • Complete customization of what a header is.
  • Disadvantages
    • As generics don't allow for a default type, this would break client interface compatibility if done on Producer/ConsumerRecord.
      • A possible work-around would be a HeadersProducer/ConsumerRecord<K, H, V> that Producer/ConsumerRecord then extend with H as Object; this, though, becomes ugly fast if kept around for a time, or would require a deprecation/refactor v2 period.
