Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

Introduction

This is a draft document. We hope to turn it into a useful guide to implementing a Kafka client, so any feedback on how to make it more useful would be much appreciated. The 0.8 release has not yet happened so there may yet be changes in the protocol described, though we hope to minimize them. We feel this is stable enough that people can build against it without major disruption.

Introduction

This document covers the protocol implemented in Kafka 0.8 and beyond. It is meant to give a readable guide to the protocol that covers the available requests, their binary format, and the proper way to make use of them to implement a client. This document assumes you understand the basic design and terminology described here.

The protocol used in 0.7 and earlier is similar to this, but we chose to make a one document covers the protocol implemented in Kafka 0.8 and beyond. It is meant to give a readable guide to the protocol that covers the available requests, their binary format, and the proper way to make use of them to implement a client. This document assumes you understand the basic design and terminology described here.The protocol used in 0.7 and earlier is similar to this, but we chose to make a one time (we hope) break in compatibility to be able to clean up cruft and generalize things.

Overview

The Kafka protocol is fairly simple, there are only four client requests APIs.

...

Each of these will be described in detail below.

Preliminaries

Network

Kafka uses a binary protocol over TCP. The protocol defines all apis as request response message pairs. All messages are size delimited and are made up of the following primitive types.

...

Field

Description

MessageSize

The MessageSize field gives the size of the subsequent request or response message in bytes. The client can read requests by first reading this 4 byte size as an integer N, and then reading and parsing the subsequent N bytes of the request.

...

Requests

...

Requests all have the following formatA request looks like this:

Code Block
RequestMessage => ApiKey ApiVersion CorrelationId ClientId RequestMessage
  ApiKey => int16
  ApiVersion => int16
  CorrelationId => int32
  ClientId => string
  RequestMessage => MetadataRequest | ProduceRequest | FetchRequest | OffsetRequest

Field

Description

ApiKey

This is a numeric id for the API being invoked (i.e. is it a metadata request, a produce request, a fetch request, etc).

ApiVersion

This is a numeric version number for this api. We version each API and this version number allows the server to properly interpret the request as the protocol evolves. Responses will always be in the format corresponding to the request version. Currently the supported version for all APIs is 0.

CorrelationId

CorrelationId

This is a user-supplied integer. It will be passed back in the response by the server, unmodified. It is useful for matching request and response between the client and server.

ClientId

This is a user supplied identifier for the client application. The user can use any identifier they like and it will be used when logging errors, monitoring aggregates, etc.

The various request and response messages will be described below.And the response:

...

Responses

...

Code Block
Response => ResponseVersion CorrelationId ResponseMessage
CorrelationId => int32
ResponseMessage => MetadataResponse | ProduceResponse | FetchResponse | OffsetResponse

...

Code Block
MetadataRequest => [TopicName]
  TopicName => string
Metadata Response

Field

Description

TopicName

The topics to produce metadata for. If empty the request will yield metadata for all topics.

Metadata Response

The response contains metadata for each partition, with partitions grouped together by topic. This metadata refers to brokers by their broker id. The brokers each have a host and port.

Code Block

MetadataResponse => [TopicMetadata][Broker]
  TopicMetadata => 
Code Block

MetadataResponse => [TopicMetadata][Brokers]
  TopicMetadata => TopicErrorCode TopicName [PartitionMetadata]
  PartitionMetadata => PartitionErrorCode PartitionId LeaderExists Leader [Replica]Replicas [Isr]
  PartitionErrorCode => int16
  PartitionId => int32
  LeaderExists => int8
  Leader => int32
  Replicas => [int32]
  Isr => [int32]
  Broker => NodeId Host Port
  NodeId => int32
  Host => string
  Port => int32

Field

Description

Leader

The node id for the kafka broker currently acting as leader for this partition. If no leader exists because we are in the middle of a leader election this id will be -1.

Replicas

The set of alive nodes that currently acts as slaves for the leader for this partition.

Isr

The set subset of the replicas that are "caught up" to the leader

Broker

The node id, hostname, and port information for a kafka broker

Produce API

The produce API is used to send message sets to the server. For efficiency it allows sending message sets intended for many topic partitions in a single request.

...