Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

When this occurs, we will close the Log for A_p0_id0, and move A_p0_id0 to the deleting directory as described in the LeaderAndIsrRequest description above.

Should we remove topic name from the protocol where possible?

It is unnecessary to include the name of the topic in the following Request/Response calls:

  • StopReplica
  • Fetch
  • ListOffsets
  • OffsetForLeader

Including the topic name in the request may make it easier to debug when issues arise, as it will provide more information than the topic ID alone. However, it will also bloat the protocol (especially relevant for FetchRequest), and if they are incorrectly used it may prevent topic renames from being easily implemented in the future.

For the time being, we may wish to use the latest protocol versions with clients that do not support topic IDs yet. Until the clients have been updated to refer to partitions by topic ID, we should include both topic name and (optional) ID in every request.

Migration

Upon a controller becoming active, the list of current topics is loaded from /brokers/topics/[topic]. When a topic without a topic ID is found, one will be assigned, and the payload will be rewritten to /brokers/topics/[topic] with id filled with the schema version bumped to version 3. LeaderAndIsrRequest(s) will only be sent by this controller once a topic ID has been successfully assigned to the topic. This process can take place without an inter-broker protocol bump, as the format stored in /brokers/topics/[topic] will be compatible with older broker versions.

When a broker receives a LeaderAndIsrRequest containing a topic ID for an existing partition without an associated topic ID, it will associate the topic ID with the partition. This will effectively migrate a broker's local replicas to include topic IDs.

Configuration

The following configuration options will be added:

...

Migration

Upon a controller becoming active, the list of current topics is loaded from /brokers/topics/[topic]. When a topic without a topic ID is found, one will be assigned, and the payload will be rewritten to /brokers/topics/[topic] with id filled with the schema version bumped to version 3. LeaderAndIsrRequest(s) will only be sent by this controller once a topic ID has been successfully assigned to the topic. This process can take place without an inter-broker protocol bump, as the format stored in /brokers/topics/[topic] will be compatible with older broker versions.

When a broker receives a LeaderAndIsrRequest containing a topic ID for an existing partition without an associated topic ID, it will associate the topic ID with the partition. This will effectively migrate a broker's local replicas to include topic IDs.

Configuration

The following configuration options will be added:

OptionUnitDefaultDescription
delete.stale.topic.delay.ms ms14400 (4 hours)When a FULL LeaderAndIsrRequest is received and the request does not contain a partition that exists on a broker, a deletion event will be staged for that partition which will complete after delete.stale.topic.delay.ms milliseconds.

Storage

Partition Metadata file

To allow brokers to resolve the topic name under this structure, a metadata file will be created at logdir/partitiondir/partition.metadata.

This metadata file will be human readable, and will include:

  • Metadata schema version (schema_version: int32)
  • Topic ID (id: UUID)
  • Topic name (name: String)
  • Partition (partition: int32)

This file can either be plain text (key/value pairs) or JSON.

org.apache.kafka.common.TopicPartition

At some point it would be useful to modify TopicPartition to include the topic ID. This may be tricky until all APIs support topic IDs.

Compatibility, Deprecation, and Migration Plan

We will need to support all API calls which refer to a partition by either (topicId, partition) or (topicName, partition) until clients are updated to interact with topics by ID. No deprecations are currently planned.

Rejected Alternatives

Sequence ID

As an alternative to a topic UUID, a sequence number (long) could be maintained that is global for the given cluster.

This sequence number could be stored at /topicid/seqid.

Upon topic creation, this sequence number will incremented, and the ID assigned to the created topic. Sequential topic ID generation can use the same approach to broker id generation.

If global uniqueness across clusters is required for topic IDs the first N bits of the ID could consist of a cluster ID prefix, followed by the sequence number. However, to achieve global uniqueness, this would require a large number of bits for the cluster ID prefix.

Use of a UUID has the benefit of being globally unique across clusters without partitioning the ID space by clusterID, and is conceptually simpler.

Topic Deletion

We considered and rejected two other strategies for performing topic deletes.

Best Effort Strategy

Under this stategy, the controller will attempt to send a StopReplicaRequest to all replicas. The controller will give up after a certain number of retries and will complete the delete. Although this will not simplify the topic deletion code, it will prevent delete topic requests from being blocked if one of the replicas is down. This would now be relatively safe, as stale topics will be deleted when a broker receives an initial LeaderAndIsrRequest, however it could prevent space from being reclaimed from a broker that does not respond to a StopReplicaRequest(s) before it is timed out, but is otherwise alive.

Send StopReplicaRequest(s) to online brokers only

In this approach, the controller will send StopReplicaRequests to only the brokers that are online, and will wait for a response from these brokers before marking the delete as successful. This will allow a topic delete to take place while some replicas are offline. If any replicas return to being online, they will receive an initial LeaderAndIsrRequest that will allow them to clear up any stale state. This is similar to the "best effort strategy above".

Removal of Topic Names from Request and Responses

It is unnecessary to include the name of the topic in the following Request/Response calls:

  • StopReplica
  • Fetch
  • ListOffsets
  • OffsetForLeader

Including the topic name in the request may make it easier to debug when issues arise, as it will provide more information than the topic ID alone. However, it will also bloat the protocol (especially relevant for FetchRequest), and if they are incorrectly used it may prevent topic renames from being easily implemented in the future.

For the time being we will wish to use the latest protocol versions with clients that do not support topic IDs yet. Until the clients have been updated to refer to partitions by topic ID, we should include both topic name and (optional) ID in every request

Storage

Partition Metadata file

To allow brokers to resolve the topic name under this structure, a metadata file will be created at logdir/partitiondir/partition.metadata.

This metadata file will be human readable, and will include:

  • Metadata schema version (schema_version: int32)
  • Topic ID (id: UUID)
  • Topic name (name: String)
  • Partition (partition: int32)

This file can either be plain text (key/value pairs) or JSON.

org.apache.kafka.common.TopicPartition

At some point it would be useful to modify TopicPartition to include the topic ID. This may be tricky until all APIs support topic IDs.

Compatibility, Deprecation, and Migration Plan

We will need to support all API calls which refer to a partition by either (topicId, partition) or (topicName, partition) until clients are updated to interact with topics by ID. No deprecations are currently planned.

Rejected Alternatives

Sequence ID

As an alternative to a topic UUID, a sequence number (long) could be maintained that is global for the given cluster.

This sequence number could be stored at /topicid/seqid.

Upon topic creation, this sequence number will incremented, and the ID assigned to the created topic. Sequential topic ID generation can use the same approach to broker id generation.

If global uniqueness across clusters is required for topic IDs the first N bits of the ID could consist of a cluster ID prefix, followed by the sequence number. However, to achieve global uniqueness, this would require a large number of bits for the cluster ID prefix.

Use of a UUID has the benefit of being globally unique across clusters without partitioning the ID space by clusterID, and is conceptually simpler.

Topic Deletion

We considered and rejected two other strategies for performing topic deletes.

Best Effort Strategy

Under this stategy, the controller will attempt to send a StopReplicaRequest to all replicas. The controller will give up after a certain number of retries and will complete the delete. Although this will not simplify the topic deletion code, it will prevent delete topic requests from being blocked if one of the replicas is down. This would now be relatively safe, as stale topics will be deleted when a broker receives an initial LeaderAndIsrRequest, however it could prevent space from being reclaimed from a broker that does not respond to a StopReplicaRequest(s) before it is timed out, but is otherwise alive.

Send StopReplicaRequest(s) to online brokers only

In this approach, the controller will send StopReplicaRequests to only the brokers that are online, and will wait for a response from these brokers before marking the delete as successful. This will allow a topic delete to take place while some replicas are offline. If any replicas return to being online, they will receive an initial LeaderAndIsrRequest that will allow them to clear up any stale state. This is similar to the "best effort strategy above".

Future Work

Requests

The following requests could be improved by presence of topic IDs, but are out of scope for this KIP.

...