...

NOTE: This part is drafted based on the assumption that KIP-31 and KIP-32 will be implemented in one patch.

The proposed protocol change is not backward compatible. The migration plan is as follows:

Phase 1 (MessageAndOffset V0 on disk):

  1. Set message.format.version=0 on brokers. (Broker will write MessageAndOffset V0 to disk)
  2. Create internal ApiVersion 0.9.0-1** which uses ProducerRequest V2 and FetchRequest V2.
  3. Configure the broker to use ApiVersion 0.9.0 (ProduceRequest V1 and FetchRequest V1).
  4. Do a rolling upgrade of the brokers to let the broker pick up the new code supporting ApiVersion 0.9.0-1.
  5. Bump up ApiVersion of broker to 0.9.0-1
  6. Do a rolling bounce of the brokers to let the broker use FetchRequest V2 for replication.
  7. Bump up ProducerRequest and FetchRequest version to V2, which supports both MessageAndOffset V0 and V1.
  8. Upgraded brokers support both ProducerRequest V2 and FetchRequest V2, which use magic byte 1 for MessageAndOffset.
    1. When the broker sees a producer request V1 (MessageAndOffset = V0), it will decompress the message, assign offsets using absolute offsets and re-compress the message.
    2. When the broker sees a producer request V2 (MessageAndOffset = V1), it will decompress the message, assign offsets using absolute offsets and re-compress the message, i.e. down-convert the message format to MessageAndOffset V0.
    3. When the broker sees a fetch request V1 (Supporting MessageAndOffset = V0), because the data format on disk is MessageAndOffset V0, it will use zero-copy transfer to reply with fetch response V1 with MessageAndOffset V0.
    4. When the broker sees a fetch request V2 (Supporting MessageAndOffset = V0, V1), because the data format on disk is MessageAndOffset V0, it will use zero-copy transfer to reply with fetch response V2 with MessageAndOffset V0.
  9. Upgrade consumer to send FetchRequest V2.
  10. Upgrade producer to send ProducerRequest V2.
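
The two rolling passes in Phase 1 (first pick up the new code, then bump the ApiVersion) can be illustrated with a small simulation. This is only a sketch: the `Broker` class, the `cluster_is_safe` check and the version labels are hypothetical names introduced for illustration and are not part of Kafka. It assumes that a broker configured with ApiVersion 0.9.0-1 replicates with FetchRequest V2, and that a broker still running the old code cannot serve it.

```python
from dataclasses import dataclass

# Hypothetical labels for the two broker builds / ApiVersions in the plan.
OLD, NEW = "0.9.0", "0.9.0-1"


@dataclass
class Broker:
    name: str
    code: str = OLD          # highest ApiVersion the running binary understands
    api_version: str = OLD   # configured inter-broker ApiVersion


def fetch_version(broker):
    """FetchRequest version the broker uses for replication (assumed mapping)."""
    return 2 if broker.api_version == NEW else 1


def cluster_is_safe(brokers):
    """No broker may send FetchRequest V2 to a peer that still runs old code."""
    return all(
        fetch_version(src) == 1 or dst.code == NEW
        for src in brokers
        for dst in brokers
        if src is not dst
    )


def two_pass_upgrade(brokers):
    """Bounce brokers one at a time in two passes, recording safety after each bounce."""
    history = []
    for b in brokers:            # pass 1: new code, ApiVersion still pinned to 0.9.0
        b.code = NEW
        history.append(cluster_is_safe(brokers))
    for b in brokers:            # pass 2: bump ApiVersion to 0.9.0-1
        b.api_version = NEW
        history.append(cluster_is_safe(brokers))
    return history
```

Under these assumptions every intermediate state of the two-pass upgrade is safe, whereas upgrading the code and the ApiVersion of a single broker in one bounce leaves it sending FetchRequest V2 to peers that cannot answer it.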

Phase 2 (MessageAndOffset V1 on disk):

  1. After most of the consumers are upgraded, set message.format.version=1 and do a rolling bounce of the brokers.
  2. Upgraded brokers do the following:
    1. When the broker sees a producer request V1 (MessageAndOffset = V0), it will decompress the message, assign offsets using relative offsets and re-compress the message, i.e. up-convert the message format to MessageAndOffset V1.
    2. When the broker sees a producer request V2 (MessageAndOffset = V1), it will decompress the message, assign offsets using relative offsets and NOT do re-compression.
    3. When the broker sees a fetch request V1 (Supporting MessageAndOffset = V0), because the data format on disk is MessageAndOffset V1, it will NOT use zero-copy transfer. Instead the broker will read the messages from disk, down-convert them to V0 and reply using fetch response V1 with MessageAndOffset V0.
    4. When the broker sees a fetch request V2 (Supporting MessageAndOffset = V0, V1), because the data format on disk is MessageAndOffset V1, it will use zero-copy transfer to reply with fetch response V2 with MessageAndOffset V1.
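
The reason relative offsets let the broker skip re-compression can be shown with a little arithmetic. The sketch below assumes the KIP-31 scheme, which is not spelled out in this section: inner messages of a compressed set carry relative offsets 0..n-1, and the wrapper message carries the absolute offset of the last inner message. The function names are hypothetical.

```python
def assign_wrapper_offset(next_offset, inner_count):
    """Broker side: compute the wrapper offset without touching inner messages.

    Returns (wrapper_offset, next_free_offset). The inner messages keep their
    relative offsets, so the compressed payload is left as it was sent.
    """
    wrapper_offset = next_offset + inner_count - 1
    return wrapper_offset, wrapper_offset + 1


def absolute_offsets(wrapper_offset, relative_offsets):
    """Consumer side: recover absolute offsets from the wrapper offset."""
    last = relative_offsets[-1]
    return [wrapper_offset - (last - r) for r in relative_offsets]


# Three inner messages appended when the next free log offset is 100.
wrapper, next_free = assign_wrapper_offset(100, 3)
print(wrapper, next_free)                       # 102 103
print(absolute_offsets(wrapper, [0, 1, 2]))     # [100, 101, 102]
```

With absolute offsets (MessageAndOffset V0) the broker would instead have to rewrite each inner message's offset, which is what forces the decompress and re-compress cycle.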

For producer, there will be no impact.

In phase 1, there will be no impact for consumers.

In phase 2, there will be some performance penalty for consumers that only support MessageAndOffset V0, because there is no zero-copy transfer.

At the beginning of phase 2, there will be a period during which log segments contain both MessageAndOffset V0 and V1. The broker will always do down-conversion for FetchRequest V1 and zero-copy transfer for FetchRequest V2.
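
The fetch-path rule in both phases can be summarised as a small decision function. This is a sketch only: `FetchPlan` and `plan_fetch` are hypothetical names, and the function assumes the decision is driven by the configured message.format.version rather than by inspecting each segment.

```python
from typing import NamedTuple


class FetchPlan(NamedTuple):
    zero_copy: bool          # can the broker send log bytes without copying them?
    response_version: int    # FetchResponse version sent back
    message_formats: tuple   # MessageAndOffset versions the response may contain


def plan_fetch(fetch_version, message_format_version):
    """Decide how a broker answers a fetch request under this migration plan."""
    if fetch_version not in (1, 2) or message_format_version not in (0, 1):
        raise ValueError("unsupported version")
    if message_format_version == 0:
        # Phase 1: everything on disk is V0, which both request versions accept.
        return FetchPlan(True, fetch_version, (0,))
    if fetch_version == 1:
        # Phase 2, old consumer: always down-convert to V0, so no zero-copy.
        return FetchPlan(False, 1, (0,))
    # Phase 2, new consumer: zero-copy; early segments may still hold V0 data.
    return FetchPlan(True, 2, (0, 1))
```

For example, `plan_fetch(1, 1)` reports that zero-copy is unavailable, which is the source of the phase 2 performance penalty for consumers that have not been upgraded.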

** We introduce an internal ApiVersion here to help users who are running on trunk upgrade in the future. Otherwise, an interim ApiVersion between two official releases would require users to downgrade the ApiVersion and then upgrade.

To canary a broker

After phase 1, it is possible for a user to canary a broker in phase 2 and roll back if something goes wrong. The procedure is:

  1. Set message.format.version=1 on one of the brokers (broker B).
  2. Broker B will start to behave as described in phase 2.
    1. It will send FetchRequest V2 to other brokers for replication.
    2. It will only see ProduceRequest/FetchRequest V1 from other brokers and clients.
  3. If something goes wrong, we can do the following to rollback:
    1. shutdown broker B
    2. nuke the data of the topics it was serving as leader before shutdown
    3. set message.format.version=0
    4. restart the broker to let the broker replicate from leaders. At this point the data on disk will be in MessageAndOffset V0.

In step 2, it is recommended to put only a small number of leaders on the broker, because at that point the broker needs to do down-conversion for all the fetch requests.

During step 2 and step 3, the majority of consumers may still be using MessageAndOffset V0, so the broker could consume more memory.
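
The canary and rollback order can be traced with a toy model. Everything here is hypothetical and for illustration only: `CanaryBroker`, `start_canary` and `roll_back` are not Kafka tools, and the model assumes that only the partitions broker B leads receive MessageAndOffset V1 data while it is the canary.

```python
from dataclasses import dataclass, field


@dataclass
class CanaryBroker:
    message_format_version: int = 0
    running: bool = True
    led_partitions: set = field(default_factory=set)
    on_disk: dict = field(default_factory=dict)  # partition -> format on disk


def start_canary(broker):
    """Step 1: switch broker B to message.format.version=1."""
    broker.message_format_version = 1
    for p in broker.led_partitions:
        broker.on_disk[p] = 1        # new appends on led partitions are V1


def roll_back(broker):
    """Step 3: undo the canary in the order given in the procedure."""
    broker.running = False                   # 1. shut down broker B
    for p in broker.led_partitions:          # 2. delete data of topics it led
        broker.on_disk.pop(p, None)
    broker.message_format_version = 0        # 3. set message.format.version=0
    broker.running = True                    # 4. restart and re-replicate
    for p in broker.led_partitions:
        broker.on_disk[p] = 0                # data fetched from leaders is V0
```

The order matters in this model: the V1 data is removed while the broker is down, so that after the restart everything it holds has been replicated from leaders that still write V0.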

 

Rejected Alternatives

Option 2 - Adding only LogAppendTime to the message

...