...

In order to address the "consistent versioning" problem mentioned above, an observer needs to be able to tell when it has reached an offset such that the materialized snapshot at that offset is guaranteed to be consistent among all replicas of the log. The challenge is that an observer which is fetching the log from the leader does not know which portion of the log has already been cleaned. For example, using the diagram above, if we attempt to materialize the state after reading only up to offset 5, then our snapshot will not contain keys k3 and k4 even though they may have been present in the original log at earlier offsets.

(TODO: Better diagram for compaction above which shows both before and after a round of cleaning. There is one in http://cloudurable.com/blog/kafka-architecture-log-compaction/index.html, not sure we need to reference it in the KIP)

Our solution to address this problem is simple. We require the leader to indicate its current dirty offset in each FetchQuorumRecords response. An observer will know if its current snapshot represents a consistent version if and only if its local log end offset after appending the records from the response is greater than or equal to the dirty offset received from the leader.
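To make the check concrete, here is a minimal sketch (not taken from the KIP; the class, field, and method names are hypothetical) of how an observer could apply this rule after appending a batch from a FetchQuorumRecords response:

```java
/**
 * Illustrative sketch: an observer tracks its local log end offset and,
 * after appending the records from a FetchQuorumRecords response, checks
 * whether its materialized snapshot is guaranteed to be consistent.
 * All names here are assumptions for illustration only.
 */
public class ObserverConsistencyCheck {

    private long localLogEndOffset = 0L;

    /**
     * Append the records from a fetch response and report whether the
     * resulting snapshot represents a consistent version.
     *
     * @param recordsAppended   number of records appended from the response
     * @param leaderDirtyOffset the dirty offset reported by the leader in the response
     * @return true if the snapshot is now guaranteed to be consistent
     */
    public boolean appendAndCheck(long recordsAppended, long leaderDirtyOffset) {
        localLogEndOffset += recordsAppended;
        // Consistent if and only if the local log end offset has reached or
        // passed everything the leader may still clean (the dirty offset).
        return localLogEndOffset >= leaderDirtyOffset;
    }
}
```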

...

The goal of the Raft quorum is to replace the Zookeeper dependency and to achieve higher performance for metadata operations. In the first version, we will build the necessary metrics to monitor the end-to-end latency from the time an admin request (AlterQuorum) or client request is accepted to the time it is committed. We shall monitor the time spent locally, primarily the time to fsync the new records and the time to apply changes to the state machine, which may not be a trivial operation. Besides that, we shall also monitor the time needed to propagate the change to the remote replicas, i.e. the latency to advance the high watermark. Benchmarks will also be built to compare the efficiency of a 3-node broker cluster using Zookeeper vs Raft under a heavy load of metadata changes. We will build our own benchmark and shall also explore existing distributed consensus load-testing frameworks, though this may be beyond the scope of KIP-595.
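As a rough illustration of how such an end-to-end commit latency metric could be registered with Kafka's common metrics library, here is a short sketch. The sensor, group, and metric names are illustrative assumptions, not names defined by the KIP:

```java
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Max;

/**
 * Illustrative sketch: register average and maximum commit latency metrics.
 * The names "commit-latency", "raft-metrics", etc. are assumptions.
 */
public class RaftLatencyMetrics {

    private final Sensor commitLatencySensor;

    public RaftLatencyMetrics(Metrics metrics) {
        commitLatencySensor = metrics.sensor("commit-latency");
        commitLatencySensor.add(
                metrics.metricName("commit-latency-avg", "raft-metrics",
                        "Average time from request accepted to committed"),
                new Avg());
        commitLatencySensor.add(
                metrics.metricName("commit-latency-max", "raft-metrics",
                        "Maximum time from request accepted to committed"),
                new Max());
    }

    /** Record elapsed time once the record's offset reaches the high watermark. */
    public void recordCommitLatencyMs(long acceptedTimeMs, long committedTimeMs) {
        commitLatencySensor.record(committedTimeMs - acceptedTimeMs);
    }
}
```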

Rejected Alternatives

Use an existing Raft library: Log replication is at the core of Kafka and the project should own it. Dependence on a third-party system or library would defeat one of the central motivations for KIP-500. There would be no easy way to evolve a third-party component according to the specific needs of Kafka. For example, we may eventually use this protocol for partition-level replication, but it would make compatibility much more difficult if we cannot continue to control the log layer. So if we must control both the log layer and the RPC protocol, then the benefit of a third-party library is marginal and the cost is an unnecessary constraint on future evolution. Furthermore, Raft libraries typically bring their own RPC mechanism, serialization formats, monitoring, logging, and so on. All of this requires additional configuration that the user needs to understand.

...