Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

This design doc provides a solution to increase partition number of the input streams of a stateful Samza job while still ensuring the correctness of Samze job output. The solution should work for all input systems that satisfy the operation requirement described below, i.e. hash algorithm should be implemented in the form of hash(key) % partitionNum and partitionNum should be multiplied by power of two when Kafka is used we increase it. In the rest of the documentation we will assume Kafka as the input system. We expect this solution to work similarly with other input system as well. The motivation of increasing partition number of Kafka topic is 1) increase performance of Kafka broker and 2) increase throughput of Kafka consumer in the Samza container.

...