...
This design doc provides a solution for increasing the partition number of the input streams of a stateful Samza job while still ensuring the correctness of the Samza job's output. The solution should work for any input system that satisfies the operational requirements described below, i.e. the hash algorithm should be implemented in the form hash(key) % partitionNum, and partitionNum should be multiplied by a power of two whenever it is increased. In the rest of this document we assume Kafka as the input system; we expect the solution to work similarly with other input systems. The motivation for increasing the partition number of a Kafka topic is to 1) increase the performance of the Kafka broker and 2) increase the throughput of the Kafka consumer in the Samza container.
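The two requirements above can be checked with a small sketch. It is illustrative only: `stable_hash` (CRC32 here) is a stand-in chosen for this example, not the hash function Kafka or Samza actually uses, and the function names are hypothetical. The sketch shows the property the requirements are meant to guarantee: when partitionNum is multiplied by a power of two, the partition a key had before the expansion can be recovered from its partition after the expansion by taking the remainder modulo the old partitionNum.

```python
import zlib


def stable_hash(key: str) -> int:
    """Stand-in for hash(key); CRC32 is an assumption made for this sketch."""
    return zlib.crc32(key.encode("utf-8"))


def partition_for(key_hash: int, partition_num: int) -> int:
    """Partition chosen by the hash(key) % partitionNum scheme."""
    return key_hash % partition_num


def old_partition_recoverable(key_hashes, old_num: int, new_num: int) -> bool:
    """Return True if, for every hash, (new partition % old_num) equals the old partition."""
    return all(
        partition_for(h, new_num) % old_num == partition_for(h, old_num)
        for h in key_hashes
    )


if __name__ == "__main__":
    # Hypothetical keys, used only to exercise the check.
    hashes = [stable_hash(f"member-{i}") for i in range(1000)]
    print(old_partition_recoverable(hashes, 4, 8))   # 4 -> 8 partitions (x2)
    print(old_partition_recoverable(hashes, 4, 16))  # 4 -> 16 partitions (x4)
```

Under these assumptions, every key that lands in new partition p was previously in partition p % 4, so all messages with the same key still map back to one and the same old partition.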
...