...
- The change made in this proposal is both source backward-compatible and binary backward compatible. Their code can compile and run correctly without change.
- For users who want to enable partition expansion for its input streams, they can do the following:
- Set grouper class to GroupByPartitionWithFixedTaskNum if the job is using GroupByPartition as grouper class
- Set grouper class to GroupBySystemStreamPartitionWithFixedTaskNum if the job is using GroupBySystemStreamPartition as grouper class
- Change their custom grouper class implementation to extend the new interface if the job is using a custom grouper class implementation.
- Set job.coordinator.monitor-partition-change to true
in the job configuration
- Run ConfigManager
Rejected Alternatives
1. Allow task number to increase instead of creating a new grouper class.
...
The current proposal relies on the input system to meet the specified operational requirements. While these requirements can be satisfied by Kafka, they may or may not be satisfied by other systems such as Kinesis. We can support partition expansion for more input systems than Kafka if user is able to express the new-partition to old-partition mapping strategy. However, since we currently don't know how user is going to use such config/class to do it, we choose to keep the current SEP simple and only add new config/class when we have specific use-case for them.
4. Use additional repartitioning stage to repartition data from input stream to another internal stream of the old partition count
Future work
1. Enable task number expansion in Samza
2. Add new config and class for user to specify the new-partition to old-partition mapping strategy based on the input system.