...

Current state: [ UNDER DISCUSSION ]

Discussion thread: https://lists.apache.org/thread/vjtl5fnf64kpkoxc591466y92dlt2bsb

JIRA: SAMZA-2687

Released: 

Problem

Some Samza jobs hit throughput limits that cannot be alleviated by resizing the job. A common one is the inability to scale the task count (and hence the container count) beyond the number of input partitions. The need to do so arises when throughput is bound by the processing time of the job’s own logic, which is not under Samza’s control. Today, the only way to raise the throughput of such a job is to manually increase the partition count of its input streams, which in turn yields more tasks. This design doc aims to remove the dependency of throughput on input partition count by introducing virtual tasks, which are similar to tasks but each consume only a portion of an input partition.
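
To make the limit concrete, here is a rough back-of-the-envelope sketch rather than anything taken from Samza itself. It assumes each task processes one message at a time, that a partition's load would be split evenly across its virtual tasks, and that nothing else (CPU, network, downstream systems) is the bottleneck. The function names and the `virtual_tasks_per_partition` parameter are hypothetical and exist only for this illustration.

```python
def max_task_count(partition_count: int, virtual_tasks_per_partition: int = 1) -> int:
    """Upper bound on task count.

    With virtual_tasks_per_partition == 1 this is the limit described above:
    at most one task per input partition. Larger values model the proposed
    virtual tasks (hypothetical parameter, assumed even split).
    """
    return partition_count * virtual_tasks_per_partition


def max_throughput(task_count: int, process_time_ms: float) -> float:
    """Upper bound on messages per second, assuming each task handles
    one message at a time and takes process_time_ms per message."""
    return task_count * (1000.0 / process_time_ms)


partitions = 8
process_time_ms = 50.0  # time spent in the job's own logic per message

# One task per partition: the ceiling is fixed by the partition count.
baseline = max_throughput(max_task_count(partitions), process_time_ms)

# Four virtual tasks per partition (illustrative value): the ceiling rises
# without repartitioning the input streams.
with_virtual = max_throughput(max_task_count(partitions, 4), process_time_ms)

print(baseline, with_virtual)
```

Under these assumptions, adding containers beyond the partition count does nothing for the baseline case, because the number of tasks available to run in them is what sets the ceiling.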

...