Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In Yarn runtime environment, we support a fixed set of processors. Where as, in Zookeeper runtime environment, we support an elastic model where the number of processors can vary during the lifetime of the job (Note: the number of processors within a job is limited by the number of tasks in the Samza job).

Furthermore, the StreamProcessor API requires the user to specify a unique processorId. This adds undue burden on the user to guarantee uniqueness of identifier within the job, when processors can be added / removed at any time. 
Hence, we need a more flexible model where ProcessorId can be generated by Samza framework or the custom configured by the user.

Proposed Changes

  • Converts "processorId" parameter in StreamProcessor to a String
  • Introduces a ProcessorIdentifier interface that can be used the users of StreamProcessor API to generate a unique processorId for each processor 
  • Job Coordinator has the responsibility of generating a unique processorId within the runtime environment. Hence, this needs to be a pluggable component. (For more details, see SAMZA-881

Note: With the introduction of ApplicationRunner in SAMZA-1067, ApplicationRunner will be the Samza user-api, instead of StreamProcessor. 

...