...

The notable differences in processor and JobModel generation semantics between the YARN and standalone deployment models are:

  • The number of processors is a static configuration in the YARN deployment model, and a job restart is required to change it. In standalone, however, adding a processor to or removing one from a processor group is common and expected behavior.
  • In YARN, a stream processor is assigned a physical host by the ContainerAllocator after the JobModel generation phase. In standalone, the physical host on which a stream processor will run is known before the JobModel generation phase, so no ContainerAllocator phase is needed to associate the processor with a physical host.

The existing host affinity implementation in YARN is accomplished through the following two phases:
A. The ApplicationMaster (JobCoordinator) in the YARN deployment model generates the JobModel (the optimal processor-to-task assignment) and persists it in the coordinator stream (a Kafka topic) associated with the Samza job.
B. The ContainerAllocator phase, which happens after JobModel generation, schedules each processor to run on a physical host by coordinating with the underlying ClusterManager, and orchestrates the execution of the processor.
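
The two phases above can be sketched as two independent functions. This is a minimal illustration, not Samza code: the function names, the round-robin task grouping, the "one processor per host" simplification, and the `preferred_hosts` map are all assumptions made for the example.

```python
def generate_job_model(task_names, processor_ids):
    """Phase A (hypothetical): assign tasks to processors round-robin."""
    job_model = {pid: [] for pid in processor_ids}
    for index, task in enumerate(sorted(task_names)):
        job_model[processor_ids[index % len(processor_ids)]].append(task)
    return job_model


def allocate_hosts(job_model, preferred_hosts, available_hosts):
    """Phase B (hypothetical): place each processor on a physical host.

    Assumes one processor per host. A processor gets its preferred host
    when that host is free; otherwise it takes any remaining host.
    """
    free = list(available_hosts)
    placement = {}
    # First pass: honor host preferences where the host is still free.
    for pid in sorted(job_model):
        preferred = preferred_hosts.get(pid)
        if preferred in free:
            placement[pid] = preferred
            free.remove(preferred)
    # Second pass: remaining processors take whatever host is left.
    for pid in sorted(job_model):
        if pid not in placement:
            if not free:
                raise RuntimeError("no free host for processor " + pid)
            placement[pid] = free.pop(0)
    return placement


# Phase A runs first and knows nothing about hosts.
job_model = generate_job_model(
    ["task-0", "task-1", "task-2", "task-3"], ["0", "1"]
)
# Phase B runs afterwards and maps processors to hosts.
placement = allocate_hosts(job_model, {"1": "host-a"}, ["host-a", "host-b"])
print(job_model)
print(placement)
```

The point of the sketch is the ordering: the task assignment is fixed before any host is known, which is exactly the step that standalone does not need.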

ZooKeeper is used in standalone for coordination between the stream processors of a stream application. Among all the available processors of a stream application, a single processor is elected leader. In the standalone deployment model, the JobModel is stored in ZooKeeper. The leader generates the JobModel and propagates it to all the other processors in the group. A distributed barrier in ZooKeeper blocks message processing until every processor in the group has picked up the latest JobModel.
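
The coordination flow described above can be simulated in a single process. The sketch below uses Python threads in place of processors and an in-memory object in place of ZooKeeper; `InMemoryCoordinator`, `run_processor`, and the round-robin grouping are hypothetical names introduced for illustration. The election rule (lowest processor id wins) is an assumption borrowed from the common ZooKeeper leader-election recipe, since the text does not specify how the leader is chosen.

```python
import threading


class InMemoryCoordinator:
    """Hypothetical in-process stand-in for the ZooKeeper-backed state."""

    def __init__(self, processor_ids):
        self.processor_ids = sorted(processor_ids)
        self.job_model = None
        self.published = threading.Event()
        # All processors must arrive before any of them starts processing.
        self.barrier = threading.Barrier(len(self.processor_ids))

    def leader(self):
        # Assumption: the lowest id is the leader.
        return self.processor_ids[0]


def generate_job_model(task_names, processor_ids):
    """Round-robin task grouping, for illustration only."""
    job_model = {pid: [] for pid in processor_ids}
    for index, task in enumerate(sorted(task_names)):
        job_model[processor_ids[index % len(processor_ids)]].append(task)
    return job_model


def run_processor(pid, coordinator, task_names, started):
    if pid == coordinator.leader():
        # Only the leader generates and publishes the JobModel.
        coordinator.job_model = generate_job_model(
            task_names, coordinator.processor_ids
        )
        coordinator.published.set()
    # Every processor waits for the published JobModel and reads its tasks.
    coordinator.published.wait(timeout=5)
    my_tasks = coordinator.job_model[pid]
    # Barrier: block until all processors have picked up the JobModel.
    coordinator.barrier.wait(timeout=5)
    started[pid] = my_tasks  # message processing would begin here


tasks = ["task-0", "task-1", "task-2", "task-3", "task-4"]
coordinator = InMemoryCoordinator(["p2", "p1", "p3"])
started = {}
threads = [
    threading.Thread(target=run_processor, args=(pid, coordinator, tasks, started))
    for pid in coordinator.processor_ids
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(coordinator.leader(), started)
```

No processor records its tasks in `started` until the barrier releases, which mirrors the requirement that message processing stays blocked until the whole group has the latest JobModel.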

...