...

JobModel is the data model that logically represents a Samza job. The JobModel hierarchy is that a Samza job has one to many containers, and each container has one to many tasks. Each data model contains relevant information, such as a logical id, partition information, etc. In the standalone deployment model, the JobModel is stored in ZooKeeper. In the YARN deployment model, the coordinator stream (a Kafka topic) is used to store the JobModel.
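For illustration only, the following is a minimal Java sketch of that hierarchy. The class and field names are simplified stand-ins and do not mirror Samza's actual JobModel classes exactly.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical sketch of the JobModel hierarchy described above.
class TaskModel {
    String taskName;                        // logical task id
    List<Integer> inputPartitions;          // input partitions assigned to this task
}

class ContainerModel {
    String containerId;                     // logical container (processor) id
    Map<String, TaskModel> tasks;           // each container owns one to many tasks
}

class JobModel {
    Map<String, ContainerModel> containers; // a job owns one to many containers
}
```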

The existing implementation of host affinity in YARN is accomplished through the following two phases (a sketch of this flow follows the list):
A. The ApplicationMaster (JobCoordinator) in the YARN deployment model generates the JobModel (the optimal processor-to-task assignment) and persists it in the coordinator stream (Kafka topic) associated with the Samza job.
B. The ContainerAllocator phase (which happens after JobModel generation) schedules each stream processor to run on a physical host by coordinating with the underlying ClusterManager to acquire physical resources, and orchestrates the execution of the stream processors.
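The sketch below outlines these two phases as hypothetical Java types and methods. None of the names (CoordinatorStream, ClusterManager, requestResource, etc.) are Samza's actual API; they only illustrate phase A (persisting the JobModel) followed by phase B (requesting hosts per container).

```java
import java.util.Map;

// Hypothetical stand-ins for the coordinator stream and the cluster manager.
interface CoordinatorStream { void write(String key, byte[] serializedJobModel); }
interface ClusterManager    { void requestResource(String containerId, String preferredHost); }

class YarnHostAffinitySketch {
    private final CoordinatorStream coordinatorStream;
    private final ClusterManager clusterManager;

    YarnHostAffinitySketch(CoordinatorStream cs, ClusterManager cm) {
        this.coordinatorStream = cs;
        this.clusterManager = cm;
    }

    void run(byte[] serializedJobModel, Map<String, String> containerToPreferredHost) {
        // Phase A: persist the generated JobModel to the coordinator stream (Kafka topic).
        coordinatorStream.write("job-model", serializedJobModel);

        // Phase B: request a host per container from the cluster manager, preferring the
        // host the container previously ran on (host affinity).
        for (Map.Entry<String, String> entry : containerToPreferredHost.entrySet()) {
            clusterManager.requestResource(entry.getKey(), entry.getValue());
        }
    }
}
```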

ZooKeeper is used in standalone for coordination between the processors of a stream application. Among all the available processors of a stream application, a single processor is elected as the leader. The leader generates the JobModel and propagates it to all the other processors in the group. A distributed barrier in ZooKeeper is used to block message processing until the latest JobModel is picked up by all the processors in the group.
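A minimal sketch of ephemeral-sequential leader election in ZooKeeper is shown below, assuming a hypothetical election path whose parent znode already exists. Samza's actual standalone implementation differs in detail; the sketch only illustrates the core rule that the processor with the smallest sequence number becomes the leader.

```java
import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    // Hypothetical election path; assumed to exist before processors register.
    private static final String ELECTION_PATH = "/app/processors";

    public static boolean isLeader(ZooKeeper zk)
            throws KeeperException, InterruptedException {
        // Register this processor as an ephemeral sequential znode; the znode is
        // removed automatically if the processor's ZooKeeper session dies.
        String myNode = zk.create(ELECTION_PATH + "/processor-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // The processor whose znode has the smallest sequence number is the leader.
        List<String> children = zk.getChildren(ELECTION_PATH, false);
        Collections.sort(children);
        return myNode.endsWith(children.get(0));
    }
}
```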

...