...

JobModel is the data model in Samza that logically represents a Samza job. The hierarchy is as follows: a Samza job has one or more containers (ContainerModel), and each container has one or more tasks (TaskModel). Each data model holds the relevant information for its level, such as the logical id and partition information. In the standalone deployment model, the JobModel is stored in ZooKeeper. In the YARN deployment model, the JobModel is stored in the coordinator stream (a Kafka topic).
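The hierarchy described above can be pictured with a minimal sketch. This is not Samza's actual Java API: the class and field names below (`task_name`, `input_partitions`, `container_id`, `task_count`) are hypothetical and chosen only to illustrate the job, container, task nesting and the "one or more" constraint at each level.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TaskModel:
    # Hypothetical fields: a logical task id plus the input partitions it owns.
    task_name: str
    input_partitions: Tuple[str, ...]


@dataclass(frozen=True)
class ContainerModel:
    container_id: str
    tasks: Dict[str, TaskModel]

    def __post_init__(self):
        # Each container has one or more tasks.
        if not self.tasks:
            raise ValueError("a container needs at least one task")


@dataclass(frozen=True)
class JobModel:
    containers: Dict[str, ContainerModel]

    def __post_init__(self):
        # Each job has one or more containers.
        if not self.containers:
            raise ValueError("a job needs at least one container")

    def task_count(self) -> int:
        # Total number of tasks across all containers.
        return sum(len(c.tasks) for c in self.containers.values())


# Example: one job, two containers, three tasks in total.
job = JobModel(containers={
    "0": ContainerModel("0", {
        "task-0": TaskModel("task-0", ("PageViews-0",)),
        "task-1": TaskModel("task-1", ("PageViews-1",)),
    }),
    "1": ContainerModel("1", {
        "task-2": TaskModel("task-2", ("PageViews-2",)),
    }),
})
```

The stream name `PageViews` is made up for the example. The point is only that partition information lives at the task level and that every level is addressable by a logical id.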

The notable differences in processor and JobModel generation semantics between the YARN and standalone deployment models are:

...

ZooKeeper is used in standalone for coordination between the stream processors of a stream application. Among all the available processors of a stream application, a single processor is elected as the leader. In the standalone deployment model, the JobModel is stored in ZooKeeper. The leader generates the JobModel and propagates it to all the other processors in the group. A distributed barrier in ZooKeeper blocks message processing until every processor in the group has picked up the latest JobModel.
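The coordination flow above (elect a leader, have the leader publish the JobModel, hold every processor at a barrier until all have read it) can be sketched in a single process. This is an illustration only, not Samza or ZooKeeper code: threads stand in for processors, `SharedStore` is a hypothetical in-memory stand-in for the ZooKeeper node holding the JobModel, `threading.Barrier` stands in for the distributed barrier, and the "lowest id wins" election rule is an assumption made for the example.

```python
import threading


def elect_leader(processor_ids):
    # Assumption for this sketch: the processor with the lowest id becomes leader.
    return min(processor_ids)


class SharedStore:
    """In-memory stand-in for the shared node that holds the JobModel."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._job_model = None

    def publish(self, job_model):
        with self._lock:
            self._version += 1
            self._job_model = job_model
            return self._version

    def read(self):
        with self._lock:
            return self._version, self._job_model


def run_group(processor_ids, generate_job_model, timeout=5.0):
    store = SharedStore()
    leader = elect_leader(processor_ids)
    published = threading.Event()
    barrier = threading.Barrier(len(processor_ids))
    result_lock = threading.Lock()
    started_with = {}

    def processor(pid):
        if pid == leader:
            # Only the leader generates and publishes the JobModel.
            store.publish(generate_job_model(processor_ids))
            published.set()
        # Every processor waits for the new JobModel, then reads it.
        published.wait(timeout)
        version, _model = store.read()
        # No processor starts processing until all have picked up the model.
        barrier.wait(timeout)
        with result_lock:
            started_with[pid] = version

    threads = [threading.Thread(target=processor, args=(p,)) for p in processor_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leader, started_with


leader, versions = run_group(
    ["p2", "p1", "p3"],
    lambda ids: {"containers": sorted(ids)},
)
```

After the call, `versions` maps each processor to the JobModel version it started processing with. The barrier guarantees that no processor records a start before every processor has read the published model.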

Overall high-level changes:

...