...

  • In the yarn deployment model, the number of processors is a static configuration, and changing it requires a job restart. In standalone, however, adding a processor to or removing one from a processor group is common and expected behavior.
  • In yarn, the ContainerAllocator assigns a physical host to a stream processor after the JobModel generation phase. In standalone, the physical host on which a stream processor will run is already known before the JobModel generation phase, so no ContainerAllocator phase is needed to associate a processor with its physical host.
  • Existing generators discard the task-to-physical-host assignment when generating the JobModel and use only the container-to-preferred-host assignment. For standalone, however, it is essential to carry the task-to-physical-host assignment across successive JobModel generations in order to produce an optimal task-to-processor assignment. For instance, assume stream processors P1 and P2 run on host H1 and processor P3 runs on host H3. If P1 dies, it is optimal to reassign some of the tasks processed by P1 to P2, which runs on the same host. If the previous task-to-physical-host assignment is not taken into account when generating the JobModel, this cannot be achieved.
  • Ideally, any TaskNameGrouper should be usable interchangeably in the yarn and standalone deployment models. Currently, only a subset of the TaskNameGroupers usable in yarn is supported in standalone.
  • ZooKeeper will be used as the locality store in standalone, while the coordinator stream (Kafka) is used as the locality store in yarn.
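The P1/P2/P3 scenario above can be illustrated with a small, locality-aware assignment sketch. This is a hypothetical illustration, not Samza's TaskNameGrouper or JobModel code: the class `LocalityAwareAssigner`, the `TaskLocality` record, and the balancing cap of ceil(tasks / processors) are all assumptions made for the example. It assumes the previous task-to-processor-and-host assignment has already been read from the locality store, and that each live processor's host is known up front, as it is in standalone. Pass 1 keeps tasks on their previous processor if it is still alive; pass 2 places the orphaned tasks, preferring a live processor on the task's previous host.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical sketch: names here are illustrative, not Samza APIs. */
public class LocalityAwareAssigner {

    /** Where a task ran in the previous JobModel (assumed to come from the locality store). */
    public record TaskLocality(String processorId, String host) {}

    /**
     * @param previous   task name -> previous locality (tasks missing here are new)
     * @param processors live processor id -> physical host (known before JobModel generation)
     * @param tasks      all task names to assign
     * @return task name -> processor id
     */
    public static Map<String, String> assign(Map<String, TaskLocality> previous,
                                             Map<String, String> processors,
                                             Collection<String> tasks) {
        if (processors.isEmpty()) throw new IllegalArgumentException("no live processors");
        SortedSet<String> taskSet = new TreeSet<>(tasks);
        // Assumed balancing rule: no processor gets more than ceil(tasks / processors) tasks.
        int cap = (taskSet.size() + processors.size() - 1) / processors.size();
        Map<String, Integer> load = new TreeMap<>();
        processors.keySet().forEach(p -> load.put(p, 0));
        Map<String, String> result = new TreeMap<>();
        List<String> orphans = new ArrayList<>();

        // Pass 1: keep each task on its previous processor if that processor is still alive.
        for (String task : taskSet) {
            TaskLocality prev = previous.get(task);
            if (prev != null && load.containsKey(prev.processorId())
                    && load.get(prev.processorId()) < cap) {
                result.put(task, prev.processorId());
                load.merge(prev.processorId(), 1, Integer::sum);
            } else {
                orphans.add(task);
            }
        }

        // Pass 2: place orphaned tasks, preferring a live processor on the task's previous host.
        for (String task : orphans) {
            TaskLocality prev = previous.get(task);
            String prevHost = prev == null ? null : prev.host();
            String target = leastLoaded(load, cap,
                    p -> prevHost != null && prevHost.equals(processors.get(p)));
            if (target == null) target = leastLoaded(load, cap, p -> true); // any host
            result.put(task, target);
            load.merge(target, 1, Integer::sum);
        }
        return result;
    }

    // Least-loaded processor under the cap that matches the filter; ties go to the smaller id.
    private static String leastLoaded(Map<String, Integer> load, int cap, Predicate<String> filter) {
        String best = null;
        for (Map.Entry<String, Integer> e : load.entrySet()) {
            if (e.getValue() < cap && filter.test(e.getKey())
                    && (best == null || e.getValue() < load.get(best))) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // P1 and P2 ran on H1, P3 on H3; P1 has died.
        Map<String, TaskLocality> previous = Map.of(
                "T0", new TaskLocality("P1", "H1"), "T1", new TaskLocality("P1", "H1"),
                "T2", new TaskLocality("P2", "H1"), "T3", new TaskLocality("P2", "H1"),
                "T4", new TaskLocality("P3", "H3"), "T5", new TaskLocality("P3", "H3"));
        Map<String, String> live = Map.of("P2", "H1", "P3", "H3");
        System.out.println(assign(previous, live, List.of("T0", "T1", "T2", "T3", "T4", "T5")));
        // prints {T0=P2, T1=P3, T2=P2, T3=P2, T4=P3, T5=P3}
    }
}
```

In this run, T0 (previously on P1) moves to P2 on the same host H1, while the cap pushes T1 to P3 so the two survivors end up with three tasks each. Dropping the cap would keep every orphan on H1 at the cost of balance; which trade-off a real grouper should make is left open here.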

...