...

In standalone, the locality information of the stream processors will be stored separately from the JobModel. The JobModel will hold only the task assignments (processor-to-task assignment and task-to-system-stream-partition assignment). During its startup phase, each stream processor will store the physical host on which it runs in an appropriate ZooKeeper node. The MetadataStore abstraction will be used to read and write stream processor locality information in the storage layer appropriate to each deployment model. There will be two implementations of MetadataStore: CoordinatorStreamBasedMetadataStore, which reads and writes processor locality information in the coordinator stream (a Kafka topic) for YARN, and ZkMetadataStore, which reads and writes processor locality information in ZooKeeper for standalone. In standalone, the last known physical host on which each Samza task ran will be stored in ZooKeeper and will then be used to assign tasks to stream processors. A stream processor will update the task locality of the tasks assigned to it before it begins processing (this is analogous to the behaviour in YARN, where locality is updated in SamzaContainer as part of the startup sequence). The local state of the tasks will be persisted by each processor in a directory (local.store.dir) provided through configuration.
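The separation described above can be illustrated with a minimal sketch. It assumes a simplified key-value interface and an in-memory store standing in for ZooKeeper or the coordinator stream. The names LocalityStore, InMemoryLocalityStore, LocalityManager and the key prefixes are hypothetical and are not the actual Samza MetadataStore API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the MetadataStore abstraction.
// The real interface may differ; this only shows the read/write contract.
interface LocalityStore {
    void put(String key, byte[] value);

    Optional<byte[]> get(String key);
}

// In-memory implementation used here in place of a ZooKeeper-backed or
// coordinator-stream-backed store (assumption, for illustration only).
final class InMemoryLocalityStore implements LocalityStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] value) {
        data.put(key, value.clone());
    }

    @Override
    public Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key)).map(v -> v.clone());
    }
}

// Keeps locality outside the job model: it only knows about the store.
final class LocalityManager {
    // Hypothetical key prefixes; the real node layout is not specified here.
    private static final String PROCESSOR_PREFIX = "processor/";
    private static final String TASK_PREFIX = "task/";

    private final LocalityStore store;

    LocalityManager(LocalityStore store) {
        this.store = store;
    }

    // Called by a stream processor during startup.
    void recordProcessorHost(String processorId, String host) {
        store.put(PROCESSOR_PREFIX + processorId, host.getBytes(StandardCharsets.UTF_8));
    }

    // Called for each assigned task before processing begins.
    void recordTaskHost(String taskName, String host) {
        store.put(TASK_PREFIX + taskName, host.getBytes(StandardCharsets.UTF_8));
    }

    Optional<String> processorHost(String processorId) {
        return store.get(PROCESSOR_PREFIX + processorId)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    // Last known host of a task, which task assignment can consult.
    Optional<String> lastKnownTaskHost(String taskName) {
        return store.get(TASK_PREFIX + taskName)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }
}
```

In this sketch, swapping InMemoryLocalityStore for another LocalityStore implementation changes where locality is persisted without touching LocalityManager, which mirrors the intent of having one abstraction with a separate implementation per deployment model.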

...