...

ZK Data Model to support host affinity:

Before resuming event processing after a rebalancing phase, each stream processor will register the details of the physical host on which it runs in the localityData ZooKeeper node. The goal is to separate the locality information from the JobModel itself (the JobModel will be used to hold the task assignments). The MetadataStore abstraction will be used to read and write locality information for the different deployment models in the appropriate storage layers. There will be two implementations of MetadataStore: CoordinatorStreamBasedMetadataStore, which reads and writes container locality information for YARN, and ZkMetadataStore, which reads and writes container locality information in ZooKeeper for standalone. In standalone, the last known physical host on which each Samza task ran will be stored in ZooKeeper and will then be used to assign tasks to stream processors. Each stream processor will update the task locality of the tasks assigned to it before it begins processing (this is analogous to the behaviour in YARN, where locality is updated in SamzaContainer as part of the startup sequence). The local state of the tasks will be persisted by each processor in a directory (local.store.dir) provided through configuration.
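The flow described above (write each task's host before processing starts, then consult the last known host when assigning tasks) can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: `LocalityStore` is a hypothetical, simplified stand-in for the MetadataStore abstraction and not its actual interface, `InMemoryLocalityStore` replaces the ZooKeeper-backed store, and the round-robin fallback for tasks without locality is only one possible policy, not the one this design prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the MetadataStore abstraction.
interface LocalityStore {
  void put(String key, byte[] value);
  byte[] get(String key);
}

// In-memory implementation used only for this sketch; a real store would be
// backed by ZooKeeper (standalone) or the coordinator stream (YARN).
class InMemoryLocalityStore implements LocalityStore {
  private final Map<String, byte[]> data = new HashMap<>();

  @Override
  public void put(String key, byte[] value) {
    data.put(key, value);
  }

  @Override
  public byte[] get(String key) {
    return data.get(key);
  }
}

class LocalitySketch {
  // Called by a processor before it begins processing: records its host
  // as the locality of every task assigned to it.
  static void registerTaskLocality(LocalityStore store, String host, List<String> taskNames) {
    for (String task : taskNames) {
      store.put(task, host.getBytes(StandardCharsets.UTF_8));
    }
  }

  // Returns the last known host of a task, or null if none was recorded.
  static String lastKnownHost(LocalityStore store, String taskName) {
    byte[] raw = store.get(taskName);
    return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
  }

  // Assigns each task to a processor running on the task's last known host
  // when one exists; otherwise falls back to round-robin (assumed policy).
  // processorToHost must be non-empty.
  static Map<String, String> assignTasks(
      LocalityStore store, List<String> tasks, Map<String, String> processorToHost) {
    List<String> processors = new ArrayList<>(processorToHost.keySet());
    Map<String, String> assignment = new LinkedHashMap<>();
    int next = 0;
    for (String task : tasks) {
      String preferredHost = lastKnownHost(store, task);
      String chosen = null;
      for (String processor : processors) {
        if (processorToHost.get(processor).equals(preferredHost)) {
          chosen = processor;
          break;
        }
      }
      if (chosen == null) {
        chosen = processors.get(next % processors.size());
        next++;
      }
      assignment.put(task, chosen);
    }
    return assignment;
  }

  public static void main(String[] args) {
    LocalityStore store = new InMemoryLocalityStore();
    registerTaskLocality(store, "host-a", List.of("task-0"));
    registerTaskLocality(store, "host-b", List.of("task-1"));

    // After a rebalance, two processors come up on the same two hosts.
    Map<String, String> processorToHost = new LinkedHashMap<>();
    processorToHost.put("processor-1", "host-b");
    processorToHost.put("processor-2", "host-a");

    System.out.println(
        assignTasks(store, List.of("task-0", "task-1", "task-2"), processorToHost));
  }
}
```

In this sketch, task-0 and task-1 return to the processors on their previous hosts, so their local state under local.store.dir could be reused, while task-2, which has no recorded locality, is placed by the fallback.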

...