Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

After the rebalancing phase, before the start of processing each stream processor will register the details of physical host on which it runs in the localityData zookeeper node. The goal here is to separate the locality information from the JobModel itself (JobModel will be used to hold the task assignments). LocalityManager abstraction will be used to read and write locality information for different deployment models in appropriate storage layers. There will be two implementations of LocalityManager viz CoordinatorStreamBasedLocalityManager to read/write container locality information for yarn and ZkLocalityManager to read/write container locality information for standalone. In case of standalone, last known physical host in which each  samza task had run will be stored in zookeeper, which will then be used to assign tasks to stream processors. Stream processor will update the task locality of the tasks assigned to it before it begins processing(This is synonymous to behaviour in yarn, where locality is updated in SamzaContainer as a part of startup sequence).

Code Block
languagejava
- zkBaseRootPath/$appName-$appId-$JobName-$JobId-$stageId/
    - processors/
        - processor.000001/
        - processor.000002/
        ...
        - processor.00000N/
    - jobModels/
        - {jobModelVersion}
            JobModelObject
    - localityData
        - task01/
           - locationId
        - task02/
           - locationId
        ... 
 
        - task0N/
           - locationId

...