Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Here are few reasons supporting the modification of TaskNameGrouper interface and removing LocalityManager from interface methods:

  • Any TaskNameGrouper implementation should be usable in both yarn and standalone deployment models. However, TaskNameGrouper interface definition has an explicit and tight binding with CoordinatorStream through LocalityManager and existing TaskNameGrouper implementations employs LocalityManager to read/write locality mapping from and to Coordinator stream. Coordinator stream doesn’t exist in standalone landscape and this prohibits usage of some TaskNameGrouper implementations in standalone.

  • Multiple group methods in TaskNameGrouper interface and additional balance method in BalancingTaskNameGrouper are logically synonymous to each other and exists to generate ContainerModels based upon the input task models and past locality assignments. It’s sensible to combine them into one interface method with adequate parameters and simplify things.

  • Any future TaskNameGrouper implementation could hold references to LocalityManager(a live object) and create object hierarchies based upon that reference. This will clutter the ownership of LocalityManager and could potentially create an unintentional resource leak.

  • Logically, a TaskNameGrouper implementation would just require the previous generation container models(to get previous task to preferred host mapping, previous task to systemstreampartition mapping) which can be passed in through the interface method to generate new mapping. Any modifications to existing assignments should be done outside of TaskNameGrouper implementation. This will make any implementation as a pure function simply operating on the passed in data.

...