...

  • The common layer between the yarn and standalone models is the TaskNameGrouper abstraction (part of the JobModel generation phase), which will encapsulate the host-aware assignment of tasks to processors.
  • Deprecate the different flavors of existing TaskNameGrouper implementations (each of which primarily groups TaskModels into containers) and provide a single unified contract that is agnostic to, and supported in, the different deployment models (standalone/yarn).
  • Introduce a MetaDataStore abstraction that will be used to store and retrieve locality information for the different deployment models in the appropriate storage layer (Kafka will be used as the locality storage layer for yarn, and ZooKeeper will be used as the storage layer in standalone).
  • In the old model, only processor locality was used to generate task-to-processor assignments. In the new model, both the last reported task locality and the processor locality will be used when generating task-to-processor assignments, in both the yarn and standalone models.
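
To make the locality-store idea above concrete, here is a minimal sketch of what such an abstraction might look like. The names LocalityStore and InMemoryLocalityStore are hypothetical and are not taken from the proposal or from the Samza codebase; the in-memory map only stands in for a Kafka- or ZooKeeper-backed implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract: maps a task or processor identifier to the host it last ran on.
interface LocalityStore {
    void put(String key, String host);

    Optional<String> get(String key);

    Map<String, String> readAll();
}

// Assumption: an in-memory stand-in. A real deployment would back this
// contract with Kafka (yarn) or ZooKeeper (standalone).
final class InMemoryLocalityStore implements LocalityStore {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public void put(String key, String host) {
        entries.put(key, host);
    }

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public Map<String, String> readAll() {
        // Return a read-only snapshot so callers cannot mutate the store.
        return Collections.unmodifiableMap(new HashMap<>(entries));
    }
}
```

Keeping the contract this small means the assignment logic can read task and processor locality without knowing which storage layer sits behind it.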

If the leader of a stateful processor group generates an optimal task-to-processor assignment as part of the JobModel, each follower simply picks up its assignment from the JobModel after the rebalance phase and starts processing (as in non-stateful jobs). The goal is to guarantee an optimal assignment, that is, one that minimizes task movement between processors. Each processor persists the local state of its tasks in a directory (local.store.dir) provided through configuration.
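
One way the leader might compute such an assignment is sketched below. This is an illustrative greedy heuristic under stated assumptions, not the algorithm specified by this proposal: StickyTaskAssigner and its parameters are hypothetical names, and the per-processor capacity cap (tasks divided evenly, rounded up) is an assumption added to keep load balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of the TaskNameGrouper contract.
final class StickyTaskAssigner {

    /**
     * @param tasks             all task names in the job
     * @param taskLocality      task name -> host it last ran on (entries may be missing)
     * @param processorLocality live processor id -> host it currently runs on
     * @return processor id -> tasks assigned to it
     */
    static Map<String, List<String>> assign(List<String> tasks,
                                            Map<String, String> taskLocality,
                                            Map<String, String> processorLocality) {
        if (processorLocality.isEmpty()) {
            throw new IllegalArgumentException("no live processors");
        }
        // TreeMap gives a deterministic processor iteration order.
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String processorId : processorLocality.keySet()) {
            assignment.put(processorId, new ArrayList<>());
        }
        // Assumed balance constraint: ceil(tasks / processors) per processor.
        int capacity = (tasks.size() + assignment.size() - 1) / assignment.size();

        // Pass 1: keep a task on a processor that runs on the task's last host.
        List<String> unplaced = new ArrayList<>();
        for (String task : tasks) {
            String lastHost = taskLocality.get(task);
            String target = null;
            if (lastHost != null) {
                for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
                    boolean sameHost = lastHost.equals(processorLocality.get(e.getKey()));
                    if (sameHost && e.getValue().size() < capacity) {
                        target = e.getKey();
                        break;
                    }
                }
            }
            if (target == null) {
                unplaced.add(task);
            } else {
                assignment.get(target).add(task);
            }
        }

        // Pass 2: place the remaining tasks on the least-loaded processor.
        for (String task : unplaced) {
            String leastLoaded = null;
            for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
                if (leastLoaded == null
                        || e.getValue().size() < assignment.get(leastLoaded).size()) {
                    leastLoaded = e.getKey();
                }
            }
            assignment.get(leastLoaded).add(task);
        }
        return assignment;
    }
}
```

In this sketch only tasks whose last host has no live processor, or whose preferred processor is already full, are moved; every other task stays where its local state already resides.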

...