Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Deprecate the different existing flavors of the TaskNameGrouper implementations(each one of them primarily grouping TaskModel into containers) and provide a single unified contract. The common layer between yarn and standalone model is the TaskNameGrouper abstraction(which is part of JobModel generation phase) which will encapsulate the host aware task assignment to processors. In the existing implementation, only the processor locality is used to generate the task to processor assignments. In the new model, both the last reported task locality and processor locality of a stream application will be used when generating task to processor assignments in both the yarn and standalone models.
  • Introduction of MetaDataStore abstraction to store and retrieve processor and task locality for different deployment models in appropriate storage layers. Kafka be will be used as locality storage layer for yarn and zookeeper will be used as storage layer for standalone.
  • A new abstraction LocationIdProvider is introduced as a part of this change to generate locationId for a physical execution environment. All the processors of an application registered from an locationID should be able to share(read/write) their local state stores. Any store created by a processor running from a locationId is should be readable/writable by other processors running from the same locationId. Any custom LocationIdProvider is expected to honor this contract when generating the locationID. Here’re few reasons for introducing a new abstraction to generate locationId rather than using processorID as locationId.

...