...

JobModel is the data model that logically represents a Samza job. In the JobModel hierarchy, a Samza job has one or more containers, and each container has one or more tasks. Each data model contains relevant information, such as a logical id, partition information, etc. In the standalone deployment model, the JobModel is stored in ZooKeeper; in the YARN deployment model, it is stored in the coordinator stream (a Kafka topic).
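To make the hierarchy concrete, here is a minimal Java sketch (records require Java 16 or later). The record names mirror the terms used above, but these are simplified, hypothetical types written for illustration only. They are not Samza's actual `JobModel`, `ContainerModel` and `TaskModel` classes and omit most of the information the real models carry.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the JobModel hierarchy (NOT Samza's real classes):
// a job has one or more containers, and each container has one or more tasks.
public class JobModelSketch {

    // A task: a logical name plus the input partitions it consumes.
    record TaskModel(String taskName, List<Integer> partitions) {}

    // A container: a logical id plus the tasks it runs, keyed by task name.
    record ContainerModel(String containerId, Map<String, TaskModel> tasks) {}

    // The job: all containers, keyed by container id.
    record JobModel(Map<String, ContainerModel> containers) {
        // Total number of tasks across all containers.
        int taskCount() {
            return containers.values().stream().mapToInt(c -> c.tasks().size()).sum();
        }
    }

    public static void main(String[] args) {
        TaskModel t0 = new TaskModel("Partition 0", List.of(0));
        TaskModel t1 = new TaskModel("Partition 1", List.of(1));
        TaskModel t2 = new TaskModel("Partition 2", List.of(2));
        JobModel job = new JobModel(Map.of(
            "0", new ContainerModel("0", Map.of(t0.taskName(), t0, t1.taskName(), t1)),
            "1", new ContainerModel("1", Map.of(t2.taskName(), t2))));
        // prints "2 containers, 3 tasks"
        System.out.println(job.containers().size() + " containers, " + job.taskCount() + " tasks");
    }
}
```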

...

  • The common layer between the YARN and standalone models is the TaskNameGrouper abstraction (part of the JobModel generation phase), which will encapsulate host-aware assignment of tasks to processors.
  • Deprecate the different flavors of existing TaskNameGrouper implementations (each of which primarily groups TaskModels into containers) and provide a single unified contract that is agnostic to, and supported in, the different deployment models (standalone/YARN).
  • Introduce a MetaDataStore abstraction, which will be used to store and retrieve locality information in the storage layer appropriate to each deployment model (Kafka for YARN and ZooKeeper for standalone).
  • Utilize both the last reported task locality (task to preferred host mapping) and the processor locality (processor to preferred host mapping) of a stream application when generating the ContainerModels in both YARN and standalone modes.
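The host-aware grouping described in these goals can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the proposed implementation. `LocalityStore` is a hypothetical stand-in for the MetaDataStore abstraction (backed by Kafka in YARN and ZooKeeper in standalone), and `group` is a hypothetical unified grouper. It first places each task on a processor running on the task's last reported host, then spreads the remaining tasks over the least-loaded processors, capping each processor at ceil(tasks / processors) to keep the assignment balanced.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class HostAwareGroupingSketch {

    // Hypothetical locality store standing in for the MetaDataStore abstraction.
    interface LocalityStore {
        Optional<String> get(String key);
        void put(String key, String value);
    }

    // In-memory implementation, for illustration only.
    static class InMemoryLocalityStore implements LocalityStore {
        private final Map<String, String> data = new HashMap<>();
        public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
        public void put(String key, String value) { data.put(key, value); }
    }

    // Hypothetical unified grouper: returns processor id -> assigned task names.
    // processorToHost maps each live processor to its host; taskLocality maps
    // each task name to the host it last reported.
    static Map<String, List<String>> group(List<String> taskNames,
                                           Map<String, String> processorToHost,
                                           LocalityStore taskLocality) {
        Map<String, List<String>> assignment = new TreeMap<>(); // sorted for deterministic ties
        processorToHost.keySet().forEach(p -> assignment.put(p, new ArrayList<>()));
        if (assignment.isEmpty()) throw new IllegalArgumentException("no processors");
        // Per-processor cap, ceil(tasks / processors), keeps the assignment balanced.
        int capacity = (taskNames.size() + assignment.size() - 1) / assignment.size();
        List<String> unassigned = new ArrayList<>();

        // Pass 1: put each task on the least-loaded processor on its last reported host.
        for (String task : taskNames) {
            Optional<String> host = taskLocality.get(task);
            Optional<String> target = assignment.keySet().stream()
                .filter(p -> host.isPresent() && host.get().equals(processorToHost.get(p)))
                .filter(p -> assignment.get(p).size() < capacity)
                .min(Comparator.comparingInt(p -> assignment.get(p).size()));
            if (target.isPresent()) {
                assignment.get(target.get()).add(task);
            } else {
                unassigned.add(task);
            }
        }
        // Pass 2: tasks without usable locality go to the least-loaded processor.
        for (String task : unassigned) {
            String p = assignment.keySet().stream()
                .min(Comparator.comparingInt(q -> assignment.get(q).size()))
                .get();
            assignment.get(p).add(task);
        }
        return assignment;
    }

    public static void main(String[] args) {
        LocalityStore store = new InMemoryLocalityStore();
        store.put("Partition 0", "host-a");
        store.put("Partition 1", "host-b");
        store.put("Partition 2", "host-a");
        // "Partition 3" has no recorded locality.
        Map<String, String> processors = Map.of("p1", "host-a", "p2", "host-b");
        // prints "{p1=[Partition 0, Partition 2], p2=[Partition 1, Partition 3]}"
        System.out.println(group(
            List.of("Partition 0", "Partition 1", "Partition 2", "Partition 3"), processors, store));
    }
}
```

In the example, tasks with recorded locality stay on processors running on their previous hosts, while `Partition 3`, which has no recorded host, falls through to the least-loaded processor. For brevity, the sketch does not write the new assignment back to the locality store.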

...