Status

Current state: Accepted

Discussion thread: http://mail-archives.apache.org/mod_mbox/samza-dev/201802.mbox/%3CCAFvExu1GHnphidP_wRriMey-T7Hss4AqAxscOoBFUHuMR5sq%3DQ%40mail.gmail.com%3E

...

  1. Support stateful stream processing in standalone stream applications.
  2. Minimize partition movements amongst stateful processors in the rebalance phase.
  3. Existing generators discard the task to physical host assignment when generating the JobModel and use only the container to processor to preferred host assignment. However, for standalone it is essential to consider this detail (the task to physical host assignment) between successive JobModel generations in order to generate an optimal task to processor assignment. For instance, assume stream processors P1 and P2 run on host H1, and processors P3 and P4 run on host H3. If P1 dies, it is optimal to assign some of the tasks processed by P1 to P2. This cannot be achieved unless the previous task to physical host assignment is taken into account when generating the JobModel.
  4. In an ideal world, any TaskNameGrouper should be usable interchangeably between the yarn and standalone deployment models. Currently, only a subset of the TaskNameGroupers usable in yarn are supported in standalone.

...

  • JobModel generation phase: The ApplicationMaster (JobCoordinator) in the yarn deployment model generates the JobModel (processor container to task assignment) for the samza job.
  • ContainerAllocator phase: This happens after the JobModel generation phase. It schedules each processor container to run on a physical host by coordinating with the underlying ClusterManager, and orchestrates the execution of the processor.

The notable differences in processor and JobModel generation semantics between the yarn and standalone deployment models are:

  • The number of containers is a static configuration in the yarn deployment model, and a job restart is required to change it. In standalone, however, adding a processor to a processor group or removing one from it is common and expected behavior.
  • In yarn, a processor container is assigned a physical host by the ContainerAllocator after the JobModel generation phase. In standalone, the physical host on which a stream processor is going to run is known before the JobModel generation phase (the ContainerAllocator phase is not needed in standalone to associate the processor with the physical host).

...

  • Deprecate the existing flavors of the TaskNameGrouper implementations (each of which primarily groups TaskModels into containers) and provide a single unified contract. The common layer between the yarn and standalone models is the TaskNameGrouper abstraction (part of the JobModel generation phase), which will encapsulate the host-aware assignment of tasks to processors. In the existing implementation, only the processor locality is used to generate the task to processor assignments. In the new model, both the last reported task locality and the processor locality of a stream application will be used when generating task to processor assignments in both the yarn and standalone models.
  • Introduce a MetadataStore abstraction to store and retrieve processor and task locality for the different deployment models in appropriate storage layers. Kafka will be used as the locality storage layer for yarn, and zookeeper will be used as the storage layer for standalone.
  • Introduce a new abstraction, LocationIdProvider, to generate the locationId for a physical execution environment. All the processors of an application registered from a locationId should be able to share (read/write) their local state stores: any store created by a processor running from a locationId should be readable and writable by the other processors running from the same locationId. Any custom LocationIdProvider is expected to honor this contract when generating the locationId. Here are a few reasons for introducing a new abstraction to generate the locationId rather than using the processorId as the locationId:

...

// '+' denotes addition, '-' denotes deletion.
public interface TaskNameGrouper {
  + @Deprecated
  Set<ContainerModel> group(Set<TaskModel> tasks);

  + @Deprecated
  default Set<ContainerModel> group(Set<TaskModel> tasks, List<String> containersIds) {
    return group(tasks);
  }
  /**
   * @param taskModels the taskModels generated by the SSPGrouper.
   * @param taskLocality the taskName to locationId mapping of the previous generation.
   * @param processorLocality the processorId to locationId mapping.
   * @return the containerModels generated.
   */  
  + Set<ContainerModel> group(Set<TaskModel> taskModels, Map<String, String> taskLocality, Map<String, String> processorLocality);
}

+ @Deprecated
public interface BalancingTaskNameGrouper extends TaskNameGrouper {
  + @Deprecated 
  Set<ContainerModel> balance(Set<TaskModel> tasks, LocalityManager localityManager);
}

public class ContainerModel {
  - @Deprecated
  - private final int containerId;
  private final String processorId;
  private final Map<TaskName, TaskModel> tasks;
  + // New field added denoting the physical locationId.
  + private final String locationId;
}

+ public interface LocationIdProvider {
  + // In containerized environments, the locationId is a combination of multiple fields (sliceId, containerId, hostname) instead of a simple physical hostname.
  + // It is provided by the execution environment of the processor.
  + String getLocationId();
}


+ public interface MetadataStore {
  + // Gets the value associated with the specified {@code key}.
  + byte[] get(byte[] key);

  + // Updates the mapping of the specified key-value pair; associates the specified {@code key} with the specified {@code value}.
  + void put(byte[] key, byte[] value);

  + // Deletes the mapping for the specified {@code key} from this store (if such a mapping exists).
  + void remove(byte[] key);
}
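
To make the byte-array contract of MetadataStore concrete, here is a minimal in-memory sketch. It is not the Kafka or zookeeper backed implementation the proposal refers to; the class names InMemoryMetadataStore and MetadataStoreDemo, the "task-0" key, and the UTF-8 encoding of keys and values are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Mirrors the MetadataStore contract proposed above.
interface MetadataStore {
  byte[] get(byte[] key);
  void put(byte[] key, byte[] value);
  void remove(byte[] key);
}

// Hypothetical in-memory implementation, intended for illustration and tests only.
final class InMemoryMetadataStore implements MetadataStore {
  // byte[] uses identity equality, so keys are wrapped in ByteBuffer,
  // which implements equals/hashCode over the buffer content.
  private final Map<ByteBuffer, byte[]> entries = new ConcurrentHashMap<>();

  @Override
  public byte[] get(byte[] key) {
    byte[] value = entries.get(ByteBuffer.wrap(key));
    // Return a copy so callers cannot mutate the stored value.
    return value == null ? null : value.clone();
  }

  @Override
  public void put(byte[] key, byte[] value) {
    // Copy both arrays so later mutation by the caller does not affect the store.
    entries.put(ByteBuffer.wrap(key.clone()), value.clone());
  }

  @Override
  public void remove(byte[] key) {
    entries.remove(ByteBuffer.wrap(key));
  }
}

final class MetadataStoreDemo {
  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    MetadataStore store = new InMemoryMetadataStore();
    // Assumed encoding: taskName -> locationId, both as UTF-8 strings.
    store.put(utf8("task-0"), utf8("H1"));
    System.out.println(new String(store.get(utf8("task-0")), StandardCharsets.UTF_8));
    store.remove(utf8("task-0"));
    System.out.println(store.get(utf8("task-0")) == null);
  }
}
```

Keeping the interface at the level of raw bytes leaves serialization of the locality mappings to the caller, which is presumably what allows the same contract to sit on top of different storage layers.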

The locationId reported by the live processors of the group and the last reported task locality will be used to calculate the task to container assignment in standalone. In the case of yarn, the preferred host mapping will be used for task and processor locality. Any new task or processor for which the grouping is unknown (unavailable in the preferred host/task-locality mapping in the underlying storage layer) will be treated as any_host during assignment.
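
The assignment described above can be sketched as follows. This is an illustrative simplification and not the grouper shipped with samza: it works on plain task names instead of TaskModel and ContainerModel, the class name LocalityAwareGrouperSketch and the ANY_HOST constant are hypothetical, and it makes no attempt to cap the load on any single processor. Each task goes to the least loaded live processor whose locationId matches the task's last reported locationId; tasks with unknown locality, or whose location has no live processor, go to the least loaded processor overall.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalityAwareGrouperSketch {
  // Hypothetical marker for tasks without a known previous location.
  static final String ANY_HOST = "ANY_HOST";

  /**
   * @param taskNames names of the tasks to assign.
   * @param taskLocality taskName to locationId mapping of the previous generation.
   * @param processorLocality processorId to locationId mapping of the live processors.
   * @return processorId to assigned task names.
   */
  static Map<String, List<String>> group(List<String> taskNames,
                                         Map<String, String> taskLocality,
                                         Map<String, String> processorLocality) {
    if (processorLocality.isEmpty()) {
      throw new IllegalArgumentException("At least one live processor is required");
    }
    // A sorted map keeps the result deterministic across invocations.
    Map<String, List<String>> assignment = new TreeMap<>();
    for (String processorId : processorLocality.keySet()) {
      assignment.put(processorId, new ArrayList<>());
    }
    for (String task : taskNames) {
      String preferredLocation = taskLocality.getOrDefault(task, ANY_HOST);
      String sameLocation = null; // least loaded processor on the preferred location
      String leastLoaded = null;  // least loaded processor overall
      for (String processorId : assignment.keySet()) {
        int load = assignment.get(processorId).size();
        if (leastLoaded == null || load < assignment.get(leastLoaded).size()) {
          leastLoaded = processorId;
        }
        boolean matches = preferredLocation.equals(processorLocality.get(processorId));
        if (matches && (sameLocation == null || load < assignment.get(sameLocation).size())) {
          sameLocation = processorId;
        }
      }
      assignment.get(sameLocation != null ? sameLocation : leastLoaded).add(task);
    }
    return assignment;
  }

  public static void main(String[] args) {
    // P1 (on H1) has died; P2 remains on H1, while P3 and P4 run on H3.
    Map<String, String> processorLocality = new HashMap<>();
    processorLocality.put("P2", "H1");
    processorLocality.put("P3", "H3");
    processorLocality.put("P4", "H3");

    Map<String, String> taskLocality = new HashMap<>();
    taskLocality.put("t0", "H1");
    taskLocality.put("t1", "H1");
    taskLocality.put("t2", "H3");
    taskLocality.put("t3", "H3");
    // t4 is a new task with no recorded locality.

    System.out.println(group(Arrays.asList("t0", "t1", "t2", "t3", "t4"),
        taskLocality, processorLocality));
  }
}
```

In this scenario the tasks last seen on H1 stay with P2, the surviving processor on H1, so their local state can be reused, and the task without recorded locality is placed on whichever processor currently has the fewest tasks.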

...

  • Modify the existing interfaces and classes as per the proposed solution.

  • Add unit tests to test and validate compatibility and functional correctness. 

  • Add integration tests in the samza standalone samples to verify the host affinity feature.

  • Add an integration test to verify that there are minimal partition movements during rolling upgrade.

  • Verify compatibility: Jackson, a Java serialization/deserialization library, is used to convert data model objects in samza into JSON and back. After removing the containerId field from ContainerModel, it should be verified that deserialization of old ContainerModel data with the new ContainerModel spec works.

  • Some TaskNameGrouper implementations assume the comparability of the integer containerId present in ContainerModel (for instance, GroupByContainerCount). Modify the existing TaskNameGrouper implementations to take in a collection of string processorIds, as opposed to assuming that the containerId is an integer that lies within the [0, N-1] interval (without incurring any change in functionality).
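
As a rough illustration of the last item, the sketch below distributes tasks over a collection of string processorIds without assuming the ids are integers in [0, N-1]. It is a generic round-robin over lexicographically sorted ids and is not taken from GroupByContainerCount; the class name ProcessorIdRoundRobinSketch and the use of plain task names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class ProcessorIdRoundRobinSketch {
  /** Assigns sorted tasks to sorted processorIds in round-robin order. */
  static Map<String, List<String>> group(Collection<String> taskNames,
                                         Collection<String> processorIds) {
    if (processorIds.isEmpty()) {
      throw new IllegalArgumentException("At least one processorId is required");
    }
    // Sorting the ids gives a stable order without requiring them to be integers.
    List<String> sortedProcessors = new ArrayList<>(new TreeSet<>(processorIds));
    List<String> sortedTasks = new ArrayList<>(taskNames);
    Collections.sort(sortedTasks);

    Map<String, List<String>> assignment = new LinkedHashMap<>();
    for (String processorId : sortedProcessors) {
      assignment.put(processorId, new ArrayList<>());
    }
    for (int i = 0; i < sortedTasks.size(); i++) {
      // The position in the sorted list plays the role the integer containerId used to play.
      String processorId = sortedProcessors.get(i % sortedProcessors.size());
      assignment.get(processorId).add(sortedTasks.get(i));
    }
    return assignment;
  }

  public static void main(String[] args) {
    System.out.println(group(
        Arrays.asList("t0", "t1", "t2", "t3", "t4"),
        Arrays.asList("processor-b", "processor-a")));
  }
}
```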

...