...

  • This system does not intend to solve the problem of dynamic workload balancing (i.e., Cruise Control); it may act as a building block for one later.
  • Solving the canary problem for the YARN-based deployment model is out of the scope of this solution; however, the system built should be easily extensible to support canary deployments.
  • This system will not have built-in intelligence to find a better host match for a container; it will make simple decisions based on the parameters passed by the user.

...

Metastore today (Kafka) is at-least-once & eventually consistent, hence ContainerPlacementService has to do in-memory caching of the UUIDs of accepted actions so that it does not process a request twice when duplicates are delivered. This in-memory cache must not be unbounded, since that could cause the job to run out of memory. A UUID is 16 bytes; if a job has at most, say, 500 containers, then one request action per container for 500 containers results in only 0.008 MB of additional memory (just the in-memory lookup). If we cache, say, the last 20K actions (which can accommodate 40 failovers of 500 containers in this scenario), the memory used will be at most 0.64 MB if we implement a FIFO cache (in-memory lookup + FIFO queue); a minimal sketch of such a cache is shown below.
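
The following is a minimal sketch of such a bounded de-duplication cache, assuming a plain HashSet for lookups plus an ArrayDeque for FIFO eviction; the class name and the 20,000-entry capacity are illustrative assumptions, not the actual Samza implementation.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.UUID;

/**
 * Sketch of a bounded FIFO de-duplication cache for accepted placement action UUIDs.
 * With a capacity of 20,000 entries and 16-byte UUIDs, memory stays in the ~0.64 MB
 * range estimated above (lookup set + FIFO queue).
 */
public class BoundedRequestDedupCache {
  private final int maxEntries;
  private final Set<UUID> seen = new HashSet<>();      // O(1) duplicate lookup
  private final Queue<UUID> fifo = new ArrayDeque<>(); // eviction order

  public BoundedRequestDedupCache(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  /** Returns true if the action is new and should be processed, false if it is a duplicate. */
  public synchronized boolean markIfNew(UUID actionUuid) {
    if (seen.contains(actionUuid)) {
      return false; // duplicate delivery from the at-least-once metastore
    }
    if (fifo.size() == maxEntries) {
      seen.remove(fifo.poll()); // evict the oldest UUID to keep memory bounded
    }
    fifo.add(actionUuid);
    seen.add(actionUuid);
    return true;
  }
}
```

One markIfNew call per received placement action is enough: any duplicate delivered later returns false and is dropped.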

...

  • Remove the HostAwareContainerAllocator & ContainerAllocator, and simplify the ContainerAllocator into a lightweight entity that allocates requests to available resources (PR1, PR2)
  • Introduce ContainerManager, which acts as the brain for validating and issuing any actions on containers in the Job Coordinator for both active & standby containers (PR)
    • Transfer state & validation of container launches & expired request handling from ContainerAllocator to ContainerManager
    • Transfer state & lifecycle management of the ContainerAllocator & resource requests on job start (reading the locality mapping) from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager*
  • Encapsulate logic and state related to container placement actions, like moves and restarts for active & standby containers, in ContainerManager (PR-1, TBD)
    • It is ContainerManager’s duty to validate any ContainerPlacementRequestMessages and to invalidate messages from a previous deployment incarnation (see the sketch after this list)
    • It is ContainerManager’s duty to write ContainerPlacementResponseMessages to the metastore so that the external controller can query the status of the request
    • ContainerPlacementMetadata is a metadata holder for container actions (ControlActionMetadata), e.g., request_id, current status, requested resources, etc.
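
To make the validation and response flow concrete, below is a simplified, hypothetical sketch of how ContainerManager might handle an incoming placement request; apart from the message names above (ContainerPlacementRequestMessage, ContainerPlacementResponseMessage), all class, field, and method names here are illustrative assumptions rather than the actual Samza API.

```java
import java.util.UUID;

/** Hypothetical sketch of the request validation & response flow owned by ContainerManager. */
class ContainerPlacementFlowSketch {

  enum Status { ACCEPTED, BAD_REQUEST, IN_PROGRESS, SUCCEEDED, FAILED }

  static class PlacementRequest {
    UUID uuid;              // unique id of the placement action
    String deploymentId;    // deployment incarnation that issued the request
    String processorId;     // logical container id
    String destinationHost; // requested destination host or ANY_HOST
  }

  interface ResponseWriter {
    // Stand-in for writing a ContainerPlacementResponseMessage to the metastore.
    void write(UUID uuid, Status status, String message);
  }

  private final String currentDeploymentId;
  private final ResponseWriter responseWriter;

  ContainerPlacementFlowSketch(String currentDeploymentId, ResponseWriter responseWriter) {
    this.currentDeploymentId = currentDeploymentId;
    this.responseWriter = responseWriter;
  }

  /** Validates the request, rejecting messages issued against a previous deployment incarnation. */
  void handle(PlacementRequest request) {
    if (!currentDeploymentId.equals(request.deploymentId)) {
      responseWriter.write(request.uuid, Status.BAD_REQUEST,
          "Request was issued against a previous deployment incarnation");
      return;
    }
    responseWriter.write(request.uuid, Status.ACCEPTED,
        "Requesting resources on " + request.destinationHost);
    // ... issue a resource request to the ClusterResourceManager and update the
    //     response to IN_PROGRESS / SUCCEEDED / FAILED as the action proceeds.
  }
}
```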


Note:  *ClusterResourceManager.Callback (ContainerProcessManager) is tightly coupled with ClusterBasedJobCoordinator today. All of the proposed changes will be made in phase 1 of the implementation except for moving the state & lifecycle management of the ContainerAllocator & resource requests on job start (reading the locality mapping) from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager, so that this feature can be developed faster. Hence ContainerProcessManager will still be tied to ClusterBasedJobCoordinator and will intercept any container placement requests.

...

  • If the preferred resources cannot be acquired, the active container is never stopped and a failure notification is sent for the ContainerPlacementRequest
  • If the ContainerPlacementManager is not able to stop the active container (3.1 #1 above fails), the request is marked failed & a failure notification is sent for the ContainerPlacementRequest
  • If the ClusterResourceManager fails to start the stopped active container on the acquired destination host, then we attempt to start the container back on the source host and a failure notification is sent for the ContainerPlacementRequest. If the container also fails to start on the source host, an attempt is made to start it on ANY_HOST (see the sketch after this list)
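
A hypothetical sketch of this fallback order (destination host, then source host, then ANY_HOST) is shown below; the launcher interface and method names are illustrative assumptions, not the actual ClusterResourceManager API.

```java
/** Sketch of the launch fallback order used when moving an active container. */
class ContainerMoveFallbackSketch {

  static final String ANY_HOST = "ANY_HOST";

  interface ContainerLauncher {
    // Stand-in for asking the ClusterResourceManager to launch the container on a host.
    boolean launch(String processorId, String host);
  }

  /** Returns the host the container ended up on, or null if every attempt failed. */
  static String moveContainer(ContainerLauncher launcher, String processorId,
      String sourceHost, String destinationHost) {
    if (launcher.launch(processorId, destinationHost)) {
      return destinationHost; // move succeeded on the acquired destination host
    }
    // Destination launch failed: the placement request is marked failed and we
    // try to restore the container on its source host.
    if (launcher.launch(processorId, sourceHost)) {
      return sourceHost;
    }
    // Source host also failed: fall back to ANY_HOST as a last resort.
    if (launcher.launch(processorId, ANY_HOST)) {
      return ANY_HOST;
    }
    return null;
  }
}
```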

...

Note: ClusterResourceManager.Callback (ContainerProcessManager) is tightly coupled with ClusterBasedJobCoordinator today. All of the proposed changes (in part 2) will be made in phase 1 of the implementation except for moving the state & lifecycle management of the ContainerAllocator & resource requests on job start (reading the locality mapping) from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager, so that this feature can be developed faster. Hence ContainerProcessManager will still be tied to ClusterBasedJobCoordinator and will intercept any container placement requests.

...