...

  • Expose a REST endpoint (eg: /containerHeartbeat) whose purpose is to receive periodic requests from the Samza container and respond with whether the container is in the Job Coordinator's current list of valid containers.

    $ curl <host>:<port>/containerHeartbeat?executionContainerId=container_1490224420978_0323_01_000282
    {
    	alive: true
    }
  • Endpoint could be a part of the JobModelManager's servlet which is currently used for retrieving the JobModel by the containers during startup.
  • Endpoint can accept an "executionContainerId" (eg: YARN container ID) and validate it against state maintained by the Job Coordinator (eg: YarnAppState). Future implementations for other cluster managers need to implement this endpoint and expose the same validation.
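
The validation described above can be sketched roughly as follows. This is a minimal illustration, not the actual servlet: the class ContainerHeartbeatHandler, its methods, and the in-memory set standing in for the Job Coordinator's state (eg: YarnAppState) are all hypothetical names chosen for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an in-memory set stands in for the Job Coordinator's
// state (eg: YarnAppState). None of these names come from Samza itself.
class ContainerHeartbeatHandler {
    private final Set<String> validContainerIds = ConcurrentHashMap.newKeySet();

    // Called when the cluster manager launches a container for this job.
    void containerStarted(String executionContainerId) {
        validContainerIds.add(executionContainerId);
    }

    // Called when a container is released or replaced.
    void containerStopped(String executionContainerId) {
        validContainerIds.remove(executionContainerId);
    }

    // True only if the ID is in the current list of valid containers.
    boolean isAlive(String executionContainerId) {
        return executionContainerId != null
            && validContainerIds.contains(executionContainerId);
    }

    // Builds the response body for /containerHeartbeat?executionContainerId=<id>.
    String respond(String executionContainerId) {
        return "{\"alive\": " + isAlive(executionContainerId) + "}";
    }
}
```

A missing or unknown ID yields alive: false in this sketch, so a container that the Job Coordinator no longer tracks is told to stop.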

Container side

  • In the LocalContainerRunner we can start a monitor that periodically polls the above endpoint to check if the container is valid. If it is not, we shut down the run loop and raise an error (so that the exit code is non-zero and YARN reschedules the container).
    The plan is to set up a monitor in the LocalContainerRunner class that schedules a thread to check the above endpoint at regular intervals. On failure, the thread modifies state on the LocalContainerRunner to denote that there was an error. This state is checked during exit in the LocalContainerRunner to exit with a non-zero code.

  • A new ContainerHeartbeatMonitor class that accepts a ContainerHeartbeatClient (which implements the business logic to make heartbeat checks on the JC endpoint) and a callback.
    The ContainerHeartbeatMonitor schedules a separate thread at a fixed rate which uses the client to check if the heartbeat is still valid. On failure of the heartbeat, the passed callback is executed. The callback shuts down the container and sets state on LocalContainerRunner so that the main thread exits with a non-zero code.
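
One possible shape for the monitor and the runner-side state is sketched below. It is only an illustration under stated assumptions: HeartbeatMonitorSketch and RunnerStateSketch are hypothetical names, a BooleanSupplier stands in for ContainerHeartbeatClient, and treating a failed request as an expired heartbeat is an assumption rather than something the design specifies.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; the BooleanSupplier stands in for ContainerHeartbeatClient.
class HeartbeatMonitorSketch {
    private final BooleanSupplier heartbeatClient; // true while the JC still lists this container
    private final Runnable onExpired;              // callback run once the heartbeat is invalid
    private final long periodMs;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "container-heartbeat-monitor");
            t.setDaemon(true); // the monitor must not keep the JVM alive
            return t;
        });

    HeartbeatMonitorSketch(BooleanSupplier heartbeatClient, Runnable onExpired, long periodMs) {
        this.heartbeatClient = heartbeatClient;
        this.onExpired = onExpired;
        this.periodMs = periodMs;
    }

    // Schedules the heartbeat check on a separate thread at a fixed rate.
    void start() {
        scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    // One heartbeat check; on failure runs the callback and stops further checks.
    void checkOnce() {
        boolean alive;
        try {
            alive = heartbeatClient.getAsBoolean();
        } catch (RuntimeException e) {
            alive = false; // assumption: a failed request counts as an expired heartbeat
        }
        if (!alive) {
            try {
                onExpired.run();
            } finally {
                scheduler.shutdown(); // cancels the periodic task
            }
        }
    }

    void stop() {
        scheduler.shutdownNow();
    }
}

// Hypothetical stand-in for the state kept on LocalContainerRunner.
class RunnerStateSketch {
    private volatile boolean heartbeatExpired = false;

    void markHeartbeatExpired() {
        heartbeatExpired = true;
    }

    // Non-zero so that the cluster manager (eg: YARN) reschedules the container.
    int exitCode() {
        return heartbeatExpired ? 1 : 0;
    }
}
```

In this sketch the runner would pass state::markHeartbeatExpired (plus whatever stops the run loop) as the callback and return state.exitCode() when the main thread exits.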

Public Interfaces

  • Set an environment variable "EXECUTION_ENV_CONTAINER_ID" (eg: YARN container ID) during container launch. This can be read from the container to make requests to the above endpoint.
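
As a rough illustration of how the container might use this variable, the sketch below reads the ID and builds the heartbeat request URL. The class HeartbeatRequestSketch, its method names, and the coordinator URL argument are hypothetical; only the variable name, the /containerHeartbeat path, and the executionContainerId parameter come from the text above. It assumes Java 10 or later for the URLEncoder overload used.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch of the container-side lookup; not Samza's actual code.
class HeartbeatRequestSketch {
    static final String ENV_CONTAINER_ID = "EXECUTION_ENV_CONTAINER_ID";

    // Reads the execution container ID from the given environment map.
    static String containerId(Map<String, String> env) {
        String id = env.get(ENV_CONTAINER_ID);
        if (id == null || id.isEmpty()) {
            throw new IllegalStateException(ENV_CONTAINER_ID + " is not set");
        }
        return id;
    }

    // Builds the heartbeat URL against the Job Coordinator's base URL.
    static String heartbeatUrl(String coordinatorUrl, String executionContainerId) {
        String base = coordinatorUrl.endsWith("/") ? coordinatorUrl : coordinatorUrl + "/";
        // URLEncoder.encode(String, Charset) requires Java 10+.
        return base + "containerHeartbeat?executionContainerId="
            + URLEncoder.encode(executionContainerId, StandardCharsets.UTF_8);
    }
}
```

At container startup this could be called as HeartbeatRequestSketch.containerId(System.getenv()), with the result passed to heartbeatUrl along with the coordinator address.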

Implementation and Test Plan

...