...

  • Expose a REST endpoint (e.g. /isContainerValid) whose purpose is to receive periodic requests from the Samza container and respond whether the container is in the Job Coordinator's current list of valid containers.
  • The endpoint could be part of the JobModelManager's servlet, which containers currently use to retrieve the JobModel during startup.
  • The endpoint can accept an "Execution Resource ID" (e.g. a YARN container ID) and validate it against the state maintained by the Job Coordinator (e.g. YarnAppState). Future implementations for other cluster managers need to implement this endpoint and expose the same validation.
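
The validation described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: ValidContainerRegistry is a hypothetical stand-in for the coordinator state (e.g. YarnAppState), ContainerValidityHandler is a hypothetical name for the logic behind /isContainerValid, and the JSON response shape is assumed rather than taken from the proposal. The HTTP and servlet wiring is deliberately left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the state the Job Coordinator maintains (e.g. YarnAppState). */
class ValidContainerRegistry {
    // Thread-safe set: the cluster manager callbacks and the servlet may touch it concurrently.
    private final Set<String> validResourceIds = ConcurrentHashMap.newKeySet();

    void markValid(String executionResourceId) {
        validResourceIds.add(executionResourceId);
    }

    void markInvalid(String executionResourceId) {
        validResourceIds.remove(executionResourceId);
    }

    boolean isValid(String executionResourceId) {
        // A missing ID is treated as invalid (an assumption of this sketch).
        return executionResourceId != null && validResourceIds.contains(executionResourceId);
    }
}

/** Hypothetical handler logic behind the /isContainerValid endpoint. */
class ContainerValidityHandler {
    private final ValidContainerRegistry registry;

    ContainerValidityHandler(ValidContainerRegistry registry) {
        this.registry = registry;
    }

    /** Builds the response body for a request carrying the given Execution Resource ID. */
    String handle(String executionResourceId) {
        // Assumed response format; the proposal does not specify one.
        return "{\"isValid\": " + registry.isValid(executionResourceId) + "}";
    }
}
```

Keeping the lookup behind a small registry interface is one way to let other cluster managers supply their own state while exposing the same validation.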

 

Container side

...

  • On the container side, we start a new thread that periodically polls the endpoint described above to check whether the container is valid. If it is not, we shut down the run loop and raise an error, so that the exit code is non-zero and YARN reschedules the container.
  • Raising an exception may not be the ideal way to shut down the container, since it skips all the shutdown steps in the finally block. It may be more useful to set a member variable in SamzaContainer to denote that an exception was raised. When shutting down in the LocalContainerRunner, we can check this variable and exit with a non-zero exit code.
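
The container-side behaviour described above might look something like the sketch below. All names here (ContainerHeartbeatMonitor, validityCheck, onInvalid, exitCode) are hypothetical and not part of Samza; the validity check is abstracted as a BooleanSupplier that would, in practice, issue the HTTP request to the coordinator endpoint. Treating a failed check call as "still valid" is an assumption of this sketch, not something the proposal specifies.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical poller that periodically asks the coordinator whether this container is valid. */
class ContainerHeartbeatMonitor {
    private final BooleanSupplier validityCheck; // e.g. wraps a request to the coordinator endpoint
    private final Runnable onInvalid;            // e.g. asks the run loop to shut down cleanly
    private final long periodMs;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "container-heartbeat-monitor");
            t.setDaemon(true); // must not keep the JVM alive on its own
            return t;
        });

    // Flag checked at shutdown instead of throwing from the polling thread.
    private volatile boolean containerInvalid = false;

    ContainerHeartbeatMonitor(BooleanSupplier validityCheck, Runnable onInvalid, long periodMs) {
        this.validityCheck = validityCheck;
        this.onInvalid = onInvalid;
        this.periodMs = periodMs;
    }

    void start() {
        scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Runs a single validity check; package-visible so it can be exercised directly. */
    void checkOnce() {
        if (containerInvalid) {
            return; // shutdown was already requested
        }
        try {
            if (!validityCheck.getAsBoolean()) {
                containerInvalid = true;
                onInvalid.run();
            }
        } catch (RuntimeException e) {
            // Assumption: a failed check (e.g. coordinator unreachable) does not invalidate
            // the container. Swallowing also keeps the scheduled task from being cancelled.
        }
    }

    void stop() {
        scheduler.shutdownNow();
    }

    /** Exit code the runner could use once the run loop has finished its normal shutdown. */
    int exitCode() {
        return containerInvalid ? 1 : 0;
    }
}
```

With this shape, the run loop still goes through its finally block, and the runner only decides on the non-zero exit code afterwards by reading the flag.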

...