...

Samza Yarn follows a multi-stage deployment model, where Job Runner, which runs on the submission host, reads configuration, performs planning and persists the config in the coordinator stream before submitting the job to the Yarn cluster. In Yarn, the Application Master (AM) reads the config from the coordinator stream before spinning up containers to execute. This split of responsibility between Job Runner and AM is operationally confusing, and it makes debugging the pipeline difficult because there are multiple points of failure. In addition, since planning invokes user code, it usually requires isolation on the runner from a security perspective to guard the framework from malicious user code; otherwise a malicious user can gain access to other users' jobs running on the same runner. Furthermore, the config file is already packed in the tarball submitted to Yarn, so it could be easier for AM to pick up the config locally.

Proposed Changes

We will provide a pluggable config retrieval interface on AM. When it is used, Job Runner will simply submit the job to Yarn without involving any complex logic. AM, on the other hand, will read the job config using the provided config loader, perform planning, generate the DAG and persist the final config back to the coordinator stream.
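The division of labor described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposed API: the names `ConfigLoader`, `MapConfigLoader` and `ApplicationMasterSketch` are hypothetical, the coordinator stream is stood in for by an in-memory map, and planning is reduced to adding a single placeholder key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pluggable interface: the AM asks it for the job config.
interface ConfigLoader {
    Map<String, String> getConfig();
}

// Hypothetical loader that returns a fixed map. A real loader might instead
// read the config file that is packed in the tarball submitted to Yarn.
class MapConfigLoader implements ConfigLoader {
    private final Map<String, String> config;

    MapConfigLoader(Map<String, String> config) {
        this.config = new HashMap<>(config);
    }

    @Override
    public Map<String, String> getConfig() {
        // Return a copy so callers cannot mutate the loader's state.
        return new HashMap<>(config);
    }
}

// Hypothetical stand-in for the AM side of the workflow.
class ApplicationMasterSketch {
    // In-memory substitute for the coordinator stream (assumption).
    final Map<String, String> coordinatorStream = new HashMap<>();

    Map<String, String> start(ConfigLoader loader) {
        // 1. Read the job config through the provided loader.
        Map<String, String> config = loader.getConfig();
        // 2. Placeholder for planning and DAG generation (assumption).
        config.put("sketch.planned", "true");
        // 3. Persist the final config back to the coordinator stream.
        coordinatorStream.putAll(config);
        return config;
    }
}
```

In this sketch the submission side does nothing beyond handing the job to the cluster; all config reading and planning happens inside `start`.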

Public Interfaces

We will introduce two job configs to configure the job to use the alternative workflow:

  • job.config.loader.class
  • job.config.loader.properties

The changes are fully backward compatible. To use the new workflow, simply supply "job.config.loader.class" and "job.config.loader.properties". For example, in the Hello Samza example, the application will be invoked by

...

Implementation and Test Plan

JobConfig

We will add two new configs in JobConfig to control whether AM reads the config from ConfigLoader instead of the coordinator stream.
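One way such a switch could look is sketched below. The class `JobConfigSketch` and its method names are hypothetical and are not taken from the Samza code base; the only name from this proposal is the key "job.config.loader.class". The sketch assumes that leaving the key unset (or empty) keeps the existing coordinator stream path, which is what backward compatibility would require.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for JobConfig.
class JobConfigSketch {
    static final String CONFIG_LOADER_CLASS = "job.config.loader.class";

    private final Map<String, String> config;

    JobConfigSketch(Map<String, String> config) {
        this.config = new HashMap<>(config);
    }

    // Returns the configured loader class name, if any.
    Optional<String> getConfigLoaderClass() {
        return Optional.ofNullable(config.get(CONFIG_LOADER_CLASS))
                .filter(name -> !name.isEmpty());
    }

    // True when AM should read config through the loader;
    // false means the existing coordinator stream path is used.
    boolean useConfigLoader() {
        return getConfigLoaderClass().isPresent();
    }
}
```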

...