Status

Current state: Accepted

Discussion thread: [Discuss] SEP-23: Simplify Job Runner

JIRA: SAMZA-2405

Released: 

Problem

Samza on Yarn follows a multi-stage deployment model: the Job Runner, which runs on the submission host, reads the configuration, performs planning, and persists the config in the coordinator stream before submitting the job to the Yarn cluster. In Yarn, the Application Master (AM) then reads the config from the coordinator stream before spinning up containers to execute the job. This split of responsibility between the Job Runner and the AM is operationally confusing and, with multiple points of failure, makes the pipeline difficult to debug. In addition, planning invokes user code, so the runner usually needs isolation for security reasons; otherwise a malicious user could attack the framework or gain access to other users' jobs running on the same runner.

...

Code Block
deploy/samza/bin/run-app.sh \
  --config app.class=samza.examples.wikipedia.task.application.WikipediaFeedTaskApplication \
  --config job.name=wikipedia-stats \
  --config job.factory.class=org.apache.samza.job.yarn.YarnJobFactory \
  --config yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz \
  --config job.config.loader.factory=org.apache.samza.config.loaders.PropertiesConfigLoaderFactory \
  --config job.config.loader.properties.path=/__package/config/wikipedia-feed.properties

Work with Beam

See Work with Beam for how Samza Beam jobs work with the simplified Job Runner.

Rejected Alternatives

The above approach requires existing users to change how they start a Samza job. Alternatively, we could keep the ability for the runner to read from a local config, and the AM would load the config again using the loader.

...

Users need to change their job submission script and provide the related configs explicitly through --config, instead of using --config-factory and --config-path to load a local file.
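As a rough illustration of that migration, the sketch below turns a small local properties file into explicit --config flags. The helper name props_to_config_flags and the file submit.properties are hypothetical and not part of Samza. It assumes the file holds only the submission-time settings, written as plain key=value lines with no surrounding whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of Samza): print one "--config key=value"
# flag per key=value line found in the properties file given as $1.
props_to_config_flags() {
  # Read line by line; the "|| [ -n ... ]" keeps a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;  # skip empty lines and '#' comments
    esac
    printf '%s %s\n' '--config' "$line"
  done < "$1"
}

# Usage sketch (submit.properties is a placeholder file name). The unquoted
# substitution relies on word splitting, so values must not contain spaces:
#   deploy/samza/bin/run-app.sh $(props_to_config_flags submit.properties)
```

Because the flags are emitted one per line, the output can be reviewed before it is pasted into a submission script.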

Config rewriters won't be invoked on the job runner anymore. If a config rewriter rewrites the job submission configs, the rewrite will no longer take effect; users are expected to pass those configs explicitly.

Rollback Plan

If a problem occurs in Samza 1.4, users can roll back to Samza 1.3 and keep the old startup flow using --config-path and --config-factory.

...