...

JIRA: SAMZA-2405

Released: 

Problem

Samza on Yarn follows a multi-stage deployment model: the Job Runner, which runs on the submission host, reads the configuration, performs planning, and persists the config in the coordinator stream before submitting the job to the Yarn cluster. In Yarn, the Application Master (AM) reads the config from the coordinator stream before spinning up containers to execute the job. This split of responsibility between the Job Runner and the AM is operationally confusing and makes the pipeline difficult to debug, since there are multiple points of failure. In addition, because planning invokes user code, the runner usually needs isolation for security reasons to guard the framework against malicious user code; otherwise a malicious user could gain access to other users' jobs running on the same runner.

...

The full list of configs can be found in the References section, under "Complete list of job submission configs".

Take wikipedia-feed in Hello Samza as an example:

Code Block
deploy/samza/bin/run-app.sh \
  --config job.name=wikipedia-feed \
  --config job.factory.class=org.apache.samza.job.yarn.YarnJobFactory \
  --config yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz \
  --config job.config.loader.class=org.apache.samza.config.loader.PropertiesConfigLoader \
  --config config.path=/__package/config/wikipedia-feed.properties
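Each `--config` argument in the command above carries one `key=value` override. The sketch below is only an illustration of how such pairs could be separated. It assumes the split happens at the first `=`, which is not confirmed by this page, and the helper names `split_override` and `list_overrides` are hypothetical and not part of Samza. Under that assumption, a stray second `=` would end up as the first character of the value, so overrides are worth checking for it before submission.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; they are not part of Samza.

# Split one "key=value" override at the FIRST '=' (an assumption about
# how overrides are parsed), printing "key<TAB>value".
split_override() {
  local pair="$1"
  case "$pair" in
    *=*) ;;
    *) echo "not a key=value pair: $pair" >&2; return 1 ;;
  esac
  local key="${pair%%=*}"    # everything before the first '='
  local value="${pair#*=}"   # everything after the first '='
  printf '%s\t%s\n' "$key" "$value"
}

# Walk an argument list and print every "--config key=value" override,
# one per line, ignoring any other arguments.
list_overrides() {
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "--config" ] && [ "$#" -ge 2 ]; then
      split_override "$2" || return 1
      shift 2
    else
      shift
    fi
  done
}
```

For example, `list_overrides --config config.path=/__package/config/wikipedia-feed.properties` prints the key `config.path` and the value `/__package/config/wikipedia-feed.properties` separated by a tab.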

...