Generating the Samza API whitelist

To load the Samza API classes from the API classloader, we need to tell cytodynamics which classes those are. We can do this by providing a whitelist of packages/classes when building the cytodynamics classloader. All public interfaces/classes in samza-api should be treated as API classes. One way to generate this whitelist is a Gradle task that finds all the classes in samza-api and writes that list to a file, which Samza then reads when constructing the cytodynamics classloader. The Gradle task should also include classes from samza-kv.
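The whitelist generation above is proposed as a Gradle task. The sketch below shows the same idea as a small standalone Java program, assuming a one-class-name-per-line output file. `ApiWhitelistGenerator` and its methods are hypothetical names, and the sketch does not filter by visibility (a real task would keep only public types, for example by inspecting class-file access flags).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical generator: lists the classes in the given jars (e.g. samza-api, samza-kv). */
public final class ApiWhitelistGenerator {

  /** "org/apache/samza/config/Config.class" -> "org.apache.samza.config.Config". */
  static String toClassName(String entryName) {
    String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
    return withoutSuffix.replace('/', '.');
  }

  /** Keeps class entries; skips metadata and module/package descriptors. */
  static boolean isClassEntry(String entryName) {
    return entryName.endsWith(".class")
        && !entryName.startsWith("META-INF/")
        && !entryName.endsWith("module-info.class")
        && !entryName.endsWith("package-info.class");
  }

  /** Converts jar entry names to a sorted, de-duplicated list of class names. */
  static List<String> classNamesFromEntries(List<String> entryNames) {
    TreeSet<String> names = new TreeSet<>();
    for (String entry : entryNames) {
      if (isClassEntry(entry)) {
        names.add(toClassName(entry));
      }
    }
    return new ArrayList<>(names);
  }

  /** Reads all entry names from one jar file. */
  static List<String> entryNames(String jarPath) throws IOException {
    List<String> names = new ArrayList<>();
    try (JarFile jarFile = new JarFile(jarPath)) {
      jarFile.stream().map(JarEntry::getName).forEach(names::add);
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    // args[0] = output whitelist file; args[1..] = jars to scan
    if (args.length < 2) {
      System.err.println("usage: ApiWhitelistGenerator <output-file> <jar>...");
      System.exit(1);
    }
    List<String> entries = new ArrayList<>();
    for (int i = 1; i < args.length; i++) {
      entries.addAll(entryNames(args[i]));
    }
    Files.write(Paths.get(args[0]), classNamesFromEntries(entries), StandardCharsets.UTF_8);
  }
}
```

Sorting and de-duplicating keeps the generated file stable across builds, which makes changes to the API surface easy to spot in review.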

...

Set the context classloader to be the infrastructure classloader
The framework packaging needs a specific setup. The following steps support log4j2 as the logging implementation for slf4j. Other logging implementations should be supportable by adding the corresponding dependencies and putting the corresponding classes on the API whitelist.
1. Include log4j2 dependencies in the framework API package (org.apache.logging.log4j:log4j-api, org.apache.logging.log4j:log4j-core, org.apache.logging.log4j:log4j-slf4j-impl, org.apache.logging.log4j:log4j-1.2-api).
2. Add the classes from log4j-api and log4j-core to the API whitelist. This can be done by adding "org.apache.logging.log4j.*" to the whitelist.
3. Include samza-log4j2 as a dependency for the framework infrastructure package.
4. Include log4j2 dependencies in the framework infrastructure package. These should already be included transitively through samza-log4j2.
5. Exclude all log4j 1.x dependencies (org.slf4j:slf4j-log4j12, log4j:log4j) from all classpaths.
6. (Recommended) Add a default log4j2.xml configuration file for cases in which the application does not provide one.
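Step 2 relies on a wildcard entry covering all of log4j-api and log4j-core, including subpackages such as org.apache.logging.log4j.core. The sketch below assumes a simple glob semantics in which "*" matches any run of characters, dots included, and anything without "*" must match exactly. `ApiWhitelist` is a hypothetical class, not the cytodynamics API, whose own matching rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical whitelist check: "*" matches any characters, including '.'. */
public final class ApiWhitelist {
  private final List<Pattern> patterns = new ArrayList<>();

  public ApiWhitelist(List<String> globs) {
    for (String glob : globs) {
      patterns.add(Pattern.compile(globToRegex(glob)));
    }
  }

  /** Quotes the literal parts of the glob and turns each '*' into ".*". */
  static String globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    int start = 0;
    for (int i = 0; i < glob.length(); i++) {
      if (glob.charAt(i) == '*') {
        regex.append(Pattern.quote(glob.substring(start, i))).append(".*");
        start = i + 1;
      }
    }
    regex.append(Pattern.quote(glob.substring(start)));
    return regex.toString();
  }

  /** True if the fully qualified class name matches any whitelist entry. */
  public boolean isApiClass(String className) {
    for (Pattern pattern : patterns) {
      if (pattern.matcher(className).matches()) {
        return true;
      }
    }
    return false;
  }
}
```

With the entries "org.apache.samza.config.Config" and "org.apache.logging.log4j.*", both org.apache.logging.log4j.Logger and org.apache.logging.log4j.core.LoggerContext are treated as API classes, while classes outside that prefix (for example org.apache.log4j.Logger) are not.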
When setting the system property for the log4j2 configuration file location ("log4j.configurationFile"), the application's log4j2.xml should be used if it exists; otherwise, a default log4j2.xml from the framework infrastructure can be used. One way to do this is to pass an extra environment variable to the job coordinator execution that holds the path of the "application lib directory" (which may contain the application's log4j2.xml), and then read that environment variable in the run-class.sh script when setting the log4j configuration system property.
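The fallback rule above is simple enough to sketch. The proposal puts this logic in run-class.sh; the version below expresses the same decision in Java for illustration only. The environment variable name `APPLICATION_LIB_DIR`, the class `Log4j2ConfigSelector`, and the default path `lib/log4j2.xml` are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical sketch of choosing the value for "log4j.configurationFile". */
public final class Log4j2ConfigSelector {
  // Hypothetical name of the variable carrying the application lib directory.
  static final String APP_LIB_DIR_ENV = "APPLICATION_LIB_DIR";

  /** Prefers <app lib dir>/log4j2.xml if it exists, else the framework default. */
  static Path selectConfig(Map<String, String> env, Path frameworkDefaultConfig) {
    String appLibDir = env.get(APP_LIB_DIR_ENV);
    if (appLibDir != null && !appLibDir.isEmpty()) {
      Path appConfig = Paths.get(appLibDir, "log4j2.xml");
      if (Files.isRegularFile(appConfig)) {
        return appConfig;
      }
    }
    return frameworkDefaultConfig;
  }

  public static void main(String[] args) {
    // Hypothetical location of the framework infrastructure's default config.
    Path frameworkDefault = Paths.get(args.length > 0 ? args[0] : "lib/log4j2.xml");
    Path chosen = selectConfig(System.getenv(), frameworkDefault);
    // The launcher would pass this as -Dlog4j.configurationFile=<path>.
    System.out.println(chosen.toAbsolutePath());
  }
}
```

Checking for the file itself, rather than only for the variable, keeps the framework default in effect when the application lib directory is set but ships no log4j2.xml.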

For more context about why these changes are needed, see "Details around necessary changes for logging" (135861549).

Pros

Able to isolate log4j2 pluggable components built by Samza
Can override Samza infrastructure logging configuration

...

Locally build the framework tarballs for API and infrastructure. It would be useful to document an example of how to build those tarballs.
Deploy Zookeeper, Kafka, and YARN locally (https://samza.apache.org/startup/hello-samza/latest/).
Fill in the required configs (see 135861549 above). These go into the properties file passed to the run-app.sh script.
Create the tarball for the application (https://samza.apache.org/startup/hello-samza/latest/). When testing local Samza changes, remember to run the "publishToMavenLocal" Gradle task first so that the application build picks up the locally built artifacts.

Changes can also be committed to samza-hello-samza to automate the steps above.

Automated integration test

...

This will require multiple configs, including the location of the framework artifacts for YARN resources (see 135861549 above).

...

We could also consider writing an integration test using the integration test framework (which uses real YARN).

...


Full YARN cluster testing

It will also be useful to deploy some test jobs to a multi-node YARN cluster to verify the functionality.

Alternative solutions

Alternative solutions for SEP-24

...
