...

Generating the Samza API whitelist

To load the Samza API classes from the API classloader, we need to tell cytodynamics which classes those are. We can do this by providing a whitelist of packages/classes when building the cytodynamics classloader. Every public interface/class in samza-api should be considered an API class. One way to generate this whitelist is a Gradle task that finds all the classes in samza-api and writes that list to a file, which Samza then reads when constructing the cytodynamics classloader. The Gradle task should also include classes from samza-kv.
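As a rough illustration of the "find all the classes and put that list in a file" step, the sketch below lists the class names contained in a jar. It is not the actual Gradle task: the class name ApiWhitelistGenerator and its methods are hypothetical, and it does not filter for public classes, which a real task would presumably need to do.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Samza or cytodynamics.
public class ApiWhitelistGenerator {

  /**
   * Converts a jar entry name such as "org/apache/samza/Foo.class" into the
   * binary class name "org.apache.samza.Foo". Returns empty for entries that
   * are not regular classes (resources, package-info, module-info).
   */
  public static Optional<String> toClassName(String entryName) {
    if (!entryName.endsWith(".class")
        || entryName.endsWith("package-info.class")
        || entryName.endsWith("module-info.class")) {
      return Optional.empty();
    }
    String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
    return Optional.of(withoutSuffix.replace('/', '.'));
  }

  /** Lists the class names found in the given jar, sorted alphabetically. */
  public static List<String> listClasses(Path jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      return jar.stream()
          .map(entry -> toClassName(entry.getName()))
          .flatMap(Optional::stream)
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

The resulting list could then be written one class name per line to the whitelist file. Nested classes keep their `$` separator (e.g. "Foo$Bar"), which matches the names a classloader is asked to load.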

Beyond the classes that Samza explicitly provides as API, some other classes need to be loaded by a common classloader so that they can be shared across classloaders. For some of these, like log4j2, we do not need to list each class name, because cytodynamics accepts wildcard entries in the whitelist (e.g. "org.apache.logging.log4j.*").

Classes                                      Description
samza-api                                    main API classes
samza-kv                                     some classes from here are used by implementations of pluggable classes
org.apache.logging.log4j:log4j-api           see Logging below for more information
org.apache.logging.log4j:log4j-core          see Logging below for more information
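
To make the whitelist idea concrete, here is a minimal sketch of how a whitelist mixing exact class names and wildcard entries might be checked. It does not use the cytodynamics API; WhitelistMatcher is a hypothetical class, and the assumption that a trailing ".*" covers the package and all of its subpackages may differ from cytodynamics' actual wildcard semantics.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; cytodynamics has its own matching logic.
public class WhitelistMatcher {
  private final Set<String> exactNames = new HashSet<>();
  private final List<String> packagePrefixes = new ArrayList<>();

  public WhitelistMatcher(List<String> entries) {
    for (String entry : entries) {
      if (entry.endsWith(".*")) {
        // Drop only the "*" and keep the trailing dot, so that
        // "org.apache.logging.log4j." does not match "org.apache.logging.log4jx.Foo".
        packagePrefixes.add(entry.substring(0, entry.length() - 1));
      } else {
        exactNames.add(entry);
      }
    }
  }

  /** Returns true if the class should be loaded from the API (common) classloader. */
  public boolean isApiClass(String className) {
    if (exactNames.contains(className)) {
      return true;
    }
    for (String prefix : packagePrefixes) {
      // Assumption: a wildcard entry also covers subpackages.
      if (className.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

With entries such as "org.apache.samza.task.StreamTask" and "org.apache.logging.log4j.*", this matcher would accept "org.apache.logging.log4j.Logger" and reject an application class such as "com.example.App".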

"Infrastructure" classloader

...

  • Previously, when Samza-owned components were packaged with the application, their runtime dependencies were determined by the application's dependencies, so they might not match their build-time dependencies. In this design, the Samza-owned components on the job coordinator can use their actual build-time dependencies at runtime. However, the old behavior will continue to exist in the application runners and on the processing containers, so this design introduces an inconsistency between the dependencies used across the runners, the job coordinator, and the processing containers. Any flow that requires the same set of dependencies across all three pieces would have a problem. This is a general problem for any solution that only does job coordinator dependency isolation. For example, if Java serialization is used on the application runner to serialize a class from a transitive dependency, and the object is then deserialized on the job coordinator, the job coordinator's version of that class might differ from the runner's version. Although this could break something in principle, it seems very unlikely in practice: the inter-process flows we currently have involving the job coordinator should be using objects defined within Samza (the same Samza version is used across components), simple objects (e.g. strings), or serialization technologies with good built-in compatibility concepts (e.g. JSON). Once we have general split deployment, this will no longer be a problem.
    • The Jackson JSON library itself could still be inconsistent due to application packaging, but the application should only be able to override the minor version of what Samza uses (i.e. Jackson 1.*), since Jackson 2.* has a different artifact name and class namespace.

Alternative solutions

Alternative solutions for SEP-24