Overview

A deployable Samza application currently consists of JARs for Samza infrastructure code (and their dependencies) and JARs for application-specific code (and their dependencies). The full deployable package is determined at build time. When an application is deployed, the built package of JARs is placed on the necessary node(s), which include the job coordinator and the processing containers. This build-time packaging simplifies the deployment responsibilities of the Samza infrastructure, since the package built by the application contains everything needed to run a Samza application. Application owners (who may not be the same as the owners of the Samza infrastructure) choose the version of Samza to use and do the packaging.

...

Generating the Samza API whitelist

To load the Samza API classes from the API classloader, we need to tell cytodynamics which classes those are. We can do this by providing a whitelist of packages/classes when building the cytodynamics classloader. All public interfaces/classes inside samza-api should be considered API classes. One way to generate this whitelist is to use a Gradle task that finds all the classes in samza-api and writes that list to a file, which Samza can then read when constructing the cytodynamics classloader. The Gradle task should also include classes from samza-kv.
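As a rough sketch of the generation step, the Java program below scans the samza-api and samza-kv JARs and writes one fully qualified class name per line. It assumes the Gradle task runs it with the output file and the JAR paths as arguments (for example, through a Gradle JavaExec task); the class name ApiWhitelistGenerator and the one-name-per-line file format are hypothetical, not an existing Samza or cytodynamics convention.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical whitelist generator: writes one fully qualified class name per line. */
public final class ApiWhitelistGenerator {
  private ApiWhitelistGenerator() {}

  /** Maps "org/apache/samza/Foo$Bar.class" to "org.apache.samza.Foo$Bar"; returns null for non-class entries. */
  static String toClassName(String entryName) {
    if (!entryName.endsWith(".class")
        || entryName.startsWith("META-INF/")          // e.g. multi-release copies under META-INF/versions
        || entryName.endsWith("module-info.class")
        || entryName.endsWith("package-info.class")) {
      return null;
    }
    String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
    return withoutSuffix.replace('/', '.');
  }

  /** Lists the classes in one JAR, sorted so that the generated file is deterministic. */
  static List<String> classNamesIn(Path jar) throws IOException {
    List<String> names = new ArrayList<>();
    try (JarFile jarFile = new JarFile(jar.toFile())) {
      for (JarEntry entry : Collections.list(jarFile.entries())) {
        String className = toClassName(entry.getName());
        if (className != null) {
          names.add(className);
        }
      }
    }
    Collections.sort(names);
    return names;
  }

  /** args[0]: output whitelist file; args[1..]: API JARs to scan (assumed: samza-api, samza-kv). */
  public static void main(String[] args) throws IOException {
    List<String> whitelist = new ArrayList<>();
    for (int i = 1; i < args.length; i++) {
      whitelist.addAll(classNamesIn(Path.of(args[i])));
    }
    Files.write(Path.of(args[0]), whitelist, StandardCharsets.UTF_8);
  }
}
```

This sketch whitelists every class in the JARs rather than only the public ones. That is arguably the safer choice: package-private access only works between classes in the same runtime package (same package name and same defining classloader), so package-private helpers in samza-api should be loaded by the same API classloader as the public types that use them.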

...

Handling SamzaApplication.describe

The infrastructure classloader will include the concrete descriptors, and we will build an additional application classloader which can delegate to the infrastructure classloader when running describe.

Since the application code calls the descriptors directly, the application classloader needs to be able to delegate to the infrastructure classloader. However, we do not want to delegate for every class. We only want to delegate for certain components (e.g. descriptors, serdes). We don't want to delegate for application dependencies or for classes that are implemented only by the application.

Flow for loading a class from the additional application classloader for SamzaApplication.describe:

  1. If a class is a Samza API class, then load it from the API classloader.
  2. If the class is on the infrastructure classpath and it is in the infrastructure whitelist (e.g. a descriptor), load it from the infrastructure classloader.
  3. If the class is on the application classpath, load it from the application classloader.
  4. Otherwise, throw a ClassNotFoundException.
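The four-step flow above could be sketched with plain java.lang.ClassLoader APIs as follows. This is not the cytodynamics API; DescribeClassLoader and its two predicates (for example, one backed by the generated API whitelist and one by an infrastructure whitelist of descriptors and serdes) are hypothetical. The sketch also assumes an extra step 0 that is not part of the flow above: JDK classes are always resolved from the platform classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Predicate;

/** Hypothetical classloader used only while running SamzaApplication.describe. */
public final class DescribeClassLoader extends URLClassLoader {
  private final ClassLoader apiClassLoader;
  private final ClassLoader infrastructureClassLoader;
  private final Predicate<String> isApiClass;                  // e.g. backed by the API whitelist file
  private final Predicate<String> isInfrastructureWhitelisted; // e.g. descriptors, serdes

  public DescribeClassLoader(URL[] applicationClasspath, ClassLoader apiClassLoader,
      ClassLoader infrastructureClassLoader, Predicate<String> isApiClass,
      Predicate<String> isInfrastructureWhitelisted) {
    // Parent is the platform classloader, used only for JDK classes (assumed step 0).
    super(applicationClasspath, ClassLoader.getPlatformClassLoader());
    this.apiClassLoader = apiClassLoader;
    this.infrastructureClassLoader = infrastructureClassLoader;
    this.isApiClass = isApiClass;
    this.isInfrastructureWhitelisted = isInfrastructureWhitelisted;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        c = loadFollowingDescribeFlow(name);
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }

  private Class<?> loadFollowingDescribeFlow(String name) throws ClassNotFoundException {
    // 0. (Assumption) JDK classes come from the platform/bootstrap classloaders.
    try {
      return getParent().loadClass(name);
    } catch (ClassNotFoundException notJdk) {
      // Not a JDK class; continue with the describe flow.
    }
    // 1. Samza API classes are loaded from the shared API classloader.
    if (isApiClass.test(name)) {
      return apiClassLoader.loadClass(name);
    }
    // 2. Whitelisted classes (e.g. concrete descriptors) come from the infrastructure
    //    classloader, but only if they are actually on the infrastructure classpath.
    if (isInfrastructureWhitelisted.test(name)) {
      try {
        return infrastructureClassLoader.loadClass(name);
      } catch (ClassNotFoundException notOnInfrastructureClasspath) {
        // Fall through to the application classpath.
      }
    }
    // 3. Define the class from the application classpath.
    // 4. URLClassLoader.findClass throws ClassNotFoundException if it is not there either.
    return findClass(name);
  }
}
```

Because steps 1 and 2 return classes defined by other loaders, descriptors and API types seen by the application during describe are the exact same Class objects the infrastructure uses, while application-only classes are defined by this loader.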

A consequence of this structure is that there are "multiple" application classloaders on the job coordinator: the one in this describe flow and the one described above in the "API" classloader section. Classes loaded by one of these application classloaders cannot be used by classes loaded by the other. An example of when this could happen is in the low-level API. The application's TaskFactory implementation will be loaded by the application classloader described above, but the Avro objects that Kafka events are deserialized into will be loaded by the other application classloader. Even though the Avro objects are of the same class (even associated with the same binary), the TaskFactory implementation won't be able to use them, because the JVM treats a class defined by two different classloader instances as two distinct types.

We can solve this by serializing the components specified through the descriptor and deserializing those components using the classloader that is used for the rest of the AM. This is consistent with the goal of being able to serialize the whole job description. The interfaces have already been marked as Serializable.
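A minimal sketch of the serialize/deserialize step, using standard Java serialization: components created during describe are written to bytes, then read back through an ObjectInputStream whose resolveClass looks classes up in a caller-supplied classloader (for example, the one used by the rest of the AM). ComponentSerializer is a hypothetical name, and the sketch assumes every component is java.io.Serializable, which the cons below note is not yet true for all application descriptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical helper for moving descriptor components between classloaders. */
public final class ComponentSerializer {
  private ComponentSerializer() {}

  /** Serializes a component (e.g. a serde) created while running describe. */
  public static byte[] serialize(Serializable component) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(component);
    }
    return bytes.toByteArray();
  }

  /** Deserializes a component, resolving every class through the given classloader. */
  public static Object deserialize(byte[] data, ClassLoader classLoader)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in =
        new ClassLoaderObjectInputStream(new ByteArrayInputStream(data), classLoader)) {
      return in.readObject();
    }
  }

  /** ObjectInputStream that resolves classes through a specific classloader. */
  private static final class ClassLoaderObjectInputStream extends ObjectInputStream {
    private final ClassLoader classLoader;

    ClassLoaderObjectInputStream(InputStream in, ClassLoader classLoader) throws IOException {
      super(in);
      this.classLoader = classLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
        throws IOException, ClassNotFoundException {
      try {
        return Class.forName(desc.getName(), false, classLoader);
      } catch (ClassNotFoundException e) {
        // Primitive types (e.g. "int") cannot be loaded by name; the default implementation handles them.
        return super.resolveClass(desc);
      }
    }
  }
}
```

Deserializing with the AM's classloader means the resulting objects, and every class they reference, are defined by the same loader as the TaskFactory implementation, so the duplicate-class problem described above does not arise for these components.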

Pros

  • API classloader stays simpler
  • Allows the application to delegate to the infrastructure for describe, and the infrastructure to delegate to the application for processing

Cons

  • Additional classloader component adds complexity
    • Includes having multiple classloaders associated with the same application classpath
  • Need to serialize and deserialize components of the application description
    • Currently, some components of application descriptions are not actually Serializable (e.g. the application context factories for both SQL and Beam)

Classloader wiring

By using the special classloader to load the "main" class, any classes that the "main" class depends on will also be loaded through that classloader, because the JVM resolves a class's dependencies through the classloader that defined it. In this way, Java automatically propagates the special classloader through the rest of Samza. We can modify the "main" method to use reflection to load the "main" class through the special classloader and then trigger the actual Samza startup.
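A minimal sketch of the reflective entry point, assuming the special classloader has already been built and the real startup class name is known (passed in here as mainClassName; IsolatingLauncher is a hypothetical name). Setting the thread context classloader is an extra assumption, not something the design above requires: it only helps libraries that look classes up through Thread.getContextClassLoader(), which the automatic propagation described above does not cover.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical launcher that starts Samza through a special classloader. */
public final class IsolatingLauncher {
  private IsolatingLauncher() {}

  /** Loads mainClassName through specialClassLoader and invokes its static main(String[]). */
  public static void launch(ClassLoader specialClassLoader, String mainClassName, String[] args)
      throws ReflectiveOperationException {
    // Classes referenced by mainClass are later resolved through mainClass's defining classloader.
    Class<?> mainClass = Class.forName(mainClassName, true, specialClassLoader);
    Method main = mainClass.getMethod("main", String[].class);
    if (!Modifier.isStatic(main.getModifiers())) {
      throw new NoSuchMethodException(mainClassName + ".main(String[]) is not static");
    }
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    // (Assumption) also expose the special classloader to context-classloader lookups.
    current.setContextClassLoader(specialClassLoader);
    try {
      // Cast to Object so the array is passed as the single String[] argument.
      main.invoke(null, (Object) args);
    } finally {
      current.setContextClassLoader(previous);
    }
  }
}
```

Exceptions thrown by the target main method surface as InvocationTargetException, a subclass of ReflectiveOperationException, so the launcher's caller sees startup failures rather than having them swallowed.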

...