The maven-shade-plugin

The maven-shade-plugin is a powerful tool for creating jars, providing fine-grained control over the contents of a module's jar.

Using the shade-plugin is commonly referred to as "shading". This term should be avoided because it does not make clear which features of the shade-plugin are actually being used. "Bundling" and "relocating" are more accurate terms; see below for their meanings.

Main features

Bundling dependencies

This refers to including the contents of another jar into the jar of a module.

This is commonly done for creating fat-jars, like the flink-dist jar that contains most of the Flink runtime and its dependencies. The second common usage is for avoiding dependency conflicts via relocations (see below).

When bundling dependencies, the plugin allows you to select specific files to include in or exclude from jars via filters, and to pre-process files via transformers (for example, to combine duplicate files like notices).

Note: You should avoid bundling dependencies without relocating them in artifacts that are directly consumed by users. This can cause surprising dependency conflicts, since in practice you are smuggling a dependency onto the classpath.
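As a rough illustration of bundling with filters and transformers, a shade-plugin configuration might look like the sketch below. The coordinates org.example:some-library are hypothetical placeholders, and the choice of filters and transformers is an assumption for illustration, not Flink's actual configuration.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Select which dependencies are bundled into this module's jar. -->
        <artifactSet>
          <includes>
            <!-- hypothetical dependency -->
            <include>org.example:some-library</include>
          </includes>
        </artifactSet>
        <!-- Filters: exclude specific files from the bundled jars. -->
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <!-- Transformers: merge files that occur in several jars. -->
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Here the artifactSet decides what is bundled, the filter drops signature files that would no longer match the merged jar, and the transformers combine service files and notices instead of letting one copy overwrite another.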

Relocating dependencies

This refers to changing all references to a particular Java package in all bundled files.

Since you (generally) can't load 2 different versions of the same class (identified by package and class name), bundling and relocating a class to a different package causes it to be treated as an entirely separate entity. This allows you to bundle one version of a dependency for internal use, while allowing downstream users to use another version of said dependency.

Note: You should avoid relocating dependencies that are exposed by the APIs of your module. The user has no control over, or even knowledge of, the actual version and no access to proper source jars, and such a dependency does not integrate nicely into the remaining ecosystem of the relocated dependency (e.g., you wouldn't be able to use some extension of said dependency because, for all intents and purposes, it's a different dependency).

Note: Relocation does not require the targeted dependency to be bundled. If you are certain that another module bundles the relocated dependency, then you can relocate just the classes of your module, relying on the other module to provide the actual relocated dependency.
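A relocation is declared as a pattern/shadedPattern pair in the shade-plugin configuration. The following is a minimal sketch; the package org.example.somelib and the target prefix org.example.mymodule.shaded are hypothetical names chosen for illustration.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- hypothetical original package of the dependency -->
            <pattern>org.example.somelib</pattern>
            <!-- hypothetical package that the classes and all references to them are moved to -->
            <shadedPattern>org.example.mymodule.shaded.somelib</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With such a configuration, a class like org.example.somelib.Foo would end up in the jar as org.example.mymodule.shaded.somelib.Foo, and references to it in the module's own classes would be rewritten accordingly, so a downstream user's copy of org.example.somelib no longer clashes with it.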

Dependency reduction

This refers to the removal of references to bundled dependencies from the published poms.

If a dependency is bundled, then you typically do not want users to see said dependency, to avoid unnecessarily extending the classpath or causing more headaches for the user.

If the dependency was not relocated (which should be avoided), you force the user to deal with the dependency, although it's already being provided by your jar.

If the dependency was relocated, then there's usually no reason to expose this dependency to the consuming user, as it just pollutes the classpath.
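In the shade-plugin, dependency reduction is controlled by the createDependencyReducedPom parameter, which to the best of our knowledge is enabled by default. A minimal sketch that makes the setting explicit:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Publish a pom from which the bundled dependencies have been removed. -->
    <createDependencyReducedPom>true</createDependencyReducedPom>
  </configuration>
</plugin>
```

The reduced pom is what consumers of the published artifact resolve against, so bundled dependencies no longer show up as transitive dependencies for them.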

The Maven 3.3+ problem

In Maven 3.2.5 and below, dependency reduction was applied not just to the published poms, but also to the in-memory Maven model when Flink was compiled. For a multi-module project such as Flink, this meant that a module could use a specific version of a dependency and bundle and relocate it without having to worry that this dependency might be visible to other modules.

In Maven 3.3.0 this was changed and the dependency tree became immutable at runtime. Dependency reduction no longer worked on the in-memory model (while still working for the published poms).

This resulted in various dependency conflicts and in dependencies being bundled multiple times, as previously hidden dependencies were now still picked up as transitive dependencies.

Workarounds

Build Flink in stages

Since dependency reduction still applies to the published poms, you can still get the right result by building Flink in stages. Whenever a module A bundles another module B, which in turn bundles some dependency, A must be built in an entirely separate Maven build from B. Note that B must also be installed into the local Maven repository.

This approach is error-prone and tedious, but is the only "solution" that doesn't require changes to Flink.

Mark bundled dependencies as optional

Since the core issue is that bundled dependencies are still exposed to downstream modules, explicitly marking these dependencies as optional (i.e., non-transitive) can resolve this issue, at the cost of higher maintenance overhead.

This is the approach proposed in FLINK-28016 to achieve full Maven 3.3+ support.
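Marking a bundled dependency as optional uses the standard optional element of a Maven dependency declaration. The sketch below uses hypothetical coordinates and a hypothetical version property; it shows the general mechanism, not necessarily the exact form used in FLINK-28016.

```xml
<dependency>
  <!-- hypothetical bundled dependency -->
  <groupId>org.example</groupId>
  <artifactId>some-library</artifactId>
  <!-- hypothetical version property -->
  <version>${some-library.version}</version>
  <!-- Optional dependencies are not passed on transitively to modules that depend on this one. -->
  <optional>true</optional>
</dependency>
```

The maintenance overhead comes from having to keep these markers in sync with what each module actually bundles.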

Interactions with other plugins

Dependency reduction interferes with plugins that work with the dependency tree, like the maven-dependency-plugin. In practice there are 2 different dependency trees within Flink, one before and one after dependency reduction; which one is visible depends on how the dependency-plugin is being used.

For example, running the dependency-plugin within a single module means that it works against the dependency-reduced poms from the local Maven repository. Meanwhile, using the plugin for the entire project usually means

Interactions with IDEs

Dependency reduction is not active within the IDE (because the IDE works directly with the compiled class files, not with the created poms/jars).

This means that even if we have the capability to use entirely separate (and incompatible!) versions in production and Maven, we may still be forced to converge dependencies to a certain degree across modules. Reducing dependencies between modules can mitigate such issues.

History dependency index

This page lists the dependencies for specific releases, which can be used to compare the dependency sets between versions.

...