The maven-shade-plugin

The maven-shade-plugin is a powerful tool for creating jars, providing fine-grained control over the contents of a module's jar.

Using the shade-plugin is commonly referred to as "shading". This term should be avoided because it does not say which features of the shade-plugin are actually being used. "Bundling" and "relocating" are more accurate terms; see below for their meanings.

Main features

Bundling dependencies

This refers to including the contents of another jar in the jar of a module.

...

Note: You should avoid bundling dependencies without relocating them in artifacts that are directly consumed by users. Doing so can cause surprising dependency conflicts, since in practice you are smuggling a dependency onto the classpath.
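
As a minimal sketch of bundling, the shade-plugin can be told which dependencies to include through its artifactSet parameter. The coordinates com.example:example-lib are hypothetical and only serve as an illustration; this is not taken from Flink's actual poms.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Hypothetical dependency whose classes are copied into this module's jar. -->
            <include>com.example:example-lib</include>
          </includes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Listing the bundled artifacts explicitly, rather than relying on the default of including everything, makes it easier to see what ends up in the jar.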

Relocating dependencies

This refers to changing all references to a particular Java package in all bundled files.

...

Note: Relocation does not require the targeted dependency to be bundled. If you are certain that another module bundles the relocated dependency, then you can relocate just the classes of your module, relying on the other module to provide the actual relocated dependency.
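
A relocation could be sketched as follows. The package com.example.lib and the target prefix org.myproject.shaded are hypothetical names chosen for illustration; the fragment belongs inside the configuration of the shade-plugin.

```xml
<configuration>
  <relocations>
    <relocation>
      <!-- Hypothetical original package of the dependency. -->
      <pattern>com.example.lib</pattern>
      <!-- Hypothetical new package that references are rewritten to. -->
      <shadedPattern>org.myproject.shaded.com.example.lib</shadedPattern>
    </relocation>
  </relocations>
</configuration>
```

If the dependency itself is not bundled by this module, only the references in the module's own classes are rewritten, which matches the case described in the note above.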

Dependency reduction

This refers to the removal of references to bundled dependencies from the published poms.

...

If the dependency was relocated then there's usually no reason to expose this dependency to the consuming user, as it just pollutes the classpath.
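
Dependency reduction is controlled by the createDependencyReducedPom parameter of the shade-plugin, which is enabled by default. The fragment below is a sketch that only shows this parameter; the surrounding plugin declaration is omitted.

```xml
<configuration>
  <!-- Generate a pom from which the bundled dependencies have been removed
       (this is the plugin default). -->
  <createDependencyReducedPom>true</createDependencyReducedPom>
</configuration>
```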

Interactions with other plugins

Dependency reduction interferes with plugins that work with the dependency tree, like the maven-dependency-plugin. In practice there are two different dependency trees within Flink, one before and one after dependency reduction. Which of them is visible depends on how the dependency-plugin is being used.

For example, running the dependency-plugin within a single module means that it works against the dependency-reduced poms from the local maven repository. Meanwhile, using the plugin for the entire project means it uses the pre-dependency-reduction set of dependencies, which can affect which versions are pulled into a module.

Interactions with IDEs

Dependency reduction is not active within the IDE, because the IDE works directly with the compiled class files, not with the created poms/jars.

This means that even if we have the capability to use entirely separate (and incompatible!) versions in production and Maven, we may still be forced to converge dependencies to a certain degree across modules. Reducing dependencies between modules can mitigate such issues.

The Maven 3.3+ problem

In Maven 3.2.5 and below, dependency reduction was not just applied to the published poms, but also to the in-memory Maven model when Flink was compiled. For a multi-module project such as Flink this meant that a module could use a specific version of a dependency, and bundle and relocate it, without having to worry that this dependency might be visible to other modules.

In Maven 3.3.0 this was changed and the dependency tree became immutable at runtime. Dependency reduction was no longer applied to the in-memory model (while still working for the published poms).

This resulted in various dependency conflicts and in dependencies being bundled multiple times, as previously hidden dependencies were now still picked up as transitive dependencies.


Solution

Mark bundled dependencies as optional

Since the core issue is that bundled dependencies are still exposed to downstream modules, explicitly marking these dependencies as optional (i.e., non-transitive) can resolve this issue, at the cost of higher maintenance overhead.
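
Marking a bundled dependency as optional could look like the sketch below. The coordinates and version are hypothetical and only illustrate the optional flag of a Maven dependency declaration.

```xml
<dependency>
  <!-- Hypothetical bundled dependency. -->
  <groupId>com.example</groupId>
  <artifactId>example-lib</artifactId>
  <version>1.0.0</version>
  <!-- Optional dependencies are not passed on transitively to modules
       that depend on this module. -->
  <optional>true</optional>
</dependency>
```

The maintenance overhead comes from having to set this flag on every bundled dependency in every module that bundles something.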

Jira: FLINK-28203

Workarounds

Build Flink in stages

Since dependency reduction still applies to the published poms, you can still get the right result by building Flink in stages. Whenever a module A bundles another module B, which in turn bundles some dependency, A must be built in an entirely separate Maven build from B. Note that B must also be installed into the local Maven repository.

This approach is error-prone and tedious, but is the only "solution" that doesn't require changes to Flink.

Marking bundled dependencies as optional, as described under "Solution" above, is the approach proposed in FLINK-28016 to achieve full Maven 3.3+ support.

History dependency index

...