Rationale
Maven has been stuck on POM v4 for a long time, because changing the POM version and publishing artifacts to Maven Central with the new model would break consumers that use either older Maven versions or other build tools (which use POM v4 as a compatibility format).
Other build tools don't suffer from this issue: their build format is kept internal, and a POM is produced only when publishing artifacts to Maven repositories, containing information for artifact consumers but no build instructions.
Maven could apply the same strategy: generate a consumer-only POM when publishing artifacts to a repository, with content that differs from the build POM (removed fields, inferred fields, modified fields). The original POM is then called the "build" POM, since it contains instructions to build the artifact, and the generated POM is called the "consumer" POM, since it is intended for artifact consumers. Once this is done, we can introduce new build POM versions, which will require newer Maven versions to build, while the consumer POM remains compatible with the classical POM v4. The only requirement is being able to generate a consumer POM (a classical POM v4) from the build POM used during the build. flatten-maven-plugin has already proven that generating a simplified POM from the original POM and publishing it to a Maven repository is feasible.
The consumer-only POM will also make consumer features easier to explain and document, without the distraction of build content. It is also a first step towards the complete separation of build and consumer features that comes with the Project Dependency Trees schema proposal.
From File to Effective POM
Up until Maven 3.6.3, the ModelBuilder read the file, resulting in the raw model. To get the effective model, the raw model was cloned (in Java) and enriched as described at https://maven.apache.org/ref/3.6.3/maven-model-builder/
Since Maven 3.7.0, the ModelBuilder reads the file, resulting in the file model. Next, the file is read again, this time through the BuildPomXMLFilter, resulting in another Model. This is merged into the file model, so all line numbers match the original POM.
For this reason, the ModelValidator interface has a new default method, validateFileModel (yeah, Java 8), which can still do most of the validations of the original validateRawModel. The latter can check the extended result from the BuildPomXMLFilter.
One of the filters is the ReactorDependencyXMLFilter, which can inject the version based on the reactor modules. However, this means that all file models must have been read first; otherwise this information is not available in the transformContext. The ModelBuilder only reads one file and might include the parent, but it doesn't handle the modules: that is done by the ProjectBuilder.
Special case: parent POMs (packaging=pom)
Parent POMs (which are POMs with "pom" packaging) don't really have any meaning as consumer POM: there is no dependency artifact to consume from them.
They are useful only as build POMs. Moreover, they are required in Maven repositories to be used either as parent POM or as dependencyManagement import (i.e. imported from a dependency with scope="import", evaluated only at build time).
Non-pom-packaging POMs will therefore be published in Maven repositories as consumer POMs (v4), while pom-packaging POMs will be published as build POMs only (possibly using a new version/format): this use case won't cause issues.
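As an illustration of the dependencyManagement import mentioned above, here is a minimal sketch of a project importing a pom-packaging artifact as a BOM. The coordinates (com.example, demo-app, demo-bom) and versions are hypothetical; only the type and scope elements are the standard POM v4 mechanism.

```xml
<!-- Sketch only: all coordinates below are made up for the example -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0.0</version>

  <dependencyManagement>
    <dependencies>
      <!-- Imports the dependencyManagement section of a pom-packaging artifact.
           This is evaluated at build time only, so the imported POM has to be
           available in the repository in a form the building Maven can read. -->
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>demo-bom</artifactId>
        <version>1.0.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>
</project>
```

Since demo-bom is only ever read by a build, it is the kind of POM that would be published as a build POM rather than as a consumer POM.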
Special case: CI Friendly versions
Maven 3.5.0 introduced a feature for continuous development (https://maven.apache.org/maven-ci-friendly.html). However, multi-module projects have an issue: the distributed files end up in the right folder of the repository, but the placeholders are not replaced in the pom.xml. This causes problems once these artifacts are used as dependencies. Hence these values need to be replaced as well when the POM is installed or deployed.
This is where a consumer POM with replaced values makes sense.
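To make the problem concrete, here is a minimal sketch of a child module's build POM using the ${revision} placeholder. The coordinates (com.example, demo-parent, demo-core) are hypothetical and only serve the example.

```xml
<!-- Build POM of a child module, as kept in source control.
     Coordinates are made up for the example. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId>
    <artifactId>demo-parent</artifactId>
    <!-- CI-friendly placeholder: only meaningful while building -->
    <version>${revision}</version>
  </parent>
  <artifactId>demo-core</artifactId>
</project>
```

If this module were built with, for example, mvn -Drevision=1.2.0 deploy, the consumer POM would be expected to carry the literal 1.2.0 in place of the placeholder, so that consumers never have to resolve the property themselves.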
POM File name?
Consumer POM file name does not really have a meaning, but if it had, it would remain as pom.xml.
Build POM file name, while updating POM format, could be changed to something like build.xml (bad idea since it's "reserved" by Apache Ant), build.pom, or even build.json or build.yaml: not sure this would be a good idea, but at least, we can.
Consumer POM fields when simplifying
One strategy is to remove build-only fields. We need to define which fields from POM v4 we want to keep in the consumer POM, classifying each as: removed, kept because required, kept by choice (could be removed if we decide), kept only in part, or discussion required.
field | status for consumer | comment |
---|---|---|
<modelVersion/> | not absolutely required, but kept as usual convention | |
<parent> | content inlined in consumer POM, because we can and it will simplify consumers' code rfscholte: should stay to ensure the calculation of distances of dependencies. Only the relativePath can be removed, since it points to a file on the local system. hboutemy: I don't see how parent has anything to do with distance of dependencies rfscholte: parent is a special kind of dependency, all dependency-related segments should stay. hboutemy: that does not explain what is useful, since flatten removes the (build) dependency (of a really different nature). Keeping parent in consumer POM will block the idea of newer POM formats for build-only that consumers ignore. | |
<groupId/> | ||
<packaging/> | not absolutely required, since packaging is more a build configuration than something consumers may use | |
<name/> | necessary because of minimal requirements for central | |
<description/> | necessary because of minimal requirements for central | |
<url/> | necessary because of minimal requirements for central | |
<inceptionYear/> | ||
<organization> | ||
<licenses> | necessary because of minimal requirements for central | |
<developers> | necessary because of minimal requirements for central | |
<contributors> | If people want to remove content, the generation should be parametrized | |
<mailingLists> | ||
<prerequisites> | used for plugins, to define runtime Maven version prerequisite | |
<modules/> | rfscholte: points to a File on the local system, hence can always be removed. | |
<scm> | necessary because of minimal requirements for central | |
<issueManagement> | ||
<ciManagement> | ||
<distributionManagement> | keep rfscholte: not up to us. If people want to remove it, they should use flatten-maven-plugin hboutemy: how is this information useful for consumers? rfscholte: it is not build-related, which is enough reason for me to keep it, it is about the distribution. It shows the original location from which it was spread into the world. hboutemy: original location is a build feature, nobody at SBT or other build tools generates a pom.xml with this field because they use their build tool to push | |
<properties> | values inlined in consumer POM rfscholte: all dependency-related segments should stay, which means properties too because they can be used as part of the dependency hboutemy: isn't that what flatten-maven-plugin does? rfscholte: this consumer-pom is not the same as flatten-maven-plugin. The flatten-maven-plugin is a decision by the developers to resolve all properties. We should not do that by default. hboutemy: why? | |
<dependencyManagement> | rfscholte: all dependency-related segments should stay hboutemy: how is it useful for consumers? rfscholte: it is not build-related. e.g. even the pom of a jar can be used as bom. Maven allows it, so we should not try to simply remove it. hboutemy: in theory, yes. In practice, people write super poms for that. And yes, that's a small new constraint to add: you must create a bom when you want a bom. Removing this from normal poms will just make clear that it's not used in normal dependencies, which are 99% of the time | |
<dependencies> | without system scope | system scoped dependencies removed in consumer POM (as done in flatten-maven-plugin) + import scope removed, since importing dependencyManagement is a build-only feature rfscholte: all dependency-related segments should stay, if we allow system-scope at build-time, it must also be consumable. hboutemy: ok, why not, I won't fight on this one (not a good practice, but that's life) |
<repositories> | need to check if repositories configured in dependencies are used during resolution rfscholte: all dependency-related segments should stay hboutemy: how is it useful for consumers? rfscholte: required if dependencies need to be downloaded from a different repository. hboutemy: the question is: is it really used currently? (to avoid the dependencyManagement effect: people think it is used in dependencies, but it is not...) | |
<pluginRepositories> | ||
<build> | this is where the addition of new configuration to enhance Maven build features will be the most useful | |
<reports/> | let's remove this old Maven 1 compatibility field... | |
<reporting> | ||
<profiles> | ||
<id/> | ||
<activation> | keep JDK and OS activation only? removing other activations, which are build time. Same as flatten-maven-plugin feature | |
<dependencies> | ||
<build> | since removed from base model | all dependency-related segments should stay |
Consumer POM fields that can be inferred or must be updated from build POM
Another approach is to have a build POM with properties or fields inferred from disk, which are filled in automatically when deriving the consumer POM:
- project.version from sub-modules can be inferred from disk: no need to write in build-POM, the value can be added when generating consumer POM
- project.version containing properties can be transformed to exact values in consumer-POM, particularly in CI-friendly case
- project.dependency.version can be skipped in build POM when it's a reactor-internal reference
First Step: Maven 3.7.0 build POM
As a first step to test consequences of differentiating build from consumer POM, a choice on a few differences has been made:
- removal of project.version in build POM: in case the <parent/> is located at its relativePath (default: ../pom.xml), the version can be removed from build POM. groupId and artifactId are still required to ensure it is being matched with the right parent.
- removal of dependency versions in build POM: dependencies that are part of the reactor don't need a version anymore
- CI-friendly placeholders in versions (${sha1}, ${revision}, ${changelist}) in build POM will be resolved in consumer POM
- <modules> from <project> will be removed from consumer POM
- <relativePath> from <parent> will be removed from consumer POM
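The differences listed above can be sketched with a child module's build POM. This is an illustration under assumptions, not actual Maven 3.7.0 output: the coordinates (com.example, demo-parent, demo-app, demo-core) are hypothetical, and the omitted version is read here as the one inside <parent>.

```xml
<!-- Build POM of module "demo-app", located next to its parent at ../pom.xml.
     Coordinates are made up for the example. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <!-- groupId and artifactId stay, to match the right parent -->
    <groupId>com.example</groupId>
    <artifactId>demo-parent</artifactId>
    <!-- no <version>: assumed to be inferred from the parent found at relativePath -->
  </parent>
  <artifactId>demo-app</artifactId>

  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>demo-core</artifactId>
      <!-- no <version>: demo-core is assumed to be a module of the same reactor -->
    </dependency>
  </dependencies>
</project>
```

The consumer POM generated from such a file would be expected to contain explicit versions for both the parent and the demo-core dependency, and no <relativePath>, so that it stays a valid POM v4 for tools that never see the reactor.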
Future Steps
We'll define precisely in the future whether other fields should be removed from the consumer POM, or whether other improvements to the build POM can be made without adding new fields.
5 Comments
Robert Scholte
A small explanation: up until now we just have THE pom.xml, which is used for 2 things: locally for the build instructions, remotely for at least dependency resolution, but also as a meta-file for information about the project.
As long as the same file is used, it blocks improvements of build instructions. The first step we need to take is to split it up: have a build pom for local usage, while the consumer pom is the one being deployed and used by all other tools.
This page wants to go one step further: optimize the consumer pom. IMHO it is not up to Maven to decide which information should be removed. The only things which are potentially removable are (build)-plugins, report-plugins and plugin-management because these are instructions; everything else can be useful information for others. In general, people should use the flatten-maven-plugin to strip elements from the pom they don't want to publish; Maven should not do that.
Pom model 4.0.0 cannot be fully optimized, so that's why we want to have the PDT file.
Robert Scholte
Looking at the discussion, the fundamental difference is: hboutemy wants to keep dependency information only, whereas rfscholte wants to remove pure Maven-specific build information only and use the future PDT file as the dependency-information-only file.
With Maven Central and the widespread usage of the pom model 4.0.0 we cannot be sure for every element if and how it is used. For that reason I am less aggressive in removing elements. I don't want Maven to end up in a situation where it is blamed for removing too much and crippling other systems or tools. Maven plugins are only called by Maven, hence we know best what to do with them.
Herve Boutemy
no no: yes only on the fact that there is a misunderstanding.
And since there are no answers to the questions meant to really dig into possible problems, but instead vague general assertions, the discussion goes nowhere.
I'll continue the study with the useful feedback I have had until now: we'll see if understanding comes from a demo.
Christian Schulte
It's been some time since the PDT file discussion took place on dev@. I'd like to add here that one of the reasons we discussed the need for PDT files is that we want to deploy never-changing resolution results instead of resolution "recipes" that rely on Maven, when consuming, to resolve everything again the same way it did during the build. We currently cannot fix resolution-related bugs or introduce different resolution strategies. IMHO we should stop thinking about model version 4.0.0 and should keep things the way they are. Users can already use the flatten plugin. What we should do is focus on the PDT file idea and on a new build POM. Maybe even stop thinking about XML. The PDT file can be some binary file format using some compression technique etc.
Herve Boutemy
we do agree on the target (without going into technical implementation details): I'm just trying to get a first implementation that requires less work, by using a generated, cleaned POM v4 and flatten-maven-plugin before going to the next step where the generation will also produce PDTs. This first step would provide experience before being more ambitious with the new PDT format.