Status
This RFC is currently in the DRAFT state. Nothing in this RFC has been agreed or confirmed.
Introduction
The remote repository layout defines how Apache Maven, as well as a non-trivial number of third party clients, can access the artifacts that are published to the central repositories as versioned releases of the projects they depend on.
Overview
Projects
The basic unit of organisation in a repository is the project coordinates. The project coordinates consist of the pair groupId:artifactId.
Versions
Each project consists of at least one versioned set of artifacts. The full set of artifacts for any specific version is defined by the coordinates groupId:artifactId:version.
Artifacts
Any specific version of a project will have at least one artifact. The artifacts have the following information associated with them:
- type (mandatory) - this represents the type of artifact.
- classifier (optional) - when absent, the artifact is the primary artifact of that type. When present, it is used to disambiguate additional artifacts. For example, with Java artifacts the main artifact is typically a jar file and the javadocs are also typically packaged as a jar file. The main artifact (containing the .class files etc.) will have no classifier, while the Javadoc artifact will have a classifier of javadoc.
- platformId (optional) - when absent, the artifact either is not tied to any specific platform or is associated with multiple platforms. When present, the artifact will only work on the specific platform. The platformId naming is established by convention based on the type of file / project. Some examples:
  - A project that builds Firefox binaries will need to produce different binaries targeting different systems. One such artifact might be, on disk, a file such as firefox-49.0-2.fc26.i686.rpm. The version would probably be 49.0-2 and the platformId could be fc26.i686 or some derivative (such as fedora-26.i686, fedora.i686 or fedora.x86), because that specific RPM may not work on other versions of Fedora, other CPU architectures or other Linux systems that can support the RPM packaging. Similarly, there may be a firefox_49.0+ubuntu0.12.04.1_i386.deb that may use a DEB specific scheme for deciding the platformId, or it may suffice to use something like ubuntu.i386. It is expected that a convention will be established by the community of users of the repository.
  - A project that builds installers for, say, Apache Tomcat would probably end up producing an RPM that does not have a platformId (corresponding to the noarch RPMs). The Apache Tomcat connector RPMs, however, would have platformIds as they include platform specific code.
  - The JFFI project produces a jar file that bundles the native libraries for implementing its foreign function interface SPI. It would be intended that the JFFI project would not deploy its jar artifact with a platformId, as the jar artifact targets multiple platforms with a single artifact.
  - Regular Java and .NET projects would be expected not to use the platformId, as the artifacts produced by such projects are typically independent of operating system (subject to the availability of their required common runtime), though there are cases where Java and .NET projects may end up producing artifacts that target specific platforms.
  - It may be the case that a Java artifact targets e.g. a specific JavaEE container. In those cases it may make sense to use the container as the platformId, e.g. there may be a jboss and a weblogic variant of the same version of the same project. It is expected in such cases that the major differences between such platform specific artifacts would be the transitive required dependencies.
Every artifact is thus uniquely identified by its coordinates:
groupId:artifactId:platformId:version:classifier:type
For artifacts that do not have a platformId the preferred form of coordinates is:
groupId:artifactId::version:classifier:type
For artifacts that do not have a classifier, the preferred form of coordinates is:
groupId:artifactId:platformId:version::type
For artifacts that do not have either a platformId or a classifier, the preferred form of coordinates is:
groupId:artifactId::version::type
The intermediate :: characters are critical in order to disambiguate platform aware coordinates from the previous styles of coordinates:
groupId:artifactId:version:type
and
groupId:artifactId:version:classifier:type
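The coordinate forms above can be sketched in code. The following is a minimal illustration in Python, not part of the proposal: the names Coordinates, format_coordinates and parse_coordinates are hypothetical, and the parser assumes that a coordinate string with exactly six fields is platform aware while four or five fields indicate one of the previous styles.

```python
from typing import NamedTuple, Optional


class Coordinates(NamedTuple):
    """Hypothetical container for the six coordinate fields."""
    group_id: str
    artifact_id: str
    platform_id: Optional[str]
    version: str
    classifier: Optional[str]
    type: str


def format_coordinates(c: Coordinates) -> str:
    """Render the preferred six-field form, leaving absent fields empty."""
    return ":".join([
        c.group_id,
        c.artifact_id,
        c.platform_id or "",
        c.version,
        c.classifier or "",
        c.type,
    ])


def parse_coordinates(text: str) -> Coordinates:
    """Parse the six-field form as well as the previous four/five-field forms."""
    parts = text.split(":")
    if len(parts) == 6:
        group_id, artifact_id, platform_id, version, classifier, type_ = parts
    elif len(parts) == 5:  # previous style: groupId:artifactId:version:classifier:type
        group_id, artifact_id, version, classifier, type_ = parts
        platform_id = ""
    elif len(parts) == 4:  # previous style: groupId:artifactId:version:type
        group_id, artifact_id, version, type_ = parts
        platform_id = classifier = ""
    else:
        raise ValueError(f"unrecognised coordinates: {text!r}")
    # Empty fields are treated as absent.
    return Coordinates(group_id, artifact_id, platform_id or None,
                       version, classifier or None, type_)
```

Because the empty fields are always written out, a platform aware coordinate always has six fields and so cannot be mistaken for a five-field coordinate of the previous style.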
Repository artifact layout
There have been two previous layouts used for the repository: Maven 1 and Maven 2/3.
The migration from the Maven 1 layout to the current Maven 2/3 layout was problematic and caused a large amount of pain for users. Consequently, there is little appetite for a mass migration of artifacts to a new layout. Thus the new layout will be a superposition on top of the Maven 2/3 layout.
The Maven 2/3 layout mapped artifacts from a groupId:artifactId:version:classifier:type to a repository path using the following scheme:
${groupId.replace('.','/')}/${artifactId}/${version}/${artifactId}-${version}${classifier==null?'':'-'+classifier}.${type}
The new layout will mix the artifactId and platformId together. This scheme allows for better interoperability with older clients of the remote repository that do not understand the platformId concept.
Note: An alternative scheme would be to mix the platformId either with the classifier or with the version. Both of these were rejected:
- Mixing with the classifier: platform specific artifacts are highly likely to have different dependencies. Legacy clients would not be able to consume that information, as the dependency tree of a classifier artifact is the same as the dependency tree of the main artifact in a Model Version 4.0.0 POM (which is all we can assume a legacy client can consume).
- Mixing with the version: there is a strong likelihood that some projects will want to depend on multiple platforms of the same project. For example, projects such as JFFI may want to depend on the native libraries compiled for each platform so that those artifacts can be embedded within the jar file. Legacy clients cannot depend on multiple versions of the same project under the Model Version 4.0.0 graph conflict resolution rules.
The new layout is thus:
${groupId.replace('.','/')}/${artifactId}${platformId==null?'':'-'+platformId}/${version}/${artifactId}${platformId==null?'':'-'+platformId}-${version}${classifier==null?'':'-'+classifier}.${type}
In other words, from the point of view of a legacy client, the platform specific artifacts are available from a different project at groupId:artifactId-platformId. This mirrors the current way that users of the central repository have been handling platform specific artifacts.
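As an illustration only, the new layout can be sketched as a small Python function. The function name repository_path and the example coordinates (org.example, demo, linux-x86_64) are hypothetical. When no platformId is given the result is the same path the Maven 2/3 scheme would produce.

```python
from typing import Optional


def repository_path(group_id: str, artifact_id: str, version: str, type_: str,
                    platform_id: Optional[str] = None,
                    classifier: Optional[str] = None) -> str:
    """Map coordinates to a repository path under the proposed layout."""
    # A legacy client sees this combined name as the artifactId.
    name = artifact_id if platform_id is None else f"{artifact_id}-{platform_id}"
    suffix = "" if classifier is None else f"-{classifier}"
    group_path = group_id.replace(".", "/")
    return f"{group_path}/{name}/{version}/{name}-{version}{suffix}.{type_}"


print(repository_path("org.example", "demo", "1.0", "jar"))
# org/example/demo/1.0/demo-1.0.jar
print(repository_path("org.example", "demo", "1.0", "jar",
                      platform_id="linux-x86_64"))
# org/example/demo-linux-x86_64/1.0/demo-linux-x86_64-1.0.jar
```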
Repository metadata layout
For legacy clients, we need to maintain the current maven-metadata.xml files. For newer clients, however, we will provide a more flexible metadata index using JSON. In order to allow for metadata evolution, the JSON format will be subject to the following restrictions:
- Consumers must ignore unknown keys
- Producers must preserve unknown keys
- Aggregating proxies must merge all keys; where conflicts arise, the aggregating proxy will use a priority list of upstream sources to determine which value will win
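These restrictions could be honoured along the following lines. This is a minimal sketch in Python under stated assumptions: the function names are hypothetical, the upstream documents are supplied highest priority first, and a conflict is resolved by taking the whole value from the highest priority upstream (this RFC does not yet define per-key merge rules).

```python
from typing import Any, Dict, List

# The only keys this hypothetical consumer understands.
KNOWN_KEYS = ("modified", "group", "artifact")


def read_known_keys(document: Dict[str, Any]) -> Dict[str, Any]:
    """Consumer view: unknown keys are ignored rather than rejected."""
    return {key: document[key] for key in KNOWN_KEYS if key in document}


def merge_metadata(upstreams: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Proxy view: merge all keys; `upstreams` is ordered highest priority first."""
    merged: Dict[str, Any] = {}
    for document in upstreams:
        for key, value in document.items():
            # setdefault keeps the value from the earliest (highest priority)
            # upstream and still carries over keys this code knows nothing about.
            merged.setdefault(key, value)
    return merged
```

Note that merge_metadata never inspects the key names, so keys it does not understand are preserved in the merged result.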
The basic format will be something like:
{
  "modified":"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", // always ISO 8601 extended format in UTC timezone
  "group":[ // present if there are any artifacts deployed using the repo path as groupId
    "artifactId",
    "artifactId:platformId",
    "artifactId",
    "artifactId:platformId",
    "artifactId:platformId"
  ],
  "artifact":[ // present if there are any artifacts deployed using the repo path as groupId:artifactId[:platformId]
    "version",
    "version",
    "version",
    "version"
  ],
  "org.apache.maven:plugins":[ // this is a Maven specific key, hence namespaced
    {
      "name":"...",
      "artifactId":"...",
      "prefix":"..."
    },
    {
      "name":"...",
      "artifactId":"...",
      "prefix":"..."
    }
  ]
}
Tool specific keys must be prefixed by the top level groupId of the tool to which they are scoped. Each tool is responsible for the structure within that key and how to handle evolution of that structure.
Some examples:
https://repo.maven.apache.org/maven2/io/github/stephenc/maven/repo-metadata.json
would be:
{
  "modified":"2014-01-16T09:55:43.511Z",
  "group":[
    "rfmm-maven-plugin"
  ],
  "org.apache.maven:plugins":[
    {
      "name": "Release From My Machine Maven Plugin",
      "prefix": "rfmm",
      "artifactId": "rfmm-maven-plugin"
    }
  ]
}
https://repo.maven.apache.org/maven2/io/github/stephenc/maven/rfmm-maven-plugin/repo-metadata.json
would be:
{ "modified":"2014-01-16T09:55:49.243Z", "artifact":[ "1.0" ] }
TODO consider a counter-proposal... the top level keys are the repository "id" and then everything else is as before. This simplifies aggregating proxies and may assist with PDT Repositories as we would then know the IDs of the content from aggregating proxies, e.g.
https://repo.maven.apache.org/maven2/io/github/stephenc/maven/repo-metadata.json
would be:
{
  "central":{
    "modified":"2014-01-16T09:55:43.511Z",
    "group":[
      "rfmm-maven-plugin"
    ],
    "org.apache.maven:plugins":[
      {
        "name": "Release From My Machine Maven Plugin",
        "prefix": "rfmm",
        "artifactId": "rfmm-maven-plugin"
      }
    ]
  }
}
https://repo.maven.apache.org/maven2/io/github/stephenc/maven/rfmm-maven-plugin/repo-metadata.json
would be:
{ "central":{ "modified":"2014-01-16T09:55:49.243Z", "artifact":[ "1.0" ] } }
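Under the counter-proposal, a client that wants a single view would still need to combine the per-repository entries. The sketch below, in Python, assumes the strategies suggested in the comments on this page ("use latest" for modified, "merge as a unique set" for group and artifact). The function name merge_core_keys is hypothetical, and tool specific keys are deliberately left to the tool that owns them.

```python
from typing import Any, Dict


def merge_core_keys(by_repository: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Combine the modified, group and artifact keys across repository ids."""
    documents = list(by_repository.values())
    merged: Dict[str, Any] = {}

    stamps = [d["modified"] for d in documents if "modified" in d]
    if stamps:
        # Fixed-width UTC timestamps sort lexicographically in time order,
        # so max() implements "use latest".
        merged["modified"] = max(stamps)

    for key in ("group", "artifact"):
        seen: Dict[str, None] = {}  # a dict used as an insertion-ordered set
        for d in documents:
            for entry in d.get(key, []):
                seen.setdefault(entry, None)
        if seen:
            merged[key] = list(seen)
    return merged
```

Knowing which repository id contributed each entry is what lets the client apply these rules itself instead of relying on the proxy to have done so.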
3 Comments
Robert Scholte
How about using '+' as separator in both repo layout and coordinate? The '+' is not a valid identifier character, so there won't be any collisions and it is immediately clear which part is the platformId, i.e. groupId:artifactId+platformId:version:classifier:type and org/apache/maven/artifact+myos/1.2.3/artifact+myos.jar
Stephen Connolly
Robert Scholte, so the critical part is that a 4.0.0 consumer must be able to get the platform specific artifacts. If + is not valid in artifactId then a 4.0.0 pom cannot depend on the platform specific artifacts, which is a problem.
If + is valid in artifactId then it is no different than - except we don't even get to follow the existing convention people have been "following" on central (from what I can see).
Stephen Connolly
Considering my counter-proposal... we would still need to define merging strategies for the different keys... but tool specific merging strategies then become only a concern of the tool that consumes the tool specific key, i.e. we only have to document how to merge the modified, group and artifact keys... which would be respectively "use latest", "merge as a unique set" and "merge as a unique set"... with the org.apache.maven:plugins namespaced key... that would be an internal to Maven concern, though we would probably use something like "merge and overwrite values, in reverse order of repository id's configured by the user" so that the "first" defined repository would always "win".
I kind of like this counter-proposal