Java uses the JAR file format for packaging application components. Originally used simply for packaging classes and their associated resources, it is now used for package types that allow embedding of other packages such as:

  • Web applications (WAR files) that may contain JAR files with classes and/or web fragments
  • Resource adapters (J2CA RAR files) that may contain JAR files with classes or native libraries
  • Enterprise archives (EAR files) that may contain JAR files, WARs or RARs (with their embedded JARs and libraries)

This nesting is typically handled by expanding the packages onto the filesystem, where they can be accessed using the standard JDK APIs. This has the advantage that every resource contained in the package can be identified by a URL using a scheme supported directly by the JDK (either the "file" protocol or the "jar" protocol). However, it requires a writable filesystem with space to hold the extracted packages, and the extraction takes time to perform.

To avoid unpacking the archive, alternative mechanisms have been built that use custom URLs and ClassLoader implementations to access the archive's content. Examples of these are the "jndi" scheme used in previous versions of Tomcat and the "onejar" scheme used by the One-Jar project. These custom schemes may not be recognized by framework libraries and may be handled incorrectly or inefficiently. This is compounded by schemes deriving from the "jar" scheme, with its use of non-hierarchical URIs that require special handling.

This proposal explores an alternative implementation based on the use of the NIO FileSystem library introduced in Java 7.

A prototype implementation is available in Tomcat's sandbox at http://svn.apache.org/viewvc/tomcat/sandbox/niofs/

Requirements

The design is predicated on the ability to create a FileSystem that provides a fully-functional view of an archive's content from a Path referring to that archive. Paths to entries in that FileSystem may be used as the basis for other archive FileSystems. Essentially, an archive can be mounted as a FileSystem and any archives it contains can in turn be mounted to form a nested hierarchy of FileSystems.

Functional Requirements

  • A FileSystem view of an archive may be created by calling the newFileSystem(Path) method on the provider.
    • The FileSystem underlying the Path must support random access via the SeekableByteChannel returned from newByteChannel()
  • The provider's newByteChannel() operation must return a SeekableByteChannel that supports random access
  • A FileSystem view of an archive may be created by calling the newFileSystem(URI) method on the provider.
    • The URI must be able to be converted to a Path using the Paths.get(URI) API.
    • The FileSystem backing such a Path must meet the constraints defined for newFileSystem(Path)
  • The URIs for Paths returned by the provider must use standard URI syntax and support resolving of relative references

Non-Functional Requirements

  • The provider will be identified by the URI scheme "archive"
  • The provider should avoid unnecessary buffering of data in memory or on disk
    • Buffering modes should be configurable by the user
  • Performance should be comparable to that achievable by extracting the archive to disk
    • Mount performance should be comparable to the time and resources taken to extract the archive's content
    • File open performance should be comparable to the time taken to open a file on the default filesystem
    • File read performance should be comparable to the time taken to read from a file on the default filesystem
    • File seek performance should be comparable to the time taken to position within a file on the default filesystem

Implementation

Zip Structure

PKWARE's documentation on the format can be found at http://www.pkware.com/documents/casestudies/APPNOTE.TXT

A Zip file is organized as a series of file entries, each consisting of a header followed by data, followed by a series of "central directory" entries that reference the individual file entries, followed by an "end of central directory" (EOCD) record that can be used to locate the central directory. An application wishing to access a random entry must work backwards from the end of the file to locate the EOCD record, seek to and scan the central directory entries, then seek to the individual file entry.
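The backwards search for the EOCD record can be sketched as follows. This is a minimal illustration rather than the prototype's code: the class name EocdLocator is hypothetical, the field offsets are those given in PKWARE's APPNOTE, and Zip64 archives and multi-disk archives are not handled.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical helper (not part of the prototype): locates the EOCD record of a Zip archive. */
final class EocdLocator {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int EOCD_MIN_SIZE = 22;         // fixed part of the record
    private static final int MAX_COMMENT_SIZE = 0xFFFF;  // comment length is a 16-bit field

    /** Returns { total entry count, central directory size, central directory offset }. */
    static long[] locate(SeekableByteChannel channel) throws IOException {
        long size = channel.size();
        if (size < EOCD_MIN_SIZE) {
            throw new IOException("Too short to be a Zip archive");
        }
        // The record must start within the last 22 + 65535 bytes, so only that tail is read.
        int tailLength = (int) Math.min(size, (long) EOCD_MIN_SIZE + MAX_COMMENT_SIZE);
        ByteBuffer tail = ByteBuffer.allocate(tailLength).order(ByteOrder.LITTLE_ENDIAN);
        channel.position(size - tailLength);
        while (tail.hasRemaining()) {
            if (channel.read(tail) < 0) {
                throw new EOFException("Archive truncated while reading its tail");
            }
        }
        // Scan backwards for the signature. A match is accepted only if the comment
        // length it declares runs exactly to the end of the file.
        for (int i = tailLength - EOCD_MIN_SIZE; i >= 0; i--) {
            if (tail.getInt(i) != EOCD_SIGNATURE) {
                continue;
            }
            int commentLength = tail.getShort(i + 20) & 0xFFFF;
            if (i + EOCD_MIN_SIZE + commentLength != tailLength) {
                continue;
            }
            long entries = tail.getShort(i + 10) & 0xFFFF;
            long directorySize = tail.getInt(i + 12) & 0xFFFFFFFFL;
            long directoryOffset = tail.getInt(i + 16) & 0xFFFFFFFFL;
            return new long[] { entries, directorySize, directoryOffset };
        }
        throw new IOException("End of central directory record not found");
    }
}
```

A caller would then position the channel at the returned directory offset and parse that many central directory entries. Because only a SeekableByteChannel is needed, the same code works for an archive nested inside another FileSystem, provided that FileSystem supports random access.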

Individual file entries may be uncompressed (i.e. STORED) or compressed using the DEFLATE algorithm; the Zip format allows other methods, but the JDK only supports DEFLATE. Data in STORED entries may be accessed directly once the entry's offset within the archive has been retrieved from the central directory entry. However, DEFLATE stores data as a series of blocks of unknown length, so positioning within a deflated entry may involve following the block chain from the beginning.
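To illustrate direct access to a STORED entry, the sketch below computes where the entry's data begins and reads a byte range from it. The class name StoredEntryReader is hypothetical, and the local header offset is assumed to have been taken from the entry's central directory record. For brevity the requested range is not checked against the entry's size.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical helper: random access into a STORED (uncompressed) Zip entry. */
final class StoredEntryReader {
    private static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;
    private static final int LOCAL_HEADER_SIZE = 30; // fixed part, before name and extra field
    private static final int METHOD_STORED = 0;

    /** Returns the absolute offset of the first data byte of the entry. */
    static long dataOffset(SeekableByteChannel channel, long localHeaderOffset) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(LOCAL_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        channel.position(localHeaderOffset);
        readFully(channel, header);
        if (header.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IOException("No local file header at offset " + localHeaderOffset);
        }
        if ((header.getShort(8) & 0xFFFF) != METHOD_STORED) {
            throw new IOException("Entry is not STORED");
        }
        int nameLength = header.getShort(26) & 0xFFFF;
        int extraLength = header.getShort(28) & 0xFFFF;
        // The data follows the fixed header, the file name and the extra field.
        return localHeaderOffset + LOCAL_HEADER_SIZE + nameLength + extraLength;
    }

    /** Reads length bytes starting at the given position within the entry's data. */
    static byte[] read(SeekableByteChannel channel, long localHeaderOffset,
                       long position, int length) throws IOException {
        ByteBuffer data = ByteBuffer.allocate(length);
        channel.position(dataOffset(channel, localHeaderOffset) + position);
        readFully(channel, data);
        return data.array();
    }

    private static void readFully(SeekableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) < 0) {
                throw new EOFException("Unexpected end of archive");
            }
        }
    }
}
```

No equivalent shortcut exists for a deflated entry: reaching an arbitrary position there means inflating from the start of the entry, or from a previously recorded restart point if an index of such points is kept.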

Zip files may or may not contain entries corresponding to folders in the filesystem. This is typically transparent to applications using a ClassLoader to load classes or resources, but to provide a FileSystem view these nodes must be synthesized if not present.
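One way to synthesize the missing directory nodes is to derive them from the entry names read from the central directory. The sketch below is only an illustration of that idea; the class and method names are hypothetical, and entry names are assumed to use "/" as the separator with no leading slash, as the Zip format specifies.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: derives the directories implied by a set of Zip entry names. */
final class DirectorySynthesizer {
    /**
     * Returns every directory (with a trailing "/") that must exist for the given
     * entry names to form a complete tree, whether or not the archive already
     * contains an explicit entry for it.
     */
    static SortedSet<String> impliedDirectories(Collection<String> entryNames) {
        SortedSet<String> directories = new TreeSet<>();
        for (String name : entryNames) {
            // Each "/" in the name closes one ancestor directory.
            int slash = name.indexOf('/');
            while (slash >= 0) {
                directories.add(name.substring(0, slash + 1));
                slash = name.indexOf('/', slash + 1);
            }
        }
        return directories;
    }
}
```

For the names com/example/Main.class and META-INF/MANIFEST.MF this yields META-INF/, com/ and com/example/. A provider could subtract the directories that do have explicit entries and create synthetic nodes for the remainder.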

Zip files may contain "zombie" entries that are not referenced from the central directory. These can be created when a Zip file is updated to replace or remove items. An application that sequentially scans a Zip file may handle these incorrectly (returning the older or deleted entry) unless it continues to scan the entire archive to verify that the entry still appears in the central directory; because that is inherently inefficient, most do not. In practice, application packages are generally not modified after the initial build, so this error is unlikely.

Zip files may contain data in addition to the archive entries such as executable code for self-extracting archives or text comments describing the archive.

Zip Indexing

TBD

URI Structure

The FileSystem API allows a Path to be converted to a URI and a URI to be converted to a Path. The scheme component of the URI is used to identify the FileSystemProvider but the remaining components are provider specific. The Path.toUri() method mentions that the URI may be used to encode the URI of the enclosing FileSystem but does not define how that should be done. We define a mechanism based on encoding the parent in the authority component of a hierarchical URI.

Following RFC 3986, the format for a Path's URI will be:

uri           = scheme ":" "//" authority 1*( "/" segment )
authority     = host
host          = reg-name
reg-name      = *( unreserved / pct-encoded / sub-delims )
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
pct-encoded   = "%" HEXDIG HEXDIG

where the authority component is used to contain the URI of the Path to the archive. Any characters not allowed in a reg-name (specifically including "/" and ":") will be pct-encoded.

As an example, the entry com/example/Main.class located in the archive /tmp/example.jar, whose Path has the URI file:///tmp/example.jar, would have the URI

archive://file%3a%2f%2f%2ftmp%2fexample.jar/com/example/Main.class

If that JAR was included in the WEB-INF/lib directory of a WAR located at /tmp/example.war, the full compound URI of the Main class would be

archive://archive%3a%2f%2ffile%253a%252f%252f%252ftmp%252fexample.war%2fWEB-INF%2flib%2fexample.jar/com/example/Main.class
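The encoding rule can be sketched in a few lines of Java. This is an illustration of the grammar above, not the prototype's code: the class ArchiveUris and its methods are hypothetical, lowercase hex digits are chosen only to match the examples, and the entry name is assumed to contain only characters that are already legal in a URI path.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds "archive" URIs with the parent URI in the authority. */
final class ArchiveUris {
    // Characters a reg-name may contain unescaped: unreserved plus sub-delims.
    private static final String REG_NAME_CHARS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$&'()*+,;=";
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Pct-encodes every byte of the parent URI that is not allowed in a reg-name. */
    static String encodeAuthority(String parentUri) {
        StringBuilder encoded = new StringBuilder();
        for (byte b : parentUri.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c < 0x80 && REG_NAME_CHARS.indexOf(c) >= 0) {
                encoded.append((char) c);
            } else {
                // "%" itself is encoded as %25, which is what makes nesting unambiguous.
                encoded.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return encoded.toString();
    }

    /** Returns the URI of an entry inside the archive identified by parentUri. */
    static String entryUri(String parentUri, String entryName) {
        return "archive://" + encodeAuthority(parentUri) + "/" + entryName;
    }
}
```

Calling entryUri twice, first for the JAR inside the WAR and then for the class inside that JAR, produces the doubly encoded form: each level of nesting adds one more layer of "%25" escaping to the levels beneath it. Because the result is a hierarchical URI, decoding one level is simply a matter of reading the authority component.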

URL Support

URL support is required to allow references to resources to be returned by a ClassLoader or ServletContext. To enable Path URIs to be converted to URLs, a URLStreamHandlerFactory that supports the URI's scheme is required. When connecting, the URLStreamHandler can convert the URL to a URI and then to a Path.
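A handler of that kind might look like the sketch below. The class name PathURLStreamHandler is hypothetical and the code is not taken from the prototype. It relies only on Paths.get(URI) finding an installed FileSystemProvider for the URL's scheme, so it makes no assumptions about the provider itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical handler: resolves a URL to a Path and streams the Path's content. */
final class PathURLStreamHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        final Path path;
        try {
            // URL -> URI -> Path; the provider is chosen by the URI's scheme.
            path = Paths.get(url.toURI());
        } catch (URISyntaxException | FileSystemNotFoundException | IllegalArgumentException e) {
            throw new IOException("Cannot resolve " + url + " to a Path", e);
        }
        return new URLConnection(url) {
            @Override
            public void connect() {
                connected = true; // nothing to establish until the stream is opened
            }

            @Override
            public InputStream getInputStream() throws IOException {
                connect();
                return Files.newInputStream(path);
            }
        };
    }
}
```

A URLStreamHandlerFactory would return an instance of this handler for the "archive" scheme and null for every other scheme, leaving those to the JDK's built-in handlers.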

URLClassLoader Support

Classes may be loaded from an archive by converting the Path of its root directory to a URL and using a URLClassLoader:

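  // Assumes the usual imports from java.net, java.nio.file and java.util.
  Path archivePath = ... ; // path to archive
  // The archive's own URI is carried in the authority component of the "archive" URI.
  URI archiveURI = new URI("archive", archivePath.toUri().toString(), null, null, null);
  // An empty environment map is passed rather than null.
  FileSystem archiveFS = FileSystems.newFileSystem(archiveURI, Collections.<String, Object>emptyMap());
  URL rootURL = archiveFS.getPath("/").toUri().toURL();
  ClassLoader loader = new URLClassLoader(new URL[] {rootURL});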

ToDos

  • a way to read the zip’s central directory
  • a way to seek into a deflated zip entry without inflating the entire thing
  • is a ClassLoader from a list of Path helpful?
  • how to deal with the locking model on the Windows platform
  • how to work with Paths that are directories - do we get this for free?
  • how to use the WatchService to detect changes e.g. web.xml or *.jsp touched?
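To make the question about a Path-based ClassLoader more concrete, the following sketch shows roughly what such a loader could look like. It is a thought experiment, not part of the prototype: the name PathClassLoader is hypothetical, and it ignores packages, manifests, code sources and concurrency.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader: finds classes and resources under a list of root Paths. */
final class PathClassLoader extends ClassLoader {
    private final List<Path> roots;

    PathClassLoader(List<Path> roots, ClassLoader parent) {
        super(parent);
        this.roots = new ArrayList<>(roots);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = locate(name.replace('.', '/') + ".class");
        if (file == null) {
            throw new ClassNotFoundException(name);
        }
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    @Override
    protected URL findResource(String name) {
        Path file = locate(name);
        if (file == null) {
            return null;
        }
        try {
            return file.toUri().toURL();
        } catch (MalformedURLException e) {
            return null; // no URLStreamHandler is registered for the Path's scheme
        }
    }

    /** Returns the first regular file with the given relative name, or null. */
    private Path locate(String relativeName) {
        for (Path root : roots) {
            Path candidate = root.resolve(relativeName);
            if (Files.isRegularFile(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

Class loading itself needs no URL support here, since the bytes are read straight from the Path. Resource lookup still depends on it, because getResource must return a URL. Roots that are plain directories work without extra code, which suggests the answer to the directories question may be "yes".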

Performance Measurements

TBD

Limitations in standard JDK APIs

Zip Handling

The JDK APIs dealing with Zip archives have not been updated to work with the NIO File APIs:

  • ZipFile's constructor only accepts a java.io.File or a String relating to a file on the default filesystem
  • A zip entry may only be accessed as a sequential InputStream rather than a SeekableByteChannel
  • A ZipInputStream may only be constructed over an InputStream rather than a SeekableByteChannel
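The practical effect of these limitations is that an archive reached through an arbitrary Path can only be read sequentially with the standard classes. The sketch below shows that workaround; the class name SequentialZipReader is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: lists the entries of an archive on any FileSystem, sequentially. */
final class SequentialZipReader {
    static List<String> entryNames(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        // The channel has to be wrapped in an InputStream, which discards its ability to seek.
        try (InputStream in = Channels.newInputStream(Files.newByteChannel(archive));
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

Because ZipInputStream walks the local file headers from the start of the archive and never consults the central directory, finding one entry costs a scan of everything stored before it, and entries that have been dropped from the central directory are still reported.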

The JDK implementation of Zip support uses the native zlib library and maps the archive into memory for direct access and performance. This has implications:

  • The archive content must be accessible from native code
  • Memory mapping a file on some operating systems (e.g. Microsoft Windows) asserts a mandatory file lock which interferes with the "overwrite to re-deploy" mechanism often used in development environments

URL Support

The jar scheme syntax is now formally defined as:

jar:<url>!/[<entry>]

The JDK libraries such as JarURLConnection do not permit the <url> component to be another jar: URL; nesting is specifically not supported.

As this does not comply with the syntax rules for standard hierarchical URIs, custom parsing code is required in order to perform URL manipulation. For example, to resolve a relative URI such as a class reference, the jar: URL must be parsed to extract and manipulate the <entry> component.
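The kind of custom parsing involved can be sketched as follows. The helper JarUrls is hypothetical; it splits the URL at the first "!/" separator, resolves the reference against the entry part only, and reassembles the result.

```java
import java.net.URI;

/** Hypothetical helper: resolves a relative reference against a jar: URL. */
final class JarUrls {
    static String resolve(String jarUrl, String relative) {
        int separator = jarUrl.indexOf("!/");
        if (!jarUrl.startsWith("jar:") || separator < 0) {
            throw new IllegalArgumentException("Not a jar: URL: " + jarUrl);
        }
        // Everything up to and including "!" identifies the archive and is left untouched.
        String archivePart = jarUrl.substring(0, separator + 1);
        // The remainder, starting at "/", is an ordinary hierarchical path.
        URI entry = URI.create(jarUrl.substring(separator + 1));
        return archivePart + entry.resolve(relative);
    }
}
```

The helper is needed because java.net.URI treats a jar: URI as opaque: calling resolve("Other.class") directly on such a URI returns just Other.class, with the archive part lost.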

JarURLConnection's getJarFile API returns a JarFile, which has the same issues described under Zip Handling above.

Built-in "jar" FileSystemProvider

To provide an illustrative example of a FileSystemProvider, Sun/Oracle released a demo "ZipFS" for working with Zip archives and a version of this is included in the JDK. This implementation inherits some of the limitations from above:

  • The archive must be located on the default FileSystem
  • It uses "jar:" URIs and does not support nesting
  • The SeekableByteChannel returned by newByteChannel does not support seek operations
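For reference, the built-in provider can be exercised as in the sketch below, which mounts an archive from the default FileSystem and reports the URI and content of one entry. The class name BuiltInZipFs is hypothetical; the NIO calls are the standard java.nio.file API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo of the JDK's built-in Zip FileSystemProvider. */
final class BuiltInZipFs {
    /** Returns { URI of the entry, entry content decoded as UTF-8 }. */
    static String[] describeEntry(Path archive, String entryName) throws IOException {
        // The cast selects the (Path, ClassLoader) overload on JDKs that also
        // declare a (Path, Map) overload.
        try (FileSystem zipfs = FileSystems.newFileSystem(archive, (ClassLoader) null)) {
            Path entry = zipfs.getPath(entryName);
            String uri = entry.toUri().toString();
            String content = new String(Files.readAllBytes(entry), StandardCharsets.UTF_8);
            return new String[] { uri, content };
        }
    }
}
```

The URI returned for an entry uses the non-hierarchical jar: form, which is the syntax this proposal aims to replace with a hierarchical one.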