Overview

This recipe uses VFS to read the content of a compressed (gz) file inside a tar file. Objects are extracted from the tar file and, if they are gzip files, decompressed. The directory structure in the tar file is not preserved: the contents are written to the same location regardless of directory hierarchy. (For the purposes of this example, all objects in the tar file have unique names, so there are no file name conflicts.)

Use a two phase approach.

  1. look at each of the files in the tar file
  2. if it's a directory, recursively process it, otherwise
    • if it's a non-gzipped file, extract it to a file
    • if it's a gzipped file, decompress gzipped content to file
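
The traversal described above can be sketched without any VFS calls at all. The following is only an illustration of the decision logic; the class ExtractionPlanSketch, its nested Entry type, and the plan method are hypothetical names invented for this sketch and are not part of Commons VFS or of the example project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: models the traversal decisions, not the VFS calls. */
class ExtractionPlanSketch {

    /** Hypothetical stand-in for an object inside the tar file (not a VFS type). */
    static final class Entry {
        final String name;
        final boolean folder;
        final List<Entry> children;

        private Entry(String name, boolean folder, List<Entry> children) {
            this.name = name;
            this.folder = folder;
            this.children = children;
        }

        static Entry file(String name) {
            return new Entry(name, false, Collections.<Entry>emptyList());
        }

        static Entry folder(String name, Entry... children) {
            return new Entry(name, true, Arrays.asList(children));
        }
    }

    /** Walks the tree and records what would be done with each plain file. */
    static List<String> plan(Entry root) {
        List<String> actions = new ArrayList<String>();
        collect(root, actions);
        return actions;
    }

    private static void collect(Entry entry, List<String> actions) {
        if (entry.folder) {
            // Directory: recursively process its contents.
            for (Entry child : entry.children) {
                collect(child, actions);
            }
        } else if (entry.name.endsWith(".gz")) {
            // Gzipped file: the target name drops the ".gz" suffix.
            String target = entry.name.substring(0, entry.name.length() - ".gz".length());
            actions.add("decompress " + entry.name + " -> " + target);
        } else {
            // Any other file is copied out unchanged.
            actions.add("copy " + entry.name);
        }
    }

    public static void main(String[] args) {
        Entry archive = Entry.folder("archive.tar",
                Entry.folder("tardir",
                        Entry.file("content.txt.gz"),
                        Entry.file("non-gzip.txt")));
        for (String action : plan(archive)) {
            System.out.println(action);
        }
    }
}
```

For the sample layout used in this recipe, the sketch reports one decompress action for content.txt.gz and one copy action for non-gzip.txt. Directory names never appear in the targets, which mirrors the flattening described in the overview.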

Conceptually there is a tar file:

No Format

archive.tar
 +- tardir/
     +- content.txt.gz
     +- non-gzip.txt

I'd like to end up with the uncompressed files "content.txt" and "non-gzip.txt".

Sample data file

Create this sample archive.tar file with some (unix) commands along the lines of:

No Format

ls -l > content.txt
gzip content.txt
ls -l > non-gzip.txt
mkdir tardir
mv content.txt.gz non-gzip.txt tardir
tar cvf archive.tar tardir
rm -r tardir

The contents of the content.txt and non-gzip.txt files are just directory listings; put in anything you want here. For this example the sample archive.tar is located in the /extra/data/tryVfs directory, which is hardcoded in the Java example below. The content.txt and non-gzip.txt files will be extracted into the same location.

Key Concepts

Building the resolveFile 'name' String

An essential ingredient for this "recipe" is the name argument for the FileSystemManager.resolveFile(String name) method. See the lines defining and using String gzName (line numbers 99-101, marked /* line 100 */ in the ExtractFromGzipInTar.java code listing below). The important work of connecting to the content.txt file inside the content.txt.gz file inside the archive.tar file is performed by

No Format

FileSystemManager fsManager = VFS.getManager();
FileObject file = fsManager.resolveFile( "gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt" );

In order to build similar strings for your own purposes, you will need to understand what is going on here. The paths to the file of interest are chained together with the "!" character as a separator. At the same time, the corresponding file system scheme designators ("file:", "tar:" and "gz:") are prepended onto the front in reverse order. Taking this one step at a time, we start with the full path to the archive.tar file (/extra/data/tryVfs/archive.tar), which is accessed through the normal file system, file:

file:///extra/data/tryVfs/archive.tar

Now we will treat the file as a tar: file and navigate inside this archive by appending a "!" and specifying the path /tardir/content.txt.gz.

tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz

Finally we will switch to the gz: file system to read the uncompressed content.txt (again using the "!" separator character)

gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
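
The same chaining can be expressed as plain string handling. This is a minimal sketch that only builds the name strings and does not call VFS; the class LayeredUriSketch and its layer method are hypothetical helpers introduced here for illustration.

```java
/** Illustrative only: builds layered VFS-style names by string concatenation. */
class LayeredUriSketch {

    /**
     * Wraps an existing URI in another file system scheme and appends,
     * after the "!" separator, the path to follow inside it.
     */
    static String layer(String scheme, String innerUri, String pathInside) {
        return scheme + ":" + innerUri + "!" + pathInside;
    }

    public static void main(String[] args) {
        // Step 1: the archive on the normal file system.
        String onDisk = "file:///extra/data/tryVfs/archive.tar";

        // Step 2: treat it as a tar file and navigate to the gzip file inside.
        String inTar = layer("tar", onDisk, "/tardir/content.txt.gz");

        // Step 3: treat that entry as a gzip file and name its content.
        String inGz = layer("gz", inTar, "content.txt");

        System.out.println(inTar);
        System.out.println(inGz);
    }
}
```

Each call adds one scheme at the front and one "!"-separated path at the back, which is why the schemes end up in reverse order relative to the paths.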

Generic Drill-down

On the line marked /* line 90 */ in the listing, gzip files get special attention:

No Format

if (extractFile.getName().getExtension().equals("gz"))

Other types of compression such as zip and bzip2 (as well as nested archives like jar and tar) will not be expanded. To drill down generically and expand zip, bzip2, jar and tar files to arbitrary depth, eliminate the "gz"-specific code and use instead

No Format

if (manager.canCreateFileSystem(extractFile))
{
    FileObject innerFile = manager.createFileSystem(extractFile);
}

pom.xml Project file

This example uses Maven2. There is a pom.xml to define the project:

No Format

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>gov.noaa.eds</groupId>
    <artifactId>tryVfs</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Try apache commons vfs</name>
    <url>http://maven.apache.org</url>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.5</source>
                    <target>1.5</target>
                </configuration>
            </plugin>
            <plugin>
                <!-- Usage: mvn assembly:assembly -->
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>gov.noaa.eds.tryVfs.ExtractFromGzipInTar</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>commons-vfs</groupId>
            <artifactId>commons-vfs</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

Source Code

Content of src/main/java/gov/noaa/eds/tryVfs/ExtractFromGzipInTar.java

No Format

/*
 * ExtractFromGzipInTar.java
 */
package gov.noaa.eds.tryVfs;

import org.apache.commons.vfs.AllFileSelector;
import org.apache.commons.vfs.FileName;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemException;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.FileType;
import org.apache.commons.vfs.FileTypeSelector;
import org.apache.commons.vfs.VFS;
import org.apache.commons.vfs.provider.local.LocalFile;

/**
 * Try using VFS to read the content of a compressed (gz) file inside of
 * a tar file. Extract tar file objects. If they are gzip files, decompress them.
 * Any directory structure in the tarfile is not being preserved, the contents
 * are pulled out to the same location regardless of directory hierarchy (for
 * the purposes of this example, all objects in the tar file have unique names,
 * so there are no file name conflicts).
 *
 * @author Ken Tanaka
 */
public class ExtractFromGzipInTar 
{
    FileSystemManager fsManager = null;
    static String extractDirname = "/extra/data/tryVfs";
    
    /**
     * Extract files from a tar file. If a file in the tar file is gzipped,
     * write out its decompressed content instead of the gzipped file.
     * @param args command line arguments are currently not used
     */
    public static void main( String[] args )
    {
        ExtractFromGzipInTar extract = new ExtractFromGzipInTar();
        
        try {
            extract.fsManager = VFS.getManager();
        } catch (FileSystemException ex) {
            throw new RuntimeException("failed to get fsManager from VFS", ex);
        }
        
        /* Create a tarFile FileObject to connect to the tarfile on disk */
        FileObject tarFile;
        try {
            String tarName = "tar:file://" + extractDirname + "/archive.tar";
            System.out.println("Resolve " + tarName);
            tarFile = extract.fsManager.resolveFile(tarName);
            
            FileName tarFileName = tarFile.getName();
            System.out.println("  Path     : " + tarFileName.getPath());
            System.out.println("  URI      : " + tarFileName.getURI());
        } catch (Exception ex) {
            throw new RuntimeException("failed to open tar file ", ex);
        }
        
        /* Work on files inside tarFile */
        FileObject[] children;
        try {
            children = tarFile.getChildren();
        } catch (FileSystemException ex) {
            throw new RuntimeException("failed to get contents of tarfile ", ex);
        }
        
        for (FileObject f : children) {
            extract.processChild(f);
        }
    } // main( String[] args )
    
    private void processChild(FileObject f) {
        try {
            if (f.getType() == FileType.FOLDER) {
                // Recursively process files in this folder
                FileObject[] children = f.getChildren();
                for (FileObject subfile : children) {
                    processChild(subfile);
                }
            } else {
                FileName fname = f.getName();
                String extractName = "file://" + extractDirname + "/"
                        + fname.getBaseName();
                System.out.println("Extracting " + extractName);
                LocalFile extractFile = (LocalFile) this.fsManager.resolveFile(extractName);
                
                // if the file is gzipped, decompress it
/* line 90 */   if (extractFile.getName().getExtension().equals("gz")) {
                    System.out.println("Decompressing " + extractName);
                    
                    // The uncompressed filename we seek, e.g. content.txt
                    // (the dot is escaped so only a literal ".gz" suffix is removed)
                    String fileName = extractFile.getName().getBaseName().replaceAll("\\.gz$", "");
                    
                    // Build the direct path to the uncompressed content of the 
                    // gzip file in the tar file.
                    //     gz:tar:file:///archive.tar!/tardir/content.txt.gz!content.txt
/* line 100 */      String gzName = "gz:" + fname.getURI() + "!" + fileName;
                    FileObject gzFile = this.fsManager.resolveFile(gzName);
                    
                    // The decompressed path we want
                    String decompName = "file://" + extractDirname + "/"
                            + fileName;
                    LocalFile decompFile = (LocalFile) this.fsManager.resolveFile(decompName);
                    
                    // Some debug lines
                    System.out.println("fileName   =" + fileName);
                    System.out.println("decompName =" + decompName);
                    System.out.println("gzName=" + gzName);
                    
                    // Extracting
                    decompFile.copyFrom(gzFile, new FileTypeSelector(FileType.FILE));
                } else {
                    // just extract the non-gzip file
                    extractFile.copyFrom(f, new AllFileSelector());
                }
            }
        } catch (FileSystemException ex) {
            // Keep the original exception as the cause so the stack trace is not lost
            throw new RuntimeException("Error working on tarfile object " + f.getName(), ex);
        }
    } // processChild(FileObject f)
}

Compiling

Compile the source code with

No Format

mvn assembly:assembly

This will create an executable jar file in the standard target directory.

Running

Use a command like this to run the example

No Format

java -jar target/tryVfs-1.0-SNAPSHOT-jar-with-dependencies.jar

Sample Output

No Format

Nov 7, 2007 12:22:01 PM org.apache.commons.vfs.VfsLog info
INFO: Using "/tmp/vfs_cache" as temporary files store.
Resolve tar:file:///extra/data/tryVfs/archive.tar
  Path     : /
  URI      : tar:file:///extra/data/tryVfs/archive.tar!/
Extracting file:///extra/data/tryVfs/non-gzip.txt
Extracting file:///extra/data/tryVfs/content.txt.gz
Decompressing file:///extra/data/tryVfs/content.txt.gz
fileName   =content.txt
decompName =file:///extra/data/tryVfs/content.txt
gzName=gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt

In addition to the archive.tar file, there should now be content.txt and non-gzip.txt files in the same location.