Comment: Found a more direct method of extracting files and simplified the code example

...

Try using VFS to read the content of a compressed (gz) file inside of a tar file. Extract the objects from the tar file and, if they are gzip files, decompress them. The directory structure in the tar file is not preserved: the contents are pulled out to the same location regardless of directory hierarchy (for the purposes of this example, all objects in the tar file have unique names, so there are no file name conflicts).

Use a two-phase approach.

  1. look at each of the files in the tar file

...

  1. if it's a directory, recursively process it, otherwise
    • if it's a non-gzipped file, extract it to a file
    • if it's a gzipped file, decompress gzipped content to file

Conceptually there is a tar file:

No Format
archive.tar
 +- tardir/
     +- content.txt.gz
     +- non-gzip.txt

I'd like to end up with the uncompressed files "content.txt" and "non-gzip.txt".
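The mapping from tar entries to the desired output files can be sketched in plain Java, independent of VFS. The class `EntryPlan` and its methods are hypothetical names introduced only for illustration; the sketch assumes entry paths use `/` separators and that gzipped entries are recognized by a `.gz` suffix.

```java
/** Hypothetical helper (not part of commons-vfs): plans what to do with one tar entry. */
final class EntryPlan {
    enum Action { RECURSE, COPY, DECOMPRESS }

    /** Choose the action for an entry, given whether it is a directory. */
    static Action actionFor(String entryPath, boolean isDirectory) {
        if (isDirectory) {
            return Action.RECURSE;
        }
        return entryPath.endsWith(".gz") ? Action.DECOMPRESS : Action.COPY;
    }

    /** Flattened output name: drop the directory part and any trailing ".gz". */
    static String outputName(String entryPath) {
        String base = entryPath.substring(entryPath.lastIndexOf('/') + 1);
        return base.endsWith(".gz") ? base.substring(0, base.length() - 3) : base;
    }

    public static void main(String[] args) {
        String[] entries = { "tardir/content.txt.gz", "tardir/non-gzip.txt" };
        for (String entry : entries) {
            System.out.println(entry + " -> " + actionFor(entry, false)
                    + " -> " + outputName(entry));
        }
    }
}
```

Because directories are dropped from the output name, two entries with the same base name in different directories would collide, which is why the example assumes unique names.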

Sample data file

Create this sample archive.tar file with some (unix) commands along the lines of:

No Format
ls -l > content.txt
gzip content.txt
ls -l > non-gzip.txt
mkdir tardir
mv content.txt.gz non-gzip.txt tardir
tar cvf archive.tar tardir
rm -r tardir

The contents of the content.txt and non-gzip.txt files are just directory listings; dump in anything you want here. For this example the sample archive.tar is located in the /extra/data/tryVfs directory. You can see that hardcoded in the Java example below. The content.txt and non-gzip.txt files will be extracted into the same location.

...

No Format
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>gov.noaa.eds</groupId>
    <artifactId>tryVfs</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Try apache commons vfs</name>
    <url>http://maven.apache.org</url>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.5</source>
                    <target>1.5</target>
                </configuration>
            </plugin>
            <plugin>
                <!-- Usage: mvn assembly:assembly -->
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>gov.noaa.eds.tryVfs.ExtractFromGzipInTar</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>commons-vfs</groupId>
            <artifactId>commons-vfs</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

...

Content of src/main/java/gov/noaa/eds/tryVfs/ExtractFromGzipInTar.java

No Format
/*
 * ExtractFromGzipInTar.java
 */
package gov.noaa.eds.tryVfs;

import org.apache.commons.vfs.AllFileSelector;
import org.apache.commons.vfs.FileName;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemException;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.FileType;
import org.apache.commons.vfs.FileTypeSelector;
import org.apache.commons.vfs.VFS;
import org.apache.commons.vfs.provider.local.LocalFile;

/**
 * Try using VFS to read the content of a compressed (gz) file inside of
 * a tar file. Extract tar file objects. If they are gzip files, decompress them.
 * Any directory structure in the tarfile is not being preserved, the contents
 * are pulled out to the same location regardless of directory hierarchy (for
 * the purposes of this example, all objects in the tar file have unique names,
 * so there are no file name conflicts).
 *
 * @author Ken Tanaka
 */
public class ExtractFromGzipInTar 
{
    FileSystemManager fsManager = null;
    static String extractDirname = "/extra/data/tryVfs";
    
    /**
     * Extract files from a tar file. If a file in the archive is gzipped,
     * write its decompressed content instead of the gzipped file.
     * @param args command line arguments are currently not used
     */
    public static void main( String[] args )
    {
        ExtractFromGzipInTar extract = new ExtractFromGzipInTar();
        
        try {
            extract.fsManager = VFS.getManager();
        } catch (FileSystemException ex) {
            throw new RuntimeException("failed to get fsManager from VFS", ex);
        }
        
        /* Create a tarFile FileObject to connect to the tarfile on disk */
        FileObject tarFile;
        try {
            String tarName = new String("tar:file://" + extractDirname + "/archive.tar");
            System.out.println("Resolve " + tarName);
            tarFile = extract.fsManager.resolveFile(tarName);
            
            FileName tarFileName = tarFile.getName();
            System.out.println("  Path     : " + tarFileName.getPath());
            System.out.println("  URI      : " + tarFileName.getURI());
        } catch (Exception ex) {
            throw new RuntimeException("failed to open tar file ", ex);
        }
        
        /* Work on files inside tarFile */
        FileObject[] children;
        try {
            children = tarFile.getChildren();
        } catch (FileSystemException ex) {
            throw new RuntimeException("failed to get contents of tarfile ", ex);
        }
        
        for (FileObject f : children) {
            extract.processChild(f);
        }
    } // main( String[] args )
    
    private void processChild(FileObject f) {
        try {
            if (f.getType() == FileType.FOLDER) {
                // Recursively process files in this folder
                FileObject[] children = f.getChildren();
                for (FileObject subfile : children) {
                    processChild(subfile);
                }
            } else {
                FileName fname = f.getName();
                String extractName = new String("file://" + extractDirname + "/" 
                        + fname.getBaseName());
                System.out.println("Extracting " + extractName);
                LocalFile extractFile = (LocalFile) this.fsManager.resolveFile(extractName);
                
                // if the file is gzipped, decompress it
                if (extractFile.getName().getExtension().equals("gz")) {
                    System.out.println("Decompressing " + extractName);
                    
                    // The uncompressed filename we seek
                    // content.txt
                    // (the dot is escaped so only a literal ".gz" suffix is stripped)
                    String fileName = extractFile.getName().getBaseName().replaceAll("\\.gz$", "");
                    
                    // Build the direct path to the uncompressed content of the 
                    // gzip file in the tar file.
                    // gz:tar:file:///archive.tar!/tardir/content.txt.gz!content.txt
                    String gzName = new String("gz:" + fname.getURI() + "!" + fileName);
                    FileObject gzFile = this.fsManager.resolveFile(gzName);
                    
                    // The decompressed path we want
                    String decompName = new String("file://" + extractDirname + "/" 
                            + fileName);
                    LocalFile decompFile = (LocalFile) this.fsManager.resolveFile(decompName);
                    
                    // Some debug lines
                    System.out.println("fileName   =" + fileName);
                    System.out.println("decompName =" + decompName);
                    System.out.println("gzName=" + gzName);
                    
                    // Extracting
                    decompFile.copyFrom(gzFile, new FileTypeSelector(FileType.FILE));
                } else {
                    // just extract the non-gzip file
                    extractFile.copyFrom(f, new AllFileSelector());
                }
            }
        } catch (FileSystemException ex) {
            ex.printStackTrace();
            throw new RuntimeException("Error working on tarfile object " + f.getName(), ex);
        }
    } // processChild(FileObject f)
}
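The layered name that the example resolves for the gzipped entry is built by plain string concatenation, which the following sketch isolates. `LayeredName` and its methods are hypothetical helpers, not part of commons-vfs; they only assemble strings in the `gz:tar:file://...!entry!name` shape used above and do not check that any file system can resolve them.

```java
/** Hypothetical helper that assembles layered VFS-style names as plain strings. */
final class LayeredName {
    /** Name of an entry inside a tar file: tar:file://<archivePath>!<entryPath> */
    static String tarEntry(String archivePath, String entryPath) {
        return "tar:file://" + archivePath + "!" + entryPath;
    }

    /** Name of the uncompressed content of a gzip file: gz:<innerUri>!<uncompressedName> */
    static String gzContent(String innerUri, String uncompressedName) {
        return "gz:" + innerUri + "!" + uncompressedName;
    }

    public static void main(String[] args) {
        String entry = tarEntry("/extra/data/tryVfs/archive.tar", "/tardir/content.txt.gz");
        // prints gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
        System.out.println(gzContent(entry, "content.txt"));
    }
}
```

In the example class the inner part comes from `fname.getURI()` rather than being assembled by hand, so a helper like this would matter only when the names are built outside of VFS.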

...

Sample Output

No Format
Nov 7, 2007 12:22:01 PM org.apache.commons.vfs.VfsLog info
INFO: Using "/tmp/vfs_cache" as temporary files store.
Resolve tar:file:///extra/data/tryVfs/archive.tar
  Path     : /
  URI      : tar:file:///extra/data/tryVfs/archive.tar!/
Extracting file:///extra/data/tryVfs/non-gzip.txt
Extracting file:///extra/data/tryVfs/content.txt.gz
Decompressing file:///extra/data/tryVfs/content.txt.gz
fileName   =content.txt
decompName =file:///extra/data/tryVfs/content.txt
gzName=gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt

In addition to the archive.tar file, there should now be content.txt and non-gzip.txt files in the same location.
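One way to sanity-check a decompressed result without VFS is to gunzip the original bytes with the JDK's `java.util.zip` classes and compare. The sketch below is an assumption-laden illustration rather than part of the example project: `GzipCheck` is a hypothetical class, it works on in-memory bytes instead of the files under /extra/data/tryVfs, and it needs Java 7 or later (newer than the 1.5 target in the POM above).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical JDK-only check that gzip content decompresses to the expected bytes. */
final class GzipCheck {
    /** Read a gzip stream to the end and return the decompressed bytes. */
    static byte[] gunzip(InputStream in) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(in);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Compress bytes with gzip; used here only to produce sample input. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "sample directory listing\n".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = gunzip(new ByteArrayInputStream(gzip(original)));
        System.out.println("match=" + Arrays.equals(original, roundTrip));
    }
}
```

To apply the same check to the files from the example, the in-memory streams could be replaced with file streams for content.txt.gz (taken from the tar file) and the extracted content.txt.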