...
Try using VFS to read the content of a compressed (gz) file inside of a tar file. Extract tar file objects, and if they are gzip files, decompress them. Directory structure in the tar file is not preserved: the contents are pulled out to the same location regardless of directory hierarchy (for the purposes of this example, all objects in the tar file have unique names, so there are no file name conflicts).
Use a two phase approach:
- look at each of the files in the tar file
- if it's a directory, recursively process it; otherwise
  - if it's a non-gzipped file, extract it to a file
  - if it's a gzipped file, decompress the gzipped content to a file
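The per-entry decision in the list above can be modeled without VFS at all. This is a minimal JDK-only sketch. It assumes a hypothetical in-memory `Entry` type (not a VFS class) that stands in for the FileObject tree VFS builds from the tar file. The sketch only records what would be done and does not touch any files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractPlan {
    /** Hypothetical stand-in for a tar entry; not a VFS type. */
    public static final class Entry {
        final String baseName;
        final boolean directory;
        final List<Entry> children;

        private Entry(String baseName, boolean directory, Entry... children) {
            this.baseName = baseName;
            this.directory = directory;
            this.children = Arrays.asList(children);
        }

        public static Entry file(String baseName) {
            return new Entry(baseName, false);
        }

        public static Entry dir(String baseName, Entry... children) {
            return new Entry(baseName, true, children);
        }
    }

    /** Decide what happens to each entry, recursing into directories. */
    public static void process(Entry entry, List<String> plan) {
        if (entry.directory) {
            // Directory: recurse. Its own name is dropped, so the output is flattened.
            for (Entry child : entry.children) {
                process(child, plan);
            }
        } else if (entry.baseName.endsWith(".gz")) {
            // Gzipped file: decompress to the base name without ".gz"
            String target = entry.baseName.substring(0, entry.baseName.length() - 3);
            plan.add("decompress " + entry.baseName + " -> " + target);
        } else {
            // Any other file: extract it unchanged
            plan.add("extract " + entry.baseName);
        }
    }

    public static void main(String[] args) {
        Entry tardir = Entry.dir("tardir",
                Entry.file("content.txt.gz"),
                Entry.file("non-gzip.txt"));
        List<String> plan = new ArrayList<String>();
        process(tardir, plan);
        for (String step : plan) {
            System.out.println(step);
        }
    }
}
```

For the sample layout, this produces "decompress content.txt.gz -> content.txt" followed by "extract non-gzip.txt". Only base names are kept, and that is how the directory hierarchy ends up flattened.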
Conceptually there is a tar file:
No Format |
---|
archive.tar
+- tardir/
   +- content.txt.gz
   +- non-gzip.txt
|
I'd like to end up with the uncompressed files "content.txt" and "non-gzip.txt".
Sample data file
Create this sample archive.tar file with some (Unix) commands along the lines of:
No Format |
---|
ls -l > content.txt
gzip content.txt
ls -l > non-gzip.txt
mkdir tardir
mv content.txt.gz non-gzip.txt tardir
tar cvf archive.tar tardir
rm -r tardir
|
The contents of the content.txt and non-gzip.txt files are just directory listings; put anything you want in them. For this example the sample archive.tar is located in the /extra/data/tryVfs directory, which is hardcoded in the Java example below. The content.txt and non-gzip.txt files will be extracted into the same location.
Key Concepts
Building the resolveFile 'name' String
An essential ingredient for this "recipe" is the name argument for the FileSystemManager.resolveFile(String name) method. You can see it in the lines that define and use String gzName (lines 99-101 of ExtractFromGzipInTar.java, around the /* line 100 */ marker in the code listing below). The important work of connecting to the content.txt file inside the content.txt.gz file inside the archive.tar file is performed by:
No Format |
---|
FileSystemManager fsManager = VFS.getManager();
FileObject file = fsManager.resolveFile( "gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt" );
|
To build similar strings for your own purposes, you need to understand what is going on here. The paths to the file of interest are chained together with the "!" character as a separator. The corresponding file system scheme designators ("file:", "tar:" and "gz:") are prepended to the front in reverse order, so the scheme of the innermost layer ends up leftmost. Taking this one step at a time, we start with the full path to the archive.tar file (/extra/data/tryVfs/archive.tar), which is accessed through the normal file system, file:
file:///extra/data/tryVfs/archive.tar
Next we treat the file as a tar: file and navigate inside this archive by appending a "!" and specifying the path /tardir/content.txt.gz:
tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz
Finally we switch to the gz: file system to read the uncompressed content.txt (again using the "!" separator character):
gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
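The three steps above all follow one rule: wrap the URI built so far in a new scheme, then append "!" and the path inside that layer. Here is a small sketch of that rule as a plain string helper. `enter` is a hypothetical name and is not part of Commons VFS. The strings it builds are what you would pass to resolveFile.

```java
public class LayeredUri {
    /**
     * Hypothetical helper (not a VFS API): open a layer of the given scheme
     * on top of outerUri and address innerPath inside it.
     */
    public static String enter(String scheme, String outerUri, String innerPath) {
        return scheme + ":" + outerUri + "!" + innerPath;
    }

    public static void main(String[] args) {
        // Step 1: the tar file on the local disk
        String fileUri = "file:///extra/data/tryVfs/archive.tar";
        // Step 2: treat it as a tar archive and point at the gzipped entry
        String tarUri = enter("tar", fileUri, "/tardir/content.txt.gz");
        // Step 3: treat that entry as gzip and point at its uncompressed content
        String gzUri = enter("gz", tarUri, "content.txt");
        System.out.println(tarUri); // tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz
        System.out.println(gzUri);  // gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
    }
}
```

Note that the archive path starts with "/" (a path inside the tar's tree), while the gz path is just the decompressed file's name.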
Generic Drill-down
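On line 90 (marked /* line 90 */ in the listing) I'm giving special attention to gzip files:
No Format |
---|
if (extractFile.getName().getExtension().equals("gz"))
|
so other types of compression like zip and bzip2 (as well as nested archives like jar and tar) will not be expanded. To generically drill down and expand zip, bzip2, jar and tar files to arbitrary depth, eliminate the "gz"-specific code. Instead, ask the file system manager whether it can open the tar entry f itself as a layered file system. Use f rather than extractFile because, for compressed entries, the listing never writes extractFile to disk:
No Format |
---|
if (this.fsManager.canCreateFileSystem(f))
{
    // innerFile is the root of the new layered file system;
    // process its contents the same way to drill down further
    FileObject innerFile = this.fsManager.createFileSystem(f);
}
|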
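To see what drilling down "to arbitrary depth" produces, here is a JDK-only sketch. It walks an in-memory tree and opens a new layer whenever an entry's extension appears in a lookup table. Both the `LAYERED` table and the `Node` type are assumptions: they stand in for canCreateFileSystem() and FileObject, whereas the real manager decides from its own provider configuration. The separators follow the examples above ("!/" into an archive, "!" into a gz file). Treating bz2 like gz and zip/jar like tar is an assumption, and empty folders are treated as leaves for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DrillDown {
    // ASSUMPTION: illustrative extension-to-scheme table standing in for
    // canCreateFileSystem(); it is not read from the real VFS configuration.
    static final Map<String, String> LAYERED = new HashMap<String, String>();
    static {
        LAYERED.put("tar", "tar");
        LAYERED.put("zip", "zip");
        LAYERED.put("jar", "jar");
        LAYERED.put("gz", "gz");
        LAYERED.put("bz2", "bz2");
    }

    /** Hypothetical node: a file, a folder, or an archive/compressed file with contents. */
    public static final class Node {
        final String name;
        final List<Node> children;

        public Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    static String extension(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    /** Collect the layered URI of every leaf file reachable from node. */
    public static void drill(String uri, Node node, List<String> leaves) {
        String scheme = LAYERED.get(extension(node.name));
        if (scheme != null && !node.children.isEmpty()) {
            // Archive or compressed file: open a new layer on top of uri
            boolean singleFile = scheme.equals("gz") || scheme.equals("bz2");
            String sep = singleFile ? "!" : "!/";
            for (Node child : node.children) {
                drill(scheme + ":" + uri + sep + child.name, child, leaves);
            }
        } else if (!node.children.isEmpty()) {
            // Ordinary folder inside the current layer
            for (Node child : node.children) {
                drill(uri + "/" + child.name, child, leaves);
            }
        } else {
            leaves.add(uri); // plain file: nothing more to expand
        }
    }

    public static void main(String[] args) {
        Node archive = new Node("archive.tar",
                new Node("tardir",
                        new Node("content.txt.gz", new Node("content.txt")),
                        new Node("non-gzip.txt")));
        List<String> leaves = new ArrayList<String>();
        drill("file:///extra/data/tryVfs/archive.tar", archive, leaves);
        for (String leaf : leaves) {
            System.out.println(leaf);
        }
    }
}
```

For the sample archive this yields the gz:tar:file:... URI for content.txt and tar:file:///extra/data/tryVfs/archive.tar!/tardir/non-gzip.txt for the other file. An entry such as data.tar.gz, modeled with the matching children, would pick up a gz: layer and then a tar: layer.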
pom.xml Project file
This example uses Maven2. There is a pom.xml to define the project:
No Format |
---|
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>gov.noaa.eds</groupId>
  <artifactId>tryVfs</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>Try apache commons vfs</name>
  <url>http://maven.apache.org</url>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
      <plugin>
        <!-- Usage: mvn assembly:assembly -->
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass>gov.noaa.eds.tryVfs.ExtractFromGzipInTar</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>commons-vfs</groupId>
      <artifactId>commons-vfs</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
|
Source Code
Content of src/main/java/gov/noaa/eds/tryVfs/ExtractFromGzipInTar.java
No Format |
---|
/*
 * ExtractFromGzipInTar.java
 */
package gov.noaa.eds.tryVfs;

import org.apache.commons.vfs.AllFileSelector;
import org.apache.commons.vfs.FileName;
import org.apache.commons.vfs.FileObject;
import org.apache.commons.vfs.FileSystemException;
import org.apache.commons.vfs.FileSystemManager;
import org.apache.commons.vfs.FileType;
import org.apache.commons.vfs.FileTypeSelector;
import org.apache.commons.vfs.VFS;
import org.apache.commons.vfs.provider.local.LocalFile;

/**
 * Try using VFS to read the content of a compressed (gz) file inside of
 * a tar file. Extract tar file objects. If they are gzip files, decompress them.
 * Any directory structure in the tarfile is not being preserved, the contents
 * are pulled out to the same location regardless of directory hierarchy (for
 * the purposes of this example, all objects in the tar file have unique names,
 * so there are no file name conflicts).
 *
 * @author Ken Tanaka
 */
public class ExtractFromGzipInTar {
    FileSystemManager fsManager = null;
    static String extractDirname = "/extra/data/tryVfs";

    /**
     * Extract files from a tar file. If a file is gzipped, write its
     * decompressed content directly instead of the gzipped version.
     * @param args command line arguments are currently not used
     */
    public static void main( String[] args ) {
        ExtractFromGzipInTar extract = new ExtractFromGzipInTar();
        try {
            extract.fsManager = VFS.getManager();
        } catch (FileSystemException ex) {
            throw new RuntimeException("failed to get fsManager from VFS", ex);
        }

        /* Create a tarFile FileObject to connect to the tarfile on disk */
        FileObject tarFile;
        try {
            String tarName = "tar:file://" + extractDirname + "/archive.tar";
            System.out.println("Resolve " + tarName);
            tarFile = extract.fsManager.resolveFile(tarName);
            FileName tarFileName = tarFile.getName();
            System.out.println(" Path : " + tarFileName.getPath());
            System.out.println(" URI : " + tarFileName.getURI());
        } catch (Exception ex) {
            throw new RuntimeException("failed to open tar file ", ex);
        }

        /* Work on files inside tarFile */
        FileObject[] children;
        try {
            children = tarFile.getChildren();
        } catch (FileSystemException ex) {
            throw new RuntimeException("failed to get contents of tarfile ", ex);
        }
        for (FileObject f : children) {
            extract.processChild(f);
        }
    } // main( String[] args )

    private void processChild(FileObject f) {
        try {
            if (f.getType() == FileType.FOLDER) {
                // Recursively process files in this folder
                FileObject[] children = f.getChildren();
                for (FileObject subfile : children) {
                    processChild(subfile);
                }
            } else {
                FileName fname = f.getName();
                String extractName = "file://" + extractDirname + "/" + fname.getBaseName();
                System.out.println("Extracting " + extractName);
                LocalFile extractFile = (LocalFile) this.fsManager.resolveFile(extractName);

                // if the file is gzipped, decompress it
                /* line 90 */ if (extractFile.getName().getExtension().equals("gz")) {
                    System.out.println("Decompressing " + extractName);
                    // The uncompressed filename we seek, e.g. content.txt
                    // (the dot is escaped so only a literal ".gz" suffix is removed)
                    String fileName = extractFile.getName().getBaseName().replaceAll("\\.gz$", "");
                    // Build the direct path to the uncompressed content of the
                    // gzip file in the tar file, e.g.
                    // gz:tar:file:///archive.tar!/tardir/content.txt.gz!content.txt
                    /* line 100 */ String gzName = "gz:" + fname.getURI() + "!" + fileName;
                    FileObject gzFile = this.fsManager.resolveFile(gzName);
                    // The decompressed path we want
                    String decompName = "file://" + extractDirname + "/" + fileName;
                    LocalFile decompFile = (LocalFile) this.fsManager.resolveFile(decompName);
                    // Some debug lines
                    System.out.println("fileName =" + fileName);
                    System.out.println("decompName =" + decompName);
                    System.out.println("gzName=" + gzName);
                    // Copy the uncompressed content straight to its destination
                    decompFile.copyFrom(gzFile, new FileTypeSelector(FileType.FILE));
                } else {
                    // just extract the non-gzip file
                    extractFile.copyFrom(f, new AllFileSelector());
                }
            }
        } catch (FileSystemException ex) {
            throw new RuntimeException("Error working on tarfile object " + f.getName(), ex);
        }
    } // processChild(FileObject f)
}
|
...
Sample Output
No Format |
---|
Nov 7, 2007 12:22:01 PM org.apache.commons.vfs.VfsLog info
INFO: Using "/tmp/vfs_cache" as temporary files store.
Resolve tar:file:///extra/data/tryVfs/archive.tar
 Path : /
 URI : tar:file:///extra/data/tryVfs/archive.tar!/
Extracting file:///extra/data/tryVfs/non-gzip.txt
Extracting file:///extra/data/tryVfs/content.txt.gz
Decompressing file:///extra/data/tryVfs/content.txt.gz
fileName =content.txt
decompName =file:///extra/data/tryVfs/content.txt
gzName=gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
|
In addition to the archive.tar file, there should now be content.txt and non-gzip.txt files in the same location.