...
Cool as a one-line format translator is, we actually have to do a little more work to create an extractor capable of producing metadata for CAS-Curator. Metadata extractors that are to be integrated with CAS-Curator are required to produce three pieces of metadata:
ProductType
FileLocation
Filename
We should note that this is NOT a general requirement of all metadata extractors, but a consequence of the current implementation of CAS-Curator. To produce this extra metadata, we will develop a small Python script:

    #!/usr/bin/python
    import os
    import sys

    fullPath = sys.argv[1]
    pathElements = fullPath.split("/")
    fileName = pathElements[len(pathElements)-1]
    fileLocation = fullPath[:(len(fullPath)-len(fileName))]
    productType = "MP3"

    cmd = "java -jar /Users/woollard/Desktop/extractors/mp3extractor/"
    cmd += "tika-app-1.4.jar -m "+fullPath+" | awk -F:"
    cmd += " 'BEGIN {print \"<cas:metadata xmlns:cas="
    cmd += "\\\"http://oodt.jpl.nasa.gov/1.0/cas\\\">\"}"
    cmd += " {print \"<keyval><key>\"$1\"</key><val>\"substr($2,1)\""
    cmd += "</val></keyval>\"}' > "+fileName+".met"
    os.system(cmd)

    f = open(fileName+".met", 'a')
    f.write('<keyval><key>ProductType</key><val>'+productType)
    f.write('</val></keyval>\n<keyval><key>Filename</key><val>')
    f.write(fileName+'</val></keyval>\n<keyval><key>FileLocation')
    f.write('</key><val>'+fileLocation+'</val></keyval>\n')
    f.write('</cas:metadata>')
    f.close()
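The part of the script above that derives the three extra keys can also be looked at in isolation. The following is only a minimal sketch of that step, assuming Python 3: the function names `required_metadata` and `to_cas_xml` are hypothetical and not part of OODT or CAS-Curator, and the Tika call is left out entirely. It keeps the hard-coded `MP3` product type and the trailing slash on `FileLocation` that the script produces, and it additionally escapes XML special characters, which the script does not do.

```python
import os
from xml.sax.saxutils import escape

# Namespace taken from the script above.
CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"


def required_metadata(full_path, product_type="MP3"):
    """Return the three keys CAS-Curator expects, derived from a file path.

    Hypothetical helper for illustration only.
    """
    file_name = os.path.basename(full_path)
    # Keep the trailing slash, as the slicing in the script above does.
    file_location = full_path[:len(full_path) - len(file_name)]
    return {
        "ProductType": product_type,
        "Filename": file_name,
        "FileLocation": file_location,
    }


def to_cas_xml(metadata):
    """Render a dict as a cas:metadata document, escaping &, < and >."""
    lines = ['<cas:metadata xmlns:cas="%s">' % CAS_NS]
    for key, val in metadata.items():
        lines.append(
            "<keyval><key>%s</key><val>%s</val></keyval>"
            % (escape(key), escape(val))
        )
    lines.append("</cas:metadata>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example path is made up; any file path works.
    print(to_cas_xml(required_metadata("/tmp/music/Rock & Roll.mp3")))
```

Escaping matters here because a file name containing `&` or `<` would otherwise produce a `.met` file that is not well-formed XML.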
...