...

  1. XML descriptor files are the original method used to create pipelines in Apache UIMA™. Though self-descriptive, they are verbose and error-prone.
  2. uimaFIT™ enables creation of pipelines through Java code. This greatly simplifies unit testing and experimentation.
  3. The PipelineBuilder class in ctakes-core is a facade for uimaFIT™ factories and objects.
  4. Piper files are a modern equivalent of the XML descriptor files. Piper files list basic commands and parameters in a flat format.
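To illustrate the flat format, a minimal piper file might look like the sketch below. It uses only commands and files named on this page (readFiles, load, writeXmis and the core DefaultTokenizerPipeline piper file). my/data and my/output are placeholder directories. The sketch assumes that load accepts a standard piper file name without its .piper extension, as the names in Table 2 suggest.

```
// Read every file in the my/data directory tree
readFiles my/data
// Segment, detect sentences and tokenize (standard core piper file)
load DefaultTokenizerPipeline
// Write XMI output to my/output
writeXmis my/output
```

Each line is one command followed by its parameters, which is what keeps piper files much shorter than the equivalent XML descriptors.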
| Command | Parameter 1 | Parameters 2-n | Description |
|---|---|---|---|
| package | package path | | Add to known packages. Shortens load and add specifications. |
| load | piper file path | | Load an external piper file. |
| set | name=value | <name=value ...> | Add global parameter values. |
| cli | name=char | <name=char ...> | Add global parameter values based upon command-line character option values. |
| reader | CR name | <name=value ...> | Set the collection reader for pipeline input data. |
| readFiles | input directory | | Set the collection reader to FileTreeReader, reading the files in the input directory tree. |
| add | AE or CC name | <name=value ...> | Add an AE/CC to the pipeline. |
| addDescription | AE or CC name | <value ...> | Add an AE/CC to the pipeline using its static createAnnotatorDescription method. |
| addLogged | AE or CC name | <name=value ...> | Add an AE/CC to the pipeline with start/finish logging. |
| addLast | AE or CC name | <name=value ...> | Add an AE/CC to the end of the pipeline. Useful if the pipeline is meant to be extended. |
| writeXmis | output directory | | Add an XMI writer to the pipeline. |
| // or # or ! | comment text | | Line comment. |
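The sketch below exercises the comment styles from the table and spells out the explicit forms behind two convenience commands. The readFiles and writeXmis equivalences are the ones given on this page. The assumption that blank lines are ignored between commands is mine, and my/data and my/output are placeholder directories.

```
// A comment line may start with //
# or with #
! or with !

// Explicit form of: readFiles my/data
reader FileTreeReader InputDirectory=my/data

// Standard cTAKES piper files need no path
load DefaultTokenizerPipeline

// Explicit form of: writeXmis my/output
add FileTreeXmiWriter OutputDirectory=my/output
```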

...

  1. Create an empty text file. The standard file extension for piper files is .piper
  2. Set a reader for your pipeline. To set values for parameters used by the reader class, add one or more name=value pairs after the class name.
    readFiles is a convenience command. "readFiles my/data" is equivalent to "reader FileTreeReader InputDirectory=my/data". If you run with PiperFileRunner and do not specify a reader, FileTreeReader is used.
  3. Add annotation engines and CAS consumers to your pipeline. To set values for parameters used by an annotation engine class, add one or more name=value pairs after the class name.
  4. To add common groups of components, load another piper file. See Table 2 for piper files in cTAKES.
  5. The reader, load and add* commands all take class names or file paths as their first parameter. If a class is not in a standard cTAKES module's cr, ae or cc package, or a piper file is not in a standard module's pipeline/ directory, then the package or path must be specified for that component or file.
  6. Use package to simplify adding multiple pipeline components from a package not standard to cTAKES.
    [Diagram: Use of package]
  7. To easily add multiple components that use the same parameter, use set to assign a value to that parameter for all following components.
    [Diagram: set global value]

    *A name=value pair on a component line will, for that component, override a set parameter value.

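The steps above can be combined into one file, sketched below. The package org.example.nlp.ae, the classes MyFirstAnnotator and MySecondAnnotator, and the parameter Threshold are hypothetical placeholders invented for illustration, not part of cTAKES. The sketch assumes the custom classes accept a Threshold parameter.

```
// Hypothetical package holding custom annotators (not part of cTAKES)
package org.example.nlp.ae

readFiles my/data
load DefaultTokenizerPipeline

// Hypothetical parameter, set once for all following components
set Threshold=0.5

// Resolved through the package command, so no full class path is needed
add MyFirstAnnotator
// This name=value pair overrides the set value for this component only
add MySecondAnnotator Threshold=0.8

writeXmis my/output
```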
| Piper file | Module | Description | Contains |
|---|---|---|---|
| DefaultTokenizerPipeline.piper | core | Complete tokenizer pipeline. | SimpleSegmentAnnotator, SentenceDetector, TokenizerAnnotatorPTB |
| ChunkerSubPipe.piper | chunker | Chunker partial pipeline. | Chunker, ChunkAdjuster (NP,NP 1), ChunkAdjuster (NP,NP,NP 2) |
| AssertionSubPipe.piper | assertion | Entity attribute partial pipeline. | ClearNLPDependencyParserAE, ClearNLPSemanticRoleLabelerAE, ConceptConverterAnalysisEngine, AssertionAnalysisEngineFit, GenericAttributeAnalysisEngine, SubjectAttributeAnalysisEngine |
| AttributeCleartkSubPipe.piper | assertion | Entity attribute partial pipeline. | ClearNLPDependencyParserAE, ClearNLPSemanticRoleLabelerAE, PolarityCleartkAnalysisEngine, UncertaintyCleartkAnalysisEngine, HistoryCleartkAnalysisEngine, ConditionalCleartkAnalysisEngine, GenericCleartkAnalysisEngine, SubjectCleartkAnalysisEngine |
| DefaultFastPipeline.piper | clinical-pipeline | Complete clinical pipeline. | DefaultTokenizerPipeline.piper, ContextDependentTokenizerAnnotator, POSTagger, ChunkerSubPipe.piper, DefaultJCasTermAnnotator, AttributeCleartkSubPipe.piper |
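Reading the Contains column for DefaultFastPipeline.piper, its structure can be approximated as below. This is a sketch reconstructed only from the table, not the shipped file. The real file may differ, for example by using addDescription for some components or by setting parameters such as the dictionary lookup XML. Dictionary lookup also needs UMLS credentials (see Table 4).

```
// Approximation of DefaultFastPipeline.piper, built from its Contains column
load DefaultTokenizerPipeline
add ContextDependentTokenizerAnnotator
add POSTagger
load ChunkerSubPipe
add DefaultJCasTermAnnotator
load AttributeCleartkSubPipe
```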
  1. cli is a special type of set that assigns a parameter a value entered by the user on the command line.

    [Diagram: Use of cli]

    * cli can only be used with the PiperFileRunner class, the bin/runPiperFile script or the Piper File Submitter GUI.
    * Reserved characters that cannot be used with cli are listed in Table 3.

  2. addDescription is a special type of add that uses a component's static createAnnotatorDescription(..) method.

    * Use with care, as not all components have such a method.

  3. Use addLogged to ensure that a component's start and finish times are logged. This is useful for debugging and profiling some components.

  4. Use addLast to ensure that a component, such as a writer, executes at the end of a pipeline. Multiple components can be added with addLast.
    * writeXmis is a convenience command. "writeXmis my/output" is equivalent to "add FileTreeXmiWriter OutputDirectory=my/output".
  5. name=value pairs can accept comma-delimited arrays: ArrayParm=this,is,an,array
    * Text enclosed in quotes is not an array: NotArrayParm="this,is,just,text"
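The features in this list can be combined as in the sketch below. The classes under org.example.nlp.ae and the parameters Threshold, Labels and Note are hypothetical placeholders. The option character t is an arbitrary choice outside the reserved set in Table 3. Fully qualified class names are used instead of a package command. With PiperFileRunner, such a file might be run as something like bin/runPiperFile -p path/to/piper -t 0.5.

```
readFiles my/data
load DefaultTokenizerPipeline

// Hypothetical: the value given for -t on the command line becomes the global Threshold
cli Threshold=t

// Log start and finish times for this hypothetical component
addLogged org.example.nlp.ae.MyFirstAnnotator

// Hypothetical array parameter with four values
add org.example.nlp.ae.MySecondAnnotator Labels=this,is,an,array
// Quoted text is a single value, not an array
add org.example.nlp.ae.MyThirdAnnotator Note="this,is,just,text"

// Placed at the end of the pipeline, after all components added with add
addLast FileTreeXmiWriter OutputDirectory=my/output
```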

 

  1. To run a piper file from the command line, execute the script bin/runPiperFile -p path/to/piper
  2. To run a piper file from code, use the main(..) method of PiperFileRunner in ctakes-core, or more directly use the PiperFileReader class in ctakes-core.

  3. There are examples of piper file use in the ctakes-examples module.

  4. A piper file can also be loaded and run by the Simple Pipeline Fabricator GUI and the Piper File Submitter GUI.

 

| Module | Piper file | Description |
|---|---|---|
| core | DefaultTokenizerPipeline | Complete tokenizer pipeline. |
| chunker | ChunkerSubPipe | Chunker partial pipeline. |
| assertion | AssertionSubPipe | Entity attribute partial pipeline. |
| assertion | AttributeCleartkSubPipe | Entity attribute partial pipeline. |
| clinical-pipeline | DefaultFastPipeline | Complete clinical pipeline. |

Table 2.  Available standard piper files in cTAKES.

 

[Diagram: DefaultFastPipeline.piper]

Diagram 1. Piper files used in the cTAKES default clinical pipeline. Upper left is DefaultFastPipeline.piper.

 

| cli | Equivalent parameter name | Description |
|---|---|---|
| -p | Piper | Location of a piper file. |
| -i | InputDirectory | Directory for all input files. |
| -o | OutputDirectory | Directory for all output files. |
| -s | SubDirectory | Subdirectory for files. |
| -l | LookupXml | Path to fast dictionary lookup XML. |

Table 3.  Reserved cli characters and their corresponding parameter names.
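Because the reserved characters map to global parameter names, a piper file can leave those values to the command line. The sketch below assumes that FileTreeReader picks up InputDirectory and FileTreeXmiWriter picks up OutputDirectory from these global values. It could then be run with something like bin/runPiperFile -p path/to/piper -i my/data -o my/output.

```
// InputDirectory is expected from -i on the command line
reader FileTreeReader
load DefaultTokenizerPipeline
// OutputDirectory is expected from -o on the command line
addLast FileTreeXmiWriter
```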

  

| CLI | Equivalent parameter name | Description |
|---|---|---|
| --user | umlsUser, ctakes.umlsuser | UMLS username for dictionary lookup. |
| --pass | umlsPass, ctakes.umlspw | UMLS password for dictionary lookup. |
| --xmiOut | OutputDirectory | Adds FileTreeXmiWriter to the pipeline. |

Table 4.  Additional command line parameters accepted by bin/runPiperFile and PiperFileRunner.


...