...
- XML Descriptor files are the original method used to create pipelines in Apache UIMA™. Though self-descriptive, they are verbose and error-prone.
- uimaFIT™ enables creation of pipelines through Java code. This greatly simplifies unit testing and experimentation.
  The `PipelineBuilder` class in ctakes-core is a facade for uimaFIT™ factories and objects.
- Piper files are a modern equivalent of the XML descriptor files. Piper files list basic commands and parameters in a flat format.
Command | Parameter 1 | Parameters 2-n | Description |
---|---|---|---|
package | package path | | Add to known packages. Shortens load and add specifications. |
load | Piper file path | | Load external piper file. |
set | name=value | <name=value ...> | Add global parameter values. |
cli | name=char | <name=char ...> | Add global parameter values based upon command-line character option values. |
reader | CR name | <name=value ...> | Set the collection reader for pipeline input data. |
readFiles | input directory | | Set the collection reader for pipeline input data to the files-in-directory-tree reader (FileTreeReader). |
add | AE or CC name | <name=value ...> | Add AE/CC to pipeline. |
addDescription | AE or CC name | <value ...> | Add AE/CC to pipeline using its createAnnotatorDescription method. |
addLogged | AE or CC name | <name=value ...> | Add AE/CC to pipeline with Start/Finish logging. |
addLast | AE or CC name | <name=value ...> | Add AE/CC to the end of pipeline. Useful if the pipeline is meant to be extended. |
writeXmis | output directory | | Add XMI writer to the pipeline. |
// or # or ! | comment text | | Line Comment. |
Table 1. Standard Piper commands.
A complete runnable pipeline can be created using only a reader (or readFiles) command and add commands.
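As a minimal sketch of such a pipeline, the piper file below uses one readFiles line and nothing but add lines after it. The component names are those listed for DefaultTokenizerPipeline.piper in Table 2, and the directories my/data and my/output are placeholders. It assumes these components resolve by simple class name because they live in standard cTAKES packages.

```text
// Read every file under the input directory.
readFiles my/data

// Tokenizer components, as listed for DefaultTokenizerPipeline.piper in Table 2.
add SimpleSegmentAnnotator
add SentenceDetector
add TokenizerAnnotatorPTB

// Write one XMI file per document.
add FileTreeXmiWriter OutputDirectory=my/output
```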
Step-by-step guide
...
- Create an empty text file. The standard file extension for piper files is `.piper`.
- Set a reader for your pipeline. To set values for parameters used by the reader class, add one or more `name=value` pairs after the class name.
  * readFiles is a convenience command. "readFiles my/data" is equivalent to "reader FileTreeReader InputDirectory=my/data".
- Use add to put annotation engines and CAS consumers in your pipeline. To set values for parameters used by the annotation engine class, add one or more `name=value` pairs after the class name.
- Use load to bring in common groups of components from another piper file. If applicable, start with a load of an available piper file. See Table 2 for standard piper files in cTAKES.
- reader, load and add* commands all take a class name or a file path as their first parameter.
  * If a class is not in a standard cTAKES module's cr, ae or cc package, or a piper file is not in a standard module's pipeline/ directory, then the package or path must be specified for that component or file.
- Use package to simplify adding multiple pipeline components from a package not standard to cTAKES.
  [Diagram: Use of package]
- Use set to assign a value to a parameter used by following components.
  [Diagram: set global value]
  * A `name=value` pair on a component line will, for that component, override a set parameter value.
- cli is a special type of set that sets a parameter to a value entered by the user on the command line.
  [Diagram: Use of cli]
  * cli can only be used with the `PiperFileRunner` class, the `bin/runPiperFile` script or the Piper File Submitter GUI.
  * Reserved parameters unavailable for cli are listed in Table 3.
- addDescription is a special type of add that utilizes a component's static `addDescription(..)` method.
  * Use with care, as not all components have such a method.
- Use addLogged to ensure a component's start and finish times are logged. This is useful for debugging and profiling some components.
- Use addLast to ensure that a component, such as a writer, executes at the end of a pipeline. Multiple components can be added with addLast.
  * writeXmis is a convenience command. "writeXmis my/output" is equivalent to "add FileTreeXmiWriter OutputDirectory=my/output".
- `name=value` pairs can accept comma-delimited arrays: `ArrayParm=this,is,an,array`
  * Text enclosed in quotes is not an array: `NotArrayParm="this,is,just,text"`
- To run a piper file from the command line, execute the script `bin/runPiperFile -p path/to/piper`
- To run a piper file from code, use the `main(..)` method of `PiperFileRunner` in ctakes-core, or more directly use the `PiperFileReader` class in ctakes-core.
  * There are examples of piper file use in the ctakes-examples module.
- A piper file can also be loaded and run by the Simple Pipeline Fabricator GUI and the Piper File Submitter GUI.
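The steps above can be combined into a single piper file. The sketch below is illustrative only: the package `com.example.myproject.ae`, the components `MyCustomAnnotator` and `MySlowAnnotator`, and the parameters `MyThreshold` and `MyWordList` are hypothetical names invented for this example, and it assumes a standard piper file can be loaded by its name without the extension. Comments are kept on their own lines.

```text
// Read every file under the input directory.
readFiles my/data

// Start from a standard piper file (see Table 2).
load DefaultTokenizerPipeline

// Hypothetical package that holds the custom components below.
package com.example.myproject.ae

// Global value for the components that follow (hypothetical parameter name).
set MyThreshold=5

// Hypothetical annotator; MyWordList receives a three-element array.
add MyCustomAnnotator MyWordList=first,second,third

// Hypothetical annotator with start/finish logging.
// The name=value pair overrides the set value for this component only.
addLogged MySlowAnnotator MyThreshold=10

// The writer is kept at the end of the pipeline.
addLast FileTreeXmiWriter OutputDirectory=my/output
```

Using addLast for the writer leaves room for another piper file to load this one and add further components ahead of the writer.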
Piper file | Module | Description | Contains |
---|---|---|---|
DefaultTokenizerPipeline.piper | core | Complete Tokenizer pipeline. | SimpleSegmentAnnotator; SentenceDetector; TokenizerAnnotatorPTB |
ChunkerSubPipe.piper | chunker | Chunker partial pipeline. | Chunker; ChunkAdjuster (NP,NP 1); ChunkAdjuster (NP,NP,NP 2) |
AssertionSubPipe.piper | assertion | Entity attribute partial pipeline. | ClearNLPDependencyParserAE; ClearNLPSemanticRoleLabelerAE; ConceptConverterAnalysisEngine; AssertionAnalysisEngineFit; GenericAttributeAnalysisEngine; SubjectAttributeAnalysisEngine |
AttributeCleartkSubPipe.piper | assertion | Entity attribute partial pipeline. | ClearNLPDependencyParserAE; ClearNLPSemanticRoleLabelerAE; PolarityCleartkAnalysisEngine; UncertaintyCleartkAnalysisEngine; HistoryCleartkAnalysisEngine; ConditionalCleartkAnalysisEngine; GenericCleartkAnalysisEngine; SubjectCleartkAnalysisEngine |
DefaultFastPipeline.piper | clinical-pipeline | Complete Clinical pipeline. | DefaultTokenizerPipeline.piper; ContextDependentTokenizerAnnotator; POSTagger; ChunkerSubPipe.piper; DefaultJCasTermAnnotator; AttributeCleartkSubPipe.piper |
Table 2. Available standard piper files in cTAKES.
Diagram 1. Piper files used in the cTAKES default Clinical Pipeline. Upper left is DefaultFastPipeline.piper
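The Contains column of Table 2 for DefaultFastPipeline.piper can be read as a sequence of load and add lines. The following is a rough sketch of that nesting, not a copy of the shipped file: it assumes each nested piper file is pulled in with load and each component with plain add, whereas the real file may use other add variants or pass parameters.

```text
// Nested piper file: SimpleSegmentAnnotator, SentenceDetector, TokenizerAnnotatorPTB.
load DefaultTokenizerPipeline

add ContextDependentTokenizerAnnotator
add POSTagger

// Nested piper file: Chunker and the two ChunkAdjuster steps.
load ChunkerSubPipe

add DefaultJCasTermAnnotator

// Nested piper file: dependency parser, semantic role labeler and the Cleartk attribute engines.
load AttributeCleartkSubPipe
```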
cli | Equivalent Parameter Name | Description |
---|---|---|
-p | Piper | Location of a Piper file. |
-i | InputDirectory | Directory for all input files. |
-o | OutputDirectory | Directory for all output files. |
-s | SubDirectory | Subdirectory for files. |
-l | LookupXml | Path to fast dictionary lookup xml. |
Table 3. Reserved cli characters and their corresponding parameter names.
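As an illustration of choosing cli characters that avoid the reserved ones in Table 3, the sketch below maps two parameters to the letters t and w. The parameter names `MyThreshold` and `MyWordList` and the component `MyCustomAnnotator` are hypothetical, and the letters are arbitrary choices.

```text
// Map hypothetical parameters to command-line characters not reserved in Table 3.
cli MyThreshold=t MyWordList=w

// Hypothetical component that reads the two parameters.
add MyCustomAnnotator
```

Under these assumptions, a run such as `bin/runPiperFile -p path/to/piper -t 5 -w first,second` would be expected to supply the two values, in the same way that -i and -o supply InputDirectory and OutputDirectory.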
CLI | Equivalent Parameter Name | Description |
---|---|---|
--user | umlsUser, ctakes.umlsuser | UMLS Username for dictionary lookup. |
--pass | umlsPass, ctakes.umlspw | UMLS Password for dictionary lookup. |
--xmiOut | OutputDirectory | Adds FileTreeXmiWriter to pipeline. |
Table 4. Additional command line parameters accepted by bin/runPiperFile and PiperFileRunner.
...