...
These instructions are for end users. With these instructions you can install Apache cTAKES, configure it, and use it to process text (typically text associated with a medical record). If you plan to extend or modify the code within cTAKES, refer to the cTAKES 3.2 Developer Install Guide.
These instructions cover installation and a test of the main product, including trained models for sentence detection and part-of-speech tagging, dictionaries from a subset of the UMLS, the LVG resource, etc. Optional components are described in the Component Use Guide.
...
Step | Example | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Make sure you have Java 1.7 or higher. Most systems come with Java already installed.
| Windows:
|
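
The Java check in the step above can be scripted. The following is a minimal sketch, not part of the cTAKES distribution: the function names `java_major` and `check_java` are hypothetical, and the default minimum of 7 is an assumption that you should adjust to the version your release requires.

```shell
#!/bin/sh
# Extract the major Java version from a version string.
# Handles the legacy "1.x" scheme (1.7.0_80 -> 7) and the newer scheme (17.0.2 -> 17).
java_major() {
    v=$1
    case "$v" in
        1.*) v=${v#1.} ;;   # drop the legacy "1." prefix
    esac
    v=${v%%[!0-9]*}         # keep only the leading digits
    echo "$v"
}

# Report whether the Java found on PATH meets the minimum major version.
# The default minimum (7) is an assumption; pass another value as $1 if needed.
check_java() {
    min=${1:-7}
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH"
        return 1
    fi
    # `java -version` writes to stderr; the version is the first quoted token.
    raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major "$raw")
    if [ -n "$major" ] && [ "$major" -ge "$min" ]; then
        echo "Java $raw is sufficient"
    else
        echo "Java $raw is older than required version $min"
        return 1
    fi
}
```

Calling `check_java` with no argument tests the Java on your PATH against the assumed minimum; `check_java 8` would require Java 8 or later.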
...
Step | Example | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Navigate to the cTAKES downloads page on the Apache site and download the binary package. Before downloading, either accept the default mirror or select a mirror site and press the Change button to update the URL to your desired mirror location.
| Windows:
| |||||||||||||||||
2. (Optional but recommended) Verify the downloaded files against a file signature to ensure you have the proper and complete file. | No example | |||||||||||||||||
3. Unzip the file you downloaded into the directory that you want to be the cTAKES install location. The compressed file contains a single directory at the top level, referred to below as <cTAKES_HOME>. You will need to refer to this directory later.
Linux:
| Windows:
| |||||||||||||||||
4. Download the cTAKES resources ZIP file with a matching version from the ctakesresources project (More information on cTAKES models). These resources are required to operate cTAKES.
| Windows:
| |||||||||||||||||
5. Copy (or move) the resources to cTAKES_HOME.
| Windows:
Linux:
Mac OSX:
|
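
Step 5 above can be sketched as a small shell helper. This is an illustration under stated assumptions, not the project's documented command: the function name `merge_resources` is hypothetical, and both paths in the usage comment are placeholders because the actual layout of the resources archive is not shown here.

```shell
#!/bin/sh
# Copy everything under SRC into DEST, creating DEST if needed.
# Files with the same name are overwritten; other files in DEST are left alone.
merge_resources() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src/." copies the directory's contents, including hidden files.
    cp -R "$src/." "$dest/"
}

# Hypothetical usage; both paths are placeholders, not the real archive layout:
# merge_resources /tmp/ctakes-resources/resources "$CTAKES_HOME/resources"
```

Copying rather than moving keeps the unpacked archive intact, so the step can be repeated if the first attempt targets the wrong directory.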
...
(Recommended) Add UMLS access rights
Note |
---|
In the initial setup, cTAKES will recognize only a few sample concepts in text. If you wish to perform named entity recognition or concept identification for anything other than these few words, you will need to 1) obtain the rights to use UMLS resources, 2) add those credentials to cTAKES, and 3) use an aggregate that makes use of those UMLS resources. If you don't, cTAKES will work but won't recognize much. |
Step | Example | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. If you do not have a UMLS username and password, you may request one at UMLS Terminology Services. | No example | |||||||||||||||||
2. Edit the following files. Find the line in each script that runs java and add the ctakes.umlsuser and ctakes.umlspw parameters to the java command with your credentials. Make sure you substitute your actual ID and password if you cut and paste the example.
Linux:
|
For example, if your username and password were literally myusername and mypassword, you could insert them before the -cp option so the start of the java command would look like this:
|
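
As an illustration of step 2 above, the sketch below builds the two credential options for the java command. It assumes the `ctakes.umlsuser` and `ctakes.umlspw` parameters are passed as JVM system properties using the standard `-D` syntax; the function name `umls_java_opts` is hypothetical and is not part of the cTAKES scripts.

```shell
#!/bin/sh
# Build the two JVM options that carry the UMLS credentials.
# Assumption: the parameters are set as -D system properties.
umls_java_opts() {
    user=$1
    pw=$2
    if [ -z "$user" ] || [ -z "$pw" ]; then
        echo "usage: umls_java_opts USER PASSWORD" >&2
        return 1
    fi
    printf '%s %s\n' "-Dctakes.umlsuser=$user" "-Dctakes.umlspw=$pw"
}

# With the literal credentials from the text, the options go before -cp:
#   java -Dctakes.umlsuser=myusername -Dctakes.umlspw=mypassword -cp ...
umls_java_opts myusername mypassword
# prints: -Dctakes.umlsuser=myusername -Dctakes.umlspw=mypassword
```

A password containing spaces or shell metacharacters would need quoting when pasted into the script.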
Process documents using cTAKES
...
Step | Example | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Open a command prompt and change to the cTAKES_HOME directory.
|
Linux:
| ||||||||||||
2. Start the CAS Visual Debugger by running this command: | Windows:
Linux:
| ||||||||||||
3. Copy the example text from the next cell in this table and paste the contents into the Text section of CVD, replacing the text that is already there. |
| ||||||||||||
4. An analysis engine (AE) needs to be loaded in order to process text.
Use the Run-> Load AE menu bar command. Navigate to the file
Click Open.
| |||||||||||||
5. From the menu bar, click Run -> Run followed by the name of the AE you loaded: AggregatePlaintextProcessor, AggregatePlaintextUMLSProcessor, or AggregatePlaintextFastUMLSProcessor.
Note: To test a few simple annotators and confirm that cTAKES works without UMLS, you can instead load: /desc/ctakes-core/desc/analysis_engine/SentencesAndTokensAggregate.xml | |||||||||||||
6. You'll get a list of all the annotations for this clinical document in the Analysis Results frame. Annotations from the pipeline, such as named entities and sentence boundaries, are viewable. To see one, in the Analysis Results frame, click on the key in front of:
This will show an AnnotationIndex in the lower frame. Select any annotation in that lower frame and you will see the text discovered in
Now select items in the lower frame to see the text being annotated. |
...
Step | Example | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Open a command prompt and change to the cTAKES_HOME directory:
|
Linux:
| ||||||||||||
2. Create a directory for some test data. | Windows: mkdir testdata | ||||||||||||
3. Download this sample file and place it into the testdata directory. | No example | ||||||||||||
4. Start the collection processing engine by running this command: | Windows:
Linux:
| ||||||||||||
5. This will bring up the Collection Processing Engine Configurator. In the menu bar, click File > Open CPE Descriptor. | |||||||||||||
6. Navigate to the following file, which uses the AggregateCdaProcessor
| No example | ||||||||||||
7. Change the Collection reader input directory to testdata, which contains the CDA file(s). | |||||||||||||
8. Click the Play button (green/blue play arrow near the bottom).
| |||||||||||||
9. You should see that one document was processed. The collection contained only one document here, simply to demonstrate the procedure. Close the results window.
| |||||||||||||
10. Close the CPE application. You may be prompted to save changes. Since this was just a test you may click the No button. | No example |
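
Before pressing Play, it can help to know how many documents the run should report. The sketch below counts the files in the input directory; the function name `count_docs` is hypothetical, and it assumes the collection reader processes each regular file directly inside that directory.

```shell
#!/bin/sh
# Count the regular files directly inside a directory (no recursion).
count_docs() {
    dir=$1
    n=0
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Usage from the cTAKES_HOME directory, after placing the sample file:
# echo "testdata contains $(count_docs testdata) document(s)"
```

If the count differs from the number of documents the CPE reports as processed, check that the input directory setting points at the directory you counted.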
...
Annotator | Description | Example Aggregate Analysis Engine (AE) | Example Collection processing Engine (CPE) | |||||
---|---|---|---|---|---|---|---|---|
Clinical Document Pipeline | The complete cTAKES pipeline to obtain the majority of cTAKES annotations | <cTAKES_HOME>/desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml | <cTAKES_HOME>/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test1.xml | |||||
Chunker | Obtain cTAKES chunk annotations | NA | NA | |||||
Dependency Parser | Obtain dependency parsing tree | <cTAKES_HOME>/desc/ctakes-dependency-parser/desc/analysis_engine/ClearParserSRLTokenizedInfPosAggregate.xml | <cTAKES_HOME>/desc/ctakes-dependency-parser/desc/collection_processing_engine/ClearParserTestCPE.xml | |||||
Drug NER | The annotator to obtain drug annotations | <cTAKES_HOME>/desc/ctakes-drug-ner/desc/analysis_engine/DrugAggregatePlaintextUMLSProcesor.xml | <cTAKES_HOME>/desc/ctakes-drug-ner/desc/collection_processing_engine/DrugNER_PlainText_CPE.xml | |||||
Dictionary Lookup | Mapping cTAKES annotations to dictionaries (e.g., SNOMED_CT or RxNorm) | <cTAKES_HOME>/desc/ctakes-dictionary-lookup/desc/analysis_engine/TestAggregateTAE.xml | NA | |||||
Relation Extractor | Annotate certain relations between certain Event, Entity, and Modifier annotations | <cTAKES_HOME>/desc/ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml | NA | |||||
Smoking Status | The annotator to obtain document or patient-level smoking status | <cTAKES_HOME>/desc/ctakes-smoking-status/desc/analysis_engine/SimulatedProdSmokingTAE.xml | <cTAKES_HOME>/desc/ctakes-smoking-status/desc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml | |||||
Side Effect | The annotator to find side effect mentions and sentences from clinical documents | <cTAKES_HOME>/desc/ctakes-side-effect/desc/analysis_engine/SideEffectAggregateTAE_UMLS.xml | <cTAKES_HOME>/desc/ctakes-side-effect/desc/collection_processing_engine/SideEffectCPE.xml |
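
To confirm that the descriptors listed in the table are present in your installation, a check along the following lines may help. It is a sketch only: the function name `check_descriptors` is hypothetical, and `CTAKES_HOME` in the usage comment is assumed to be an environment variable you have set to the install directory.

```shell
#!/bin/sh
# Print "ok" or "missing" for each descriptor path, relative to the install root.
# Returns non-zero if at least one descriptor is missing.
check_descriptors() {
    root=$1
    shift
    rc=0
    for rel in "$@"; do
        if [ -f "$root/$rel" ]; then
            echo "ok       $rel"
        else
            echo "missing  $rel"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, assuming CTAKES_HOME is set to the install directory:
# check_descriptors "$CTAKES_HOME" \
#     desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml \
#     desc/ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml
```

A "missing" result for a descriptor usually means the optional component was not installed or the path was mistyped.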
...
Before you go on to process text in production, consider the dictionaries and models. If you have not yet obtained the rights to the UMLS resources and models, you will want to do so. Be aware that the models have been trained on data that may not match your data well enough to be effective. In some cases you might want to modify the dictionaries and train models using your own data.