
...

These instructions are for end users who want to install Apache cTAKES, configure it, and use it to process text (typically text associated with a medical record). If you plan to expand, change, or modify the code within cTAKES, refer to the cTAKES 4.0 Developer Install Guide.

These instructions cover installation and a test run of cTAKES against some text. The main product includes trained models for sentence detection and part-of-speech tagging, dictionaries from a subset of the UMLS, the LVG resource, etc. Optional components are described in the Component Use Guide.

Once you have finished installing cTAKES and its separately-bundled resources, you will be able to see what cTAKES is capable of.

...

Step

Example

1. Make sure you have Java 1.7 or higher. Most systems come with Java already installed.

Run this command to check your version.

Windows and Linux:

Code Block
languagenone
java -version

Windows:

Code Block
languagenone
C:\>java -version
java version "1.7.0_20"
Java(TM) SE Runtime Environment (build 1.7.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)


Linux:

Code Block
languagenone
user@system:/$ java -version
java version "1.7.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.1) (6b22-1.10.1-0ubuntu1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
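
If you would rather script this check than read the banner by eye, the following is a minimal sketch. The function names are hypothetical and are not part of cTAKES; the parsing assumes the banner contains a quoted version string like the ones shown above, and it treats both the legacy "1.7" numbering and the newer single-number scheme as valid input.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether a `java -version` banner reports Java 1.7 or newer.
# Function names are illustrative and not part of cTAKES.

# Print the feature version (7 for "1.7.0_20", 11 for "11.0.2").
java_feature_version() {
  local banner="$1" version major minor
  # Pull the first quoted version string out of the banner.
  version=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$version" ] || return 1
  major=${version%%.*}
  if [ "$major" = "1" ]; then
    # Legacy scheme: the feature version is the second field (1.7 -> 7).
    minor=${version#1.}
    minor=${minor%%.*}
    printf '%s\n' "$minor"
  else
    # Newer scheme: the first field is the feature version; drop suffixes such as "-ea".
    printf '%s\n' "${major%%[!0-9]*}"
  fi
}

# Succeed when the banner reports Java 1.7 or higher.
java_is_supported() {
  local feature
  feature=$(java_feature_version "$1") || return 1
  [ "$feature" -ge 7 ]
}
```

Because java -version writes its banner to standard error, a typical call would be java_is_supported "$(java -version 2>&1)" && echo "Java is new enough".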

Install cTAKES

Step

Example

1. On the cTAKES downloads page, download the User Installation package. Before downloading, either select a mirror site and press the Change button to switch the URL to your desired mirror location, or accept the default.
Windows:
Download the ZIP file.
Linux:
Use wget to obtain the *.tar.gz file.
wget <URL of the file from downloads>

Info

The download time will be commensurate with ~500MB of data.

Linux:

Code Block
languagenone
HTTP request sent, awaiting response... 200 OK
Length: 763500777 (728M) [application/x-gzip]
Saving to: `apache-ctakes-4.0.0-bin.tar.gz'
13% [===========> ] 106,548,331 1.13M/s eta 11m 9s

 

2. (Optional but recommended) Verify the downloaded files against a file signature to ensure you have the proper and complete file.

From the following directory, download the signature file that corresponds to your download from step 1:

https://www.apache.org/dist/ctakes/ctakes-4.0.0/

Do not download any of the files that end with .zip or .gz directly from apache.org/dist. If you need to download cTAKES itself, use the downloads page listed in step 1 so that a mirror can be used.

No example
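
One possible way to run the verification on Linux is sketched here with GnuPG. It rests on two assumptions that this guide does not confirm: that the signature file is named after the archive with an .asc suffix, and that the project's public keys have already been imported into your GnuPG keyring. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: check a downloaded archive against a detached signature.
# Assumptions: the signature is <archive>.asc unless given explicitly,
# and the signer's public key is already in the local GnuPG keyring.

verify_download() {
  local archive="$1"
  local signature="${2:-$archive.asc}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 2
  fi
  if [ ! -f "$signature" ]; then
    echo "signature not found: $signature" >&2
    return 2
  fi
  # gpg exits non-zero when the signature does not match the file.
  gpg --verify "$signature" "$archive"
}
```

For example, verify_download apache-ctakes-4.0.0-bin.tar.gz would look for apache-ctakes-4.0.0-bin.tar.gz.asc in the same directory. A failure here usually means the download is incomplete or the wrong signature file was fetched, so download again before continuing.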

3. Unzip the file you downloaded into the directory where you want cTAKES installed. The compressed file contains a single directory at the top level. We will call this directory <cTAKES_HOME>. You will need to refer to it later.

Windows:

Code Block
languagenone
C:\apache-ctakes-4.0.0

Linux:

Code Block
languagenone
/usr/local/apache-ctakes-4.0.0

Linux:

Code Block
languagenone
tar -xvf apache-ctakes-4.0.0-bin.tar.gz -C /usr/local
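
After extraction, a quick sanity check of the install directory can catch an incomplete unpack early. The sketch below assumes that <cTAKES_HOME> contains the bin, desc, and resources subdirectories referenced elsewhere in this guide; the actual layout of your release may differ, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: confirm the extracted directory looks like a cTAKES install.
# Assumption: bin, desc and resources exist directly under <cTAKES_HOME>.

check_ctakes_home() {
  local home="$1" missing=0 sub
  for sub in bin desc resources; do
    if [ ! -d "$home/$sub" ]; then
      echo "missing directory: $home/$sub" >&2
      missing=1
    fi
  done
  # Returns 0 when every expected subdirectory is present, 1 otherwise.
  return "$missing"
}
```

A typical call on Linux would be check_ctakes_home /usr/local/apache-ctakes-4.0.0. Storing that path in a shell variable of your own choosing makes the later steps easier to type.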

4. Download the cTAKES resources ZIP file with a matching version from the ctakesresources project (More information on cTAKES models). These resources are required to operate cTAKES.

Info

Due to licensing considerations, resources are hosted at an external location. For ease of installation, a single package was created with all the resources you will need. Licensing for these resources is found within the download.

Info

Download time will be commensurate with 1GB of data.


Unzip the cTAKES resources file into a temporary location.


Linux:

Code Block
languagenone
cd /tmp
wget http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-4.0.0.zip
sudo unzip ctakes-resources-4.0.0.zip

5. Copy (or move) the resources to <cTAKES_HOME>.
Copy the contents of the temporary resources directory (including all subdirectories) to <cTAKES_HOME>/resources.

Info

Some files may already exist in <cTAKES_HOME>/resources. If you are prompted about conflicts, overwrite the <cTAKES_HOME> files with those from the resources download.

Windows:

Code Block
languagenone
xcopy /s C:\temp\ctakes-resources-4.0.0\resources C:\apache-ctakes-4.0.0\resources

Linux:

Code Block
languagenone
cp -R /tmp/resources/* /usr/local/apache-ctakes-4.0.0/resources

Mac OS X:

Code Block
languagenone
ditto /tmp/resources /usr/local/apache-ctakes-4.0.0/resources
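
To confirm that the copy picked up every file, including those in subdirectories, something like the following sketch can help on Linux or Mac OS X. Both function names are hypothetical. The copy uses the "source/." form of cp -R so that the contents of the source directory are merged into the destination and same-named files are overwritten.

```shell
#!/usr/bin/env bash
# Sketch only: merge a resources directory into <cTAKES_HOME>/resources
# and report anything that did not arrive. Function names are illustrative.

merge_resources() {
  local src="$1" dst="$2"
  [ -d "$src" ] || { echo "source not found: $src" >&2; return 2; }
  mkdir -p "$dst" || return 1
  # "src/." copies the directory contents, overwriting same-named files in dst.
  cp -R "$src/." "$dst/"
}

# Print the relative path of every file that exists in src but not in dst.
missing_after_copy() {
  local src="$1" dst="$2" rel
  ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
    [ -f "$dst/$rel" ] || printf '%s\n' "${rel#./}"
  done
}
```

For example, merge_resources /tmp/resources /usr/local/apache-ctakes-4.0.0/resources followed by missing_after_copy with the same two arguments should print nothing when the copy is complete.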

...

Note

In the initial setup, cTAKES will recognize only a few sample concepts in text. If you wish to perform named entity recognition or concept identification for anything other than these few words, you will need to 1) obtain the rights to use UMLS resources, 2) add those credentials to cTAKES, and 3) use a cTAKES pipeline that makes use of those UMLS resources. If you don't, cTAKES will work but won't recognize much.

Step

Example

1. If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

No example

2. Once you have your UMLS username and password, edit the following files. Find the line in each script that runs java and add the ctakes.umlsuser and ctakes.umlspw parameters, set to your credentials, to the java command. If you cut and paste the example, make sure you substitute your actual ID and password.

Windows:

Code Block
languagenone
<cTAKES_HOME>\bin\runctakesCVD.bat
<cTAKES_HOME>\bin\runctakesCPE.bat

Linux:

Code Block
languagenone
<cTAKES_HOME>/bin/runctakesCVD.sh
<cTAKES_HOME>/bin/runctakesCPE.sh

Windows and Linux:

Code Block
languagenone
java -Dctakes.umlsuser=<YOUR_UMLS_ID_HERE> -Dctakes.umlspw=<YOUR_UMLS_PASSWORD_HERE> -cp ...
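
If you prefer not to hard-code credentials in the scripts, one option on Linux is to build the two parameters from environment variables. This is only a sketch: the variable names UMLS_USER and UMLS_PASSWORD and the function name are hypothetical, and only the ctakes.umlsuser and ctakes.umlspw property names come from this guide.

```shell
#!/usr/bin/env bash
# Sketch only: build the UMLS JVM parameters from environment variables.
# UMLS_USER and UMLS_PASSWORD are hypothetical names chosen for this example.

umls_java_opts() {
  if [ -z "${UMLS_USER:-}" ] || [ -z "${UMLS_PASSWORD:-}" ]; then
    echo "set UMLS_USER and UMLS_PASSWORD first" >&2
    return 1
  fi
  # Emit the two -D parameters described in this step.
  printf -- '-Dctakes.umlsuser=%s -Dctakes.umlspw=%s\n' "$UMLS_USER" "$UMLS_PASSWORD"
}
```

Inside a script, the java line could then start with java $(umls_java_opts) followed by the rest of the existing command. Because the output is split on whitespace, this approach only suits credentials without spaces; otherwise edit the script directly as described above.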

...

Annotator

Description

Example Aggregate Analysis Engine (AE) or Piper file

Example Collection Processing Engine (CPE)

Clinical Document Pipeline

The complete cTAKES pipeline to obtain concepts and their attributes (the majority of cTAKES annotations)

AE: <cTAKES_HOME>/desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml

<cTAKES_HOME>/desc/ctakes-clinical-pipeline/desc/collection_processing_engine/test1.xml

Chunker

Obtain cTAKES chunk annotations

AE: NA
Piper file: <cTAKES_HOME>/TBD

NA

Dependency Parser

Obtain dependency parsing tree

AE: <cTAKES_HOME>/desc/ctakes-dependency-parser/desc/analysis_engine/ClearParserSRLTokenizedInfPosAggregate.xml
Piper file: TBD

<cTAKES_HOME>/desc/ctakes-dependency-parser/desc/collection_processing_engine/ClearParserTestCPE.xml

Drug NER

The annotator to obtain drug annotations

AE: <cTAKES_HOME>/desc/ctakes-drug-ner/desc/analysis_engine/DrugAggregatePlaintextUMLSProcesor.xml

<cTAKES_HOME>/desc/ctakes-drug-ner/desc/collection_processing_engine/DrugNER_PlainText_CPE.xml

Dictionary Lookup

Mapping cTAKES annotations to dictionaries (e.g., SNOMED_CT or RxNorm)

AE: <cTAKES_HOME>/desc/ctakes-dictionary-lookup/desc/analysis_engine/TestAggregateTAE.xml
Piper file: TBD

NA

PAD Term Spotter

Identifying terms related to PAD

AE: <cTAKES_HOME>/desc/ctakes-pad-term-spotter/desc/analysis_engine/Radiology_TermSpotterAnnotatorTAE.xml
Piper file: TBD

<cTAKES_HOME>/desc/ctakes-pad-term-spotter/desc/collection_processing_engine/Radiology_Sample.xml

Relation Extractor

Annotate certain relations between certain Event, Entity, and Modifier annotations

AE: <cTAKES_HOME>/desc/ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml
Piper file: <cTAKES_HOME>/TBD

NA

Smoking Status

The annotator to obtain document or patient-level smoking status

AE: <cTAKES_HOME>/desc/ctakes-smoking-status/desc/analysis_engine/SimulatedProdSmokingTAE.xml
Piper file: TBD

<cTAKES_HOME>/desc/ctakes-smoking-status/desc/collection_processing_engine/Sample_SmokingStatus_output_flatfile.xml

Side Effect

The annotator to find side effect mentions and sentences from clinical documents

AE: <cTAKES_HOME>/desc/ctakes-side-effect/desc/analysis_engine/SideEffectAggregateTAE_UMLS.xml
Piper file: TBD

<cTAKES_HOME>/desc/ctakes-side-effect/desc/collection_processing_engine/SideEffectCPE.xml

 

Next Steps

The cTAKES 4.0 Component Use Guide will help you understand, in great detail, each of the cTAKES components that have been installed. In some cases you can learn how to improve the components.

Also, before you go on to process text in production, you will want to consider dictionaries and models. If you have not yet obtained the rights to the UMLS resources and models, you will want to do so. Be aware that the models within cTAKES have been trained on data that may not match your data well enough to be effective. In some cases you might want to modify the dictionaries and train models using your own data.