...

 

 

Step

 

 

Example

1. Do not try this until AFTER cTAKES 4.0 is released (this 4.0 document is still a DRAFT). On the cTAKES downloads page, download the User Installation package.

Info

The download time will be commensurate with ~650 MB of data.

 

2. (Recommended) Verify the downloaded files against a signature to ensure you have the proper and complete file.

From the following directory, download the signature file that corresponds to your download from step 1:

https://www.apache.org/dist/ctakes/ctakes-4.0.0/ 

Please do not download any of the files that end with .zip or .gz directly from apache.org/dist. If you need to download cTAKES itself, use the downloads page listed in step 1 so that a mirror can be used.

No example
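As a minimal sketch of the verification in step 2, the function below compares a file's digest with the value published in a checksum file. The function name verify_digest is hypothetical, and which digest algorithm the cTAKES release publishes is an assumption, so the digest tool is passed in as a parameter. If the signature file is a PGP .asc signature, gpg --verify is the usual tool instead.

```shell
# Compare a file's digest with an expected hex digest.
# Returns 0 on a match, non-zero otherwise.
verify_digest() {
    tool=$1        # digest command, e.g. sha512sum, sha256sum, sha1sum or md5sum
    file=$2        # downloaded file to check
    expected=$3    # hex digest copied from the published checksum file

    # These tools print "<digest>  <filename>"; keep only the digest.
    actual=$("$tool" "$file" | awk '{print $1}')

    # Compare case-insensitively, since published digests may be upper case.
    actual=$(printf '%s' "$actual" | tr 'A-F' 'a-f')
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example call (the digest is a placeholder, not a real value):
# verify_digest sha512sum apache-ctakes-4.0.0.bin.tar.gz "<digest from the checksum file>" && echo "OK"
```

On Mac OSX the sha*sum commands may not be installed, and shasum -a 512 is a common substitute.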

3. Unzip the file you downloaded into the directory where you want cTAKES installed. The compressed file contains a single top-level directory, which we will call <cTAKES_HOME>. It is the directory that contains subdirectories such as bin, desc, resources, and lib.

You will need to refer to this directory later.

Windows:

Code Block
languagenone
C:\apache-ctakes-4.0.0

Linux:

Code Block
languagenone
/usr/local/apache-ctakes-4.0.0

Windows:

Linux:

Code Block
languagenone
tar -xvf apache-ctakes-4.0.0.bin.tar.gz -C /usr/local 

4. Download the cTAKES resources ZIP file with a matching version from the ctakesresources project (More information on cTAKES models). These resources are required to operate cTAKES.

Info

Due to licensing considerations, resources are hosted at an external location. For ease of installation, a single package was created with all the resources you will need. Licensing for these resources is found within the download.

Info

The download time will be commensurate with ~1 GB of data.


Unzip the cTAKES resources file into a temporary location.

Windows:


Linux:

Code Block
langnone
cd /tmp
wget http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-4.0.0.zip
sudo unzip ctakes-resources-4.0.0.zip

5. Copy (or move) the resources to cTAKES_HOME.
Copy the contents of the temporary resources directory (and all sub-directories) to <cTAKES_HOME>/resources.

Info

Some files may already exist in <cTAKES_HOME>/resources, which causes conflicts during the copy. Overwrite the cTAKES_HOME files with those in the resources download.

Windows:

Code Block
langnone
xcopy /s C:\temp\ctakes-resources-4.0.0\resources C:\apache-ctakes-4.0.0\resources

Linux:

Code Block
langnone
cp -R /tmp/resources/* /usr/local/apache-ctakes-4.0.0/resources

Mac OSX:

Code Block
langnone
ditto /tmp/resources/* /usr/local/apache-ctakes-4.0.0/resources
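After the copy, a quick check of the install layout can catch a misplaced directory. The sketch below tests only for the four subdirectories this guide names (bin, desc, resources, lib). The function name check_ctakes_home is hypothetical, and the check says nothing about whether the resources themselves are complete.

```shell
# Report any of the expected cTAKES_HOME subdirectories that are missing.
# Returns 0 when all four exist, 1 otherwise.
check_ctakes_home() {
    home=$1
    missing=0
    for sub in bin desc resources lib; do
        if [ ! -d "$home/$sub" ]; then
            echo "missing: $home/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the Linux install location from this guide:
# check_ctakes_home /usr/local/apache-ctakes-4.0.0 && echo "layout looks complete"
```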

...

 

 

Step

 

 

Example

1. If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

No example

2. Once you have your UMLS username and password, edit the following files. In each script, find the line that runs java and add the ctakes.umlsuser and ctakes.umlspw parameters, set to your credentials, to the java command. If you cut and paste the example, make sure you substitute your actual ID and password.

Windows:

Code Block
languagenone
<cTAKES_HOME>\bin\runctakesCVD.bat
<cTAKES_HOME>\bin\runctakesCPE.bat

Linux:

Code Block
languagenone
<cTAKES_HOME>/bin/runctakesCVD.sh
<cTAKES_HOME>/bin/runctakesCPE.sh

 

In the examples below, the rest of the line after -cp is not shown because you do not need to modify it. Do not delete the rest of the line after -cp, however.

Code Block
languagenone
java -Dctakes.umlsuser=<YOUR_UMLS_ID_HERE> -Dctakes.umlspw=<YOUR_UMLS_PASSWORD_HERE> -cp ... 

If you use special characters in your user name or password, you may need to escape them or, on Windows, place the string in quotes.

For example, if your username and password were literally myusername and mypassword, you could insert them before the -cp option so the start of the java command would look like this:

 

Code Block
java  -Dctakes.umlsuser=myusername  -Dctakes.umlspw=mypassword  -cp ... 

Windows:

If you use special characters in your UMLS user name or password, you can place them in double quotes:

Code Block
java  -Dctakes.umlsuser="myuser!!!!"  -Dctakes.umlspw="mypass!!!!"  -cp ... 


 Linux:

If you use special characters in your user name or password, you may need to escape them.
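One way to handle special characters on Linux, sketched here for a POSIX shell, is to wrap the values in single quotes, which pass every character through literally. The variable names umls_user, umls_pw, user_opt and pw_opt and the credential values are made up for illustration.

```shell
# Single quotes keep the shell from interpreting characters such as ! $ ` and \
umls_user='myuser!!!!'
umls_pw='my$pass!!!!'

# Double-quote the variables when building the options so the values stay intact.
user_opt="-Dctakes.umlsuser=$umls_user"
pw_opt="-Dctakes.umlspw=$umls_pw"

# Show the two options exactly as java would receive them.
printf '%s\n' "$user_opt" "$pw_opt"

# In the script, the java line would then start like this:
# java "$user_opt" "$pw_opt" -cp ...
```

A value that itself contains a single quote cannot sit inside single quotes as-is; the usual workaround is to close the quote, add an escaped quote, and reopen it ('\'').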

2a. You may also specify your UMLS credentials as operating system environment variables, but the dots in the parameter names must be replaced with underscores.

Linux:

 

Code Block
languagebash
export ctakes_umlsuser=myusername
export ctakes_umlspw=mypassword
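Before starting cTAKES, it can help to confirm that both variables are actually exported in the current shell. The following is a small sketch of such a check. The function name check_umls_env is hypothetical, and the check only confirms that the variables are present and non-empty, not that the credentials are valid.

```shell
# Return 0 only if both credential variables are exported and non-empty.
check_umls_env() {
    status=0
    for name in ctakes_umlsuser ctakes_umlspw; do
        # printenv prints the value only for variables in the environment,
        # so a variable that was assigned but never exported counts as missing.
        if [ -z "$(printenv "$name")" ]; then
            echo "$name is not set in the environment" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call:
# check_umls_env && echo "UMLS credentials are exported"
```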

...

 

 

Step

 

 

Example

1. Open a command prompt and change to the cTAKES_HOME directory, which is the directory that contains subdirectories such as bin, desc, resources, and lib.

Note

It is best if <cTAKES_HOME> is your current directory when you run the commands, because the scripts change directories themselves.


Windows:

Code Block
languagenone
cd \apache-ctakes-4.0.0

Linux:

Code Block
languagenone
cd /usr/local/apache-ctakes-4.0.0

2. Create a directory for some test data.

mkdir testdata

3. Download this sample file and place it into the testdata directory.

No example

4. Start the collection processing engine by running the command below. The application may take a minute to start on slower hardware.

Windows:

Code Block
languagenone
bin\runctakesCPE.bat

Linux:

Code Block
languagenone
bin/runctakesCPE.sh

5. This will bring up the Collection Processing Engine Configurator. In the menu bar, click File > Open CPE Descriptor.

6. Navigate to the following file, which uses the AggregateCdaProcessor:

Code Block
langnone
<cTAKES_HOME>
  /desc
    /ctakes-clinical-pipeline
      /desc
        /collection_processing_engine
          /test1.xml


Click Open.

No example

7. Change the Collection Reader input directory to testdata, which contains one or more CDA files.

Within the CAS Consumers pane of the same window, change the output directory to testdata/output.

8. Click the Play button (green/blue play arrow near the bottom).

Info

What just happened? You placed a sample CDA document into the input of a pipeline. The pipeline used a file system reader that will process all files in a directory. The processing was accomplished by a pipeline of cTAKES components. The AggregateCdaProcessor allows for a parameter (Chunk Creator Class) to be passed to the Chunker annotator. For each input file, one resultant file was placed into the output directory. Each output file is an XML file that includes the annotations made by each component within the pipeline.

9. You should see that one document was processed. You have just processed a collection of documents; in this case the collection contained only one document, to show how it is done. Close the results window.
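Since each input file should produce one output file, comparing file counts gives a quick sanity check from the command line. The sketch below assumes the testdata and testdata/output directories used in the steps above. The function name count_files is hypothetical, and the sketch makes no assumption about how the output files are named.

```shell
# Count the regular files directly inside a directory.
# -maxdepth 1 keeps the files in testdata/output from being counted as input.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example, run from <cTAKES_HOME> after the pipeline has finished:
# if [ "$(count_files testdata)" -eq "$(count_files testdata/output)" ]; then
#     echo "every input file has an output file"
# fi
```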

Note

This example of using the CPE GUI did not use the UMLS resources. If you wish to perform named entity recognition or concept identification for anything other than a few words, you will need to 1) obtain the rights to use UMLS resources, 2) add those credentials to cTAKES, and 3) use a pipeline that makes use of those UMLS resources (see above).

10. Close the CPE application. You may be prompted to save changes. Since this was just a test, you may click the No button.

No example

...

The analysis engines and collection processing engines shipped with cTAKES for some of the annotators are described in the following table.

Warning

TBD - TODO - fix this - cTAKES 3.1 binary distributions did not include test data. Loading the CPE descriptors into the CPE tool will require resetting the input and output directories. Test files could be obtained from the cTAKES 2.5 release binary distribution. Look for a testdata directory in cTAKES_HOME.

...