...

Step

Example

1. Import the cTAKES projects using Maven.

File -> Import ... -> Maven -> Check out Maven Projects from SCM.
Click Next.


Info

The following location is the main trunk of cTAKES. See how cTAKES treats the trunk, branches, and tags in the developer FAQs.

2. For the SCM URL, select "svn" in the drop-down and enter the following URL in the text field:

Code Block
langnone
https://svn.apache.org/repos/asf/ctakes/trunk

Click Finish.
Eclipse will download and build all of the cTAKES sub-projects, including running jcasgen as needed.

Info

Because of the way Maven and Eclipse work together, you will see two copies of the sub-projects in Eclipse. Your workspace directories, however, contain only one set of underlying files.



3. Download the cTAKES 3.0 dictionaries and models.

Info

For licensing reasons and ease of installation, all the resources you need are provided as a single download from an external location. The licensing terms for these resources are included in the download.

Info

The download is approximately 1 GB of data, so allow time for it to complete.


Windows:
Go to cTAKES resources and, from the ctakesresources project, download the ZIP file whose version matches your cTAKES version.
Unzip the files into a temporary location such as C:\temp.

Linux:
Obtain the URL of the matching-version ZIP file from cTAKES resources, download the file, and unzip it into a temporary location.


Linux:

Code Block
langnone
cd /tmp
wget http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
sudo unzip ctakes-resources-3.0.1.zip

4. Copy (or move) the resources to cTAKES_HOME.
With Eclipse, cTAKES_HOME is your workspace location followed by the project name "ctakes". Copy the contents of the temporary resources directory (and all sub-directories) to <cTAKES_HOME>/ctakes-dictionary-lookup/resources.

Info

Some of these files may already exist in cTAKES_HOME. If so, overwrite the cTAKES_HOME files with those from the resources download.

Windows:

Code Block
langnone
xcopy /s C:\temp\ctakes-resources-3.0.1\resources C:\Users\<userID>\workspace\ctakes\ctakes-dictionary-lookup\resources

Linux:

Code Block
langnone
cp -R /tmp/resources/* <LINUX_ECLIPSE_WORKSPACE_HOME>/ctakes/ctakes-dictionary-lookup/resources
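
The Info note above warns that files may conflict. As a sketch for Linux (a hypothetical helper, not part of cTAKES), this POSIX shell function first reports which downloaded files already exist in the destination and then copies the whole tree over them, following the overwrite advice. Pass it the unzipped resources directory and <cTAKES_HOME>/ctakes-dictionary-lookup/resources.

```shell
# Hypothetical helper, not shipped with cTAKES.
# Usage: copy_resources <unzipped resources dir> <destination resources dir>
copy_resources() {
  src="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  # Report every file that will be overwritten (the "conflicts" noted above).
  (cd "$src" && find . -type f) | while IFS= read -r f; do
    if [ -e "$dest/$f" ]; then
      echo "overwriting: $dest/${f#./}"
    fi
  done
  # Copy the contents of $src, including sub-directories, over $dest.
  cp -R "$src"/. "$dest"/
}
```

For example: copy_resources /tmp/resources <LINUX_ECLIPSE_WORKSPACE_HOME>/ctakes/ctakes-dictionary-lookup/resources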

5. Refresh Eclipse.

Refresh your Eclipse projects (for example, select them and press F5) so that Eclipse detects the new directories and data.

No example

6. Add the resources as a folder to the classpath.
Repeat this step for EVERY project that needs the resources (nearly all of them do, except ctakes-relation-extractor). For example:

Open the properties of the top-level project ctakes-clinical-pipeline. Do not select the copy nested under the ctakes project; select its sibling at the highest level. Then select Java Build Path -> Libraries tab -> Add Class Folder... button.
Select the resources directory under the ctakes-dictionary-lookup project.
Click OK on all dialogs until you are out of the sequence.


7. UMLS user ID and password.
The dictionaries are usually required to process data. If you plan to use the UMLS dictionaries, you must pass your UMLS user ID and password to the pipeline. There are several ways to do this; choose one.

Note

If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

  1. Environment variables: set or export these environment variables.

    No Format
    ctakes.umlsuser=<username>, ctakes.umlspw=<password>
    
  2. Add the system properties (shown below) to the Java arguments of a run configuration. Navigate to ctakes-clinical-pipeline -> resources -> launch -> UIMA_<CVD | CPE>GUI--clinical_documents pipeline.launch. Right-click the launch file and select Run As -> Run Configurations... On the Arguments tab, enter these parameters in the VM arguments field. Click Apply.

    No Format
    -Dctakes.umlsuser=<username> -Dctakes.umlspw=<password>
    
  3. In these descriptor files, replace the UMLSUser and UMLSPW <nameValuePair> strings with your UMLS username and password:
    • Dictionary Lookup: <cTAKES_HOME>/desc/ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
    • (optional) Drug NER: <cTAKES_HOME>/desc/ctakes-drug-ner/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
      The following shows where in the files to make the changes. (Do not change the <configurationParameters> of the same name.)
      Code Block
      langnone
            <nameValuePair>
              <name>ctakes.umlsuser</name>
              <value>
                <string>YOUR_UMLS_USERNAME_HERE</string>
              </value>
            </nameValuePair>
            <nameValuePair>
              <name>ctakes.umlspw</name>
              <value>
                <string>YOUR_UMLS_PASSWORD_HERE</string>
              </value>
            </nameValuePair>
    • Next, include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine in your aggregate Analysis Engine, or switch to the aggregates provided by cTAKES. For the following components, cTAKES ships duplicates of the standard Analysis Engine descriptors that have UMLS in their names and already contain DictionaryLookupAnnotatorUMLS.xml:
      • Dictionary Lookup
      • Clinical Documents pipeline
      • Drug NER
      • Side Effects
    • You then only need to switch to those descriptors. For example, if you were using AggregateCdaProcessor.xml in the Clinical Documents pipeline, switch to AggregateCdaUMLSProcessor.xml instead; the pipeline will then use the complete dictionaries.

      You can, of course, modify your own aggregate Analysis Engine files and place the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.
      Because this is an in-memory database implementation, the initial load may take approximately 20-30 seconds while the database initializes, so please be patient.
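If you choose option 3, the edit can be scripted. The sketch below is a hypothetical shell helper (not part of cTAKES). It assumes the descriptor still contains the literal placeholders YOUR_UMLS_USERNAME_HERE and YOUR_UMLS_PASSWORD_HERE shown in the XML above, and that your credentials contain no |, & or \ characters, which sed treats specially in a replacement. It keeps a .bak copy of the original file.

```shell
# Hypothetical helper, not part of cTAKES.
# Usage: fill_umls_placeholders <descriptor.xml> <umls_user> <umls_password>
fill_umls_placeholders() {
  # Keep an untouched copy of the descriptor next to the original.
  cp "$1" "$1.bak" || return 1
  # Only the placeholder strings are substituted; any text that does not
  # contain them (such as the <configurationParameters> declarations) is kept.
  sed -e "s|YOUR_UMLS_USERNAME_HERE|$2|g" \
      -e "s|YOUR_UMLS_PASSWORD_HERE|$3|g" \
      "$1.bak" > "$1"
}
```

Run it once for each descriptor listed in option 3, passing the descriptor path, your UMLS username, and your password.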




...

Step

Example

1. Check out the cTAKES project.

Info

The following location is the main trunk of cTAKES. See how cTAKES treats the trunk, branches, and tags in the developer FAQs.


Windows/Linux:

Code Block
langnone
cd /
svn co https://svn.apache.org/repos/asf/ctakes/trunk cTAKES-3.0

The last argument (cTAKES-3.0) is created as a new directory in your current location.

Note

On Linux, make sure that you have write access to the current directory.

We will refer to the directory you specify at the end of the checkout command as <cTAKES_HOME>.

Windows:

Code Block
langnone
C:\>cd /
C:\>svn co https://svn.apache.org/repos/asf/ctakes/trunk cTAKES-3.0

 ...

A ctakes-3.0\ctakes-type-system\pom.xml
A ctakes-3.0\ctakes-type-system\.settings
A ctakes-3.0\ctakes-type-system\.settings\org.eclipse.jdt.core.prefs
A ctakes-3.0\ctakes-type-system\.settings\org.eclipse.core.resources.prefs
A ctakes-3.0\ctakes-type-system\desc
A ctakes-3.0\DISCLAIMER
Checked out revision 1433729.

C:\>cd cTAKES-3.0
C:\cTAKES-3.0>

Linux:

Code Block
langnone
tbleeker@system:~$ cd /
tbleeker@system:/$ svn co https://svn.apache.org/repos/asf/ctakes/trunk cTAKES-3.0

...

A ctakes-3.0/ctakes-type-system/pom.xml
A ctakes-3.0/ctakes-type-system/.settings
A ctakes-3.0/ctakes-type-system/.settings/org.eclipse.jdt.core.prefs
A ctakes-3.0/ctakes-type-system/.settings/org.eclipse.core.resources.prefs
A ctakes-3.0/ctakes-type-system/desc
A ctakes-3.0/DISCLAIMER
Checked out revision 1434842.

tbleeker@system:/$ cd cTAKES-3.0/
tbleeker@system:/cTAKES-3.0$ 

2. Download the cTAKES 3.0 dictionaries and models.

Info

For licensing reasons and ease of installation, all the resources you need are provided as a single download from an external location. The licensing terms for these resources are included in the download.

Info

The download is approximately 1 GB of data, so allow time for it to complete.


Windows:
Go to cTAKES resources and, from the ctakesresources project, download the ZIP file whose version matches your cTAKES version.
Unzip the files into a temporary location such as C:\temp.

Linux:
Obtain the URL of the matching-version ZIP file from cTAKES resources, download the file, and unzip it into a temporary location.


Linux:

Code Block
langnone
cd /tmp
wget http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
sudo unzip ctakes-resources-3.0.1.zip

3. Copy (or move) the resources to cTAKES_HOME.
Here, cTAKES_HOME is the directory you specified at the end of the checkout command (cTAKES-3.0 in this example). Copy the contents of the temporary resources directory (and all sub-directories) to <cTAKES_HOME>/ctakes-dictionary-lookup/resources.

Info

Some of these files may already exist in cTAKES_HOME. If so, overwrite the cTAKES_HOME files with those from the resources download.

Windows:

Code Block
langnone
xcopy /s C:\temp\ctakes-resources-3.0.1\resources C:\cTAKES-3.0\ctakes-dictionary-lookup\resources

Linux:

Code Block
langnone
sudo cp -R /tmp/resources/* /cTAKES-3.0/ctakes-dictionary-lookup/resources

4. Compile the complete set.

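Make sure you are in the directory that contains <cTAKES_HOME> (the root directory / in this example) before running the commands below. If you are already in <cTAKES_HOME>, skip the cd command.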

Windows/Linux:

Code Block
langnone
cd cTAKES-3.0
mvn clean compile


Note

On Linux, make sure you run the build as a user that has access to the files in your cTAKES directory.

Info

Instead of "compile", you can use the Maven "package" phase to compile and build all of the cTAKES deliverables. Package is convenient when you run cTAKES outside of Maven with custom processes or scripts, because it bundles all of the third-party and transitive dependencies.




Windows/Linux:

Code Block
langnone
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache cTAKES ..................................... SUCCESS [59.140s]
[INFO] Apache cTAKES common type system .................. SUCCESS [41.856s]
[INFO] Apache cTAKES utils ............................... SUCCESS [6.255s]
[INFO] Apache cTAKES core ................................ SUCCESS [17.940s]
[INFO] Apache cTAKES part-of-speech tagger ............... SUCCESS [5.148s]
[INFO] Apache cTAKES chunker ............................. SUCCESS [3.027s]
[INFO] Apache cTAKES document preprocessor ............... SUCCESS [4.118s]
[INFO] Apache cTAKES dictionary lookup ................... SUCCESS [1:14.740s]
[INFO] Apache cTAKES context dependent tokenizer ......... SUCCESS [5.975s]
[INFO] Apache cTAKES LVG lexical tools ................... SUCCESS [7.831s]
[INFO] Apache cTAKES named entity contexts ............... SUCCESS [4.743s]
[INFO] Apache cTAKES Constituency Parser ................. SUCCESS [9.516s]
[INFO] Apache cTAKES Dependency Parser ................... SUCCESS [32.386s]
[INFO] Apache cTAKES Assertion's zoner ................... SUCCESS [2.152s]
[INFO] Apache cTAKES Assertion ........................... SUCCESS [12.200s]
[INFO] Apache cTAKES ctakes-clinical-pipeline ............ SUCCESS [4.446s]
[INFO] Apache cTAKES Relation Extractor .................. SUCCESS [13.634s]
[INFO] Apache cTAKES CoReference Resolver ................ SUCCESS [8.923s]
[INFO] Apache cTAKES Drug NER ............................ SUCCESS [6.958s]
[INFO] Apache cTAKES Side Effects ........................ SUCCESS [7.566s]
[INFO] Apache cTAKES Smoking Status ...................... SUCCESS [8.377s]
[INFO] Apache cTAKES Pad Term Spotter .................... SUCCESS [9.048s]
[INFO] Apache cTAKES Temporal Information Extraction ..... SUCCESS [33.993s]
[INFO] Apache cTAKES Distribution ........................ SUCCESS [17:59.809s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24:22.120s
[INFO] Finished at: Wed Jan 16 17:44:35 CST 2013
[INFO] Final Memory: 41M/181M
[INFO] ------------------------------------------------------------------------
...



5. Add the resources as a folder to the classpath.
Make sure that the current directory, or dot (.), is in the CLASSPATH environment variable of the process that Maven runs in.

No example
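
On Linux, a minimal sketch of this step is a small POSIX shell function that prepends . to CLASSPATH unless it is already one of the entries. The function name is hypothetical; run or source it in the same shell from which you start mvn so that the exported value reaches the Maven process.

```shell
# Hypothetical helper: prepend "." to CLASSPATH unless it is already present.
add_dot_to_classpath() {
  case ":${CLASSPATH:-}:" in
    *:.:*) ;;  # "." is already an entry; leave CLASSPATH unchanged
    *) CLASSPATH=".${CLASSPATH:+:$CLASSPATH}" ;;
  esac
  export CLASSPATH
}

add_dot_to_classpath
echo "CLASSPATH=$CLASSPATH"
```

Checking for an existing entry first keeps repeated runs (for example, from a shell profile) from piling up duplicate dots.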

6. UMLS user ID and password.
The dictionaries are usually required to process data. If you plan to use the UMLS dictionaries, you must pass your UMLS user ID and password to the pipeline. There are several ways to do this; choose one.

Note

If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

  1. Environment variables: set or export the environment variables.
    Refer to the Eclipse documentation above for more information.
  2. Add the system properties to the Java arguments for the maven environment.
    Add these parameters to the MAVEN_OPTS environment variable when you run the document-processing commands in the next section.
    No Format
    -Dctakes.umlsuser=<username> -Dctakes.umlspw=<password>
    
    Replace <username> and <password> with your own UMLS user ID and password.
  3. In the descriptor files, replace the UMLSUser and UMLSPW <nameValuePair> strings with your UMLS username and password.
    Refer to the Eclipse documentation above for more information.

No example
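
For option 2 on Linux, a minimal sketch of setting MAVEN_OPTS in the shell from which you will run Maven. The <username> and <password> values are placeholders for your own credentials. The sketch appends to any MAVEN_OPTS you already have and assumes neither value contains spaces, because the mvn launch script generally splits MAVEN_OPTS on whitespace.

```shell
# Placeholder credentials: replace with your own UMLS user ID and password.
UMLS_USER='<username>'
UMLS_PW='<password>'

# Append the cTAKES UMLS system properties to any existing MAVEN_OPTS.
MAVEN_OPTS="${MAVEN_OPTS:+$MAVEN_OPTS }-Dctakes.umlsuser=$UMLS_USER -Dctakes.umlspw=$UMLS_PW"
export MAVEN_OPTS
```

Because the variable is exported, every mvn command started from this shell passes both properties to the JVM.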

...