These are instructions for installation of cTAKES for developers. With these instructions you can set up your development environment with cTAKES code, change or extend the code, compile the code, and deploy. If you simply want to be a user of the software, refer to the cTAKES 3.0 User Install Guide.
The install instructions do not explain what the cTAKES components do; for that, see the cTAKES 3.0 Component Use Guide. There is no training or documentation on the code itself, other than code comments. To extend the code, you must first familiarize yourself with the components and then study the code on your own.
...
The minimal install instructions below are short but require a lot of prerequisite setup on your own. If you need more help, follow the step by step instructions. The step by step instructions for Eclipse assume a Windows or Ubuntu Linux install environment; you will need to extrapolate for any other environment.
Warning |
---|
This page is still under construction. |
Eclipse minimal install instructions
Prerequisites: Java JDK 1.6+, Eclipse IDE 4.2+, subversive plugin (or svn equivalent with appropriate SVN team provider connectors), m2e plugin (or mvn equivalent)
Info |
---|
The following location is the main trunk of cTAKES. See how cTAKES treats the trunk, branches, and tags in the developer FAQs. |
- Import Project > Maven > Checkout Maven Project from SCM and use: svn and https://svn.apache.org/repos/asf/incubator/ctakes/trunk
- Select all projects.
- Wait until Eclipse downloads and builds all of your projects (it may take up to 30 minutes depending on the machine).
- The various build helpers should run jcasgen and build the projects for you. There should not be any reason to run mvn install, etc.
- Merge the version-matching resources ZIP file from http://sourceforge.net/projects/ctakesresources/files/ into your ctakes-dictionary-lookup project.
- (Optional) If you would like to launch the UIMA CVD or CPE GUI, run ctakes-clinical-pipeline/resources/launch/UIMA_<CVD | CPE>GUI--clinical_documents pipeline.launch
- (Optional) UIMA plug-ins called "UIMA Eclipse tooling and runtime support" can be installed from update site: http://www.apache.org/dist/uima/eclipse-update-site
Eclipse step by step install instructions
Preparing Java
Include Page |
---|
...
|
...
|
Preparing Eclipse
If you are going to use Eclipse for development then follow these instructions.
Step | Example | |||||
---|---|---|---|---|---|---|
1. Download and install Eclipse 4.2+. | No example | |||||
2. Subversion Eclipse plug-in (based on Subversive site). We will use the one called "Subversive - SVN Team Provider"
Expand the Collaboration category. | ||||||
3. Subversion team provider connectors 1.7+. | ||||||
4. Maven is already part of Eclipse, but more integration to Maven commands is needed.
Expand the Collaboration category. | ||||||
5. Maven SCM connector. |
Compile a release in Eclipse
Step | Example | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Import the cTAKES projects using Maven. | |||||||||||||
2. For SCM URL use "svn" in the drop-down and this in the text field
Click Finish.
| |||||||||||||
3. Download cTAKES 3.0 Dictionaries and models.
| Windows: Code Block | | |||||||||||
|
Code Block | ||
---|---|---|
| ||
cd /tmp && wget -O ctakes-resources-3.0.1.zip http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1/resources.zip && sudo unzip ctakes-resources-3.0.1.zip |
4. Copy (or move) the resources to cTAKES_HOME.
With Eclipse, cTAKES_HOME will be your workspace location followed by the project name "ctakes". Copy the contents of the temporary resources directory (and all sub-directories) to <cTAKES_HOME>/ctakes-dictionary-lookup/resources.
Info |
---|
There may be conflicts while taking this action. Overwrite the cTAKES_HOME files with those in the resources download. |
Windows:
Code Block | ||
---|---|---|
| ||
xcopy /s C:\temp\ctakes-resources-3.0.1\resources C:\Users\<userID>\workspace\ctakes\ctakes-dictionary-lookup\resources |
Linux:
Code Block | ||
---|---|---|
| ||
cp -R /tmp/ctakes-resources-3.0.1/resources/* <LINUX_ECLIPSE_WORKSPACE_HOME>/ctakes/ctakes-dictionary-lookup/resources/ |
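The copy in step 4 can also be wrapped in a small function that works for any source and destination. This is a minimal sketch for a POSIX shell; `merge_resources` is a hypothetical helper name, not part of cTAKES, and the paths in the usage comment are the ones assumed in the Linux example above.

```shell
#!/bin/sh
# Hypothetical helper: copy everything under SRC (including sub-directories)
# into DEST, overwriting files that already exist there.
merge_resources() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "src/." copies the directory contents, hidden files included.
    cp -R "$src"/. "$dest"/
}

# Example (paths are assumptions taken from the step above):
# merge_resources /tmp/ctakes-resources-3.0.1/resources \
#     "<LINUX_ECLIPSE_WORKSPACE_HOME>/ctakes/ctakes-dictionary-lookup/resources"
```

Because `cp -R` overwrites existing files without asking, this matches the advice to let the downloaded resources win any conflicts.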
5. Refresh Eclipse.
You must refresh your Eclipse projects to make sure that Eclipse knows about the new directories and data.
No example
6. Add the ctakes-dictionary-lookup/resources as a folder to the classpath.
You will need to repeat this step for EACH project that needs access to those resources (nearly all of them do, except for ctakes-relation-extractor). For example:
Open the properties on the top-level project ctakes-clinical-pipeline. For Eclipse, do not select the one under the ctakes project but the sibling to that at the highest level. Select Java Build Path -> Libraries tab -> Add Class Folder ... button.
Select the resources directory under the ctakes-dictionary-lookup.
Click OK on all dialogs until you are out of the sequence.
7. UMLS user ID and password.
Usually the dictionaries are required to process data. If you plan to utilize the UMLS dictionaries you must pass your UMLS user ID and password to the pipeline. There are several ways to do this - select one.
Note |
---|
If you do not have a UMLS username and password, you may request one at UMLS Terminology Services |
- Environment variable - Set or export environment variables.
export ctakes.umlsuser=<username>, ctakes.umlspw=<password>
- Add the system properties to the Java arguments for a run configuration (shown in the next cell). Navigate to ctakes-clinical-pipeline -> resources -> launch -> UIMA_<CVD | CPE>GUI--clinical_documents pipeline.launch. Right-click on the launch file and select Run-As -> Run Configurations... In the Arguments tab, enter these parameters in the VM arguments box. Click Apply.
-Dctakes.umlsuser=<username> -Dctakes.umlspw=<password>
- Change the UMLSUser and UMLSPW <nameValuePair> strings in these descriptor files with your UMLS username and password.
Dictionary Lookup: <cTAKES_HOME>/desc/ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
(Optional) Drug NER: <cTAKES_HOME>/desc/ctakes-drug-ner/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
The following shows where in the files you would make the changes. (Do not change the <configurationParameters> by the same name.)
    <nameValuePair>
      <name>ctakes.umlsuser</name>
      <value>
        <string>YOUR_UMLS_USERNAME_HERE</string>
      </value>
    </nameValuePair>
    <nameValuePair>
      <name>ctakes.umlspw</name>
      <value>
        <string>YOUR_UMLS_PASSWORD_HERE</string>
      </value>
    </nameValuePair>
- Now include the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within your aggregate Analysis Engine or switch to the ones provided by cTAKES. cTAKES has provided duplicates of shipped Analysis Engine descriptors, put UMLS in the name, and placed DictionaryLookupAnnotatorUMLS.xml within them for these components:
- Dictionary Lookup
- Clinical Documents pipeline
- Drug NER
- Side Effect
- So you simply need to switch to using those descriptors. For example, if you were using AggregateCdaProcessor.xml in the Clinical Documents pipeline you would switch to using AggregateCdaUMLSProcessor.xml instead and you will now hook into the complete dictionaries.
You can, of course, modify your own aggregate Analysis Engine files and place the DictionaryLookupAnnotatorUMLS.xml Analysis Engine within them.
Since this is an in-memory database implementation, please be patient during the initial load as it could take approximately 20-30 seconds for the database to initialize.
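For the descriptor-file option in step 7, the two placeholder strings can be filled in from the command line instead of by hand. The following is only a sketch: `set_umls_credentials` is a hypothetical helper name, and it assumes that the descriptor still contains the literal placeholders YOUR_UMLS_USERNAME_HERE and YOUR_UMLS_PASSWORD_HERE and that the values contain no `|`, `&`, or `\` characters.

```shell
#!/bin/sh
# Hypothetical helper: fill in the two placeholder strings in a descriptor file.
# Assumes the user name and password contain no '|', '&', or '\' characters,
# because those are special in the sed expressions below.
set_umls_credentials() {
    descriptor=$1
    user=$2
    pw=$3
    tmp="$descriptor.tmp"
    # Write to a temporary file first, then replace the original.
    sed -e "s|YOUR_UMLS_USERNAME_HERE|$user|" \
        -e "s|YOUR_UMLS_PASSWORD_HERE|$pw|" \
        "$descriptor" > "$tmp" && mv "$tmp" "$descriptor"
}

# Example (path taken from the list of descriptor files in step 7):
# set_umls_credentials \
#     "<cTAKES_HOME>/desc/ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml" \
#     '<username>' '<password>'
```

Only the placeholder text is replaced, so the `<configurationParameters>` entries of the same name are left untouched.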
Process documents using cTAKES
Step | Example | |||||||
---|---|---|---|---|---|---|---|---|
1. Launching the UIMA CAS Visual Debugger (CVD) or the Collection Processing Engine (CPE) from Eclipse can now be accomplished in the ctakes-clinical-pipeline project. Navigate to ctakes-clinical-pipeline -> resources -> launch -> UIMA_<CVD | CPE>GUI-clinical_documents pipeline.launch, choosing either CVD or CPE. Right-click on the launch file and select Run-As -> UIMA_<CVD | CPE>GUI-clinical_documents.
| ||||||||
2. (Optional) Process data. | No example |
(Optional) UIMA tools plug-in
Developers may be interested in the Eclipse plug-ins provided by the UIMA community. They include, for example, a UIMA component descriptor editor.
Step | Example | |||||
---|---|---|---|---|---|---|
1. Find UIMA Eclipse plug-ins.
| ||||||
2. Install UIMA Eclipse plug-ins. | ||||||
3. (Optional) Verify the installation of the UIMA plug-ins. Go to Help -> About Eclipse -> Installation Details -> Plug-ins. You will see a dialog such as that in the next cell, with plug-in names starting with "UIMA Eclipse:". |
Command line minimal install instructions
Prerequisites: Java JDK 1.6+, SVN, Maven 3.0+
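Before checking out the code, it can help to confirm that the prerequisite tools are on the PATH. The following is a minimal sketch for a POSIX shell; the helper names `require_cmd` and `check_prerequisites` are hypothetical and not part of cTAKES.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check each prerequisite tool and fail if any of them is missing.
check_prerequisites() {
    rc=0
    for tool in java svn mvn; do
        require_cmd "$tool" || rc=1
    done
    return "$rc"
}
```

If `check_prerequisites` succeeds, running `java -version`, `svn --version`, and `mvn -version` shows the installed versions so they can be compared with the minimums listed above.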
Info |
---|
The following location is the main trunk of cTAKES. See how cTAKES treats the trunk, branches, and tags in the developer FAQs. |
- svn co https://svn.apache.org/repos/asf/incubator/ctakes/trunk ctakes-3.0
- mvn clean compile package
- Running the mvn package command will generate a binary distribution in /ctakes-distribution/target/ctakes-<release>-bin.tar.gz/zip
- Merge the version-matching resources ZIP file from http://sourceforge.net/projects/ctakesresources/files/ into your ctakes-dictionary-lookup project.
- (Optional) If you would like to launch the UIMA CVD or CPE GUI, set MAVEN_OPTS="-Xmx2g -Xms1g" and run mvn -PrunCVD compile
For further information see the Apache Source Code Repository page.
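Once the package step in the list above has finished, the binary distribution can be located with a short helper. This is a sketch only: `list_distributions` is a hypothetical name, and the checkout directory is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: list the binary distribution archives produced by
# "mvn clean compile package" under the given checkout directory.
list_distributions() {
    target="$1/ctakes-distribution/target"
    for archive in "$target"/*-bin.tar.gz "$target"/*-bin.zip; do
        # Unmatched globs stay literal, so keep only paths that exist.
        [ -f "$archive" ] && echo "$archive"
    done
    return 0
}

# Example: list_distributions ctakes-3.0
```

The function prints nothing when no archive is present, which usually means the package phase did not run or did not complete.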
Command line step by step install instructions
Preparing Java
Include Page |
---|
...
|
...
|
Preparing command line tools
Step | Example | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Install an SVN client.
| Windows:
| ||||||||||||||
2. Install a Maven 3.0+ client. On Windows, unzip the file to the root drive. On Linux, unzip the file to /usr/local/apache-maven-3.0.4, which will be your MAVEN_HOME.
| Windows: | ||||||||||||||
3. Set the Maven environment variable values -
| Windows:
|
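On Linux, the Maven environment variables from step 3 can be set along the following lines. This is a sketch that assumes Maven was unzipped to /usr/local/apache-maven-3.0.4 as in step 2, and that MAVEN_HOME and PATH are the variables to set; which variables your installation needs is an assumption to verify.

```shell
# Assumes Maven was unzipped to this location (see step 2).
MAVEN_HOME=/usr/local/apache-maven-3.0.4
export MAVEN_HOME

# Put the Maven launcher on the PATH so that "mvn" resolves.
PATH="$MAVEN_HOME/bin:$PATH"
export PATH
```

Adding these lines to a shell start-up file such as ~/.profile makes them permanent; `mvn -version` then confirms that the launcher is found.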
Compile a release from command line
Step | Example | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Checkout the cTAKES project.
The parameter on the end will be created as a new directory in your current location.
We will refer to the directory you specify at the end of the checkout command as <cTAKES_HOME>. | Windows:
Linux:
| |||||||||||||||||||
2. Download cTAKES 3.0 Dictionaries and models.
| Windows: Code Block | | ||||||||||||||||||
|
Code Block | ||
---|---|---|
| ||
cd /tmp && wget -O ctakes-resources-3.0.1.zip http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1/resources.zip && sudo unzip ctakes-resources-3.0.1.zip |
3. Copy (or move) the resources to cTAKES_HOME.
With Eclipse, cTAKES_HOME will be your workspace location followed by the project name "ctakes". Copy the contents of the temporary resources directory (and all sub-directories) to <cTAKES_HOME>/ctakes-dictionary-lookup/resources.
Info |
---|
There may be conflicts while taking this action. Overwrite the cTAKES_HOME files with those in the resources download. |
Windows:
Code Block | ||
---|---|---|
| ||
xcopy /s C:\temp\ctakes-resources-3.0.1\resources C:\cTAKES-3.0\ctakes-dictionary-lookup\resources |
Linux:
Code Block | ||
---|---|---|
| ||
sudo cp -R /tmp/ctakes-resources-3.0.1/resources/* /cTAKES-3.0/ctakes-dictionary-lookup/resources/ |
4. Compile the complete set.
Make sure you are in the proper directory.
Windows/Linux:
Code Block | ||
---|---|---|
| ||
cd cTAKES-3.0 && mvn clean compile |
Note |
---|
For Linux, make sure you are using the user that has access to the files in your cTAKES directory. |
Info |
---|
Instead of "compile" you can use the maven target called "package" to compile and build all the cTAKES deliverables. Package is convenient in situations like running cTAKES outside of maven with custom processes/scripts because it will bundle up all of the 3rd party and transient dependencies. |
Windows/Linux:
Code Block | ||
---|---|---|
| ||
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache cTAKES ..................................... SUCCESS [59.140s]
[INFO] Apache cTAKES common type system .................. SUCCESS [41.856s]
[INFO] Apache cTAKES utils ............................... SUCCESS [6.255s]
[INFO] Apache cTAKES core ................................ SUCCESS [17.940s]
[INFO] Apache cTAKES part-of-speech tagger ............... SUCCESS [5.148s]
[INFO] Apache cTAKES chunker ............................. SUCCESS [3.027s]
[INFO] Apache cTAKES document preprocessor ............... SUCCESS [4.118s]
[INFO] Apache cTAKES dictionary lookup ................... SUCCESS [1:14.740s]
[INFO] Apache cTAKES context dependent tokenizer ......... SUCCESS [5.975s]
[INFO] Apache cTAKES LVG lexical tools ................... SUCCESS [7.831s]
[INFO] Apache cTAKES named entity contexts ............... SUCCESS [4.743s]
[INFO] Apache cTAKES Constituency Parser ................. SUCCESS [9.516s]
[INFO] Apache cTAKES Dependency Parser ................... SUCCESS [32.386s]
[INFO] Apache cTAKES Assertion's zoner ................... SUCCESS [2.152s]
[INFO] Apache cTAKES Assertion ........................... SUCCESS [12.200s]
[INFO] Apache cTAKES ctakes-clinical-pipeline ............ SUCCESS [4.446s]
[INFO] Apache cTAKES Relation Extractor .................. SUCCESS [13.634s]
[INFO] Apache cTAKES CoReference Resolver ................ SUCCESS [8.923s]
[INFO] Apache cTAKES Drug NER ............................ SUCCESS [6.958s]
[INFO] Apache cTAKES Side Effects ........................ SUCCESS [7.566s]
[INFO] Apache cTAKES Smoking Status ...................... SUCCESS [8.377s]
[INFO] Apache cTAKES Pad Term Spotter .................... SUCCESS [9.048s]
[INFO] Apache cTAKES Temporal Information Extraction ..... SUCCESS [33.993s]
[INFO] Apache cTAKES Distribution ........................ SUCCESS [17:59.809s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24:22.120s
[INFO] Finished at: Wed Jan 16 17:44:35 CST 2013
[INFO] Final Memory: 41M/181M
[INFO] ------------------------------------------------------------------------
... |
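When the build output is saved to a file, the result can be checked without scrolling through the log. The helpers below are a sketch with hypothetical names; they assume a log in the format shown above, for example one captured with `mvn clean compile 2>&1 | tee build.log`.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a saved Maven log.

# Succeed only if the log contains Maven's "BUILD SUCCESS" line.
build_succeeded() {
    grep -q 'BUILD SUCCESS' "$1"
}

# Print Reactor Summary lines for modules that failed or were skipped.
unsuccessful_modules() {
    grep -E '^\[INFO\] .+ \.+ (FAILURE|SKIPPED)' "$1" || true
}

# Example:
# build_succeeded build.log || unsuccessful_modules build.log
```

Maven's own exit status remains the primary signal; the log check is mainly useful for builds that ran unattended.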
5. Add the resources as a folder to the classpath.
Make sure the current directory, dot (.), is in the CLASSPATH environment variable of the process that Maven runs in.
No example
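On Linux, the check described in step 5 could look like the sketch below. It assumes the `:` separator used for CLASSPATH on Unix-like systems (Windows uses `;`), and `classpath_has_dot` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "." is one of the ':'-separated entries.
classpath_has_dot() {
    case ":$1:" in
        *:.:*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Add "." to CLASSPATH only if it is not already there.
if ! classpath_has_dot "${CLASSPATH:-}"; then
    CLASSPATH=".${CLASSPATH:+:$CLASSPATH}"
    export CLASSPATH
fi
```

Run this in the same shell session that later starts Maven, so that the Maven process inherits the updated variable.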
6. UMLS user ID and password.
Usually the dictionaries are required to process data. If you plan to utilize the UMLS dictionaries you must pass your UMLS user ID and password to the pipeline. There are several ways to do this - select one.
Note |
---|
If you do not have a UMLS username and password, you may request one at UMLS Terminology Services |
- Environment variable - Set or export environment variables.
Refer to the Eclipse instructions above for more information.
- Add the system properties to the Java arguments for the maven environment.
Add these parameters to the MAVEN_OPTS environment variable in the next section as you run the commands to process documents.
-Dctakes.umlsuser=<username> -Dctakes.umlspw=<password>
Make the ID and password specific to you.
- Change the UMLSUser and UMLSPW <nameValuePair> strings in these descriptor files with your UMLS username and password.
Refer to the Eclipse documentation above for more information.
No example
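The MAVEN_OPTS option in step 6 can be sketched as follows. The helper name `umls_java_opts` is hypothetical, the memory settings are the ones used in the minimal instructions, and the sketch assumes that neither value contains spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the two -D system properties for the given
# UMLS user name and password.
umls_java_opts() {
    printf '%s %s\n' "-Dctakes.umlsuser=$1" "-Dctakes.umlspw=$2"
}

# Example: combine with the memory settings used elsewhere on this page.
# MAVEN_OPTS="-Xmx2g -Xms1g $(umls_java_opts '<username>' '<password>')"
# export MAVEN_OPTS
```

Setting MAVEN_OPTS only in the current shell session, rather than in a start-up file, keeps the password out of files on disk.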
Process documents using cTAKES
Step | Example | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1. Launching the UIMA CAS Visual Debugger (CVD) or the Collection Processing Engine (CPE) from Eclipse can now be accomplished in the ctakes-clinical-pipeline project:
Linux:
where you must select between CVD and CPE in the command. Other Run Configurations are also available in the Eclipse Run menu. | |||||||||||
2. (Optional) Process data.
| No example |
Next Steps
The cTAKES 3.0 Component Use Guide will help you to understand, in great detail, each of the cTAKES components that have been installed. In some cases you can learn how to improve the components.
Also, before you go on to process text in production you will need to consider dictionaries and models. cTAKES does not distribute from Apache a complete dictionary capable of annotating production data. The models provided have been trained on data that may not match your data well enough to be effective. In most cases, you will need to modify the dictionaries and train models on your own data to be effective.