The cTAKES 3.0 Component Use Guide will help you understand in detail each of the cTAKES components that have been installed and, in some cases, how to improve them. However, before you go on to process text in production you will need to consider dictionaries and models: those used during annotation are the cornerstone of quality for your results. Use the instructions below to change the default dictionaries and models. The UMLS dictionaries are unlikely to match your underlying data completely, and it is even less likely that the text models (trained on specific data) will be appropriate for your data.

Dictionaries

UMLS Dictionaries

cTAKES includes a simple, very limited dictionary to make functions work since annotation is dependent upon having a dictionary. cTAKES DOES NOT include UMLS (SNOMED-CT and RxNorm) dictionaries.

...

If you would like to go back to using the small sample dictionaries that do not require a UMLS username, use the DictionaryLookupAnnotator.xml Analysis Engine descriptor (the one without UMLS in the file name) in your aggregate. Just removing your password from the DictionaryLookupAnnotatorUMLS.xml files will not switch you back to the small sample dictionaries.
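As a rough illustration of switching descriptors, the Python sketch below rewrites the text of an aggregate descriptor so that any import of DictionaryLookupAnnotatorUMLS.xml points at DictionaryLookupAnnotator.xml instead. It assumes your aggregate references the lookup annotator through an import element with a location attribute, which is common in UIMA aggregates but may not match yours. The function name use_sample_dictionary is hypothetical and not part of cTAKES.

```python
import re

# Descriptor file names taken from the text above.
UMLS_DESCRIPTOR = "DictionaryLookupAnnotatorUMLS.xml"
SAMPLE_DESCRIPTOR = "DictionaryLookupAnnotator.xml"

# Assumption: the aggregate refers to the lookup annotator with an
# <import location=".../DictionaryLookupAnnotatorUMLS.xml"/> element.
_UMLS_IMPORT = re.compile(r'(location="[^"]*)' + re.escape(UMLS_DESCRIPTOR) + '"')


def use_sample_dictionary(aggregate_xml: str) -> str:
    """Return the aggregate descriptor text with UMLS lookup imports
    redirected to the small sample dictionary descriptor."""
    return _UMLS_IMPORT.sub(r'\g<1>' + SAMPLE_DESCRIPTOR + '"', aggregate_xml)
```

To apply it, read your aggregate descriptor into a string, pass it through use_sample_dictionary, and write the result back, keeping a backup of the original file. Descriptors that already use the sample dictionary are returned unchanged.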

LVG

We have successfully tested the 2008 release of the full LVG data. In order to use this release of the full LVG data you should:

  1. Download either the full version or the lite version from NIH Lexical Tools
  2. Extract the TGZ file that you downloaded to a temporary directory with a tool like 7-zip (available online). On some operating systems, like Windows, this may need to be done in two steps: 1) decompress the gzip layer and 2) extract the resulting tar archive.
  3. Replace the directory <cTAKES_HOME>/resources/org/apache/ctakes/lvg/data/HSqlDb with data/HSqlDb from your extracted download. Replacing the entire directory is appropriate.
  4. In the future, you can upgrade to later versions of LVG by editing the <cTAKES_HOME>/resources/org/apache/ctakes/lvg/data/config/lvg.properties file, replacing "lvg2008" with the name of the new release.
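The steps above can be sketched in Python as follows. This is a minimal sketch, not an official installer: the function names are hypothetical, and it assumes the extracted download contains a data/HSqlDb directory somewhere beneath its top-level folder. The paths under <cTAKES_HOME> are the ones given in the steps.

```python
import shutil
import tarfile
from pathlib import Path

# Paths relative to <cTAKES_HOME>, as given in the steps above.
HSQLDB_REL = "resources/org/apache/ctakes/lvg/data/HSqlDb"
PROPS_REL = "resources/org/apache/ctakes/lvg/data/config/lvg.properties"


def find_hsqldb_dir(extracted_root: Path) -> Path:
    """Locate the data/HSqlDb directory inside the extracted download."""
    for candidate in sorted(extracted_root.rglob("HSqlDb")):
        if candidate.is_dir() and candidate.parent.name == "data":
            return candidate
    raise FileNotFoundError(f"no data/HSqlDb directory under {extracted_root}")


def install_lvg_data(archive: Path, ctakes_home: Path, work_dir: Path) -> Path:
    """Extract the TGZ download and replace the cTAKES HSqlDb directory."""
    work_dir.mkdir(parents=True, exist_ok=True)
    # "r:gz" decompresses and untars in one pass (step 2).
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(work_dir)
    source = find_hsqldb_dir(work_dir)
    target = ctakes_home / HSQLDB_REL
    if target.exists():
        shutil.rmtree(target)  # step 3: replace the entire directory
    shutil.copytree(source, target)
    return target


def set_lvg_release(ctakes_home: Path, old: str, new: str) -> None:
    """Step 4: replace the old release name in lvg.properties with the new one."""
    props = ctakes_home / PROPS_REL
    text = props.read_text(encoding="utf-8")
    props.write_text(text.replace(old, new), encoding="utf-8")
```

For example, install_lvg_data(Path("lvg.tgz"), Path("/opt/ctakes"), Path("/tmp/lvg")) followed by set_lvg_release(Path("/opt/ctakes"), "lvg2008", new_name) would cover steps 2 through 4. Both paths and new_name are placeholders for your own archive, install location, and release name. Back up the existing HSqlDb directory first, since the sketch deletes it.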

Building Your Own Dictionaries

To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:

Models

Some models included in cTAKES may not represent your data distribution well. If you want to build or train your own models, please read the cTAKES 3.0 Component Use Guide, particularly:

...