The dictionaries and models used during annotation are the cornerstone of result quality. Use the instructions below to change the default dictionaries and models. The UMLS dictionaries are unlikely to match your underlying data completely, and the text models, which were trained on specific data, are even less likely to be appropriate for your data.

Dictionaries

UMLS Dictionaries

Why? cTAKES includes a simple, very limited dictionary so that its functions work, since annotation depends on having at least one dictionary. cTAKES DOES NOT include the UMLS dictionaries (like SNOMED-CT and RxNorm). The models made available by cTAKES have been trained on a specific set of text data (a corpus), which may not match your text well.

To make common dictionaries easy to obtain, cTAKES maintains a SourceForge project where you can download a file containing the following dictionaries:

  • The OrangeBook
  • An rxnorm_index database (a Lucene index) containing drug names from RxNorm
  • A UMLS database (using two hsqldb tables) containing anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT (umls_ms_2011ab)

...

Note

If you do not have a UMLS username and password, you may request one at UMLS Terminology Services.

Warning

How does the user incorporate the ZIP file into what they already have, whether they are a developer (Eclipse or command line) or an end user?

To use the UMLS dictionaries shipped with cTAKES, you will need to do two things:
(1) Replace the UMLSUser and UMLSPW <nameValuePair> strings in these descriptor files with your UMLS username and password.

...
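
Step (1) can also be scripted rather than done by hand. The following is a minimal sketch only, assuming the descriptor follows the usual UIMA layout in which each <nameValuePair> holds a <name> and a <value><string> element under the http://uima.apache.org/resourceSpecifier namespace. The function name set_umls_credentials, the SAMPLE fragment, and the placeholder values are hypothetical and are not taken from the cTAKES distribution; check them against your own descriptor files before relying on this.

```python
import xml.etree.ElementTree as ET

# Namespace commonly declared on UIMA descriptor root elements.
# Assumption: your descriptor uses it too; check the xmlns attribute.
UIMA_NS = "http://uima.apache.org/resourceSpecifier"


def set_umls_credentials(descriptor_xml: str, username: str, password: str) -> str:
    """Return the descriptor XML with the UMLSUser and UMLSPW strings replaced."""
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", UIMA_NS)
    root = ET.fromstring(descriptor_xml)
    new_values = {"UMLSUser": username, "UMLSPW": password}
    ns = {"u": UIMA_NS}
    # Visit every nameValuePair, wherever it is nested in the descriptor.
    for pair in root.iter(f"{{{UIMA_NS}}}nameValuePair"):
        name = pair.find("u:name", ns)
        value = pair.find("u:value/u:string", ns)
        if name is None or value is None:
            continue
        key = (name.text or "").strip()
        if key in new_values:
            value.text = new_values[key]
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified descriptor fragment for illustration only.
SAMPLE = """<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <configurationParameterSettings>
    <nameValuePair>
      <name>UMLSUser</name>
      <value><string>PLACEHOLDER_USER</string></value>
    </nameValuePair>
    <nameValuePair>
      <name>UMLSPW</name>
      <value><string>PLACEHOLDER_PW</string></value>
    </nameValuePair>
  </configurationParameterSettings>
</taeDescription>"""

if __name__ == "__main__":
    print(set_umls_credentials(SAMPLE, "my_umls_user", "my_umls_password"))
```

Because the script rewrites the whole file, XML comments and the original formatting may not be preserved. Keep a backup of each descriptor before overwriting it.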

Building Your Own Dictionaries

The UMLS dictionaries are unlikely to match your underlying data completely; other local terms, for example, may be required. To install customized dictionaries for RxNorm, SNOMED-CT, or other vocabularies that are available through the UMLS, see the following posts on the cTAKES forums:

Models

Some models included in cTAKES may not represent your data distribution well. If you want to build or train your own models, please read the cTAKES 3.0 Component Use Guide, particularly:

...