
Overview of Dictionary Lookup

The dictionary lookup annotator finds the entries from one or more dictionaries that match the document text in some way. Within this annotator, these matches are called lookup hits.

...

Searches for lookup hits are confined to windows; the window type is defined in the LookupDescriptorFile. A window can be the words that fall within the same Sentence, the same Chunk, the same LookupWindowAnnotation, or any other annotation. For an example of an analysis engine that creates LookupWindowAnnotations, see LookupWindowAnnotator.xml in the clinical documents pipeline project.
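To make the idea of window-limited lookup concrete, here is a minimal sketch in plain Java. It is not the cTAKES implementation: the class `WindowedLookupSketch`, the `Hit` type, and the representation of windows as character-offset pairs are all hypothetical, and matching is simplified to case-insensitive whole-word matching of each dictionary term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of window-limited dictionary lookup.
// This is NOT the cTAKES code; names and matching rules are assumptions.
class WindowedLookupSketch {

    /** A lookup hit: the dictionary term plus its [begin, end) offsets in the document. */
    static final class Hit {
        final String term;
        final int begin;
        final int end;

        Hit(String term, int begin, int end) {
            this.term = term;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Finds dictionary terms in the text, searching each window separately.
     * Each window is an int[]{begin, end} pair of character offsets, for
     * example the span of one sentence. A term that straddles two windows
     * is never reported, because each search is confined to one window.
     */
    static List<Hit> findHits(String text, List<int[]> windows, List<String> dictionary) {
        List<Hit> hits = new ArrayList<>();
        for (int[] window : windows) {
            for (String term : dictionary) {
                Pattern pattern = Pattern.compile(
                        "\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
                Matcher matcher = pattern.matcher(text);
                // Restrict the search to the current window only.
                matcher.region(window[0], window[1]);
                while (matcher.find()) {
                    hits.add(new Hit(term, matcher.start(), matcher.end()));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Patient has knee pain. Aspirin given.";
        // Two sentence-sized windows, given as character offsets.
        List<int[]> sentences = List.of(new int[] {0, 22}, new int[] {23, 37});
        for (Hit hit : findHits(text, sentences, List.of("knee pain", "aspirin"))) {
            System.out.println(hit.term + " [" + hit.begin + ", " + hit.end + ")");
        }
    }
}
```

If the windows were instead drawn so that "knee" and "pain" fell into different windows, the term "knee pain" would produce no hit, which is the practical effect of choosing a narrower window type.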

Implementation of Dictionary Lookup

Starting with version 1.3, cTAKES includes UMLS (SNOMED CT and RxNorm) dictionaries. To use those dictionaries, you need a UMLS username and password, and an internet connection so that your UMLS username and password can be verified. If you do not have a UMLS username, or are not interested in those dictionaries, you can build your own or use the small sample dictionaries (see below).

...

Tip

To better understand the dictionary lookup annotator code, you could start by reading the Javadoc API for the classes DictionaryLookupAnnotator.java and FirstTokenPermutationImpl.java.

DictionaryLookupAnnotatorUMLS.xml

This uses the bundled UMLS (SNOMED CT and RxNorm) dictionaries. Before using this analysis engine descriptor, update the UMLSUser and UMLSPW parameters within this descriptor with your UMLS username and password. You will need to have an active connection to the internet so your UMLS username and password can be verified.

DictionaryLookupAnnotator.xml

This uses the small sample dictionaries. This annotator can be run out-of-the-box without modifying any parameters, but annotates a very limited set of terms such as carcinoma, aspirin, knee, and pain.

DictionaryLookupAnnotatorCSV.xml

This is an example of how to use a dictionary contained in a delimited file rather than a database or a Lucene index. This is only recommended for small dictionaries.
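As a rough illustration of what a delimited-file dictionary involves, the sketch below loads such a file into an in-memory map. It is not the cTAKES loader. The class name `DelimitedDictionarySketch`, the two-column layout (code first, term second), the pipe delimiter, and the codes in the usage example are all assumptions made up for this example. Because a loader like this holds every entry in memory, it only suits small dictionaries.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of reading a delimited dictionary file into memory.
// The column layout (code, then term) is an assumption, not the cTAKES format.
class DelimitedDictionarySketch {

    /**
     * Reads lines of the form code<delimiter>term and returns a map from
     * the lower-cased term to its code. Blank lines and lines with fewer
     * than two fields are skipped.
     */
    static Map<String, String> load(Reader source, String delimiter) {
        Map<String, String> termToCode = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                // Quote the delimiter so characters such as '|' are taken literally.
                String[] fields = line.split(Pattern.quote(delimiter), -1);
                if (fields.length < 2) {
                    continue;
                }
                termToCode.put(fields[1].trim().toLowerCase(), fields[0].trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return termToCode;
    }

    public static void main(String[] args) {
        // Made-up codes, used only to show the expected file shape.
        String sample = "C001|Aspirin\nC002|Knee pain\n";
        Map<String, String> dictionary = load(new StringReader(sample), "|");
        System.out.println(dictionary.get("knee pain"));
    }
}
```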

DictionaryLookupAnnotatorDB.xml

This is a skeleton of how you could use a dictionary contained in a database that can be accessed via a JDBC driver instead of using a Lucene index or flat file.
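For orientation, a database-backed lookup over JDBC might look roughly like the following. This is a sketch under stated assumptions, not the code behind the descriptor: the class `JdbcDictionarySketch`, both method names, and the table and column names passed in are hypothetical, and a real deployment would need a JDBC driver and a `Connection` for the database in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a JDBC-backed dictionary lookup.
// Table and column names are supplied by the caller and are assumptions.
class JdbcDictionarySketch {

    /**
     * Builds a parameterized query that selects the code column for rows
     * whose term column equals the bound parameter. The identifiers are
     * concatenated as-is, so they must come from trusted configuration.
     */
    static String buildLookupSql(String table, String termColumn, String codeColumn) {
        return "SELECT " + codeColumn + " FROM " + table + " WHERE " + termColumn + " = ?";
    }

    /** Runs the query for one term and returns every matching code. */
    static List<String> lookupCodes(Connection connection, String sql, String term)
            throws SQLException {
        List<String> codes = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            // Bind the term as a parameter rather than concatenating it into the SQL.
            statement.setString(1, term);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    codes.add(rows.getString(1));
                }
            }
        }
        return codes;
    }
}
```

Unlike a delimited file loaded into memory, this approach queries the database once per candidate term, so the dictionary size is bounded by the database rather than by the heap.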

Sample dictionaries

This project includes two sample dictionaries that are used by default:

...

Tip

To view the contents of a Lucene index, you could use a tool such as Luke.

Creating your own dictionaries

To create a dictionary yourself, you could download a copy of the UMLS Metathesaurus and build upon the program mentioned above to create a Lucene index of the desired vocabulary.

...