
Obtaining Prebuilt Dictionaries

 

The dictionaries and models used during annotation are the cornerstone of quality for your results. The install instructions show you how to get the separately downloadable ctakes-resources archive (which is not itself released by the Apache Software Foundation) that you need to run most of cTAKES. Those resources include:

  • An RxNorm_index database (a Lucene index): Contains drug names from RxNorm.
  • The Orange Book: If you are not using the Drug NER pipeline, the Orange Book is used to filter the RxNorm matches so that only terms found in both RxNorm and the Orange Book are annotated. If you use Drug NER, Orange Book filtering is bypassed.
  • A UMLS database (stored in two hsqldb tables): Contains terms for anatomical sites, procedures, signs/symptoms, and disorders/diseases from SNOMED-CT, NCI Thesaurus, MeSH, and ICD-9 (umls_ms_2011ab), all of which have been tokenized by cTAKES.
  • The full LVG: From the lexical tools provided by the NLM for word normalization. Used to match similar words, for example the plural and singular forms of a word.
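
The Orange Book filtering rule described above can be illustrated with a small sketch. This is not cTAKES code: the function name filter_drug_mentions, the use_drug_ner flag, the case-insensitive matching, and the placeholder term sets are all assumptions made for illustration. It only shows the logic that a term is kept when it appears in both RxNorm and the Orange Book, unless Drug NER is in use, in which case the Orange Book check is skipped.

```python
def filter_drug_mentions(candidates, rxnorm_terms, orange_book_terms, use_drug_ner=False):
    """Return the candidate terms that would be annotated as drugs.

    Hypothetical helper, not part of cTAKES. Matching is case-insensitive
    here purely as a simplifying assumption.
    """
    rxnorm = {t.lower() for t in rxnorm_terms}
    orange_book = {t.lower() for t in orange_book_terms}

    kept = []
    for term in candidates:
        key = term.lower()
        if key not in rxnorm:
            # Not a known RxNorm drug name, so never annotated.
            continue
        if use_drug_ner or key in orange_book:
            # Drug NER bypasses the Orange Book check; otherwise the
            # term must appear in both resources.
            kept.append(term)
    return kept


# Placeholder term sets, invented for this example only.
rxnorm_terms = {"drug_a", "drug_b", "drug_c"}
orange_book_terms = {"drug_a", "drug_b"}
mentions = ["Drug_A", "drug_c", "headache"]

print(filter_drug_mentions(mentions, rxnorm_terms, orange_book_terms))
# prints ['Drug_A']
print(filter_drug_mentions(mentions, rxnorm_terms, orange_book_terms, use_drug_ner=True))
# prints ['Drug_A', 'drug_c']
```

In the default case only the term present in both sets survives; with use_drug_ner=True every RxNorm match is kept.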

...