Introduction

Terms in a natural language may be ambiguous, i.e. they can map to multiple distinct concepts. For example, the word ‘cold’ can refer to the viral infection ‘common cold’ or to the ‘sensation of cold’. For word sense disambiguation (WSD), YTEX implements the ‘adapted Lesk’ method, which uses semantic similarity measures to quantify how well a concept ‘fits’ in a given context. This page describes the WSD algorithm and the configuration of the SenseDisambiguatorAnnotator, and explains how to reproduce the results of our evaluation on the NLM WSD and MSH WSD data sets.

...

For a high-level overview of the WSD method we've implemented, refer to our paper: Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification.

Note that you must perform the additional YTEX installation tasks to use this component.

SenseDisambiguatorAnnotator

The SenseDisambiguatorAnnotator is a UIMA annotator integrated with cTAKES. cTAKES identifies named entities (EntityMention Annotations), each of which can contain multiple concepts (OntologyConcept Feature Structures). The SenseDisambiguatorAnnotator disambiguates each ambiguous term (i.e. each EntityMention with multiple OntologyConcepts) in a document as follows:

  • Takes all EntityMentions within a window around the ambiguous term
  • Scores candidate concepts using the semantic similarity with context concepts; the score is stored in the score attribute of the OntologyConcept.
  • Picks the candidate concept with the highest score: sets the OntologyConcept.disambiguated attribute to true for the best concept, and false for others.
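The three steps above can be sketched in Python as follows. This is only an illustration, not the annotator's actual Java code: the EntityMention and OntologyConcept classes here are simplified stand-ins, the concept identifiers are placeholders, and toy_similarity is a hypothetical lookup table standing in for a real measure such as INTRINSIC_PATH. It also assumes that the window is counted in entity mentions and that a candidate's score is the sum, over context mentions, of its best similarity to any concept of that mention.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OntologyConcept:
    cui: str                     # concept identifier (placeholder values below)
    score: float = 0.0           # filled in by disambiguate()
    disambiguated: bool = False  # True only for the winning candidate


@dataclass
class EntityMention:
    text: str
    concepts: List[OntologyConcept] = field(default_factory=list)


def disambiguate(mentions: List[EntityMention],
                 similarity: Callable[[str, str], float],
                 window_size: int = 50) -> None:
    """Score and flag the concepts of every ambiguous mention in place."""
    for i, target in enumerate(mentions):
        if len(target.concepts) < 2:
            continue  # only mentions with multiple concepts are ambiguous
        # Step 1: collect the mentions within +/- window_size of the target.
        start = max(0, i - window_size)
        context = mentions[start:i] + mentions[i + 1:i + 1 + window_size]
        # Step 2: score each candidate against the context concepts.
        for cand in target.concepts:
            cand.score = sum(
                max((similarity(cand.cui, c.cui) for c in m.concepts),
                    default=0.0)
                for m in context
            )
        # Step 3: flag the highest-scoring candidate, clear the others.
        best = max(target.concepts, key=lambda c: c.score)
        for cand in target.concepts:
            cand.disambiguated = cand is best


# Hypothetical similarity values, for illustration only.
TOY_SIM = {
    frozenset(("COMMON_COLD", "COUGH")): 0.8,
    frozenset(("COLD_SENSATION", "COUGH")): 0.1,
}


def toy_similarity(a: str, b: str) -> float:
    return TOY_SIM.get(frozenset((a, b)), 0.0)


mentions = [
    EntityMention("cough", [OntologyConcept("COUGH")]),
    EntityMention("cold", [OntologyConcept("COMMON_COLD"),
                           OntologyConcept("COLD_SENSATION")]),
]
disambiguate(mentions, toy_similarity)
best = [c.cui for c in mentions[1].concepts if c.disambiguated]
```

With these toy values, the ‘common cold’ candidate wins for the mention ‘cold’ because it is more similar to the neighbouring ‘cough’ concept; the unambiguous mention is left untouched.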

The SenseDisambiguatorAnnotator is configured via CTAKES_HOME/config/desc/resources/org/apache/ctakes/ytex/ytex.properties:

  • ytex.sense.windowSize - context window size. Concepts from named entities within ±windowSize of the target named entity are used for disambiguation. Defaults to 50.
  • ytex.sense.metric - measure to use. Defaults to INTRINSIC_PATH. See Semantic Similarity (SemanticSim_V06) for valid values.
  • ytex.conceptGraph - concept graph to use. Defaults to sct-msh-csp-aod rxnorm (SNOMED-CT, MeSH, CRISP thesaurus, Alcohol & Other Drug Thesaurus, RXNORM).

The optimal measure and concept graph depend on the application. These defaults achieved the best score on the MSH WSD data set. You might also want to experiment with the LCH measure and the umls concept graph, a configuration that achieved the best performance on the NLM WSD data set.
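As an illustration, the alternative configuration mentioned above might look like this in ytex.properties. The property names are the ones listed on this page; the exact spelling of the values (LCH, umls) is assumed to match how the text names them, so check it against your installation.

```properties
# Context window around the target named entity (the documented default)
ytex.sense.windowSize=50
# Measure and concept graph reported as best on the NLM WSD data set
ytex.sense.metric=LCH
ytex.conceptGraph=umls
```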