Introduction

YTEX provides a generalizable framework for computing path-finding, corpus information content based, and intrinsic information content based semantic similarity measures from any domain ontology. This page describes the usage and configuration of the YTEX Semantic Similarity Tools. For a high-level overview, refer to our paper: "Semantic similarity in the biomedical domain: an evaluation across knowledge sources".

Semantic similarity measures fall into two groups: path-finding measures, which are based purely on path distances, and information content (IC) based measures, which combine taxonomic relationships with the IC of concepts, a quantity derived from concept frequency. Both groups use a concept graph in which vertices represent concepts and edges represent taxonomic relationships. The similarity between two concepts is computed from the length of the path between the concepts and their nearest common 'parent'. Previous studies that used a large, annotated medical corpus to estimate concept frequencies showed that IC based measures of semantic similarity outperform path-finding measures. Unfortunately, large annotated corpora are not available for many applications. To overcome this limitation, methods that estimate IC from the structure of the concept graph (intrinsic IC) have been developed, and their accuracy has been shown to rival that of corpus-based measures.
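
As a rough illustration of the two families of measures, the sketch below computes a path measure and an intrinsic IC based Lin measure on a tiny, made-up taxonomy. The concept names, the helper functions, and the particular formulas (path similarity as 1 / number of nodes on the path; intrinsic IC following the leaves/subsumers formulation of Sanchez et al.; Lin as 2 * IC(lcs) / (IC(c1) + IC(c2))) are assumptions chosen for illustration. The exact definitions and scaling used by YTEX are given on the YTEX Semantic Similarity Measures page and may differ.

```python
import math

# Hypothetical toy taxonomy (not from YTEX): child -> list of parents.
PARENTS = {
    "root": [],
    "disease": ["root"],
    "heart_disease": ["disease"],
    "lung_disease": ["disease"],
    "myocarditis": ["heart_disease"],
    "pneumonia": ["lung_disease"],
}

# Invert the parent map to obtain child lists.
CHILDREN = {c: [] for c in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def ancestors(concept):
    """Map the concept and each of its ancestors to its shortest edge distance."""
    dist = {concept: 0}
    frontier = [concept]
    while frontier:  # breadth-first, one level at a time
        next_frontier = []
        for node in frontier:
            for parent in PARENTS[node]:
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def lcs_and_path_length(c1, c2):
    """Nearest common ancestor and the number of edges on the path through it."""
    a1, a2 = ancestors(c1), ancestors(c2)
    common = set(a1) & set(a2)
    lcs = min(common, key=lambda c: (a1[c] + a2[c], c))
    return lcs, a1[lcs] + a2[lcs]


def path_similarity(c1, c2):
    """Path measure: 1 / (number of nodes on the shortest path)."""
    _, edges = lcs_and_path_length(c1, c2)
    return 1.0 / (edges + 1)


def leaves_below(concept):
    """Leaf concepts strictly below the given concept."""
    found, seen, stack = set(), set(), list(CHILDREN[concept])
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if CHILDREN[node]:
            stack.extend(CHILDREN[node])
        else:
            found.add(node)
    return found


MAX_LEAVES = sum(1 for c in CHILDREN if not CHILDREN[c])


def intrinsic_ic(concept):
    """Intrinsic IC estimated from graph structure only (assumed formulation)."""
    subsumers = len(ancestors(concept))  # includes the concept itself
    ratio = len(leaves_below(concept)) / subsumers
    return -math.log((ratio + 1) / (MAX_LEAVES + 1))


def lin_similarity(c1, c2):
    """Lin measure using the most informative common ancestor."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    ic_lcs = max(intrinsic_ic(c) for c in common)
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    return 2 * ic_lcs / denom if denom > 0 else 1.0  # both concepts are the root
```

In this toy graph the root has an intrinsic IC of 0 and the leaves have the maximum IC, which matches the intuition that more specific concepts carry more information.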

Usage

YTEX provides a web application client, a web services interface, a RESTful interface, and a command-line interface to compute similarity measures. The demo similarity web app is available at http://informatics.med.yale.edu/ytex.web; if you plan to use this application extensively, please install YTEX locally. Please refer to Sanchez & Batet for an excellent overview of similarity measures in general, and intrinsic information content (IC) based measures in particular. We scale all measures to the unit interval; see YTEX Semantic Similarity Measures for details.

...

Note that you must perform the additional YTEX installation tasks to use this component. You must also install the UMLS if you want to create your own concept graphs.

Similarity Web App

The similarity web app allows you to select

...

The similarity web application has two pages:

Similarity Single

Compute similarities for a single concept pair. In addition to the similarity values, this page outputs the path between concepts. You can enter the text of the concept, and the application will attempt to find the corresponding concept id (CUI). Alternatively, you can simply enter the concept id.

Similarity Multiple

Compute the similarity between multiple pairs of concepts. Enter each concept pair on a separate line, and separate the two concepts with a comma or whitespace. The output can be exported to a CSV file or an Excel spreadsheet.
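
The input format described above (one pair per line, concepts separated by a comma or whitespace) can be sketched as a small parser. This is only an illustration of the format, not the web application's own code. The function name parse_concept_pairs is hypothetical, and the sketch assumes concepts are entered as ids (CUIs) that contain no spaces.

```python
import re


def parse_concept_pairs(text):
    """Parse one concept pair per line; concepts split on commas or whitespace."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        if not tokens:
            continue  # skip blank lines
        if len(tokens) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 concepts, got {len(tokens)}"
            )
        pairs.append((tokens[0], tokens[1]))
    return pairs
```

For example, parse_concept_pairs("C0018787,C0024109\nC0034069 C0242379") yields two pairs, one from each line.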

Similarity Web/RESTful Services

As with the web application, you can specify the concept graph, concept pairs, and measures for which similarities should be computed. Both methods accept a list of measures; these are:

  • Path-Finding Measures
    • WUPALMER: Wu & Palmer
    • LCH: Leacock & Chodorow
    • PATH: Path
    • RADA: Rada
  • Corpus IC Based Measures:
    • LIN: Lin
  • Intrinsic IC Based Measures:
    • INTRINSIC_LIN: Intrinsic IC based Lin
    • INTRINSIC_LCH: Intrinsic IC based Leacock & Chodorow
    • INTRINSIC_PATH: Intrinsic IC based Path, identical to Jiang & Conrath
    • INTRINSIC_RADA: Intrinsic IC based Rada
    • JACCARD: Intrinsic IC based Jaccard
    • SOKAL: Intrinsic IC based Sokal & Sneath
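
When calling the services from a script, it can help to validate measure names before sending a request. The sketch below groups the identifiers listed above and joins a selection into a comma-separated value, as used by the metrics parameter in the RESTful example on this page. The MEASURES dictionary and the metrics_parameter helper are hypothetical names, not part of YTEX.

```python
# Measure identifiers from the list above, grouped by family.
MEASURES = {
    "path-finding": ["WUPALMER", "LCH", "PATH", "RADA"],
    "corpus IC": ["LIN"],
    "intrinsic IC": [
        "INTRINSIC_LIN",
        "INTRINSIC_LCH",
        "INTRINSIC_PATH",
        "INTRINSIC_RADA",
        "JACCARD",
        "SOKAL",
    ],
}

KNOWN_MEASURES = {name for names in MEASURES.values() for name in names}


def metrics_parameter(names):
    """Validate measure names and join them into a comma-separated string."""
    unknown = [n for n in names if n not in KNOWN_MEASURES]
    if unknown:
        raise ValueError("unknown measure(s): " + ", ".join(unknown))
    return ",".join(names)
```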

RESTful interface

To get the similarity between a pair of concepts using the concept graph umls, and the LCH and Intrinsic LCH measures: http://informatics.med.yale.edu/ytex.web/services/rest/similarity?conceptGraph=umls&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH&lcs=true
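
A request URL like the one above can be assembled programmatically. The following sketch only builds the URL from the query parameters visible in the example (conceptGraph, concept1, concept2, metrics, lcs). It does not send the request, because the response format is not described in this section. The function name similarity_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL on this page.
BASE_URL = "http://informatics.med.yale.edu/ytex.web/services/rest/similarity"


def similarity_url(concept_graph, concept1, concept2, metrics, lcs=True):
    """Build the similarity request URL for one concept pair."""
    query = [
        ("conceptGraph", concept_graph),
        ("concept1", concept1),
        ("concept2", concept2),
        ("metrics", ",".join(metrics)),
        ("lcs", "true" if lcs else "false"),
    ]
    # safe="," keeps the comma between measure names unescaped, as in the example.
    return BASE_URL + "?" + urlencode(query, safe=",")
```

For a local installation, BASE_URL would need to point at the local server instead of the demo host.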

...

To get the 'default' concept graph: http://informatics.med.yale.edu/ytex.web/services/rest/getDefaultConceptGraph

Web Services interface

The Web Services interface is analogous to the RESTful interface, but allows the computation of similarities for multiple concept pairs. See http://informatics.med.yale.edu/ytex.web/services/conceptSimilarityWebService?wsdl

Command-Line Interface

The ConceptSimilarityServiceImpl Java program accepts a list of concept pairs and outputs their similarities in a tab-delimited format. It accepts the following arguments:

...

```bat
cd CTAKES_HOME
bin\setenv.bat
java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx256m org.apache.ctakes.ytex.kernel.metric.ConceptSimilarityServiceImpl -concepts C0018787,C0024109;C0034069,C0242379 -metrics LCH,INTRINSIC_LCH
```
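
In the example command, the two concepts of a pair are separated by a comma and pairs are separated by semicolons. The sketch below builds the -concepts value and an argument list suitable for launching the program from a script. The helper names are hypothetical, the classpath is assumed to be supplied by the caller, and the log4j option from the example is omitted for brevity.

```python
MAIN_CLASS = "org.apache.ctakes.ytex.kernel.metric.ConceptSimilarityServiceImpl"


def concepts_argument(pairs):
    """Join pairs as 'c1,c2;c3,c4', the form used in the example command."""
    return ";".join(f"{c1},{c2}" for c1, c2 in pairs)


def build_command(pairs, metrics, classpath):
    """Return an argument list; it is not executed here."""
    return [
        "java",
        "-cp", classpath,
        "-Xmx256m",
        MAIN_CLASS,
        "-concepts", concepts_argument(pairs),
        "-metrics", ",".join(metrics),
    ]
```

Passing the command as an argument list (for example to subprocess.run) avoids shell quoting problems. When the command is typed into a Unix shell instead, the -concepts value must be quoted, because an unquoted semicolon ends the command.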

Web Interface

To start the Semantic Similarity Web Application, run CTAKES_HOME\bin\ytexweb.bat (Windows) or CTAKES_HOME/bin/ytexweb.sh (Linux), then open http://localhost:8080/semanticSim.jsf.

Configuration

Creating a Concept Graph

To create a concept graph, you create a properties file that contains a query that retrieves all the edges from a taxonomy. The ConceptDaoImpl class does the following:

...

You will get warnings about removing cycles. The concept graph will be stored in the CTAKES_HOME/resources/org/apache/ctakes/ytex/conceptGraph directory.
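
Cycle warnings arise because a taxonomy used for similarity computation must be acyclic, while the edges retrieved from a source vocabulary may contain loops. The sketch below shows one generic way to build an acyclic graph from a list of (parent, child) edges by skipping any edge that would close a cycle. It is an assumption-laden illustration, not the ConceptDaoImpl implementation, and the function name build_acyclic_graph is hypothetical.

```python
def build_acyclic_graph(edges):
    """Build parent -> children sets, skipping edges that would create a cycle."""
    children = {}
    skipped = []

    def reaches(start, target):
        """True if target can be reached from start via existing edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        return False

    for parent, child in edges:
        # If the child already leads back to the parent, this edge closes a cycle.
        if reaches(child, parent):
            skipped.append((parent, child))
            continue
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())
    return children, skipped
```

With this approach, which edge of a cycle is dropped depends on the order in which edges are read.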

Corpus Information Content

We compute the intrinsic information content (intrinsic IC) when creating the concept graph. The InfoContentEvaluatorImpl class computes the corpus information content (corpus IC) for a given concept graph and corpus. This class takes as input a properties file that contains a query used to retrieve concept frequencies from the database. It then computes the information content of each node in the concept graph and stores the results in the feature_eval and feature_rank YTEX database tables.
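
To make the corpus IC computation concrete, the sketch below uses the common frequency-based formulation: the frequency of a concept is its own count plus the counts of all of its descendants, and IC(c) = -log(freq(c) / total). This formulation, the function name, and the toy data are assumptions for illustration. The computation performed by InfoContentEvaluatorImpl may differ in its details.

```python
import math


def corpus_information_content(children, frequencies):
    """Compute IC per concept from raw corpus counts.

    children: concept -> iterable of child concepts (must list every concept).
    frequencies: concept -> raw count observed in the corpus.
    """

    def subtree(concept):
        """The concept plus all of its descendants, each counted once."""
        seen, stack = set(), [concept]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(children.get(node, ()))
        return seen

    total = sum(frequencies.get(c, 0) for c in children)
    cumulative = {
        c: sum(frequencies.get(d, 0) for d in subtree(c)) for c in children
    }
    # Concepts never observed (directly or via descendants) get no IC value.
    return {c: -math.log(f / total) for c, f in cumulative.items() if f > 0}
```

Collecting descendants as a set means that a concept reachable through several parents is counted only once.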

...