
Overview of Coreference Annotator

This parser is a wrapper around the OpenNLP parser. Because this component relies on the output of other components (mainly sentence segmentation and tokenization), it contains configuration files that point at those components. These files use relative path names for portability, but they require that the project be extracted at the same level as the other cTAKES components. For example, if your directory structure is:

ctakes/core
ctakes/clinical documents pipeline
ctakes/...

you want it to look like the following after extracting this component:

ctakes/Coreference Annotator
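
The layout requirement above can be checked before importing the project. The following is a minimal sketch and not part of cTAKES: the helper name `missing_siblings` is hypothetical, and the directory names are taken from the example listing above. It builds a throwaway directory tree so that it runs on its own.

```python
import tempfile
from pathlib import Path


def missing_siblings(component_dir, expected):
    """Return the expected sibling directories that are absent.

    Hypothetical helper: `expected` lists the directory names that the
    relative paths in the configuration files are assumed to point at.
    """
    parent = Path(component_dir).resolve().parent
    return [name for name in expected if not (parent / name).is_dir()]


# Demonstration on a temporary tree that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ctakes"
    (root / "core").mkdir(parents=True)
    component = root / "Coreference Annotator"
    component.mkdir()

    # "core" exists next to the component, so nothing is missing.
    ok = missing_siblings(component, ["core"])
    # "clinical documents pipeline" was never created, so it is reported.
    bad = missing_siblings(component, ["core", "clinical documents pipeline"])
```

An empty result means every expected sibling was found next to the component directory.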

Once placed there, the component can be imported into Eclipse using File > Import > Existing Projects into Workspace.

The constituency parser component includes a few UIMA analysis engines (AEs) for different use cases:

AggregateParsingProcessor is mostly for testing and validation. You can run it in the UIMA CAS Visual Debugger (CVD) by running the CVD launch configuration in resources/launch/UIMA_CVD-Coreference Annotator.launch (right-click the file in the Package Explorer > Run As > UIMA_CVD-Coreference Annotator.launch).

Once the CVD window opens, load the AE with Run > Load AE..., and navigate to:

Coreference Annotator/desc/analysis_engines/AggregateParsingProcessor.xml

Load some text either by manually entering it or with File > Open text file..., then Run > Run AE.

ConstituencyParserAnnotator.xml is a standalone annotator meant to be incorporated into cTAKES pipelines (for example, upstream from the coreference component).

Both of the above AEs assume some pre-processing as input, namely sentence and token segmentation, and their output quality depends on the quality of those components. On some notes the sentence segmenter does not work reliably, and the parser then performs poorly (UPMC notes are known to cause trouble).

ParserEvaluatorAnnotator.xml is used mainly internally to evaluate the parser, but it may be useful to anyone interested in parser research. It is used as part of the collection processing engine ParsingCPE.xml, which reads one line at a time from a file, where each line contains the whitespace-separated tokens of a single sentence.
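
To make the input format described above concrete, here is a minimal sketch that reads such a file into token lists. It only illustrates the format and is not the ParsingCPE reader. The function name `read_tokenized_sentences` and the sample sentences are made up, and skipping blank lines is an assumption rather than documented behavior.

```python
import io
from typing import Iterable, Iterator, List


def read_tokenized_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of tokens per line (one sentence per line).

    Tokens are separated by arbitrary whitespace. Blank lines are skipped,
    which is an assumption made for this sketch.
    """
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens


# Made-up sample input: each line is one pre-tokenized sentence.
sample = io.StringIO(
    "The patient denies chest pain .\n"
    "\n"
    "No  acute distress .\n"
)
sentences = list(read_tokenized_sentences(sample))
```

Because the tokens are already separated in the file, punctuation such as the final period appears as its own token.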

Parser models: This release contains two different models. The default is located in resources/parsermodel and is used if no configuration settings are changed. It is trained on a combination of domain-specific and general-domain text. The domain-specific text includes clinical notes, Medpedia articles, cohort queries, and clinical questions. The general-domain text is the Wall Street Journal section of the Penn Treebank.

There are two coreference annotators to choose from within cTAKES.

MipacqSvmCoreferenceResolverAggregate

Refer to https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3384116/

MentionClusterCoreferenceAnnotator

MentionClusterCoreferenceAnnotator is new for 4.0. It uses the org/apache/ctakes/coreference/models/mention-cluster/model.jar model in the coreference-res project.

There is a convenience method for adding it to uimaFIT pipelines that takes a model path argument.

The second parser model is in resources/fastmodel. It is trained only on the in-domain data. In our preliminary (unpublished) experiments it was slightly less accurate and slightly faster than the default model.
