...

SentenceDetectorAnnotatorBIO.xml

 

The default sentence detector applies a discriminative classifier to a small set of candidate sentence-splitting characters. In clinical data, however, a sentence break can be indicated by something as subtle as a series of spaces. This new model instead labels the characters in a sequence as the Beginning, Inside, or Outside (BIO) of a sentence. It can therefore use features similar to those of previous systems while allowing sentence boundaries at arbitrary positions. The approach requires many more classification decisions, but it avoids major performance penalties by classifying only non-alphanumeric characters.
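To make the BIO idea concrete, here is a minimal Python sketch. It is not the annotator's actual implementation: the function names `candidate_positions` and `spans_from_bio`, the sample text, and the hand-written label string are all hypothetical, and a trained classifier is assumed to supply the labels in practice. It shows how one B/I/O label per character can be decoded into sentence spans, and which characters would need a classification decision if only non-alphanumeric characters are classified.

```python
from typing import List, Tuple


def candidate_positions(text: str) -> List[int]:
    """Indices of the non-alphanumeric characters, i.e. the only
    positions a classifier would be asked about in this sketch."""
    return [i for i, ch in enumerate(text) if not ch.isalnum()]


def spans_from_bio(labels: str) -> List[Tuple[int, int]]:
    """Decode one B/I/O label per character into (start, end) spans,
    with end exclusive."""
    spans = []
    start = None
    for i, tag in enumerate(labels):
        if tag == "B":
            # A new sentence begins; close any sentence still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            # Outside any sentence; close the open sentence, if any.
            if start is not None:
                spans.append((start, i))
                start = None
        # "I" simply continues the current sentence.
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hypothetical clinical-style text where a run of spaces separates sentences.
text = "Pt stable    No fever"
# Hand-written labels for illustration; a real system would predict these.
labels = "B" + "I" * 8 + "O" * 4 + "B" + "I" * 7

sentences = [text[s:e] for s, e in spans_from_bio(labels)]
print(sentences)
print(candidate_positions(text))
```

In this example the two sentences are separated only by spaces, a boundary that a detector restricted to a fixed set of splitting characters such as periods would not consider. Only 6 of the 21 characters are non-alphanumeric, which illustrates why restricting classification to those positions keeps the number of decisions manageable.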

...