This page is for suggesting and discussing components and tooling for the UIMA sandbox.
The sandbox is designed to host both UIMA analysis components, such as annotators, parsers, and consumers, and tooling around UIMA. The provided components are free to use, and
everyone is invited to suggest new components or to work on some of them.
Suggested Analysis Components
Parser
- document text parser
- provide a parser component that extracts the plain text from a PDF or HTML document using an open source library such as PDFBox or