This page was created to gather UIMA requirements from users. Feel free to add your topics here.

Deployment support for UIMA-AS services and pipelines on clusters, for processing large amounts of work

Although UIMA-AS has a deployment descriptor, writing one and tuning it for varying workloads, while optimizing for different targets (throughput, latency, recoverability, etc.), is a manual and difficult process.

Improving transparency of UIMA pipeline operations

Currently a lot of statistical information about UIMA's operation is available, but it is difficult to access. This could be addressed by developing a "console"-style application, perhaps a web site, with just-in-time tutorials, overviews, and drill-down capabilities that make the operations, bottlenecks, tradeoffs, etc. more obvious to interested parties.

UIMA Class Loading Extension

A separate page discusses a suggestion for adding classpath information to a descriptor.
An alternative might be to use other standard, widely adopted approaches for this; I'm thinking that OSGi provides this capability, along with specifying "versions" and enabling the use of repositories.

General API improvements

Improvements to FSList/FSArray management

  • make it easier to add elements
  • make it easier to iterate over an FSList
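To illustrate the two bullets above, here is a minimal sketch that models an FSList-like cons list (a non-empty node holding a head and a tail, terminated by an empty node) and wraps it in a java.lang.Iterable so it works with an enhanced for loop, plus a builder so elements can be added without linking nodes by hand. The classes `Node`, `Cons`, `Empty`, and `FsLists` are hypothetical stand-ins, not the UIMA JCas API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-ins modelling an FSList-like cons list: NOT the UIMA API.
abstract class Node<T> {}

final class Empty<T> extends Node<T> {}

final class Cons<T> extends Node<T> {
    final T head;
    final Node<T> tail;
    Cons(T head, Node<T> tail) { this.head = head; this.tail = tail; }
}

final class FsLists {
    // Walks from the first node until the empty terminator is reached.
    static <T> Iterable<T> iterate(Node<T> list) {
        return () -> new Iterator<T>() {
            private Node<T> current = list;

            @Override public boolean hasNext() { return current instanceof Cons; }

            @Override public T next() {
                if (!(current instanceof Cons)) throw new NoSuchElementException();
                Cons<T> cell = (Cons<T>) current;
                current = cell.tail;
                return cell.head;
            }
        };
    }

    // Builds a list from plain Java values, back to front, so callers
    // never have to set head/tail links themselves.
    @SafeVarargs
    static <T> Node<T> of(T... items) {
        Node<T> list = new Empty<>();
        for (int i = items.length - 1; i >= 0; i--) {
            list = new Cons<>(items[i], list);
        }
        return list;
    }

    // Copies the list into a java.util.List using the iterator above.
    static <T> List<T> toList(Node<T> list) {
        List<T> out = new ArrayList<>();
        for (T item : iterate(list)) out.add(item);
        return out;
    }
}
```

With helpers like these, `for (String s : FsLists.iterate(list))` replaces a hand-written loop over head/tail nodes.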

More support for collections of CASs

An additional class for a collection of CASs, called "CCAS"

  • A CCAS would have a common index over all of its CASs. Faster techniques exist for regular-expression-based
    annotation over a collection of documents using an inverted index; these could be applied to a CCAS.
  • A CCAS could integrate with the Hadoop Distributed File System (HDFS), making it
    easier to write MapReduce tasks in Hadoop. This could be a step towards integrating UIMA
    with Hadoop.
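One way a shared inverted index could speed up regular-expression annotation over a collection is as a prefilter: if a pattern can only match in documents that contain a given literal token, look that token up in the index and run the regular expression only on the candidate documents. This is a minimal sketch of that one simple technique, not necessarily the one the bullet above has in mind; it assumes whitespace tokenization and that the caller supplies a token the pattern is guaranteed to require. `InvertedIndex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a token -> document-id index shared across a collection.
final class InvertedIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the document and records each lower-cased whitespace token.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Runs the regex only on documents whose postings contain requiredToken,
    // returning "docId:begin-end" for each match.
    List<String> find(Pattern pattern, String requiredToken) {
        List<String> hits = new ArrayList<>();
        Set<Integer> candidates = postings.getOrDefault(
                requiredToken.toLowerCase(Locale.ROOT), new TreeSet<>());
        for (int id : candidates) {
            Matcher m = pattern.matcher(docs.get(id));
            while (m.find()) {
                hits.add(id + ":" + m.start() + "-" + m.end());
            }
        }
        return hits;
    }
}
```

Documents without the required token are never scanned, which is where the savings come from on large collections.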

Supporting more modularity / interoperability

Conforming to widely adopted standards (e.g., OSGi, Maven)

Versioning of annotators and type systems

Dependency specifications (including versioning)
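A dependency specification with versioning could be as small as a name plus an allowed version range. This minimal sketch compares dot-separated numeric versions and checks a half-open range [min, max). The `Dependency` class and the range convention are assumptions for illustration, loosely in the spirit of OSGi and Maven version ranges, not either tool's actual syntax or API.

```java
import java.util.Arrays;

// Hypothetical dependency spec: artifact name plus half-open version range [min, max).
final class Dependency {
    final String name;
    final String minInclusive;
    final String maxExclusive;

    Dependency(String name, String minInclusive, String maxExclusive) {
        this.name = name;
        this.minInclusive = minInclusive;
        this.maxExclusive = maxExclusive;
    }

    // Compares versions numerically, segment by segment ("2.10" > "2.9");
    // missing segments count as 0, so "2.9" equals "2.9.0".
    static int compareVersions(String a, String b) {
        int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
        int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    // True if the named artifact at this version satisfies the spec.
    boolean isSatisfiedBy(String artifactName, String version) {
        return name.equals(artifactName)
                && compareVersions(version, minInclusive) >= 0
                && compareVersions(version, maxExclusive) < 0;
    }
}
```

For example, a spec of `new Dependency("my-tokenizer", "2.3", "3.0")` (a hypothetical artifact name) accepts 2.10.1 but rejects 3.0.0.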

Packaging of classpath dependencies (already supported by PEAR; extend to non-PEAR environments?)

Using repositories of artifacts

  • e.g., Maven or P2 repositories
  • If an artifact is referenced by its "name" and "version", be able to retrieve it from a repository when it is not available locally
  • use a Maven or Maven-like local cache
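The lookup rule in the last two bullets (check a local cache first, fetch from a repository only on a miss, then keep the result) can be sketched as follows. The cache path mirrors the Maven local-repository layout `<groupId as path>/<artifactId>/<version>/<artifactId>-<version>.jar`; the `RemoteRepository` interface and `ArtifactResolver` class are hypothetical, and a real resolver would also need checksums, repository metadata, and transitive dependencies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical source of artifact bytes (e.g. an HTTP client for a repository).
interface RemoteRepository {
    byte[] fetch(String groupId, String artifactId, String version) throws IOException;
}

final class ArtifactResolver {
    private final Path cacheRoot;
    private final RemoteRepository remote;

    ArtifactResolver(Path cacheRoot, RemoteRepository remote) {
        this.cacheRoot = cacheRoot;
        this.remote = remote;
    }

    // Maven-style layout: org.example:my-annotator:1.0 ->
    // org/example/my-annotator/1.0/my-annotator-1.0.jar
    Path cachePath(String groupId, String artifactId, String version) {
        return cacheRoot.resolve(groupId.replace('.', '/'))
                .resolve(artifactId)
                .resolve(version)
                .resolve(artifactId + "-" + version + ".jar");
    }

    // Returns the cached file, downloading and storing it only on a cache miss.
    Path resolve(String groupId, String artifactId, String version) throws IOException {
        Path target = cachePath(groupId, artifactId, version);
        if (Files.exists(target)) {
            return target;
        }
        byte[] bytes = remote.fetch(groupId, artifactId, version);
        Files.createDirectories(target.getParent());
        Files.write(target, bytes);
        return target;
    }
}
```

Keeping the cache layout identical to Maven's would let such a resolver share an existing local repository rather than maintain a second one.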

Security: signing of artifacts
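As a sketch of what artifact signing involves: the publisher signs the artifact bytes with a private key, and a consumer verifies them with the matching public key before loading. This uses only the standard `java.security` API (`SHA256withRSA` is one common algorithm choice, assumed here); key distribution and deciding which public keys to trust are left out and would be the harder part in practice. `ArtifactSigning` is a hypothetical helper class.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class ArtifactSigning {
    // Publisher side: produce a detached signature over the artifact bytes.
    static byte[] sign(byte[] artifact, PrivateKey key) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(artifact);
        return signer.sign();
    }

    // Consumer side: refuse to load the artifact unless this returns true.
    static boolean verify(byte[] artifact, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(artifact);
        return verifier.verify(signature);
    }

    // Generates a 2048-bit RSA key pair for the publisher.
    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }
}
```

Any change to the artifact bytes after signing makes `verify` return false.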

Efficient CAS persistent store and loading

Currently, CASes can be serialized and deserialized in XMI, XCAS (the older format), or binary formats.

  • need to search collections of CASes using various kinds of searches
  • it may be useful to persist CASes in a relational database or in RDF-style tables
  • need to load a subset of a CAS efficiently (e.g., a small subset of a large CAS)
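To make the last two bullets concrete: if annotations are stored as rows of (document id, type name, begin, end), as they might be in a relational table, a consumer can load only the rows for one type within a character window instead of deserializing the whole CAS. This minimal sketch keeps the rows in memory, grouped by document and type and sorted by begin offset, so a window query can binary-search to its start. `AnnotationRow` and `AnnotationStore` are hypothetical; a database-backed version would express the same query as a SELECT over an index on (document, type, begin).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row: one annotation, as it might be stored in a relational table.
final class AnnotationRow {
    final String docId;
    final String type;
    final int begin;
    final int end;

    AnnotationRow(String docId, String type, int begin, int end) {
        this.docId = docId; this.type = type; this.begin = begin; this.end = end;
    }

    @Override public String toString() { return type + "[" + begin + "," + end + ")"; }
}

final class AnnotationStore {
    // Rows grouped by (docId, type), each group sorted by begin offset.
    private final Map<String, List<AnnotationRow>> groups = new HashMap<>();

    private static String key(String docId, String type) { return docId + '\0' + type; }

    void addAll(List<AnnotationRow> rows) {
        for (AnnotationRow r : rows) {
            groups.computeIfAbsent(key(r.docId, r.type), k -> new ArrayList<>()).add(r);
        }
        for (List<AnnotationRow> group : groups.values()) {
            group.sort(Comparator.comparingInt((AnnotationRow r) -> r.begin));
        }
    }

    // Loads only annotations of one type lying entirely inside [from, to).
    List<AnnotationRow> load(String docId, String type, int from, int to) {
        List<AnnotationRow> rows = groups.getOrDefault(key(docId, type), Collections.emptyList());
        int lo = 0, hi = rows.size();
        while (lo < hi) { // binary search: first row with begin >= from
            int mid = (lo + hi) >>> 1;
            if (rows.get(mid).begin < from) lo = mid + 1; else hi = mid;
        }
        List<AnnotationRow> out = new ArrayList<>();
        for (int i = lo; i < rows.size() && rows.get(i).begin < to; i++) {
            if (rows.get(i).end <= to) out.add(rows.get(i));
        }
        return out;
    }
}
```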