
This page was created to gather UIMA requirements from users. Feel free to add your topics here.

Deployment support for uima-as services and pipelines over clusters for processing large amounts of work

Although we have a deployment descriptor, setting it up, tuning it for potentially varying workloads, and optimizing for various targets (throughput, latency, recoverability, etc.) is a manual and difficult process.

Improving transparency of UIMA pipeline operations

Currently, a lot of statistical information on the operation of UIMA is available but difficult to access. This could be addressed by developing a "console" application, perhaps web-based, with just-in-time tutorials, an overview, and drill-down capabilities that would make the operations, bottlenecks, and tradeoffs more obvious to interested parties.

UIMA Class Loading Extension

This page discusses a suggestion for adding classpath information to a descriptor.
An alternative might be to use another standard, widely adopted approach for this; I'm thinking that OSGi provides this capability, along with specifying "versions" and enabling the use of repositories.

General API improvements

Improvements of FSList/FSArray management

  • make it easier to add elements
  • make it easier to iterate FSList
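As one possible shape for these two requests, the sketch below models an FSList-style linked list (a head element plus a tail list) in plain Java, with an `append` helper and `Iterable` support so that the enhanced for loop works. `ConsList`, `prepend`, and `append` are hypothetical names for illustration and are not part of the UIMA API; a real version would have to create its cells inside a CAS.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an FSList-style cons list; not a UIMA class.
final class ConsList<T> implements Iterable<T> {
    private final T head;
    private final ConsList<T> tail;   // a null tail marks the empty list

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    static <T> ConsList<T> empty() {
        return new ConsList<>(null, null);
    }

    boolean isEmpty() {
        return tail == null;
    }

    // Prepending is O(1) because the existing cells are shared, not copied.
    ConsList<T> prepend(T element) {
        return new ConsList<>(element, this);
    }

    // Appending rebuilds the list, so it is O(n); offered as a convenience.
    ConsList<T> append(T element) {
        List<T> items = new ArrayList<>();
        for (T item : this) {
            items.add(item);
        }
        ConsList<T> result = ConsList.<T>empty().prepend(element);
        for (int i = items.size() - 1; i >= 0; i--) {
            result = result.prepend(items.get(i));
        }
        return result;
    }

    // Implementing Iterable lets callers write: for (T item : list) { ... }
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private ConsList<T> current = ConsList.this;

            @Override
            public boolean hasNext() {
                return !current.isEmpty();
            }

            @Override
            public T next() {
                if (current.isEmpty()) {
                    throw new NoSuchElementException();
                }
                T value = current.head;
                current = current.tail;
                return value;
            }
        };
    }
}
```

With something like this, adding an element becomes a single call (`list = list.append(x)`) and iteration needs no manual head/tail walking.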

More support for collections of CASs

Additional Class for Collection of CASs called "CCAS"

  • CCAS will have a common index for all CASs. Faster techniques exist for regular-expression-based
    annotation over a collection of documents using an inverted index, and these can be applied to a CCAS.
  • CCAS can have some kind of integration with the Hadoop Distributed File System so that it
    is easier to write MapReduce tasks in Hadoop. This can be a step towards integrating UIMA
    with Hadoop.
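To make the common-index idea concrete, here is a minimal sketch of an inverted index shared across a collection of documents. It assumes each CAS is represented only by an integer id and its document text; `CollectionIndex`, `add`, `lookup`, and `lookupAll` are hypothetical names, not an existing UIMA API. The point is that a lookup narrows the collection to candidate documents without scanning any text, so a more expensive regular-expression pass only has to run on those candidates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a shared index over many documents; not a UIMA class.
final class CollectionIndex {
    // token -> ids of the documents (stand-ins for CASs) that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index one document's text under the given id, splitting on non-word characters.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Ids of the documents containing the token, found without scanning any text.
    Set<Integer> lookup(String token) {
        Set<Integer> ids = postings.get(token.toLowerCase());
        return ids == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(ids);
    }

    // Documents containing every given token: candidates for a costlier regex pass.
    Set<Integer> lookupAll(String... tokens) {
        Set<Integer> result = null;
        for (String token : tokens) {
            Set<Integer> ids = lookup(token);
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }
}
```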

Supporting more modularity / interoperability

Conforming to widely adopted standards (e.g., OSGi, Maven)

Versioning of Annotators, TypeSystems

Dependency specifications (including versioning)

Packaging of classpath dependencies (already in PEAR, extensions to non-PEAR environments)?

Using repositories of artifacts

  • e.g. Maven or P2 repositories
  • If an artifact is referenced via its "name" and "version", be able to retrieve it from a repository if it is not available locally
  • use Maven or a Maven-like local cache
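The lookup order described above (local cache first, repository as fallback) could be sketched as follows. This assumes Maven-style coordinates, so a group is added to the "name" and "version" mentioned in the list, and it assumes the standard Maven directory layout for the cache. `ArtifactResolver`, `relativePath`, `resolve`, and the `remote` function are hypothetical names; the remote side is left as a pluggable function rather than a real repository client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical resolver sketch; not an existing UIMA or Maven API.
final class ArtifactResolver {
    private final Path localCache;
    // Stand-in for a repository client: relative path -> artifact bytes, if found.
    private final Function<String, Optional<byte[]>> remote;

    ArtifactResolver(Path localCache, Function<String, Optional<byte[]>> remote) {
        this.localCache = localCache;
        this.remote = remote;
    }

    // Maven-style layout: group/as/dirs/name/version/name-version.jar
    static String relativePath(String group, String name, String version) {
        return group.replace('.', '/') + "/" + name + "/" + version
                + "/" + name + "-" + version + ".jar";
    }

    // Return the cached file if present; otherwise fetch it once and cache it.
    Optional<Path> resolve(String group, String name, String version) {
        String relative = relativePath(group, name, version);
        Path target = localCache.resolve(relative);
        if (Files.exists(target)) {
            return Optional.of(target);
        }
        Optional<byte[]> bytes = remote.apply(relative);
        if (!bytes.isPresent()) {
            return Optional.empty();
        }
        try {
            Files.createDirectories(target.getParent());
            Files.write(target, bytes.get());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Optional.of(target);
    }
}
```

A second `resolve` call for the same coordinates is served from the local cache and never reaches the remote function.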

Security: signing of artifacts

Efficient CAS persistent store and loading

Currently we can serialize/deserialize CASes in XMI, XCAS (old), or binary formats.

  • need to search collections of CASes with various kinds of queries
  • may be good to persist in a relational database or RDF-style tables
  • need to load a subset of a CAS efficiently (e.g., a small subset of a large CAS)
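One way to picture the subset-loading requirement is a store keyed by CAS id and type name, similar to rows in a relational table indexed on those two columns. The sketch below keeps everything in memory and stores only begin/end offsets; `CasStore`, `Span`, `save`, and `loadSubset` are hypothetical names, and a real store would sit on a database or file format rather than a map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: annotations stored per (CAS id, type name); not a UIMA class.
final class CasStore {
    // One stored annotation: character offsets into the document text.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // CAS id -> type name -> annotations of that type
    private final Map<String, Map<String, List<Span>>> rows = new HashMap<>();

    void save(String casId, String typeName, int begin, int end) {
        rows.computeIfAbsent(casId, k -> new HashMap<>())
            .computeIfAbsent(typeName, k -> new ArrayList<>())
            .add(new Span(begin, end));
    }

    // Load only the requested types of one CAS; other types are never touched.
    Map<String, List<Span>> loadSubset(String casId, String... typeNames) {
        Map<String, List<Span>> byType =
                rows.getOrDefault(casId, Collections.<String, List<Span>>emptyMap());
        Map<String, List<Span>> subset = new HashMap<>();
        for (String typeName : typeNames) {
            List<Span> spans = byType.get(typeName);
            if (spans != null) {
                subset.put(typeName, Collections.unmodifiableList(spans));
            }
        }
        return subset;
    }
}
```

The cost of `loadSubset` depends on the number of requested types and their annotations, not on the total size of the stored CAS.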