
A place to collect ideas for the next version of UIMA Java core.


Here's a place to assemble a "spec" of what might actually be in version 3: UimaV3Spec


Defining UIMA's value proposition(s) from a Data perspective

The UIMA project's mission statement says, in part, that the project is related to the "spec" in OASIS. That spec was mostly a specification of a wire format (i.e., a serialization format) for UIMA, based on the XMI (XML Metadata Interchange) standard. XMI has not caught on very well. This topic is meant to flesh out what matters most from a data-interchange point of view.

Topic is here. 

Framework interoperability

...

  • To get around "reflection" slowness:
    • Support set/get by an int index, where the int is derived from the class and the feature-name string
    • Support bulk set/get? (Is the ordering among fields significant?)
    • Possibly use something like ReflectASM. It is like Java reflection but uses a byte-code generator, so it is much faster (though probably not as fast as custom support code compiled into the Java cover class).
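Here is a minimal sketch of "set/get by int" where the support code is compiled into the cover class itself rather than going through reflection. The class `TokenCover`, its features `begin`/`end`, and the methods `featureIndex`, `getIntByIndex` and `setIntByIndex` are hypothetical names used for illustration. They are not existing UIMA APIs. The feature-name string is resolved to an int once, and every later access is a plain `switch` on that int.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cover class for a UIMA type with two int features.
 * All names here are illustrative, not the real JCas API.
 */
class TokenCover {
    // Built once per class: feature-name string -> small int index.
    private static final Map<String, Integer> FEATURE_INDEX = new HashMap<>();
    static {
        FEATURE_INDEX.put("begin", 0);
        FEATURE_INDEX.put("end", 1);
    }

    private int begin;
    private int end;

    /** Slow path, meant to be done once: resolve a feature name to its int index. */
    static int featureIndex(String featureName) {
        Integer i = FEATURE_INDEX.get(featureName);
        if (i == null) {
            throw new IllegalArgumentException("unknown feature: " + featureName);
        }
        return i;
    }

    /** Fast path: generic get by int index, compiled into the class (no reflection). */
    int getIntByIndex(int featureIndex) {
        switch (featureIndex) {
            case 0: return begin;
            case 1: return end;
            default: throw new IllegalArgumentException("bad index: " + featureIndex);
        }
    }

    /** Fast path: generic set by int index. */
    void setIntByIndex(int featureIndex, int value) {
        switch (featureIndex) {
            case 0: begin = value; break;
            case 1: end = value; break;
            default: throw new IllegalArgumentException("bad index: " + featureIndex);
        }
    }
}
```

A caller would resolve `TokenCover.featureIndex("end")` once, for example when a component is initialized. It would then reuse that int on every access.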

Problem with name clash with existing non-JCas class

There are use cases where JCas cover classes are not used for some types, yet users define a Java class whose name is identical to the name the JCas cover class would have. This is permitted in UIMA v2.

For example, you could have a class x.y.z.ConceptType defined as a Java enum. You could also have a UIMA type, x.y.z.ConceptType, and work with it without using JCas APIs.

One possible approach is to map the UIMA type name to a special Java class name for these use cases, so there is no collision. Of course, the user would then need to use the non-JCas APIs for this type.
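Here is a small sketch of the name-mapping idea. Several parts are assumptions rather than UIMA behavior: the alternate package prefix `uima_nojcas_.`, the class `CoverClassNames`, and the way a clash is detected. Detection here means the name loads but is not a subclass of some cover-class base.

```java
import java.util.function.Predicate;

/**
 * Hypothetical helper: choose the Java class name for a UIMA type's generated
 * cover class. If a non-JCas user class already has that name (e.g. an enum),
 * map the type to a prefixed name instead so the two cannot collide.
 */
final class CoverClassNames {
    // Illustrative prefix only; any reserved package name would do.
    static final String ALT_PREFIX = "uima_nojcas_.";

    private CoverClassNames() {}

    static String coverClassNameFor(String uimaTypeName, Predicate<String> isNonJCasUserClass) {
        return isNonJCasUserClass.test(uimaTypeName)
                ? ALT_PREFIX + uimaTypeName   // clash: generated class gets a different name
                : uimaTypeName;               // no clash: keep the usual 1:1 mapping
    }

    /** One possible clash check: the name loads, but is not a cover class. */
    static boolean existsAndIsNotCover(String className, Class<?> coverBase, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader); // do not run static initializers
            return !coverBase.isAssignableFrom(c);
        } catch (ClassNotFoundException e) {
            return false; // nothing on the classpath under that name
        }
    }
}
```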

Problem with use-case of changing TypeSystems

There is one serious unsolved issue, illustrated by this use case:

  1. make a pipeline,
  2. deserialize some CAS's type system, and then deserialize that CAS,
  3. do some generic processing on that CAS,
  4. repeat steps 2 and 3 in a loop, with a different type system each time.

Each iteration sets up a new merged type system and generates the corresponding Java class definitions, so previously generated classes might need to be replaced. However, those classes might already be linked into existing code. A Java class loader also cannot define a class with the same name twice, so replacement would require loading the new definitions through a fresh class loader.
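The class-replacement problem can be seen directly in the JVM. This illustrative sketch hand-assembles the bytes of an empty class as a stand-in for a generated cover class. It then shows two things: one class loader refuses to define the same name twice, while separate loaders each produce a distinct `Class` object with the same name. The class names `PerTypeSystemLoaders` and `x.y.z.GeneratedCover` and the helper methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Illustrative only: why regenerated classes need a fresh loader per type system. */
final class PerTypeSystemLoaders {
    private PerTypeSystemLoaders() {}

    /** Loader exposing defineClass; one instance per merged type system (an assumption). */
    static final class TypeSystemClassLoader extends ClassLoader {
        TypeSystemClassLoader(ClassLoader parent) { super(parent); }
        Class<?> define(String binaryName, byte[] classBytes) {
            return defineClass(binaryName, classBytes, 0, classBytes.length);
        }
    }

    /** Stand-in for generated bytes: an empty public class extending Object. */
    static byte[] emptyClassBytes(String binaryName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                                  // minor_version
            out.writeShort(52);                                 // major_version (Java 8)
            out.writeShort(5);                                  // constant_pool_count (4 entries + 1)
            out.writeByte(7); out.writeShort(2);                // #1 CONSTANT_Class -> #2
            out.writeByte(1); out.writeUTF(binaryName.replace('.', '/')); // #2 CONSTANT_Utf8
            out.writeByte(7); out.writeShort(4);                // #3 CONSTANT_Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 CONSTANT_Utf8
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces_count
            out.writeShort(0);                                  // fields_count
            out.writeShort(0);                                  // methods_count
            out.writeShort(0);                                  // attributes_count
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }

    /** Each call uses a new loader, so the same name can be defined again. */
    static Class<?> defineInFreshLoader(String binaryName) {
        return new TypeSystemClassLoader(PerTypeSystemLoaders.class.getClassLoader())
                .define(binaryName, emptyClassBytes(binaryName));
    }

    /** Returns true if defining the same name twice in one loader is rejected. */
    static boolean redefinitionFails(String binaryName) {
        TypeSystemClassLoader loader = new TypeSystemClassLoader(PerTypeSystemLoaders.class.getClassLoader());
        byte[] b = emptyClassBytes(binaryName);
        loader.define(binaryName, b);
        try {
            loader.define(binaryName, b);
            return false;
        } catch (LinkageError expected) { // "duplicate class definition"
            return true;
        }
    }
}
```

Even with fresh loaders, objects created from the old classes are not instances of the new ones, despite the matching names. Code still holding old references therefore keeps seeing the old definitions.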

Data Model (Types and Features) adapters

...

Currently, users may customize their JCas cover classes. PEAR classpath isolation allows a use case where different customizations are present in one pipeline. The current implementation supports this by switching the set of JCas cover classes as PEAR boundaries are crossed. The idea of a Feature Structure being an instance of its cover class breaks down when multiple definitions of that cover class exist. Some ideas for fixing this.
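Here is a hypothetical sketch of switching cover classes at PEAR boundaries. Each PEAR contributes a map from UIMA type name to a factory, which is pushed on entry and popped on exit. The names `CoverClassRegistry`, `enter`, `exit` and `newCover` are illustrative. Falling back to outer scopes for types a PEAR does not customize is an assumption. The same type yields objects of different Java classes depending on which side of the boundary the code runs. That is why a Feature Structure can no longer be a single instance of "its" cover class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical: the active set of cover-class factories, scoped by PEAR boundaries. */
final class CoverClassRegistry {
    // Head of the deque = innermost (most recently entered) scope.
    private final Deque<Map<String, Supplier<Object>>> scopes = new ArrayDeque<>();

    CoverClassRegistry(Map<String, Supplier<Object>> pipelineDefaults) {
        scopes.push(new HashMap<>(pipelineDefaults));
    }

    /** Called when a PEAR boundary is entered. */
    void enter(Map<String, Supplier<Object>> pearCustomizations) {
        scopes.push(new HashMap<>(pearCustomizations));
    }

    /** Called when the PEAR boundary is exited. */
    void exit() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("not inside a PEAR");
        }
        scopes.pop();
    }

    /** Creates a cover object using the innermost scope that defines this type. */
    Object newCover(String uimaTypeName) {
        for (Map<String, Supplier<Object>> scope : scopes) { // iterates innermost first
            Supplier<Object> factory = scope.get(uimaTypeName);
            if (factory != null) {
                return factory.get();
            }
        }
        throw new IllegalArgumentException("no cover class for " + uimaTypeName);
    }
}
```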

Alternatives: generating JCas definitions from merged type systems

There are two approaches, one more dynamic and one less dynamic:

  • Have a separate step, run outside the UIMA runtime environment, which generates the full set of JCas classes (except the built-ins) from the merged type system.
    • Configure the JVM classpath to include these classes, typically at the front of the classpath.
  • Have an integrated approach, based on class loaders, that generates classes at type system merge time (or lazily). It then loads them either all at once or lazily, via a special version of UIMAClassLoader.
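For the less dynamic approach, the separate step is essentially a code generator over the merged type system. This much-simplified sketch emits source for one cover class. The class `CoverSourceGen`, storing features as plain Java fields, and the getter/setter naming are assumptions for illustration. Real JCas classes carry considerably more, such as type registration and CAS access.

```java
import java.util.Map;

/** Hypothetical, much-simplified cover-class source generator. */
final class CoverSourceGen {
    private CoverSourceGen() {}

    /**
     * @param typeName         fully qualified UIMA type name, e.g. "x.y.z.Token"
     * @param superClassName   Java class to extend (the supertype's cover class)
     * @param featureJavaTypes feature name -> Java type, in declaration order
     */
    static String generate(String typeName, String superClassName, Map<String, String> featureJavaTypes) {
        int dot = typeName.lastIndexOf('.');
        String pkg = dot < 0 ? "" : typeName.substring(0, dot);
        String simple = typeName.substring(dot + 1);

        StringBuilder sb = new StringBuilder();
        if (!pkg.isEmpty()) {
            sb.append("package ").append(pkg).append(";\n\n");
        }
        sb.append("public class ").append(simple).append(" extends ").append(superClassName).append(" {\n");
        for (Map.Entry<String, String> f : featureJavaTypes.entrySet()) {
            String name = f.getKey();
            String type = f.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            // One field plus a getter/setter pair per feature.
            sb.append("  private ").append(type).append(' ').append(name).append(";\n");
            sb.append("  public ").append(type).append(" get").append(cap)
              .append("() { return ").append(name).append("; }\n");
            sb.append("  public void set").append(cap).append('(').append(type)
              .append(" v) { this.").append(name).append(" = v; }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}
```

The generated files would then be compiled and put on the classpath ahead of any other definitions of the same classes.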

More here.

Support parallel execution of components (if they don't depend on each other)

This would require parallel implementations of many of the internal data structures (e.g., indexes), which come at a cost, so this should be configurable, or better yet, automatically managed. 

We could even consider implementing parallel-capable versions of some internal UIMA types (lists, arrays, and maps, if we add maps).
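As one example of a parallel-capable internal structure, here is a hypothetical annotation index built on `java.util.concurrent.ConcurrentSkipListSet`, which lets several components add annotations at the same time. The ordering (begin ascending, end descending, then a unique id as tie-breaker) follows the usual annotation-index convention. The class and method names are illustrative, and this is not how UIMA's indexes are implemented.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical thread-safe annotation index. */
final class ConcurrentAnnotationIndex {
    /** Minimal stand-in for an annotation. */
    static final class Ann {
        final long id;
        final int begin;
        final int end;
        Ann(long id, int begin, int end) {
            this.id = id;
            this.begin = begin;
            this.end = end;
        }
    }

    // begin ascending, end descending, id as tie-breaker so equal spans are all kept
    private static final Comparator<Ann> ORDER =
            Comparator.<Ann>comparingInt(a -> a.begin)
                    .thenComparing(Comparator.<Ann>comparingInt(a -> a.end).reversed())
                    .thenComparingLong(a -> a.id);

    private final AtomicLong nextId = new AtomicLong();
    private final NavigableSet<Ann> index = new ConcurrentSkipListSet<>(ORDER);

    /** Safe to call from several threads at once. */
    Ann add(int begin, int end) {
        Ann a = new Ann(nextId.getAndIncrement(), begin, end);
        index.add(a);
        return a;
    }

    /** Read-only sorted view; its iterators are weakly consistent under concurrent adds. */
    NavigableSet<Ann> view() {
        return Collections.unmodifiableNavigableSet(index);
    }
}
```

Concurrent structures like this typically cost more per operation than their single-threaded counterparts. That cost is the trade-off behind making the choice configurable or automatically managed.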

Consider ideas from other popular big-data frameworks: Hadoop, Spark

...

(Unlikely) Make each element of the "stream" a new CAS, as a replacement for CAS Multipliers. This seems like the wrong granularity; it may be best to let Java evolve this for a few more releases.
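To make the "stream of CASes" idea concrete: if each stream element were a CAS, a CAS Multiplier (one input CAS, zero or more output CASes) would map onto `Stream.flatMap`. In this hypothetical sketch, `Cas` is a stand-in class, not the UIMA `CAS` interface, and splitting on blank lines stands in for a real multiplier. It only illustrates the shape of the idea; the granularity concern still applies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical: a pipeline where each stream element is a (stand-in) CAS. */
final class StreamOfCas {
    /** Stand-in for a CAS; holds only document text. */
    static final class Cas {
        final String documentText;
        Cas(String documentText) { this.documentText = documentText; }
    }

    /** Plays the role of a CAS Multiplier: one CAS in, one CAS per paragraph out. */
    static Stream<Cas> splitParagraphs(Cas in) {
        return Arrays.stream(in.documentText.split("\n\n")).map(Cas::new);
    }

    static List<String> run(List<String> documents) {
        return documents.stream()
                .map(Cas::new)                         // one CAS per input document
                .flatMap(StreamOfCas::splitParagraphs) // the "multiplier" step
                .map(c -> c.documentText.trim())       // stand-in for a downstream annotator
                .collect(Collectors.toList());
    }
}
```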

New packaging support using component boundaries

Some new capabilities may benefit from specifying boundary actions.  Some possible actions:

  • If a PEAR defines an external resource for use within the PEAR, it could put the implementation classes within the PEAR classpath boundaries; see UIMA-4499 (ASF JIRA).
  • If a component defines some customization of JCas for some types, and we implement this via hooks, the hooks could be inserted/withdrawn at the boundaries. This is similar to switching JCas implementations at PEAR boundaries, but applies outside of PEARs and at a finer grain (e.g., just one component).
  • A component uses some adapter(s) for type system differences, within the component boundaries.
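Here is a hypothetical sketch of a type-system adapter confined to one component's boundary. Feature names are translated into the component's vocabulary on entry and translated back on exit, so the rest of the pipeline never sees the component's names. A `Map` stands in for a feature structure. `BoundaryAdapter` and its rename table are assumptions, not UIMA APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical adapter that is active only while one component runs. */
final class BoundaryAdapter {
    private final Map<String, String> pipelineToComponent;
    private final Map<String, String> componentToPipeline = new HashMap<>();

    BoundaryAdapter(Map<String, String> pipelineToComponent) {
        this.pipelineToComponent = new HashMap<>(pipelineToComponent);
        // Inverse table, used when leaving the boundary.
        pipelineToComponent.forEach((p, c) -> componentToPipeline.put(c, p));
    }

    /** Copies a stand-in feature structure, renaming the features listed in the table. */
    private static Map<String, Object> rename(Map<String, Object> fs, Map<String, String> names) {
        Map<String, Object> out = new HashMap<>();
        fs.forEach((k, v) -> out.put(names.getOrDefault(k, k), v));
        return out;
    }

    /** Runs the component inside the boundary with the adapter applied. */
    Map<String, Object> invoke(Map<String, Object> fs, UnaryOperator<Map<String, Object>> component) {
        Map<String, Object> inside = rename(fs, pipelineToComponent); // entering the boundary
        Map<String, Object> result = component.apply(inside);
        return rename(result, componentToPipeline);                   // leaving the boundary
    }
}
```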

...

Additional capabilities

Integration with popular component reuse systems (e.g., Maven)

...

Better support for "run-time" dynamic typing

Move toward adding support for "dynamic" typing; see the paper: http://aclweb.org/anthology/W14-5209. An interesting thought is to add this without giving up the speed and compile-time checking advantages of strong static typing. The result would be some kind of hybrid, with better performance available for fully specified static definitions.
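One way to picture the hybrid is to make statically declared features ordinary Java fields with compiled accessors and put features added at run time into a map. This is a hypothetical sketch; `HybridToken` and its methods are illustrative. Statically declared features keep their speed and compile-time checks, while dynamic ones trade both for flexibility.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical feature structure mixing static fields and run-time features. */
class HybridToken {
    // Statically declared feature: plain field, compiled accessors, no lookup.
    private int begin;
    int getBegin() { return begin; }
    void setBegin(int v) { begin = v; }

    // Features added at run time: looked up by name, values unchecked.
    private final Map<String, Object> dynamicFeatures = new HashMap<>();

    /** Generic setter; routes statically known names to their fields. */
    void set(String feature, Object value) {
        if ("begin".equals(feature)) {
            setBegin((Integer) value);
        } else {
            dynamicFeatures.put(feature, value);
        }
    }

    /** Generic getter; static features first, then the dynamic map. */
    Object get(String feature) {
        if ("begin".equals(feature)) {
            return begin;
        }
        if (!dynamicFeatures.containsKey(feature)) {
            throw new IllegalArgumentException("no such feature: " + feature);
        }
        return dynamicFeatures.get(feature);
    }
}
```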

Supporting "combining specifications" that map type systems

...