...

There are several kinds of APIs for this.  One 

 

  • Basic: the original API, which takes UIMA Feature and Type objects as arguments.
  • JCas: an API that uses common Java idioms for creating, getting, and setting.
API comparison (description, create example, get a value, set a value):

Plain (API: CAS)
  • Description: uses UIMA Type and Feature instances.
  • Create example: casView.createFS(aType); casView.createXXArray(size), where XX is the type.
  • Get a value: fs.getIntValue(aFeature); fs.get(index) when fs is one of the built-in arrays.
  • Set a value: fs.setFloatValue(aFeature, value); fs.set(index, value) when fs is one of the built-in arrays.

JCas
  • Description: follows Java conventions; types and features must be known at compile time.
  • Create example: new MyType(); can have additional constructors.
  • Get a value: fs.getMyFeature(); fs.getMyArrayFeature(index) when the value of myArrayFeature is one of the built-in arrays; fs.get(index) when fs is one of the built-in arrays.
  • Set a value: fs.setMyFeature(value); fs.setMyArrayFeature(index, value); fs.set(index, value).

Low Level (API: LowLevelCAS)
  • Description: in version 2 this allowed CAS access without making any Java objects; there was much less "checking" and it was for high-performance cases. Feature Structures were referred to by their int address in the internal heap.
  • Create, get, and set: these had the same names as the Plain API, except prefixed with "ll_", e.g. casView.ll_createFS(aType). Instead of returning a Java object representing the FS, these return ints.
  • Get a value: casView.ll_getIntValue(addr, feat), where addr and feat are both ints.
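
The three calling styles compared above can be contrasted with a small, self-contained sketch. Everything in it (SketchFeature, SketchFS, SketchMyType, SketchLowLevel, and the slotCount argument) is a hypothetical stand-in written for this illustration, not the UIMA classes; only the method naming patterns are taken from the comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the UIMA classes.
final class SketchFeature {
    final String name;
    SketchFeature(String name) { this.name = name; }
}

// Plain style: the Feature object is passed as an argument to generic accessors.
class SketchFS {
    private final Map<SketchFeature, Integer> intValues = new HashMap<>();

    int getIntValue(SketchFeature feat) { return intValues.getOrDefault(feat, 0); }
    void setIntValue(SketchFeature feat, int value) { intValues.put(feat, value); }
}

// JCas style: one class per type, with accessors named after the feature.
class SketchMyType extends SketchFS {
    static final SketchFeature MY_FEATURE = new SketchFeature("myFeature");

    int getMyFeature() { return getIntValue(MY_FEATURE); }
    void setMyFeature(int value) { setIntValue(MY_FEATURE, value); }
}

// Low-level style: feature structures and features are both designated by ints.
class SketchLowLevel {
    private final List<int[]> heap = new ArrayList<>();

    // Assumption: takes a slot count instead of a type, to keep the sketch small.
    int ll_createFS(int slotCount) { heap.add(new int[slotCount]); return heap.size() - 1; }
    int ll_getIntValue(int addr, int feat) { return heap.get(addr)[feat]; }
    void ll_setIntValue(int addr, int feat, int value) { heap.get(addr)[feat] = value; }
}

class ApiStylesDemo {
    static int[] run() {
        SketchMyType fs = new SketchMyType();

        fs.setIntValue(SketchMyType.MY_FEATURE, 7);        // plain style
        int plain = fs.getIntValue(SketchMyType.MY_FEATURE);

        fs.setMyFeature(8);                                 // JCas style
        int jcas = fs.getMyFeature();

        SketchLowLevel ll = new SketchLowLevel();           // low-level style
        int addr = ll.ll_createFS(1);
        ll.ll_setIntValue(addr, 0, 9);
        int low = ll.ll_getIntValue(addr, 0);

        return new int[] { plain, jcas, low };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

In this sketch the JCas-style accessors simply delegate to the plain ones, which is one possible arrangement and not necessarily how the real cover classes are built.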
 
  • xxx_Type JCas classes removed in V3

These are eliminated in v3.  They served 2 purposes:

  • saving one slot per feature structure: instead of a casImpl ref and a typeImpl ref, there was just one ref to the _Type instance, which in turn held these two refs
  • providing a place for the low level accessors; these are accessors that take the "address" (now "id") of the FS as the way to designate which FS is being used.  There are 2 varieties of these low level accessors: those implemented in the CASImpl, and those implemented in the JCas _Type classes.  The latter have methods like "myFeatureStructure.setXXX(address, value)".  Since these are instance methods on some FeatureStructure, a "handle" to that FS already exists, which makes the address superfluous.

It's unlikely anyone is using the low level JCas-style accessors and they are no longer part of the API in V3.

  • JCas Class generation

JCas cover classes now come as single classes, rather than in pairs.  These classes are either built-in or generated; built-in ones cannot be generated.  ... has a ref to the CAS view.  A single class definition might be used for multiple type systems; a single definition is used for all the built-in types.  Each JCas class extends

...

Since the generated classes have static fields that refer from the _Type class to the main class, generate the main class first, then the _Type one.  Avoid circular references.

  • Connecting Instances with MetaInformation

Meta information about types and features is stored in

...

Instances of a JCas type may be created via the "new" operator, passing in the JCas.  

  • Locating or instantiating the corresponding _Type instance

When a JCas instance is created, it needs to reference a corresponding _Type instance; these are "per CAS View".  A table of already instantiated _Type instances is kept per view, keyed by the JCas type class (identity key).  If no entry is present, a new instance is created from the corresponding (generated or provided) _Type class.  The _Type class should always be available (have been generated or set up) by the time it's needed.
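
A minimal sketch of such a per-view lookup table follows. The names SketchView, SketchTypeInfo, and SketchToken are hypothetical stand-ins for a CAS view, a _Type instance, and a JCas cover class; the use of IdentityHashMap is an assumption about how an "identity key" might be realized, not a statement about the actual implementation.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a JCas cover class.
class SketchToken { }

// Hypothetical stand-in for a _Type instance; remembers which class it serves.
class SketchTypeInfo {
    final Class<?> jcasClass;
    SketchTypeInfo(Class<?> jcasClass) { this.jcasClass = jcasClass; }
}

// Hypothetical stand-in for one CAS view, which owns its own table.
class SketchView {
    // Keys are compared by reference (==), i.e. by class identity.
    private final Map<Class<?>, SketchTypeInfo> table = new IdentityHashMap<>();

    // Returns the instance already registered for jcasClass in this view,
    // creating and registering it on first use.
    SketchTypeInfo typeInfoFor(Class<?> jcasClass) {
        return table.computeIfAbsent(jcasClass, SketchTypeInfo::new);
    }
}

class TypeTableDemo {
    public static void main(String[] args) {
        SketchView view1 = new SketchView();
        SketchView view2 = new SketchView();
        // Same view and class: the same instance is reused.
        System.out.println(view1.typeInfoFor(SketchToken.class) == view1.typeInfoFor(SketchToken.class));
        // Different views: each has its own instance.
        System.out.println(view1.typeInfoFor(SketchToken.class) == view2.typeInfoFor(SketchToken.class));
    }
}
```

Keying by identity avoids calling equals/hashCode on the class objects and keeps classes with the same name, loaded by different class loaders, distinct.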

  • Using the JCasRegistry

In v2, this was a map from ints to loaded JCas cover classes.

...

Go from typecode via typesystem to typeimpl to generator (creator).
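
The lookup chain in the previous sentence can be sketched as follows. GenTypeSystem, GenTypeImpl, addType, getTypeForCode, and createInstance are hypothetical names chosen for this illustration, and modelling the generator as a java.util.function.Supplier is an assumption; the source only states the order of the hops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a TypeImpl: holds the generator (creator) for its type.
class GenTypeImpl {
    final int code;
    final String name;
    final Supplier<Object> generator;

    GenTypeImpl(int code, String name, Supplier<Object> generator) {
        this.code = code;
        this.name = name;
        this.generator = generator;
    }
}

// Hypothetical stand-in for a type system: maps int type codes to TypeImpls.
class GenTypeSystem {
    private final List<GenTypeImpl> typesByCode = new ArrayList<>();

    // Registers a type and returns the type code assigned to it.
    int addType(String name, Supplier<Object> generator) {
        int code = typesByCode.size();
        typesByCode.add(new GenTypeImpl(code, name, generator));
        return code;
    }

    GenTypeImpl getTypeForCode(int typeCode) {
        return typesByCode.get(typeCode);
    }

    // typecode -> type system -> TypeImpl -> generator -> new instance
    Object createInstance(int typeCode) {
        return getTypeForCode(typeCode).generator.get();
    }
}

class GeneratorDemo {
    public static void main(String[] args) {
        GenTypeSystem ts = new GenTypeSystem();
        // StringBuilder stands in for a JCas cover class here.
        int code = ts.addType("my.FooType", StringBuilder::new);
        Object instance = ts.createInstance(code);
        System.out.println(instance.getClass().getName());
    }
}
```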

  • Getters, Setters, Constructors, indirection

For JCas style, the getters, setters, and constructors are "direct": the user's code says things like

...

The values would be extracted and inserted into the corresponding TypeImpl or FeatureImpl structure.  These would be invoked using the Functional Interface's method.  For example, if that method were get(), then the method would be invoked as myTypeImpl._accessors[featCode].get();
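
As a rough sketch of that indirection, the following stores one functional-interface instance per feature code in an _accessors array. AccFS, AccIntGetter, AccTypeImpl, and the BEGIN/END codes are hypothetical; in particular, passing the feature structure to get(...) is an assumption made so the example is complete, since the call shown above takes no argument.

```java
// Hypothetical stand-in for a feature structure with two int features.
class AccFS {
    int begin;
    int end;
}

// Hypothetical functional interface for an int getter.
// Assumption: the feature structure is passed in as an argument.
@FunctionalInterface
interface AccIntGetter {
    int get(AccFS fs);
}

// Hypothetical stand-in for a TypeImpl holding one accessor per feature code.
class AccTypeImpl {
    final AccIntGetter[] _accessors;

    AccTypeImpl(AccIntGetter... accessors) {
        this._accessors = accessors;
    }
}

class AccessorDemo {
    static final int BEGIN = 0; // feature code for "begin"
    static final int END = 1;   // feature code for "end"

    // Accessors are listed in feature-code order.
    static AccTypeImpl buildType() {
        return new AccTypeImpl(fs -> fs.begin, fs -> fs.end);
    }

    // Indirect read: look up the accessor by feature code, then invoke it.
    static int read(AccTypeImpl type, int featCode, AccFS fs) {
        return type._accessors[featCode].get(fs);
    }

    public static void main(String[] args) {
        AccFS fs = new AccFS();
        fs.begin = 3;
        fs.end = 9;
        System.out.println(read(buildType(), END, fs));
    }
}
```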

  • Issue with supporting multiple different type systems, serially.

This has one serious issue, illustrated by the use case: 

  1. make a pipeline
  2. deserialize some CAS's type system, and then deserialize that CAS
  3. do some generic processing on that CAS
  4. repeat 2 and 3 in a loop, with different type systems each time

The key points that cause a problem are 

  • having a UIMA pipeline that is being reused for multiple deserialized CASes, each of which might have a different type system
    • Note: this may not seem possible, because all UIMA pipelines have superclasses: AnalysisEngineImplBase -> ConfigurableResource_ImplBase -> Resource_ImplBase,
      and Resource_ImplBase has a reference to a CasDefinition used for creating a CAS that matches the merged type system of the pipeline.  
      Deserialization, however, may supply a different type system (e.g., having extra features for some types) and create a CAS having the definition that is read in as part of the deserialization process.
      • User code might merge the deserialized type system with the definition from the pipeline.
      • Some deserializations include the concept of setting aside or ignoring types and features used in the CAS being deserialized, but not defined in the receiving CAS (which is typically the one set up from the pipeline's merged type system).
  • The problem arises if the pipeline code has some JCas-like reference to some type / feature which is 
    • not built-in
    • but present in all the (varied) Type Systems being deserialized.   

    The pipeline code might have, for instance, an assignment 

    MyFooType mft = ....  // some code fragment yielding an instance of MyFooType
    mft.setMyFeature(333);  // sets a [named] feature in MyFooType

    When the merged type system is constructed, a "generate" step generates a JCas cover class definition which includes a class MyFooType, and in that class, a "setter" method "setMyFeature(...)", and loads this. When the pipeline is run, the code in the pipeline will be "linked" to the loaded class's setter method. 

    The difficulty arises when the next CAS, with a different definition of MyFooType (say, with extra features), is deserialized.  If the deserialization approach is to ignore extra features not in the merged type system from the main pipeline, then there is no problem.  But if user code, for example, merges deserialized type systems with the UIMA pipeline, this new definition needs to replace the old one.  The old one, however, is now "linked" with the pipeline code "mft.setMyFeature(333);" above, and can't be replaced (to my knowledge) without also unloading the pipeline code and reloading it.  (That's one potential, but significantly inefficient, "solution"; another is disallowing changing type systems in this scenario.)

  • Proposed solution

See the section on class loaders.  Have different class loaders for the new TypeSystem and the user application and annotator code.

  • Collections

UIMA v2 supports specially-named arrays of primitives (+ string), e.g. BooleanArray. 

...

  • limit (initially) generic spec to only simple type names, no support for extends, ?, etc.  Use TOP for "Object".
  • Strings

Keep special UIMA String type for compatibility and subtyping.

  • Feature Structure APIs

JCas style - where the name of the Type and Feature are known, and present in the code.

...