
...

There are several kinds of APIs for this. One

Basic: this was the original API, and makes use of UIMA Feature and Type objects as arguments.
JCas: this is an API that uses common Java idioms for creating, getting, and setting.

Plain

- Description: uses UIMA Type and Feature instances. API: CAS
- Create example: casView.createFS(aType); casView.createXXArray(size), where XX is the type
- Get a value: fs.getIntValue(aFeature); fs.get(index) when fs is one of the built-in arrays
- Set a value: fs.setFloatValue(aFeature, value); fs.set(index, value) when fs is one of the built-in arrays

JCas

- Description: follows Java conventions. Types and features must be known at compile time.
- Create example: new MyType(); can have additional constructors
- Get a value: fs.getMyFeature(); fs.getMyArrayFeature(index) when the value of myArrayFeature is one of the built-in arrays; fs.get(index) when fs is one of the built-in arrays
- Set a value: fs.setMyFeature(value); fs.setMyArrayFeature(index, value); fs.set(index, value)

Low Level

- Description: in version 2 this allowed CAS access without making any Java objects; there was much less "checking" and it was for high-performance cases. Feature Structures were referred to by their int address in the internal heap. API: LowLevelCAS
- Create example: these had the same name as the Plain API, except prefixed with "ll_", e.g. casView.ll_createFS(aType). Instead of returning a Java object representing the FS, these return ints.
- Get a value: casView.ll_getIntValue(addr, feat), where addr and feat are both ints.
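
To make the difference between the three styles concrete, here is a toy model in plain Java. It does not use the UIMA library: ToyFeature, ToyFS, ToyToken, and ToyHeap are hypothetical stand-ins, and the heap layout is an assumption made only for illustration. It is a sketch of the calling conventions, not of the real implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a UIMA Feature object (the Plain style passes these as arguments).
final class ToyFeature {
    final String name;
    ToyFeature(String name) { this.name = name; }
}

// Hypothetical stand-in for a feature structure: values are looked up by Feature at run time.
class ToyFS {
    private final Map<ToyFeature, Object> values = new HashMap<>();
    int getIntValue(ToyFeature f) { return (Integer) values.get(f); }
    void setIntValue(ToyFeature f, int v) { values.put(f, v); }
}

// JCas style: a cover class whose getters and setters name the feature at compile time.
class ToyToken extends ToyFS {
    static final ToyFeature BEGIN = new ToyFeature("begin");
    int getBegin() { return getIntValue(BEGIN); }
    void setBegin(int v) { setIntValue(BEGIN, v); }
}

// Low-level style: no Java object per feature structure; an int address designates it.
final class ToyHeap {
    private final int[] heap = new int[64];
    private int next = 1; // address 0 is reserved to mean "no feature structure"

    int ll_createFS(int featureCount) {
        int addr = next;
        next += featureCount; // reserve one slot per feature
        return addr;
    }
    int ll_getIntValue(int addr, int feat) { return heap[addr + feat]; }
    void ll_setIntValue(int addr, int feat, int v) { heap[addr + feat] = v; }
}

class ApiStylesDemo {
    public static void main(String[] args) {
        ToyToken tok = new ToyToken();
        tok.setIntValue(ToyToken.BEGIN, 3);   // Plain style: Feature object as argument
        tok.setBegin(tok.getBegin() + 1);     // JCas style: named getter and setter
        System.out.println(tok.getIntValue(ToyToken.BEGIN));

        ToyHeap heap = new ToyHeap();
        int addr = heap.ll_createFS(2);       // returns an int, not a Java object
        heap.ll_setIntValue(addr, 0, 7);
        System.out.println(heap.ll_getIntValue(addr, 0));
    }
}
```

Both the Plain and the JCas calls end up at the same stored value; the JCas style only moves the choice of feature from a run-time argument into the method name.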

xxx_Type JCas classes removed in V3

These are eliminated in v3. They served two purposes:

- They saved one slot per feature structure: instead of a casImpl ref and a typeImpl ref, there was just one ref to the _Type instance, which in turn held these two refs.
- They provided a place for the low level accessors; these are accessors that take the "address" (now "id") of the FS as the way to designate which FS is being used. There are two varieties of these low level accessors: those implemented in the CASImpl, and those implemented in the JCas _Type classes. The latter have methods like "myFeatureStructure.setXXX(address, value)". Clearly, since these are instance methods on some FeatureStructure, there already exists a "handle" to that FS, making the use of the address superfluous.

It's unlikely anyone is using the low level JCas-style accessors and they are no longer part of the API in V3.

JCas Class generation

JCas cover classes now come in single classes, rather than in pairs. These classes are either built-in or generated; built-in ones cannot be generated. ... has a ref to the CAS view. A single class definition might be used for multiple type systems; a single definition is used for all the built-in types. Each JCas class extends

...

Since the generated classes have static fields which ref from the _Type to the main class, generate the main class first, then the _Type one. Avoid circular references.

Connecting Instances with MetaInformation

Meta information about types and features is stored in

...

Instances of a JCas type may be created via the "new" operator, passing in the JCas.

Locating or instantiating the corresponding _Type instance

When a JCas instance is created, it needs to reference a corresponding _Type instance; these are "per CAS View". A table is kept, by view, of already instantiated _Type instances, key = JCas type class (identity key). If not present, a new instance is generated from the corresponding (generated or provided) _Type class. It should always be available (have been generated or set up) by the time it's needed.
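
The lookup described above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the UIMA implementation: ToyView, ToyAnnotation, ToyAnnotation_Type, and typeInstanceFor are hypothetical names, and the creator is passed in as a Supplier only to keep the example self-contained.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for one CAS view, holding its own table of _Type instances.
final class ToyView {
    // Key = JCas type class, compared by identity; value = this view's _Type instance.
    private final Map<Class<?>, Object> typeInstances = new IdentityHashMap<>();

    // Return the already instantiated _Type instance, or create and remember a new one.
    @SuppressWarnings("unchecked")
    <T> T typeInstanceFor(Class<?> jcasClass, Supplier<T> creator) {
        Object existing = typeInstances.get(jcasClass);
        if (existing == null) {
            existing = creator.get();
            typeInstances.put(jcasClass, existing);
        }
        return (T) existing;
    }
}

// Hypothetical JCas class and its companion _Type class.
class ToyAnnotation {}
class ToyAnnotation_Type {}

class TypeTableDemo {
    public static void main(String[] args) {
        ToyView view = new ToyView();
        ToyAnnotation_Type first = view.typeInstanceFor(ToyAnnotation.class, ToyAnnotation_Type::new);
        ToyAnnotation_Type second = view.typeInstanceFor(ToyAnnotation.class, ToyAnnotation_Type::new);
        System.out.println(first == second); // same view, same class: the instance is reused
    }
}
```

Because the table lives in the view, two views asking for the same JCas class each get their own _Type instance.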

Using the JCasRegistry

In v2, this was a map from ints to loaded JCas cover classes.

...

Go from typecode via typesystem to typeimpl to generator (creator).
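
The chain from type code to generator might look like the following sketch. It is plain Java with hypothetical names (SketchTypeSystem, SketchTypeImpl, addType, typeForCode, create), and it assumes that a type code is simply the index of the type's metadata; none of this is taken from the UIMA API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical type metadata: holds the generator (creator) for instances of the type.
final class SketchTypeImpl {
    final String name;
    final Supplier<Object> generator;
    SketchTypeImpl(String name, Supplier<Object> generator) {
        this.name = name;
        this.generator = generator;
    }
}

// Hypothetical type system: the type code is the index of the type's metadata.
final class SketchTypeSystem {
    private final List<SketchTypeImpl> types = new ArrayList<>();

    // Register a type and return the type code assigned to it.
    int addType(String name, Supplier<Object> generator) {
        types.add(new SketchTypeImpl(name, generator));
        return types.size() - 1;
    }

    // typecode -> typeimpl
    SketchTypeImpl typeForCode(int typeCode) { return types.get(typeCode); }

    // typecode -> typeimpl -> generator -> new instance
    Object create(int typeCode) { return typeForCode(typeCode).generator.get(); }
}

class GeneratorLookupDemo {
    public static void main(String[] args) {
        SketchTypeSystem ts = new SketchTypeSystem();
        int code = ts.addType("my.Token", StringBuilder::new); // StringBuilder stands in for a JCas class
        System.out.println(ts.typeForCode(code).name + " -> " + ts.create(code).getClass().getSimpleName());
    }
}
```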

Getters, Setters, Constructors, indirection

For JCas style, the getters, setters, and constructors are "direct": the user's code says things like

...

The values would be extracted and inserted into the corresponding TypeImpl or FeatureImpl structure. These would be invoked using the Functional Interface's method. For example, if that method were get(), then the method would be invoked as myTypeImpl._accessors[featCode].get();
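
As a rough illustration of this indirection, the sketch below stores one accessor per feature code in an array on a type object and invokes it through the functional interface's method. The names IntGetter, ToyPoint, and ToyTypeImpl are hypothetical, and unlike the no-argument get() in the text, the accessor here takes the feature structure as a parameter; that signature is an assumption made so the example is self-contained.

```java
// Hypothetical functional interface for an int-valued getter.
@FunctionalInterface
interface IntGetter {
    int get(ToyPoint fs);
}

// Hypothetical JCas-style class with two int features.
class ToyPoint {
    private final int x;
    private final int y;
    ToyPoint(int x, int y) { this.x = x; this.y = y; }
    int getX() { return x; }
    int getY() { return y; }
}

// Hypothetical type metadata: accessors are indexed by feature code.
final class ToyTypeImpl {
    final IntGetter[] _accessors;
    ToyTypeImpl(IntGetter... accessors) { this._accessors = accessors; }
}

class AccessorDemo {
    public static void main(String[] args) {
        // Feature code 0 -> getX, feature code 1 -> getY.
        ToyTypeImpl myTypeImpl = new ToyTypeImpl(ToyPoint::getX, ToyPoint::getY);
        ToyPoint p = new ToyPoint(3, 4);
        int featCode = 1;
        System.out.println(myTypeImpl._accessors[featCode].get(p));
    }
}
```

Generic code that only knows a feature code can then read a value without naming the getter, while JCas-style user code keeps calling the getter directly.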

Issue with supporting multiple different type systems, serially.

This has one serious issue, illustrated by the use case:

1. make a pipeline,
2. deserialize some CAS's type system, and then deserialize that CAS
3. do some generic processing on that CAS
4. repeat 2 and 3 in a loop, with different type systems each time.

The key points that cause a problem are

having a UIMA pipeline that is being reused for multiple deserialized CASes, each of which might have a different type system
- Note: this may not seem possible, because all UIMA pipelines have the superclasses AnalysisEngineImplBase -> ConfigurableResource_ImplBase -> Resource_ImplBase,
  and Resource_ImplBase has a reference to a CasDefinition used for creating a CAS that matches the merged type system of the pipeline.
  However, deserialization may supply a different type system (e.g., having extra features for some types) and create a CAS having the definition that is read in as part of the deserialization process.
  - User code might merge the deserialized type system with the definition from the pipeline.
  - Some deserializations include the concept of setting aside or ignoring types and features used in the CAS being deserialized, but not defined in the receiving CAS (which is typically the one set up from the pipeline's merged type system).
The problem arises if the pipeline code has some JCas-like reference to some type / feature which is
- not built-in
- but present in all the (varied) Type Systems being deserialized.
The pipeline code might have, for instance, an assignment
    MyFooType mft = .... // some code fragment yielding an instance of MyFooType
    mft.setMyFeature(333); // sets a [named] feature in MyFooType
When the merged type system is constructed, a "generate" step generates a JCas cover class definition which includes a class MyFooType, and in that class, a "setter" method "setMyFeature(...)", and loads this. When the pipeline is run, the code in the pipeline will be "linked" to the loaded class's setter method.
The difficulty arises when the next CAS with a different definition of MyFooType (say, with extra features) is deserialized. If the deserialization approach is to ignore extra features not in the merged type system from the main pipeline, then there is no problem. But if user code, for example, merges deserialized type systems with the UIMA pipeline's type system, this new definition needs to replace the old one. The old one, however, is now "linked" with the pipeline code "mft.setMyFeature(333);" above, and can't be replaced (to my knowledge) without also unloading the pipeline code and reloading it. (That's one potential, but significantly inefficient, "solution"; another is disallowing changing type systems in this scenario.)

Proposed solution

See the section on class loaders. Have different class loaders for the new TypeSystem and the user application and annotator code.

Collections

UIMA v2 supports specially-named arrays of primitives (+ string), e.g. BooleanArray.

...

Limit (initially) the generic spec to simple type names only, with no support for extends, ?, etc. Use TOP for "Object".

Strings

Keep special UIMA String type for compatibility and subtyping.

Feature Structure APIs

JCas style - where the name of the Type and Feature are known, and present in the code.

...
