Feature Structures
Feature Structures are implemented as Java objects, one per Feature Structure, and can be garbage collected.
There is a generic Java class for these, plus (optional) specific classes for JCas style access.
APIs for creating Feature Structures, and setting / getting Feature Values in them
There are several kinds of APIs for this.
- Basic (called "Plain" in the table and sections below): this was the original API; it takes UIMA Feature and Type objects as arguments.
- JCas: this is an API that uses common Java idioms for creating, getting, and setting.
- LowLevel: this was like Basic, but substituted an int-valued address for the Java Feature Structure object and, in general, avoided creating Java objects.
- In V3, it is dangerous to create an FS using the low level API, because the resulting FS is identified only by an int: if the Java garbage collector runs before any reference to the newly created FS exists, the FS will be collected. For this reason the low level APIs are deprecated in V3.
API | Description | Create example | Get a value | Set a value |
---|---|---|---|---|
Plain | Uses UIMA API: CAS | casView.createFS(aType); casView.createXXArray(size) | fs.getIntValue(aFeature); fs.get(index) when fs is an array | fs.setFloatValue(aFeature, value); fs.set(index, value) when fs is an array |
JCas | Follows Java conventions. Types and features must be known at compile time | new MyType() | fs.getMyFeature(); fs.getMyArrayFeature(index); fs.get(index) when fs is an array | fs.setMyFeature(value); fs.setMyArrayFeature(index, value); fs.set(index, value) when fs is an array |
Low Level | In version 2 this allowed access without creating Java objects. API: LowLevelCAS | These had the same names as the Plain API, except prefixed with "ll_", e.g. casView.ll_createFS(aType). Instead of returning a Java object representing the FS, these return ints. | lowLvlCas.ll_getIntValue(addr, feat), where addr and feat are both ints | lowLvlCas.ll_setFloatValue(addr, feat, value); lowLvlCas.ll_setBooleanArrayValue(addr, index, value) |
Getting and setting Feature values in V3
The JCas style of getting / setting feature values requires that the feature names be known at compile time, so you can write getXXXX where XXXX is the known-at-compile-time name of the feature.
The Plain style does not need this information; instead, the feature's range type must be known, and calls are made like getIntValue(aFeature), where the Feature object passed in can be computed dynamically at run time.
Plain style APIs bypass any JCas getter or setter customization
The plain style APIs do not invoke the JCas style getters and setters, even if those are present and perhaps customized. This is a design decision made to follow the V2 implementation, and also for performance reasons. So, if you have customized a getter or a setter in JCas, you must use the JCas APIs to run the customizations.
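The difference can be illustrated with a small, self-contained toy model. It does not use the UIMA library: `ToyFS`, `ToyToken`, the string-keyed feature name, and the clamping rule are all hypothetical stand-ins chosen for illustration, and real plain-style calls take a Feature object rather than a string.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical stand-ins, not UIMA classes.
final class PlainVsJCasSketch {

    /** Stand-in for a generic feature structure with plain-style accessors. */
    static class ToyFS {
        private final Map<String, Integer> slots = new HashMap<>();

        // Plain-style access: reads the stored slot directly.
        int getIntValue(String feature) {
            return slots.getOrDefault(feature, 0);
        }

        // Plain-style access: writes the stored slot directly.
        void setIntValue(String feature, int value) {
            slots.put(feature, value);
        }
    }

    /** Stand-in for a JCas class whose getter was customized by hand. */
    static class ToyToken extends ToyFS {
        // Customized JCas-style getter: never returns a negative length.
        int getLength() {
            return Math.max(0, getIntValue("length"));
        }

        void setLength(int value) {
            setIntValue("length", value);
        }
    }

    public static void main(String[] args) {
        ToyToken token = new ToyToken();
        token.setLength(-5);
        System.out.println(token.getLength());           // prints 0: customization applied
        System.out.println(token.getIntValue("length")); // prints -5: customization bypassed
    }
}
```

In this sketch, code that depends on the clamping behavior has to go through `getLength()`; the plain-style call sees the raw stored value.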
xxx_Type JCas classes removed in V3
These are eliminated in v3. They served 2 purposes:
- They saved one slot per feature structure: instead of a casImpl ref and a typeImpl ref, there was just one ref to the _Type instance, which in turn held these two refs.
- They provided a place for the low level accessors; these are accessors that take the "address" (now "id") of the FS as the way to designate which FS is being used. There are 2 varieties of these low level accessors: those implemented in the CASImpl, and those implemented in the JCas xxx_Type classes. The latter have methods like "myShared_TypeInstance.setXXX(address, value)". These are instance methods on the shared xxx_Type instance, and were intended to permit access without creating the Java cover object for the FS.
The performance reason for using the low level accessors is not present in V3; in fact, these, if implemented, would be slower than the other APIs.
JCas Class sharing
Each JCas cover class is now a single class, rather than a pair (the cover class plus its xxx_Type class). These classes are either built-in or generated; built-in ones cannot be generated.
JCas classes are associated with a class loader. Except for the built-in types, which always have JCas classes, JCas classes are optional. When a UIMA type is instantiated in V3, the Java class used is the most specific JCas class found for that type or its supertypes. For example, suppose you have a type Foo with supertype Bar, which in turn is a subtype of Annotation, and no JCas classes are defined for Foo or Bar. When you create an instance of Foo (using the plain API, casView.createFS(fooType), because the JCas style new Foo() is not available without a JCas class for Foo), the instance is created using Annotation as the implementing Java class.
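The "most specific available class" rule can be sketched as a walk up the supertype chain. This is a toy model with type names as plain strings, not the UIMA implementation; the `SUPERTYPE` map, the `BUILT_IN` set, and `implementingClass` are hypothetical names, and the chain above Annotation (AnnotationBase, then TOP) is included as an assumption to terminate the walk.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: type names are plain strings and nothing here is a UIMA class.
final class MostSpecificJCasClassSketch {

    // Child type -> supertype, mirroring the Foo / Bar / Annotation example.
    static final Map<String, String> SUPERTYPE = new HashMap<>();
    static {
        SUPERTYPE.put("Foo", "Bar");
        SUPERTYPE.put("Bar", "Annotation");
        SUPERTYPE.put("Annotation", "AnnotationBase");
        SUPERTYPE.put("AnnotationBase", "TOP");
    }

    // Built-in types always have a JCas class.
    static final Set<String> BUILT_IN =
        new HashSet<>(Arrays.asList("TOP", "AnnotationBase", "Annotation"));

    /** Walks up the supertype chain until a type with a JCas class is found. */
    static String implementingClass(String type, Set<String> userJCasClasses) {
        String current = type;
        while (!BUILT_IN.contains(current) && !userJCasClasses.contains(current)) {
            current = SUPERTYPE.get(current);
            if (current == null) {
                throw new IllegalArgumentException("unknown type: " + type);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Set<String> none = new HashSet<>();
        Set<String> onlyBar = new HashSet<>(Arrays.asList("Bar"));
        System.out.println(implementingClass("Foo", none));    // prints Annotation
        System.out.println(implementingClass("Foo", onlyBar)); // prints Bar
    }
}
```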
A single set of JCas classes per class loader may be used for multiple different type systems, even simultaneously. It can also be used sequentially, for example when a sequence of CASs and their type systems is deserialized and worked on one after another. The type hierarchy defined by the JCas classes must match the UIMA type system(s) in which they are used. To run with multiple different JCas definitions, class loader isolation must be used.
When generated, they are specific to one (merged) type system, except for shared, common, built-in class definitions. To allow for multiple type systems within one JVM simultaneously, class loader isolation is used.
Connecting Instances with Type and Feature information
Information about types and features is stored in TypeImpl and FeatureImpl instances. These are unique per type system. However, multiple type system instances created using the same (merged) definition, and therefore "equal", are recognized at type system commit time, and the existing type system implementation is reused in this case.
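The reuse of "equal" type systems at commit time resembles interning. The sketch below is a toy model under simplifying assumptions: a type system is reduced to a set of type names, and `ToyTypeSystem`, `COMMITTED`, and `commit` are hypothetical names. How the real implementation compares definitions and holds committed instances is not described here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a "type system" here is just a set of type names.
final class TypeSystemCommitSketch {

    static final class ToyTypeSystem {
        final Set<String> typeNames;

        ToyTypeSystem(String... names) {
            // A sorted set makes equality independent of declaration order.
            this.typeNames = new TreeSet<>(Arrays.asList(names));
        }
    }

    // Already-committed type systems, keyed by their definition.
    private static final Map<Set<String>, ToyTypeSystem> COMMITTED = new HashMap<>();

    /** Returns the existing instance for an equal definition, else registers this one. */
    static synchronized ToyTypeSystem commit(ToyTypeSystem candidate) {
        ToyTypeSystem existing = COMMITTED.putIfAbsent(candidate.typeNames, candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        ToyTypeSystem first = commit(new ToyTypeSystem("Foo", "Bar"));
        ToyTypeSystem second = commit(new ToyTypeSystem("Bar", "Foo"));
        System.out.println(first == second); // prints true: equal definition, same instance
    }
}
```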
When creating a new instance of a UIMA Type, the JCas class for that type is loaded (if not already loaded, and if available).
Locating the corresponding UIMA Type when creating a JCas type using the "new" operator
When a JCas instance is created using the "new" operator, it locates the type using information in a JCasRegistry. The type cannot be statically kept in the JCas class definition, since one JCas class might be used by multiple different type systems. Instead, each JCas class, when it is loaded, is assigned a unique incrementing number; this number is kept with the static (one per class loader) information for TypeSystemImpl.
At instance creation time, a lookup is done, using the instance of the type system, to get the actual type associated with the registry number. This mechanism is encapsulated within the JCasRegistry.
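A minimal sketch of this lookup, as a toy model rather than the actual JCasRegistry code: `ToyRegistry`, `ToyTypeSystem`, `ToyFoo`, and the `REGISTRY_ID` field are hypothetical names, and types are represented as strings. The point it shows is that the class holds only a number, while each type system maps that number to its own type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not the UIMA JCasRegistry.
final class JCasTypeLookupSketch {

    /** Hands out one incrementing number per registered class. */
    static final class ToyRegistry {
        private static final List<Class<?>> CLASSES = new ArrayList<>();

        static synchronized int register(Class<?> jcasClass) {
            CLASSES.add(jcasClass);
            return CLASSES.size() - 1;
        }
    }

    /** Each type system keeps its own mapping from registry number to type. */
    static final class ToyTypeSystem {
        private final Map<Integer, String> typeByRegistryId = new HashMap<>();

        void bind(int registryId, String type) {
            typeByRegistryId.put(registryId, type);
        }

        String typeFor(int registryId) {
            String type = typeByRegistryId.get(registryId);
            if (type == null) {
                throw new IllegalStateException("no type bound for id " + registryId);
            }
            return type;
        }
    }

    /** Stand-in for a JCas class: it stores a number, never a type. */
    static final class ToyFoo {
        // Assigned once, when the class is loaded.
        static final int REGISTRY_ID = ToyRegistry.register(ToyFoo.class);

        final String resolvedType;

        ToyFoo(ToyTypeSystem typeSystem) {
            // The type is looked up per instance, using that instance's type system.
            this.resolvedType = typeSystem.typeFor(REGISTRY_ID);
        }
    }

    public static void main(String[] args) {
        ToyTypeSystem tsA = new ToyTypeSystem();
        ToyTypeSystem tsB = new ToyTypeSystem();
        tsA.bind(ToyFoo.REGISTRY_ID, "Foo in type system A");
        tsB.bind(ToyFoo.REGISTRY_ID, "Foo in type system B");
        System.out.println(new ToyFoo(tsA).resolvedType); // prints Foo in type system A
        System.out.println(new ToyFoo(tsB).resolvedType); // prints Foo in type system B
    }
}
```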
Locating the corresponding UIMA Feature when accessing a feature
The generated getter or setter code for a JCas feature needs to access the feature instance associated with the feature being read or set. This cannot be statically compiled into the JCas class, for the same reasons as above. A similar registry is used, and a unique incrementing integer is assigned to each feature referenced in a JCas class; these int values are kept as static final int values in the loaded class, and are similarly looked up when needed.
These are used in the trampoline methods of the generated getters and setters. A getBegin() gets converted into a getIntValue(_getFeat(_FI_begin)), where getIntValue is the plain API call and _getFeat is a method taking the registered unique identifier for this feature and looking up the FeatureImpl instance for it.
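The trampoline pattern can be sketched with a toy model. The names `_FI_begin`, `_getFeat`, `getIntValue`, and `getBegin` follow the text above; everything else (`ToyFeature`, `ToyFeatureRegistry`, `ToyTypeSystem`, the slot array, and the `typeSystem` helper) is hypothetical and stands in for the real UIMA classes. The two type systems deliberately place "begin" in different slots, to show why the feature cannot be compiled statically into the class.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not UIMA's generated code.
final class TrampolineGetterSketch {

    static final class ToyFeature {
        final String name;
        final int slot; // where this type system stores the value
        ToyFeature(String name, int slot) { this.name = name; this.slot = slot; }
    }

    /** Hands out one incrementing number per feature referenced by a class. */
    static final class ToyFeatureRegistry {
        private static int next = 0;
        static synchronized int register(String featureName) { return next++; }
    }

    static final class ToyTypeSystem {
        private final Map<Integer, ToyFeature> featureById = new HashMap<>();
        void bind(int id, ToyFeature feature) { featureById.put(id, feature); }
        ToyFeature featureFor(int id) { return featureById.get(id); }
    }

    static final class ToyAnnotation {
        // Registered once, when the class is loaded.
        static final int _FI_begin = ToyFeatureRegistry.register("begin");
        static final int _FI_end = ToyFeatureRegistry.register("end");

        private final ToyTypeSystem typeSystem;
        final int[] intSlots = new int[2];

        ToyAnnotation(ToyTypeSystem typeSystem) { this.typeSystem = typeSystem; }

        // Resolves the registered number to this type system's feature.
        ToyFeature _getFeat(int id) { return typeSystem.featureFor(id); }

        // Plain-style accessors.
        int getIntValue(ToyFeature f) { return intSlots[f.slot]; }
        void setIntValue(ToyFeature f, int v) { intSlots[f.slot] = v; }

        // Trampolines: JCas-style names that delegate to the plain accessors.
        int getBegin() { return getIntValue(_getFeat(_FI_begin)); }
        void setBegin(int v) { setIntValue(_getFeat(_FI_begin), v); }
    }

    /** Builds a toy type system with the given slot layout. */
    static ToyTypeSystem typeSystem(int beginSlot, int endSlot) {
        ToyTypeSystem ts = new ToyTypeSystem();
        ts.bind(ToyAnnotation._FI_begin, new ToyFeature("begin", beginSlot));
        ts.bind(ToyAnnotation._FI_end, new ToyFeature("end", endSlot));
        return ts;
    }

    public static void main(String[] args) {
        ToyAnnotation a = new ToyAnnotation(typeSystem(0, 1));
        ToyAnnotation b = new ToyAnnotation(typeSystem(1, 0));
        a.setBegin(7);
        b.setBegin(7);
        System.out.println(a.getBegin() + " " + b.getBegin()); // prints 7 7
    }
}
```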
Collections
UIMA v2 supports specially-named arrays of primitives (+ string), e.g. BooleanArray.
UIMA v2 supports arrays of Feature Structures, using FSArray (JCas) or ArrayFS (Generic).
For v3, support (not yet done, TBD?)
- new notation (arrays): aligned with Java: TOP[] or Annotation[] or MyType[] or short[]
- new notation (collections): aligned with Java generics: List<TOP> or ArrayList<Annotation> or HashSet<MyType>
Use Java fully qualified names as the UIMA type name.
Extend idea of "component type" to include multiple generics.
- limit (initially) generic spec to only simple type names, no support for extends, ?, etc. Use TOP for "Object".
Strings
Keep special UIMA String type for compatibility and subtyping.