...
There are several kinds of APIs for this. One
- Basic: this was the original API, and makes use of UIMA Feature and Type objects as arguments.
- JCas: this is an API that uses common Java idioms for creating, getting, and setting.
API | Description | Create example | Get a value | Set a value |
---|---|---|---|---|
Plain | Uses UIMA API: CAS | casView.createFS(aType), casView.createXXArray(size) | fs.getIntValue(aFeature); fs.get(index) when fs is an array | fs.setFloatValue(aFeature, value); fs.set(index, value) when fs is an array |
JCas | Follows Java conventions; Types and features must be known at compile time | new MyType() | fs.getMyFeature(); fs.getMyArrayFeature(index); fs.get(index) when fs is an array | fs.setMyFeature(value); fs.setMyArrayFeature(index, value); fs.set(index, value) |
Low Level | In version 2 this allowed API: LowLevelCAS | These had the same names as the Plain API, except prefixed with "ll_", e.g. casView.ll_createFS(aType). Instead of returning a Java object representing the FS, these return ints. | casView.ll_getIntValue(addr, feat), where addr and feat are both ints. | |
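
To make the Low Level row more concrete, here is a small, hypothetical model of an int-addressed store. It is not UIMA code: the class name TinyLowLevelCas, the heap layout, and the use of a feature offset as the "feat" argument are all assumptions made for illustration. Only the "ll_" naming and the idea that feature structures are designated by ints come from the table.

```java
import java.util.Arrays;

/** Hypothetical miniature of an int-addressed FS store; not the UIMA implementation. */
final class TinyLowLevelCas {
    private int[] heap = new int[16];
    private int next = 1;                 // address 0 is reserved to mean "no FS"
    private final int[] featuresPerType;  // number of int features, indexed by type code

    TinyLowLevelCas(int[] featuresPerType) {
        this.featuresPerType = featuresPerType.clone();
    }

    /** Allocates an FS of the given type and returns its int address instead of an object. */
    int ll_createFS(int typeCode) {
        int size = 1 + featuresPerType[typeCode];   // first slot stores the type code
        if (next + size > heap.length) {
            heap = Arrays.copyOf(heap, Math.max(heap.length * 2, next + size));
        }
        int addr = next;
        heap[addr] = typeCode;
        next += size;
        return addr;
    }

    /** Reads an int feature; both the FS and the feature are designated by ints. */
    int ll_getIntValue(int addr, int feat) {
        return heap[slot(addr, feat)];
    }

    void ll_setIntValue(int addr, int feat, int value) {
        heap[slot(addr, feat)] = value;
    }

    // Maps (address, feature offset) to a heap index.
    // Assumes addr is the start address returned by ll_createFS; this is not verified.
    private int slot(int addr, int feat) {
        if (addr <= 0 || addr >= next) {
            throw new IllegalArgumentException("bad address " + addr);
        }
        int typeCode = heap[addr];
        if (feat < 0 || feat >= featuresPerType[typeCode]) {
            throw new IllegalArgumentException("bad feature " + feat);
        }
        return addr + 1 + feat;
    }
}
```

In this model the caller never holds a Java object for the FS, which is the property the Low Level row describes; the Plain and JCas rows instead hand back an object that carries its own accessors.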
xxx_Type JCas classes removed in V3
These are eliminated in v3. They served 2 purposes:
- save one slot per feature structure: instead of a casImpl ref and a typeImpl ref, there was just one ref to the _Type instance, which in turn held these two refs
- provide a place for the low level accessors; these are accessors that take the "address" (now "id") of the FS as the way to designate which FS is being used. There are 2 varieties of these low level accessors: those implemented in the CASImpl, and those implemented in the JCas _Type classes. The latter have methods like "myFeatureStructure.setXXX(address, value)". Clearly, since these are instance methods on some FeatureStructure, there already exists a "handle" to that FS, making the use of the address superfluous.
It's unlikely anyone is using the low level JCas-style accessors and they are no longer part of the API in V3.
JCas Class generation
JCas cover classes now come in single classes, rather than in pairs. These classes are either built-in or are generated; built-in ones cannot be generated. ... has a ref to the CAS view. A single class definition might be used for multiple type systems; a single definition is used for all the built-in types. Each JCas class extends
...
Since the generated classes have static fields which ref from the _Type to the main class, generate the main class first, then the _Type one. Avoid circular references.
Connecting Instances with MetaInformation
Meta information about types and features is stored in
...
Instances of a JCas type may be created via the "new" operator, passing in the JCas.
Locating or instantiating the corresponding _Type instance
When a JCas instance is created, it needs to reference a corresponding _Type instance; these are "per CAS View". A table is kept, by view, of already instantiated _Type instances, key = JCas type class (identity key). If not present, a new instance is generated from the corresponding (generated or provided) _Type class. It should always be available (have been generated or set up) by the time it's needed.
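
The per-view lookup described above can be sketched as follows. This is a minimal illustration, not the UIMA implementation: ViewTypeCache, TypeInfo, and MyType are hypothetical names, and TypeInfo merely stands in for a _Type instance. The one point taken from the text is that the table is kept per view and keyed by the JCas class using identity comparison.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for a JCas cover class. */
final class MyType { }

/** Hypothetical stand-in for a per-view _Type instance. */
final class TypeInfo {
    final Class<?> jcasClass;
    final String viewName;

    TypeInfo(Class<?> jcasClass, String viewName) {
        this.jcasClass = jcasClass;
        this.viewName = viewName;
    }
}

/** One cache per CAS view; keys are compared with ==, i.e. by class identity. */
final class ViewTypeCache {
    private final String viewName;
    private final Map<Class<?>, TypeInfo> byClass = new IdentityHashMap<>();

    ViewTypeCache(String viewName) {
        this.viewName = viewName;
    }

    /** Returns the already instantiated entry, or creates and remembers a new one. */
    TypeInfo typeInfoFor(Class<?> jcasClass) {
        return byClass.computeIfAbsent(jcasClass, c -> new TypeInfo(c, viewName));
    }

    int size() {
        return byClass.size();
    }
}
```

Asking the same view twice for MyType.class yields the same TypeInfo object, while a second view gets its own instance.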
Using the JCasRegistry
In v2, this was a map from ints to loaded JCas cover classes.
...
Go from typecode via typesystem to typeimpl to generator (creator).
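
As a rough picture of that chain (type code, then type system, then type impl, then generator), consider the sketch below. All names in it (Fs, TokenFs, TypeImplSketch, TypeSystemSketch, addType, create) are hypothetical and chosen for the example; the actual classes and signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-in for a feature structure instance. */
class Fs {
    final int id;
    final String typeName;

    Fs(int id, String typeName) {
        this.id = id;
        this.typeName = typeName;
    }
}

/** Hypothetical stand-in for a JCas cover class of a user-defined type. */
class TokenFs extends Fs {
    TokenFs(int id) {
        super(id, "Token");
    }
}

/** Per-type metadata that also holds the generator (creator) for instances. */
final class TypeImplSketch {
    final int code;
    final String name;
    final IntFunction<Fs> generator;

    TypeImplSketch(int code, String name, IntFunction<Fs> generator) {
        this.code = code;
        this.name = name;
        this.generator = generator;
    }
}

final class TypeSystemSketch {
    private final List<TypeImplSketch> typesByCode = new ArrayList<>();

    /** Registers a type and returns its type code (its index in the list). */
    int addType(String name, IntFunction<Fs> generator) {
        int code = typesByCode.size();
        typesByCode.add(new TypeImplSketch(code, name, generator));
        return code;
    }

    /** type code -> type impl -> generator -> new instance. */
    Fs create(int typeCode, int id) {
        return typesByCode.get(typeCode).generator.apply(id);
    }
}
```

A caller that knows only the int type code can then obtain an instance of the right Java class, for example `ts.create(tokenCode, 7)` after `int tokenCode = ts.addType("Token", TokenFs::new)`.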
Getters, Setters, Constructors, indirection
For JCas style, the getters, setters, and constructors are "direct": the user's code says things like
...
The values would be extracted and inserted into the corresponding TypeImpl or FeatureImpl structure. These would be invoked using the Functional Interface's method. For example, if that method were get(), then the method would be invoked as myTypeImpl._accessors[featCode].get();
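
A minimal sketch of this indirection, under stated assumptions: IntGetter, IntSetter, AccessorTable, and the feature code MY_FEATURE are hypothetical names, and the sketch passes the instance as an argument to get(...) and set(...), whereas the text above only shows a bare get(). MyFooType here is a plain Java stand-in, not a generated JCas class.

```java
/** Hypothetical functional interfaces for int-valued feature access. */
@FunctionalInterface
interface IntGetter {
    int get(Object fs);
}

@FunctionalInterface
interface IntSetter {
    void set(Object fs, int value);
}

/** Plain stand-in for a JCas cover class with one int feature. */
final class MyFooType {
    static final int MY_FEATURE = 0;   // hypothetical feature code

    private int myFeature;

    int getMyFeature() {
        return myFeature;
    }

    void setMyFeature(int value) {
        myFeature = value;
    }
}

/** Stand-in for the part of a TypeImpl that holds accessors indexed by feature code. */
final class AccessorTable {
    final IntGetter[] getters;
    final IntSetter[] setters;

    private AccessorTable(IntGetter[] getters, IntSetter[] setters) {
        this.getters = getters;
        this.setters = setters;
    }

    /** Builds the table for MyFooType; the lambdas delegate to the direct accessors. */
    static AccessorTable forMyFooType() {
        IntGetter[] g = { fs -> ((MyFooType) fs).getMyFeature() };
        IntSetter[] s = { (fs, v) -> ((MyFooType) fs).setMyFeature(v) };
        return new AccessorTable(g, s);
    }
}
```

Generic code that only has a feature code can then write `table.setters[featCode].set(fs, 333)` without naming setMyFeature at compile time.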
Issue with supporting multiple different type systems, serially.
This has one serious issue, illustrated by the use case:
1. make a pipeline,
2. deserialize some CAS's type system, and then deserialize that CAS
3. do some generic processing on that CAS
4. repeat 2 and 3 in a loop, with different type systems each time.
The key points that cause a problem are
- having a UIMA pipeline that is being reused for multiple deserialized CASes, each of which might have a different type system
- Note: this may not seem possible, because all UIMA pipelines have superclasses: AnalysisEngineImplBase -> ConfigurableResource_ImplBase -> Resource_ImplBase, and Resource_ImplBase has a reference to a CasDefinition used for creating a CAS that matches the merged type system of the pipeline. However, deserialization may supply a different type system (e.g., having extra features for some types) and create a CAS having the definition that is read in as part of the deserialization process.
- User code might merge the deserialized type system with the definition from the pipeline.
- Some deserializations include the concept of setting aside or ignoring types and features used in the CAS being deserialized, but not defined in the receiving CAS (which is typically the one set up from the pipeline's merged type system).
- The problem arises if the pipeline code has some JCas-like reference to some type / feature which is
- not built-in
- but present in all the (varied) Type Systems being deserialized.
The pipeline code might have, for instance, an assignment
    MyFooType mft = ....     // some code fragment yielding an instance of MyFooType
    mft.setMyFeature(333);   // sets a [named] feature in MyFooType
When the merged type system is constructed, a "generate" step generates a JCas cover class definition which includes a class MyFooType, and in that class, a "setter" method "setMyFeature(...)", and loads this. When the pipeline is run, the code in the pipeline will be "linked" to the loaded class's setter method.
The difficulty arises when the next CAS with a different definition of MyFooType (say, with extra features) is deserialized. If the deserialization approach is to ignore extra features not in the merged type system from the main pipeline, then there is no problem. But if user code, for example, merges deserialized type systems with the UIMA pipeline's, this new definition needs to replace the old one. The old one, however, is now "linked" with the pipeline code "mft.setMyFeature(333);" above, and can't be replaced (to my knowledge) without also unloading the pipeline code and reloading it. (That's one potential, but significantly inefficient, "solution"; another is disallowing changing type systems in this scenario.)
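
The "ignore extra features" option, which avoids the relinking problem, can be pictured with the following sketch. It is only an illustration under assumptions: LenientFeatureFilter and its Result type are hypothetical, features are modeled as plain name-to-value maps, and this is not how any UIMA deserializer is actually written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of lenient deserialization for a single type. */
final class LenientFeatureFilter {

    /** Values kept for known features, plus the names that were set aside. */
    static final class Result {
        final Map<String, Object> kept = new LinkedHashMap<>();
        final List<String> ignored = new ArrayList<>();
    }

    /**
     * Keeps only features defined in the receiving (pipeline) type system.
     * Features present in the incoming CAS but unknown to the receiver are ignored,
     * so the already loaded class for the type does not need to change.
     */
    static Result filter(Set<String> receivingFeatures, Map<String, Object> incoming) {
        Result result = new Result();
        for (Map.Entry<String, Object> e : incoming.entrySet()) {
            if (receivingFeatures.contains(e.getKey())) {
                result.kept.put(e.getKey(), e.getValue());
            } else {
                result.ignored.add(e.getKey());
            }
        }
        return result;
    }
}
```

With a receiving type that defines only "myFeature", an incoming instance carrying "myFeature" and "extraFeature" keeps the first and sets the second aside. The merge scenario has no such shortcut, which is what motivates the class loader approach that follows.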
Proposed solution
See the section on class loaders. Have different class loaders for the new TypeSystem and the user application and annotator code.
Collections
UIMA v2 supports specially-named arrays of primitives (+ string), e.g. BooleanArray.
...
- limit (initially) generic spec to only simple type names, no support for extends, ?, etc. Use TOP for "Object".
Strings
Keep special UIMA String type for compatibility and subtyping.
Feature Structure APIs
JCas style - where the name of the Type and Feature are known, and present in the code.
...