Feature Structures


These are implemented using Java objects, one per FeatureStructure.  They can be Garbage Collected.

There is a generic Java class for these, plus (optional) specific classes for JCas style access. 

v3_FeatureStructure_organization_diagram

xxx_Type files

These are eliminated in v3.  They served 2 purposes:

  1. save one slot per feature structure - instead of a casImpl ref and a typeImpl ref, there was just one ref to the _Type instance, which in turn held these two refs
  2. provided a place for the low level accessors.

It's unclear if anyone is using the low level accessors.  These may be retained, but moved to the main xxx classes.

The non-JCas style of Java cover classes (FeatureStructureImplC) did not use a _Type instance, and as a consequence had both a casImpl ref and a typeImpl ref.

JCas Class generation

JCas cover classes now come in single classes, rather than in pairs.  These classes are either built-in or generated; built-in ones cannot be generated.  A single class definition might be used for multiple type systems; a single definition is used for all the built-in types.  Each JCas class instance

  • has a ref to the CAS view.
  • has a ref to the corresponding TypeImpl.  This can't be in the main class as a static, as there is a one-to-many relationship because the built-in main classes are shared across type systems.

When generated, they are specific to one (merged) type system, except for shared, common, built-in class definitions.  To allow for multiple type systems within one JVM simultaneously, class loader isolation is used.

  • Class loader isolation is optional - it may not be needed for simple deployments, or it may be being handled outside of UIMA (e.g., a single UIMA pipeline running as a servlet)
  • UIMA provides the UIMATypeSystemClassLoader which can be used for classpath isolation, and also serves to implement lazy (just in time) generation of the JCas classes.
    • When this is used, UIMA artifacts that might reference types (application, external resources, or annotators) need to be loaded under this class loader.
    • The same class loader lazily generates JCas classes (both the x.y.z.Foo and x.y.z.Foo_Type) and loads them on demand.
      • To enable this, the UIMATypeSystemClassLoader has a settable reference to the associated Type System; type system commit searches the class loader chain for an instance of UIMATypeSystemClassLoader and sets this reference.  If it is already set and the merged type system is different, an error is thrown.
      • The generation/load happens when a reference is made to a class having the same name as a UIMA type (or xxx_Type).  With Java's lazy loading, if this class is not already loaded, it is generated and loaded.
  • If there is no UIMATypeSystemClassLoader  in the parent chain:
    • type system commit does a batch generate and injection-load of all types (not lazy), using the current ExtensionClassLoader from the UIMA ResourceManager (if one exists) or the current class loader (or perhaps the current context class loader if it exists).
    • If these types are already loaded (findLoadedClass doesn't return null), throw an exception.  This situation means a reference was made to a JCas class prior to type system commit, which caused a class by that name to be loaded from the classpath; that class may not be the same as the generated one.  The user (application developer) will need to fix this by ensuring that type system commit happens before any reference by name to a JCas class (e.g. new Foo...).
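The "settable type system reference" and the search of the class loader parent chain described above could look roughly like the following.  This is only a sketch: TypeSystemLoaderSketch is a hypothetical stand-in for UIMATypeSystemClassLoader, the type system is represented as a plain Object, and class generation itself is left out.

```java
// Hypothetical stand-in for the UIMATypeSystemClassLoader described in the text.
final class TypeSystemLoaderSketch extends ClassLoader {
    private Object typeSystem; // the committed (merged) type system, if any

    TypeSystemLoaderSketch(ClassLoader parent) {
        super(parent);
    }

    // Called at type system commit. Setting an equal type system again is a no-op;
    // setting a different one is an error.
    synchronized void setTypeSystem(Object committed) {
        if (typeSystem != null && !typeSystem.equals(committed)) {
            throw new IllegalStateException(
                "class loader is already bound to a different type system");
        }
        typeSystem = committed;
    }

    synchronized Object getTypeSystem() {
        return typeSystem;
    }

    // Walks the parent chain from 'start' and returns the nearest
    // TypeSystemLoaderSketch, or null if there is none in the chain.
    static TypeSystemLoaderSketch nearest(ClassLoader start) {
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl instanceof TypeSystemLoaderSketch) {
                return (TypeSystemLoaderSketch) cl;
            }
        }
        return null;
    }
}
```

When nearest(...) returns null, the non-lazy batch path from the last two bullets would apply.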

Since the generated classes have static fields which ref from the _Type to the main class, generate the main class first, then the _Type one.  Avoid circular references.

Connecting Instances with MetaInformation

Meta information about types and features is stored in

  • TypeImpl and FeatureImpl instances
    • These are not shared among TypeSystems, as they need to have (for constrained iterator impl) refs to the TypeSystem.
  • The "static" information of generated or built-in JCas classes representing types
    • Some of these classes (but not class instances) are shared among type systems (e.g. the built-in types)
      • Therefore, the static data cannot reference Type/Feature instances
  • The _Type class is generated for each Typesystem, for the merged type
    • An instance of this is kept per Cas View, and referred to from the instance of the JCas Type


To make a new instance of Type, the Type (and _Type) classes have to be generated if not already available.  They may be available because user code might have referenced a JCas class by name, causing it to be generated and loaded. (The class loader used has a check for attempts to find a JCas cover type, and generate it on demand.)

To generate a JCas class, the class loader (an instance of UIMAClassLoader) has access to the type system impl if the type system has been committed; it checks to ensure the type system is committed, and then generates and loads the Type and _Type classes, in the context of that type system.  Built-in versions of these classes are always "found" and not generated.

  • When a type system is committed, the nearest UIMAClassLoader in the class-loader parent chain has its type system reference set.
  • If the type system ref is already set, this is an error condition; a new class loader instance is required for new type systems (it might be possible to optimize for a new but equal type system).
  • If UIMAClassLoaders are not being used, then lazy loading can't be done; instead the user may call a method to load all the classes for all the types.

Instances of a JCas type may be created via the "new" operator, passing in the JCas.  

Locating or instantiating the corresponding _Type instance

When a JCas instance is created, it needs to reference a corresponding _Type instance; these are "per CAS View".  A table is kept, by view, of already instantiated _Type instances, key =  JCas type class (identity key).  If not present, a new instance is generated from the corresponding (generated or provided) _Type class.  It should always be available (have been generated or set up) by the time it's needed.
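A minimal sketch of such a per-view table, assuming an identity-keyed map from the JCas class to its _Type instance.  ViewTypeTableSketch and the creator argument are hypothetical names used only for illustration:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the table kept by one CAS view.
final class ViewTypeTableSketch {
    // Keyed by the JCas class object itself; IdentityHashMap compares keys with ==.
    private final Map<Class<?>, Object> typeInstances = new IdentityHashMap<>();

    // Returns this view's _Type instance for jcasClass, creating it on first use.
    synchronized Object typeInstanceFor(Class<?> jcasClass, Supplier<?> creator) {
        Object existing = typeInstances.get(jcasClass);
        if (existing == null) {
            existing = creator.get();
            typeInstances.put(jcasClass, existing);
        }
        return existing;
    }
}
```

Two views asking for the same JCas class get different _Type instances, while repeated requests within one view return the same instance.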

Using the JCasRegistry

In v2, this was a map from ints to loaded JCas cover classes.

  • Keep this for now to ease backwards compatibility. But it would be nice to get rid of it.  
    • Need to enumerate all uses of it
  • Goal: make this work with multiple type systems, and use as index the (dense) typecode from TypeImpl. 
    • These type codes are common up to the end of the built-ins, and then branch, one per type system. Some of these type systems will come and go, so ensure GC can happen for the gone ones. 

Lookup needs to be by type system, obtainable from instances (via ref to _Type).  Generated classes have ref to type system and can use typecode for this value.

Go from typecode via typesystem to typeimpl to generator (creator).
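The lookup chain typecode -> typesystem -> typeimpl -> generator could be sketched as below.  All class and field names here are hypothetical simplifications, not the actual UIMA classes.  Because the generators hang off the type system object rather than off a global map, they become collectable together with a type system that is no longer referenced.

```java
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for TypeImpl.
final class TypeImplSketch {
    final int typeCode;
    final String name;
    final Supplier<Object> generator; // creates the Java object for this type

    TypeImplSketch(int typeCode, String name, Supplier<Object> generator) {
        this.typeCode = typeCode;
        this.name = name;
        this.generator = generator;
    }
}

// Hypothetical, simplified stand-in for a committed type system.
final class TypeSystemSketch {
    private final TypeImplSketch[] typesByCode; // dense: index == typecode

    TypeSystemSketch(TypeImplSketch... types) {
        typesByCode = new TypeImplSketch[types.length];
        for (TypeImplSketch t : types) {
            typesByCode[t.typeCode] = t;
        }
    }

    // typecode -> TypeImpl -> generator -> new instance
    Object create(int typeCode) {
        return typesByCode[typeCode].generator.get();
    }
}
```

In this sketch two type systems can share the built-in entry for typecode 0 while mapping typecode 1 to different generators.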

Getters, Setters, Constructors, indirection

For JCas style, the getters, setters, and constructors are "direct": the user's code says things like

create, getters
new Foo() // create Foo instance, or
myFooInstance.getMyFeat()  // to get a feature or
myFooArrayInstance.getMyIndexedFeat(4)  // to get the 4th element of an array
Setters
myFooInstance.setMyFeat(featValue)  // to set a feature with a value 
myFooArrayInstance.setMyIndexedFeat(4, featValue)  // to set the 4th element of an array

For non-JCas style, the user writes something like this:

Non-JCas, indirect via Type/Feature instances
acasinstance.createFs(aType)  // create a feature structure; aType is an instance of TypeImpl
myInstance.getIntValue(aFeature) // get an int valued feature; aFeature is an instance of FeatureImpl

There are also low-level equivalents, where the typeCode or featureCode is passed instead, and the featureStructureID is passed as well.  These methods are on the CAS itself, because there's no JCas object in this case.
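To illustrate the shape of the low-level style, here is a toy sketch in which everything is addressed by int codes and the methods live on the CAS object.  LowLevelCasSketch and its method names are hypothetical, and the feature code is simply the offset of the feature within its type:

```java
import java.util.ArrayList;
import java.util.List;

// Toy CAS with int-coded access; all names are hypothetical.
final class LowLevelCasSketch {
    private final int[] featuresPerType; // indexed by type code
    // Index in this list == feature structure ID.
    private final List<int[]> featureStructures = new ArrayList<>();

    LowLevelCasSketch(int... featuresPerType) {
        this.featuresPerType = featuresPerType.clone();
    }

    // Creates a feature structure of the given type and returns its ID.
    int createFs(int typeCode) {
        featureStructures.add(new int[featuresPerType[typeCode]]);
        return featureStructures.size() - 1;
    }

    int getIntValue(int fsId, int featureCode) {
        return featureStructures.get(fsId)[featureCode];
    }

    void setIntValue(int fsId, int featureCode, int value) {
        featureStructures.get(fsId)[featureCode] = value;
    }
}
```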

For these to work in version 3, we need to go from the Type or Feature instance to being able to get/set/create in the Java space.  Java 8 provides mechanisms that can be optimized by the JIT: MethodHandles combined with LambdaMetafactory (see http://stackoverflow.com/questions/19557829/faster-alternatives-to-javas-reflection ), or method references (e.g.   ClassXYZ::getFoo ).  A test of these approaches appears to indicate they are as fast as direct access. 
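The two Java 8 mechanisms mentioned here can be compared side by side in a small sketch.  FooSketch and AccessorFactorySketch are hypothetical; the first variant uses a method reference fixed at compile time, the second looks the getter up by name at run time and wraps it in the same functional interface through LambdaMetafactory.  No performance claim is made by this example.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.ToIntFunction;

// Hypothetical JCas-style cover class with one int-valued feature.
final class FooSketch {
    private final int myFeat;
    FooSketch(int myFeat) { this.myFeat = myFeat; }
    int getMyFeat() { return myFeat; }
}

final class AccessorFactorySketch {
    // Variant 1: method reference, resolved by the compiler.
    static ToIntFunction<FooSketch> viaMethodReference() {
        return FooSketch::getMyFeat;
    }

    // Variant 2: getter found by name at run time, then turned into a
    // ToIntFunction by LambdaMetafactory.
    @SuppressWarnings("unchecked")
    static ToIntFunction<FooSketch> viaLambdaMetafactory(String getterName) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle getter = lookup.findVirtual(
                FooSketch.class, getterName, MethodType.methodType(int.class));
            CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "applyAsInt",                                       // method of ToIntFunction
                MethodType.methodType(ToIntFunction.class),         // factory type: () -> ToIntFunction
                MethodType.methodType(int.class, Object.class),     // erased applyAsInt signature
                getter,
                MethodType.methodType(int.class, FooSketch.class)); // specialized signature
            return (ToIntFunction<FooSketch>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("could not build accessor for " + getterName, t);
        }
    }
}
```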

To use these, the generated class needs to initialize a set of variables in the associated Type and Feature classes with the appropriate Constructor/Method references.  A way this could be done:

  • Have the class declare a set of static Supplier or Consumer or other appropriate Functional Interface values, one per getter/setter/constructor, each under a particular name
  • As part of loading the class, get these values and distribute them to all the features and the type

The values would be extracted and inserted into the corresponding TypeImpl or FeatureImpl structure.  These would be invoked using the Functional Interface's method.  For example, if that method were get(), then the method would be invoked as myTypeImpl._accessors[featCode].get();
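A minimal sketch of this arrangement follows.  The names BarSketch, TypeAccessorsSketch, _FACTORY, _GET_LABEL and LABEL_FEAT_CODE are all assumptions made for illustration.  The getter is modeled as a Function taking the instance, so it is invoked with apply(instance) rather than get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for the accessor slots in a TypeImpl.
final class TypeAccessorsSketch<T> {
    Supplier<T> creator;
    // Index in this list == feature code.
    final List<Function<T, Object>> accessors = new ArrayList<>();
}

// Hypothetical generated JCas class.
final class BarSketch {
    static final int LABEL_FEAT_CODE = 0;

    // One static functional-interface value per constructor/getter.
    static final Supplier<BarSketch> _FACTORY = BarSketch::new;
    static final Function<BarSketch, Object> _GET_LABEL = BarSketch::getLabel;

    private String label = "unset";

    String getLabel() { return label; }
    void setLabel(String v) { label = v; }

    // Distributes the static values into the type's accessor slots;
    // in the design above this would happen as part of class loading.
    static TypeAccessorsSketch<BarSketch> register() {
        TypeAccessorsSketch<BarSketch> type = new TypeAccessorsSketch<>();
        type.creator = _FACTORY;
        type.accessors.add(LABEL_FEAT_CODE, _GET_LABEL);
        return type;
    }
}
```

Generic code can then create and read a BarSketch purely through the TypeAccessorsSketch, without naming the class or its getter.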

Issue with supporting multiple different type systems, serially.

This has one serious issue, illustrated by the use case: 

  1. make a pipeline, 
  2. deserialize some CAS's type system, and then deserialize that CAS
  3. do some generic processing on that CAS
  4. repeat 2 and 3 in a loop, with different type systems each time. 

The key points that cause a problem are 

  • having a UIMA pipeline that is being reused for multiple deserialized CASes, each of which might have a different type system
    • Note: this may not seem possible, because all UIMA pipelines have the superclasses AnalysisEngineImplBase -> ConfigurableResource_ImplBase -> Resource_ImplBase,
      and Resource_ImplBase has a reference to a CasDefinition used for creating a CAS that matches the merged type system of the pipeline.  
      However, deserialization may supply a different type system (e.g., having extra features for some types) and create a CAS having the definition that is read in as part of the deserialization process.
      • User code might merge the deserialized type system with the definition from the pipeline.
      • Some deserializations include the concept of setting aside or ignoring types and features used in the CAS being deserialized, but not defined in the receiving CAS (which is typically the one set up from the pipeline merged type system).
  • The problem arises if the pipeline code has some JCas-like reference to some type / feature which is 
    • not built-in
    • but present in all the (varied) Type Systems being deserialized.   

    The pipeline code might have, for instance, an assignment 

    MyFooType mft = ....  // some code fragment yielding an instance of MyFooType
    mft.setMyFeature(333);  // sets a [named] feature in MyFooType

    When the merged type system is constructed, a "generate" step generates a JCas cover class definition which includes a class MyFooType, and in that class, a "setter" method "setMyFeature(...)", and loads this. When the pipeline is run, the code in the pipeline will be "linked" to the loaded class's setter method. 

    The difficulty arises when the next CAS with a different definition of MyFooType (say, with extra features) is deserialized.  If the deserialization approach is to ignore extra features not in the merged type system from the main pipeline, then there is no problem.  But if user code, for example, merges deserialized type systems with the UIMA pipeline, this new definition needs to replace the old one.  The old one is now "linked" with the pipeline code "mft.setMyFeature(333);" above, and can't be replaced (to my knowledge) without also unloading the pipeline code and reloading it.  (That's one potential, but significantly inefficient "solution"; another is disallowing changing type systems in this scenario.)

Proposed solution

See the section on class loaders.  Have different class loaders for the new TypeSystem and the user application and annotator code.

Collections

UIMA v2 supports specially-named arrays of primitives (+ string), e.g. BooleanArray. 

UIMA v2 supports arrays of Feature Structures, using FSArray (JCas) or ArrayFS (Generic).  

For v3, support 

  • new notation (arrays):  aligned with Java: TOP[] or Annotation[] or MyType[] or short[]
  • new notation (collections): aligned with Java generics: List<TOP> or ArrayList<Annotation> or HashSet<MyType>

Use Java fully qualified names as the UIMA type name. 

Extend idea of "component type" to include multiple generics.

  • limit (initially) generic spec to only simple type names, no support for extends, ?, etc.  Use TOP for "Object".
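As an illustration of the restricted notation, the sketch below splits a type name into a base and a list of component types, and rejects nested generics and wildcards.  TypeNameSketch is hypothetical, and representing arrays with the base "[]" is an arbitrary choice made here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical parser for names such as "MyType[]" or "List<TOP>".
final class TypeNameSketch {
    final String base;             // e.g. "List", or "[]" for arrays
    final List<String> components; // e.g. ["TOP"]; empty for plain types

    private TypeNameSketch(String base, List<String> components) {
        this.base = base;
        this.components = components;
    }

    static TypeNameSketch parse(String name) {
        String s = name.trim();
        if (s.endsWith("[]")) {
            String component = s.substring(0, s.length() - 2).trim();
            return new TypeNameSketch("[]", Collections.singletonList(component));
        }
        int open = s.indexOf('<');
        if (open < 0) {
            return new TypeNameSketch(s, Collections.<String>emptyList());
        }
        if (!s.endsWith(">")) {
            throw new IllegalArgumentException("unbalanced generic spec: " + name);
        }
        String inner = s.substring(open + 1, s.length() - 1);
        // Only simple type names are allowed: no nesting, wildcards, or bounds.
        if (inner.indexOf('<') >= 0 || inner.indexOf('?') >= 0 || inner.contains(" extends ")) {
            throw new IllegalArgumentException("only simple type names are supported: " + name);
        }
        List<String> parts = new ArrayList<>();
        for (String part : inner.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                throw new IllegalArgumentException("empty component type: " + name);
            }
            parts.add(trimmed);
        }
        return new TypeNameSketch(s.substring(0, open).trim(), parts);
    }
}
```

Fully qualified names pass through unchanged, since dots are not treated specially.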

Strings

Keep special UIMA String type for compatibility and subtyping.

Feature Structure APIs

JCas style - where the name of the Type and Feature are known, and present in the code.

Generic style - where the name of the Type and Feature are not known ahead of time, and are referred to indirectly via variables, in the code.

"Low level" style - only for backwards compatibility.

