This page collects ideas for loading JCas classes including customization(s).

Each UIMA type has feature structure instances represented by instances of a corresponding (usually generated) Java class.

  • The instances of these classes are the feature structures; the fields store the feature values.
    • UIMA types which have natural corresponding Java equivalents have feature values stored using the natural corresponding Java types
      • primitives: boolean, byte, short, int, long, float, double, String, [arbitrary-java-objects] and arrays of these as xxx[].
  • Most UIMA built-in types have standard defined Java class definitions.
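
As a rough illustration of "fields store the feature values", a cover class might look like the sketch below. The class name and the feature names (begin, end, lemma, scores) are hypothetical and chosen only for illustration; a real generated JCas class carries more than this.

```java
// Hypothetical cover class for an illustrative type; the feature names
// (begin, end, lemma, scores) are assumptions, not from a real type system.
class FooFieldsSketch {
    private int begin;        // integer-valued feature stored as a Java int
    private int end;
    private String lemma;     // string-valued feature stored as java.lang.String
    private double[] scores;  // array-valued feature stored as a Java array

    int getBegin() { return begin; }
    void setBegin(int v) { begin = v; }

    int getEnd() { return end; }
    void setEnd(int v) { end = v; }

    String getLemma() { return lemma; }
    void setLemma(String v) { lemma = v; }

    // Whole-array and indexed accessors for the array-valued feature.
    double[] getScores() { return scores; }
    void setScores(double[] v) { scores = v; }
    double getScores(int i) { return scores[i]; }
    void setScores(int i, double v) { scores[i] = v; }
}
```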

Design considerations

  • Built-in types support - would be nice if these were not needlessly replicated
  • Within a single JVM, there may be multiple UIMA type systems actively in use. This is most likely supported via a separate class loader for each.
    • This implies that the code which uses objects loaded under such a class loader must itself be loaded from the same class loader; otherwise it can't "see" the loaded classes.
    • V2 UIMA may (optionally) make use of the UIMAClassLoader with a user-supplied class path, which doesn't delegate to the parent first; it checks its own classpath first.
      • This happens when the UIMA app specifies a UIMA extension class loader (via a not-well-documented method on a ResourceManager instance, setExtensionClassPath) or a Data Path.
      • It's unlikely that most user code makes use of this.
      • The PEAR classpath isolation makes use of this. 
    • V3 augments the UIMAClassLoader with
      • the capability to recognize and generate JCas classes
        • This requires the loader have an association with one (or more?) particular committed type system instance, in order to do the proper generation.
  • A single type system may be used (via UIMA application APIs) for multiple different pipelines, and for multiple different sets of index definitions.
    • Having a single type system for multiple different sets of index definitions is unlikely
    • The index definition supplies the avoid-index-corruption information needed by the setters
      • If supplied at JCas generation time, then the test can be inserted only where needed, rather than testing at run time, for many types, whether it is needed. The test is currently a BitSet lookup.
  • Managing multiple type systems actively in use in a single JVM
    • It is possible to do this using Servlet-style complete class loading isolation
      • Done outside of the UIMA framework (but within the single JVM, e.g. Servlet-style class loading)
      • UIMA Impl classes and UIMA pipeline application classes loaded under multiple separate loaders (multiple copies)
      • Works for generated JCas classes, nothing special needed
    • UIMA framework managing multiple type systems
      • UseCase 1: a single pipeline being used, sequentially, for multiple type systems
      • UseCase 2: running multiple pipelines (in parallel) with different type systems, each of which might exhibit UseCase 1.
      • UseCase 1 in v2: If the JCas isn't being used, then there's no class loading issue. This case can arise when deserialization is being used, and each CAS has its own (potentially different) type system. The user code might rely on types and features it knows a priori to be "common" among the different deserializations (for example, all the built-in types are common).
        • The user code could reference non-common features, after using some other value to get the name of the feature.
        • If the JCas is in use, this is supported if the JCas generation was done with the specified feature, and that feature is present.  This requires that the JCas implementation check if the feature being accessed is defined in the current type system; an exception is thrown otherwise.
      • UseCase 1 in v3: 
        • For a given class loader, the generated JCas types cannot be updated. (Well, there is a non-performant way, via another indirection, which some fancy systems use to allow runtime redefinition of classes; see for instance http://zeroturnaround.com/software/jrebel/ or various other methods: google "java runtime redefinition classes reload".)
          • Approaches: 
            • The generated types on the first deserialization need to have the Union of all expected types. This might require creation of the Union Type System, via an external utility, and an API for specifying that to UIMA.  This could also cause some inefficiencies, because the Union could grow large.
            • The classloader used needs to be "dropped" and a new one substituted - this will cause regeneration of JCas classes particular to the deserialized type system, but also cause reloading, re-JITting, etc. all the implementation classes of the pipeline.
      • UseCase 2 in v2: If the JCas isn't being used, then there's no class loading issue
        • except for potential collisions due to same-named annotator / external resource classes with different implementations.
          • Can be avoided by user using UIMA Extension Class Loaders or UIMA DataPaths, per pipeline
        • If the JCas is in use, then
          • If a common classloader is being used, then the JCas definition must be for the Union of the used parts (via JCas) of the type systems.
          • If a separate classloader is being used, then there is no constraint on the JCas definition being used.  
      • UseCase 2 in v3:
        • If a common classloader is being used, then the JCas definition must be for the Union of all types/features of the used type systems.
        • If a separate classloader is being used, then there is no constraint, as above. 
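
The "checks its own classpath first" behaviour described above for the UIMAClassLoader can be sketched with a plain URLClassLoader subclass. This is a minimal sketch, not the UIMA implementation: the class name SelfFirstClassLoaderSketch is hypothetical, and it leaves out JCas class generation and any special handling the real loader may apply.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical self-first loader: tries its own class path before the parent.
class SelfFirstClassLoaderSketch extends URLClassLoader {

    SelfFirstClassLoaderSketch(URL[] classPath, ClassLoader parent) {
        super(classPath, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look on this loader's own class path first ...
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // ... and only then fall back to normal parent delegation.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Giving each pipeline its own instance of such a loader (with its own class path) is one way to get the "separate classloader" cases above, at the cost of loading the same class names more than once.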

Alternative to generated JCas classes to avoid classloading issues

We could have a design which has a level of indirection, similar to Nick Hill's submission, but slightly more generalized

  • an ArrayList, indexed by data in the type system, which held references, for values which were of that kind
  • an int array-list, indexed by data in the type system, which held values of boolean, byte, short, int, long, float and double (long and double taking 2 slots).

The arrays would be adjustable, to accommodate different type systems (and perhaps, dynamically augmented type systems).
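
A minimal sketch of such an indirection-based store follows. It assumes that 32-bit values occupy one int slot and 64-bit values (long, double) occupy two, and that slot numbers come from the type system; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical per-feature-structure store: slot numbers would be assigned
// by the type system, so no generated class is needed per type.
class FeatureStoreSketch {
    private int[] ints;                                        // non-reference values
    private final ArrayList<Object> refs = new ArrayList<>();  // reference values

    FeatureStoreSketch(int intSlots, int refSlots) {
        ints = new int[intSlots];
        ensureRefSlots(refSlots);
    }

    // Grow the arrays, e.g. for a different or augmented type system.
    void ensureIntSlots(int n) { if (n > ints.length) ints = Arrays.copyOf(ints, n); }
    void ensureRefSlots(int n) { while (refs.size() < n) refs.add(null); }

    int getInt(int slot) { return ints[slot]; }
    void setInt(int slot, int v) { ints[slot] = v; }

    float getFloat(int slot) { return Float.intBitsToFloat(ints[slot]); }
    void setFloat(int slot, float v) { ints[slot] = Float.floatToRawIntBits(v); }

    // 64-bit values use two adjacent slots: high word first, then low word.
    long getLong(int slot) {
        return ((long) ints[slot] << 32) | (ints[slot + 1] & 0xFFFFFFFFL);
    }
    void setLong(int slot, long v) {
        ints[slot] = (int) (v >>> 32);
        ints[slot + 1] = (int) v;
    }

    double getDouble(int slot) { return Double.longBitsToDouble(getLong(slot)); }
    void setDouble(int slot, double v) { setLong(slot, Double.doubleToRawLongBits(v)); }

    Object getRef(int slot) { return refs.get(slot); }
    void setRef(int slot, Object v) { refs.set(slot, v); }
}
```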

What's in a JCas cover class?

There are two classes for each type.  

  • x.y.z.Foo - each instance represents one Feature Structure; in v3 these can be GC'd
  • x.y.z.Foo_Type - there is one instance per CAS (arbitrary view)

x.y.z.Foo 

Has 

  • a field for each feature
  • a reference to the _Type instance
    • only for backwards compatibility for low-level access model
    • Multiple instances per type system - one per CAS
    • has ref to TypeImpl
  • a reference to the CAS View (to support addToIndexes for the right view)
  • a ref to a type-system-wide Bitset for index corruption testing
  • Constructors
    • new Foo(Cas)
  • Methods
    • getter / setter for all fields
      • The setter methods may include index corruption checking code.
        • May be code which tests at runtime on each set, whether or not this 
    • indexed getter/setter for fields defined as arrays
    • (via inheritance) 
      • a collection of get/set methods, one per kind of value: boolean/byte/short.../double/String/TOP/JavaObject and arrays of these.
        • The methods take an extra "offset" value, obtained from the Feature.
        • Used for backwards compatibility with non-JCas styles, and for serialization and other "generic" operations
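
The index corruption check in a setter could look roughly like the sketch below. It assumes the check is a BitSet lookup keyed by a feature code, followed by a remove / update / re-add sequence. The class name, the feature codes and the simplified addToIndexes / removeFromIndexes methods are hypothetical stand-ins, not the UIMA API.

```java
import java.util.BitSet;

// Hypothetical cover class with run-time index corruption checking.
class IndexedFooSketch {
    static final int BEGIN_FEATURE_CODE = 0;  // assumed feature numbering
    static final int LEMMA_FEATURE_CODE = 1;

    // Shared, type-system-wide set of features used as index keys.
    private final BitSet featuresUsedAsIndexKeys;
    private boolean inIndex;
    private int begin;
    private String lemma;
    int reindexCount;  // counts protected updates, for demonstration only

    IndexedFooSketch(BitSet featuresUsedAsIndexKeys) {
        this.featuresUsedAsIndexKeys = featuresUsedAsIndexKeys;
    }

    // Simplified stand-ins for real index maintenance.
    void addToIndexes() { inIndex = true; }
    void removeFromIndexes() { inIndex = false; }

    int getBegin() { return begin; }
    String getLemma() { return lemma; }

    void setBegin(int v) {
        // Run-time test: only protect when indexed and the feature is a key.
        boolean protect = inIndex && featuresUsedAsIndexKeys.get(BEGIN_FEATURE_CODE);
        if (protect) removeFromIndexes();
        begin = v;
        if (protect) { addToIndexes(); reindexCount++; }
    }

    void setLemma(String v) {
        boolean protect = inIndex && featuresUsedAsIndexKeys.get(LEMMA_FEATURE_CODE);
        if (protect) removeFromIndexes();
        lemma = v;
        if (protect) { addToIndexes(); reindexCount++; }
    }
}
```

If the index definitions were known at JCas generation time, the generator could instead emit the protected form only for setters of key features and a plain assignment everywhere else.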

x.y.z.Foo_Type 

An instance is loaded lazily, when a new x.y.z.Foo(some-cas) is done.

Has

  • a ref to the TypeImpl
  • a ref to the CAS (an arbitrary view, sometimes updated in v2), used for low level access patterns

Instances are accessed per CAS via

  • a Map (kept per CAS) from the x.y.z.Foo Class to the corresponding x.y.z.Foo_Type instance.
    • The key is an x.y.z.Foo Class object rather than the class name, because classes loaded under different class loaders may have the same class name. This used to happen for PEARs (but not in v3).
      • This happens when different generated x.y.z.Foo (due to different merged type systems) are running in the same JVM.
      • This used to happen within one pipeline with PEAR switching, where the PEAR might have a different customization of a JCas class. In v3, that doesn't happen; all versions of a customization must be merged.
    • If the Map has no entry, 
      • Load the _Type class itself, if not loaded (Map in TypeSystemImpl instance, key = name string, value = _Type Class). 
      • Make an instance of it and populate the map in CAS.svd.
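
The lookup-or-create step can be sketched as a per-CAS map keyed by the Class object. This is only an illustration: the class name, the factory parameter and the counter are hypothetical, and the real code also has to load the _Type class before instantiating it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-CAS registry from cover class to its _Type instance.
class TypeInstanceMapSketch {
    // Keyed by Class object (not by name), so same-named classes loaded
    // under different class loaders get separate entries.
    private final Map<Class<?>, Object> typeInstances = new HashMap<>();
    int created;  // number of instances made, for demonstration only

    Object typeInstanceFor(Class<?> coverClass, Function<Class<?>, Object> factory) {
        // Create and remember the instance only when the map has no entry.
        return typeInstances.computeIfAbsent(coverClass, c -> {
            created++;
            return factory.apply(c);
        });
    }
}
```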

Generating and Loading JCas cover classes / merging with Customization
