...

aspect | sub aspect | picture | space/time tradeoffs, locality-of-reference (L1/L2/L3 memory caching) | backwards compatibility | alternatives | notes
Overview 

UIMA Entities (internal view)


       
Data Storage

Where: each FS's values are stored as part of one Java object

  • can be GC'd
  • No central CAS "Heap"

UIMA Feature Structure diagram


More space:

  • there is always a Java cover object (vs. the possibility of no Java objects)
  • Java cover object: 3 object overheads per FS (vs. 1)
  • Java cover object carries additional shared fields in denormalized form

Faster: high locality of reference.

Faster: operations (except for some needing FSids) won't need to use JCasHashMap to convert from int offsets in the heap to JCas cover objects.

 

To reduce space per FS, one could:

  • avoid Java object overheads for the Object and int arrays (but this gives up GC by individual object)
  • share the cas ref, type ref, and typesystem ref.

Denormalized: each FS has its own cas ref, type ref, and typesystem ref.
Data Storage

"fs-id" - an int (dense) representing the unique ID of a FS.

  • assigned lazily, so not every FS has one
  • not reused if the FS is garbage collected
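The two properties above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class names FsIdSource and LazyIdFs, the use of 0 as the "unassigned" marker, and the choice of AtomicInteger are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-CAS id source. Ids only grow, so the id of a collected FS is never handed out again. */
class FsIdSource {
    private final AtomicInteger next = new AtomicInteger(1);

    int nextId() {
        return next.getAndIncrement();
    }
}

/** Hypothetical feature structure that asks for its id only on first use. */
class LazyIdFs {
    private static final int UNASSIGNED = 0; // assumed marker value

    private final FsIdSource ids;
    private int fsId = UNASSIGNED;

    LazyIdFs(FsIdSource ids) {
        this.ids = ids;
    }

    /** Assigns the id on the first call and returns the same value afterwards. */
    synchronized int getFsId() {
        if (fsId == UNASSIGNED) {
            fsId = ids.nextId();
        }
        return fsId;
    }

    /** True once an id has been assigned. */
    synchronized boolean hasFsId() {
        return fsId != UNASSIGNED;
    }
}
```

Because ids are drawn only on demand, the assigned values stay dense even when most FSs never ask for one.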
        
Data Storage

Feature Structure representation, as 3 Java objects:

  • array of Ints
  • array of Objects
  • container of above, with additional refs to
    • cas
    • typesystem
    • type
    • a ref to a shared int array representing offsets in the top two arrays, indexed by known JCas features
 

The offset array is an object that roughly corresponds to the _Type object in the JCas, in that it provides a way to get from a designated field to the offset. The JCas provides this as special named fields, part of the _Type object. The CasObj provides this as an int array object.

The cas ref is used for "addToIndexes" to locate the view containing the indexes to be added to.

The offset array is shared among all FSs associated with a particular type system, with some exceptions (e.g. SourceDocumentInformation), but I think this is just a quick-fix anomaly.

  

The cas ref points to one view; it is used for add/remove from indexes, getView, and getting the "fs-id".

   
Data Storage

"get" and "set" operations for features

  • some builtin hard-coded offsets
  • there's a shared int[] that maps JCas features to offsets in the int[] and obj[] values
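A rough sketch of how "get" and "set" could go through the shared offset array, under stated assumptions: the names FeatureOffsets and FsObj, the integer "feature codes", and the hard-coded slot values for begin/end are all hypothetical, and the cas, typesystem, and type refs of the container are left out for brevity.

```java
/** Hypothetical offset table, shared by every FS that uses it rather than copied. */
final class FeatureOffsets {
    // Assumed hard-coded slots for built-in features (illustrative values only).
    static final int BEGIN = 0;
    static final int END = 1;

    // Indexed by a feature code; the value is a slot in the int array or the Object array.
    private final int[] slotByFeature;

    FeatureOffsets(int[] slotByFeature) {
        this.slotByFeature = slotByFeature.clone();
    }

    int slot(int featureCode) {
        return slotByFeature[featureCode];
    }
}

/** Container object: together with its int[] and Object[] it forms the 3 Java objects of one FS. */
final class FsObj {
    private final int[] ints;             // int-valued features
    private final Object[] refs;          // reference-valued features
    private final FeatureOffsets offsets; // shared with other FSs

    FsObj(FeatureOffsets offsets, int intSlots, int refSlots) {
        this.offsets = offsets;
        this.ints = new int[intSlots];
        this.refs = new Object[refSlots];
    }

    // Built-in feature: uses the hard-coded offset, no table lookup.
    int getBegin() { return ints[FeatureOffsets.BEGIN]; }
    void setBegin(int v) { ints[FeatureOffsets.BEGIN] = v; }

    // Other features: one lookup in the shared table, then a direct array access.
    int getInt(int featureCode) { return ints[offsets.slot(featureCode)]; }
    void setInt(int featureCode, int v) { ints[offsets.slot(featureCode)] = v; }

    Object getRef(int featureCode) { return refs[offsets.slot(featureCode)]; }
    void setRef(int featureCode, Object v) { refs[offsets.slot(featureCode)] = v; }

    FeatureOffsets offsets() { return offsets; }
}
```

In this sketch every access stays within the FS's own two arrays plus one shared table, which is the locality-of-reference argument made earlier.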
        
Data Storage

JCas _Type classes

These are not used, but are "supported" for backwards compatibility. Support includes their low-level APIs (open question)
Data Storage

low level API support, including C++, binary (de)serialization

partially started, remainder TBD
Views

Each FS object has a link to the CAS view it was originally created in; this is used for the obj.addToIndexes style of add/remove.
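The view link can be pictured with a small sketch. ViewSketch and FsSketch are hypothetical stand-ins rather than UIMA classes, and a plain list stands in for the view's index repository; only the idea that the FS remembers its creating view comes from the notes above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one CAS view and its indexes. */
class ViewSketch {
    final String name;
    final List<FsSketch> indexed = new ArrayList<>();

    ViewSketch(String name) {
        this.name = name;
    }

    /** Creating a FS through a view records that view on the FS. */
    FsSketch createFs() {
        return new FsSketch(this);
    }
}

/** Hypothetical FS that keeps a link to the view it was created in. */
class FsSketch {
    private final ViewSketch createdIn;

    FsSketch(ViewSketch createdIn) {
        this.createdIn = createdIn;
    }

    /** No view argument needed: the stored link locates the indexes. */
    void addToIndexes() {
        createdIn.indexed.add(this);
    }

    void removeFromIndexes() {
        createdIn.indexed.remove(this);
    }

    ViewSketch getView() {
        return createdIn;
    }
}
```

The cost of this convenience is one extra reference per FS, which is part of the denormalized overhead noted under Data Storage.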
Indexes

Bag - structure

UIMA-CO Bag Index

  • 1 collection per (instantiated) type (lazy construction)
  • Collection structures (especially concurrent ones) have significant space overheads
    • but add/remove and simple iteration over linked lists probably evict little from the memory caches (a good thing)
  • size() may be slow, especially for concurrent collections
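A minimal sketch of such a bag index, assuming the Java class of the FS serves as the type key and ConcurrentLinkedQueue as the per-type collection. Both choices, and the name BagIndexSketch, are assumptions for illustration only.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical bag index: one collection per type, created on first add. */
class BagIndexSketch<T> {
    private final Map<Class<?>, Queue<T>> bags = new ConcurrentHashMap<>();

    /** Lazily creates the collection for the FS's type, then appends to it. */
    void add(T fs) {
        bags.computeIfAbsent(fs.getClass(), k -> new ConcurrentLinkedQueue<>()).add(fs);
    }

    boolean remove(T fs) {
        Queue<T> bag = bags.get(fs.getClass());
        return bag != null && bag.remove(fs);
    }

    /** ConcurrentLinkedQueue.size() walks the whole list, so this is O(n) per call. */
    int size(Class<?> type) {
        Queue<T> bag = bags.get(type);
        return bag == null ? 0 : bag.size();
    }

    /** Number of types that have had at least one FS added. */
    int instantiatedTypeCount() {
        return bags.size();
    }
}
```

Types that are never instantiated cost nothing here, while each instantiated type pays the fixed overhead of its own concurrent collection.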

 

      

...