...

  • Allowing more parallelism - e.g. running a pipeline with multiple independent Annotators working together in the same CAS
  • Shifting the space / speed tradeoffs - the current design has ways to minimize the memory footprint, but is potentially slower than designs not so focused on this
    • Because CPUs are much faster than main memory, much improved performance can be obtained by improving locality-of-reference (LOR); this is often done by duplicating things (taking more memory)
    • Here's a description of the L1/L2/L3 cache mechanism of IBM's POWER7: http://www-03.ibm.com/systems/resources/systems_power_software_i_perfmgmt_underthehood.pdf
      • A cache line is 128 bytes.  This means that to refer to 1 byte, 128 bytes are loaded.  L1 has 256 of these cache-lines for data, and 256 for instructions, per core.
      • A core can be running 4 threads, so the L1 is "shared" among these. See the notion of "cache pressure" in the article linked above.
      • L2 has 2048 of these cache-lines
      • Timings:
        • L1 access = 1 cycle, but note that the core can execute up to 5 instructions / cycle
        • L2 usually less than 10 cycles
        • this core's L3 ~ 25 cycles
        • other core's L3 ~ 150 cycles
        • Main Memory ~400 cycles
  • Removing parts of UIMA not used (e.g., the CAS Data Object protocols and related things like the NetworkCasProcessorImpl)
  • Switching from custom implementations of various functions to more standard Java library implementations
    • to reduce the code size
    • to make it more maintainable
    • to take advantage of core Java / core Library improvements going forward
  • Handling backward compatibility - how much, how faithful, etc.

A prototype in this direction, called Cas-obj, has been offered in UIMA-4329.
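
The cache figures quoted in the list above can be turned into a rough estimate of why locality-of-reference matters. The following is a minimal sketch, not part of UIMA: the class and method names are hypothetical, the sizes come from the numbers in the list, and the cost estimate assumes the worst case in which every cache line touched has to come from main memory.

```java
// Illustrative only: names are hypothetical; the constants are the figures quoted above.
final class CacheFootprint {
    static final int CACHE_LINE_BYTES = 128;     // one cache line
    static final int L1_DATA_LINES = 256;        // per core, data only
    static final int L2_LINES = 2048;
    static final int MAIN_MEMORY_CYCLES = 400;   // approximate latency from the list

    static int l1DataBytes() { return CACHE_LINE_BYTES * L1_DATA_LINES; }

    static int l2Bytes() { return CACHE_LINE_BYTES * L2_LINES; }

    /**
     * Counts the distinct cache lines touched when reading `count` items of
     * `itemBytes` each, placed `strideBytes` apart, starting at address 0.
     * Assumes strideBytes > 0, so addresses only increase.
     */
    static long linesTouched(long count, int itemBytes, long strideBytes) {
        long touched = 0;
        long lastLine = -1;   // highest line index already counted
        for (long i = 0; i < count; i++) {
            long start = i * strideBytes;
            long first = start / CACHE_LINE_BYTES;
            long last = (start + itemBytes - 1) / CACHE_LINE_BYTES;
            for (long line = Math.max(first, lastLine + 1); line <= last; line++) {
                touched++;
            }
            lastLine = Math.max(lastLine, last);
        }
        return touched;
    }

    /** Worst-case estimate: every touched line is fetched from main memory. */
    static long worstCaseCycles(long lines) { return lines * MAIN_MEMORY_CYCLES; }

    public static void main(String[] args) {
        System.out.println("L1 data bytes: " + l1DataBytes());
        System.out.println("L2 bytes: " + l2Bytes());
        long packed = linesTouched(1024, 4, 4);       // 1024 ints stored contiguously
        long scattered = linesTouched(1024, 4, 128);  // 1024 ints, one per cache line
        System.out.println("packed lines: " + packed
                + ", worst-case cycles: " + worstCaseCycles(packed));
        System.out.println("scattered lines: " + scattered
                + ", worst-case cycles: " + worstCaseCycles(scattered));
    }
}
```

Under these assumptions, 1024 four-byte values stored contiguously touch 32 cache lines, while the same values spread one per cache line touch 1024, which is the kind of gap that duplicating data to keep it together is meant to close.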

Ideas for the next major version of UIMA

This wiki page collects more ideas to consider for UIMA version 3.

Edge Cases affecting internal design

...

area | sub-area | frequency | details
FS creation

AnnotationBase subtype not allowed in base View

  
FS slot setting

check for index corruption

  • see if the FS field being set is one that is used in 1 or more indexes, and if so,
  • see if this FS is in any index in any view
    • (currently an expensive operation; could be made a lot cheaper with 1 boolean per FS per view, since an FS can be indexed in one view and not in another)
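
The "1 boolean per FS per view" idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the UIMA implementation: `Fs`, `IndexCorruptionSketch`, and all method names are hypothetical stand-ins, views are identified by small integers, and a `java.util.BitSet` holds one bit per view.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names are part of the UIMA API.
final class IndexCorruptionSketch {

    /** Stand-in for a feature structure carrying one "is indexed" bit per view. */
    static final class Fs {
        final BitSet indexedInView = new BitSet();
    }

    // Names of features that are used as keys in at least one index definition.
    private final Set<String> indexKeyFeatures = new HashSet<>();

    void declareIndexKey(String featureName) {
        indexKeyFeatures.add(featureName);
    }

    void addToIndexes(Fs fs, int view) {
        fs.indexedInView.set(view);
    }

    void removeFromIndexes(Fs fs, int view) {
        fs.indexedInView.clear(view);
    }

    /**
     * Returns true when setting `featureName` on `fs` could corrupt an index:
     * the feature is an index key and the FS is currently indexed in some view.
     * Both checks are cheap; no index has to be searched.
     */
    boolean wouldCorruptIndex(Fs fs, String featureName) {
        return indexKeyFeatures.contains(featureName) && !fs.indexedInView.isEmpty();
    }

    public static void main(String[] args) {
        IndexCorruptionSketch cas = new IndexCorruptionSketch();
        cas.declareIndexKey("begin");
        Fs fs = new Fs();
        System.out.println(cas.wouldCorruptIndex(fs, "begin")); // not indexed yet
        cas.addToIndexes(fs, 1);
        System.out.println(cas.wouldCorruptIndex(fs, "begin")); // indexed in view 1
    }
}
```

The design choice here is to pay one bit per FS per view so that the second check no longer requires looking the FS up in every index of every view.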