...

There are now many big-data frameworks. UIMA has a particular slant that encourages component development and reuse (I'm thinking of the externalization of the Type System and the merging of type systems). UIMA also has its own scaleout approach and the RUTA workbench facility. This topic is where we can think about using UIMA components in other frameworks (e.g. Apache Spark), or vice versa.

Interoperability could be facilitated by more standards around REST service packaging.

Complete the JSON deserialization, with an eye toward being "permissive" so that it can receive data models from other frameworks?

Big changes

More use of Java compiler (ecj) and decompiling

...

These typically have approaches to type systems that use user-defined Java types and allow any kind of Java object in the fields. There are new kinds of serialization / deserialization (e.g. Kryo, used by Spark) that work for all kinds of Java objects but are much more efficient than Java reflection-based approaches.

Add support for Collections and Maps

Users have wanted these kinds of objects. Some implementations I've seen tried to implement Sets by combining a HashSet with UIMA FSLists, which duplicates the data and requires keeping the two in sync; this was very inefficient. More on this topic here.
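To make the cost of that workaround concrete, here is a minimal sketch. It is not UIMA API: `SyncedSet` is a made-up name, and a plain `LinkedList` stands in for a UIMA FSList. It only illustrates that every element is held twice and that removals need a linear scan to keep the two structures in sync.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the workaround described above. A LinkedList stands in
// for a UIMA FSList; nothing here is UIMA API.
final class SyncedSet<T> {
    private final List<T> stored = new LinkedList<>(); // stand-in for the list kept in the CAS
    private final Set<T> index = new HashSet<>();      // second copy, kept only for fast lookup

    boolean add(T item) {
        if (!index.add(item)) {
            return false;      // already present, nothing to store
        }
        stored.add(item);      // every new element is written twice
        return true;
    }

    boolean remove(T item) {
        if (!index.remove(item)) {
            return false;      // not present
        }
        stored.remove(item);   // linear scan of the list to stay in sync
        return true;
    }

    boolean contains(T item) {
        return index.contains(item);
    }

    /** Number of element references held across both structures. */
    int referencesHeld() {
        return stored.size() + index.size();
    }
}
```

A built-in Set type would presumably need only one of the two structures, which is the motivation for adding Collections and Maps directly.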

More concurrency

Support parallel running of pipeline components.
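One way to picture this is the sketch below. It is not UIMA API: `ParallelStep`, `Component`, and the use of a plain `Map` as a stand-in for the results written to a CAS are all assumptions made for illustration. It only shows independent components running concurrently over the same read-only text, with their results merged afterwards.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch, not UIMA API: each "component" reads the document text
// and returns named results; independent components run concurrently.
final class ParallelStep {

    /** A stand-in for an analysis component: text in, named results out. */
    interface Component extends Function<String, Map<String, Object>> {}

    /** Runs all components concurrently on the same text and merges their results. */
    static Map<String, Object> runAll(String text, List<Component> components) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, components.size()));
        try {
            Map<String, Object> merged = new ConcurrentHashMap<>();
            CompletableFuture<?>[] futures = components.stream()
                .map(c -> CompletableFuture
                    .supplyAsync(() -> c.apply(text), pool)
                    .thenAccept(merged::putAll))
                .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join(); // wait for every component to finish
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch assumes the components do not depend on each other's output and write under distinct keys; components with dependencies would still have to run in sequence.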

...

(Unlikely) Making each element of the "stream" a new CAS, as a replacement for CAS Multipliers. This seems like the wrong granularity... It may be best to let Java evolve this for a few more releases.

Other changes

Integrate key ideas from uimaFIT

These include:

  • Alternative, Java-centric way of specifying a type system - the user writes a Java class with annotations.
  • Alternative, Java-centric way of specifying configuration information
  • Convenience methods (e.g. selecting groups of feature structures using SQL-like specifications)
  • Others?
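As a rough illustration of the first idea in the list, the sketch below derives a type description from an annotated Java class. The annotations `@TypeSpec` and `@FeatureSpec` and the class `AnnotatedTypes` are invented for this example; they are not part of UIMA or uimaFIT, and a real design would need to cover supertypes, array ranges, and more.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: these annotations are invented for illustration and are
// not part of UIMA or uimaFIT.
final class AnnotatedTypes {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TypeSpec { String name(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface FeatureSpec {}

    /** Example user-written class declaring a type with two features. */
    @TypeSpec(name = "example.Token")
    static class Token {
        @FeatureSpec String partOfSpeech;
        @FeatureSpec int begin;
        String scratch; // not annotated, so not treated as a feature
    }

    /** Returns the declared type name of an annotated class. */
    static String typeName(Class<?> cls) {
        TypeSpec spec = cls.getAnnotation(TypeSpec.class);
        if (spec == null) {
            throw new IllegalArgumentException(cls + " has no @TypeSpec");
        }
        return spec.name();
    }

    /** Returns feature name -> range type name for the annotated fields of a class. */
    static Map<String, String> describe(Class<?> cls) {
        Map<String, String> features = new TreeMap<>();
        for (Field f : cls.getDeclaredFields()) {
            if (f.isAnnotationPresent(FeatureSpec.class)) {
                features.put(f.getName(), f.getType().getSimpleName());
            }
        }
        return features;
    }
}
```

Calling `AnnotatedTypes.describe(AnnotatedTypes.Token.class)` yields the two annotated fields and skips `scratch`, so the Java class itself serves as the type system specification.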

Better support for "run-time" dynamic typing

Moving towards "dynamic" typing - see paper: http://aclweb.org/anthology/W14-5209

Supporting "combining specifications" that map type systems

Different components should be easily combinable even if they have different type systems, if a mapping can be found and specified.  For more complex mappings, custom adapters could be supported?
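For the simple case, a "combining specification" might be little more than a pair of lookup tables. The sketch below is one possible shape under that assumption; `TypeMapping` and its methods are hypothetical names, not UIMA API, and a plain `Map` stands in for a feature structure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not UIMA API: a "combining specification" expressed as
// plain lookup tables from one component's names to another's.
final class TypeMapping {
    private final Map<String, String> typeNames = new HashMap<>();
    private final Map<String, String> featureNames = new HashMap<>();

    TypeMapping mapType(String from, String to) {
        typeNames.put(from, to);
        return this;
    }

    TypeMapping mapFeature(String fromType, String fromFeature, String toFeature) {
        featureNames.put(fromType + ":" + fromFeature, toFeature);
        return this;
    }

    /** Returns the target type name; unmapped names pass through unchanged. */
    String targetType(String sourceType) {
        return typeNames.getOrDefault(sourceType, sourceType);
    }

    /** Renames the features of one structure of the given source type. */
    Map<String, Object> convert(String sourceType, Map<String, Object> features) {
        Map<String, Object> out = new HashMap<>();
        features.forEach((name, value) ->
            out.put(featureNames.getOrDefault(sourceType + ":" + name, name), value));
        return out;
    }
}
```

Anything beyond renaming (splitting one type into two, computing a feature from several others) would not fit this table form, which is where custom adapters would come in.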

Using the Web to facilitate component combinations

A user wanting to combine X with Y should be able to look it up on the web and download the adapter, or at least find 90% of the work already done. It should be easy for users to share this information on the web.

Judicious substitution of other packages for hand-built code

...