CAS Viewer Extension For Provenance Tracking of UIMA CAS Content

1. Proposal

To improve the ability to debug and maintain UIMA components, we propose to add the ability to log the updates to the CAS and the Index Repository as follows:

  • track which Analysis Engines (AEs) created or modified the feature structures in a CAS
  • track the index operations (add and delete) that each AE performs on feature structures

 The collected information can be classified as follows:

  • Call sequence to AEs
  • For each AE, a list of newly created feature structures (FSs) and a list of changes to pre-existing FSs. We refer to this information as the FS Journal.
  • For each AE, a list of FSs added to, deleted from, or modified in the index repository (if the same FS is deleted and then added back to the index, we classify it as "modified"). We refer to this information as the Index Journal.
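The "modified" rule for the Index Journal can be sketched as follows. This is only an illustration under stated assumptions, not the proposed framework API: the class IndexJournalSketch, the enums IndexOp and Change, and the integer FS ids are hypothetical names. The treatment of an FS that is added and then deleted within the same AE call is also an assumption, since the proposal does not cover that case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names belong to the UIMA framework.
class IndexJournalSketch {
    enum IndexOp { ADD, DELETE }
    enum Change { ADDED, DELETED, MODIFIED }

    // Index operations recorded during one AE call, keyed by a hypothetical FS id.
    private final Map<Integer, List<IndexOp>> opsByFs = new LinkedHashMap<>();

    void record(int fsId, IndexOp op) {
        opsByFs.computeIfAbsent(fsId, k -> new ArrayList<>()).add(op);
    }

    // Classify every FS whose index membership was touched during the AE call.
    Map<Integer, Change> classify() {
        Map<Integer, Change> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<IndexOp>> e : opsByFs.entrySet()) {
            List<IndexOp> ops = e.getValue();
            IndexOp first = ops.get(0);
            IndexOp last = ops.get(ops.size() - 1);
            if (first == IndexOp.DELETE && last == IndexOp.ADD) {
                // Deleted and added back: reported as "modified".
                result.put(e.getKey(), Change.MODIFIED);
            } else if (last == IndexOp.ADD) {
                result.put(e.getKey(), Change.ADDED);
            } else if (first == IndexOp.DELETE) {
                result.put(e.getKey(), Change.DELETED);
            }
            // Assumption: ADD followed by DELETE in the same call is a net
            // no-op and is not reported at all.
        }
        return result;
    }
}
```

Classifying by the first and last operation keeps the journal small, which matches the decision to keep only final values in the initial implementation.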

The development of this "Provenance Tracking of UIMA CAS Content" as described in the Wiki is composed of two parts:

  • providing APIs in the UIMA framework to support access of journaling information
  • visualizing the collected information

In this document, we focus on the GUI part of the visualization from the end-user's perspective.

2. Development Process

The development will be done as a tooling project of Apache UIMA with participation from the community. Since a lot of the code in the CAS Viewer (submitted as a contribution to Apache UIMA) can be reused, we propose to develop the visualization of the CAS/Index Journal as an extension to the CAS Viewer.

3. Proposed User Interface

In the following proposed GUI mockup, we illustrate the design with "deploy/as/MeetingFinderAggregate.xml" from uimaj-example of the UIMA AS package, with some modifications to its behavior. This aggregate AE has the following structure:
    MeetingFinderAggregate
        Collection Reader
        TokenAndSentence AE
        MeetingDetector Aggregate
            RoomNumber AE
            DateTime AE
            Meeting AE
        Cas Consumer
 
We assume that, after running MeetingFinderAggregate with an input document, the following basic information is produced:
 (1) A list of calls to AEs
 (2) Within the call to each AE, a list of new FSs created by this AE and a list of modified FSs
 (3) Within the call to each AE, a list of add/delete/modify FS operations on the index repository
 
The question is how to present these three kinds of basic information to developers.

Note that, for the initial implementation, we propose to preserve only the final value of an FS (intermediate values are not kept).
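To make the "final value only" policy concrete, the following is a minimal sketch of a per-AE FS Journal. It assumes that FSs are identified by an integer id and that feature values can be shown as strings. The class FsJournalSketch and its methods are hypothetical and are not part of the UIMA framework.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the FS Journal kept for one AE call.
class FsJournalSketch {
    // Ids of FSs created by this AE.
    private final Set<Integer> created = new LinkedHashSet<>();
    // fsId -> (feature name -> last value written by this AE).
    private final Map<Integer, Map<String, String>> finalValues = new LinkedHashMap<>();

    void fsCreated(int fsId) {
        created.add(fsId);
    }

    // A later write to the same feature overwrites the earlier one,
    // so intermediate values are not kept.
    void featureSet(int fsId, String feature, String value) {
        finalValues.computeIfAbsent(fsId, k -> new LinkedHashMap<>()).put(feature, value);
    }

    boolean isNew(int fsId) {
        return created.contains(fsId);
    }

    // "Modified" here means: a pre-existing FS that this AE wrote to.
    boolean isModified(int fsId) {
        return !created.contains(fsId) && finalValues.containsKey(fsId);
    }

    // Returns null when this AE never wrote the feature.
    String finalValue(int fsId, String feature) {
        Map<String, String> values = finalValues.get(fsId);
        return values == null ? null : values.get(feature);
    }
}
```

Storing one map entry per feature bounds the journal size by the number of touched features rather than by the number of writes.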
 
Based on the above example and assumptions, the following screenshots of the proposed GUI show how the journal information is visualized.

3.1 Viewing Changes to CAS

The information about CAS changes is visualized by the FS Journal tab as shown in Figure 3.1.
The sequence of AE calls is shown in the top section of the tab and is organized as a hierarchy (the key string defined in the aggregate descriptor is used to identify the AE). The number next to the AE's name is the total number of FSs added or modified by the AE. For example, "Meeting AE (2 FSs)" means that two FSs were added or modified by the Meeting AE.
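The labelled hierarchy described above could be built from a simple tree of AE calls, as in the sketch below. The class AeCallNode is hypothetical. Showing an aggregate with its key only (no count) is an assumption, because the text defines the count only for an AE such as the Meeting AE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node for one AE call in the FS Journal tab.
class AeCallNode {
    final String key;          // key string from the aggregate descriptor
    final int changedFsCount;  // FSs added or modified by this AE
    final List<AeCallNode> children = new ArrayList<>();

    AeCallNode(String key, int changedFsCount) {
        this.key = key;
        this.changedFsCount = changedFsCount;
    }

    AeCallNode add(AeCallNode child) {
        children.add(child);
        return this;
    }

    // Assumption: only nodes without delegates show a count.
    String label() {
        return children.isEmpty() ? key + " (" + changedFsCount + " FSs)" : key;
    }

    // Renders the subtree, one node per line, indented by depth.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("    ");
        }
        sb.append(label()).append('\n');
        for (AeCallNode child : children) {
            child.render(sb, depth + 1);
        }
    }
}
```

For example, `new AeCallNode("MeetingDetector Aggregate", 0).add(new AeCallNode("Meeting AE", 2)).render()` yields the aggregate's key on the first line and an indented "Meeting AE (2 FSs)" on the second.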

Figure 3.1. For viewing changes to FS in the CAS

Since the list of FSs can be long (e.g., a few thousand Token annotations), the FSs are collapsed under a node for their type name, and the number at the end of the name indicates the total number of added/modified FSs, as shown in Figure 3.2.a.

When a type name node is expanded (by clicking on the + sign), the added and modified FSs are revealed as shown in Figure 3.2.b.

We use the (+), (-) and (~) signs to represent added, deleted, and modified FSs, respectively. Note that the FS Journal never contains deleted FSs, so the (-) sign applies only to the Index Journal.
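The grouping by type name and the sign prefixes can be sketched as follows. The names TypeGroupingSketch and JournalEntry are hypothetical. The collapsed label format "TypeName (N)" is an assumption based on the description above, not taken from the mockup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the type-name grouping used in the journal tree.
class TypeGroupingSketch {
    enum Change {
        ADDED("(+)"), DELETED("(-)"), MODIFIED("(~)");

        final String sign;

        Change(String sign) {
            this.sign = sign;
        }
    }

    static final class JournalEntry {
        final String typeName;
        final String text;   // display text for the FS, e.g. its covered text
        final Change change;

        JournalEntry(String typeName, String text, Change change) {
            this.typeName = typeName;
            this.text = text;
            this.change = change;
        }
    }

    // Groups entries by type name, preserving first-seen order of types.
    static Map<String, List<JournalEntry>> groupByType(List<JournalEntry> entries) {
        Map<String, List<JournalEntry>> groups = new LinkedHashMap<>();
        for (JournalEntry e : entries) {
            groups.computeIfAbsent(e.typeName, k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    // Label of a collapsed type node (assumed format).
    static String collapsedLabel(String typeName, List<JournalEntry> group) {
        return typeName + " (" + group.size() + ")";
    }

    // Label of a single FS once its type node is expanded.
    static String expandedLabel(JournalEntry e) {
        return e.change.sign + " " + e.text;
    }
}
```

Because only the collapsed labels are needed up front, a viewer built this way could defer creating the per-FS nodes until a type node is expanded.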

Figure 3.2.a. List of type nodes containing changed FSs


 
