To be Reviewed By: 22 Aug, 2019

Authors: Anil Gingade

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

The Geode system requires AEQs to be configured before regions are created. If an AEQ listener operates on a secondary region, the listener may run against a region that is not yet created or, for a region with co-located regions, not yet fully initialized. This can result in missed events or in a deadlock between region-creation threads (the threads creating the regions and their co-located regions in the listener). The scenario is most likely during persistence recovery: the AEQs are created at startup and the recovered AEQ events are dispatched immediately, which invokes the AEQ listeners.

Anti-Goals

None

Solution

The proposed solution is to provide a way to control the dispatching of AEQ events to the AEQ listeners. This can be done by adding "pause" and "resume" capability to the AEQ, which allows the application to decide when events are dispatched to the listeners.

The proposal is similar to the existing "pause" and "resume" behavior on the GatewaySender, on which the AEQ is based (the AEQ implementation is a wrapper around GatewaySender).

Changes and Additions to Public Interfaces

The proposed APIs are:

On "AsyncEventQueueFactory" interface -

AsyncEventQueueFactory pauseEventDispatchToListener();  // Causes the AEQ to be created in a paused state.

On "AsyncEventQueue" interface -

boolean resumeEventDispatchToListener();  // Returns true if event dispatch was resumed successfully, false otherwise.
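
The intended usage can be sketched with a toy model. The code below is not Geode code and does not use the Geode library: ToyAsyncEventQueue, ToyAsyncEventQueueFactory, PausedAeqDemo, enqueue, create, pendingCount and the map standing in for regions are all hypothetical names made up for illustration. Only the two method names come from this proposal. The return value of resumeEventDispatchToListener() is assumed here to be true when the call moved the queue out of the paused state and false when it was not paused.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Toy stand-in for an AEQ; NOT Geode code.
class ToyAsyncEventQueue {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Consumer<String> listener;
    private boolean paused;

    ToyAsyncEventQueue(Consumer<String> listener, boolean paused) {
        this.listener = listener;
        this.paused = paused;
    }

    // Events are always queued; they reach the listener only when not paused.
    synchronized void enqueue(String event) {
        pending.add(event);
        if (!paused) {
            drain();
        }
    }

    // Assumed semantics: true if this call resumed dispatch, false if not paused.
    synchronized boolean resumeEventDispatchToListener() {
        if (!paused) {
            return false;
        }
        paused = false;
        drain();
        return true;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Hands every queued event to the listener, oldest first.
    private void drain() {
        String event;
        while ((event = pending.poll()) != null) {
            listener.accept(event);
        }
    }
}

// Toy stand-in for the factory; NOT Geode code.
class ToyAsyncEventQueueFactory {
    private boolean startPaused;

    ToyAsyncEventQueueFactory pauseEventDispatchToListener() {
        startPaused = true;
        return this;
    }

    ToyAsyncEventQueue create(Consumer<String> listener) {
        return new ToyAsyncEventQueue(listener, startPaused);
    }
}

class PausedAeqDemo {
    // Replays the recovery scenario and returns what the listener wrote.
    static List<String> run() {
        Map<String, List<String>> regions = new HashMap<>();
        // The listener writes to a second "region"; it would throw a
        // NullPointerException if invoked before that region exists.
        Consumer<String> listener = e -> regions.get("secondary").add(e);

        ToyAsyncEventQueue aeq = new ToyAsyncEventQueueFactory()
                .pauseEventDispatchToListener()
                .create(listener);

        aeq.enqueue("recovered-1"); // recovered events arrive before the region exists
        aeq.enqueue("recovered-2");

        regions.put("secondary", new ArrayList<>()); // region creation completes
        aeq.resumeEventDispatchToListener();         // listener now sees both events
        return regions.get("secondary");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [recovered-1, recovered-2]
    }
}
```

In this sketch, creating the queue without pauseEventDispatchToListener() would invoke the listener on the first enqueue, before the "secondary" entry exists, which is the toy equivalent of the problem described above.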


The constraints on pauseEventDispatchToListener() remain similar to those of "GatewaySender.pause()":

"It should be kept in mind that the events will still be getting queued into the queue. The scope of this operation is the VM on which it is invoked. In case the AEQ is parallel, the AEQ will be paused on individual node where this API is called and the AEQ on other VM's can still dispatch events. In case the AEQ is not parallel, and the running AEQ on which this API is invoked is not primary then primary AEQ will still continue dispatching events."

Performance Impact

This has performance and resource implications similar to those of the "GatewaySender.pause()" functionality. If the AEQ is kept in the paused state for a long time, or is never resumed, queued events will consume the configured memory and then overflow to disk, which may eventually fill the disk.
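
The growth of the backlog while paused can be illustrated with a small counting model. This is a sketch under stated assumptions, not Geode code: PausedBacklogSketch, maxInMemory, enqueueWhilePaused and the second in-process deque standing in for disk are hypothetical, and real AEQ memory limits are expressed in configured memory rather than an event count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model, NOT Geode code: shows a paused queue filling its in-memory
// budget and then spilling into an "overflow" area standing in for disk.
class PausedBacklogSketch {
    private final int maxInMemory; // hypothetical stand-in for the configured memory limit
    private final Deque<String> inMemory = new ArrayDeque<>();
    private final Deque<String> overflow = new ArrayDeque<>();

    PausedBacklogSketch(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    // While paused nothing is dispatched, so every event stays queued.
    void enqueueWhilePaused(String event) {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(event);
        } else {
            overflow.add(event);
        }
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    int overflowCount() {
        return overflow.size();
    }

    public static void main(String[] args) {
        PausedBacklogSketch q = new PausedBacklogSketch(3);
        for (int i = 0; i < 10; i++) {
            q.enqueueWhilePaused("event-" + i);
        }
        // prints: in memory: 3, overflowed: 7
        System.out.println("in memory: " + q.inMemoryCount()
                + ", overflowed: " + q.overflowCount());
    }
}
```

The overflow count keeps rising for as long as the queue stays paused, which is why an application that creates an AEQ in the paused state should resume it as soon as its prerequisites are in place.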

Backwards Compatibility and Upgrade Path

Impact with rolling upgrade:

Because the API applies at the individual VM level, no message serialization changes are involved. It affects only the events dispatched to the listeners on that VM, and AEQs that are replicated (for redundancy) continue to work as before.

Backward compatibility requirements:

None. The AEQs are configured and managed at the server side. There is no messaging involved between client/server.

Disk formatting changes:

None.

Deprecation and Application Changes:

None. If needed, an existing application can be modified to control event dispatch to the AEQ listener.

Prior Art

Without this, AEQ listeners operating on other regions could experience missed events or a deadlock if there are co-located regions.

This approach is simple and takes advantage of functionality that is already supported in GatewaySender, on which the AEQ is based.

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?


1 Comment

  1. Summary from email discussion thread:

    Thanks for all the great feedback and comments.

    API Name change:
    Suggestion: startPaused, setManualStart, startWithEventDispatcherPaused?, createPaused()

    Start/Stop behavior:
    - Manual start has caused a lot of trouble over the years.
    - Explain how starting an AEQ in a paused state differs from creating gateway senders with manual start.


    Yes, we can adopt a name that better conveys the functionality. The suggested name, "pauseEventDispatchToListener()", is meant to make the action clear and to remove any ambiguity between adding/removing events from the AEQ and dispatching events to the AEQ listener.

    To emphasize, this is not the same as GatewaySender manual start (start/stop). Manual start/stop controls enqueuing and dequeuing events on the GatewaySender itself and, as Mike pointed out, there are issues with this (during recovery with parallel gateway), which is why it has been deprecated.

    The new functionality is similar to the "pause" and "resume" operations on the GatewaySender, except that with the new API the AEQ is created in a paused state.

    The new API does not control adding and removing events from the AEQ; it controls dispatching events to the listener. When the AEQ is created in a paused state, events continue to be added to the AEQ and removed from it (expiry). The new API allows applications to create/manage any required state/resource for the events before processing those events in the application code.

    Cache level setting:
    - Would it be more feasible to set the flag at the cache level?

    A cache-level configuration affects all the AEQs, which may not be what is required. Setting it at the AEQ level lets the application use this capability only on the AEQs that need it, which gives finer control.