To be Reviewed By: 22 Aug, 2019
Authors: Anil Gingade
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
JIRA: GEODE-7121
Geode requires AEQs to be configured before regions are created. If an AEQ listener operates on a secondary region, the listener can end up operating on a region that is not yet created or not fully initialized (for a region with co-located regions). This can result in missed events or a deadlock between region creation threads (threads creating regions and their co-located regions in the listener). The scenario is most likely during persistence recovery: because the AEQs are created at startup, the recovered AEQ events are dispatched immediately, which invokes the AEQ listeners.
Anti-Goals
None
Solution
The proposed solution is to provide a way to control the dispatching of AEQ events to the AEQ listeners. This can be done by adding "pause" and "resume" capability to the AEQ, which allows the application to decide when to dispatch events to the listeners.
The proposal is similar to the existing "pause" and "resume" behavior of the GatewaySender, on which the AEQ is based (the AEQ implementation is a wrapper around GatewaySender).
Changes and Additions to Public Interfaces
The proposed APIs are:
On "AsyncEventQueueFactory" interface -
AsyncEventQueueFactory pauseEventDispatchToListener(); // Causes the AEQ to be created in a paused state.
On "AsyncEventQueue" interface -
boolean resumeEventDispatchToListener(); // Returns true if event dispatch was resumed successfully, false otherwise.
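The intended behavior of the two proposed methods can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Geode code: the class `PausableQueueSketch`, its nested `Factory`, the `enqueue` and `size` methods, and the use of a `Consumer<List<String>>` as the listener are hypothetical stand-ins. Only the method names `pauseEventDispatchToListener()` and `resumeEventDispatchToListener()` come from the proposal. The model is single-threaded and dispatches synchronously, whereas a real AEQ dispatches on its own threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical, single-threaded model of the proposed pause/resume semantics.
// Not Geode code: only the two *EventDispatchToListener method names are from the proposal.
class PausableQueueSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Consumer<List<String>> listener;
    private boolean paused;

    private PausableQueueSketch(Consumer<List<String>> listener, boolean paused) {
        this.listener = listener;
        this.paused = paused;
    }

    // Stand-in for AsyncEventQueueFactory.
    static class Factory {
        private boolean startPaused;

        // Mirrors the proposed factory method: the queue is created in the paused state.
        Factory pauseEventDispatchToListener() {
            startPaused = true;
            return this;
        }

        PausableQueueSketch create(Consumer<List<String>> listener) {
            return new PausableQueueSketch(listener, startPaused);
        }
    }

    // Events are always queued; they reach the listener only when not paused.
    void enqueue(String event) {
        pending.add(event);
        if (!paused) {
            dispatchPending();
        }
    }

    // Mirrors the proposed resume method. In this toy model resuming cannot fail,
    // so it always returns true.
    boolean resumeEventDispatchToListener() {
        paused = false;
        dispatchPending();
        return true;
    }

    int size() {
        return pending.size();
    }

    // Hands every queued event to the listener as one batch, in arrival order.
    private void dispatchPending() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        listener.accept(batch);
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        PausableQueueSketch queue = new Factory()
                .pauseEventDispatchToListener()
                .create(delivered::addAll);

        queue.enqueue("recovered-event-1");
        queue.enqueue("recovered-event-2");
        System.out.println("queued while paused: " + queue.size());

        // ... the application would create its regions (and co-located regions) here ...

        queue.resumeEventDispatchToListener();
        System.out.println("delivered after resume: " + delivered);
    }
}
```

The ordering in `main` is the point of the proposal: the queue exists and accepts events before the regions are ready, but the listener is not invoked until the application explicitly resumes dispatch. Because the paused flag lives in each queue instance, pausing one instance in this model has no effect on any other instance.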
The constraints on pauseEventDispatchToListener() remain similar to those of "GatewaySender.pause()":
"It should be kept in mind that events will still be queued while dispatch is paused. The scope of this operation is the VM on which it is invoked. If the AEQ is parallel, it is paused only on the individual node where this API is called, and the AEQ on other VMs can still dispatch events. If the AEQ is not parallel and the running AEQ on which this API is invoked is not the primary, then the primary AEQ will continue dispatching events."
Performance Impact
This will have performance and resource implications similar to those of the "GatewaySender.pause()" functionality. If the AEQ is never resumed, or is kept in the paused state for a long time, it may consume the configured memory, overflow to disk, and eventually fill the disk.
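As a rough illustration of this resource concern, the sketch below counts where events end up while a queue stays paused. It is a hypothetical model, not Geode code: the class `PausedQueueGrowthSketch` and its methods are invented for illustration, capacity is expressed as a number of events rather than as the configured queue memory, and overflow to disk is represented only by a counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a paused queue: nothing is dispatched, so every event
// either occupies memory or is counted as overflowed. Not Geode code.
class PausedQueueGrowthSketch {
    private final int memoryCapacity;      // assumed cap, measured in events
    private final Deque<String> inMemory = new ArrayDeque<>();
    private long overflowedToDisk;         // events that did not fit in memory

    PausedQueueGrowthSketch(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
    }

    void enqueueWhilePaused(String event) {
        if (inMemory.size() < memoryCapacity) {
            inMemory.add(event);
        } else {
            overflowedToDisk++;            // keeps growing until dispatch is resumed
        }
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    long overflowedCount() {
        return overflowedToDisk;
    }

    public static void main(String[] args) {
        PausedQueueGrowthSketch queue = new PausedQueueGrowthSketch(3);
        for (int i = 0; i < 10; i++) {
            queue.enqueueWhilePaused("event-" + i);
        }
        System.out.println("in memory: " + queue.inMemoryCount()
                + ", overflowed: " + queue.overflowedCount());
    }
}
```

In this model the in-memory count stops at the cap while the overflow counter keeps rising, which is why an application that pauses dispatch should resume it as soon as its regions are ready.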
Backwards Compatibility and Upgrade Path
Impact with rolling upgrade:
Because the API applies at the individual VM level, no message serialization changes are involved. It affects only the events dispatched to the listeners on that VM. AEQs that are replicated (for redundancy) continue to work as before.
Backward compatibility requirements:
None. AEQs are configured and managed on the server side. No messaging between client and server is involved.
Disk formatting changes:
None.
Deprecation and Application Changes:
None. If needed, an existing application can be modified to control event dispatch to the AEQ listener.
Prior Art
Without this, AEQ listeners operating on other regions could experience missed events or a deadlock if there are co-located regions.
This approach is simple and takes advantage of functionality that is already supported in GatewaySender, on which the AEQ is based.
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
Errata
What are minor adjustments that had to be made to the proposal since it was approved?