Avoid the queuing of dropped events by the primary gateway sender when the gateway sender is stopped
To be Reviewed By: July 9th, 2020
Authors: Alberto Gomez (alberto.gomez@est.tech)
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
Gateway senders drop all events received when they are stopped. Nevertheless, primary gateway senders, while stopped, store all events received in the tmpDroppedEvents member variable of the AbstractGatewaySender class. These events are stored so that they can be sent later (when the primary gateway sender is started) to the secondary gateway senders in order for them to remove those events from their queues. If it were not so, secondary gateway senders could have events in their queues that would never be removed.
This feature was implemented as a solution to prevent secondary gateway senders from leaving undrained events after GII. This solution works well when stopped gateway senders do not remain in that state for long, e.g., when they are stopped but in the process of starting. However, if a gateway sender is stopped and left in that state for some time, the incoming events reaching the primary gateway sender will accumulate in the mentioned member variable of AbstractGatewaySender and could eventually exhaust the heap. Moreover, events dropped while the gateway sender is stopped will not be queued by secondary gateway senders, which makes storing them in the primary gateway sender unnecessary.
Stopping a gateway sender is an action that may be used to avoid filling the gateway sender queues in long-lasting split-brain situations. However, with the current implementation it is not effective: incoming events are still stored by the primary gateway senders, with higher memory consumption than when the sender is running (queued events may be overflowed to disk, whereas dropped events are kept in memory) and with a very high risk of heap memory exhaustion.
Anti-Goals
As described above, dropped events in the primary gateway sender are stored in a member variable. It is out of the scope of this RFC to change how those events are stored.
Solution
The proposed solution is to not store dropped events when a gateway sender is stopped and not in the process of starting, given that these events could never end up in the queue of any secondary gateway sender and would use memory unnecessarily.
In order to do so, it is proposed to add a new boolean member variable in the AbstractGatewaySender that will tell if the primary gateway sender must store dropped events or not.
This flag will be set to false (do not store dropped events) in all gateway sender instances (primary and secondaries) after a stop gateway sender command using gfsh has successfully completed. It will be set to true in all gateway sender instances (primary and secondaries) as a prior step to the start gateway sender gfsh command.
A draft PR of the solution can be found here: https://github.com/apache/geode/pull/5348
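The behaviour described in this section can be illustrated with a minimal sketch. It is not code from the draft PR: SketchGatewaySender is a hypothetical stand-in for AbstractGatewaySender, the flag name mustQueueDroppedEvents and all method names are assumptions made for illustration, and events are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AbstractGatewaySender; not Geode's real class.
class SketchGatewaySender {
  // Dropped events kept so secondaries can later remove them from their queues.
  private final List<String> tmpDroppedEvents = new ArrayList<>();
  private boolean running = false;
  // Assumed name for the proposed flag: true means "store dropped events".
  private boolean mustQueueDroppedEvents = true;

  void setMustQueueDroppedEvents(boolean value) {
    mustQueueDroppedEvents = value;
  }

  void start() {
    running = true;
  }

  void stop() {
    running = false;
  }

  // Returns true if the event is accepted for distribution, false if dropped.
  boolean distribute(String eventId) {
    if (running) {
      return true;
    }
    // Sender is stopped: the event is dropped. It is remembered only while
    // the flag says the sender is expected to start soon.
    if (mustQueueDroppedEvents) {
      tmpDroppedEvents.add(eventId);
    }
    return false;
  }

  int droppedEventCount() {
    return tmpDroppedEvents.size();
  }
}
```

With the flag set to false, a sender that stays stopped for a long time drops incoming events without retaining them, so its memory use does not grow with the number of events received.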
Changes and Additions to Public Interfaces
No changes to public interfaces are proposed.
Performance Impact
As the proposal implies changing the implementation of the gfsh start gateway sender and stop gateway sender commands so that each is done in two steps, these commands may be slightly slower, but not significantly.
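The two-step shape of the commands can be sketched as follows. This is an assumption-laden illustration, not the gfsh implementation: MemberSender and TwoStepCommands are hypothetical names, each list element stands for the gateway sender instance on one member, and the remote function execution that a real command would need is replaced by plain loops.

```java
import java.util.List;

// Hypothetical per-member view of a gateway sender; not a Geode class.
class MemberSender {
  boolean running = false;
  boolean storeDroppedEvents = true;
}

// Hypothetical sketch of the two-step start and stop commands.
class TwoStepCommands {
  static void stopGatewaySender(List<MemberSender> members) {
    // Step 1: stop the sender on every member.
    for (MemberSender m : members) {
      m.running = false;
    }
    // Step 2: once the stop has completed, tell every instance
    // (primary and secondaries) to stop storing dropped events.
    for (MemberSender m : members) {
      m.storeDroppedEvents = false;
    }
  }

  static void startGatewaySender(List<MemberSender> members) {
    // Step 1: re-enable storing of dropped events on every instance
    // before any sender starts.
    for (MemberSender m : members) {
      m.storeDroppedEvents = true;
    }
    // Step 2: start the sender on every member.
    for (MemberSender m : members) {
      m.running = true;
    }
  }
}
```

The extra pass over the members in each command is the source of the small additional latency mentioned above.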
Backwards Compatibility and Upgrade Path
The proposal has no impact on the regular rolling upgrade process.
Prior Art
-
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
Errata
What are minor adjustments that had to be made to the proposal since it was approved?