Avoid the queuing of dropped events by the primary gateway sender when the gateway sender is stopped
To be Reviewed By: July 9th, 2020
Authors: Alberto Gomez (alberto.gomez@est.tech)
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
Gateway senders drop all events received when they are stopped. Nevertheless, primary gateway senders, while stopped, store all events received in the tmpDroppedEvents member variable of the AbstractGatewaySender class. These events are stored so that they can be sent later (when the primary gateway sender is started) to the secondary gateway senders in order for them to remove those events from their queues. If it were not so, secondary gateway senders could have events in their queues that would never be removed.
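The behavior described above can be pictured with a minimal sketch. It is a simplified model rather than the actual Geode code: PrimarySketch, SecondarySketch, and all of their methods are hypothetical names introduced only for illustration; only tmpDroppedEvents comes from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a secondary gateway sender and its queue.
class SecondarySketch {
    final Set<String> queue = new LinkedHashSet<>();

    void enqueue(String eventId) {
        queue.add(eventId);
    }

    // Called when the primary reports events it dropped.
    void removeDropped(List<String> eventIds) {
        queue.removeAll(eventIds);
    }
}

// Hypothetical stand-in for a primary gateway sender.
class PrimarySketch {
    // Mirrors the role of tmpDroppedEvents: grows while the sender is not running.
    private final List<String> tmpDroppedEvents = new ArrayList<>();
    private boolean running = false;

    void onEvent(String eventId) {
        if (!running) {
            // The event is dropped, but its id is remembered for later.
            tmpDroppedEvents.add(eventId);
        }
        // (A running sender would queue the event; omitted in this sketch.)
    }

    void start(SecondarySketch secondary) {
        running = true;
        // Tell the secondary which events were dropped so it can drain them.
        secondary.removeDropped(tmpDroppedEvents);
        tmpDroppedEvents.clear();
    }

    int storedDroppedEvents() {
        return tmpDroppedEvents.size();
    }
}
```

In this model nothing bounds the size of tmpDroppedEvents: it keeps growing for as long as the primary stays stopped and events keep arriving.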
This feature was implemented in
as a solution to prevent secondary gateway senders from leaving un-drained events after GII.
This solution works well when stopped gateway senders do not remain in that state for long, e.g., when they are stopped but in the process of starting. However, if a gateway sender is stopped and left in that state for some time, the incoming events reaching the primary gateway sender will be stored in the mentioned member variable of AbstractGatewaySender and could eventually exhaust the heap. Moreover, secondary gateway senders do not queue the events dropped while the gateway sender is stopped, which makes storing them in the primary gateway sender unnecessary.
Stopping a gateway sender is an action that may be used to keep gateway sender queues from filling up in long-lasting split-brain situations. However, with the current implementation it is not effective: incoming events are still stored by the primary gateway senders, using at least as much memory as the events queued by a running sender (possibly more if overflow to disk is configured), with a very high risk of heap memory exhaustion.
Anti-Goals
As described above, dropped events in the primary gateway sender are stored in a member variable. It is out of the scope of this RFC to change how those events are stored.
Solution
The solution proposes to change the primary gateway sender so that it does not store dropped events when it has been stopped explicitly (as opposed to being stopped while starting). The reason is that these events can never end up in the queue of any secondary gateway sender, so storing them uses memory unnecessarily.
In order to do so, it is proposed to add a new boolean member variable to the AbstractGatewaySender that will tell if the primary gateway sender must store dropped events or not.
- This member variable will be set to false (do not store dropped events) in the primary and secondary gateway sender instances in a second step added to the gfsh stop gateway sender command, right after the stop of the gateway senders has completed.
- This member variable will be set to true (store dropped events) in the primary and secondary gateway sender instances in a prior step added to the gfsh start gateway sender command, before the start of the senders is executed.
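As a rough illustration of the two bullets above, the following sketch models the proposed flag and the two-step commands. It is not the code from the draft PR: FlaggedSenderSketch, mustQueueDroppedEvents, stopCommand, and startCommand are hypothetical names chosen for this example, and the gfsh commands are reduced to plain static methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a gateway sender with the proposed flag.
class FlaggedSenderSketch {
    private final List<String> tmpDroppedEvents = new ArrayList<>();
    private final List<String> queue = new ArrayList<>();
    private boolean running = false;
    // Assumed name for the new boolean member; true means "store dropped events".
    private volatile boolean mustQueueDroppedEvents = true;

    void setMustQueueDroppedEvents(boolean value) {
        mustQueueDroppedEvents = value;
    }

    void start() {
        running = true;
    }

    void stop() {
        running = false;
    }

    void distribute(String eventId) {
        if (running) {
            queue.add(eventId);
        } else if (mustQueueDroppedEvents) {
            // Not running but not explicitly stopped (e.g. starting): remember the event.
            tmpDroppedEvents.add(eventId);
        }
        // Explicitly stopped: the event is dropped without being stored.
    }

    int storedDroppedEvents() {
        return tmpDroppedEvents.size();
    }

    int queuedEvents() {
        return queue.size();
    }

    // Stand-in for the stop command: stop first, then clear the flag.
    static void stopCommand(FlaggedSenderSketch sender) {
        sender.stop();
        sender.setMustQueueDroppedEvents(false);
    }

    // Stand-in for the start command: set the flag first, then start.
    static void startCommand(FlaggedSenderSketch sender) {
        sender.setMustQueueDroppedEvents(true);
        sender.start();
    }
}
```

With this ordering, events that arrive after an explicit stop are never stored, while events that arrive between the first and second step of the start command are stored as they are today.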
A draft PR of the solution can be found here: https://github.com/apache/geode/pull/5348
Changes and Additions to Public Interfaces
No changes to public interfaces are proposed.
Performance Impact
As the proposal implies changing the implementation of the gfsh start gateway sender and stop gateway sender commands to be done in two steps, these commands may be slightly slower, but not significantly.
Backwards Compatibility and Upgrade Path
The proposal does not affect rolling upgrades and has no impact on the regular rolling upgrade process.
Prior Art
-
FAQ
Answers to questions you’ve commonly been asked after requesting comments for this proposal.
Errata
What are minor adjustments that had to be made to the proposal since it was approved?