Avoid the queuing of dropped events by the primary gateway sender when the gateway sender is stopped

To be Reviewed By: July 9th, 2020

...

Superseded by: N/A

Related: N/A

Problem

Primary gateway senders drop all events received while they are stopped. Nevertheless, while stopped, they store all events received in the ```tmpDroppedEvents``` member variable of the ```AbstractGatewaySender``` class. These events are stored so that they can be sent later (when the primary gateway sender is started) to the secondary gateway senders, which then remove those events from their queues. Otherwise, secondary gateway senders could have events in their queues that would never be removed.
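The behaviour described above can be pictured with a minimal sketch. This is not the real ```AbstractGatewaySender``` code: only the field name ```tmpDroppedEvents``` comes from the text, while the class name, the methods and the use of plain strings as events are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a primary gateway sender.
// Only the field name tmpDroppedEvents is taken from the RFC text.
class StoppedPrimarySenderSketch {
  private final List<String> tmpDroppedEvents = new ArrayList<>();
  private boolean running = false;

  /** Returns true if the event would be queued, false if it is dropped. */
  boolean distribute(String eventId) {
    if (running) {
      return true; // normal queuing path, not modelled here
    }
    // Stopped: the event is dropped, but it is remembered on the heap.
    tmpDroppedEvents.add(eventId);
    return false;
  }

  /** On start, the remembered events are handed over so secondaries can remove them. */
  List<String> start() {
    running = true;
    List<String> toNotify = new ArrayList<>(tmpDroppedEvents);
    tmpDroppedEvents.clear();
    return toNotify;
  }

  int droppedEventCount() {
    return tmpDroppedEvents.size();
  }

  public static void main(String[] args) {
    StoppedPrimarySenderSketch sender = new StoppedPrimarySenderSketch();
    for (int i = 0; i < 1000; i++) {
      sender.distribute("event-" + i);
    }
    System.out.println("Dropped events held in memory: " + sender.droppedEventCount());
    System.out.println("Events handed to secondaries on start: " + sender.start().size());
  }
}
```

In this model the list grows with every event received while the sender is stopped and is only emptied when the sender starts.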

This feature was implemented in GEODE-4942 (ASF JIRA) to prevent secondary gateway senders from being left with undrained events after GII.

This solution works well when gateway senders do not remain stopped for long, e.g., when they are stopped but in the process of starting. However, if a gateway sender is stopped (for example, using gfsh) and left in that state for some time, the incoming events reaching the primary gateway senders are stored in the ```tmpDroppedEvents``` member variable of ```AbstractGatewaySender``` and will eventually exhaust the heap. Moreover, events dropped while the gateway sender is stopped are not queued by the secondary gateway senders, which makes storing them in the primary gateway sender unnecessary.

Stopping a gateway sender is an action that may be used to keep gateway sender queues from filling up in long-lasting split-brain situations. With the current implementation, however, it is not effective: incoming events are still stored by the primary gateway senders, with higher memory consumption than when the sender is running (queued events may overflow to disk), and with a very high risk of heap memory exhaustion.

Anti-Goals

As described above, dropped events in the primary gateway sender are stored in a member variable. It is out of the scope of this RFC to change how those events are stored.

Solution

The proposed solution is to stop storing dropped events when a gateway sender is stopped and not in the process of starting, given that these events can never end up in the queue of any secondary gateway sender and would use memory unnecessarily.

To do so, it is proposed to add a new boolean member variable to ```AbstractGatewaySender``` that indicates whether the primary gateway sender must store dropped events.

This flag will be set to false (do not store dropped events) in all gateway sender instances (primary and secondaries) after a ```stop gateway sender``` gfsh command has completed successfully. It will be set to true in all gateway sender instances (primary and secondaries) as a prior step of the ```start gateway sender``` gfsh command.
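A minimal sketch of how such a flag could guard the storing of dropped events is shown below. It is an illustration under stated assumptions, not the code of the draft PR: the flag name ```storeDroppedEvents```, its default value of true, and all class and method names are hypothetical; only ```tmpDroppedEvents``` is named in this proposal. The sketch is single-threaded and ignores the synchronization a real sender would need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for AbstractGatewaySender.
// Only tmpDroppedEvents is named in the RFC; every other name is assumed.
class FlaggedSenderSketch {
  private final List<String> tmpDroppedEvents = new ArrayList<>();
  private boolean running = false;
  // Assumed default: true, so a sender that is still starting keeps dropped events.
  private boolean storeDroppedEvents = true;

  void setStoreDroppedEvents(boolean store) {
    this.storeDroppedEvents = store;
  }

  void start() {
    running = true;
  }

  void stop() {
    running = false;
  }

  void distribute(String eventId) {
    if (running) {
      return; // normal queuing path, not modelled here
    }
    if (storeDroppedEvents) {
      tmpDroppedEvents.add(eventId); // stopped but expected to start: remember the event
    }
    // Otherwise the event is discarded and uses no heap.
  }

  int storedDroppedEvents() {
    return tmpDroppedEvents.size();
  }

  public static void main(String[] args) {
    FlaggedSenderSketch sender = new FlaggedSenderSketch();
    sender.distribute("received-while-starting"); // stored, flag is still true

    sender.stop();
    sender.setStoreDroppedEvents(false); // what the stop command would do afterwards
    for (int i = 0; i < 1000; i++) {
      sender.distribute("event-" + i); // discarded
    }
    System.out.println("Stored dropped events: " + sender.storedDroppedEvents());
  }
}
```

With the flag set to false, the number of stored events no longer depends on how long the sender stays stopped.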

A draft PR of the solution can be found here: https://github.com/apache/geode/pull/5348

Changes and Additions to Public Interfaces

No changes to public interfaces are proposed.

Performance Impact

As the proposal implies changing the implementation of the gfsh ```start gateway sender``` and ```stop gateway sender``` commands so that each runs in two steps, these commands may become slightly, but not significantly, slower.
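The two-step ordering can be sketched as follows. This is a hypothetical model of the command flow, not the gfsh implementation: the ```Member``` interface, its methods and the recording helper are all assumed names, and real commands would execute functions on cluster members rather than call local objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two-step start and stop commands.
class TwoStepCommandSketch {
  /** Assumed view of one member hosting a gateway sender instance. */
  interface Member {
    void setStoreDroppedEvents(boolean store);
    void startSender();
    void stopSender();
  }

  /** Start: first enable storing on every member, then start the senders. */
  static void startGatewaySender(List<Member> members) {
    for (Member m : members) {
      m.setStoreDroppedEvents(true);
    }
    for (Member m : members) {
      m.startSender();
    }
  }

  /** Stop: first stop the senders, then disable storing on every member. */
  static void stopGatewaySender(List<Member> members) {
    for (Member m : members) {
      m.stopSender();
    }
    for (Member m : members) {
      m.setStoreDroppedEvents(false);
    }
  }

  /** Test double that records the order of calls in a shared log. */
  static class RecordingMember implements Member {
    private final String name;
    private final List<String> log;

    RecordingMember(String name, List<String> log) {
      this.name = name;
      this.log = log;
    }

    public void setStoreDroppedEvents(boolean store) {
      log.add(name + ":flag=" + store);
    }

    public void startSender() {
      log.add(name + ":start");
    }

    public void stopSender() {
      log.add(name + ":stop");
    }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    List<Member> members = new ArrayList<>();
    members.add(new RecordingMember("A", log));
    members.add(new RecordingMember("B", log));

    stopGatewaySender(members);
    startGatewaySender(members);
    System.out.println(log);
  }
}
```

Each command visits every member twice, which is where the small extra cost mentioned above would come from.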

Backwards Compatibility and Upgrade Path

The proposal does not affect the rolling upgrade process and has no impact on backwards compatibility.

Prior Art

N/A

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?