Persist gateway-sender startup-action within Cluster Configuration

To be Reviewed By: January 20th, 2021

Authors: Jakov Varenina

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

Apache Geode members do not persist the gateway-sender state after it is changed with the pause, stop, resume or start gateway-sender command. Currently a server will by default try to start the gateway-sender automatically at startup, unless it is configured differently with the deprecated parameter manual-start (see the create gateway-sender command). Because of this limitation, a user who issues the stop gateway-sender command (for example, to protect the server from running out of memory when the queues are growing quickly) cannot be sure that the gateway-sender will stay stopped after a server restart.

Anti-Goals

What is outside the scope of what the proposal is trying to solve?

Solution

A new startup-action parameter with the values stop, pause and start will be persisted at runtime when the following commands are issued:

pause gateway-sender --> startup-action="pause"
stop gateway-sender --> startup-action="stop"
start gateway-sender --> startup-action="start"
resume gateway-sender --> startup-action="start"
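
The mapping above can be sketched as a small lookup. This is only an illustration of the proposed rule: the class, enum and method names are hypothetical and are not part of the Geode API, and the command strings are used as plain keys rather than parsed gfsh input.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the command -> startup-action mapping; not Geode API.
final class StartupActionMapping {
  enum StartupAction { START, STOP, PAUSE }

  // One entry per gfsh command listed in the proposal.
  private static final Map<String, StartupAction> BY_COMMAND = Map.of(
      "pause gateway-sender", StartupAction.PAUSE,
      "stop gateway-sender", StartupAction.STOP,
      "start gateway-sender", StartupAction.START,
      "resume gateway-sender", StartupAction.START);

  // Returns the startup-action that would be persisted for the given command.
  static StartupAction forCommand(String command) {
    StartupAction action = BY_COMMAND.get(command);
    if (action == null) {
      throw new IllegalArgumentException("No startup-action for command: " + command);
    }
    return action;
  }

  // Renders the value as it appears in the proposal ("start", "stop", "pause").
  static String attributeValue(StartupAction action) {
    return action.name().toLowerCase(Locale.ROOT);
  }
}
```

Note that resume maps to start rather than to a separate value, so a resumed sender and a started sender are treated identically at the next restart.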

The startup-action parameter will be persisted within the cluster configuration, but only if the gfsh command is executed successfully on at least one of the servers. The only case in which the state will not be updated and persisted is when the command is executed per member (using the --members parameter). The reason is that cluster configuration is persisted only per cluster and per group, not per member. This exception will be documented.
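
A minimal sketch of the persistence condition described above, assuming the command result is reduced to two inputs: whether the command targeted individual members, and how many servers executed it successfully. Both the class and the parameter names are hypothetical.

```java
// Hypothetical sketch of when startup-action is written to cluster configuration.
final class StartupActionPersistence {
  // targetedByMember: true when the command was issued with --members.
  // successfulServers: number of servers on which the command succeeded.
  static boolean shouldPersist(boolean targetedByMember, int successfulServers) {
    // Cluster configuration is kept per cluster and per group, never per member,
    // so member-scoped commands are not persisted.
    if (targetedByMember) {
      return false;
    }
    // Success on at least one server is enough to persist the new value.
    return successfulServers >= 1;
  }
}
```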

Currently it is possible to configure a gateway-sender to always start automatically or to always require a manual start (manual-start=true) at member startup. With the addition of the new parameter to the cluster configuration, this behavior will change in the following way:

If manual-start="true" and the startup-action parameter is missing, then the gateway sender will require a manual start (same as before).
If manual-start is not set (or is "false") and the startup-action parameter is missing, then the gateway sender will be started automatically (same as before).
If the startup-action parameter is present in the cluster configuration at startup, then the gateway-sender will try to reach that state regardless of the manual-start value.
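
The three rules above amount to a precedence check in which startup-action, when present, overrides manual-start. The following is a sketch under that reading, with hypothetical names and with a null string standing in for a missing startup-action parameter.

```java
// Hypothetical sketch of resolving gateway-sender behavior at member startup.
final class StartupResolution {
  enum Outcome { START, STOP, PAUSE, WAIT_FOR_MANUAL_START }

  // manualStart: value of the deprecated manual-start parameter (false if unset).
  // startupAction: persisted startup-action value, or null when it is missing.
  static Outcome resolve(boolean manualStart, String startupAction) {
    if (startupAction != null) {
      // A persisted startup-action wins regardless of manual-start.
      switch (startupAction) {
        case "start":
          return Outcome.START;
        case "stop":
          return Outcome.STOP;
        case "pause":
          return Outcome.PAUSE;
        default:
          throw new IllegalArgumentException("Unknown startup-action: " + startupAction);
      }
    }
    // No startup-action: fall back to the existing manual-start behavior.
    return manualStart ? Outcome.WAIT_FOR_MANUAL_START : Outcome.START;
  }
}
```

For example, resolve(true, "start") yields START, which is the case that differs from the current behavior and is the source of the backward incompatibility discussed later in this proposal.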

Any failure that happens while the gateway-sender tries to reach the desired state will be handled the same way as today, for example when automatic startup fails. The startup-action parameter will remain unchanged in all failure cases at startup.

The behavior of the manual-start parameter must be improved in order to comply with the above requirements.

Current issue with the manual-start parameter:

Currently, when manual-start is set to true, the colocated persistent parallel gateway sender queue region and its buckets are not recovered after the server is restarted. Because of that, the main persistent region that is colocated with the gateway sender queue region cannot reach online status.

Solution to manual-start parameter issue:

When the manual-start parameter is true or the gateway sender startup-action is stop, persistent parallel gateway-sender queues should be recovered (if needed) from persistent storage during server startup. Queues should be recovered using the existing mechanism that is also used when the gateway sender is started automatically (manual-start=false) after a server restart. In that case the parallel gateway sender queue persistent region and buckets are recovered (if needed) right after the main persistent region and buckets are recovered.

Additionally, the parallel gateway sender should reach the same state that it has when it is first started and then stopped using gfsh commands. In that state the parallel gateway sender buckets remain on the servers, but the dispatcher threads are stopped and no events are stored in the queues.

This new behavior will be especially beneficial in automated environments such as a Kubernetes cluster, where servers can be restarted automatically.

Changes and Additions to Public Interfaces

none

Performance Impact

not expected

Backwards Compatibility and Upgrade Path

This feature introduces a backward-incompatible change, because the startup-action parameter will now always take precedence over the manual-start parameter, as described in the Solution section.

This could be avoided by adding a parameter to the create gateway-sender command that must be used to enable the new behavior.

Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?
