Replicate region data to a remote WAN site

To be Reviewed By: May 7th, 2021

Authors: Alberto Gomez (alberto.gomez@est.tech)

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

There are several situations that require synchronizing the data between two Geode sites connected via WAN:

  • A Geode system is to be extended with a replica over a WAN by adding a new Geode site. In this case, the new site should load the region data of the first Geode site.
  • Two Geode sites have become out of sync due to a long-lasting network split. In this case, it might have been necessary to stop replication during the network split in order not to fill up the memory and disk dedicated to queuing replication events. When the WAN connection between the Geode sites is restored, the sites need to be synchronized so that the data is consistent. A possible approach is to pick one of the sites as the master and load the data of the regions of that site into the other site.
  • Recovery after a failure in one site of a WAN-replicated deployment that requires reinstalling the site and resynchronizing its data.

Geode offers several mechanisms to replicate the data of data regions from one WAN site to another for the above types of situations.

These mechanisms, nevertheless, suffer from shortcomings that make them far from ideal for the situations described.

  • Backup/restore:
    - It only works for persistent regions.
    - It does not support the restoration of a backup taken from a system in a newer version of Geode into a system in an older version.
    - It is not intended to restore a backup from one site into another, but rather to restore a previous backup of a site into that same site.
    - It cannot guarantee consistency of data across regions.
    - If the data in the region was populated using transactions, there would be no guarantee that those transactions are honored in the region unless there was no ongoing traffic when the backup was taken (transaction events are not written atomically to disk).
    - The whole process to recover WAN replication between the sites after the restore, without stopping the service, is cumbersome. It requires a procedure to clean the replication queues in the master site, to restart replication right before the backup is taken, and to have enough memory/disk resources available to hold replication events while the remote site is not yet restored.
  • Import/export:
    - As it is based on the Snapshot service, it does not guarantee consistency of the data unless it is run when there is no traffic.
    - It cannot guarantee consistency of snapshots across regions.
    - It suffers the same problem as backup and restore with respect to consistency of transactional data.
    - It suffers the same problems as backup and restore with respect to the process to recover the WAN replication without stopping the service.
  • Gemtouch:
    - Gemtouch is an open-source tool that is not part of Geode and is no longer maintained.
    - As this tool replicates the entries of a region by generating a get and a put for every entry, it causes undesired modification of the region entries (at least of their timestamps).
    - Also, even if the get and put of every entry are done inside a transaction, the process is subject to race conditions if it is run while non-transactional traffic is being sent to the Geode cluster. Such race conditions could cause traffic writes to be overwritten by Gemtouch writes if the former are not done inside transactions.

This RFC tries to solve the specific problem of putting into service (either back into service or for the first time) a Geode site that is to be part of a WAN-replicated Geode system: the site needs to load the region data from the other Geode sites already running while, at the same time, receiving the new events generated via WAN replication in the source site.

The solution should be suitable for any WAN replication topology.

Anti-Goals

This RFC does not try to solve the problem of synchronizing on the fly two WAN-replicated sites that may have diverged. The use case aimed at consists of having one or more WAN sites online with data and clients connected to them, and another site with no clients connected that needs to get the data from the other sites.

Solution

To overcome the disadvantages of the mechanisms enumerated in the Problem section and to offer a more integrated solution in Geode, this RFC proposes to implement a command that replicates the data of a region from one site to another. The command would operate in a similar way to the Gemtouch tool but without the inconveniences shown above.

This gfsh command would allow a region to be replicated from one site into another by specifying the region name and the senderId to be used to replicate the events.

The command would make use of an internal function that would run on every member of the Geode cluster hosting the region. For each entry in the region, this function would create an event equivalent to the one created when the entry was created/updated (with the same timestamp, key and value) and pass it to the gateway sender to replicate it.

The way to generate the event would be different for partitioned regions than for replicated regions but once the event is correctly created, a call to the distribute method of the AbstractGatewaySender should do the work needed to replicate the region event. Also, in the case of replicated regions, events should only be generated on the member that hosts the primary gateway sender.
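As a rough illustration of this iterate-create-distribute loop, the sketch below models the function at a very high level. The ReplicationEvent, EventDistributor and LocalEntries names are hypothetical stand-ins introduced only for this sketch; the real implementation would build EntryEventImpl instances and hand them to the gateway sender's distribute method, and the details would differ between partitioned and replicated regions.

// Hypothetical sketch only: stand-ins for Geode internals used to show the shape
// of the per-entry replication loop run by the internal function on each member.
public class ReplicateRegionFunctionSketch {

  // Stand-in for the event handed to the gateway sender (same key, value and
  // version timestamp as the original create/update of the entry).
  public static class ReplicationEvent {
    final Object key;
    final Object value;
    final long versionTimestamp;

    ReplicationEvent(Object key, Object value, long versionTimestamp) {
      this.key = key;
      this.value = value;
      this.versionTimestamp = versionTimestamp;
    }
  }

  // Stand-in for "pass the event to the gateway sender"; the real code would
  // call the sender's distribute method with a properly built EntryEventImpl.
  public interface EventDistributor {
    void distribute(ReplicationEvent event);
  }

  // Stand-in for reading the locally hosted entries together with their
  // original version timestamps.
  public interface LocalEntries extends Iterable<ReplicationEvent> {}

  public int replicate(LocalEntries entries, EventDistributor sender) {
    int replicated = 0;
    for (ReplicationEvent event : entries) {
      sender.distribute(event); // re-emit the entry with its original timestamp
      replicated++;
    }
    return replicated;
  }
}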

The command/function could accept as a parameter a maximum replication rate (events/sec) in order to limit the resources consumed while there is traffic running.
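A minimal sketch of how such a maximum rate could be enforced before distributing each event is shown below (illustration only; the class and names are not part of Geode):

// Simple fixed-rate throttle: at most maxEventsPerSecond events are distributed.
class ReplicationRateLimiter {
  private final long minNanosBetweenEvents;
  private long nextAllowedNanos = System.nanoTime();

  ReplicationRateLimiter(int maxEventsPerSecond) {
    this.minNanosBetweenEvents = 1_000_000_000L / maxEventsPerSecond;
  }

  // Called before distributing each event; sleeps if we are ahead of the rate.
  void acquire() throws InterruptedException {
    long now = System.nanoTime();
    if (now < nextAllowedNanos) {
      Thread.sleep((nextAllowedNanos - now) / 1_000_000L);
    }
    nextAllowedNanos = Math.max(now, nextAllowedNanos) + minNanosBetweenEvents;
  }
}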

It would be desirable to provide another command to stop an ongoing replication started by the replicate-region command.

The command will assume that the destination site already has the regions to be replicated created and that it has gateway receivers to get the data from the source site.

The command will also assume that during the replication of the data there will be no clients connected to the destination site until the copy has finished. If, for any reason, the replication fails or does not finish, the command will have to be run again. The command should provide information about the result of the execution.

Events replicated by this command must not be notified to the clients in the remote site and should not be sent on to other sites from the remote site.

Changes and Additions to Public Interfaces

A new gfsh command is proposed to execute the replication of the region data.
The command would accept as input the region name, the senderId to be used and the maximum replication rate:

gfsh> replicate-region --region-name=<region name> --senderId=<sender-id> [--max-rate=<rate in events/sec>]


The command to stop an ongoing execution of the above command could be implemented either as another command or by reusing the above one with a parameter indicating that intention (--cancel option):

gfsh> replicate-region --region-name=<region name> --senderId=<sender-id> --cancel
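
For example, to copy a hypothetical region named orders through a gateway sender named sender-to-site2 at a maximum of 1000 events/sec, and later to cancel that copy (region and sender names are purely illustrative):

gfsh> replicate-region --region-name=orders --senderId=sender-to-site2 --max-rate=1000
gfsh> replicate-region --region-name=orders --senderId=sender-to-site2 --cancel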

Performance Impact

If the replicate-region command is run while the servers are handling traffic, it is expected to add some performance penalty to that traffic due to the resources consumed by the command.

Backwards Compatibility and Upgrade Path

No impacts foreseen

Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

  • Instead of putting events in the sender's queue, the command will put events directly into batches and pass them to the gateway sender to be sent to the remote site.
  • The name of the command has been changed to: wan-copy region

21 Comments

  1. I've written a few functions like this in the past. I attached one that I just verified works in the simple case (where there is no activity in either site).

    A lot of the code in this function should be moved internally (to PartitionedRegion, BucketRegion, EntryEventImpl, etc.), but you get the idea.

    How do you plan to deal with duplicates, missing events, ordering issues, concurrent modifications, etc?

    1. I cannot see the function attached. Did you forget to do it?

      The plan is that this command/function is used when there is traffic in the Geode cluster.

      I foresee there could be an issue with conflation. If a put for a key is done very close in time to when the function creates an event for the same key, the newer value may be overwritten by the old value depending on the order in which both events reach the sender queue. I think this could also happen today, without the proposed function, with two almost-concurrent events for the same key, although this function could increase the probability of it occurring. For this case, we could modify the conflation code to prevent this situation by checking the timestamp of events before conflating (see the sketch below).
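
      A rough sketch of the kind of timestamp check meant here (QueuedEvent is a hypothetical stand-in for the queued gateway sender event type; the real change would live in the sender queue's conflation code):

      // Hypothetical sketch: when conflating two queued events for the same key,
      // keep whichever one carries the newer version timestamp, so a late
      // replication event cannot overwrite a newer client put.
      interface QueuedEvent {
        Object getKey();
        long getVersionTimeStamp();
      }

      final class TimestampAwareConflation {
        static QueuedEvent pickSurvivor(QueuedEvent alreadyQueued, QueuedEvent incoming) {
          return incoming.getVersionTimeStamp() >= alreadyQueued.getVersionTimeStamp()
              ? incoming : alreadyQueued;
        }
      }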

      Any other things we should take into account? I have not foreseen issues with duplicates and missing events so any extra information that I have not considered would be very welcome.

  2. Here is a test that shows what I mean by duplicates:

    1. Start servers in site 1
    2. Do puts in site 1 (which adds events to the sender queue for site 2)
    3. Run the function to initialize remote region (which adds events on the same entries to the sender queue for site 2)
    4. Start servers in site 2

    For 5 entries in site 1, a CacheListener in site 2 receives 10 events (5 creates, 5 updates):

    TestCacheListener processing event=EntryEventImpl[op=CREATE;region=/data;key=1;version={v1; rv1; ds=2; time=1619471769598};id=EventID[id=25 bytes;threadID=0x10001|1;sequenceID=1;bucketId=1];tailKey=114]
    TestCacheListener processing event=EntryEventImpl[op=CREATE;region=/data;key=4;version={v1; rv1; ds=2; time=1619471769614};id=EventID[id=25 bytes;threadID=0x10004|1;sequenceID=4;bucketId=4];tailKey=117]
    TestCacheListener processing event=EntryEventImpl[op=UPDATE;region=/data;key=4;version={v2; rv2; ds=2; time=1619471769614};id=EventID[id=22 bytes;threadID=0x10004|10;sequenceID=2;bucketId=4];tailKey=343]
    TestCacheListener processing event=EntryEventImpl[op=UPDATE;region=/data;key=1;version={v2; rv2; ds=2; time=1619471769598};id=EventID[id=22 bytes;threadID=0x10001|10;sequenceID=3;bucketId=1];tailKey=340]
    TestCacheListener processing event=EntryEventImpl[op=CREATE;region=/data;key=3;version={v1; rv1; ds=2; time=1619471769609};id=EventID[id=25 bytes;threadID=0x10003|1;sequenceID=3;bucketId=3];tailKey=116]
    TestCacheListener processing event=EntryEventImpl[op=UPDATE;region=/data;key=3;version={v2; rv2; ds=2; time=1619471769609};id=EventID[id=22 bytes;threadID=0x10003|10;sequenceID=0;bucketId=3];tailKey=342]
    TestCacheListener processing event=EntryEventImpl[op=CREATE;region=/data;key=0;version={v1; rv1; ds=2; time=1619471769582};id=EventID[id=25 bytes;threadID=0x10000|1;sequenceID=0;bucketId=0];tailKey=113]
    TestCacheListener processing event=EntryEventImpl[op=UPDATE;region=/data;key=0;version={v2; rv2; ds=2; time=1619471769582};id=EventID[id=22 bytes;threadID=0x10000|10;sequenceID=4;bucketId=0];tailKey=339]
    TestCacheListener processing event=EntryEventImpl[op=CREATE;region=/data;key=2;version={v1; rv1; ds=2; time=1619471769603};id=EventID[id=25 bytes;threadID=0x10002|1;sequenceID=2;bucketId=2];tailKey=115]
    TestCacheListener processing event=EntryEventImpl[op=UPDATE;region=/data;key=2;version={v2; rv2; ds=2; time=1619471769603};id=EventID[id=22 bytes;threadID=0x10002|10;sequenceID=1;bucketId=2];tailKey=341]


    Here is another test that shows duplicates. In this case, a ConcurrentCacheModificationException is thrown. I think the exception is handled in this case, but I'm not 100% sure.

    1. Start servers in sites 1 and 2
    2. Do puts in site 1 forever (which are sent to site 2)
    3. Run function to initialize remote region

    In some cases, site 2 receives an older event after a newer event:

    time=1619475101629 - ok
    time=1619475101631 - ok
    time=1619475101629 - conflict
    time=1619475101633 - ok

    The logging below shows these events:

    ServerConnection on port 5142 Thread 3: LocalRegion.basicBridgePut success=true; event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_75;key=75;callbacksInvoked;version={v2274; rv2274; ds=2; time=1619475101629};id=EventID[id=25 bytes;threadID=0x1004b|1;sequenceID=222440;bucketId=75];tailKey=257037]; isConcurrencyConflict=false
    ServerConnection on port 5142 Thread 3: LocalRegion.basicBridgePut success=true; event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_75;key=75;callbacksInvoked;version={v2275; rv2275; ds=2; time=1619475101631};id=EventID[id=25 bytes;threadID=0x1004b|1;sequenceID=222501;bucketId=75];tailKey=257150]; isConcurrencyConflict=false
    ServerConnection on port 5142 Thread 3: AbstractRegionEntry.processGatewayTag about to throw CCME event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_75;key=75;version={ds=2; time=1619475101629};id=EventID[id=23 bytes;threadID=0x1004b|16;sequenceID=18;bucketId=75];tailKey=257376]; stampTime=1619475101631; tagTime=1619475101629; stampDsid=2; tagDsid=2
    ServerConnection on port 5142 Thread 3: LocalRegion.basicBridgePut success=false; event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_75;key=75;version={ds=2; time=1619475101629};id=EventID[id=23 bytes;threadID=0x1004b|16;sequenceID=18;bucketId=75];isInConflict;tailKey=257376]; isConcurrencyConflict=true
    ServerConnection on port 5142 Thread 3: LocalRegion.basicBridgePut success=true; event=EntryEventImpl[op=UPDATE;region=/__PR/_B__data_75;key=75;callbacksInvoked;version={v2276; rv2276; ds=2; time=1619475101633};id=EventID[id=25 bytes;threadID=0x1004b|1;sequenceID=222562;bucketId=75];tailKey=257489]; isConcurrencyConflict=false
    1. The situation presented in the first example is to be expected with high probability when traffic is being received in the site where the function is to be run. Nevertheless, I do not think any additional measure is required if what we are after is the consistency of the region data on both WAN sites. Even if there are events duplicated, the resulting data in the region hosted in the remote site will be the same as in the region in the source site.

      Regarding the situation in the second example, it would be much less probable. Anyway, I do not think that any additional measure is required because the conflicting event will not be applied in the remote site and the region data will be consistent between the two WAN sites.

      Nevertheless, if there are other things to take into consideration that I may have missed, please, let me know.

  3. In my ideal world we use our state flush and delta get-initial-image algorithms to transmit the optimal amount of data to the remote system.  That's a really hard problem though.

    A few brief thoughts:

    1) You may want to consider a different op type for these events.  Many users look at incoming events so ordering and correctness is important.

    2) Snapshot imports can disallow callback notifications.  That is likely needed here as well.

    3) Perhaps the `replicate-region` command could be better named, something like "synchronize data --region=XXX --gateway-sender=NNN".


    In terms of data consistency, the proposed approach seems on par with snapshot export/import.  As a built-in operation, it's easier to use but is still a pretty blunt tool.  If not enough memory is available the enqueued events could cause a server to hit eviction or critical thresholds (in some cases blocking all writes).  That's a significant disadvantage compared to the "touch" approach.

    1. Thanks for your comments.

      I think it is valuable to consider using a different op type for replication events, as well as to see whether we should disallow callback notifications in this case.

      I have proposed 'replicate-region' as the name for the command given that the goal I had in mind was to replicate the data of a region with data into a remote region without any data. The name 'synchronize data' suggests a more advanced action in which the local and remote regions already have data and synchronize with each other.

      Regarding the concerns about the situations the execution of the command could provoke (running out of memory, eviction, reaching critical thresholds, blocking of writes, ...), they are very valid. In order to reduce the probability of incurring them, several measures could be proposed:

      • State in the documentation that the command should be run at low traffic hours.
      • State in the documentation that the command should be run to replicate data over a region in a site that is empty and is not receiving traffic.
      • Have the possibility to stop the execution of the command (as suggested in the RFC) in case it is observed that a problematic situation is close to being reached.
      • Limit the rate at which replication is done by means of:
        • A maximum replication rate (in a parameter to the command as specified in the RFC).
        • A more advanced mechanism that could take into account the size of the replication queues, or the rate at which they grow, in order to limit the rate (a rough sketch of this idea follows below).
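
      As a very rough sketch of such a queue-aware limit (all names below are hypothetical; a real implementation would read the queue size from the gateway sender's statistics), the copy function could simply pause whenever the queue grows beyond a threshold:

      // Hypothetical sketch: back off while the sender queue is above a threshold.
      final class QueueAwareThrottle {
        private final java.util.function.LongSupplier queueSize; // current sender queue size
        private final long maxQueuedEvents;

        QueueAwareThrottle(java.util.function.LongSupplier queueSize, long maxQueuedEvents) {
          this.queueSize = queueSize;
          this.maxQueuedEvents = maxQueuedEvents;
        }

        // Called before distributing each event by the copy function.
        void awaitCapacity() throws InterruptedException {
          while (queueSize.getAsLong() >= maxQueuedEvents) {
            Thread.sleep(100); // let ongoing traffic drain the queue first
          }
        }
      }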

      I do not see why this approach would be a significant disadvantage compared to the "touch" approach. In terms of the problematic situations it could provoke, I think they are on par, although the RFC's approach would be less resource intensive, as it saves the "region put" part of the "touch" by putting the event directly in the replication queue.

      1. I like "copy region" or something like that as a name. I agree synchronize might sound like it will delete entries in the remote site that are missing in the sending site or something like that.

        I think our commands follow a convention of a space after the initial verb, e.g. "backup disk-store" instead of "backup-disk-store", so even if we want to stick with replicate it might want to be "replicate region". But I think I mentioned earlier that "replicate region" might be confused with a region of type "REPLICATE"?

        1. I will take it into account. Thanks

  4. Alberto Gomez, I think this topic needs deeper discussion.

    I think that the feature is nice but there are WAY more issues we have to contend with and there seems to be very little in terms of viable building blocks that we can use within the system.

    Anthony Baker has already indicated that the approach is very blunt. Whilst it makes it "nicer" from a user's perspective, the side-effects could be MASSIVE. Not only can you take down the sending cluster, due to the accessing (touching) of every entry in the region, but also the receiving cluster, as it is not guaranteed that every node in the receiving cluster has a gateway-receiver, which means that we could overwhelm the receivers, which are now tasked with distributing the data across the rest of the nodes.

    And then we have not even spoken about bi-directional "hot" clusters, where both sites can make changes to the same entries. How does one determine what data the opposing site(s) is missing, or even what entries are different? How do we even determine if the sites need to be "synced"? What happens if the sync-sites command is issued? Do we send all data, or is there some mechanism that helps us determine that the data is in sync?

    Also, how do we not overwhelm the existing gateway-senders? Is this command done after hours or do we foresee that this command can be used in peak processing times?

    Whilst I LOVE the fact that we want to solve this issue, I think there need to be much deeper discussions held, e.g. are the existing WAN queues suited for this task? Should we possibly be thinking about some indicator (Hazelcast, for instance, uses Merkle trees) that helps the system determine what data is missing on both sides? What happens if we find entries that are different on both sides? Who wins? Who determines what the correct answer is, as both are valid and the source of truth?

    I think we should describe what we need to change to make this feature complete and then work backwards to determine what is an MVP and what can be added after the MVP has gone live.

    1. Thanks for your comments.

      As I commented in my response to Anthony Baker the scope of the command I have proposed is probably less ambitious than what you are thinking of.

      The way it is proposed could solve an important limitation Geode has now for the use case presented but it should only be used under the conditions stated in the documentation and there should be mechanisms available to monitor the system and to stop the execution of the command in case a problematic situation is close to be reached.

      Nevertheless, I am very open to discuss all the alternatives. I have proposed to discuss this topic in the next monthly synchronous community meeting on Wednesday, May 5th.

      1. One other thing to consider is replicating tombstone events. Tombstones are used to indicate that an entry has been deleted. So when you iterate through the region keyset, be sure that the collection includes the tombstones (they aren't included in the standard APIs).

  5. Regarding throttling, the Geode mantra is "predictable low latency".  That means we will need to avoid overburdening cpu/memory with background work.  For example, if your cluster is overloaded and you want to scale up to add capacity, you don't want the data replication to the new servers to cause an even worse response time.

    For both GII and snapshots, we use flow control based on semaphores to throttle these activities.  Perhaps looking at geode-core/src/main/java/org/apache/geode/internal/cache/snapshot/FlowController.java will help.
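
    For illustration, a window based on a semaphore, similar in spirit to that approach, could look like the sketch below (this is only a sketch with made-up names, not the actual FlowController API):

    import java.util.concurrent.Semaphore;

    // Hypothetical sketch of semaphore-based flow control: at most windowSize
    // messages are in flight; each ack from the remote side releases one permit.
    final class FlowControlSketch {
      private final Semaphore window;

      FlowControlSketch(int windowSize) {
        this.window = new Semaphore(windowSize);
      }

      void beforeSend() throws InterruptedException {
        window.acquire();   // blocks the background work when the window is full
      }

      void onAck() {
        window.release();   // an acknowledgement frees a slot in the window
      }
    }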



  6. After our discussion this morning I have a suggestion for a slightly different way to implement this replicate-region command. I ran it by Barry and he also seemed to like the basic idea.

    The idea is to have a function that directly sends gateway sender batches to the remote site, rather than putting events in the gateway sender queue.

    By not using the queue we can avoid worrying about keeping the primary and secondary state of the queue in sync. We can also avoid filling up the queues with too much data and potentially running out of memory.

    Here's what it would look like in a little more detail.

    1. replicate-region will execute a function on all of the primaries
    2. On each primary, the function will create a new connection to the gateway receiver. Maybe it gets a connection from the existing gateway sender's pool or maybe it creates its own pool
    3. As it iterates over the region it will build up batches of GatewaySenderEventImpls using the existing timestamp and version information. It will call SenderProxy.dispatchBatch_NewWAN to send the batches
    4. Periodically it will call receiveAckFromReceiver to read an ack. It could simply fail if a batch fails, or keep a window of unacknowledged events to retry.

    What do you think? This is still relying on the WAN conflict checks to make sure that if there are concurrent operations we keep the latest update to a region entry.
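
    To make the shape of that idea concrete, here is a rough sketch (the BatchDispatcher interface and the batch handling are made up for illustration; the real code would build GatewaySenderEventImpl batches and use the sender/receiver connection calls mentioned above):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the "dispatch batches directly, bypass the queue" idea.
    final class DirectBatchCopySketch {

      // Stand-in for dispatching a batch over the gateway connection and reading
      // the receiver's acknowledgement.
      interface BatchDispatcher {
        void dispatchBatch(List<Object> batch) throws Exception;
        void awaitAck() throws Exception;
      }

      static long copyPrimaries(Map<Object, Object> primaryEntries,
          BatchDispatcher dispatcher, int batchSize) throws Exception {
        List<Object> batch = new ArrayList<>(batchSize);
        long copied = 0;
        for (Map.Entry<Object, Object> entry : primaryEntries.entrySet()) {
          batch.add(entry); // real code: build an event keeping timestamp/version info
          if (batch.size() == batchSize) {
            dispatcher.dispatchBatch(batch);
            dispatcher.awaitAck(); // fail (or retry a window of batches) if not acked
            copied += batch.size();
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          dispatcher.dispatchBatch(batch);
          dispatcher.awaitAck();
          copied += batch.size();
        }
        return copied;
      }
    }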

    1. We will need to define what it does if a member with primaries goes down while the copy is in progress. Will it automatically find the new primary and copy it? Or will it fail saying not all the data was copied.

      We also need to define the behavior on a persistent partitioned region which has one or more buckets offline.

      1. I would expect that if the member goes down with primaries while the copy is in progress, the command will fail, preferably reporting how many entries were replicated.

        If the partitioned region has buckets offline the command should also fail.

    2. Barry Oglesby and Dan Smith, thanks for the new idea. Anyway, I am not sure if it is better to bypass the queues and send the data to be replicated directly in batches, or to use the "official" channel with the queues. I see pros and cons with it. On the one hand, we waste fewer resources, we release the queues from the burden of the replicated entries, etc. But we lose the feedback we could get from, for example, monitoring the queues.

      I have created a draft pull request with the original idea and a commit on top of it with an implementation of your new idea that sends gateway sender batches directly without going through the queues.

      https://github.com/apache/geode/pull/6441

      It is still a prototype; it needs code cleaning, more tests and probably fixes, but we can use it to communicate on the ideas presented and the feedback given.


  7. BTW, I do think having a gfsh command or tool to copy data to a remote site is a good idea, and I appreciate that you are trying to make sure the data on the remote side ends up consistent even if there are concurrent transactions, etc.

    I like the suggestion this morning that this tool shouldn't trigger events on the remote side, or cause the remote side to propagate the events any further to other WAN sites.

    1. It does seem to me that not triggering events on the remote side will simplify things, but does it also mean that certain features can't be used on the remote side until the copy is complete? For example, if the remote side has clients using register interest or CQs, or if they are using AEQs, then none of them will be notified of the data that was copied.

      1. In order to keep things simple, I would go for the option of not allowing clients to use the remote site until the copy is complete. I will update the RFC to delimit a bit more what is to be tackled by the command.