To be Reviewed By: 1 March 2022

Authors: Mario Ivanac

Status: Draft | Discussion | Active | Dropped | Superseded

Superseded by: N/A

Related: N/A

Problem

Consider a replicated region hosted on three or more servers, with a client whose read timeout is equal to or smaller than the member timeout.

If one of the replicated servers is shut down while the client is putting data into the region, the data becomes inconsistent.

Some of the data is written to the server the client is connected to, but is missing from the remaining replicated servers.


Analysis shows the following. When the client puts data on the server it is connected to, that server replicates the data to the other servers. If any server in the chain of replicated servers is restarted, replication is halted until that server is declared dead.

Because the client's read-timeout is lower than the member-timeout, the client times out and continues inserting data, and replication of those events to the replicated servers is halted as well. Eventually the restarted member is declared dead. At that point there are halted events still waiting to be applied on the replicated servers and, in parallel, new events arriving from the client. If a new event from the client is replicated to the other servers before the halted events, then when the halted events are finally delivered, the replicated servers treat them as duplicates (they compare the sequence ID of the received event with the last stored sequence ID) and do not store the data.
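The effect of comparing only against the last stored sequence ID can be illustrated with a minimal, self-contained sketch. The class `LastSequenceIdTracker` and its methods are hypothetical names used for illustration only; they are not Geode classes, and the real event tracking logic is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a replica that remembers only the last sequence ID.
public class LastSequenceIdTracker {
    private long lastSequenceId = -1;
    private final List<String> storedKeys = new ArrayList<>();

    /** Returns true if the event was stored, false if it was dropped as a duplicate. */
    public boolean apply(long sequenceId, String key) {
        if (sequenceId <= lastSequenceId) {
            // Looks like a replay of an event that was already seen.
            return false;
        }
        lastSequenceId = sequenceId;
        storedKeys.add(key);
        return true;
    }

    public List<String> storedKeys() {
        return storedKeys;
    }

    public static void main(String[] args) {
        LastSequenceIdTracker replica = new LastSequenceIdTracker();
        // Event 3 was sent after the dead member was removed and overtakes
        // the halted events 1 and 2.
        System.out.println(replica.apply(3, "key-3")); // true
        System.out.println(replica.apply(1, "key-1")); // false, dropped
        System.out.println(replica.apply(2, "key-2")); // false, dropped
        System.out.println(replica.storedKeys());      // [key-3]
    }
}
```

In this model the halted events for `key-1` and `key-2` are never stored on the replica, which matches the inconsistency described above.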


The linked PR contains a distributed test that reproduces the problem.

Anti-Goals

Solution

Add logic to store the last N "key - sequence Id" pairs (N is configurable) instead of only the last sequence ID.

The user configures the size of the map that holds the last stored "key - sequence Id" pairs.

The default value is 0, which disables the feature.

A value greater than 0 activates the feature. Calculate the value with the formula 3 x member-timeout / read-timeout.
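The following is a minimal sketch of one possible reading of this proposal, not the implementation in the PR. The class `RecentSequenceIdTracker`, the helper `suggestedSize`, the choice to round the formula up, and the rule that a key missing from the map is not treated as a duplicate are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical replica-side tracker that keeps the last N key/sequence ID pairs.
public class RecentSequenceIdTracker {
    private final int maxEntries;
    private final LinkedHashMap<String, Long> recent;
    private long highestSequenceId = -1;

    public RecentSequenceIdTracker(final int maxEntries) {
        this.maxEntries = maxEntries;
        // Bounded map: the oldest entry is evicted once the size exceeds maxEntries.
        this.recent = new LinkedHashMap<String, Long>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Assumed sizing rule: 3 x member-timeout / read-timeout, rounded up. */
    public static int suggestedSize(long memberTimeoutMs, long readTimeoutMs) {
        if (readTimeoutMs <= 0) {
            throw new IllegalArgumentException("read-timeout must be positive");
        }
        return (int) ((3 * memberTimeoutMs + readTimeoutMs - 1) / readTimeoutMs);
    }

    /** Returns true if the event was stored, false if it was dropped as a duplicate. */
    public boolean apply(long sequenceId, String key) {
        if (isDuplicate(sequenceId, key)) {
            return false;
        }
        highestSequenceId = Math.max(highestSequenceId, sequenceId);
        if (maxEntries > 0) {
            recent.remove(key);          // refresh the key's position in the map
            recent.put(key, sequenceId);
        }
        return true;
    }

    private boolean isDuplicate(long sequenceId, String key) {
        if (sequenceId > highestSequenceId) {
            return false;                // newer than anything seen so far
        }
        if (maxEntries == 0) {
            return true;                 // feature disabled: last-sequence-ID check only
        }
        Long seen = recent.get(key);     // assumption: unknown key means not a duplicate
        return seen != null && seen >= sequenceId;
    }

    public static void main(String[] args) {
        int size = suggestedSize(5000, 1000);
        System.out.println(size); // 15
        RecentSequenceIdTracker replica = new RecentSequenceIdTracker(size);
        System.out.println(replica.apply(3, "key-3")); // true
        System.out.println(replica.apply(1, "key-1")); // true, halted event is kept
        System.out.println(replica.apply(2, "key-2")); // true, halted event is kept
        System.out.println(replica.apply(1, "key-1")); // false, genuine duplicate
    }
}
```

With a size of 0 the sketch falls back to comparing against the highest sequence ID only, so a halted event that arrives late is dropped, as in the current behavior.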

PR with proposed solution: https://github.com/apache/geode/pull/7214

Changes and Additions to Public Interfaces

None

Performance Impact

None

Backwards Compatibility and Upgrade Path

No upgrade or backwards compatibility issues.

Prior Art

What would be the alternatives to the proposed solution? What would happen if we don’t solve the problem? Why should this proposal be preferred?

FAQ

Answers to questions you’ve commonly been asked after requesting comments for this proposal.

Errata

What are minor adjustments that had to be made to the proposal since it was approved?