IDIEP-31
Author
Sponsor
Created 4 March 2019
Status
DRAFT


Motivation

Apache Ignite provides a consistency guarantee: each backup should contain the same value for the same key, at least eventually.

But this guarantee can be violated because of ... bugs.


And yes, we had some and fixed them.

But is there any chance that we have fixed them all and will never introduce new ones? :)


So we need an additional failover layer to detect and fix such violations.

Description

Cache.withConsistency()

The main idea is to provide a special read mode that reads a value from the primary and all backups and checks that the values are the same.

If the values differ, they should be fixed according to the chosen strategy.
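
The read mode described above can be sketched as follows. This is only an illustration under stated assumptions: the class ConsistentRead, its methods, and the map of node names to values are hypothetical and are not part of the Ignite API. The strategy is passed in as a function, so any of the strategies discussed in this document could be plugged in.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the proposed read mode; not the Ignite API. */
final class ConsistentRead {
    private ConsistentRead() {}

    /** True when the primary and all backups hold the same value (null means "no value"). */
    static <V> boolean isConsistent(Map<String, V> copies) {
        return new HashSet<>(copies.values()).size() <= 1;
    }

    /**
     * Returns the value for one key, repairing every copy first if the copies differ.
     *
     * @param copies   node name -> value held by that node (primary and backups), mutable
     * @param strategy picks the winning value when the copies differ
     */
    static <V> V read(Map<String, V> copies, Function<Map<String, V>, V> strategy) {
        if (copies.isEmpty())
            return null;

        if (isConsistent(copies))
            return copies.values().iterator().next();

        V fixed = strategy.apply(copies);

        // Overwrite every copy with the chosen value.
        copies.replaceAll((node, old) -> fixed);

        return fixed;
    }
}
```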

Possible strategies

Quorum (When majority wins)

But what if there are three or more different values for the same key in the topology?
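
A minimal sketch of a quorum resolver makes the problem concrete. The class QuorumStrategy is hypothetical, and the sketch assumes non-null values. When no value is held by more than half of the copies, for example with three different values on three nodes, the resolver has nothing to return.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the quorum strategy; assumes non-null values. */
final class QuorumStrategy {
    private QuorumStrategy() {}

    /** Returns the value held by a strict majority of copies, or empty if there is none. */
    static <V> Optional<V> resolve(List<V> copies) {
        Map<V, Integer> counts = new HashMap<>();

        for (V v : copies)
            counts.merge(v, 1, Integer::sum);

        for (Map.Entry<V, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > copies.size())
                return Optional.of(e.getKey());
        }

        // No majority, e.g. three different values on three nodes.
        return Optional.empty();
    }
}
```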

Primary or oldest node always wins

This assumption does not hold.

Bugs can leave any node, including the primary, with an outdated value.

LWW (Last Write Wins)

Not a 100% guarantee, but simple!

Seems suitable because each value is associated with a GridCacheVersion, which is comparable.
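
As an illustration, LWW reduces to picking the copy with the greatest version. In the sketch below a plain long is a hypothetical stand-in for GridCacheVersion, and the VersionedValue and LastWriteWins classes are assumptions made for the example only.

```java
import java.util.List;

/** A cache value with its version; the long is a hypothetical stand-in for GridCacheVersion. */
final class VersionedValue<V> {
    final V value;
    final long version;

    VersionedValue(V value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Hypothetical sketch of the LWW strategy. */
final class LastWriteWins {
    private LastWriteWins() {}

    /** Returns the copy with the greatest version, or null if there are no copies. */
    static <V> VersionedValue<V> resolve(List<VersionedValue<V>> copies) {
        VersionedValue<V> newest = null;

        for (VersionedValue<V> c : copies) {
            if (newest == null || c.version > newest.version)
                newest = c;
        }

        return newest;
    }
}
```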

The strategy provided by the user

Best case.

Can be implemented as an add-on.
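
One possible shape for such an add-on is a single-method interface that receives the key and every copy and returns the value to keep. The names ConsistencyStrategy and ExampleStrategies are hypothetical; the "largest counter wins" rule only illustrates a domain-specific decision that a user might supply.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical extension point for a user-provided strategy; not the Ignite API. */
@FunctionalInterface
interface ConsistencyStrategy<K, V> {
    /**
     * @param key    the key whose copies differ
     * @param copies node name -> value held by that node
     * @return the value that every copy should be fixed to
     */
    V resolve(K key, Map<String, V> copies);
}

/** Example of a domain-specific strategy a user might plug in. */
final class ExampleStrategies {
    private ExampleStrategies() {}

    /** For monotonically growing counters the largest observed value is taken as correct. */
    static <K> ConsistencyStrategy<K, Long> largestCounterWins() {
        return (key, copies) -> Collections.max(copies.values());
    }
}
```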

Consistency guard

Also, we need a special process that checks that there are no inconsistencies.

Values should be compared and fixed periodically, but the check should not cause a cooldown (hot pages being replaced with pages read from persistence).

Risks and Assumptions

1) Neither LWW nor any other strategy guarantees that the correct value will be chosen.

We have to record an event that contains all the values and the chosen one.

The event will allow us to

- learn that an inconsistent state has occurred

- manually investigate which value is correct and fix it again if necessary
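
The event described above could carry the following data. This is a sketch only: the class ConsistencyViolationEvent and its fields are hypothetical names, chosen to show that the event keeps an immutable snapshot of every copy next to the chosen value so that the decision can be reviewed later.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical event recorded whenever copies of a key differ and a fix is applied. */
final class ConsistencyViolationEvent<K, V> {
    final K key;

    /** Snapshot of node name -> value as observed before the fix. */
    final Map<String, V> copies;

    /** Value selected by the strategy. */
    final V chosen;

    ConsistencyViolationEvent(K key, Map<String, V> copies, V chosen) {
        this.key = key;
        // Defensive copy: the event must not change when the cache is repaired.
        this.copies = Collections.unmodifiableMap(new LinkedHashMap<>(copies));
        this.chosen = chosen;
    }

    @Override public String toString() {
        return "ConsistencyViolationEvent[key=" + key + ", copies=" + copies + ", chosen=" + chosen + "]";
    }
}
```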


1.1) It seems that not every case can be fixed.

For any transactional cache, we are able to perform a pessimistic serializable transaction per key.

But atomic caches cannot be fixed this way.


2) The consistency guard may cause the following problems:

- replace hot data with cold data

We need a special configuration property that allows or forbids checking (reading) data that is available only at the persistence layer.

- degrade throughput/latency metrics

Some throttling feature should be implemented.
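
One simple way to throttle the guard is a fixed-window limit on the number of keys checked per second. The sketch below is an assumption about how this might look, not a design decision of this proposal: ScanThrottle and its method are hypothetical, and the current time is passed in explicitly so that the behavior is deterministic.

```java
/** Hypothetical fixed-window throttle for the consistency guard. */
final class ScanThrottle {
    private static final long WINDOW_MILLIS = 1000;

    private final int maxKeysPerSecond;
    private long windowStart;
    private int used;

    ScanThrottle(int maxKeysPerSecond) {
        this.maxKeysPerSecond = maxKeysPerSecond;
    }

    /**
     * Asks for permission to check one more key.
     *
     * @param nowMillis current time in milliseconds
     * @return 0 if the key may be checked now, otherwise how many milliseconds to wait
     */
    long delayBeforeNext(long nowMillis) {
        // Start a new window once the previous one has elapsed.
        if (nowMillis - windowStart >= WINDOW_MILLIS) {
            windowStart = nowMillis;
            used = 0;
        }

        if (used < maxKeysPerSecond) {
            used++;
            return 0;
        }

        // Budget exhausted: wait until the current window ends.
        return WINDOW_MILLIS - (nowMillis - windowStart);
    }
}
```

The guard would call delayBeforeNext before each key and sleep for the returned time when it is non-zero.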

Discussion Links

// Links to discussions on the devlist, if applicable.

Reference Links

// Links to various reference documents, if applicable.

Tickets
