...



Motivation

Apache Ignite provides a consistency guarantee: every backup should hold the same value for the same key, at least eventually.

But this guarantee can be violated because of ... bugs.


And yes, we had some and fixed them.

But is there any chance that we fixed them all and will never write new ones? :)


So we need an additional failover layer that checks for and fixes such violations.

...

The main idea is to provide a special "read from cache" mode that reads a value from the primary and all backups and checks that the values are the same.

If the values differ, they should be fixed according to the appropriate strategy.
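The comparison step can be sketched as follows. This is a minimal illustration, not Ignite code: `ReplicaCheck`, `consistent`, and the node-name keys are hypothetical, and the map is assumed to already hold the value read from each copy.

```java
import java.util.HashSet;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Ignite API. */
final class ReplicaCheck {
    /**
     * @param valuesByNode value read from the primary and from each backup, keyed by node name.
     * @return true when every copy holds an equal value.
     */
    static <V> boolean consistent(Map<String, V> valuesByNode) {
        // A set collapses equal values, so more than one element means the copies diverged.
        return new HashSet<>(valuesByNode.values()).size() <= 1;
    }
}
```

When `consistent` returns false, the read would hand the collected values to one of the repair strategies discussed on this page.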

...

Quorum (When majority wins)

But what if there are three or more different values for the same key across the topology?
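A small sketch makes the problem concrete. `QuorumStrategy` is a hypothetical name, values are assumed to be non-null with a sensible `equals`/`hashCode`, and "majority" is taken to mean a strict majority of copies:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical majority-wins resolver for illustration; not part of the Ignite API. */
final class QuorumStrategy {
    /** Returns the value held by a strict majority of copies, or empty if there is none. */
    static <V> Optional<V> resolve(Collection<V> copies) {
        // Count how many copies hold each distinct value.
        Map<V, Integer> counts = new HashMap<>();
        for (V v : copies)
            counts.merge(v, 1, Integer::sum);

        // A value wins only if more than half of the copies agree on it.
        for (Map.Entry<V, Integer> e : counts.entrySet())
            if (e.getValue() * 2 > copies.size())
                return Optional.of(e.getKey());

        return Optional.empty();
    }
}
```

With copies `a, a, b` the sketch picks `a`; with `a, b, c` no value has a majority and the result is empty, which is exactly the case the question above points at.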

Primary or oldest node always wins

This is not true.

Bugs can leave any node outdated, including the primary.

...

Not a 100% guarantee, but Simple!

Seems suitable because each value is associated with a GridCacheVersion, which is comparable.
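Choosing by comparable version could look roughly like this. It is only a sketch: `Versioned` and `LatestVersionWins` are hypothetical, and a plain `long` stands in for GridCacheVersion, whose real comparison logic is not reproduced here.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

/** Hypothetical pair of a value and its version; the long stands in for GridCacheVersion. */
final class Versioned<V> {
    final V value;
    final long version;

    Versioned(V value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Hypothetical resolver for illustration; not part of the Ignite API. */
final class LatestVersionWins {
    /** Returns the copy with the highest version, or empty when there are no copies. */
    static <V> Optional<Versioned<V>> resolve(Collection<Versioned<V>> copies) {
        return copies.stream().max(Comparator.comparingLong((Versioned<V> c) -> c.version));
    }
}
```

The sketch shows why this is simple but not a guarantee: it trusts the version attached to each copy, and a bug may have corrupted that as well.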

The strategy provided by the user

Best case.

Can be implemented as an add-on.

...

Also, we need a special process that checks that there is no inconsistency.

Values should be compared and fixed periodically, even when nobody has requested the data.
This check should not cause cooldown (hot pages being replaced with pages from persistence).

Risks and Assumptions

1) LWW and any other strategy do not guarantee that the correct value will be chosen.

We have to record an event that contains all the values and the chosen one.

The event will allow us to

- learn that an inconsistent state occurred

- investigate manually which value is correct and re-fix it if necessary


1.1) It seems it is not possible to fix every case.

For any transactional cache, we can perform a pessimistic serializable transaction per key.

But atomic caches cannot be fixed this way.
An Entry Processor could be used, or we can finally implement thread per partition for atomic caches.


2) The consistency guard can cause the following problems

- replace hot data with cold data

We need a special configuration property that allows or forbids checking (reading) data that is available only at the persistence layer.

- degrade throughput/latency metrics

Some throttling feature should be implemented.

...