
IDIEP-31
Author
Sponsor
Created 4 March 2019
Status ACTIVE (changed from DRAFT)



Motivation

...

But this guarantee can be violated because of ...

  • bugs,
  • crashes,
  • non-idempotent operations on data executed more than once locally,
  • cosmic rays,
  • hackers,

...

  • a cleaning lady with a mop,

but ... most likely because of bugs.

And yes, we had some (e.g. IGNITE-10078) and fixed them.

But is there any chance that we fixed all of them and will not write new ones? :) While Ignite is distributed ... such bugs are possible. So, we have to have an additional failover layer to check and fix such violations.

Description

Cache.

...

withReadRepair()

The main idea is to provide a special read mode which reads a value from the primary and all backups and checks that the values are the same.

If the values differ, they should be repaired according to the appropriate strategy.

So, the final goal is to have the ability to detect and fix consistency issues.
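
The check itself can be sketched in plain Java without any Ignite API. In this minimal sketch the copies of one key held by the primary and the backups are modelled as a `Map` from a hypothetical node name to the value read from that node. `ReadRepairCheckSketch` and `isConsistent` are illustrative names and are not part of Ignite.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative only: models a read that compares all copies of one key. */
final class ReadRepairCheckSketch {
    /**
     * Returns true when every node (primary and backups) holds an equal value.
     * The map key is a hypothetical node name, the map value is the copy read from that node.
     */
    static <V> boolean isConsistent(Map<String, V> copiesByNode) {
        boolean first = true;
        V expected = null;

        for (V copy : copiesByNode.values()) {
            if (first) {
                expected = copy; // The first copy becomes the reference value.
                first = false;
            }
            else if (!Objects.equals(expected, copy))
                return false; // At least one copy differs, so a repair is needed.
        }

        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> copies = new LinkedHashMap<>();

        copies.put("primary", 42);
        copies.put("backup-1", 42);
        copies.put("backup-2", 41); // Diverged copy.

        System.out.println(isConsistent(copies)); // prints "false"
    }
}
```

A `false` result only says that the copies diverged. Which value should win is left to the chosen repair strategy.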

Case #1 - "offline check with eventual fix"

Currently, we are able to use the "idle_verify" feature to detect broken partitions.

Once we detected them, we should be able to fix them.

Case #2 - "online check"

Another way is to use this feature on each get request.

Case #3 - "background check"

One more way is to check all entries permanently in a loop.

Possible strategies

Quorum (When majority wins)
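
A majority-wins choice could look like the following minimal sketch. It assumes that the copies of a key are non-null values comparable with `equals`, and that a strict majority (more than half of the copies) is required. `QuorumSketch` and `majority` are hypothetical names, not an Ignite API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: picks the value held by a strict majority of copies. */
final class QuorumSketch {
    /**
     * Returns the value held by more than half of the copies,
     * or empty when no such value exists. Assumes non-null copies.
     */
    static <V> Optional<V> majority(List<V> copies) {
        Map<V, Integer> counts = new HashMap<>();

        // Count how many nodes hold each distinct value.
        for (V copy : copies)
            counts.merge(copy, 1, Integer::sum);

        for (Map.Entry<V, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > copies.size())
                return Optional.of(e.getKey());
        }

        return Optional.empty(); // No majority, the strategy cannot decide.
    }

    public static void main(String[] args) {
        System.out.println(majority(Arrays.asList("v2", "v1", "v2"))); // prints "Optional[v2]"
        System.out.println(majority(Arrays.asList("v1", "v2")));       // prints "Optional.empty"
    }
}
```

With an even number of copies split equally there is no winner, so a caller would have to fall back to another strategy or to manual investigation.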

...

Best case.

Can be implemented as an add-on.

Consistency guard

Also, we need a special process that checks that there is no inconsistency.

Values should be compared and fixed periodically, but this should not cause a cooldown (hot pages being replaced with pages from persistence).

add-on.

Risks and Assumptions

1) LWW (last write wins) and any other strategy do not guarantee that the correct value will be chosen.

...

- investigate manually which value is correct and re-repair if necessary


1.1) It seems that not every case can be fixed.

For transactional caches we are able to perform a pessimistic serializable transaction per key.

...

Some throttling feature should be implemented.

Discussion Links


Initial review request - http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html

"Idle verify" to "Online verify" discussion - http://apache-ignite-developers.2346864.n4.nabble.com/quot-Idle-verify-quot-to-quot-Online-verify-quot-td41928.html

Second review request - http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html

Reference Links

Cassandra's Read Repair feature - https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html

Idle_verify - https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums

Tickets

ASF JIRA issues matching the query: project = Ignite AND labels IN (iep-31) ORDER BY status