...

  1. Re-encryption has completed for the cache group.
  2. The last WAL segment in which the key was used has been removed (on startup, the node should check whether the WAL history has been cleared manually).
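
The two conditions above can be sketched as a single check. This is a minimal illustration, not the actual implementation: the class `GroupKeyRetentionSketch`, its methods, and the use of `int` key identifiers and `long` WAL segment indexes are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not an Ignite API) that decides when an old group key may be dropped. */
final class GroupKeyRetentionSketch {
    /** Key identifier -> index of the last WAL segment in which that key was used. */
    private final Map<Integer, Long> lastWalSegmentByKeyId = new HashMap<>();

    /** Set once re-encryption of the cache group has finished. */
    private boolean reencryptionCompleted;

    /** Remembers the highest WAL segment index written with the given key. */
    void recordKeyUsage(int keyId, long walSegmentIdx) {
        lastWalSegmentByKeyId.merge(keyId, walSegmentIdx, Math::max);
    }

    void markReencryptionCompleted() {
        reencryptionCompleted = true;
    }

    /**
     * @param oldestRetainedWalSegmentIdx index of the oldest WAL segment still on disk.
     *        Evaluating it again on startup also covers a manually cleared WAL history.
     */
    boolean canRemoveKey(int keyId, long oldestRetainedWalSegmentIdx) {
        if (!reencryptionCompleted)
            return false; // condition 1 is not met yet

        Long lastSegment = lastWalSegmentByKeyId.get(keyId);

        // Assumption: a key with no recorded WAL usage is not needed for recovery.
        return lastSegment == null || lastSegment < oldestRetainedWalSegmentIdx;
    }
}
```

Passing the oldest retained segment index as a parameter lets the same check run both after WAL archive cleanup and on node startup.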

Fault tolerance

If GROUP_KEY_CHANGE_PREPARE does not complete successfully on all nodes, the process is interrupted and must be restarted.
When the process restarts, a new key identifier is generated. The unused key is removed in the finish phase, when the new key is set for writing.
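
The restart rule can be sketched as follows. The class `KeyChangePrepareSketch`, the incrementing `int` identifiers, and the boolean per-node acknowledgements are assumptions made for illustration; the real distributed process is not modelled here.

```java
import java.util.Collection;
import java.util.OptionalInt;

/** Hypothetical model of the prepare phase: every attempt uses a fresh key identifier. */
final class KeyChangePrepareSketch {
    /** Last identifier handed out, including ones from interrupted attempts. */
    private int lastGeneratedKeyId;

    KeyChangePrepareSketch(int currentActiveKeyId) {
        this.lastGeneratedKeyId = currentActiveKeyId;
    }

    /**
     * @param nodeAcks one entry per node, true if that node completed the prepare phase.
     * @return the new key identifier if all nodes succeeded, otherwise empty (restart needed).
     */
    OptionalInt attempt(Collection<Boolean> nodeAcks) {
        // A new identifier is generated on every attempt, so a key left over
        // from an interrupted attempt is never reused.
        int candidateKeyId = ++lastGeneratedKeyId;

        boolean allPrepared = !nodeAcks.isEmpty()
            && nodeAcks.stream().allMatch(Boolean::booleanValue);

        return allPrepared ? OptionalInt.of(candidateKeyId) : OptionalInt.empty();
    }
}
```

In this sketch an interrupted attempt leaves its identifier unused. That corresponds to the unused key that is cleaned up in the finish phase.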

A node may fail after adding a new key but before setting it for writing (as the *active* key).
Such a node does not know whether the PREPARE phase succeeded, and therefore does not know which key is currently used for writing.
By default, it tries to rejoin with the old key. If the join is rejected, it should be possible to set the correct key identifier manually using a system property or command-line tools.
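
A minimal sketch of how the key could be chosen on restart, assuming the override is delivered as a string value. The property name `EXAMPLE_GROUP_ACTIVE_KEY_ID` and the class `ActiveKeyResolverSketch` are hypothetical; this page does not define the actual property or tool.

```java
/** Hypothetical resolver for the key a restarted node uses when it rejoins. */
final class ActiveKeyResolverSketch {
    /** Placeholder property name, used for this example only. */
    static final String OVERRIDE_PROPERTY = "EXAMPLE_GROUP_ACTIVE_KEY_ID";

    /**
     * @param lastKnownActiveKeyId the old key the node was writing with before it failed.
     * @param overrideValue manually supplied key identifier, or null/blank if not set.
     */
    static int resolveActiveKeyId(int lastKnownActiveKeyId, String overrideValue) {
        if (overrideValue == null || overrideValue.trim().isEmpty())
            return lastKnownActiveKeyId; // default: rejoin with the old key

        // Manual override, used after a join with the old key was rejected.
        return Integer.parseInt(overrideValue.trim());
    }
}
```

A caller could pass `System.getProperty(ActiveKeyResolverSketch.OVERRIDE_PROPERTY)` as the second argument.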

When a non-baseline node joins a cluster (with a baseline change), it cleans up all existing data, so this should not be a problem case.

If a node fails during re-encryption, it must resume re-encryption from the stored offset after restart (if the checkpoint failed, it should restore physical records from the WAL, as usual).
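
Resuming from a stored offset can be illustrated with the sketch below. It assumes that progress is tracked as a page index and that the offset is persisted elsewhere (for example, at checkpoint). The class `ReencryptionSketch` is hypothetical, and the actual page re-encryption is replaced by a stand-in.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical re-encryption task that can be recreated from a stored offset. */
final class ReencryptionSketch {
    private final int totalPages;

    /** Index of the next page to process; assumed to be persisted by the caller. */
    private int offset;

    /** Pages handled by this instance, kept only to make the sketch observable. */
    private final List<Integer> processed = new ArrayList<>();

    ReencryptionSketch(int totalPages, int storedOffset) {
        if (storedOffset < 0 || storedOffset > totalPages)
            throw new IllegalArgumentException("Invalid stored offset: " + storedOffset);

        this.totalPages = totalPages;
        this.offset = storedOffset;
    }

    /** Processes up to batchSize pages and returns the new offset to be stored. */
    int processBatch(int batchSize) {
        int end = Math.min(totalPages, offset + batchSize);

        for (int page = offset; page < end; page++)
            processed.add(page); // stand-in for re-encrypting the page with the new key

        offset = end;

        return offset;
    }

    boolean isCompleted() {
        return offset >= totalPages;
    }

    List<Integer> processedPages() {
        return processed;
    }
}
```

After a failure, a new instance created with the last stored offset skips the pages that were already re-encrypted and continues with the rest.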

// TBD

Risks and assumptions

// TBD

...