...
If GROUP_KEY_CHANGE_PREPARE has not been successfully completed on all nodes, the process is interrupted and must be restarted.
When the process is restarted, a new key identifier is generated (the unused key left by the interrupted attempt is removed in the finish phase, when the new key is set for writing).
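The restart rule can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Ignite's implementation: `GroupKeyChangeSketch`, `GroupKeys`, `prepare`, `nextKeyId` and `changeKey` are hypothetical names, key identifiers are plain `int`s, key material is a placeholder string, and a PREPARE failure is simulated by one node stopping before it stores the key. The point it shows is that each retry picks an identifier no node has seen yet, and the finish phase drops keys that were added by interrupted attempts but never activated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class GroupKeyChangeSketch {
    /** Hypothetical per-node key storage for one cache group. */
    static final class GroupKeys {
        final TreeMap<Integer, String> keys = new TreeMap<>(); // key id -> placeholder key material
        final Set<Integer> everActive = new HashSet<>();        // ids that were ever used for writing
        int activeId;

        GroupKeys(int id, String key) {
            keys.put(id, key);
            everActive.add(id);
            activeId = id;
        }

        /** PREPARE: store the new key, but keep writing with the current one. */
        void addKey(int id, String key) {
            keys.put(id, key);
        }

        /** FINISH: switch writes to the new key and drop keys that were never activated. */
        void activate(int id) {
            activeId = id;
            everActive.add(id);
            keys.keySet().removeIf(k -> !everActive.contains(k));
        }
    }

    /** PREPARE on every node; returns false if node `failingNode` fails (simulated). */
    static boolean prepare(List<GroupKeys> nodes, int id, String key, int failingNode) {
        for (int i = 0; i < nodes.size(); i++) {
            if (i == failingNode) return false; // earlier nodes already stored the key
            nodes.get(i).addKey(id, key);
        }
        return true;
    }

    /** A restarted attempt uses an id greater than any id stored on any node. */
    static int nextKeyId(List<GroupKeys> nodes) {
        int max = Integer.MIN_VALUE;
        for (GroupKeys n : nodes) max = Math.max(max, n.keys.lastKey());
        return max + 1;
    }

    /** Retries PREPARE with a fresh id until it succeeds everywhere, then runs FINISH. */
    static int changeKey(List<GroupKeys> nodes, List<Integer> failingNodePerAttempt) {
        for (int attempt = 0; ; attempt++) {
            int id = nextKeyId(nodes);
            int failing = attempt < failingNodePerAttempt.size() ? failingNodePerAttempt.get(attempt) : -1;
            if (prepare(nodes, id, "key-" + id, failing)) {
                for (GroupKeys n : nodes) n.activate(id);
                return id;
            }
        }
    }

    public static void main(String[] args) {
        List<GroupKeys> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) nodes.add(new GroupKeys(0, "key-0"));
        // First attempt fails on node 2 after nodes 0 and 1 stored key 1; the retry uses key 2.
        int id = changeKey(nodes, List.of(2));
        System.out.println(id + " " + nodes.get(0).keys.keySet());
    }
}
```

In this model the old active key (id 0) is kept after FINISH, on the assumption that it is still needed to read data that has not been re-encrypted yet; only the never-activated key 1 is removed.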
A node may fail after adding the new key but before setting it for writing (as the *active* key).
Such a node does not know whether the PREPARE phase succeeded, and therefore does not know which key is currently used for writing.
By default, it tries to rejoin with the old key. If the join is rejected, it should be possible to set the correct key identifier manually using the system property or command-line tools.
When a non-baseline node joins a cluster (with a baseline change), it cleans up all existing data, so this case should not be a problem.
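The join decision for a restarted node might look like the following sketch. Everything here is hypothetical: the property name `sketch.groupKeyId` is a placeholder (this text does not name the real system property), and the coordinator's validation is simulated by comparing against `clusterActiveKeyId`, which the restarting node itself does not know. A baseline node tries the old key unless an operator override is set; a non-baseline node discards its data and takes the cluster's key.

```java
public class RejoinSketch {
    /** Hypothetical system property an operator sets after a rejected join. */
    static final String KEY_ID_PROPERTY = "sketch.groupKeyId";

    /** Result of one join attempt. */
    record JoinOutcome(boolean joined, int keyId, boolean dataCleared) {}

    /**
     * @param inBaseline         whether the restarting node belongs to the baseline topology
     * @param oldKeyId           last key id this node knows was active before it failed
     * @param clusterActiveKeyId stand-in for the coordinator's view, used only to simulate its check
     */
    static JoinOutcome rejoin(boolean inBaseline, int oldKeyId, int clusterActiveKeyId) {
        if (!inBaseline) {
            // A non-baseline node cleans up its data on join, so its local key state is irrelevant.
            return new JoinOutcome(true, clusterActiveKeyId, true);
        }
        // Default: assume PREPARE did not complete and rejoin with the old key,
        // unless the operator has supplied the correct identifier.
        String forced = System.getProperty(KEY_ID_PROPERTY);
        int keyId = forced != null ? Integer.parseInt(forced) : oldKeyId;
        boolean accepted = keyId == clusterActiveKeyId; // simulated coordinator-side validation
        return new JoinOutcome(accepted, keyId, false);
    }

    public static void main(String[] args) {
        // The node failed after adding key 2, and the cluster had in fact activated it.
        System.out.println(rejoin(true, 1, 2));   // rejected with the old key
        System.setProperty(KEY_ID_PROPERTY, "2"); // manual override
        System.out.println(rejoin(true, 1, 2));   // accepted with key 2
        System.clearProperty(KEY_ID_PROPERTY);
        System.out.println(rejoin(false, 1, 2));  // non-baseline: data dropped, cluster key adopted
    }
}
```

Defaulting to the old key matches the case where PREPARE did not finish, which the node cannot rule out on its own; the override covers the opposite case without guessing.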
If a node fails during re-encryption, after restarting it must continue re-encryption from the stored offset (if the checkpoint did not complete, physical records are restored from the WAL, as usual).
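Resuming from a stored offset can be sketched as follows. This is a simplified model, not the actual re-encryption code: a partition is an array holding the key id each page is encrypted with, the offset is persisted only every `checkpointEvery` pages to mimic checkpoints, and a crash is simulated by stopping at page `crashAt`. WAL recovery of physical records is not modeled. Pages processed after the last stored offset are simply processed again on restart, which is safe as long as re-encrypting an already re-encrypted page is idempotent.

```java
import java.util.Arrays;

public class ReencryptionResumeSketch {
    /** Hypothetical partition: each slot is the key id the page is currently encrypted with. */
    static final class Partition {
        final int[] pageKeyIds;
        int storedOffset; // persisted progress, advanced only at simulated checkpoints

        Partition(int pages, int keyId) {
            pageKeyIds = new int[pages];
            Arrays.fill(pageKeyIds, keyId);
        }
    }

    /**
     * Re-encrypts pages starting from the stored offset. Returns false if a crash is
     * simulated at page `crashAt` (pass -1 for no crash), true when the partition is done.
     */
    static boolean reencrypt(Partition p, int newKeyId, int checkpointEvery, int crashAt) {
        for (int i = p.storedOffset; i < p.pageKeyIds.length; i++) {
            if (i == crashAt) return false;
            p.pageKeyIds[i] = newKeyId; // idempotent: redoing a page leaves it on the new key
            if ((i + 1) % checkpointEvery == 0) p.storedOffset = i + 1; // progress persisted
        }
        p.storedOffset = p.pageKeyIds.length;
        return true;
    }

    public static void main(String[] args) {
        Partition p = new Partition(10, 1);
        boolean done = reencrypt(p, 2, 4, 6); // crash while processing page 6
        System.out.println(done + " offset=" + p.storedOffset);
        done = reencrypt(p, 2, 4, -1);        // restart resumes from the stored offset
        System.out.println(done + " offset=" + p.storedOffset + " " + Arrays.toString(p.pageKeyIds));
    }
}
```

Here the restart begins at page 4, so pages 4 and 5 are re-encrypted a second time; trading a little repeated work for not persisting the offset on every page is the usual checkpoint compromise.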
// TBD
...