Cache encryption key rotation is required if the key is compromised or when its crypto period (key validity period) ends. In addition, this feature is needed to support encrypting and decrypting existing caches in the future.
The local partition re-encryption strategy is similar to partition snapshotting: create a partition snapshot re-encrypted with the new key, then swap the original partition file with the new one.
The cluster-wide process consists of the following steps:
After completion of the key change preparation process, a new distributed process is initiated to complete the key change.
The discovery event from the distributed process pushes a new exchange task to the exchange worker to start PME. (PME is required to prevent reordering of WAL records when the key is changed and to simplify the initial design; this is expected to change in the future.)
While updates are blocked, each local node:
After changing the cache encryption key, its entries in the WAL will be encrypted with the new key. However, it must be possible to read older WAL records (at least to support historical rebalance).
For each cache, instead of a single key, a history of keys must be kept in the form WALPointer -> key
(the stored pointer is the maximum pointer for which the associated key is applicable).
When a WAL segment that a WALPointer refers to is removed, the corresponding key(s) should also be removed.
When the WAL is cleared, the key history must also be cleared (except for the last key).
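The key history described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `WalKeyHistory` and its methods are hypothetical, a WAL pointer is simplified to a `long`, and keys are represented by integer identifiers rather than real key material.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a per-cache-group key history indexed by WAL position. */
class WalKeyHistory {
    /** Maximum WAL position (simplified to a long) for which a retired key applies -> key id. */
    private final TreeMap<Long, Integer> retired = new TreeMap<>();

    /** Key used for all records written after the last retired pointer. */
    private int currentKeyId;

    WalKeyHistory(int initialKeyId) {
        currentKeyId = initialKeyId;
    }

    /** Retires the current key at the given pointer and switches to the new key. */
    void rotate(long lastPointerOfOldKey, int newKeyId) {
        retired.put(lastPointerOfOldKey, currentKeyId);
        currentKeyId = newKeyId;
    }

    /** Returns the key for a record: the retired key with the smallest max pointer >= walPointer, else the current key. */
    int keyFor(long walPointer) {
        Map.Entry<Long, Integer> e = retired.ceilingEntry(walPointer);
        return e != null ? e.getValue() : currentKeyId;
    }

    /** Drops keys whose whole applicable range lies in removed WAL segments. */
    void onSegmentsRemoved(long removedUpTo) {
        retired.headMap(removedUpTo, true).clear();
    }

    /** Clears the history when the WAL is cleared, keeping only the last (current) key. */
    void onWalCleared() {
        retired.clear();
    }

    /** Number of keys currently kept, including the current one. */
    int size() {
        return retired.size() + 1;
    }
}
```

A sorted map keeps the lookup logarithmic in the number of rotations, and removal of obsolete keys becomes a single range operation.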
Cancelling the re-encryption procedure means clearing all temporary data.
If a node crashes during the replacement of the partitions, the original backup copies of the partitions are restored when the node starts.
If a major topology change occurs during key rotation, the whole procedure is cancelled.
If a cache is stopped during re-encryption, the whole procedure is cancelled. Other minor topology changes should not affect the re-encryption procedure.
(TBD) When a baseline node with data joins the cluster and the cache group has a different key:
1. If historical rebalancing is not applicable, the encryption key is changed when the node joins and the partitions are cleared.
2. If historical rebalancing is applicable, existing data should be re-encrypted with the new key before(?) the node joins the cluster.
TBD
TBD
Re-encryption process state
Key rotation is required if the key is compromised or at the end of its crypto period (key validity period).
...
New processes:
...
Cache key rotation.
...
New administrator commands:
Current state of cache key rotation: node -> group name -> status -> encryption key hash.
...
...
When the message is received, the following actions are executed:
...
Process state: IN PROGRESS.
...
Further WAL records are encrypted with the new key.
...
Thread pool configured in IgniteConfiguration.
...
For each partition file.
...
The file is read page by page.
...
Page is unlocked.
...
Completion of partition re-encryption is accompanied by adding a WAL entry
...
Process state: FINISHED.
Motivation:
...
Memory footprint [Thread count]*[page size]
...
Minor effect on regular data operations.
...
To decrypt a page, the following steps are required:
...
If the page is not reencrypted yet, the old key is used for decryption.
...
If the page is reported as reencrypted (the Bloom filter may give a false positive), we:
Try to decrypt the page with the new key.
If that fails, try the old key.
...
Unblock page from reencryption.
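The key selection in the steps above can be illustrated with a small sketch. It assumes an authenticated cipher mode (AES/GCM) so that a wrong key is detected as a tag failure; how the actual implementation detects a failed decryption is not stated here. The class `PageDecryptSketch`, its methods, and the `reportedReencrypted` flag (standing in for the Bloom filter answer) are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: choose the decryption key for a page during reencryption. */
class PageDecryptSketch {
    static final int IV_LEN = 12;
    static final int TAG_BITS = 128;

    /** Generates a random AES key (for the example only). */
    static SecretKey newKey() {
        try {
            KeyGenerator g = KeyGenerator.getInstance("AES");
            g.init(128);
            return g.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts a page; output layout is IV followed by ciphertext and tag. */
    static byte[] encrypt(SecretKey key, byte[] page) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(page);
            byte[] out = new byte[IV_LEN + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the plaintext, or null if the key does not match the page. */
    static byte[] tryDecrypt(SecretKey key, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, data, 0, IV_LEN));
            return c.doFinal(data, IV_LEN, data.length - IV_LEN);
        } catch (AEADBadTagException e) {
            return null; // Wrong key: authentication tag mismatch.
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Old key if the page is not reported as reencrypted; otherwise new key with fallback to old. */
    static byte[] decryptPage(byte[] data, boolean reportedReencrypted, SecretKey oldKey, SecretKey newKey) {
        byte[] res = reportedReencrypted ? tryDecrypt(newKey, data) : null;
        if (res == null)
            res = tryDecrypt(oldKey, data); // Not reencrypted yet, or a false positive.
        if (res == null)
            throw new IllegalStateException("Page cannot be decrypted with either key");
        return res;
    }
}
```

The fallback to the old key is what makes a false positive from the reencrypted pages set harmless: it costs one extra decryption attempt but never returns wrong data.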
...
Scan the partition from the beginning to the last progress record point and add each page to the reencrypted pages set.
...
After that, there are X pages that MAY BE reencrypted (or may not be). We should find the first page that is not reencrypted:
Try to decrypt the page with the new key.
If decryption fails, the page is found.
Continue the reencryption process starting from it.
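A minimal sketch of this recovery scan is shown below. The names are hypothetical: `ReencryptionRecoverySketch`, `resumePoint`, and the `decryptsWithNewKey` predicate (which stands for "the page at this index decrypts with the new key") are assumptions, pages are addressed by a plain index, and a `BitSet` is used as a stand-in for the reencrypted pages set.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch: find the page from which reencryption must resume after a restart. */
class ReencryptionRecoverySketch {
    /**
     * @param lastProgressPage Index of the first page after the last progress record (assumed exclusive bound).
     * @param pageCount Total number of pages in the partition.
     * @param reencrypted Set of pages known to be reencrypted; filled in by this method.
     * @param decryptsWithNewKey Returns true if the page at the given index decrypts with the new key.
     * @return Index of the first page that is not reencrypted, or pageCount if all pages are.
     */
    static int resumePoint(int lastProgressPage, int pageCount, BitSet reencrypted, IntPredicate decryptsWithNewKey) {
        // Pages before the last progress record are known to be reencrypted.
        reencrypted.set(0, lastProgressPage);

        // Pages after it may or may not be reencrypted: probe each with the new key.
        for (int p = lastProgressPage; p < pageCount; p++) {
            if (!decryptsWithNewKey.test(p))
                return p; // First page still encrypted with the old key.

            reencrypted.set(p);
        }

        return pageCount;
    }
}
```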
The administrator initiates process completion via the interface by using the “cache key removal” command.
The design assumes that the administrator will check that all nodes have successfully changed the cache key and reencrypted all pages, and that all required nodes are alive.
The administrator initiates the process via some kind of user interface (CLI, Visor, WebConsole, JMX, etc.).
Message is sent by discovery.
Message should contain:
new cache key hash.
When a server node processes the message, the following actions are executed:
The received cache key hash is compared with the known cache key hash.
The previous cache key is removed from MetaStore.
...
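The hash check and key removal can be sketched as follows. This is an illustration only: the hash algorithm (SHA-256), the class `KeyRemovalSketch`, and the use of a `Deque` as a stand-in for the keys stored in MetaStore are all assumptions and do not reflect the actual Ignite API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Deque;

/** Hypothetical sketch: handling of the "cache key removal" message on a server node. */
class KeyRemovalSketch {
    /** Hash of the key material; SHA-256 is an assumption for this example. */
    static byte[] sha256(byte[] key) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * @param keys Stored keys of the cache group, oldest first, current key last (stand-in for MetaStore).
     * @param newKeyHash Hash of the new cache key carried by the message.
     * @return true if the hash matched and previous keys were removed, false otherwise.
     */
    static boolean onKeyRemovalMessage(Deque<byte[]> keys, byte[] newKeyHash) {
        byte[] current = keys.peekLast();

        // Compare the received hash with the hash of the locally known key.
        if (current == null || !MessageDigest.isEqual(sha256(current), newKeyHash))
            return false;

        // Remove previous keys, keeping only the current one.
        while (keys.size() > 1)
            keys.pollFirst();

        return true;
    }
}
```

Sending only the hash lets each node verify that it holds the expected new key without the key material itself travelling in the message.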
TBD
ASF JIRA: IGNITE-12843