Cache encryption key rotation is required if a key is compromised or when its cryptoperiod (key validity period) ends. In addition, this feature is a prerequisite for encrypting and decrypting existing caches in the future.
The Payment Card Industry Data Security Standard (PCI DSS) requires that key-management procedures include a defined cryptoperiod for each key type in use and define a process for key changes at the end of the defined cryptoperiod(s). An expired key should not be used to encrypt new data, but it can still be used for archived data; such keys should be strongly protected (sections 3.5 - 3.6) [1].
The maximum recommended key lifetime is 2 years [2], and on average a key is expected to be changed every few months [3].
Out of the box, Oracle and MySQL do not provide an automatic procedure for rotating tablespace keys; only master key rotation is supported [4][5]. MS SQL Server provides rotation of the database encryption key with background re-encryption of existing data [6]. TDE is currently being developed for PostgreSQL, but support for tablespace key rotation is not planned [7].
The local partition re-encryption strategy is similar to partition snapshotting: create a partition snapshot encrypted with the new key, and then replace the original partition file with the new one.
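The copy-then-replace idea above can be sketched as follows. This is only an illustration under stated assumptions, not the actual implementation: `PartitionReencryptor`, the `.reenc.tmp` suffix, and the `recodePage` callback (decrypt with the old key, encrypt with the new one) are hypothetical names, and the backup copy of the original partition mentioned later in this document is not modeled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

/** Hypothetical helper: rewrites a partition file page by page, then swaps it in. */
final class PartitionReencryptor {
    /**
     * @param part Partition file to re-encrypt.
     * @param pageSize Page size in bytes (assumed fixed for the whole file).
     * @param recodePage Assumed callback: decrypts a page with the old key and
     *                   encrypts it with the new key.
     */
    static void reencrypt(Path part, int pageSize, UnaryOperator<byte[]> recodePage)
        throws IOException {
        // Temporary file next to the original, so the final move stays on one file system.
        Path tmp = part.resolveSibling(part.getFileName() + ".reenc.tmp");

        try (InputStream in = Files.newInputStream(part);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] page;

            // readNBytes returns an empty array at end of file (Java 11+).
            while ((page = in.readNBytes(pageSize)).length > 0)
                out.write(recodePage.apply(page));
        }

        // Replace the original only after the new file is fully written.
        // Whether an atomic move may replace an existing file is platform-specific.
        Files.move(tmp, part, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Because the original file is untouched until the final move, a failure while writing the temporary file leaves the partition readable with the old key.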
The cluster-wide process consists of the following steps:
After completion of the key change preparation process, a new distributed process is initiated to complete the key change.
The discovery event from the distributed process pushes a new exchange task to the exchange worker to start a PME-free switch. This is required to prevent reordering of WAL records when the key is changed and to simplify the initial design; it could be changed in the future.
While updates are blocked, each node:
After changing the encryption key, new WAL records will be encrypted with the new key. However, it must be possible to read older WAL records (at least to support historical rebalance).
For each cache, instead of a single key, it is necessary to keep a history of keys in the form WALPointer -> key (the stored pointer is the maximum pointer for which the associated key is applicable).
When a WAL segment referenced by one or more WALPointers is removed, the associated key(s) should also be removed.
When the WAL is cleared, the key history must also be cleared (except for the last key).
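The key history described above could be modeled as a sorted map from the last applicable WAL position to a key. The sketch below is illustrative only: `KeyHistory`, `WalPos` (a simplified stand-in for WALPointer with a segment index and an offset), and the use of integer key identifiers instead of key material are all assumptions, not part of the design.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-cache key history: max WAL position -> key id valid up to it. */
final class KeyHistory {
    /** Simplified stand-in for a WAL pointer (assumption: segment index + offset). */
    static final class WalPos implements Comparable<WalPos> {
        final long segment;
        final int offset;

        WalPos(long segment, int offset) {
            this.segment = segment;
            this.offset = offset;
        }

        @Override public int compareTo(WalPos o) {
            int c = Long.compare(segment, o.segment);
            return c != 0 ? c : Integer.compare(offset, o.offset);
        }
    }

    /** Retired keys, keyed by the maximum position each one applies to. */
    private final TreeMap<WalPos, Integer> retired = new TreeMap<>();

    /** Key used for all records after the last retired position. */
    private int currentKeyId;

    KeyHistory(int initialKeyId) {
        currentKeyId = initialKeyId;
    }

    /** Records a key change: the old key stays valid up to and including lastOldPos. */
    void rotate(WalPos lastOldPos, int newKeyId) {
        retired.put(lastOldPos, currentKeyId);
        currentKeyId = newKeyId;
    }

    /** Finds the key for a record: the first retired entry at or after pos, else the current key. */
    int keyIdFor(WalPos pos) {
        Map.Entry<WalPos, Integer> e = retired.ceilingEntry(pos);
        return e != null ? e.getValue() : currentKeyId;
    }

    /** Drops keys whose maximum position lies in a removed segment (segment < firstKeptSegment). */
    void onSegmentsRemoved(long firstKeptSegment) {
        retired.headMap(new WalPos(firstKeptSegment, 0), false).clear();
    }

    /** Number of keys kept, including the current one. */
    int size() {
        return retired.size() + 1;
    }
}
```

Clearing the whole WAL corresponds to emptying the retired entries, which leaves only the current (last) key.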
The re-encryption procedure does not start if there are LOST partitions in the cache group or any baseline node is missing (this is a limitation of the initial design and should be improved in the future).
The cache stop operation is rejected for cache groups in which re-encryption is in progress.
Canceling the re-encryption procedure means clearing all temporary data.
If a node crashes during the replacement of the partitions, the original backup copies of the partitions are restored when the node starts.
If a major topology change occurs during key rotation, the whole procedure is cancelled.
Minor topology changes should not affect the re-encryption procedure.
If the partition is scheduled for eviction during re-encryption, cancel the re-encryption of this partition.
TBD
TBD
Re-encryption process state.