Cache encryption key rotation is required when a key is compromised or when its cryptoperiod (key validity period) ends. In addition, this feature is a prerequisite for supporting encryption and decryption of existing caches in the future.
The Payment Card Industry Data Security Standard (PCI DSS) requires that key-management procedures define a cryptoperiod for each key type in use and a process for changing keys at the end of that cryptoperiod. An expired key should not be used to encrypt new data, but it can still be used for archived data; such keys should be strongly protected (sections 3.5 - 3.6) [1].
The maximum recommended key lifetime is 2 years [2]; on average, a key is expected to be changed every few months [3].
MS SQL Server supports rotation of the database encryption key with background re-encryption of existing data [4]. Oracle and MySQL do not provide an automatic procedure for rotating tablespace keys out of the box, although master key rotation is supported [5][6]. TDE is currently being developed for PostgreSQL, but support for tablespace key rotation is not planned [7].
The overall process consists of the following steps
To support multiple keys for reading encrypted data it is required to store a key identifier on each encrypted page and on each encrypted WAL record (see more details). The key identifier is a sequential counter and should be the same on all nodes.
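The idea of reading with several keys while writing with one can be sketched as a small key ring. This is a minimal illustration rather than the proposed implementation: the class `GroupKeyRing` and all of its method names are hypothetical, and keys are represented as raw byte arrays for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-cache-group key ring: several keys for reading, one active key for writing. */
final class GroupKeyRing {
    /** Known keys indexed by the key identifier stored on pages and WAL records. */
    private final Map<Integer, byte[]> keys = new ConcurrentHashMap<>();

    /** Identifier of the key currently used for writing. */
    private volatile int activeId;

    GroupKeyRing(int initialId, byte[] initialKey) {
        keys.put(initialId, initialKey.clone());
        activeId = initialId;
    }

    /** Registers a key for reading without changing the key used for writing. */
    void addKey(int id, byte[] key) {
        if (keys.putIfAbsent(id, key.clone()) != null)
            throw new IllegalArgumentException("Duplicate key id: " + id);
    }

    /** Switches writing to an already registered key. */
    void setActive(int id) {
        if (!keys.containsKey(id))
            throw new IllegalArgumentException("Unknown key id: " + id);

        activeId = id;
    }

    int activeKeyId() {
        return activeId;
    }

    /** Returns the key that decrypts data tagged with the given identifier. */
    byte[] keyForRead(int id) {
        byte[] key = keys.get(id);

        if (key == null)
            throw new IllegalStateException("No key for id: " + id);

        return key.clone();
    }

    /** Drops an old key; the active key cannot be removed. */
    boolean removeKey(int id) {
        return id != activeId && keys.remove(id) != null;
    }
}
```

In this sketch a page or WAL record written before the rotation stays readable through `keyForRead(oldId)` until the old key is explicitly removed.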
After the FINISH phase is complete, a new encryption key for writing is set on all nodes, i.e. the key change process is formally completed.
Background re-encryption of existing data completes at some later point. The new "pagesLeftForReencryption" cache group metric can be used to track re-encryption progress ('0' means the process has finished).
The process applies to all existing partitions, including the index partition.
All pages in the specified range are scanned (metaPageId + [offset -> total]).
Re-encryption progress is stored in the metapage (int offset, int total) and is updated during checkpoints.
The process aborts only when a partition is destroyed.
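The persisted progress described above (two ints, 8 bytes in total) might be modeled as follows. This is a sketch under assumptions: the class `ReencryptionState`, the position argument `pos`, and the reading of `offset` as "pages already processed" are illustrative and not taken from the actual meta page IO classes.

```java
import java.nio.ByteBuffer;

/** Hypothetical holder for the re-encryption progress of one partition. */
final class ReencryptionState {
    /** Two ints (offset and total), 8 bytes altogether. */
    static final int SIZE = 2 * Integer.BYTES;

    /** Number of pages already re-encrypted (assumed meaning of "offset"). */
    final int offset;

    /** Total number of pages to re-encrypt. */
    final int total;

    ReencryptionState(int offset, int total) {
        if (offset < 0 || offset > total)
            throw new IllegalArgumentException("Invalid progress: " + offset + "/" + total);

        this.offset = offset;
        this.total = total;
    }

    /** Stores the state at the given position of a meta page buffer (done at checkpoint time). */
    void writeTo(ByteBuffer metaPage, int pos) {
        metaPage.putInt(pos, offset);
        metaPage.putInt(pos + Integer.BYTES, total);
    }

    /** Restores the state, for example after a node restart. */
    static ReencryptionState readFrom(ByteBuffer metaPage, int pos) {
        return new ReencryptionState(metaPage.getInt(pos), metaPage.getInt(pos + Integer.BYTES));
    }

    /** Returns the state after a batch of pages has been processed. */
    ReencryptionState advance(int pages) {
        return new ReencryptionState(Math.min(total, offset + pages), total);
    }

    /** Pages still waiting for re-encryption; 0 means the partition is done. */
    int pagesLeft() {
        return total - offset;
    }
}
```

Summing `pagesLeft()` over the partitions of a cache group would give a value comparable to the "pagesLeftForReencryption" metric, although how the metric is actually aggregated is not specified here.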
Old group key will be removed when
The re-encryption status requires an additional 8 bytes on the meta page of each partition.
The index partition uses PageMetaIO to read/write meta information.
All other partitions use PagePartitionMetaIO to read/write meta information.
The partition meta starts immediately after the end of the page meta.
To make room for the additional 8 bytes, the partition meta is shifted by 8 bytes.
WAL delta records have also been modified to store re-encryption status.
Each encrypted page has reserved free space to store the CRC of the encrypted data.
The size of this free space depends on the encryption block size, but cannot be less than 8 bytes. (The default Ignite encryption implementation, KeystoreEncryptionSpi, uses AES with a 16-byte block size.)
One byte is added for the encryption key ID on each encrypted page (after the CRC).
(WAL records ENCRYPTED_RECORD and ENCRYPTED_DATA_RECORD have been changed accordingly)
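The arithmetic behind the reserved space can be illustrated with a simplified layout helper. It is only a sketch: the class `EncryptedPageLayout` is hypothetical, the 4-byte CRC size is an assumption, and any other per-page overhead of a real encryption SPI (for example an initialization vector) is deliberately ignored.

```java
/** Hypothetical layout helper for the tail of an encrypted page. */
final class EncryptedPageLayout {
    /** Minimum reserved space stated in the text. */
    static final int MIN_RESERVED = 8;

    /** Assumption: the CRC is a 4-byte value. */
    static final int CRC_SIZE = 4;

    /** One byte for the encryption key identifier. */
    static final int KEY_ID_SIZE = 1;

    /** Largest multiple of the block size that leaves at least MIN_RESERVED bytes free. */
    static int payloadSize(int pageSize, int blockSize) {
        if (blockSize <= 0 || pageSize <= MIN_RESERVED)
            throw new IllegalArgumentException("Invalid sizes: " + pageSize + ", " + blockSize);

        return (pageSize - MIN_RESERVED) / blockSize * blockSize;
    }

    /** Free space left at the end of the page for the CRC and the key identifier. */
    static int reservedSize(int pageSize, int blockSize) {
        return pageSize - payloadSize(pageSize, blockSize);
    }

    /** The CRC is placed right after the encrypted payload. */
    static int crcOffset(int pageSize, int blockSize) {
        return payloadSize(pageSize, blockSize);
    }

    /** The key identifier follows the CRC. */
    static int keyIdOffset(int pageSize, int blockSize) {
        return crcOffset(pageSize, blockSize) + CRC_SIZE;
    }
}
```

Under these assumptions, a 4096-byte page with a 16-byte block size holds 4080 bytes of encrypted payload and reserves 16 bytes, of which the CRC and the key identifier occupy the first 5.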
A node join is rejected while an encryption key rotation is in progress, but this limitation may be revised in the future.
When a node joins the cluster (before/after key rotation), it receives the current encryption keys for the cache groups used for writing (it "rotates" encryption key automatically). If the encryption key is a new key, then the node sets it for writing and starts the background re-encryption process (it starts re-encryption automatically).
Therefore, a node may leave the cluster during a key change, or a node may be absent and rejoin later (it does not matter if the baseline changes or not).
If a node stops or fails during re-encryption, it continues re-encryption from the stored offset after restart:
// TBD
A new method will be introduced:
public IgniteFuture<Void> changeCacheGroupKey(Collection<String> cacheOrGroupNames)
Re-encryption process state in CacheGroupMetrics