MS SQL Server provides rotation of the database encryption key with background re-encryption of existing data [4]. Oracle and MySQL do not provide an out-of-the-box procedure for automatically rotating tablespace keys; only master key rotation is supported [5][6]. TDE is currently being developed for PostgreSQL, but support for tablespace key rotation is not planned [7].

Partition re-encryption strategies

At the moment, encryption occurs at the pagememory level, when a page is written to the pagestore or WAL.

Copy with re-encryption.

This strategy is similar to partition snapshotting: create a partition snapshot encrypted with the new key and then replace the original partition file with the new one.

In place re-encryption

Sequentially read all the pages from the datastore, mark them as dirty and log them into WAL. The checkpointer then writes the pages encrypted with the new key.

This strategy requires changing the format of the encrypted page to store the identifier (number) of the encryption key (for recovery). Each encrypted page already has space reserved for the page CRC (4 bytes); this reserved space has the size of the encryption block (minimum 8 bytes).

Comparison

...

Performance (rough estimate)

...

Implementation complexity (rough estimate)

...

Description

In place re-encryption design.

The overall process consists of the following steps:

Rotate cache group key - add a new encryption key on each node and set it for writing.
Schedule background re-encryption for archived data and clean up the old key when it completes.

...

To support multiple keys for reading encrypted data, a key identifier must be stored on each encrypted page and on each encrypted WAL record (see more details). The key identifier is a sequential counter and must be the same on all nodes.
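The idea of keeping several keys for reading and one key for writing can be sketched as a small key ring. This is a minimal illustration, not the Ignite implementation: the class name GroupKeyRing and its methods are hypothetical, and the sketch ignores persistence and the 1-byte size limit of the identifier.

```java
import java.util.TreeMap;

/** Hypothetical per-cache-group key ring: key identifier -> key bytes. */
final class GroupKeyRing {
    /** All known keys, ordered by their sequential identifier. */
    private final TreeMap<Integer, byte[]> keys = new TreeMap<>();

    /** Identifier of the key used for writing, -1 until one is set. */
    private int activeId = -1;

    /** Adds a key under the next sequential identifier and returns that identifier. */
    int addKey(byte[] key) {
        int id = keys.isEmpty() ? 0 : keys.lastKey() + 1;
        keys.put(id, key.clone());
        return id;
    }

    /** Sets the key used for writing; older keys stay available for reading. */
    void setActive(int id) {
        if (!keys.containsKey(id))
            throw new IllegalArgumentException("Unknown key id: " + id);
        activeId = id;
    }

    int activeId() {
        return activeId;
    }

    /** Looks up the key by the identifier stored on a page or WAL record. */
    byte[] keyForReading(int id) {
        byte[] key = keys.get(id);
        if (key == null)
            throw new IllegalStateException("No key with id " + id);
        return key;
    }
}
```

Because every node adds keys in the same order, the sequential identifiers stay the same cluster-wide, which is the property the text above relies on.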

Start the distributed process CACHE_GROUP_KEY_CHANGE_PREPARE; each node:
1. verifies that re-encryption is not in progress for the specified cache group.
2. ensures that the new key identifier does not exist.
3. adds the new key.
After successful completion of PREPARE, start the distributed process CACHE_GROUP_KEY_CHANGE_FINISH; each node:
1. saves a logical WAL record (ENCRYPTION_STATUS_RECORD) with the current groups and key identifiers to start re-encryption after logical recovery.
2. saves the new key in the metastore (as an inactive key).
3. sets the new key for writing.
4. adds the mapping "WAL segment -> *old* key identifier" (to safely clean up the previous key in the future).
5. stores the current page count as the total page count for background re-encryption (on applicable partitions).
6. saves the current keys and WAL mappings into the metastore.
7. starts background re-encryption of existing data.
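The node-local part of the two phases can be sketched as follows. This is a simplified illustration under assumptions: the class GroupKeyChange and its methods are hypothetical, and WAL logging, metastore writes and the actual start of the re-encryption task are omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node-local key change state for one cache group. */
final class GroupKeyChange {
    private final Map<Integer, byte[]> keys = new HashMap<>();
    private int activeKeyId;
    private boolean reencryptionInProgress;

    GroupKeyChange(int initialKeyId, byte[] initialKey) {
        keys.put(initialKeyId, initialKey);
        activeKeyId = initialKeyId;
    }

    /** PREPARE: validate the request and add the new key without using it for writing. */
    void prepare(int newKeyId, byte[] newKey) {
        if (reencryptionInProgress)
            throw new IllegalStateException("Re-encryption is in progress");
        if (keys.containsKey(newKeyId))
            throw new IllegalStateException("Key identifier already exists: " + newKeyId);
        keys.put(newKeyId, newKey);
    }

    /** FINISH: switch writes to the new key; returns the previous key identifier. */
    int finish(int newKeyId) {
        if (!keys.containsKey(newKeyId))
            throw new IllegalStateException("PREPARE was not completed for key " + newKeyId);
        int oldKeyId = activeKeyId;
        activeKeyId = newKeyId;
        reencryptionInProgress = true; // Background re-encryption would be scheduled here.
        return oldKeyId;
    }

    /** Called when background re-encryption completes. */
    void onReencryptionFinished() {
        reencryptionInProgress = false;
    }

    int activeKeyId() {
        return activeKeyId;
    }
}
```

Splitting the change this way means a failed PREPARE leaves the key used for writing untouched on every node.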

After the FINISH phase is complete, a new encryption key for writing is set on all nodes, i.e. the key change process is formally completed.

Background re-encryption of existing data will be completed at some point in the future; the new "reencryptionFinished" cache group metric can be used to track re-encryption progress.

Background re-encryption

The process applies to all existing partitions, including the index partition.

Every time the cache group key changes, we store the current page count of the partition in the meta page (this value is used as the total page count to re-encrypt).

Scan all pages in the specified range (metaPageId + [offset -> total]); for each page:

1. acquire page
2. if page is not dirty
  1. lock page
  2. log into WAL (PageSnapshot?)
  3. unlock page (dirty=true)
3. release page

Re-encryption progress is stored in the meta page (int offset, int total) and is updated during the checkpoint.
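The scan with a resumable offset can be sketched as below. This is a minimal model, not the Ignite page memory API: the dirty flags are represented by a BitSet, the class name ReencryptionScan is hypothetical, and the batch size parameter is an assumption used to show how progress advances in steps.

```java
import java.util.BitSet;

/** Hypothetical model of the background scan over one partition. */
final class ReencryptionScan {
    /**
     * Scans up to batchSize pages starting at offset and marks clean pages as dirty,
     * so that the checkpointer rewrites them with the new key.
     *
     * @return The new offset, which would be stored in the meta page on checkpoint.
     */
    static int scanBatch(int offset, int total, int batchSize, BitSet dirty) {
        int end = Math.min(total, offset + batchSize);

        for (int pageIdx = offset; pageIdx < end; pageIdx++) {
            if (!dirty.get(pageIdx))
                dirty.set(pageIdx); // Stands for lock page / unlock page (dirty=true).
        }

        return end;
    }
}
```

After a restart, the scan would continue by calling scanBatch with the offset last persisted in the meta page.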

The process aborts only when a partition is destroyed.

At node startup, during partition initialization, if the total number of pages for re-encryption is greater than zero, this cache group is scheduled for re-encryption.

Cleanup old key

The old cache group encryption key will be removed when:

re-encryption has completed for the cache group (and at least one checkpoint has completed successfully after that)
the last WAL segment in which the key was used has been removed (on node startup, check whether the WAL history has been cleared manually)
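The cleanup rule can be sketched with the "WAL segment -> old key identifier" mapping mentioned earlier. This sketch assumes that both conditions above must hold before a key is dropped; the class OldKeyCleanup and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tracker that decides when old keys can be removed. */
final class OldKeyCleanup {
    /** Last WAL segment in which a key was used -> identifiers of those keys. */
    private final TreeMap<Long, List<Integer>> lastSegToKeyIds = new TreeMap<>();

    /** True once re-encryption is complete and a checkpoint has finished after it. */
    private boolean reencryptionCheckpointed;

    /** Records the segment that was current when the old key stopped being used for writing. */
    void onKeyChanged(long curWalSegment, int oldKeyId) {
        lastSegToKeyIds.computeIfAbsent(curWalSegment, s -> new ArrayList<>()).add(oldKeyId);
        reencryptionCheckpointed = false;
    }

    void onReencryptionCheckpointed() {
        reencryptionCheckpointed = true;
    }

    /** Returns the keys that became removable after segments up to lastRemovedSegment were deleted. */
    List<Integer> removableKeys(long lastRemovedSegment) {
        List<Integer> res = new ArrayList<>();

        if (!reencryptionCheckpointed)
            return res;

        Map<Long, List<Integer>> released = lastSegToKeyIds.headMap(lastRemovedSegment, true);

        for (List<Integer> ids : released.values())
            res.addAll(ids);

        released.clear(); // Forget mappings for removed segments.

        return res;
    }
}
```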

Fault tolerance

If GROUP_KEY_CHANGE_PREPARE has not been successfully completed on all nodes, the process is interrupted and must be restarted.
When the process restarts, a new key identifier is generated (an unused key will be removed on finish-phase, when we set the new key for writing).

It is possible that the node will fail after adding a new key, but before setting it for writing (as an active key).
This node doesn't know whether the PREPARE phase was successful or not, therefore, it does not know which key is currently being used for writing.
By default, it will try to rejoin with the old key. If the join is rejected, it should be possible to manually set the correct key identifier using a system property or command line tools.

When a non-baseline node joins a cluster (with a baseline change), it cleans up all existing data, so this should not be a problem case.

Changes in memory page format

PageMetaIO and PagePartitionMetaIO

Re-encryption status requires an additional 8 bytes on the meta page of each partition.
Index partition uses PageMetaIO to read/write meta information (page type T_META).
Each other partition uses PagePartitionMetaIO to read/write meta information (page type T_PART_META).

Partition meta starts just after the end of the page meta.

[Diagram: pagemeta_old]

To support binary compatibility and keep the code clean, we create a new successor of PageMetaIO, PageMetaIOV2, with the same type T_META.

We convert all existing T_META pages into the new version.

We store the additional 8 bytes at the end of each T_META and T_PART_META memory page.

[Diagram: PagePartMetaModV2]

[Diagram: PageIndexMetaModV2]

WAL delta records have also been modified to store re-encryption status.

Encrypted (persisted) page

Each encrypted page has reserved free space to store the CRC of the encrypted data.
The size of this free space depends on the size of the encryption block, but cannot be less than 8 bytes (the default Ignite encryption implementation, KeystoreEncryptionSpi, uses AES with a 16-byte block size).

One byte for the encryption key ID is added to each encrypted page (after the CRC).
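A possible way to write and read this tail is sketched below. It is an illustration only: the class EncryptedPageTail is hypothetical, and the exact offsets (CRC immediately after the encrypted data, key ID in the next byte) are an assumption based on the description above.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper for the tail of an encrypted page: [encrypted data | CRC | key id]. */
final class EncryptedPageTail {
    static final int CRC_SIZE = 4;

    /** Computes CRC32 over the first encDataLen bytes without moving the buffer position. */
    static int crcOf(ByteBuffer page, int encDataLen) {
        ByteBuffer data = page.duplicate();
        data.clear();
        data.limit(encDataLen);

        CRC32 crc = new CRC32();
        crc.update(data);

        return (int) crc.getValue();
    }

    /** Writes the CRC and the 1-byte key identifier after the encrypted data. */
    static void writeTail(ByteBuffer page, int encDataLen, int keyId) {
        if (keyId < 0 || keyId > 0xFF)
            throw new IllegalArgumentException("Key id does not fit into one byte: " + keyId);

        page.putInt(encDataLen, crcOf(page, encDataLen));
        page.put(encDataLen + CRC_SIZE, (byte) keyId);
    }

    /** Reads the key identifier as an unsigned byte. */
    static int readKeyId(ByteBuffer page, int encDataLen) {
        return page.get(encDataLen + CRC_SIZE) & 0xFF;
    }

    /** Checks the stored CRC against the encrypted data. */
    static boolean crcMatches(ByteBuffer page, int encDataLen) {
        return page.getInt(encDataLen) == crcOf(page, encDataLen);
    }
}
```

On read, the key identifier from the tail selects which stored key to decrypt the page with.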

[Diagram: encrypted_page]

(WAL records ENCRYPTED_RECORD and ENCRYPTED_DATA_RECORD have been changed accordingly)

Fault tolerance

Distributed key rotation

Node join is rejected during the key rotation, but this limitation may be revised in the future.

When a node joins the cluster (before/after key rotation), it receives the current encryption keys for the cache groups used for writing. If the encryption key is a new key, then the node sets it for writing and starts the background re-encryption process (in other words, the node automatically "rotates" the encryption key when joining a cluster, if necessary).
Therefore, a node may leave the cluster during a key change, or be absent and rejoin later (regardless of whether the baseline changes): it will receive the new key and schedule re-encryption, if necessary.

Background re-encryption

If the node stops or fails during re-encryption, it must continue re-encryption from the stored offset after restarting:

If the checkpoint failed, physical records should be restored from WAL, as usual.
If a checkpoint was not invoked, re-encryption starts from the beginning using the saved logical WAL record (recorded during key rotation).

// TBD

Risks and assumptions

// TBD

...

Background re-encryption

...

Cluster-wide process consists of the following steps:

Prepare changing the encryption key - send new key and start re-encryption task on each affinity node.
Finish changing the encryption key - swap partitions and replace cache encryption key in the metastore.

Prepare changing the encryption key

The initiator node generates new encryption key(s) for the cache group(s) and begins a new distributed process to start the cache encryption key change operation by sending an initial discovery message with the list of cache groups to re-encrypt and the encrypted keys.
The action configured for the distributed process initiates a new local re-encryption task on each node.

Local re-encryption task

Start copying of each partition file (including index) to the target directory with the re-encryption. These files will have dirty data due to concurrent checkpoint thread writes.
Collect all dirty pages related to ongoing checkpoint process and corresponding partition files and apply them (with re-encryption) to the copied file right after the copy process ends.
When local re-encryption of all required cache groups completes - send message that this phase is finished on this node (in other words, distributed process "prepare" is finished on local node).
Continue to collect and apply dirty pages encrypted with the new key to copied partition until "finish" phase is started.

Finish changing the encryption key

After completion of the key change preparation process, a new distributed process is initiated to complete the key change.

The action configured for the distributed process initiates partition swapping on each node (this action may require suspension of local or global operations if WAL records can be reordered during key change).

1. Acquire checkpoint lock.
2. Swap all partition files:
  a. Back up the original file.
  b. Move the re-encrypted file to the place of the original one.
3. Change encryption key(s) in metastore (update encryption keys history - add new key and set current WAL pointer to previous key).
4. Cancel checkpoint updates for copied partitions.
5. Release checkpoint lock.
6. Force checkpoint.
7. Remove partition backups (2a).
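The file swap with a backup can be sketched using java.nio.file. This is a simplified illustration that leaves out the checkpoint lock and the metastore update; the class PartitionSwap and the ".bak" suffix are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating the swap of one partition file. */
final class PartitionSwap {
    /** Returns the backup location for a partition file (the ".bak" suffix is an assumption). */
    static Path backupOf(Path original) {
        return original.resolveSibling(original.getFileName() + ".bak");
    }

    /** Backs up the original file and moves the re-encrypted copy into its place. */
    static void swap(Path original, Path reencrypted) throws IOException {
        Path backup = backupOf(original);

        Files.move(original, backup, StandardCopyOption.REPLACE_EXISTING);

        try {
            Files.move(reencrypted, original);
        }
        catch (IOException e) {
            Files.move(backup, original); // Restore the original file on failure.

            throw e;
        }
    }

    /** Removes the backup once the forced checkpoint has completed. */
    static void removeBackup(Path original) throws IOException {
        Files.deleteIfExists(backupOf(original));
    }
}
```

Keeping the backup until after the forced checkpoint is what allows the original file to be restored if the node crashes in the middle of the swap.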

WAL

After changing the encryption key, new WAL records will be encrypted with the new key. However, it must be possible to read older WAL records (at least to support historical rebalance).

For each cache, instead of a single key, it is necessary to keep a history of keys in the form WALPointer -> key
(storing the maximum pointer for which the associated key is applicable).

When removing a WAL segment to which WALPointer(s) refers - key(s) should be also removed.
When the WAL is cleared, respectively, the key history must also be cleared (except the last one).
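The key history lookup can be sketched with a sorted map. This is an illustration under assumptions: the WAL pointer is modelled as a plain long, keys are represented by their identifiers, and the class WalKeyHistory is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical key history for one cache: maximum WAL pointer -> key identifier. */
final class WalKeyHistory {
    /** For each previous key, the maximum WAL pointer for which it is applicable. */
    private final TreeMap<Long, Integer> maxPtrToKeyId = new TreeMap<>();

    /** Key used for records written after the last rotation. */
    private int curKeyId;

    WalKeyHistory(int initialKeyId) {
        curKeyId = initialKeyId;
    }

    /** Records that the current key applies up to curWalPtr and switches to the new key. */
    void rotate(long curWalPtr, int newKeyId) {
        maxPtrToKeyId.put(curWalPtr, curKeyId);
        curKeyId = newKeyId;
    }

    /** Returns the key identifier needed to read the record at the given pointer. */
    int keyIdFor(long walPtr) {
        Map.Entry<Long, Integer> e = maxPtrToKeyId.ceilingEntry(walPtr);

        return e == null ? curKeyId : e.getValue();
    }

    /** Drops history entries whose pointers lie entirely in removed WAL segments. */
    void onWalTruncated(long firstRetainedPtr) {
        maxPtrToKeyId.headMap(firstRetainedPtr, false).clear();
    }

    int historySize() {
        return maxPtrToKeyId.size();
    }
}
```

The current key is never part of the map, so clearing the whole history still leaves the last key in place.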

Recovery

The re-encryption procedure does not start if there are LOST partitions in the cache group or any baseline node is missing (this is a limitation of the initial design and should be improved in the future).
The cache stop operation is rejected, for cache groups in which re-encryption is performed.

Cancelling the re-encryption procedure means clearing all temporary data.

If a node crashes during the replacement of the partitions, the original backup copies of the partitions are restored when the node starts.
If a major topology change occurs during key rotation, the whole procedure is cancelled.
Minor topology changes should not affect the re-encryption procedure.
If a partition is scheduled for eviction during re-encryption, the re-encryption of this partition is cancelled.

Process management

TBD

Public API changes

IgniteEncryption

New method will be introduced

public IgniteFuture<Void> changeGroupKey(Collection<Integer> groups) throws IgniteCheckedException;

Monitoring

Re-encryption process state.

...

Background re-encryption may affect performance. The performance impact can be managed using the following configuration options:
1. reencryptionBatchSize - number of pages that are scanned during re-encryption under checkpoint lock.
2. reencryptionRateLimit - page scanning speed limit in megabytes per second.
3. ~~reencryptionThreadCnt - number of threads used for re-encryption~~(?)~~.~~
The WAL history may be too small to store all entries between checkpoints (this should be tuned carefully by setting the size of the WAL history and tuning the re-encryption performance).
The WAL history (for delta rebalancing) may be lost for all cache groups due to background re-encryption.
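To get a feel for the rate limit, the arithmetic can be sketched as follows. This is a rough estimate only: the class ReencryptionRate is hypothetical, 1 MB is taken as 1024 * 1024 bytes, and the page size is passed in explicitly as an assumption about the configuration.

```java
/** Hypothetical helper converting a rate limit in MB/s into pages and time. */
final class ReencryptionRate {
    /** Number of whole pages that may be scanned per second under the given limit. */
    static long pagesPerSecond(double limitMbPerSec, int pageSize) {
        if (limitMbPerSec <= 0 || pageSize <= 0)
            throw new IllegalArgumentException("Limit and page size must be positive");

        return Math.max(1, (long) (limitMbPerSec * 1024 * 1024 / pageSize));
    }

    /** Rough lower bound on the time needed to scan the remaining pages, in seconds. */
    static long estimatedSeconds(long pagesLeft, double limitMbPerSec, int pageSize) {
        long rate = pagesPerSecond(limitMbPerSec, pageSize);

        return (pagesLeft + rate - 1) / rate; // Round up.
    }
}
```

For example, with 4096-byte pages a limit of 1 MB/s corresponds to 256 pages per second, so 1000 remaining pages would take about 4 seconds to scan.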

Public API changes

IgniteEncryption

New method will be introduced

public IgniteFuture<Void> changeCacheGroupKey(Collection<String> cacheOrGroupNames)

Metrics

Re-encryption process state in CacheGroupMetrics

ReencryptionPagesLeft - (long) Total number of pages left for re-encryption.
ReencryptionFinished - (boolean) Indicates whether re-encryption is finished (it is set to true only when a checkpoint is finished).

Process management

The following commands should be added to the control.sh utility:

Rotate encryption key.

Command syntax:

control.(sh|bat) --encryption change_cache_key cacheGroupName --yes

Command output:

The encryption key has been changed for cache group "default".

View encryption key identifiers.

Command syntax:

control.(sh|bat) --encryption cache_key_ids cacheGroupName

Command output:

Encryption key identifiers for cache: default
  Node 6085d500-2736-4c1f-b47c-444cf0a00000:
    1 (active)
    0
  Node d98654c0-6dfb-4996-993e-387156300001:
    1 (active)
    0

View cache group re-encryption status.

Command syntax:

control.(sh|bat) --encryption reencryption_status cacheGroupName

Command output:

  Node 4ed26231-f92d-4b1c-86ba-7a117c200001:
    1552 KB of data left for re-encryption
  Node 89a456e5-59c5-4f13-a75b-39ab25000000:
    1552 KB of data left for re-encryption

Suspend cache group re-encryption.

Command syntax:

control.(sh|bat) --encryption suspend_reencryption cacheGroupName

Command output:

  Node ad1328e7-11e0-4ecb-8ef2-066519e00001:
    re-encryption of the cache group "default" has been suspended.
  Node 2a9e291f-e2d1-46e3-9954-18deb0e00000:
    re-encryption of the cache group "default" has been suspended.

Resume cache group re-encryption.

Command syntax:

control.(sh|bat) --encryption resume_reencryption cacheGroupName

Command output:

  Node 2ed43509-caab-48dc-a27d-3be65d800000:
    re-encryption of the cache group "default" has been resumed.
  Node b52d6451-a948-48d5-b79a-411956700001:
    re-encryption of the cache group "default" has been resumed.

View/change re-encryption rate limit.

Command syntax:

control.(sh|bat) --encryption reencryption_rate [limit]

Parameters:
    limit  - decimal value to change rate limit (MB/s)

Command output:

  Node 15cb8485-0c09-4361-b267-107d38400000:
    re-encryption rate has been limited to 0.01 MB/s.
  Node 909ed414-22e6-477b-b2ca-d1934cd00001:
    re-encryption rate has been limited to 0.01 MB/s.

...

Reference Links

[1] PCI DSS Requirements and Security Assessment Procedures
https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
[2] How Often Do I Need to Rotate Encryption Keys on My SQL Server?
https://info.townsendsecurity.com/bid/49019/How-Often-Do-I-Need-to-Rotate-Encryption-Keys-on-My-SQL-Server
[3] PCI DSS and key rotations simplified
https://www.crypteron.com/blog/pci-dss-key-rotations-simplified/
[4] Transparent Data Encryption in MS SQL Server
https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-ver15
[5] Oracle Transparent Data Encryption FAQ
https://www.oracle.com/database/technologies/faq-tde.html
[6] InnoDB Data-at-Rest Encryption
https://dev.mysql.com/doc/refman/8.0/en/innodb-data-encryption.html
[7] Transparent data encryption feature proposed in pgsql-hackers
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Key_Rotation

...

Jira (ASF JIRA): IGNITE-12843; related issues: project = Ignite AND labels in (tde-phase-3) ORDER BY status
