Motivation

Cache encryption key rotation is required when a key is compromised or at the end of its cryptoperiod (key validity period). In addition, this feature is a prerequisite for supporting encryption and decryption of existing caches in the future.

Security requirement

The Payment Card Industry Data Security Standard (PCI DSS) requires that key-management procedures include a defined cryptoperiod for each key type in use and define a process for key changes at the end of the defined cryptoperiod(s). An expired key should not be used to encrypt new data, but it can still be used for archived data; such keys should be strongly protected (sections 3.5 - 3.6) [1].
The maximum recommended key lifetime is 2 years [2], and on average a key is expected to be changed every few months [3].

Key rotation in other systems

MS SQL Server provides rotation of the database encryption key with background re-encryption of existing data [4]. Oracle and MySQL do not provide an automatic procedure for rotating tablespace keys out of the box, but master key rotation is supported [5][6]. TDE is currently being developed for PostgreSQL, but support for tablespace key rotation is not planned [7].

Description

The overall process consists of the following steps:

  • Rotate the cache group key: add a new encryption key on each node and set it for writing.
  • Schedule background re-encryption of archived data and clean up the old key when it completes.

Process description

To support multiple keys for reading encrypted data, the key identifier must be stored on each encrypted page and in each encrypted WAL record. The key identifier is a sequential counter and must be the same on all nodes.
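As an illustration of reading with multiple keys, here is a minimal sketch of a per-group key ring that maps key identifiers to keys. The class `GroupKeyRing` and its methods are hypothetical names chosen for this example; they are not part of Ignite, and the real storage of keys may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-cache-group key ring: key identifier -> key bytes. */
class GroupKeyRing {
    private final Map<Integer, byte[]> keys = new ConcurrentHashMap<>();
    private volatile int activeId;

    GroupKeyRing(int initialId, byte[] initialKey) {
        keys.put(initialId, initialKey.clone());
        activeId = initialId;
    }

    /** Adds a key without making it active; fails if the identifier is already used. */
    void add(int id, byte[] key) {
        if (keys.putIfAbsent(id, key.clone()) != null)
            throw new IllegalStateException("Key identifier already exists: " + id);
    }

    /** Switches the key used for writing new pages and WAL records. */
    void setActive(int id) {
        if (!keys.containsKey(id))
            throw new IllegalArgumentException("Unknown key identifier: " + id);
        activeId = id;
    }

    /** Identifier that would be stored with newly written data. */
    int activeId() {
        return activeId;
    }

    /** Looks up the key for reading by the identifier stored with the page or WAL record. */
    byte[] keyFor(int id) {
        byte[] key = keys.get(id);
        if (key == null)
            throw new IllegalArgumentException("Unknown key identifier: " + id);
        return key;
    }
}
```

Data written before a rotation stays readable because its stored identifier still resolves to the old key, while new writes use the active identifier.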

  1. Check that all baseline nodes are online.
  2. Start the distributed process CACHE_GROUP_KEY_CHANGE_PREPARE, in which each node
    1. verifies that re-encryption is not in progress
    2. ensures that the new key identifier does not exist
    3. adds the new key
  3. After successful completion of PREPARE, start the distributed process CACHE_GROUP_KEY_CHANGE_FINISH, in which each node
    1. saves a logical WAL record (ENCRYPTION_STATUS_RECORD) with the current page count in partitions
    2. stores the current page count as the total number of pages for background re-encryption of partitions
    3. adds the mapping "WAL segment -> *old* key identifier" (to safely clean up this key in the future)
    4. sets the new key for writing
    5. starts background re-encryption
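The two phases above can be sketched as follows. This is a simplified single-JVM model under stated assumptions: `KeyChangeNode` and `changeKey` are hypothetical names, the distributed processes are replaced by plain loops, and the WAL record and segment mapping steps are omitted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-node state of one cache group during a key change. */
class KeyChangeNode {
    private final Map<Integer, byte[]> keys = new HashMap<>();
    private int activeKeyId;
    private boolean reencryptionInProgress;

    KeyChangeNode(int keyId, byte[] key) {
        keys.put(keyId, key);
        activeKeyId = keyId;
    }

    /** PREPARE: validate the request and add the new key without activating it. */
    void prepare(int newKeyId, byte[] newKey) {
        if (reencryptionInProgress)
            throw new IllegalStateException("Re-encryption is in progress");
        if (keys.containsKey(newKeyId))
            throw new IllegalStateException("Key identifier already exists: " + newKeyId);
        keys.put(newKeyId, newKey);
    }

    /** FINISH: set the new key for writing; returns the old key identifier. */
    int finish(int newKeyId) {
        if (!keys.containsKey(newKeyId))
            throw new IllegalStateException("Key was not prepared: " + newKeyId);
        int oldKeyId = activeKeyId;
        activeKeyId = newKeyId;
        reencryptionInProgress = true; // Background re-encryption would be scheduled here.
        return oldKeyId;
    }

    int activeKeyId() {
        return activeKeyId;
    }

    /** Runs both phases; FINISH starts only after PREPARE succeeded on every node. */
    static void changeKey(List<KeyChangeNode> nodes, int newKeyId, byte[] newKey) {
        for (KeyChangeNode node : nodes)
            node.prepare(newKeyId, newKey);
        for (KeyChangeNode node : nodes)
            node.finish(newKeyId);
    }
}
```

Splitting the change into two phases means no node writes with the new key until every node is able to read it.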

Background re-encryption

The process applies to all existing partitions, including the index partition.

Scan all pages in the specified range (metaPageId + [offset -> total]):

  1. acquire the page
    1. if the checkpoint is finished (after the key change) and the page is dirty, skip this page
    2. if the checkpoint is not finished or the page is not dirty
      1. lock the page
      2. unlock the page (dirty=true)
  2. release the page

Re-encryption progress is stored in the metapage (int offset, int total) and is updated during checkpoint.

The process is aborted only when the partition is destroyed.
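The scan logic described in this section can be modeled with a small in-memory sketch. The `Page` class and the `scan` method are hypothetical stand-ins; they do not use Ignite's page memory API, and marking the flag stands in for the lock/unlock pair.

```java
/** Hypothetical in-memory model of the partition scan; not Ignite's page memory API. */
class ReencryptionScan {
    /** Minimal stand-in for a page: only the dirty flag matters here. */
    static class Page {
        boolean dirty;
    }

    /**
     * Scans pages in [offset, total) and returns the number of pages
     * that the scan itself marked dirty.
     */
    static int scan(Page[] pages, int offset, int total, boolean checkpointFinished) {
        int touched = 0;

        for (int i = offset; i < total; i++) {
            Page page = pages[i];

            // A page dirtied after the post-key-change checkpoint is skipped, as in step 1.1.
            if (checkpointFinished && page.dirty)
                continue;

            // Stands in for "lock page; unlock page (dirty=true)".
            page.dirty = true;
            touched++;
        }

        return touched;
    }
}
```

Marking a page dirty is enough to re-encrypt it in this model: the next checkpoint writes the page to disk, and writes use the currently active key.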

Cleanup old key

The old group key will be removed when

  1. re-encryption is completed for the cache group (and at least one checkpoint has completed after that)
  2. the last WAL segment in which the key was used is removed
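The removal conditions can be expressed as a single predicate. This sketch assumes that both conditions must hold and that WAL segments are identified by increasing indexes; `OldKeyCleanup` and its parameter names are hypothetical.

```java
/** Hypothetical check for whether an old group key can be removed. */
class OldKeyCleanup {
    /**
     * @param reencryptionCompleted Re-encryption of the cache group has completed.
     * @param checkpointAfterCompletion At least one checkpoint completed after that.
     * @param lastSegmentWithOldKey Index of the last WAL segment written with the old key.
     * @param oldestRetainedSegment Index of the oldest WAL segment still present.
     */
    static boolean canRemove(
        boolean reencryptionCompleted,
        boolean checkpointAfterCompletion,
        long lastSegmentWithOldKey,
        long oldestRetainedSegment
    ) {
        // All segments up to and including lastSegmentWithOldKey must be gone.
        boolean walNoLongerNeedsKey = oldestRetainedSegment > lastSegmentWithOldKey;

        return reencryptionCompleted && checkpointAfterCompletion && walNoLongerNeedsKey;
    }
}
```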

Fault tolerance

If CACHE_GROUP_KEY_CHANGE_PREPARE does not complete successfully on all nodes, the process is interrupted and must be restarted.
When the process restarts, a new key identifier is generated (an unused key will be overwritten).

A node may fail after adding the new key but before setting it for writing (as the active key).
Such a node does not know whether the PREPARE phase was successful, and therefore does not know which key is currently used for writing.
By default, it tries to rejoin with the old key. If the join is rejected, it should be possible to set the correct key identifier manually using a system property or command-line tools.

When a non-baseline node joins the cluster (with a baseline change), it cleans up all existing data, so this case should not be a problem.

If a node stops or fails during re-encryption, it must continue re-encryption from the stored offset after restarting:

  1. If the checkpoint failed, physical records are restored from the WAL, as usual.
  2. If no checkpoint was performed, re-encryption starts from the beginning using the saved logical WAL record.
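A minimal sketch of how the resume range might be chosen after a restart, assuming the checkpointed offset is absent when no checkpoint has stored progress yet. `ReencryptionResume` and `resumeRange` are hypothetical names used only for illustration.

```java
import java.util.OptionalInt;

/** Hypothetical helper that picks the page range to scan after a node restart. */
class ReencryptionResume {
    /**
     * @param checkpointedOffset Offset stored in the metapage by a checkpoint, if any.
     * @param total Total page count recorded at key change time.
     * @return Range {from, to} of page indexes still to be scanned, "to" exclusive.
     */
    static int[] resumeRange(OptionalInt checkpointedOffset, int total) {
        // Without a checkpointed offset, the scan starts from the beginning.
        int offset = checkpointedOffset.orElse(0);

        if (offset < 0 || offset > total)
            throw new IllegalArgumentException("Invalid offset: " + offset);

        return new int[] {offset, total};
    }
}
```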

Risks and assumptions

  • Background re-encryption may affect performance. The performance impact can be managed using the following properties:
    1. IGNITE_REENCRYPTION_THREAD_POOL_SIZE - number of threads used for re-encryption.
    2. IGNITE_REENCRYPTION_BATCH_SIZE - number of pages scanned during re-encryption under the checkpoint lock.
    3. IGNITE_REENCRYPTION_THROTTLE - delay in milliseconds between batches during a partition scan.
  • The WAL history may not be large enough to store all entries between checkpoints (this should be addressed by properly setting the size of the WAL history and tuning the re-encryption performance).
  • The WAL history (for delta rebalancing) may be lost for all cache groups due to background re-encryption.
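The tuning properties listed above could be read and applied roughly as follows. This sketch assumes they are JVM system properties; the default values and the `ReencryptionTuning` class are hypothetical placeholders, not Ignite's actual defaults or code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the re-encryption tuning properties. */
class ReencryptionTuning {
    // The defaults below are placeholders for illustration only.
    static final int DFLT_POOL_SIZE = 2;
    static final int DFLT_BATCH_SIZE = 100;
    static final long DFLT_THROTTLE_MS = 0L;

    static int threadPoolSize() {
        return Integer.getInteger("IGNITE_REENCRYPTION_THREAD_POOL_SIZE", DFLT_POOL_SIZE);
    }

    static int batchSize() {
        return Integer.getInteger("IGNITE_REENCRYPTION_BATCH_SIZE", DFLT_BATCH_SIZE);
    }

    static long throttleMs() {
        return Long.getLong("IGNITE_REENCRYPTION_THROTTLE", DFLT_THROTTLE_MS);
    }

    /** Splits [offset, total) into batches of at most batchSize pages; each entry is {from, to}. */
    static List<int[]> batches(int offset, int total, int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("Batch size must be positive: " + batchSize);

        List<int[]> res = new ArrayList<>();

        for (int from = offset; from < total; from += batchSize)
            res.add(new int[] {from, Math.min(from + batchSize, total)});

        return res;
    }
}
```

A scanning thread would process one batch under the checkpoint lock, release the lock, sleep for `throttleMs()` milliseconds, and then continue with the next batch. Smaller batches and a longer delay reduce the impact on regular load at the cost of a longer re-encryption.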

Process management

// TBD

Public API changes

IgniteEncryption

A new method will be introduced:

public IgniteFuture<Void> changeCacheGroupKey(Collection<String> cacheOrGroupNames)

Metrics

Re-encryption process state in CacheGroupMetrics:

  • ReencryptionPagesLeft - (long) Number of pages left for re-encryption.
  • ReencryptionFinished - (boolean) Indicates whether re-encryption is finished (it is set to true only when the checkpoint is finished).

Reference Links

  1. PCI DSS Requirements and Security Assessment Procedures
    https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
  2. How Often Do I Need to Rotate Encryption Keys on My SQL Server?
    https://info.townsendsecurity.com/bid/49019/How-Often-Do-I-Need-to-Rotate-Encryption-Keys-on-My-SQL-Server
  3. PCI DSS and key rotations simplified
    https://www.crypteron.com/blog/pci-dss-key-rotations-simplified/
  4. Transparent Data Encryption in MS SQL Server
    https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/transparent-data-encryption?view=sql-server-ver15
  5. Oracle Transparent Data Encryption FAQ
    https://www.oracle.com/database/technologies/faq-tde.html
  6. InnoDB Data-at-Rest Encryption
    https://dev.mysql.com/doc/refman/8.0/en/innodb-data-encryption.html
  7. Transparent data encryption feature proposed in pgsql-hackers.
    https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Key_Rotation
