Motivation

Master key rotation is required if the master key has been compromised or at the end of the crypto period (key validity period).

The design assumes that an administrator makes the new master key available to EncryptionSPI from the underlying key storage.

Definitions

  • MK – Master Key. Encrypts group keys. The master key is stored in some key storage. Master keys are identified by name.

Prerequisites

The new master key must be available to EncryptionSPI on each server node. The cluster must be active.

Process management

Users can control the master key rotation process via a user interface (CLI, JMX, Java API).

  • Java API:
    ignite.encryption().changeMasterKey(String masterKeyName) - starts the master key rotation process.
    String ignite.encryption().getMasterKeyName() - gets the current master key name.

  • JMX:
    changeMasterKey(String masterKeyName) - starts the master key rotation process.
    String getMasterKeyName() - gets the current master key name.

  • CLI:
    # Starts master key rotation.
    control.sh --encryption change_master_key newMasterKeyName

    # Displays cluster's current master key name.
    control.sh --encryption get_master_key_name

Process description

The master key change process consists of two phases:

  1. Prepare master key change.
  2. Perform master key change.

Each phase is a distributed process.

Prepare master key change

The goal is to verify that all server nodes have the same master key. The initiating server node begins the prepare phase with a MasterKeyChangeRequest that contains:

  1. New master key name.
  2. New master key digest.

Each server node executes the following actions:

  1. Obtains the digest of the new master key. If the digest is unavailable, the process completes with an error.
  2. Compares it with the digest in the request.
  3. If the digests differ, the process completes with a master key digest consistency error.
  4. Stores the master key name and digest locally.

The coordinator starts the perform phase when the prepare phase is completed without errors.
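
The per-node prepare checks listed above can be sketched as follows. This is a minimal illustration, not Ignite code: the class PreparePhaseSketch, its in-memory key storage, and the choice of SHA-256 as the digest are all assumptions made for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the per-node prepare-phase check; not Ignite code. */
final class PreparePhaseSketch {
    /** Stand-in for the key storage behind EncryptionSPI: key name -> raw key bytes. */
    private final Map<String, byte[]> keyStorage = new HashMap<>();

    /** Name and digest remembered for the perform phase (step 4). */
    String preparedName;
    byte[] preparedDigest;

    void addKey(String name, byte[] key) {
        keyStorage.put(name, key.clone());
    }

    /** SHA-256 is an assumption; the design does not name the digest algorithm. */
    static byte[] digest(byte[] key) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns null on success, or an error message to report as the single-node result. */
    String prepare(String reqName, byte[] reqDigest) {
        byte[] key = keyStorage.get(reqName);
        if (key == null)
            return "Master key digest is unavailable: " + reqName;      // step 1

        byte[] local = digest(key);
        if (!MessageDigest.isEqual(local, reqDigest))                   // steps 2-3
            return "Master key digest differs from the request: " + reqName;

        preparedName = reqName;                                         // step 4
        preparedDigest = local;
        return null;
    }
}
```

Returning an error string instead of throwing mirrors the idea that each node reports its own result and the coordinator decides whether the perform phase may start.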

Perform master key change

The coordinator node starts the perform phase with the MasterKeyChangeRequest that contains:

  1. New master key name.
  2. New master key digest.

Each server node executes the following actions:

  1. Checks that the cluster is active (the WAL must be writable to correctly log changes and survive cluster restarts). Otherwise, the process completes with an error.
  2. Checks that the master key name and digest are the same as those stored during the prepare phase. Otherwise, logs the mismatch and cancels the process.
  3. Blocks creation of encrypted group keys.
  4. Re-encrypts all cache group keys with the new master key in a temporary data structure. No changes are made to MetaStore.
  5. Creates a WAL logical record (ChangeMasterKeyRecord) that consists of:
    1. New master key name.
    2. Re-encrypted cache group keys.
  6. Writes cache group keys to MetaStore.
  7. Unblocks creation of encrypted group keys.
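
The ordering of steps 3 to 7 can be illustrated with a small single-threaded sketch. Everything here is hypothetical: PerformPhaseSketch, the map standing in for MetaStore, the list standing in for the WAL, and the reencrypt callback (assumed to decrypt with the old MK and encrypt with the new one). Steps 1 and 2 are omitted, and a real implementation would need proper locking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the per-node perform-phase ordering; not Ignite code. */
final class PerformPhaseSketch {
    /** Stand-in for MetaStore: cache group id -> group key encrypted with the current MK. */
    final Map<Integer, byte[]> metaStore = new HashMap<>();

    /** Stand-in for the WAL: each entry describes one logged record. */
    final List<String> wal = new ArrayList<>();

    String masterKeyName;
    private boolean groupKeyCreationBlocked;

    PerformPhaseSketch(String masterKeyName) {
        this.masterKeyName = masterKeyName;
    }

    /** Creating a group key is refused while rotation is in progress (step 3). */
    void createGroupKey(int grpId, byte[] encryptedKey) {
        if (groupKeyCreationBlocked)
            throw new IllegalStateException("Master key change is in progress");
        metaStore.put(grpId, encryptedKey.clone());
    }

    /** reencrypt is assumed to decrypt with the old MK and encrypt with the new one. */
    void perform(String newName, UnaryOperator<byte[]> reencrypt) {
        groupKeyCreationBlocked = true;                               // step 3
        try {
            Map<Integer, byte[]> tmp = new HashMap<>();               // step 4: MetaStore untouched
            for (Map.Entry<Integer, byte[]> e : metaStore.entrySet())
                tmp.put(e.getKey(), reencrypt.apply(e.getValue()));

            // Step 5: log the new name and keys before touching MetaStore.
            wal.add("ChangeMasterKeyRecord[name=" + newName + ", keys=" + tmp.size() + "]");

            metaStore.putAll(tmp);                                    // step 6
            masterKeyName = newName;
        } finally {
            groupKeyCreationBlocked = false;                          // step 7
        }
    }
}
```

Because the re-encrypted keys are collected in a temporary map first, a failure during re-encryption leaves the MetaStore stand-in unchanged.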

Distributed process

A distributed process is a cluster-wide process that accumulates single-node results in order to finish.

The process consists of the following phases:

  1. The initial request starts the process. The InitMessage is sent via discovery.
  2. Each server node processes the initial request and sends its single-node result to the coordinator. The SingleNodeMessage is sent via communication.
  3. The coordinator accumulates all single-node results and completes the process. The FullMessage is sent via discovery.

Several processes of the same type can be started at the same time.

Guarantees:

  • Survives topology and coordinator changes (the SingleNodeMessage with a result is redirected to the new coordinator).
  • The exec and the finish actions are called only once.
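
The coordinator side of such a process can be sketched as below. This is an illustrative model only: DistributedProcessSketch and its methods are hypothetical, the finish callback stands in for sending the FullMessage, and dropping a departed node from the waiting set is an assumption about how a topology change might be handled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical coordinator-side sketch of result accumulation; not Ignite code. */
final class DistributedProcessSketch {
    private final Set<String> remaining;
    private final Map<String, String> errors = new HashMap<>();
    private final Consumer<Map<String, String>> finish;
    private boolean finished;

    /** nodeIds: server nodes expected to answer; finish: called once with nodeId -> error. */
    DistributedProcessSketch(Set<String> nodeIds, Consumer<Map<String, String>> finish) {
        this.remaining = new HashSet<>(nodeIds);
        this.finish = finish;
    }

    /** Handles one SingleNodeMessage; error is null when the node succeeded. */
    synchronized void onSingleNodeResult(String nodeId, String error) {
        if (finished || !remaining.remove(nodeId))
            return; // duplicate or late result is ignored

        if (error != null)
            errors.put(nodeId, error);

        completeIfReady();
    }

    /** Assumption: a node that left the topology is no longer waited for. */
    synchronized void onNodeLeft(String nodeId) {
        if (!finished && remaining.remove(nodeId))
            completeIfReady();
    }

    private void completeIfReady() {
        if (remaining.isEmpty()) {
            finished = true;
            finish.accept(new HashMap<>(errors)); // stands in for sending the FullMessage
        }
    }
}
```

The finished flag is what keeps the finish action from running more than once, even if duplicate or late results arrive.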

Process completion

The process completes when the perform phase has completed (all nodes have re-encrypted their keys).

Corner cases

Node was down during key rotation. MasterKeyChangeRecord not found.

If a node was unavailable during the master key rotation process, it will be unable to join the cluster because it still has the old master key.

To update this node, the user should run Ignite with the system property IGNITE_MASTER_KEY_NAME_TO_CHANGE_BEFORE_STARTUP=newMasterKeyName.

The node will re-encrypt cache keys with the new MK and try to join the cluster.
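
A node's startup decision could look roughly like the sketch below. Only the property name comes from this document; the class StartupKeyChangeSketch, the method keyNameToChange, and the rule that an empty value or a value equal to the current name means "nothing to do" are assumptions for illustration.

```java
/** Hypothetical sketch of the startup decision; not Ignite code. */
final class StartupKeyChangeSketch {
    static final String PROP = "IGNITE_MASTER_KEY_NAME_TO_CHANGE_BEFORE_STARTUP";

    /**
     * Returns the MK name to re-encrypt cache keys with before joining,
     * or null if no change is requested (assumed rule, see lead-in).
     */
    static String keyNameToChange(String currentName) {
        String requested = System.getProperty(PROP);

        if (requested == null || requested.isEmpty() || requested.equals(currentName))
            return null;

        return requested;
    }
}
```

On the command line the property would typically be passed as a JVM option, for example -DIGNITE_MASTER_KEY_NAME_TO_CHANGE_BEFORE_STARTUP=newMasterKeyName.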

Node was down during key rotation. MasterKeyChangeRecord found.

A node should not try to join the cluster before it has processed the ChangeMasterKeyRecord. Regardless of whether the key rotation finished successfully or not, recovery is performed from the record.

  1. If a ChangeMasterKeyRecord is found during node recovery, it is passed to EncryptionManager.
  2. When MetaStore becomes writable, EncryptionManager writes the new cache group keys to it.
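
The two recovery steps above amount to holding the recovered record until MetaStore can be written. A minimal sketch, assuming a hypothetical RecoverySketch class with a map standing in for MetaStore and assuming that a later record replaces an earlier one:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of applying a recovered record; not Ignite code. */
final class RecoverySketch {
    /** Stand-in for MetaStore: cache group id -> encrypted group key. */
    final Map<Integer, byte[]> metaStore = new HashMap<>();

    String masterKeyName;

    private String pendingName;
    private Map<Integer, byte[]> pendingKeys;

    /** Step 1: called during WAL replay; a later record replaces an earlier one (assumption). */
    void onRecordFound(String newName, Map<Integer, byte[]> reencryptedKeys) {
        pendingName = newName;
        pendingKeys = new HashMap<>(reencryptedKeys);
    }

    /** Step 2: called once MetaStore becomes writable. */
    void onMetaStoreWritable() {
        if (pendingKeys == null)
            return; // no record was found during recovery

        metaStore.putAll(pendingKeys);
        masterKeyName = pendingName;

        pendingName = null;
        pendingKeys = null;
    }
}
```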

Node join during key rotation process

The node join is rejected, because it may lead to inconsistent master keys in the cluster.

Starting cache during key rotation process

Cache keys must not be created during the master key rotation process, so a node throws an exception if a user starts a cache during key rotation. Moreover, if group keys were generated before the master key was changed, starting the cache is rejected (the case where a client node starts the cache).

Node couldn’t complete the perform phase

The node treats this as a critical failure. The failure handler must stop the node to prevent inconsistent keys in the cluster.

Public Java API changes

EncryptionSpi

The concept of the masterKeyId will be added to the cache key encryption process in EncryptionSpi.

New methods will be introduced:

  • setMasterKeyName(String masterKeyName)  // Sets "current" master key name
  • String getMasterKeyName()  // Gets "current" master key name

The following methods will work with the master key that was set by setMasterKeyName:

  • byte[] masterKeyDigest() 
  • byte[] encryptKey(Serializable key) 
  • Serializable decryptKey(byte[] key) 

This is necessary so that Ignite can decrypt cache keys with the old master key and encrypt with the new one.
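
To make the name-switching idea concrete, here is a small in-memory sketch. It is not the real EncryptionSpi: the class NamedMasterKeySpiSketch, the addMasterKey and reencrypt helpers, the use of SHA-256 for the digest, and the use of the JDK's AESWrap cipher are all assumptions, and the key type is narrowed from Serializable to SecretKey for brevity.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical in-memory SPI-like class; not the real EncryptionSpi. */
final class NamedMasterKeySpiSketch {
    /** Stand-in for the key storage: master key name -> AES key. */
    private final Map<String, SecretKey> keyStorage = new HashMap<>();
    private String masterKeyName;

    void addMasterKey(String name, byte[] raw) {
        keyStorage.put(name, new SecretKeySpec(raw, "AES"));
    }

    /** Sets the "current" master key name. */
    void setMasterKeyName(String name) {
        if (!keyStorage.containsKey(name))
            throw new IllegalArgumentException("Unknown master key: " + name);
        masterKeyName = name;
    }

    /** Gets the "current" master key name. */
    String getMasterKeyName() {
        return masterKeyName;
    }

    byte[] masterKeyDigest() {
        try {
            return MessageDigest.getInstance("SHA-256").digest(current().getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts (wraps) a group key with the current master key. */
    byte[] encryptKey(SecretKey groupKey) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, current());
            return c.wrap(groupKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts (unwraps) a group key with the current master key. */
    SecretKey decryptKey(byte[] wrapped) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, current());
            return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts with the old MK, then encrypts with the new one. */
    byte[] reencrypt(byte[] wrapped, String oldName, String newName) {
        setMasterKeyName(oldName);
        SecretKey groupKey = decryptKey(wrapped);
        setMasterKeyName(newName);
        return encryptKey(groupKey);
    }

    private SecretKey current() {
        SecretKey k = keyStorage.get(masterKeyName);
        if (k == null)
            throw new IllegalStateException("Master key name is not set");
        return k;
    }
}
```

After reencrypt returns, the new name remains the current one, so subsequent encryptKey and decryptKey calls use the new master key.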

Code changes

Meta Storage

Meta storage will store the name of the master key. The key name from meta storage takes priority over the key name from EncryptionSpi.

Node attribute

Currently, the joining node sends the MK hash for validation in node attributes. Attributes can't be modified at runtime, so the joining node will send the MK hash in JoiningNodeDiscoveryData.
