...


Motivation

As the volume of stored information grows, stored data becomes ever more important to individuals, companies and governments. In addition to traditional storage criteria such as performance, capacity and reliability, security is becoming an important feature of storage systems, and some industries, such as banking and transportation, are subject to mandatory legal requirements for it.

Using encrypted data has the following benefits:

  1. Confidentiality: Protection against unauthorized information leakage.
  2. Integrity: Protection against unauthorized modification of data.
  3. Authenticity: Protection against unauthorized access to data.

Other data lake frameworks (such as Hudi and Iceberg) already support data encryption:

  1. Hudi: Hudi already supports writing encrypted Parquet files using Spark. [1]
  2. Iceberg: Iceberg provides basic encryption interfaces so that users can add custom implementations:
     • EncryptionManager: Module for encrypting and decrypting table data files. [2]
     • KeyManagementClient: A minimum client interface to connect to a key management service (KMS). [3]

Hudi and Iceberg also have some shortcomings. Hudi relies on Spark and Parquet and cannot encrypt data in ORC format; Iceberg only provides basic interfaces without corresponding implementation classes.

As a data lake framework, Paimon needs to support data encryption in order to meet enterprise security standards. This document describes how to extend the current Paimon architecture to provide users with out-of-the-box encryption capabilities.


Goals

Pluggable KMS

The key management service (KMS) is pluggable in the system.

Pluggable Encryption Mechanism

The encryption mechanism is pluggable in the system. The system provides envelope encryption and plaintext mechanisms by default, and users can extend it based on their needs.
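To make the envelope encryption idea concrete, here is a minimal sketch using only the JDK's `javax.crypto` classes. It is not the proposal's implementation: `SketchKms`, `InMemoryKms`, `EncryptedFile`, `encryptFile` and `decryptFile` are hypothetical names introduced for illustration, and the in-memory master key stands in for a real KMS. It assumes that each file gets its own data encryption key (DEK), that the file content is encrypted locally with the DEK, and that only the small DEK is sent to the KMS to be wrapped.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

/** Sketch only: every type and method name here is hypothetical, not Paimon API. */
public class EnvelopeEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LEN = 12;    // commonly recommended GCM IV length in bytes
    private static final int TAG_BITS = 128; // GCM authentication tag length in bits

    /** Hypothetical stand-in for a pluggable KMS client. */
    interface SketchKms {
        byte[] wrapKey(byte[] plaintextDek);
        byte[] unwrapKey(byte[] wrappedDek);
    }

    /** Toy KMS holding the master key in memory; a real KMS keeps it remote. */
    static final class InMemoryKms implements SketchKms {
        private final SecretKey masterKey = newAesKey();
        @Override public byte[] wrapKey(byte[] plaintextDek) { return gcmEncrypt(masterKey, plaintextDek); }
        @Override public byte[] unwrapKey(byte[] wrappedDek) { return gcmDecrypt(masterKey, wrappedDek); }
    }

    /** Ciphertext of one file plus the wrapped DEK that would be stored with it. */
    static final class EncryptedFile {
        final byte[] wrappedDek;
        final byte[] ciphertext;
        EncryptedFile(byte[] wrappedDek, byte[] ciphertext) {
            this.wrappedDek = wrappedDek;
            this.ciphertext = ciphertext;
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a fresh random IV followed by the ciphertext and tag. */
    static byte[] gcmEncrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] out = Arrays.copyOf(iv, IV_LEN + body.length);
            System.arraycopy(body, 0, out, IV_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses gcmEncrypt: reads the IV prefix, then decrypts and verifies the rest. */
    static byte[] gcmDecrypt(SecretKey key, byte[] ivAndBody) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, ivAndBody, 0, IV_LEN));
            return cipher.doFinal(ivAndBody, IV_LEN, ivAndBody.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static EncryptedFile encryptFile(SketchKms kms, byte[] content) {
        SecretKey dek = newAesKey();                        // unique DEK for this file
        byte[] ciphertext = gcmEncrypt(dek, content);       // bulk encryption stays local
        byte[] wrappedDek = kms.wrapKey(dek.getEncoded());  // single KMS call per file
        return new EncryptedFile(wrappedDek, ciphertext);
    }

    static byte[] decryptFile(SketchKms kms, EncryptedFile file) {
        SecretKey dek = new SecretKeySpec(kms.unwrapKey(file.wrappedDek), "AES");
        return gcmDecrypt(dek, file.ciphertext);
    }

    public static void main(String[] args) {
        SketchKms kms = new InMemoryKms();
        EncryptedFile file = encryptFile(kms, "row data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decryptFile(kms, file), StandardCharsets.UTF_8));
    }
}
```

In this sketch the KMS only ever sees key material, never file content, which is one reason envelope encryption keeps KMS traffic small.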

High Performance

Minimize interaction between Paimon and the KMS as much as possible to improve throughput.

Column level encryption

Use the native encryption features of Parquet and ORC.

Public Interfaces


KmsClient

...

3. Symmetric-key algorithms
    Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext.[4]

4.  Data Encryption Key (DEK)
    A key used to encrypt a data file. In this design, every data file has a unique DEK.

...

6.  Encryption algorithm
    *   AES GCM
        AES GCM is an authenticated encryption mode. Besides data confidentiality (encryption), it supports two levels of integrity verification (authentication): of the data (default), and of the data combined with an optional AAD (“additional authenticated data”). Authentication makes it possible to verify that the data has not been tampered with.
    *   AES CTR
        AES CTR is a regular (not authenticated) cipher. It is faster than the GCM cipher, since it doesn’t perform integrity verification.
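
The difference between the two modes can be illustrated with the JDK's standard `javax.crypto` API. The following is a sketch only; the class and method names (`GcmVsCtrSketch`, `gcmRejects`, `ctrDecryptTampered`) are hypothetical and not part of this proposal. It assumes a 128-bit key, a 12-byte IV for GCM and a 16-byte IV for CTR. A flipped ciphertext bit or a mismatched AAD makes GCM decryption fail, while CTR decrypts the tampered data without any error.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Sketch only: names are hypothetical and chosen for illustration. */
public class GcmVsCtrSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** AES-GCM in the given mode; the AAD is authenticated but not encrypted. */
    static byte[] gcm(int mode, SecretKey key, byte[] iv, byte[] aad, byte[] input)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        if (aad != null) {
            cipher.updateAAD(aad);
        }
        return cipher.doFinal(input);
    }

    /** AES-CTR in the given mode; no authentication tag is produced or checked. */
    static byte[] ctr(int mode, SecretKey key, byte[] iv, byte[] input)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, key, new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }

    /** Encrypts with encAad, optionally flips one bit, and reports whether decryption with decAad is rejected. */
    static boolean gcmRejects(byte[] plaintext, byte[] encAad, byte[] decAad, boolean flipBit) {
        try {
            SecretKey key = newKey();
            byte[] iv = randomBytes(12);
            byte[] ciphertext = gcm(Cipher.ENCRYPT_MODE, key, iv, encAad, plaintext);
            if (flipBit) {
                ciphertext[0] ^= 1;
            }
            try {
                gcm(Cipher.DECRYPT_MODE, key, iv, decAad, ciphertext);
                return false;
            } catch (AEADBadTagException e) {
                return true; // tag verification failed
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts with CTR, flips one ciphertext bit, and returns what decryption silently yields. */
    static byte[] ctrDecryptTampered(byte[] plaintext) {
        try {
            SecretKey key = newKey();
            byte[] iv = randomBytes(16);
            byte[] ciphertext = ctr(Cipher.ENCRYPT_MODE, key, iv, plaintext);
            ciphertext[0] ^= 1;
            return ctr(Cipher.DECRYPT_MODE, key, iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "paimon".getBytes(StandardCharsets.UTF_8);
        byte[] aad = "file-1".getBytes(StandardCharsets.UTF_8);
        System.out.println("GCM rejects flipped bit: " + gcmRejects(data, aad, aad, true));
        System.out.println("CTR output after flipped bit: "
                + new String(ctrDecryptTampered(data), StandardCharsets.UTF_8));
    }
}
```

One possible use of the AAD, shown here only as an assumption, is to bind a ciphertext to an identifier such as a file name so that encrypted content cannot be swapped between files unnoticed.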

Design

Encryption level

In the first version, we will simplify the design: only data files in Parquet and ORC formats are encrypted, metadata files and Avro data files are not encrypted, and a single key is used to encrypt the entire data file. In the future, we will continue to optimize and add new features, such as encrypting Avro data files, encrypting metadata, using multiple keys to encrypt the footer and columns of Parquet files separately, or using different algorithms for different columns.

Encryption Manager

Provides a basic module for encrypting and decrypting table data files.

...

Provide test cases in the paimon-micro-benchmarks module to verify read and write performance after encryption.

Rejected Alternatives



[1] https://hudi.apache.org/docs/encryption/
[2] https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/api/src/main/java/org/apache/iceberg/encryption/EncryptionManager.java#L32
[3] https://github.com/apache/iceberg/blob/1e57760394583889f2cb7fb87d021471e8c46f0c/core/src/main/java/org/apache/iceberg/encryption/KeyManagementClient.java#L27
[4] https://en.wikipedia.org/wiki/Symmetric-key_algorithm
