...

Current state: Under Discussion

Discussion thread: here

Voting thread: here

JIRA: KAFKA-15444

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

...

Native binaries are self-contained executables that do not need a JVM to run, so only a small set of system libraries is required. Consequently, choosing a minimal base image lets us produce compact Docker images.
We propose to use the Alpine image as the base image.


While Alpine images offer a lightweight solution that keeps the Docker image small, there are certain considerations to bear in mind:

  • Alpine uses musl libc, whereas the native executable requires glibc. To address this, we will need to install gcompat, a glibc compatibility layer for musl.
  • Alpine ships BusyBox's ash shell instead of bash, so bash must be installed to run our helper scripts.
  • Alpine uses the apk package manager, which is less widely used than other package managers and may pose challenges in the future: some libraries we might need could lack apk packages.

Alpine vs Ubuntu Docker Base Image

The next best option I explored is the Ubuntu Docker image (https://hub.docker.com/_/ubuntu/tags), which is a more complete image.

  • Size: The Ubuntu image is 70MB, compared to 15MB for the Alpine image (after installing glibc compatibility and bash), a difference of 55MB.
  • Performance: I ran the produce/consume performance scripts against the Kafka native Docker image built on both Alpine and Ubuntu, and the results showed comparable performance between the two.

Image Naming

Image naming should:

  1. Transparently communicate the packaged Kafka version.

  2. Maintain the above point in the event of CVEs/bugs requiring a dedicated Docker release.

Adhering to the outlined constraints, image tagging can follow this format:
<image-name>:<kafka-version>

  • kafka-native:3.7.0

    • Name of the image: kafka-native
      For example, for Kafka version 3.7.0, the full image name with tag would be apache/kafka-native:3.7.0.
    • native indicates that the image contains the native binary.

NOTE: The JVM-based Apache Kafka Docker image will be named apache/kafka:<version>.
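The naming convention can be captured in a small helper, for instance inside release tooling. This is a hypothetical sketch. The function name `image_reference` and the strict `MAJOR.MINOR.PATCH` version check are assumptions for illustration. They are not part of the proposed scripts.

```python
import re

# Hypothetical helper mirroring the proposed naming scheme:
#   native image: apache/kafka-native:<kafka-version>
#   JVM image:    apache/kafka:<kafka-version>


def image_reference(kafka_version: str, native: bool = True) -> str:
    """Return the Docker Hub reference for a given Kafka release."""
    # Assumed: Kafka release versions look like MAJOR.MINOR.PATCH, e.g. 3.7.0.
    if not re.fullmatch(r"\d+\.\d+\.\d+", kafka_version):
        raise ValueError(f"unexpected Kafka version: {kafka_version!r}")
    repository = "apache/kafka-native" if native else "apache/kafka"
    # The tag is exactly the Kafka version, so the packaged release is visible.
    return f"{repository}:{kafka_version}"


print(image_reference("3.7.0"))                # apache/kafka-native:3.7.0
print(image_reference("3.7.0", native=False))  # apache/kafka:3.7.0
```

Because the tag carries only the Kafka version, users can read off the packaged release directly from the image reference.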

Directory Structure

A new directory named docker will be added to the repository. This directory will contain all the Docker-related code.
Directory Structure:

kafka/
    - docker/

        - native-image/
            - Dockerfile          #Dockerfile for the GraalVM native-image based Apache Kafka Docker image.
        - jvm/
            - Dockerfile          #Dockerfile for the JVM-based Apache Kafka Docker image.
        - resources/              #Contains resources needed to create the Docker image.
        - test/                   #Contains sanity tests for the Docker image.
        - docker_build_test.py    #Python script for building and testing the Docker image.
        - docker_release.py       #Python script for building the Docker image and pushing it to Docker Hub.

NOTE: This structure accommodates both the JVM-based Apache Kafka Docker image (as per KIP-975) and the GraalVM native-image based Docker image proposed here. Both images will share the same resources for image building.

Configuring Properties

We offer two methods for passing Kafka configuration properties to the container:

  1. File Mounting: Users can mount a properties file to a specific path within the container (we will clearly document this path). This file will then be used to start Kafka.

  2. Using Environment Variables: Alternatively, users can provide configuration via environment variables. Derive each variable name from the property name by applying these substitutions to each character of the original name:

    • Replace . with _
    • Replace _ with __ (double underscore)
    • Replace - with ___ (triple underscore)
    • Convert the result to uppercase
    • Prefix the result with KAFKA_

    Examples:

    • For abc.def, use KAFKA_ABC_DEF
    • For abc-def, use KAFKA_ABC___DEF
    • For abc_def, use KAFKA_ABC__DEF
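The mapping can be implemented in both directions. The substitutions have to be applied per character rather than one rule after another. Applying them sequentially in the listed order would turn the `_` produced from `.` into `__`, which contradicts the first example. The sketch below is illustrative only. It assumes upper-casing, as the examples show, and lowercase property names. The function names `property_to_env` and `env_to_property` are hypothetical, and the container's actual startup scripts may differ.

```python
import re

PREFIX = "KAFKA_"
# Per-character substitutions: property name -> environment variable name.
_ENCODE = {".": "_", "_": "__", "-": "___"}
# Inverse mapping, keyed by the length of each run of underscores.
_DECODE = {1: ".", 2: "_", 3: "-"}


def property_to_env(prop: str) -> str:
    """Encode a Kafka property name as an environment variable name."""
    encoded = "".join(_ENCODE.get(ch, ch) for ch in prop)
    return PREFIX + encoded.upper()


def env_to_property(env: str) -> str:
    """Decode a KAFKA_-prefixed environment variable back to a property name."""
    if not env.startswith(PREFIX):
        raise ValueError(f"not a Kafka config variable: {env!r}")
    body = env[len(PREFIX):].lower()
    # Each run of 1-3 underscores maps back to '.', '_' or '-'.
    return re.sub(r"_{1,3}", lambda m: _DECODE[len(m.group())], body)
```

For example, `env_to_property("KAFKA_LOG_RETENTION_HOURS")` yields `log.retention.hours`. The scheme cannot distinguish every conceivable name: `a._b` and `a-b` both encode to `KAFKA_A___B`. This does not affect Kafka's own property names, which contain no such sequences.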

Supporting both methods gives users flexibility in how they pass configuration to the container, so they can choose whichever suits their requirements.
NOTE:

  1. Secrets will be provided to the container by mounting a folder.
  2. If a property is provided both in the mounted file and as an environment variable, the value from the environment variable will take precedence.
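The precedence rule in the second note can be sketched as a simple overlay. Read the mounted file first, then let matching environment variables overwrite individual keys. Several parts of this sketch are assumptions:

- the function name `effective_config`;
- the simplified `key=value` parsing, which handles no line continuations or escapes, unlike full Java `.properties` syntax;
- the decoding helper, repeated here so the snippet stands alone.

```python
import re
from typing import Dict, Mapping

_DECODE = {1: ".", 2: "_", 3: "-"}


def _env_to_property(name: str) -> str:
    # KAFKA_ABC___DEF -> abc-def, KAFKA_ABC__DEF -> abc_def, KAFKA_ABC_DEF -> abc.def
    body = name[len("KAFKA_"):].lower()
    return re.sub(r"_{1,3}", lambda m: _DECODE[len(m.group())], body)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_config(file_text: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Start from the mounted file, then let KAFKA_* variables override it."""
    config = parse_properties(file_text)
    for name, value in env.items():
        if name.startswith("KAFKA_"):
            config[_env_to_property(name)] = value  # environment variable wins
    return config
```

A real implementation would likely also need to ignore prefixed variables that are not broker properties.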

Compatibility, Deprecation, and Migration Plan

  • There will be no impact on existing Apache Kafka users, as the native-image based Kafka Docker image is a new feature.
  • The GraalVM native-image based Apache Kafka Docker image will be an experimental Docker image.
  • Unlike the JVM, GraalVM native-image performs ahead-of-time compilation and does not support dynamic class loading. Extensive testing is required to understand which broker functionality is supported, and how it performs, when built with GraalVM native-image. The GraalVM native-image based container is therefore recommended only for development and testing, not for production workloads.
  • For a Docker image suited to production workloads, refer to KIP-975.

Test Plan

The GraalVM-based Apache Kafka image is an experimental Docker image intended for local development and testing. The GraalVM native-image tool is still maturing, so this image cannot be recommended for production use.
Testing of the Docker image: sanity tests will be added for P0 functionality such as the image coming up, topic creation, producing, consuming, and restarting. We will also try to run the existing system tests against the built Apache Kafka native executable.
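One of the P0 checks, that the image comes up, could start by waiting until the broker's listener accepts TCP connections before the produce/consume tests run. The sketch below is hypothetical. The helper name, the default port 9092 and the timeout values are assumptions. A successful connection only shows that the port is open, not that the broker is fully ready.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 9092,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A sanity test would call this after starting the container and fail fast if it returns False, before attempting topic creation or produce/consume checks.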

Build, Test and Scanning Pipeline

This section is the same as the build and test pipeline described for the JVM Docker image in KIP-975.

Build and Test

Prior to release, the Docker images must be built, tested, and scanned for vulnerabilities. To streamline this process, we will set up a GitHub Actions workflow. This workflow will generate two reports: one for test results and another for scanning results. These reports will be available for community review before voting.

Scanning Previously Released Images

We intend to set up a nightly cron job using GitHub Actions that runs an open-source vulnerability scanning tool such as trivy (https://github.com/aquasecurity/trivy) to produce vulnerability reports for all supported images. This tool offers a straightforward way to integrate vulnerability checks directly into our GitHub Actions workflow.
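The scan reports could be summarized, and optionally used to flag problems, with a small script over Trivy's JSON output (`trivy image --format json ...`). This sketch assumes the report's `Results[].Vulnerabilities[].Severity` layout. The function names are hypothetical. Treating CRITICAL and HIGH findings as blocking is an assumed policy, not one stated in this KIP.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_severity(report_json: str) -> Counter:
    """Count vulnerabilities per severity level in a Trivy JSON report."""
    report = json.loads(report_json)
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def has_blocking(counts: Counter,
                 blocking: Iterable[str] = ("CRITICAL", "HIGH")) -> bool:
    """Return True if any vulnerability has a severity treated as blocking."""
    return any(counts[level] > 0 for level in blocking)
```

Keeping the counting separate from the blocking decision lets the nightly job simply report counts, while a release workflow could apply the stricter check.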

Release Process

The plan for releasing the Docker image is as follows:

  1. The release manager (RM) will have generated Apache Kafka's release candidate (RC) artifacts and pushed them to the Apache SFTP server hosted at home.apache.org using the release.py script.
  2. Run the automation to build the Docker image (using the RC tarball URL from step 1) and test the image.
  3. Push the Docker image to a Docker Hub repository (e.g. the release manager's) so that the RC Docker image can be evaluated.

  4. Start the vote for the RC, which will include the Docker image as well as the Docker sanity test report.

  5. If any Docker-image-specific issue is detected, the community will evaluate whether it is a release blocker.

  6. Once the vote passes, the image will be pushed to apache/kafka-native with the Kafka version as its tag.

  7. Steps for the Docker image release will be included in Apache Kafka's release process documentation.

  8. For example, for AK release 3.7.0, the released image will be apache/kafka-native:3.7.0 (i.e. the image contains AK 3.7.0).
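Step 2 of the release plan, building the image from the RC tarball, might be driven from the Python tooling along these lines. This is a hypothetical sketch. The build-argument name `kafka_url`, the function name, and the RC repository and tag in the usage example are assumptions. Only the Dockerfile location comes from the directory layout in this KIP. The function assembles the `docker build` command without running it, so the command can be reviewed or passed to `subprocess.run`.

```python
import shlex
from typing import List


def native_build_command(tarball_url: str, image_tag: str,
                         context_dir: str = "docker") -> List[str]:
    """Assemble a docker build invocation for the native image."""
    return [
        "docker", "build",
        "-f", f"{context_dir}/native-image/Dockerfile",
        # Hypothetical build argument carrying the RC (or released) tarball URL.
        "--build-arg", f"kafka_url={tarball_url}",
        "-t", image_tag,
        context_dir,
    ]


cmd = native_build_command(
    "https://example.org/kafka_2.13-3.7.0.tgz",  # placeholder RC tarball URL
    "example-rm/kafka-native:3.7.0-rc0",          # hypothetical RC repository/tag
)
print(shlex.join(cmd))
```

Returning the argument list instead of a shell string avoids quoting problems when the command is later executed.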

Ownership of the Docker Images' Release

...