Motivation

The current official Airflow image is rebuilt from scratch every time a new commit is made to the repo. It is "mono-layered": it uses neither Docker's multi-layer architecture nor multi-stage Docker builds.

A mono-layered image means that a build after even a small change takes as long as a full build, instead of using the cache and rebuilding only what is needed.

With a multi-layered approach and caching enabled in Docker Hub, we can optimise the image so that only the layers that changed are downloaded. Users of the images then download only incremental changes, which opens up a number of ways in which such an incremental build/download process can be used:

  • Multi-layered images can be used as a base for AIP-7 Simplified development workflow, where locally downloaded images are used during development and are quickly and incrementally updated with newly added dependencies.
  • Multi-layered images that are part of the "airflow" project can be used to run Travis CI integration tests (simplifying the idea described in Optimizing Docker Image Workflow). With incremental builds, the DockerHub registry can be used as the source of base images (pulled before the build), so that the final image used for test execution is built locally in an incremental way.
  • While the images are initially not meant to be used in production, multi-staging, variable arguments and multiple layers can be used to produce a production-ready Airflow image that can be used to pre-bake DAGs into the image, thus making Airflow closer to being Kubernetes-native. This has been discussed as a potential future improvement in AIP-12 Persist DAG into DB.
  • Ideally both the Airflow and CI images should be maintained in a single place, a "source of truth", to ease maintenance and development. Currently they are maintained in separate repositories and have potentially different dependencies and build processes. This also makes it difficult to add your own dependencies during development, as there is no regular, development-friendly process to update the CI image with new dependencies.

Considerations

In the PR https://github.com/apache/airflow/pull/4543 the current mono-layered Docker image has been reimplemented as a multi-layered one. The PR uses the "hooks/build" hook, which is called by the DockerHub build process, to control caching and the build process. Thanks to that we can build different variants of the images (the main, slim Airflow image; the CI image with more dependencies; the wheel cache image for efficient caching of PIP dependencies).

Basic assumptions

  • There are two images:
    • "Airflow" image - slim image with only necessary Airflow dependencies
    • "CI" image - fat image with additional dependencies necessary for CI tests
  • there are separate images for each python version (currently 2.7, 3.5, 3.6)
  • each image uses python-x.y-slim as a base
  • all stages are defined in a single multi-stage Dockerfile
  • it's possible to build the main Airflow image by issuing the "docker build ." command. This is not optimised for DockerHub cache reuse, but it will build locally.
  • the hooks/build script can build the image utilising the DockerHub cache: it pulls the images from the registry and uses them as cache
  • binary/apt dependencies are built as separate stages, so that we can use whole cached images with main/CI dependencies as cache source
  • the builds are versioned: airflow 2.0.0.dev0 images are different from airflow 2.0.1.dev0 images
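The pull-then-build idea behind the hook script can be sketched as follows. This is only an illustration, not the script from the PR: the repository name `example/airflow` and the helper `cached_build_commands` are hypothetical, and the tag follows the `latest-X.Y-VERSION` labelling used on this page. The sketch relies on the `--cache-from` option of `docker build`, which lets a previously pulled image act as a cache source.

```python
from typing import List


def cached_build_commands(repo: str, python_version: str, airflow_version: str) -> List[List[str]]:
    """Return the docker commands for one cache-assisted build of the main image.

    How the Python version is passed into the Dockerfile is left out on purpose,
    because that detail is not described here.
    """
    tag = f"{repo}:latest-{python_version}-{airflow_version}"
    return [
        # Pull the previously pushed image so that its layers exist locally.
        ["docker", "pull", tag],
        # Build, allowing Docker to reuse the pulled layers as cache.
        ["docker", "build", "--cache-from", tag, "--tag", tag, "."],
    ]


if __name__ == "__main__":
    # "example/airflow" is a placeholder repository name.
    for version in ("2.7", "3.5", "3.6"):
        for command in cached_build_commands("example/airflow", version, "v2.0.0.dev0"):
            print(" ".join(command))
```

The commands are only printed, so the sketch can be inspected without a Docker daemon; a real hook would also have to tolerate a failing pull on the very first build.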

Changes that trigger rebuilds

The changes below are described starting from the most frequent ones, so starting from the end of the Dockerfile and going up to the beginning.

  • apt and pip dependencies: they are "upgraded" as the last part of the build (after sources are added), so an upgrade to the latest available versions is triggered every time the sources change (utilising the cache from previous installations).
  • source changes do not invalidate previously installed packages from apt/pip/npm. They trigger upgrades of pip/apt packages as explained above.
  • changing www sources triggers pre-compiling the web page for production (npm run prod) and everything above.
  • changing package.json or package-lock.json triggers reinstallation of all npm packages (npm ci) and everything above.
  • changing any of the setup.py-related files triggers reinstallation of all pip packages and everything above. In case of a CI build, previously compiled wheel packages from the wheel image are used to install the dependencies (saving time on downloading and compiling packages).
  • changing the wheel cache triggers everything above
  • for the CI build, changing CI apt dependencies triggers reinstallation of those dependencies and everything above
  • changing Airflow apt dependencies triggers reinstallation of those dependencies and everything above
  • it is possible to trigger the whole build process by changing one line in the Dockerfile (FORCE_REINSTALL_ALL_DEPENDENCIES)
  • a new stable Python image triggers a rebuild of the whole image

Stages of the image

These are the stages of the image that we have defined in the Dockerfile:

  • X.Y - python version (2.7, 3.5 or 3.6 currently)
  • VERSION - airflow version (v2.0.0.dev0)
| No. | Stage | Description | Labels in DockerHub | Airflow build dependencies | CI build dependencies |
|-----|-------|-------------|---------------------|----------------------------|-----------------------|
| 1 | Python | Base python image | python-X.Y-slim | - | - |
| 2 | airflow-apt-deps | Vital Airflow apt dependencies | latest-X.Y-apt-deps-VERSION | 1 | 1 |
| 3 | airflow-ci-apt-deps | Additional CI image dependencies | latest-X.Y-ci-apt-deps-VERSION | [Not used] | 2 |
| 4 | wheel-cache-previousmaster | Master wheel cache previously built on DockerHub from latest master, for faster PIP installs | latest-X.Y-wheelcache-VERSION | [Not used] | 3 |
| 5 | wheel-cache | Currently built wheel cache (for future builds) | latest-X.Y-wheelcache-VERSION | [Not used] | 3 |
| 6 | main | Main airflow sources build. Used for both Airflow and CI build | Airflow builds: latest-X.Y (only latest version), latest-X.Y-VERSION; CI builds: latest-X.Y-ci (only newest version), latest-X.Y-ci-VERSION | 2 | 2 - image; 4 - /cache folder with wheels |
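To make the labelling scheme concrete, the sketch below expands the label patterns for every supported Python version. The helper `dockerhub_labels` and the dictionary keys are hypothetical names chosen for illustration; the patterns themselves are copied from the stage list above, and VERSION is assumed to be v2.0.0.dev0.

```python
from typing import Dict, Iterable

# Label patterns copied from the stage list; {py} stands for X.Y and
# {version} for VERSION. The dictionary keys are illustrative names only.
LABEL_PATTERNS = {
    "airflow-apt-deps": "latest-{py}-apt-deps-{version}",
    "airflow-ci-apt-deps": "latest-{py}-ci-apt-deps-{version}",
    "wheel-cache": "latest-{py}-wheelcache-{version}",
    "main (Airflow)": "latest-{py}-{version}",
    "main (CI)": "latest-{py}-ci-{version}",
}


def dockerhub_labels(python_versions: Iterable[str], version: str) -> Dict[str, Dict[str, str]]:
    """Expand every label pattern for every Python version."""
    return {
        py: {
            stage: pattern.format(py=py, version=version)
            for stage, pattern in LABEL_PATTERNS.items()
        }
        for py in python_versions
    }


if __name__ == "__main__":
    labels = dockerhub_labels(["2.7", "3.5", "3.6"], "v2.0.0.dev0")
    for py, stages in labels.items():
        for stage, label in stages.items():
            print(f"{py:>4}  {stage:<20} {label}")
```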

Dependencies between stages

Effectively, the images we create have the following dependencies. Docker's multi-staging mechanism takes care of rebuilding only those stages that need to be rebuilt when the Dockerfile definition changes: changes in a stage trigger rebuilds only in the stages that depend on it.

[draw.io diagram: Stage dependencies]
Layers in the main image

    The main image has a number of layers, which make the image rebuild incrementally depending on what changed in the repository since the previous build. The Docker build mechanisms (context/cache invalidation) are used to determine whether the subsequent layers should be invalidated and rebuilt.

    | No. | Layer | Description | Trigger for rebuild | Airflow build behaviour | CI build behaviour |
    |-----|-------|-------------|---------------------|-------------------------|--------------------|
    | 1 | Wheel cache master | /cache folder with cached wheels from previous build | Rebuild of the wheelcache source | Empty wheel cache used to minimise size of the image | Wheel cache built in latest DockerHub "master" image used |
    | 2 | PIP configuration | setup.py and related files (version.py etc.) | Updated dependencies for PIP | Copy setup.py related files to context | Copy setup.py related files to context |
    | 3 | PIP install | PIP installation | Previous layer change | All PIP dependencies downloaded and installed | PIP dependencies installed from wheel cache - new dependencies downloaded and installed |
    | 4 | NPM package configuration | package.json and package-lock.json | Updated dependencies for NPM | Copy package files to context | Copy package files to context |
    | 5 | npm ci | Installs locked dependencies from NPM | Previous layer change | All NPM dependencies downloaded and installed | All NPM dependencies downloaded and installed |
    | 6 | www files | airflow/www all files | Updated any of the www files | Copy www files to context | Copy www files to context |
    | 7 | npm run prod | Prepares production javascript packaging for webserver | Previous layer change | Javascript prepared | Packages prepared |
    | 8 | airflow sources | Copy all sources to context | Any change in sources | Copy sources to context | Copy sources to context |
    | 9 | apt-get upgrade | Upgrading apt dependencies | Previous layer change | All apt packages upgraded to latest stable versions | All apt packages upgraded to latest stable versions |
    | 10 | pip install | Reinstalling PIP dependencies | Previous layer change | Pip packages are potentially upgraded | All PIP packages are upgraded |

    The results of such layer structure are the following behaviours:

    • in case wheel image is changed: PIP packages + NPM packages + NPM compile + sources are reinstalled for CI build (nothing changes for Airflow build)
    • in case PIP configuration is changed: PIP packages + NPM packages + NPM compile + sources are reinstalled. For Airflow build, all PIP packages are downloaded and installed, for CI build Wheel cache is used as base for installation (faster)
    • in case NPM configuration is changed: NPM packages + NPM compile + sources are reinstalled
    • in case any of WWW files changed: NPM compile + sources are reinstalled
    • in case of any source change: sources are reinstalled
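The behaviours listed above follow from how the Docker layer cache works: once one layer is invalidated, every layer after it is rebuilt as well. A minimal sketch of that rule, using the ten layers of the main image, is shown here. The layer names are shortened from the layer table and the function `layers_to_rebuild` is a hypothetical helper, not part of the build scripts.

```python
from typing import List

# Layers of the main image in Dockerfile order (names shortened).
LAYERS = [
    "wheel cache",
    "pip configuration",
    "pip install",
    "npm package configuration",
    "npm ci",
    "www files",
    "npm run prod",
    "airflow sources",
    "apt-get upgrade",
    "pip install (upgrade)",
]


def layers_to_rebuild(first_changed: str) -> List[str]:
    """Return the changed layer and every layer that follows it."""
    index = LAYERS.index(first_changed)  # raises ValueError for unknown names
    return LAYERS[index:]


if __name__ == "__main__":
    for changed in ("pip configuration", "www files", "airflow sources"):
        print(f"{changed}: rebuild {', '.join(layers_to_rebuild(changed))}")
```

A change to the www files, for example, leaves the pip and npm installation layers cached and rebuilds only the five layers from "www files" onwards.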

    Different types of builds

    The images for Airflow are built for several scenarios, and the "hooks/build" script with its accompanying environment variables controls which images are built in each scenario:

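    | Scenario | Trigger | Purpose | Cache | Frequency | Pull from DockerHub | Push to DockerHub |
    |----------|---------|---------|-------|-----------|---------------------|-------------------|
    | DockerHub build for master branch | A commit merged to "master" | Build and push reference images that are used as cache for subsequent builds | From master | Several times per day | Yes | Yes |
    | Local developer build | Triggered by the user | Build when developer adds dependencies or downloads new code and prepares development environment | From local images (pulled initially) unless cache is disabled | Once per day | First time or when requested | When requested and user logged in |
    | CI build | A commit is pushed to any branch | Builds image that is used to execute CI tests for commits pushed by developers | From master | Several times an hour | Yes | No |

    Images prepared during the build (controlled by environment variables):

    | Scenario | Apt deps | CI Apt deps | Master Wheelcache | Local wheelcache | Airflow | CI |
    |----------|----------|-------------|-------------------|------------------|---------|----|
    | DockerHub build for master branch | Yes | Yes | Yes | Yes | Yes | Yes |
    | Local developer build | Yes | Yes | Yes | | | Yes |
    | CI build | Yes | Yes | | | | Yes |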


    Timings for different scenarios

    Those timings were measured during tests. This includes image pull - full pull for CI builds and incremental pulls for

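    | Where built | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | Full rebuild | docker build . (clean cache) |
    |-------------|------------------|-----------------|---------------------|----------------------|----------------------|--------------|------------------------------|
    | DockerHub (Airflow + CI) | | | | | | | - |
    | Travis CI (CI) | | | | | | | - |
    | Cloud Build * (CI) | | | | | | | - |
    | Google Compute Engine ** | | | | | | | |
    | Local Machine *** (CI) - pull images | | | | | | | |
    | Local Machine *** (CI) - images pulled | | | | | | | |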

    * Cloud Build - M8 High CPU - 3 Python versions built in parallel on single instance

    ** Google Compute Engine: custom (8 vCPUs, 31 GB memory)

    *** Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 Cores

    Appendixes

    The results of the initial layer size measurements are shown below. They show that the multi-layered image size is comparable to the mono-layered one and that incremental builds bring significant savings in download traffic.


    Comparison of mono/multi layered image sizes

    Details for Mono-layered Docker image for Airflow

    Implemented in https://github.com/apache/airflow/commit/e2c22fe70a488feea0cfecde890c20f8c984c09c 

    Available to pull at: 

    docker pull potiuk/airflow-monodocker:latest

    Only significant layers are shown:

    | Layer | Size | When rebuilt/downloaded |
    |-------|------|-------------------------|
    | python:3.6-slim layers (there are 12 layers) | 138 MB | Only the first time it is built |
    | Airflow Sources | 73 MB | After every commit |
    | Airflow installed binaries (all - apt and pip installed together) | 765 MB | After every commit |

    Total: 976 MB


    Example download time when tested (full download after removing the image and docker system prune): 32.7 s (note this was not scientific enough and can be influenced by external factors)


    time docker pull potiuk/airflow-monodocker:latest
    latest: Pulling from potiuk/airflow-monodocker
    177e7ef0df69: Pull complete
    1dee839b70d8: Pull complete
    aafb04a34d0d: Pull complete
    9a36f2b2e390: Pull complete
    51ac94058903: Pull complete
    17105da27567: Pull complete
    08903c354ddd: Pull complete
    234eaa99bee5: Pull complete
    8c3bd3e34c20: Pull complete
    Digest: sha256:db5b707ddec35b5ceeb1caba9be5192965ad00ba34ec630fe5ee6b6d06c49b85
    Status: Downloaded newer image for potiuk/airflow-monodocker:latest

    real 0m32.744s
    user 0m0.090s
    sys 0m0.065s

    Details for Multi-layered Docker image of Airflow

    POC implemented in https://github.com/apache/airflow/pull/4543 

    Available to pull at:

    docker pull potiuk/airflow-layereddocker:latest

    Only significant layers are shown:

    | Layer | Size | When rebuilt/downloaded |
    |-------|------|-------------------------|
    | python:3.6-slim layers (there are 12 layers) | 138 MB | Only the first time it is built |
    | apt-get install core build deps | 118 MB | Only when core dependencies change or when we force fresh build (extremely rare) |
    | apt-get install extra deps | 155 MB | Only when extra deps change (extremely rare) |
    | pip install deps (just setup no airflow sources) | 523 MB | Only when setup.py changes (every few weeks usually) |
    | copy airflow sources | 73 MB | After every commit |
    | Install extra airflow deps just in case | 6 MB | After every commit |

    Total: 1007 MB


    Example download time when tested (full download after removing the image and docker system prune): 33.7 s (note this was not scientific enough and can be influenced by external factors)


    time docker pull potiuk/airflow-layereddocker:latest
    latest: Pulling from potiuk/airflow-layereddocker
    177e7ef0df69: Pull complete
    1dee839b70d8: Pull complete
    aafb04a34d0d: Pull complete
    9a36f2b2e390: Pull complete
    51ac94058903: Pull complete
    18b01857bb01: Pull complete
    23ba9d802d8e: Pull complete
    28157c14842b: Pull complete
    8c6340a2c38d: Pull complete
    a1b4c634dcbc: Pull complete
    b0ce958037ac: Pull complete
    c93f50ea89e5: Pull complete
    939e3f06fc4b: Pull complete
    ed1e854d5b96: Pull complete
    918a0767c9ad: Pull complete
    b207cdc2df35: Pull complete
    99a53823ab76: Pull complete
    8c3bd3e34c20: Pull complete
    Digest: sha256:08a6e8ac7ae7b5c0de0b4d1c6cae3fbb8cb868f12ea3363dfb18374daa62b47a
    Status: Downloaded newer image for potiuk/airflow-layereddocker:latest
    real 0m33.761s
    user 0m0.100s
    sys 0m0.068s

    Note that the "airflow sources + reinstall" layers will grow between forced reinstalls of all dependencies, because package upgrades accumulate in them. However, this growth should not be significant. If a full reinstall is done periodically, the size of this layer is reset.

    It turns out that the multi-layered image is only slightly larger than the mono-layered one (1007 MB vs. 976 MB). But total size is not the only thing that matters. Taking usage patterns into account, users who download the mono-layered image semi-frequently have to download the whole single layer pretty much every time, whereas with the multi-layered approach they only need to pull incremental changes. The size of the incremental changes depends on whether setup.py dependencies were updated, or whether all dependencies were forced to be rebuilt from scratch.

    Simulation of downloads for a user that pulls the image regularly

    Here is a simulation showing how much data users download when they pull the Airflow image semi-frequently (twice a week).

    Assumptions:

    • A user downloads a new image twice a week.

    • Setup.py is updated every two weeks.

    • Commits are happening daily.

    • Force rebuild from scratch every 4 weeks - to account for changed dependencies

    Mono-layered downloads:

    • First download: 976 MB

    • All other downloads: 838 MB = 765 MB + 73 MB

    Multi-layered downloads:

    • First download: 1007 MB

    • Download if only sources changed (no setup.py): 73 MB

    • Download if setup.py changed: 757 MB = 155 MB + 523 MB + 73 MB + 6 MB

    • Download if apt-get dependencies are forced to be rebuilt: 1007 MB - 138 MB = 869 MB
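The per-download figures above can be reproduced with a few lines of arithmetic. The sketch below takes the layer sizes and the two image totals exactly as stated in this appendix; the dictionary keys and function names are hypothetical and exist only for this illustration.

```python
# Layer sizes in MB, as listed for the two images in this appendix.
MONO_MB = {"python base": 138, "airflow sources": 73, "installed binaries": 765}
MULTI_MB = {
    "python base": 138,
    "apt core deps": 118,
    "apt extra deps": 155,
    "pip deps": 523,
    "airflow sources": 73,
    "extra airflow deps": 6,
}
# Total size of the multi-layered image, taken as stated in the text.
MULTI_TOTAL_MB = 1007


def mono_repeat_download() -> int:
    """Every later mono-layered pull fetches sources and binaries again."""
    return MONO_MB["airflow sources"] + MONO_MB["installed binaries"]


def multi_setup_py_download() -> int:
    """Layers counted in the text when setup.py changed."""
    counted = ("apt extra deps", "pip deps", "airflow sources", "extra airflow deps")
    return sum(MULTI_MB[name] for name in counted)


def multi_forced_download() -> int:
    """Everything except the Python base layers is fetched again."""
    return MULTI_TOTAL_MB - MULTI_MB["python base"]


if __name__ == "__main__":
    print("mono, any later download:", mono_repeat_download(), "MB")
    print("multi, setup.py changed:", multi_setup_py_download(), "MB")
    print("multi, forced rebuild:", multi_forced_download(), "MB")
```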


    User download size pattern:


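    | Week | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 | 7 | 7 | 8 | 8 | Total downloaded over the course of 8 weeks (MB) |
    |------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Sources change | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | |
    | Setup.py changes | x | | | | x | | | | x | | | | x | | | | |
    | Forced dependencies | x | | | | | | | | x | | | | | | | | |
    | Monolayered (MB) | 976 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 838 | 13546 |
    | Multilayered (MB) | 1007 | 73 | 73 | 73 | 757 | 73 | 73 | 757 | 869 | 73 | 73 | 73 | 757 | 73 | 73 | 73 | 4950 (36.5% of monolayered) |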
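The totals in the last column can be checked by summing the sixteen downloads (two per week over eight weeks). The short sketch below does this with the per-download values copied from the two size rows of the table; the variable names are illustrative only.

```python
# Per-download sizes in MB, copied from the table rows (16 downloads in 8 weeks).
mono_downloads = [976] + [838] * 15
multi_downloads = [1007, 73, 73, 73, 757, 73, 73, 757,
                   869, 73, 73, 73, 757, 73, 73, 73]

mono_total = sum(mono_downloads)
multi_total = sum(multi_downloads)
ratio = multi_total / mono_total

if __name__ == "__main__":
    print(f"monolayered:  {mono_total} MB")
    print(f"multilayered: {multi_total} MB ({ratio:.1%} of monolayered)")
```

Under these assumptions, a user of the multi-layered image downloads a little more than a third of what a user of the mono-layered image downloads over the same period.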



    Sources for calculation

    Mono-layered image:

    docker history potiuk/airflow-monodocker:latest
    IMAGE CREATED CREATED BY SIZE COMMENT
    725143eaf153 17 minutes ago /bin/sh -c #(nop) CMD ["--help"] 0B
    <missing> 17 minutes ago /bin/sh -c #(nop) ENTRYPOINT ["/entrypoint.… 0B
    <missing> 17 minutes ago /bin/sh -c #(nop) COPY file:22d6c0f397f65528… 907B
    <missing> 17 minutes ago |5 AIRFLOW_DEPS=all AIRFLOW_HOME=/usr/local/… 0B
    <missing> 17 minutes ago /bin/sh -c #(nop) WORKDIR /usr/local/airflow 0B
    <missing> 17 minutes ago |5 AIRFLOW_DEPS=all AIRFLOW_HOME=/usr/local/… 765MB
    <missing> 24 minutes ago /bin/sh -c #(nop) WORKDIR /opt/airflow 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG APT_DEPS=freetds-dev … 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG buildDeps=freetds-dev… 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG PYTHON_DEPS= 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG AIRFLOW_DEPS=all 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG AIRFLOW_HOME=/usr/loc… 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) COPY dir:c08fa4a00d4740680… 72.8MB
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["python3"] 0B
    <missing> 2 weeks ago /bin/sh -c set -ex; savedAptMark="$(apt-ma… 7.13MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=18… 0B
    <missing> 2 weeks ago /bin/sh -c cd /usr/local/bin && ln -s idle3… 32B
    <missing> 2 weeks ago /bin/sh -c set -ex && savedAptMark="$(apt-… 69.2MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.6.8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV GPG_KEY=0D96DF4D4110E… 0B
    <missing> 2 weeks ago /bin/sh -c apt-get update && apt-get install… 6.48MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ADD file:6d6f6f123e45697d3… 55.3MB


    Multi-layered image:


    docker history potiuk/airflow-layereddocker:latest
    IMAGE CREATED CREATED BY SIZE COMMENT
    055d0daee787 About an hour ago /bin/bash -c #(nop) CMD ["--help"] 0B
    <missing> About an hour ago /bin/bash -c #(nop) ENTRYPOINT ["/entrypoin… 0B
    <missing> About an hour ago /bin/bash -c #(nop) COPY file:22d6c0f397f655… 907B
    <missing> About an hour ago |4 ADDITIONAL_PYTHON_DEPS= AIRFLOW_EXTRAS=al… 0B
    <missing> About an hour ago /bin/bash -c #(nop) ARG ADDITIONAL_PYTHON_D… 0B
    <missing> About an hour ago |3 AIRFLOW_EXTRAS=all AIRFLOW_HOME=/usr/loca… 128kB
    <missing> About an hour ago |3 AIRFLOW_EXTRAS=all AIRFLOW_HOME=/usr/loca… 6.04MB
    <missing> About an hour ago /bin/bash -c #(nop) COPY dir:5d6f5c2f0d7171e… 72.8MB
    <missing> About an hour ago |3 AIRFLOW_EXTRAS=all AIRFLOW_HOME=/usr/loca… 523MB
    <missing> About an hour ago /bin/bash -c #(nop) WORKDIR /opt/airflow 0B
    <missing> 15 hours ago /bin/bash -c #(nop) COPY file:143db2e76b8f16… 1.26kB
    <missing> 15 hours ago /bin/bash -c #(nop) COPY file:590340f7066102… 3.04kB
    <missing> 15 hours ago /bin/bash -c #(nop) COPY file:3e78814fb55a47… 838B
    <missing> 15 hours ago /bin/bash -c #(nop) COPY file:53d0bc9002b31a… 29.6kB
    <missing> 15 hours ago /bin/bash -c #(nop) COPY multi:8bb5ed331b460… 14.2kB
    <missing> 15 hours ago /bin/bash -c #(nop) ENV SLUGIFY_USES_TEXT_U… 0B
    <missing> 15 hours ago /bin/bash -c #(nop) ENV CASS_DRIVER_NO_CYTH… 0B
    <missing> 15 hours ago /bin/bash -c #(nop) ENV CASS_DRIVER_BUILD_C… 0B
    <missing> 15 hours ago /bin/bash -c #(nop) ARG CASS_DRIVER_NO_CYTH… 0B
    <missing> 15 hours ago /bin/bash -c #(nop) ENV FORCE_REINSTALL_ALL… 0B
    <missing> 15 hours ago /bin/bash -c #(nop) ARG AIRFLOW_EXTRAS=all 0B
    <missing> 15 hours ago |1 AIRFLOW_HOME=/usr/local/airflow /bin/bash… 0B
    <missing> 15 hours ago /bin/bash -c #(nop) ARG AIRFLOW_HOME=/usr/l… 0B
    <missing> 5 days ago /bin/bash -c apt-get update && apt-get i… 155MB
    <missing> 5 days ago /bin/bash -c apt-get update && apt-get i… 118MB
    <missing> 5 days ago /bin/bash -c #(nop) ENV FORCE_REINSTALL_APT… 0B
    <missing> 5 days ago /bin/bash -c #(nop) ENV DEBIAN_FRONTEND=non… 0B
    <missing> 5 days ago /bin/bash -c #(nop) SHELL [/bin/bash -c] 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["python3"] 0B
    <missing> 2 weeks ago /bin/sh -c set -ex; savedAptMark="$(apt-ma… 7.13MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=18… 0B
    <missing> 2 weeks ago /bin/sh -c cd /usr/local/bin && ln -s idle3… 32B
    <missing> 2 weeks ago /bin/sh -c set -ex && savedAptMark="$(apt-… 69.2MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.6.8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV GPG_KEY=0D96DF4D4110E… 0B
    <missing> 2 weeks ago /bin/sh -c apt-get update && apt-get install… 6.48MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ADD file:6d6f6f123e45697d3… 55.3MB



    <missing> 2 weeks ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ADD file:6d6f6f123e45697d3… 55.3MB

    Conclusions

    • The multi-layered image is only slightly bigger than the mono-layered one (976 MB mono-layered vs. 1007 MB multi-layered, about 3% more in total). Download time is also slightly longer, by about 1 s (33.7 s vs. 32.7 s), which is about 3% longer.
    • Regular downloads are much cheaper with the multi-layered image: for a simulated user pulling the Airflow image twice a week, the total is 4950 MB (multi-layered) vs. 13546 MB (mono-layered) over the course of 8 weeks, which is about 63.5% less data to download.
    • The multi-layered image is therefore much better for users who pull the image regularly.
    • TODO:

    Appendixes

    Sources for calculation

    Mono-layered image:

    docker history potiuk/airflow-monodocker:latest
    IMAGE CREATED CREATED BY SIZE COMMENT
    725143eaf153 17 minutes ago /bin/sh -c #(nop) CMD ["--help"] 0B
    <missing> 17 minutes ago /bin/sh -c #(nop) ENTRYPOINT ["/entrypoint.… 0B
    <missing> 17 minutes ago /bin/sh -c #(nop) COPY file:22d6c0f397f65528… 907B
    <missing> 17 minutes ago |5 AIRFLOW_DEPS=all AIRFLOW_HOME=/usr/local/… 0B
    <missing> 17 minutes ago /bin/sh -c #(nop) WORKDIR /usr/local/airflow 0B
    <missing> 17 minutes ago |5 AIRFLOW_DEPS=all AIRFLOW_HOME=/usr/local/… 765MB
    <missing> 24 minutes ago /bin/sh -c #(nop) WORKDIR /opt/airflow 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG APT_DEPS=freetds-dev … 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG buildDeps=freetds-dev… 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG PYTHON_DEPS= 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG AIRFLOW_DEPS=all 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) ARG AIRFLOW_HOME=/usr/loc… 0B
    <missing> 24 minutes ago /bin/sh -c #(nop) COPY dir:c08fa4a00d4740680… 72.8MB
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["python3"] 0B
    <missing> 2 weeks ago /bin/sh -c set -ex; savedAptMark="$(apt-ma… 7.13MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=18… 0B
    <missing> 2 weeks ago /bin/sh -c cd /usr/local/bin && ln -s idle3… 32B
    <missing> 2 weeks ago /bin/sh -c set -ex && savedAptMark="$(apt-… 69.2MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.6.8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV GPG_KEY=0D96DF4D4110E… 0B
    <missing> 2 weeks ago /bin/sh -c apt-get update && apt-get install… 6.48MB
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) CMD ["bash"] 0B
    <missing> 2 weeks ago /bin/sh -c #(nop) ADD file:6d6f6f123e45697d3… 55.3MB

    Total: 976 MB

    Example download time when tested (full download after removing the image and running docker system prune): 32.7 s. Note that this measurement was not rigorous and can be influenced by external factors.

    time docker pull potiuk/airflow-monodocker:latest
    latest: Pulling from potiuk/airflow-monodocker
    177e7ef0df69: Pull complete
    1dee839b70d8: Pull complete
    aafb04a34d0d: Pull complete
    9a36f2b2e390: Pull complete
    51ac94058903: Pull complete
    17105da27567: Pull complete
    08903c354ddd: Pull complete
    234eaa99bee5: Pull complete
    8c3bd3e34c20: Pull complete
    Digest: sha256:db5b707ddec35b5ceeb1caba9be5192965ad00ba34ec630fe5ee6b6d06c49b85
    Status: Downloaded newer image for potiuk/airflow-monodocker:latest
    real 0m32.744s
    user 0m0.090s
    sys 0m0.065s

    Details for Multi-layered Docker image of Airflow

    POC implemented in https://github.com/apache/airflow/pull/4543 

    Available to pull at:

    docker pull potiuk/airflow-layereddocker:latest

    Only significant layers are shown:

    Total: 1007 MB

    Example download time when tested (full download after removing the image and running docker system prune): 33.7 s. Note that this measurement was not rigorous and can be influenced by external factors.

    time docker pull potiuk/airflow-layereddocker:latest
    latest: Pulling from potiuk/airflow-layereddocker
    177e7ef0df69: Pull complete
    1dee839b70d8: Pull complete
    aafb04a34d0d: Pull complete
    9a36f2b2e390: Pull complete
    51ac94058903: Pull complete
    18b01857bb01: Pull complete
    23ba9d802d8e: Pull complete
    28157c14842b: Pull complete
    8c6340a2c38d: Pull complete
    a1b4c634dcbc: Pull complete
    b0ce958037ac: Pull complete
    c93f50ea89e5: Pull complete
    939e3f06fc4b: Pull complete
    ed1e854d5b96: Pull complete
    918a0767c9ad: Pull complete
    b207cdc2df35: Pull complete
    99a53823ab76: Pull complete
    8c3bd3e34c20: Pull complete
    Digest: sha256:08a6e8ac7ae7b5c0de0b4d1c6cae3fbb8cb868f12ea3363dfb18374daa62b47a
    Status: Downloaded newer image for potiuk/airflow-layereddocker:latest
    real 0m33.761s
    user 0m0.100s
    sys 0m0.068s

    Note that the "Airflow sources + reinstall" layer will grow between forced reinstalls of all dependencies, because package upgrades are added on top of it. However, this growth should not be significant. When a full reinstall is done periodically, the size of this layer is reset.

    The multi-layered image turns out to be only slightly bigger than the mono-layered one (1007 MB vs. 976 MB), and total size is not the whole picture. Taking usage patterns into account, users who download the image semi-frequently have to download the whole single layer pretty much every time with the mono-layered image, whereas with the multi-layered approach they only need to pull incremental changes. The size of those incremental changes depends on whether setup.py dependencies were updated or whether all dependencies were forced to be rebuilt from scratch.

    Simulation of downloads for a user that pulls the image regularly

    The following simulation shows how much data users download when pulling the Airflow image semi-frequently (twice a week).

    Assumptions:

    • A user downloads a new image twice a week.

    • setup.py is updated every two weeks.

    • Commits happen daily.

    • A rebuild from scratch is forced every 4 weeks to account for changed dependencies.

    Mono-layered downloads:

    • First download: 976 MB

    • All other downloads: 838 MB = 765 MB + 73 MB

    Multi-layered downloads:

    • First download: 1007 MB

    • Download if only sources changed (no setup.py): 73 MB

    • Download if setup.py changed: 757 MB = 155 MB + 523 MB + 73 MB + 6 MB

    • Download if a rebuild of apt-get dependencies is forced: 1007 MB - 138 MB = 869 MB
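
The 8-week totals follow directly from the per-download sizes listed above. The short Python sketch below redoes that arithmetic. The constant names and the schedule string are hypothetical and used only for illustration; the schedule (S = only sources changed, P = setup.py changed, F = forced rebuild) is an assumed encoding of the 15 downloads that follow the first one, chosen to match the per-download figures of the simulation.

```python
# Per-download sizes in MB, copied from the lists above.
MONO_FIRST = 976
MONO_UPDATE = 765 + 73            # installed binaries + sources, re-pulled after every commit

MULTI_FIRST = 1007
MULTI_BY_EVENT = {
    "S": 73,                      # only sources changed
    "P": 155 + 523 + 73 + 6,      # setup.py changed
    "F": 1007 - 138,              # forced rebuild: everything except the base python layers
}

# Assumed schedule for downloads 2..16 (two downloads per week over 8 weeks).
SCHEDULE = "SSSPSSPFSSSPSSS"

mono_total = MONO_FIRST + MONO_UPDATE * len(SCHEDULE)
multi_total = MULTI_FIRST + sum(MULTI_BY_EVENT[event] for event in SCHEDULE)
saving_pct = 100 * (1 - multi_total / mono_total)

print(f"mono-layered:  {mono_total} MB")
print(f"multi-layered: {multi_total} MB")
print(f"saving:        {saving_pct:.1f}%")
```

Under these assumptions the script reproduces the totals of 13546 MB (mono-layered) and 4950 MB (multi-layered). Changing the schedule string is a quick way to see how sensitive the saving is to how often setup.py changes or a full rebuild is forced.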

    User download size pattern:

    Total downloaded with the multi-layered image over 8 weeks: 4950 MB (about 36.5% of the mono-layered total).

    Details for Mono-layered Docker image for Airflow

    Implemented in https://github.com/apache/airflow/commit/e2c22fe70a488feea0cfecde890c20f8c984c09c 

    Available to pull at: 

    docker pull potiuk/airflow-monodocker:latest

    Only significant layers are shown:

    Layer                                                             | Size   | When rebuilt/downloaded
    python:3.6-slim layers (there are 12 layers)                      | 138 MB | Only the first time it is built
    Airflow Sources                                                   | 73 MB  | After every commit
    Airflow installed binaries (all - apt and pip installed together) | 765 MB | After every commit

    Multi-layered image (only significant layers are shown):

    Layer                                            | Size   | When rebuilt/downloaded
    python:3.6-slim layers (there are 12 layers)     | 138 MB | Only the first time it is built
    apt-get install core build deps                  | 118 MB | Only when core dependencies change or when we force fresh build (extremely rare)
    apt-get install extra deps                       | 155 MB | Only when extra deps change (extremely rare)
    pip install deps (just setup no airflow sources) | 523 MB | Only when setup.py changes (every few weeks usually)
    copy airflow sources                             | 73 MB  | After every commit
    Install extra airflow deps just in case          | 6 MB   | After every commit

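    User download size pattern over 8 weeks (two downloads per week):

    Week | Download | Monolayered (MB) | Multilayered (MB)
    1    | 1        | 976              | 1007
    1    | 2        | 838              | 73
    2    | 3        | 838              | 73
    2    | 4        | 838              | 73
    3    | 5        | 838              | 757
    3    | 6        | 838              | 73
    4    | 7        | 838              | 73
    4    | 8        | 838              | 757
    5    | 9        | 838              | 869
    5    | 10       | 838              | 73
    6    | 11       | 838              | 73
    6    | 12       | 838              | 73
    7    | 13       | 838              | 757
    7    | 14       | 838              | 73
    8    | 15       | 838              | 73
    8    | 16       | 838              | 73
    Total downloaded over the course of 8 weeks (MB) | 13546 | 4950

    Sources change: marked (x) for all 16 downloads.
    setup.py changes: marked (x) for 4 downloads.
    Forced dependencies: marked (x) for 2 downloads.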