...
Build timings for different scenarios
These timings were measured during tests. Times are in minutes and seconds (MM:SS).
The rows marked "Monolayer" (highlighted in yellow) show timings for the original mono-layered builds, for comparison of incremental build times.
Where built | Image | No source change | Sources changed | WWW sources changed | NPM packages changed | PIP packages changed | CI Apt deps changed | Apt deps changed | Full build (from scratch) | Comments |
---|---|---|---|---|---|---|---|---|---|---|
DockerHub (includes pull of cache) | Airflow CI | 8:20 | 8:40 | 11:01 | 13:40 | 33:30 | 38:45 | 44:00 | 44:00 | Delays on DockerHub |
Travis CI | CI | 3:24 | 3:32 | 3:30 | 3:47 | 5:45 | 7:39 | 8:24 | 8:26 | Typical timing for CI builds |
Cloud Build * (includes pull of cache) | CI | 2:53 | 3:00 | 3:07 | 3:31 | 4:40 | 6:44 | 8:33 | 9:35 | |
Google Compute Engine ** /hooks/build | Airflow CI | 1:13 | 1:23 | 1:43 | 2:26 | 10:20 | 12:30 | 13:09 | 16:35 | |
Google Compute Engine ** Only CI build using breeze | CI | 0:10 (no rebuild) | 0:10 (no rebuild) | 0:10 (no rebuild) | 1:40 | 3:14 | 5:30 | 8:40 | 7:22 | More time needed to pull than to build from scratch |
Google Compute Engine ** 'docker build . --build-arg APT_DEPS_IMAGE=airflow-ci-apt-deps' | CI | 0:02 | 0:13 | 0:23 | 1:00 | 4:36 | 6:05 | 7:50 | 10:28 | |
Google Compute Engine ** 'docker build .' | Airflow | 0:02 | 0:13 | 0:23 | 1:00 | 4:35 | 5:10 | 7:42 | 8:22 | |
Google Compute Engine ** Monolayer (Cassandra fix) **** | Airflow | 0:01 | 4:23 | 4:23 | 4:23 | 4:23 | 4:23 | 4:23 | 5:30 | Cassandra fix is biggest improvement |
Google Compute Engine ** Monolayer | Airflow | 0:01 | 9:07 | 9:07 | 9:07 | 9:07 | 9:07 | 9:07 | 10:43 | |
Local Machine *** Only CI build using breeze | CI | 0:05 (no rebuild) | 0:05 (no rebuild) | 0:05 (no rebuild) | 1:36 | 4:20 | 7:13 | 8:07 | | Typical timing for local development |
Local Machine *** 'docker build . --build-arg APT_DEPS_IMAGE=airflow-ci-apt-deps' | CI | 0:02 | 0:15 | 0:25 | 0:44 | 4:07 | 6:22 | 7:43 | 10:20 | |
Local Machine *** 'docker build .' | Airflow | 0:09 | 0:20 | 0:29 | 0:56 | 4:28 | 3:30 | 8:09 | 10:18 | |
Local Machine *** Monolayer (Cassandra fix) **** | Airflow | 0:30 | 4:34 | 4:34 | 4:34 | 4:34 | 4:34 | 4:34 | 5:56 | |
Local Machine *** Monolayer | Airflow | 0:27 | 8:26 | 8:26 | 8:26 | 8:26 | 8:26 | 8:26 | 9:26 | |
* Cloud Build: M8 High CPU, 3 Python versions built in parallel on a single instance.
** Google Compute Engine: custom (8 vCPUs, 31 GB memory).
*** Local Machine: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 4 cores. Using a MacBook affects context-sending times: it takes significantly longer to send the build context to the Linux kernel VM that is used on the Mac.
**** Cassandra fix: installing the Cassandra driver takes a lot of time because it compiles a Cython-based driver (which is good for performance). The Cassandra fix speeds up the build by removing the Cython optimisations. Multi-layer images are built with the Cassandra fix.
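To make the incremental-build comparison concrete, the following minimal sketch converts a few timings from the table to seconds and computes the ratio between mono-layered and multi-layered builds. It assumes the timings are minutes:seconds and uses only the "Sources changed" column of the Google Compute Engine rows; the helper name `to_seconds` is illustrative and not part of any Airflow tooling.

```python
def to_seconds(timing):
    """Convert an 'M:SS' timing such as '4:23' to a number of seconds."""
    minutes, seconds = timing.split(":")
    return int(minutes) * 60 + int(seconds)


# "Sources changed" column, Google Compute Engine rows of the table above.
multi_layer = to_seconds("0:13")     # 'docker build .' (multi-layered Airflow image)
mono_cassandra = to_seconds("4:23")  # Monolayer (Cassandra fix)
mono = to_seconds("9:07")            # Monolayer

# How many times longer the mono-layered rebuild takes after a source change.
ratio_vs_cassandra_fix = mono_cassandra / multi_layer
ratio_vs_mono = mono / multi_layer

print(f"Monolayer (Cassandra fix) / multi-layer: {ratio_vs_cassandra_fix:.1f}x")
print(f"Monolayer / multi-layer: {ratio_vs_mono:.1f}x")
```

The same calculation can be repeated for any other column or machine by swapping in the corresponding cells.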
Image size comparison
Image | Size |
---|---|
Airflow monolayer image | 1.2GB |
Airflow multi-layer | 1.2GB |
CI multi-layer | |
Appendices
The results of the initial measurements of layer sizes are shown here. They show that the size of the multi-layered image is comparable to that of the mono-layered one, and that incremental builds bring significant savings in download traffic.
Details for Mono-layered Docker image for Airflow
Implemented in https://github.com/apache/airflow/commit/e2c22fe70a488feea0cfecde890c20f8c984c09c
Available to pull at: docker pull potiuk/airflow-monodocker:latest
Only significant layers are shown:
Total: 976 MB
Example download time when tested (full download after removing the image and running docker system prune): 32.7 s. Note that this measurement was not rigorous and can be influenced by external factors.
time docker pull potiuk/airflow-monodocker:latest
real 0m32.744s
Details for Multi-layered Docker image of Airflow
POC implemented in https://github.com/apache/airflow/pull/4543
Available to pull at: docker pull potiuk/airflow-layereddocker:latest
Only significant layers are shown:
Total: 1007 MB
Example download time when tested (full download after removing the image and running docker system prune): 33.7 s. Note that this measurement was not rigorous and can be influenced by external factors.
time docker pull potiuk/airflow-layereddocker:latest
real 0m33.761s
Note that the "airflow sources + reinstall" layer will grow between forced reinstalls of all dependencies, because package upgrades are added to it. However, this growth should not be significant. If a full reinstall is done periodically, the size of this layer is reset.
It turns out that the multi-layered image is comparable in size to the mono-layered one (1007 MB versus 976 MB in the measurements above). But size is not the only benefit of the multi-layered image. Users who download the image semi-frequently have to download the whole single layer of the mono-layered image pretty much every time, whereas with the multi-layered approach they only need to pull incremental changes. The size of the incremental changes depends on whether the setup.py dependencies are updated, or whether all dependencies are forced to be rebuilt from scratch.
Simulation of downloads for a user that pulls the image regularly
Here is a simulation showing how large the downloads will be for users who download the Airflow image semi-frequently (twice a week).
Assumptions:
Mono-layered downloads:
Multi-layered downloads:
User download size pattern:
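The idea behind this simulation can be sketched in a few lines of Python. This is a minimal illustration, not the original calculation: the image totals (976 MB and 1007 MB) come from the measurements above, while the size of the changed layers, the four-week period, and the assumption that the image changes between every pull are all hypothetical values chosen for the example.

```python
# Totals taken from the measurements above.
MONO_TOTAL_MB = 976
MULTI_TOTAL_MB = 1007

# ASSUMPTION (hypothetical): size of the layers that change on an ordinary
# sources-only update of the multi-layered image.
HYPOTHETICAL_CHANGED_LAYERS_MB = 50

# Twice a week (from the text) over a hypothetical four-week period.
PULLS = 2 * 4


def simulate(pulls, total_mb, changed_mb_per_pull):
    """Total download in MB: the first pull fetches the whole image,
    each later pull fetches only the layers that changed."""
    return total_mb + (pulls - 1) * changed_mb_per_pull


# Mono-layered: the single layer changes, so every pull is a full download.
mono_download_mb = simulate(PULLS, MONO_TOTAL_MB, MONO_TOTAL_MB)

# Multi-layered: only the changed layers are pulled after the first download.
multi_download_mb = simulate(PULLS, MULTI_TOTAL_MB, HYPOTHETICAL_CHANGED_LAYERS_MB)

print(f"Mono-layered:  {mono_download_mb} MB")
print(f"Multi-layered: {multi_download_mb} MB")
```

A pull after a forced full rebuild of dependencies would download far more than the hypothetical 50 MB, so a more faithful model would vary `changed_mb_per_pull` per pull.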
Sources for calculation
Mono-layered image:
docker history potiuk/airflow-monodocker:latest
IMAGE CREATED CREATED BY SIZE COMMENT
Multi-layered image:
docker history potiuk/airflow-layereddocker:latest
IMAGE CREATED CREATED BY SIZE COMMENT
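Totals such as the ones quoted above can be derived by summing the SIZE column of the docker history output. The sketch below shows one way to do that in Python, assuming the sizes are in Docker's human-readable form (B, kB, MB, GB, decimal units). The function names and the sample values are hypothetical; the sample is not the real layer list of either image.

```python
import re

# Decimal multipliers for the human-readable size units.
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}


def size_to_bytes(text):
    """Convert a size such as '450MB' or '1.5kB' to bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*(B|kB|MB|GB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def total_mb(size_column):
    """Sum a list of human-readable layer sizes and return the total in MB."""
    return sum(size_to_bytes(size) for size in size_column) / 10**6


# HYPOTHETICAL sample of a SIZE column, for illustration only.
sample_sizes = ["0B", "450MB", "1.5kB", "300MB"]
print(f"Total: {total_mb(sample_sizes):.1f} MB")
```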
...