Context
With Travis CI, our builds were limited by the Apache organization's quota, which is shared with many other Apache projects. This led to long queue times and made merging PRs an unbearable process. Inspired by Apache Flink's approach, we are setting up a similar CI infrastructure using Azure DevOps and Google Cloud's Compute Engine.
Overview
We created https://github.com/apachehudi-ci to manage all CI-related repositories:
- https://github.com/apachehudi-ci/hudi-mirror
  - Mirrors the official repo (apache/hudi) to build commits merged into the master and release-* branches.
- https://github.com/apachehudi-ci/hudi-branch-ci
  - Builds the branches created from PRs.
  - Its master branch stays at the commit it was created from and does not need to sync with the official repo; the repo only hosts the branches being built.
The diagram below gives an overview of the infrastructure.
GCP Compute Engine (VM)
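We are running an e2-micro (free) VM instance in this GCP project.
Bootstrap steps:
sudo apt update
sudo apt install git cron adoptopenjdk-8-hotspot maven
git clone git@github.com:apachehudi-ci/git-repo-sync.git
git clone git@github.com:apachehudi-ci/ci-bot.git
cd ci-bot
git checkout -t origin/fix-for-hudi
mvn clean install
Note: adoptopenjdk-8-hotspot is not in the default Debian/Ubuntu package repositories, so the AdoptOpenJDK apt repository has to be added before running apt install.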
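After bootstrapping, a quick way to confirm that the required commands are on PATH is a check like the one below. This is a minimal sketch under stated assumptions: missing_tools is a hypothetical helper (not part of git-repo-sync or ci-bot), and it assumes the installed packages provide the git, crontab, java and mvn commands.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-repo-sync or ci-bot.
# Prints every command name given as an argument that is NOT found on PATH.
# Returns 1 if at least one command is missing, 0 if all are present.
missing_tools() {
  local tool status
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example (command names assumed from the packages installed above):
#   missing_tools git crontab java mvn || echo "bootstrap incomplete"
```

command -v is the POSIX way to test for a command, so the check works under sh as well as bash.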
Mirror master & release commits
Use crontab -e to schedule the mirroring job. It is currently set to run every 10 minutes and mirror master and release commits to the apachehudi-ci repo:
*/10 * * * * $HOME/git-repo-sync/sync_repo.sh > /dev/null 2>&1
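crontab -e is interactive. If the entry ever has to be re-created from a script (for example, on a fresh VM), a minimal sketch like the following appends it only when it is not already present. add_cron_line is a hypothetical helper, not part of git-repo-sync; it relies only on the standard crontab -l / crontab - behavior.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-repo-sync: a non-interactive
# alternative to editing with `crontab -e`.
# Reads the current crontab on stdin and writes it to stdout with the given
# line appended, unless an identical line is already present, so running it
# twice does not create a duplicate entry.
add_cron_line() {
  local line existing
  line=$1
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Example: install the mirroring entry shown above if it is not there yet.
#   LINE='*/10 * * * * $HOME/git-repo-sync/sync_repo.sh > /dev/null 2>&1'
#   crontab -l 2>/dev/null | add_cron_line "$LINE" | crontab -
```

The single quotes keep $HOME literal so that cron's shell expands it at run time, exactly as in the entry above.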
Scan PRs and trigger branch builds
Run $HOME/git-repo-sync/run_cibot.sh to start the CI branch builds in the background.
Maintenance
Manage the background process with htop. To restart it, the usual steps are: kill the process from htop, clean up ~/ci-bot.log, and re-run the script.
Check the expiry dates of the Azure and GitHub tokens used in $HOME/git-repo-sync/run_cibot.sh, and renew them before they expire.
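To make the expiry check harder to forget, a small sketch like the one below can warn when a token is close to expiring. Assumptions: the helpers are hypothetical (not part of git-repo-sync), the expiry dates are copied by hand in YYYY-MM-DD form from the Azure DevOps and GitHub token settings, AZURE_TOKEN_EXPIRY and GITHUB_TOKEN_EXPIRY are made-up variable names, and GNU date is available for its -d option.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git-repo-sync.
# Requires GNU date for the -d option.

# Print the number of whole days from <today> (default: current UTC date)
# until <expiry>; a negative number means the token has already expired.
days_until_expiry() {
  local expiry today exp_s today_s
  expiry=$1
  today=${2:-$(date -u +%F)}
  exp_s=$(date -u -d "$expiry" +%s) || return 2
  today_s=$(date -u -d "$today" +%s) || return 2
  echo $(( (exp_s - today_s) / 86400 ))
}

# Warn on stderr and return 1 if <name>'s token expires within <threshold>
# days (default 14); return 0 otherwise.
check_token_expiry() {
  local name expiry threshold days
  name=$1
  expiry=$2
  threshold=${3:-14}
  days=$(days_until_expiry "$expiry") || return 2
  if [ "$days" -le "$threshold" ]; then
    echo "WARNING: $name token expires in $days day(s) on $expiry" >&2
    return 1
  fi
  return 0
}

# Example (hypothetical variables holding the hand-copied dates):
#   check_token_expiry azure "$AZURE_TOKEN_EXPIRY"
#   check_token_expiry github "$GITHUB_TOKEN_EXPIRY"
```

Both dates are parsed as midnight UTC, so their difference is always a whole number of days.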
Possible issues
- A mirrored branch (e.g., master, release-*) is not getting pushed, so the mirrored CI does not run.
  Likely cause: git fetch or git push in $HOME/git-repo-sync/sync_repo.sh is not working properly. Run the git commands manually and troubleshoot accordingly.
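When troubleshooting, it helps to confirm first whether the mirror is actually behind. The sketch below compares the commit a branch points to in the upstream repo and in the mirror using read-only git ls-remote calls. mirror_in_sync and remote_head are hypothetical helpers, not part of sync_repo.sh, and the URLs in the example are assumed from the repositories named in the Overview.

```shell
#!/bin/sh
# Hypothetical troubleshooting helpers, not part of git-repo-sync.
# Compare the commit a branch points to in the upstream repo and in the
# mirror using `git ls-remote` (read-only, no clone needed).

# Print the commit SHA that refs/heads/<branch> points to in <repo>
# (empty if the branch does not exist there).
remote_head() {
  git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Return 0 if <branch> points to the same commit in <upstream> and <mirror>,
# 1 otherwise (including when the branch is missing from either repo).
mirror_in_sync() {
  local upstream mirror branch up down
  upstream=$1
  mirror=$2
  branch=$3
  up=$(remote_head "$upstream" "$branch")
  down=$(remote_head "$mirror" "$branch")
  if [ -n "$up" ] && [ "$up" = "$down" ]; then
    echo "in sync: $branch at $up"
    return 0
  fi
  echo "out of sync: $branch upstream=${up:-missing} mirror=${down:-missing}" >&2
  return 1
}

# Example (URLs assumed from the repos listed in the Overview):
#   mirror_in_sync https://github.com/apache/hudi.git \
#       https://github.com/apachehudi-ci/hudi-mirror.git master
```

If the branch reports out of sync for longer than the 10-minute cron interval, the fetch or push step of the sync script is the likely culprit.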
Azure Pipelines
There are two pipelines defined in this Azure DevOps project:
- hudi-mirror: for master/release version builds
- hudi-branch-ci: for PR builds
For each hudi-branch-ci build, hudi-bot posts a comment on the corresponding PR and keeps it updated, like this.
PR reviewers should use this CI report's result as one of the merging criteria.
Note: the following PRs are skipped by the Azure CI build:
- Website PRs targeting the asf-site branch
- PRs labeled rfc or with [RFC- in the title
- PRs labeled pr:wip or with [WIP] in the title
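The skip rules above can be restated as a small predicate, which is handy for checking by hand why a PR did not get a build. This is a hypothetical sketch: the real checks are implemented in ci-bot and may differ in details such as case sensitivity. should_skip_ci and its argument order are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical restatement of the skip rules; the actual checks live in
# ci-bot and may differ (e.g., this version matches titles case-sensitively).
# Usage: should_skip_ci <base-branch> <title> [label...]
# Prints the reason and returns 0 if the PR is skipped, returns 1 otherwise.
should_skip_ci() {
  local base title label
  [ "$#" -ge 2 ] || return 2
  base=$1
  title=$2
  shift 2
  if [ "$base" = "asf-site" ]; then
    echo "website PR targeting asf-site"
    return 0
  fi
  # Quoted parts of a case pattern are literal, so "[" is not a bracket glob.
  case "$title" in
    *"[RFC-"*) echo "RFC PR (title)"; return 0 ;;
    *"[WIP]"*) echo "work-in-progress PR (title)"; return 0 ;;
  esac
  for label in "$@"; do
    case "$label" in
      rfc) echo "RFC PR (label)"; return 0 ;;
      pr:wip) echo "work-in-progress PR (label)"; return 0 ;;
    esac
  done
  return 1
}
```

For example, should_skip_ci master "[WIP] Refactor writer" prints "work-in-progress PR (title)" and returns 0.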
Get Help
- Azure DevOps docs
- Raise questions in https://developercommunity.visualstudio.com/ for community support
Nightly Build
TBD
Performance Benchmarking
TBD