This document describes the security design of the MXNet CI.

Components

The following components are part of MXNet’s CI setup.

EC2

 

Ubuntu 16.04

Windows Server 2016

Jenkins

Jenkins acts as the CI software package to validate the integrity of MXNet’s code and pull requests. Jenkins consists of two parts: the Master acts as scheduler and offers a web interface, while the actual code compilation and execution happen on slaves. In both cases, the process runs under a restricted user created specifically for Jenkins (TODO: Windows?!).

Master

OUTDATED: replaced with GitHub SSO. TODO: Update

...

In this setup, option 2 has been chosen. This allows separation of concerns, requires no credential management and keeps application-level communication between instances to a minimum.

 

Slave

Slaves connect to the master using JNLP and wait for Jenkins jobs to execute. Because all branches as well as Pull Requests are compiled and tested, all slaves have to execute completely arbitrary code. This poses a security risk, as any type of malware can be installed on a slave by submitting a malicious Pull Request. By design, this cannot be avoided without placing a heavy burden on Committers by requiring that all builds be triggered manually.

To reduce the possible impact, slaves store no credentials, generate no end-user artifacts (no Continuous Deployment!) and are completely disposable. Pull Requests always have to be reviewed by an MXNet Committer before they are merged into the Master branch, so persistent malware residing in MXNet is very unlikely. Additionally, temporary malware introduced by a Pull Request may be disposed of automatically after a few days due to the auto scaling features. A freshly started slave always uses a manually created AMI and read-only resources during the initial start-up; thus, malware cannot persist between instance generations.

Due to the nature of an Open Source project, PRs are usually reviewed by various people at arbitrary points in time; it is thus very unlikely that a malicious PR stays undetected for a long time.

Docker

Docker and Nvidia-Docker are used to provide a deterministic environment on slaves and are not part of the security measures. This is because all Dockerfiles as well as the Jenkinsfile reside in the GitHub repository and can thus be modified by anybody.

GitHub

The monitored repository is located at https://github.com/apache/incubator-mxnet. All branches and Pull Requests are retrieved on a regular basis, triggered by a Web Hook. This requires running the Web Hook service on Jenkins’ web interface ports 80/443. To ensure no anonymous requests are accepted, a shared secret is used. This secret is stored in KMS and retrieved during start-up of the master instance. Secret rotation is not preferred due to heavy resource constraints on the Apache Infra team: they manage the GitHub repository, and every rotation would require a ticket.
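
How a shared secret rejects anonymous Web Hook requests can be illustrated with a minimal sketch. It assumes the GitHub convention of sending an HMAC of the request body in a signature header (shown here in the `sha256=<hex>` form); in this setup the check is presumably performed inside Jenkins, so the function names below are hypothetical and only demonstrate the mechanism.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the signature header value for a request body."""
    digest = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_signature(secret: bytes, payload: bytes, header_value: str) -> bool:
    """Return True only if the received header matches the locally computed one."""
    expected = sign_payload(secret, payload).encode("utf-8")
    received = header_value.encode("utf-8")
    # compare_digest runs in constant time, so it does not leak how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected, received)
```

A request whose body or secret differs in any way yields a different digest and is rejected, which is why the secret never has to travel with the request itself.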

To commit the build status back to GitHub, credentials of an authorized GitHub account have to be used. These are stored in KMS and retrieved during start-up of the Jenkins Master.
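
Retrieving such a credential at start-up might look roughly like the sketch below. It assumes the secret is kept as a base64-encoded, KMS-encrypted blob and decrypted through the boto3 KMS client's `decrypt` call; the function name, the blob format and the stub class are assumptions for illustration and not taken from the actual start-up script.

```python
import base64


def load_secret(kms_client, ciphertext_b64: str) -> str:
    """Decrypt a base64-encoded KMS ciphertext and return the secret as text."""
    ciphertext = base64.b64decode(ciphertext_b64)
    # boto3's KMS client returns the decrypted bytes under the "Plaintext" key.
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")


class StubKmsClient:
    """Hypothetical stand-in for boto3.client("kms"), usable without AWS access."""

    def decrypt(self, CiphertextBlob):
        # A real KMS call would decrypt; the stub merely reverses the bytes.
        return {"Plaintext": CiphertextBlob[::-1]}
```

On a real instance the first argument would be `boto3.client("kms")`, with the instance's IAM role granting decrypt permission so that no long-lived key has to be stored on disk.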

...

Serverless

Gordon Serverless is an infrastructure-as-code tool to create, wire and deploy AWS Lambdas using CloudFormation. In this setup, it defines all IAM roles, VPCs, security groups and S3 buckets used by Lambda.

Terraform

Terraform is an infrastructure-as-code tool to define the environment for EC2 instances. In this setup, it defines Route 53, S3, IAM users, IAM roles, IAM policies, VPCs, security groups, EFS volumes and EC2 instances.

...

  1. Acquire instance
  2. Upload jenkins-config to S3
  3. Store start-up script and data in CloudInit-files
  4. Set Route53 records

CloudWatch

This service is used to store the metrics necessary for auto scaling. All stored information is also available through the publicly accessible Jenkins REST API (TODO: Link) and is thus not classified. To detect harmful behaviour caused by malfunctioning auto scaling or malicious actions, various alarms are used.

Lambda

To retrieve data for CloudWatch, Lambda executes Python 3.4 scripts that read from the public Jenkins REST API using JenkinsAPI. The deployment of all scripts is triggered manually using Gordon. After the information has been aggregated, it is pushed as metrics to CloudWatch using aws-cli. No other services or servers are accessed.
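
The aggregation step can be sketched as a pure function. The example assumes a payload shaped like Jenkins' `/computer/api/json` response (a `computer` list whose entries carry `offline` and `idle` flags) and produces datapoints in the structure accepted by CloudWatch's put-metric-data call. The metric names are hypothetical, and the real scripts use JenkinsAPI and aws-cli rather than plain dictionaries.

```python
def build_metric_data(computer_api: dict) -> list:
    """Aggregate node states from a Jenkins payload into CloudWatch-style datapoints."""
    nodes = computer_api.get("computer", [])
    online = [node for node in nodes if not node.get("offline", False)]
    idle = [node for node in online if node.get("idle", False)]

    # Hypothetical metric names; the actual names are not given in this document.
    counts = {
        "NodesTotal": len(nodes),
        "NodesOnline": len(online),
        "NodesIdle": len(idle),
        "NodesBusy": len(online) - len(idle),
    }
    return [
        {"MetricName": name, "Value": value, "Unit": "Count"}
        for name, value in counts.items()
    ]
```

Keeping the aggregation free of network calls means the only outbound traffic is the read from Jenkins and the write to CloudWatch, in line with the statement that no other services are accessed.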

CloudFormation

CloudFormation is used to define the environment of Lambda scripts, controlled by Gordon.

S3

S3 is used to store configuration files only. The following S3-buckets are being used:

  • mxnet-ci-master: Jenkins-Master configuration files
  • mxnet-ci-logging: Logging directory
  • gordon-lambda-mxnet-ci: CloudFormation templates, Lambda-libraries and Lambda scripts

Firewall

Security groups

VPC

AWS permissions

Security measures