...
As soon as you enter one of these directory names, a Terraform template launches an instance in your AWS account. The necessary setup logic is executed, and the instance is then stopped automatically so that you can continue with the launch template creation process. Warning: do not stop the instance manually! Please note that you need a named AWS CLI profile called 'mxnet-ci-dev'; otherwise this operation will fail.
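Because the Terraform run depends on the 'mxnet-ci-dev' profile, it can help to check the profile before starting. The following is a minimal sketch, not part of the original tooling: the function name `check_aws_profile` is hypothetical, and it assumes the AWS CLI is installed and that the profile's credentials allow `sts get-caller-identity`.

```shell
# Hypothetical helper: verify that a named AWS CLI profile resolves to usable credentials.
# Defaults to the 'mxnet-ci-dev' profile required by this guide.
check_aws_profile() {
    local profile="${1:-mxnet-ci-dev}"
    # 'aws sts get-caller-identity' fails if the profile is missing or its credentials are invalid.
    if aws sts get-caller-identity --profile "$profile" >/dev/null 2>&1; then
        echo "Profile '$profile' is usable."
    else
        echo "Profile '$profile' is missing or has invalid credentials." >&2
        return 1
    fi
}
```

Calling `check_aws_profile` without arguments checks 'mxnet-ci-dev'; a non-zero return code means the profile should be fixed before entering a directory name.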
...
On Ubuntu, no additional steps are necessary after executing the create-slave shell script. Just create an AMI in the EC2 console after the instance has reached the Stopped state. Warning: do not stop the instance manually, as this leaves it in an inconsistent state that will be baked into the launch template.
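This guide creates the AMI in the EC2 console. As an alternative, the same step could be scripted with the AWS CLI. The sketch below is an assumption rather than the project's documented procedure: the function name `create_ami_when_stopped`, the instance ID, and the AMI name are placeholders. It only waits for the instance to stop and never stops it itself.

```shell
# Hypothetical helper: wait for the instance to stop on its own, then create an AMI from it.
# Arguments: <instance-id> <ami-name> [profile]; all values are supplied by the caller.
create_ami_when_stopped() {
    local instance_id="$1"
    local ami_name="$2"
    local profile="${3:-mxnet-ci-dev}"

    # Polls until the instance reports the 'stopped' state; it does not stop the instance.
    aws ec2 wait instance-stopped --instance-ids "$instance_id" --profile "$profile" || return 1

    # Creates the image and prints only the ID of the new AMI.
    aws ec2 create-image --instance-id "$instance_id" --name "$ami_name" \
        --profile "$profile" --query 'ImageId' --output text
}
```

The waiter gives up after a bounded number of polls, so if the setup logic runs for a long time the call may have to be repeated.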
Windows
On Windows, there is currently no process to set up a slave from scratch, and the above shell script is not applicable.
...
| Launch template | AMI-ID | Instance type | Key-Pair-Name | Network type | Network interfaces | Volumes | Security groups | IAM instance profile | Monitoring |
|---|---|---|---|---|---|---|---|---|---|
| Ubuntu CPU | ID of the previously created AMI | C5.18xlarge | mxnet_edge_berlin_shared_rsa | VPC | - | EBS / 400GB / GP2 / Delete on termination: yes / Default IOPS | TODO | TODO | Enable |
| Ubuntu GPU | ID of the previously created AMI | G3.8xlarge | mxnet_edge_berlin_shared_rsa | VPC | - | EBS / 2000GB / GP2 / Delete on termination: yes / Default IOPS | TODO | TODO | Enable |
| Ubuntu GPU P3 | ID of the previously created AMI | P3.2xlarge | mxnet_edge_berlin_shared_rsa | VPC | - | EBS / 2000GB / GP2 / Delete on termination: yes / Default IOPS | TODO | TODO | Enable |
| Ubuntu GPU P3 8xlarge | ID of the previously created AMI | P3.8xlarge | mxnet_edge_berlin_shared_rsa | VPC | - | EBS / 2000GB / GP2 / Delete on termination: yes / Default IOPS | TODO | TODO | Enable |
| Windows CPU | ID of the previously created AMI | C5.18xlarge | mxnet_edge_berlin_shared_rsa | VPC | - | EBS / 500GB / GP2 / Delete on termination: yes / Default IOPS | TODO | TODO | Enable |
| Windows GPU | ID of the previously created AMI | G3.8xlarge | mxnet_edge_berlin_shared_rsa | VPC | - | EBS / 500GB / GP2 / Delete on termination: yes / Default IOPS | TODO | TODO | Enable |
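To illustrate how the Ubuntu CPU settings listed above might map onto the AWS CLI, here is a hedged sketch that prints launch template data as JSON. The function name `ubuntu_cpu_template_data` and the device name `/dev/sda1` are assumptions. Security groups and the IAM instance profile are omitted because they are still marked TODO. The instance and volume types are written in lowercase, as the EC2 API expects.

```shell
# Hypothetical helper: print launch template data for the "Ubuntu CPU" settings.
# Argument: <ami-id> of the previously created AMI.
# Assumption: the root volume is attached as /dev/sda1; adjust to match the AMI.
ubuntu_cpu_template_data() {
    local ami_id="$1"
    cat <<EOF
{
  "ImageId": "${ami_id}",
  "InstanceType": "c5.18xlarge",
  "KeyName": "mxnet_edge_berlin_shared_rsa",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 400,
        "VolumeType": "gp2",
        "DeleteOnTermination": true
      }
    }
  ],
  "Monitoring": { "Enabled": true }
}
EOF
}

# Example call (template name and AMI ID are placeholders):
# aws ec2 create-launch-template --launch-template-name my-template \
#     --launch-template-data "$(ubuntu_cpu_template_data ami-0123456789abcdef0)" \
#     --profile mxnet-ci-dev
```

Generating the JSON in a function keeps the per-template differences (instance type and volume size) in one place; the other templates would differ only in those values.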
...
In order to manage a distributed Docker cache, we're leveraging Docker Hub.
Cache creation
To generate the cache, we use a Jenkins job that rebuilds the cache upon new commits to the master branch. To define which bucket is used for cache publishing and retrieval, set the following environment variables at Jenkins -> Manage Jenkins -> Configure System -> Global properties -> Environment variables. Create the variables as follows and insert the values from the secret created above:
Auto scaling
Auto scaling is handled by a Lambda function, which is managed using the Serverless Framework.
...
npm install serverless
export PATH="./node_modules/.bin/:$PATH"
...
...
After creating the role, assign it to the user created above by going to Jenkins -> Manage and Assign Roles -> Assign Roles. Enter the GitHub handle at 'User/group to add' and press 'Add'. Attention: this name is case-sensitive! Afterwards, assign it the autoscaling role.