...

As soon as you enter one of these directory names, a Terraform template launches an instance in your AWS account, executes the necessary setup logic, and then stops the instance so that you can continue with the launch template creation process. Warning: do not stop the instance manually! Note that you need a named AWS CLI profile called 'mxnet-ci-dev'; without it, this operation fails.
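Because the Terraform run depends on the 'mxnet-ci-dev' profile, it can help to check for the profile before entering a directory name. The following is a minimal sketch and not part of the original tooling: the function name `aws_profile_exists` is hypothetical, and it assumes the standard AWS CLI file locations (`~/.aws/config` and `~/.aws/credentials`, overridable through `AWS_CONFIG_FILE` and `AWS_SHARED_CREDENTIALS_FILE`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a named AWS CLI profile is defined.
# The config file declares profiles as "[profile NAME]", while the
# credentials file declares them as "[NAME]".
aws_profile_exists() {
  local profile="$1"
  local config="${AWS_CONFIG_FILE:-$HOME/.aws/config}"
  local creds="${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}"
  grep -qsF "[profile ${profile}]" "$config" ||
    grep -qsF "[${profile}]" "$creds"
}

# Usage: warn early instead of letting the Terraform run fail.
# aws_profile_exists mxnet-ci-dev || echo "Profile 'mxnet-ci-dev' is missing" >&2
```

The check only looks for the section header; it does not verify that the credentials behind the profile are valid.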

...

On Ubuntu, no additional steps are necessary after executing the create-slave shell script. Just create an AMI in the EC2 console after the instance has reached the Stopped state. Warning: do not stop the instance manually, as this leaves it in an inconsistent state that will be baked into the launch template.
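The text above describes creating the AMI in the EC2 console. As an alternative, the same step could be scripted with the AWS CLI; the sketch below is an assumption rather than the documented process. The function name `create_ami_when_stopped` is hypothetical, and the instance ID and AMI name in the usage line are placeholders. It only waits for the instance to stop on its own and never stops it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the instance to stop on its own, then
# create an AMI from it and print the new image ID.
# Arguments: <instance-id> <ami-name> [aws-profile]
create_ami_when_stopped() {
  local instance_id="$1" ami_name="$2"
  local profile="${3:-mxnet-ci-dev}"

  # Blocks until the instance reports the "stopped" state; it does not
  # stop the instance itself.
  aws ec2 wait instance-stopped \
    --instance-ids "$instance_id" --profile "$profile" || return 1

  # Create the image and print only its ID.
  aws ec2 create-image \
    --instance-id "$instance_id" --name "$ami_name" \
    --query 'ImageId' --output text --profile "$profile"
}

# Usage (instance ID and AMI name are placeholders):
# create_ami_when_stopped i-0123456789abcdef0 my-ubuntu-slave-ami
```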

Windows

On Windows, there is currently no process to set up a slave from scratch, and the above shell script is not applicable.

...

Configurations

Ubuntu CPU

AMI-ID: ID of the previously created AMI

Instance type: C5.18xlarge

Key-Pair-Name: mxnet_edge_berlin_shared_rsa

Network type: VPC

Network interfaces: -

Volumes: EBS / 400GB / GP2 / Delete on termination: yes / Default IOPS

Security groups: TODO

IAM instance profile: TODO

Monitoring: Enable

Ubuntu GPU

AMI-ID: ID of the previously created AMI

Instance type: G3.8xlarge

Key-Pair-Name: mxnet_edge_berlin_shared_rsa

Network type: VPC

Network interfaces: -

Volumes: EBS / 2000GB / GP2 / Delete on termination: yes / Default IOPS

Security groups: TODO

IAM instance profile: TODO

Monitoring: Enable


Ubuntu GPU P3

AMI-ID: ID of the previously created AMI

Instance type: P3.2xlarge

Key-Pair-Name: mxnet_edge_berlin_shared_rsa

Network type: VPC

Network interfaces: -

Volumes: EBS / 2000GB / GP2 / Delete on termination: yes / Default IOPS

Security groups: TODO

IAM instance profile: TODO

Monitoring: Enable


Ubuntu GPU P3 8xlarge

AMI-ID: ID of the previously created AMI

Instance type: P3.8xlarge

Key-Pair-Name: mxnet_edge_berlin_shared_rsa

Network type: VPC

Network interfaces: -

Volumes: EBS / 2000GB / GP2 / Delete on termination: yes / Default IOPS

Security groups: TODO

IAM instance profile: TODO

Monitoring: Enable


Windows CPU

AMI-ID: ID of the previously created AMI

Instance type: C5.18xlarge

Key-Pair-Name: mxnet_edge_berlin_shared_rsa

Network type: VPC

Network interfaces: -

Volumes: EBS / 500GB / GP2 / Delete on termination: yes / Default IOPS

Security groups: TODO

IAM instance profile: TODO

Monitoring: Enable


Windows GPU

AMI-ID: ID of the previously created AMI

Instance type: G3.8xlarge

Key-Pair-Name: mxnet_edge_berlin_shared_rsa

Network type: VPC

Network interfaces: -

Volumes: EBS / 500GB / GP2 / Delete on termination: yes / Default IOPS

Security groups: TODO

IAM instance profile: TODO

Monitoring: Enable
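The settings listed above map closely onto the launch template data accepted by the AWS CLI. The sketch below shows this for the Ubuntu CPU configuration only and is an illustration, not the documented procedure. The function name `ubuntu_cpu_template_data` is hypothetical, the root device name `/dev/sda1` is an assumption about the AMI, and security groups and the IAM instance profile are omitted because they are still marked TODO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print launch template data for the "Ubuntu CPU"
# configuration as JSON. Security groups and the IAM instance profile are
# left out because they are still marked TODO in the configuration list.
# Arguments: <ami-id> [root-device-name]
ubuntu_cpu_template_data() {
  local ami_id="$1"
  local device_name="${2:-/dev/sda1}"  # assumption: root device of the AMI
  cat <<EOF
{
  "ImageId": "${ami_id}",
  "InstanceType": "c5.18xlarge",
  "KeyName": "mxnet_edge_berlin_shared_rsa",
  "BlockDeviceMappings": [
    {
      "DeviceName": "${device_name}",
      "Ebs": {
        "VolumeSize": 400,
        "VolumeType": "gp2",
        "DeleteOnTermination": true
      }
    }
  ],
  "Monitoring": { "Enabled": true }
}
EOF
}

# Usage (template name and AMI ID are placeholders):
# aws ec2 create-launch-template \
#   --launch-template-name my-ubuntu-cpu-template \
#   --launch-template-data "$(ubuntu_cpu_template_data ami-0123456789abcdef0)" \
#   --profile mxnet-ci-dev
```

The other configurations would differ only in instance type and volume size, so the same function could take those as additional arguments.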

...

To manage a distributed Docker cache, we use Docker Hub.

Cache creation

 


To generate the cache, a Jenkins job rebuilds it on every new commit to the master branch. To define which bucket is used for publishing and retrieving the cache, set the following environment variables under Jenkins -> Manage Jenkins -> Configure System -> Global properties -> Environment variables. Create the variables as follows and insert the values from the secret created above:

 


Auto scaling


Auto scaling is done by a Lambda function, which is managed with the Serverless Framework.

...

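# Install the Serverless Framework into ./node_modules of the current directory
npm install serverless
# Make the locally installed binaries available on the PATH
export PATH="$PWD/node_modules/.bin:$PATH"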
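After adjusting the PATH, it may be worth confirming that the `serverless` binary actually resolves before continuing. The following is a small sketch under that assumption; the function name `require_command` is hypothetical and not part of the original instructions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given command resolves on PATH,
# otherwise print a hint to stderr and fail.
require_command() {
  local name="$1"
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "'$name' not found on PATH; check your PATH export" >&2
    return 1
  fi
}

# Usage:
# require_command serverless && serverless --version
```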

...

Role permissions
  • Overall:
    • Read
  • Agent:
    • Configure
    • Connect
    • Create
    • Delete
    • Disconnect
    • Provision
  • Job:
    • Discover
    • Read

...


After creating the role, assign it to the user created above: go to Jenkins -> Manage and Assign Roles -> Assign Roles, enter the GitHub handle under 'User/group to add', and press 'Add'. Attention: this name is case-sensitive! Afterwards, assign the autoscaling role to the new entry.