...

This step requires the people working on CI to store the credentials in AWS secrets and to run the tests through a restricted slave.

Progress

Publish pipeline

We have successfully finished Steps 1 to 4. The current pipeline can be found here.

(Image: publish pipeline diagram)

All dependencies are built in the first stage and placed in the deps/ folder. In the second stage, the package is tested on six platforms. Finally, we inject the credentials and deploy the package to Maven.
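
For reference, a rough sketch of what the three stages do is shown below; the script names are placeholders rather than the actual CI job definitions.

# Rough sketch of the three pipeline stages (script names are placeholders).

# Stage 1: build all static dependencies into deps/
./build_dependencies.sh --output deps/

# Stage 2: test the built package on each of the six platforms
./run_platform_tests.sh --package mxnet

# Stage 3: inject the publish credentials and deploy to Maven
./inject_credentials.sh
mvn deploy -s settings.xml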

Static build instructions

We provide comprehensive build instructions for users to build on Ubuntu 14.04.

License

We provide a LICENSE file for all packages we publish.

Future work

Change publish OS (Severe)

Scenario

As we know, Ubuntu 14.04 will no longer be supported by Canonical as of April 2019. Although the package servers will not be shut down, no further patches or upgrades will be released. If we continue to rely on it to build MXNet, there will be potential security risks when we publish the package, and we would also need many more changes to keep using the public version of Ubuntu 14.04.

Ideally, we would move to the next LTS release, such as 16.04, to publish all of the packages. However, when testing a package built on 16.04, Sheng found that its GLIBC version was not compatible with CentOS 7, producing the following error:

/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by /tmp/mxnet6145590735071079280/libmxnet.so)

GLIBC is a library that ships with a fixed version on each system. It cannot easily be upgraded or downgraded, since all packages distributed with the system could become unstable. We may therefore lose CentOS 7 and Amazon Linux support entirely if we decide to go with a 16.04 build. We also cannot statically link GLIBC, since it is LGPL-licensed.
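
To confirm which GLIBC symbol versions a built libmxnet.so actually requires, the dynamic symbol table can be inspected on the build machine; the library path below is illustrative.

# List the GLIBC symbol versions required by the shared library
# (path is illustrative); the highest version shown must be available
# on every target system.
objdump -T lib/libmxnet.so | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -n 3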

The following is the list of GLIBC versions used by the different systems:

Ubuntu 14.04:
ubuntu@ip-172-31-19-57:~$ ldd --version
ldd (Ubuntu EGLIBC 2.19-0ubuntu6.14) 2.19

Ubuntu 16.04:
ubuntu@ip-172-31-37-210:~$ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu10) 2.23

CentOS 7:
[centos@ip-172-31-13-196 ~]$ ldd --version
ldd (GNU libc) 2.17

Amazon Linux 1:
$ ldd --version
ldd (GNU libc) 2.17

Amazon Linux 2:
$ ldd --version
ldd (GNU libc) 2.26

Solution

In order to solve this issue, I propose several solutions listed below:

Build with a different GLIBC
https://www.tldp.org/HOWTO/Glibc2-HOWTO-6.html It is still worthwhile to configure a dedicated GLIBC on the system that all builds will be based on. This could be the ideal solution, as we could keep using an up-to-date system while remaining compatible with all earlier GLIBC versions. (Details to be updated here.)
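
A minimal sketch of the mechanics described in the HOWTO is given below: install a second glibc into its own prefix and point the toolchain at it. The glibc version, paths, and test program are illustrative, older glibc releases may need patches to build with newer compilers, and the resulting binary would still expect the alternate dynamic-linker path, so more work would be needed before this produces a distributable package.

# Install a dedicated glibc into its own prefix (version and paths are illustrative).
wget https://ftp.gnu.org/gnu/libc/glibc-2.17.tar.gz
tar xf glibc-2.17.tar.gz
mkdir glibc-build && cd glibc-build
../glibc-2.17/configure --prefix=/opt/glibc-2.17
make -j"$(nproc)" && sudo make install

# Link a test build against the alternate glibc instead of the system one.
gcc test.c -o test \
    -Wl,--rpath=/opt/glibc-2.17/lib \
    -Wl,--dynamic-linker=/opt/glibc-2.17/lib/ld-linux-x86-64.so.2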

Keep using 14.04
As mentioned above, we can keep using 14.04 even after its support life-cycle ends. Adding an archive repository to the system keeps packages available via apt-get install. A safer option is a Docker image that already contains the full system configuration, so apt-get install is no longer needed to build the whole package. However, 14.04 should not be used for the publish step itself, as there could be potential security problems; a system that has not yet reached End of Life should be used specifically for publishing. In our case, only the backend build has the GLIBC version requirement, so we can keep the PyPI and Maven publish steps off of 14.04.
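
A sketch of both ideas is shown below. After EOL, Ubuntu releases are served from old-releases.ubuntu.com, so apt can be re-pointed there; the Docker image name, Dockerfile, and build script are placeholders.

# Point apt at the archive repository once 14.04 reaches End of Life.
sudo sed -i \
    -e 's|archive.ubuntu.com|old-releases.ubuntu.com|g' \
    -e 's|security.ubuntu.com|old-releases.ubuntu.com|g' \
    /etc/apt/sources.list
sudo apt-get update

# Safer variant: bake the fully configured toolchain into a Docker image once,
# then reuse it for every build so no live apt-get install is needed.
docker build -t mxnet/build-trusty -f Dockerfile.trusty .
docker run --rm -v "$(pwd)":/work mxnet/build-trusty /work/build_static.sh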

Using CentOS 7
As we still need to maintain support for CentOS and Amazon Linux, the best solution may be to choose an OS that still has support. In that case, CentOS 7 could be the best target to migrate our build scripts to. However, all of the current GPU build scripts would become unusable, since NVIDIA does not provide the corresponding packages as rpm. We may therefore need to go with NVIDIA Docker for CentOS 7, which only provides a limited set of CUDA versions. Another concern is possible performance and stability differences in the backend, since we would downgrade GLIBC from 2.19 to 2.17.

List of CUDA versions that NVIDIA supports for CentOS 7:
CUDA 10, 9.2, 9.1, 9.0, 8.0, 7.5
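
If we went this way, the GPU builds would have to run inside NVIDIA's CentOS 7 CUDA images rather than from native package installs. A hedged sketch, where the image tag and build script are illustrative:

# Run the GPU build inside NVIDIA's CentOS 7 CUDA image
# (image tag and build script are illustrative).
docker pull nvidia/cuda:9.2-cudnn7-devel-centos7
nvidia-docker run --rm -v "$(pwd)":/work nvidia/cuda:9.2-cudnn7-devel-centos7 \
    bash -c "yum install -y gcc gcc-c++ make && /work/build_gpu.sh"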

Drop support for CentOS 7 and Amazon Linux and keep the 16.04 build
We would still provide build-from-source instructions for users on these two systems.

gcc/gfortran version upgrade (Important)

Scenario

Currently, we use GCC 4.8 to build all of our dependencies in order to stay compatible with the different CUDA versions. However, some newer components, such as Horovod, require gcc 5.0 or above to build alongside MXNet. We therefore need to make them compatible, and there may be unforeseen problems such as backward-compatibility or stability issues.

Solution

We simply upgrade our GCC version from 4.8 to 5.x to make them compatible.
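
A sketch of the upgrade on the Ubuntu 14.04 build machines, using the ubuntu-toolchain-r/test PPA; the exact minor version is illustrative, and each CUDA release's supported host-compiler range would need to be confirmed first.

# Install gcc/g++/gfortran 5.x from the toolchain PPA and make them the
# default compilers for the build (exact minor version is illustrative).
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install -y gcc-5 g++-5 gfortran-5
export CC=gcc-5 CXX=g++-5 FC=gfortran-5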

Static library version control (Improvement)

Scenario

As Frank Liu discovered here: MXNet build dependencies, we are facing issues with the static libraries. The versions chosen there are questionable and cannot be easily maintained. Apart from that, some dependencies such as libzmq carry a GPL license, which Apache Legal forbids us from using. We therefore need to find an alternative way to build these dependencies.

For example, we are currently using a beta version of libjpeg-turbo, and a non-stable OpenBLAS that should be downgraded to a stable version. We should choose stable releases for these libraries for the best performance, and we need to dig in and clarify the reasons behind our choices of the different package versions.

Solution

There is no ideal way to automate this process; it requires manual checks and benchmarks to choose the best-performing set. We also need to get rid of the usage of libzmq, or consult with the legal team to see if there are any alternatives. This requires changes on the PS-LITE side.
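
One low-tech way to make the version choices explicit and reviewable is to pin each dependency to a named release and checksum in the build scripts, so that every bump is a deliberate, documented change. The snippet below is a sketch with placeholder values, using OpenBLAS as an example.

# Pin a static dependency to an explicit release and checksum
# (version and checksum below are placeholders).
OPENBLAS_VERSION=0.3.3
OPENBLAS_SHA256=<expected-sha256-here>

wget "https://github.com/xianyi/OpenBLAS/archive/v${OPENBLAS_VERSION}.tar.gz" \
    -O "openblas-${OPENBLAS_VERSION}.tar.gz"
echo "${OPENBLAS_SHA256}  openblas-${OPENBLAS_VERSION}.tar.gz" | sha256sum -c -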

Number of packages supported (Good to have)

Scenario

We are currently a 'beast' on PyPI: together with TensorFlow, we take up over 40% of the total package size. This is due to the matrix of packages we support: we offer a number of CUDA versions, in combination with MKL variants as well as Python versions. It is a trade-off between wide version support and a maintenance nightmare. There is no clear solution yet on how we should handle this, whether to reduce the number of packages we publish or keep it as it is.

Solution

We bump the package matrix along with CUDA releases, dropping the oldest CUDA variants as new ones are added.

FAQ

  • How to automate the pom file injection?

The GPG publish requires a user-input section in the Maven configuration; we use a Python script to inject these credentials automatically.
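
The actual injection is done by a Python script; the shell sketch below only illustrates the idea of pulling the credentials from AWS and writing them into the Maven settings non-interactively (the secret name and server id are placeholders).

# Fetch the GPG passphrase from AWS and write the Maven server credentials,
# so the GPG publish step no longer prompts for user input.
# Secret name and server id are placeholders.
GPG_PASSPHRASE=$(aws secretsmanager get-secret-value \
    --secret-id mxnet/publish/gpg --query SecretString --output text)

cat > "$HOME/.m2/settings.xml" <<EOF
<settings>
  <servers>
    <server>
      <id>apache.releases.https</id>
      <username>\${env.PUBLISH_USER}</username>
      <password>\${env.PUBLISH_PASSWORD}</password>
    </server>
  </servers>
</settings>
EOF

mvn deploy -Dgpg.passphrase="${GPG_PASSPHRASE}"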

  • License requirements of the various dependencies.

We will have to verify that the licenses of the dependencies we plan to prepackage allow us to use their binaries. Currently, these licenses are added by this PR.