What went well:

  • The CI infrastructure has become fairly stable, and unlike in previous releases we were not blocked by infrastructure issues. There were some one-off issues, such as the dockcross issue, which was caused by a dependency; I think Marco got dockcross to fix it.
  • Moving all the submodules to the 3rd party directory really helped with the vote on the general list.
  • The community was very proactive in raising issues and helping with fixes.

What we must do better with the next release:

  • From a CI standpoint, we should survey all our dependencies and make sure we use only stable releases rather than bleeding-edge versions.
  • From a license standpoint, we should revisit the rat-excludes file and make sure it is not excluding anything important.
  • One common pain point is the Creative Commons license. We should add a test to our nightlies that greps the repo for "creative commons" and flags any matches. In most cases, the flagged files may need to be fixed.
  • IIRC, the Apache RAT check currently runs as part of the nightlies. IMO, it should run on every PR before it is merged, so that missing license headers and other license issues are fixed immediately.
  • There should be a clear set of guidelines defining the criteria for releasing MXNet. Ideally, the only criterion should be that CI passes, assuming our CI covers the environments and platforms our users run MXNet on. This is not yet the case: CUDA 7.5 is currently not tested, so we couldn't catch the CUDA 7.5 issue before the vote, and we also couldn't catch the build failures on Windows and Mac before the vote.
  • We should have a guideline on when to mark a feature as experimental. Does every new feature start out as experimental? When does a feature stop being experimental? What kind of testing is required before a new feature no longer needs to be marked experimental? Does the experimental feature replace a feature that was stable? (If so, either the new feature should not be experimental, or it should be added alongside the stable feature instead of replacing it.) IMO, we should discuss on the dev list every major feature that we plan to include in the release notes, and in certain cases hold a vote on whether the feature needs to be marked experimental. The release manager can initiate these discussions while preparing the release notes.
  • The increasing number of flaky tests has become a real issue for MXNet. When preparing a release, it is difficult to tell from past CI runs whether the code is stable or not. We need to make a dent in the flaky tests before the next release.
  • The community raised a lot of issues through manual testing (because CI did not cover those cases), which led to multiple release candidates. These checks should be moved into CI and run either on each PR or nightly.
  • Test in a timely manner and give early feedback to minimize the number of release candidates.
  • The community should give the release manager more support; the RM had to do a lot of the work himself.
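A minimal sketch of the nightly Creative Commons check proposed above, assuming it runs as a Python script against a checkout of the repo. The function name `find_creative_commons`, the regular expression, and the choice to skip only the `.git` directory are illustrative assumptions, not an existing MXNet tool. A plain recursive, case-insensitive grep would serve the same purpose.

```python
import os
import re
import sys

# Case-insensitive match that also tolerates line breaks or extra spaces
# between the two words (assumption: this is the phrase worth flagging).
CC_PATTERN = re.compile(r"creative\s+commons", re.IGNORECASE)

# Directories not worth scanning (assumption: only VCS metadata).
SKIP_DIRS = {".git"}


def find_creative_commons(root):
    """Return sorted relative paths of files under `root` that mention Creative Commons."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file (e.g. a broken symlink): skip it
            if CC_PATTERN.search(text):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)


def main(argv):
    root = argv[1] if len(argv) > 1 else "."
    hits = find_creative_commons(root)
    for path in hits:
        print(f"Creative Commons reference found: {path}")
    # A nonzero exit status marks the nightly job as failed, so findings get noticed.
    return 1 if hits else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved under a hypothetical name such as check_cc.py, it would be run as `python check_cc.py <repo-root>`. It prints each matching file and exits with status 1, so the nightly fails until someone reviews the flagged files.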

What we should do better with the next release(s):

  • Think about how tightly coupled dependencies can be handled better. The current problem is that the MXNet community is under Apache while these dependencies are under dmlc, and the two organizations move at different paces. How can the MXNet community gain better control over tightly coupled dependencies such as dmlc-core, nnvm, and mshadow?
  • Create tarballs for community testing on a regular (monthly) basis.