  • Flink 1.17.0 is officially released today!

Retrospective:

From Qingsheng Ren 

  • As discussed on the mailing list, we need to trigger a final patch release for 1.15 after releasing 1.17. Some cleanup steps need to be reviewed and changed, such as removing 1.15 data from svn, CI, flink-docker, etc. See FLINK-31570.
  • I like Matthias Pohl's idea of tracking release TODOs in JIRA 👍 I used it as a checklist to make sure we didn't miss anything. It also helps with collaboration, as we can easily divide work across RMs by assigning JIRA tickets.

From Matthias Pohl 

  • Google Meet might not be the best choice for the release sync. We need to be able to invite attendees even if the creator of the meeting isn't available (maybe try Zoom, or Jitsi as an open-source alternative, instead?).

  • Release sync every 2 weeks and a switch to weekly after feature freeze felt reasonable
  • Slack worked well as a collaboration tool to document the monitoring tasks (#builds, #flink-dev-benchmarks) in a team with multiple release managers

  • The Slack Azure Pipeline bot seems to be buggy: it swallows some build failures. It's not a severe issue, though. We created #builds-debug to monitor whether it's happening consistently. The issue is covered in FLINK-30733.

  • We experienced occasional issues in the manual steps of the release creation in the past (e.g. japicmp config was not properly pushed). Creating Jira issues for the release helped to make the release creation more transparent and made the steps more reviewable. Additionally, it helped to distribute subtasks to different people with Jira being the tool for documentation and synchronization. That's especially helpful when there is more than one person in charge of creating the release.

    • FLINK-31146
    • RCs
      • FLINK-31154
      • FLINK-31578
      • FLINK-31583
    • FLINK-31562
    • FLINK-31567
  • During the 1.17 release, committers occasionally backported or merged changes without a PR, which broke the master/release branches (probably because local changes that were not part of the PR were added before merging to speed up the backport). It might make sense to remind everyone that this should be avoided. Not sure whether we want to, or can, restrict that.

  • We observed a good response to fixing test instabilities towards the end of the release cycle, but had some long-running issues earlier in the cycle that caused extra effort for the release managers due to recurring test failures.

  • Release testing picked up “slowly”: Initially, we planned 2 weeks for release testing, but there was hardly any progress (tickets being created and worked on) in the first week. In the end, we had to extend the phase by another week, resulting in 3 instead of 2 weeks of release testing. I guess we could encourage the community to create release testing tasks earlier and label them properly so that we can monitor the effort. That would even enable us to do release testing for a feature as soon as it is done, and not necessarily only at the end of the release cycle.

  • Manual test data generation is tedious (FLINK-31593). But this should be fixed in 1.18 with FLINK-27518 being almost done.
  • We started creating documentation for release management. The goal is to collect the tasks involved in supporting a Flink release, to encourage newcomers to pick them up.

From Leonard Xu 

  • We can keep RC0 (a non-votable one) in future releases as an initial version for developers to validate, so that some issues can be found earlier and we avoid repeatedly canceling and re-creating RCs.

From Martijn Visser 

  • We should be more careful with commits without a PR / green CI, which caused some problems at the end of the 1.17 release cycle. It might not be possible to ban this entirely, but we could give a reminder to committers.