
Change in treatment of Success - Stable vs. Unstable

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Component: core

      We've recently noticed on our Jenkins instance (at build.kde.org) that builds which are unstable are no longer considered "Successful" by Jenkins.

      This means that all of our views are now broken, because we've used "Successful" to mean that the build completed successfully (even if tests failed). Our expectations appear to align with the Jenkins terminology guide (https://wiki.jenkins.io/display/JENKINS/Terminology).

      This behaviour appeared sometime after Jenkins 2.184, and can be viewed at https://build.kde.org/job/Applications/view/Everything%20-%20stable-kf5-qt5/job/kopete/job/stable-kf5-qt5%20SUSEQt5.12/

      (Note that only Build #1 is considered Successful, even though all builds of that job had the result of being Unstable. The correct behaviour in this instance should be for the latest Successful activity for that job to be Build #4 - as it did complete successfully, even if it is unstable)
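
      For illustration, a hypothetical Jenkins Script Console sketch (the job name below is assumed) that checks which build each permalink currently resolves to; the expectation described above is that an UNSTABLE build still counts as the last successful build, while the last stable build only ever points to a fully passing one:

          import jenkins.model.Jenkins

          // Job name is a placeholder - substitute any affected job.
          def job = Jenkins.instance.getItemByFullName('Applications/kopete/stable-kf5-qt5 SUSEQt5.12')
          println "lastBuild:           ${job?.lastBuild}"
          println "lastSuccessfulBuild: ${job?.lastSuccessfulBuild}"   // expected: newest build that is UNSTABLE or better
          println "lastStableBuild:     ${job?.lastStableBuild}"       // expected: newest build that is SUCCESS only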

          [JENKINS-58692] Change in treatment of Success - Stable vs. Unstable

          Ben Cooksley added a comment -

          Over the past week we've started receiving additional complaints that a number of projects were not getting builds triggered. Examination of the Polling logs would show something like the following:

          Started on Aug 7, 2019 8:33:46 AM
          Using strategy: Default
          [poll] Last Built Revision: Revision 597ffa6a5e89b7e05180ccb3517973b3867d72fa (refs/remotes/origin/Applications/19.08)
          No credentials specified
           > git --version # timeout=10
           > git ls-remote -h git://anongit.kde.org/konsole # timeout=10
          Found 54 remote heads on git://anongit.kde.org/konsole
          [poll] Latest remote head revision on refs/heads/Applications/19.04 is: 550cd447bc4bb79cc8920a147e84f7afb35406d6 - already built by 2
          no polling baseline in /home/jenkins/workspace/Applications/konsole/stable-kf5-qt5 SUSEQt5.12 on Docker Swarm-353db413f644
          no polling baseline in /home/jenkins/workspace/Applications/konsole/stable-kf5-qt5 SUSEQt5.12 on Docker Swarm-353db413f644
          no polling baseline in /home/jenkins/workspace/Applications/konsole/stable-kf5-qt5 SUSEQt5.12 on Docker Swarm-353db413f644
          no polling baseline in /home/jenkins/workspace/Applications/konsole/stable-kf5-qt5 SUSEQt5.12 on Docker Swarm-353db413f644
          no polling baseline in /home/jenkins/workspace/Applications/konsole/stable-kf5-qt5 SUSEQt5.12 on Docker Swarm-353db413f644
          Done. Took 0.1 sec
          No changes
          

          Examining the Jenkins Core changelog indicated that maintenance of symlinks for jobs/projects had been removed from Jenkins Core and transferred to a plugin. Following installation of that plugin and running jobs again, we've found that correct functionality has been restored (both in terms of the branches being polled by Jenkins and the views being updated).

          As such, this now appears to be a regression, and given that it prevents Git polling from working properly in certain cases (when the checkout is managed as part of a Declarative Pipeline), it breaks core functionality of Jenkins.
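
          For illustration, a hypothetical Script Console sketch (job name assumed) of what can be inspected on disk after installing the build-symlink plugin: it simply lists the job's builds directory and shows which entries are symlinks and where they point.

          import java.nio.file.Files
          import jenkins.model.Jenkins

          // Job name is a placeholder for any job showing the polling problem.
          def job = Jenkins.instance.getItemByFullName('Applications/konsole/stable-kf5-qt5 SUSEQt5.12')
          job.buildDir.listFiles()?.sort { it.name }?.each { f ->
              def p = f.toPath()
              def target = Files.isSymbolicLink(p) ? " -> ${Files.readSymbolicLink(p)}" : ''
              println "${f.name}${target}"
          }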

           


          Mark Waite added a comment -

          bcooksley, I'm not understanding how the change from using symlinks has affected the git polling. Could you provide more details so that I can better understand? We may need help from jglick on differences related to the removal of symlinks.


          Ben Cooksley added a comment -

          markewaite, I'm not sure how it managed to make an impact either; however, the behaviour we were seeing before the plugin restored the maintenance of the symlinks was the behaviour noted above - namely, that changes to the branch name weren't being picked up.

          Interestingly, it picked up that the last successful build was 'Applications/19.08', yet for reasons unknown continued to poll an older branch - 'Applications/19.04'.

          You can find copies of the Pipeline templates, along with the Job DSL scripts we use to provision all the jobs on our Jenkins instance, at https://invent.kde.org/sysadmin/ci-tooling/tree/master/ (running helpers/gather-jobs.py is required before evaluating the dsl/*.groovy scripts).

          To provide a bit of background, we reuse the same jobs when the stable branches for our software change, and just update the jobs to refer to the new branches as needed. This functionality has worked reliably to date, until the release of Jenkins 2.185/2.186 (we jumped straight from 2.184 to 2.186 due to the Trilead SSH issues in 2.185).
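
          A heavily simplified, hypothetical Job DSL sketch of that reuse pattern; the job name, template path and %{branchToBuild} placeholder are illustrative only, not the actual KDE tooling:

          // Reseeding the same job with a new branch baked into its Pipeline script.
          def branchToBuild = 'Applications/19.08'   // previously 'Applications/19.04'

          pipelineJob('Applications/konsole/stable-kf5-qt5 SUSEQt5.12') {
              definition {
                  cps {
                      script(readFileFromWorkspace('pipeline-templates/SUSEQt5.12.template')
                                 .replace('%{branchToBuild}', branchToBuild))
                      sandbox()
                  }
              }
          }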

          The solution for us was to install the Symlink plugin, after which normal functionality was restored on 2.186+.


          Jesse Glick added a comment (edited) -

          So this is hypothesized to be a regression from JENKINS-37862? I cannot think offhand of any reason why that would be so; workflow-job (the source of the no polling baseline in … message noted above) does not rely on the existence of symlinks to resolve logical permalinks. The change in question did change how permalinks are cached, so as not to read symlinks for this purpose (now a plain text file is used instead), but the build-symlink plugin does not override this new mechanism, so it should not be able to fix any regression from that aspect. The existence of the RunListener in that plugin could perhaps be forcing a cache update that would not otherwise occur, but I do not see how that could be so either, since PeepholePermalink already updates the cache for every standard permalink at the end of every build.
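
          For illustration, a hypothetical Script Console sketch to inspect that plain-text cache; the job name, and the assumption that the cache is a file named permalinks under the job's builds directory, are illustrative only:

          import jenkins.model.Jenkins

          def job = Jenkins.instance.getItemByFullName('Applications/konsole/stable-kf5-qt5 SUSEQt5.12')
          def cache = new File(job.buildDir, 'permalinks')   // assumed cache location
          println cache.isFile() ? cache.text : 'no permalink cache written yet'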

          Is there any known way to reproduce this bug, from scratch, using minimal instructions?


          Ben Cooksley added a comment -

          I'm afraid I've not attempted to reproduce this bug, and experimenting with returning our production systems to a potentially broken state isn't really an option.

          The only thing I could recommend in this case would be using https://invent.kde.org/sysadmin/ci-tooling/blob/master/pipeline-templates/SUSEQt5.12.template as a starting point.

          The only Stage that matters in that job from the perspective of this bug is the Checkout Sources stage, so you can probably delete the rest without too much impact (although it may be worth forcing the build to always be UNSTABLE).

          It is worth noting that we were also experiencing issues with job runs not being considered Successful by Jenkins unless they were also Stable, which impacted views as noted above. As only some jobs were experiencing the issue of not having the correct branches polled, it is possible that these two issues are somehow related - especially given they both disappeared once the plugin was installed.

          Is it possible that the plugin is causing side effects within Jenkins - so it isn't the symlinks themselves that matter, but rather something else the plugin does when performing the symlink update - which resolves our problem here?

          The additional Groovy declarations you'll need to include to use the above template are as follows:

          def repositoryUrl = "git://anongit.kde.org/konsole"
          def browserUrl = "https://cgit.kde.org/konsole.git"
          def branchToBuild = "master"
          def productName = "Applications"
          def projectName = "konsole"
          def branchGroup = "kf5-qt5"
          def currentPlatform = "SUSEQt5.12"
          def ciEnvironment = "production"
          def buildFailureEmails = "konsole-devel@kde.org"
          def unstableBuildEmails = ""
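
          For orientation only, a heavily trimmed, hypothetical sketch (not the actual SUSEQt5.12.template) of how such declarations can drive a Checkout Sources stage, with the build forced UNSTABLE to exercise the Successful-vs-Stable distinction; the polling schedule is an assumption:

          def repositoryUrl = 'git://anongit.kde.org/konsole'
          def branchToBuild = 'Applications/19.08'

          pipeline {
              agent any
              triggers { pollSCM('H/5 * * * *') }   // polling interval assumed
              stages {
                  stage('Checkout Sources') {
                      steps {
                          checkout([$class: 'GitSCM',
                                    branches: [[name: branchToBuild]],
                                    userRemoteConfigs: [[url: repositoryUrl]]])
                      }
                  }
                  stage('Force Unstable') {
                      steps {
                          script { currentBuild.result = 'UNSTABLE' }
                      }
                  }
              }
          }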


          Jesse Glick added a comment -

          Is it possible that the plugin is causing side effects within Jenkins - so it isn't the symlinks themselves that matter, but rather something else the plugin does when performing the symlink update - which resolves our problem here?

          That would be my guess, which is why I suspect that the exact sequence of operations matters.


          Ben Cooksley added a comment -

          In this case our sequence could broadly be summarised as:

          1) Jobs created, using the Pipeline templates as noted above, with an initial branch of Applications/19.04
          2) Jobs are then run, at which point Jenkins becomes aware that the jobs are tracking Applications/19.04
          3) Jobs are subsequently updated by re-running the DSL Job, which updates the Pipeline templates to refer to Applications/19.08
          4) Jobs are run again manually, which should have updated Jenkins to make it aware that Applications/19.08 should now be tracked
          5) Subsequent polling results in Applications/19.04 still being polled...
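
          To make step 5 concrete, a hypothetical Script Console sketch (job name assumed) showing which branch specifications the Pipeline job would currently poll; for a Pipeline job these are derived from the SCMs used by recent completed builds:

          import jenkins.model.Jenkins
          import hudson.plugins.git.GitSCM

          def job = Jenkins.instance.getItemByFullName('Applications/konsole/stable-kf5-qt5 SUSEQt5.12')
          job.SCMs.findAll { it instanceof GitSCM }.each { scm ->
              println "${scm.userRemoteConfigs*.url} -> polling branches ${scm.branches*.name}"
          }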

          Our Pipeline templates (aside from the additional declarations I've posted above) have remained broadly the same and haven't changed much in quite some time.

           


          Jesse Glick added a comment -

          I was unable to reproduce such a problem in a new installation of 2.186 using a very simple setup. I made a local Git repo with one file and two branches a and b. I made a Pipeline like

          node {
              git url: '/…/JENKINS-58692-repo', branch: 'a'
              sh 'cat file'
          }
          

          with SCM polling set to happen every minute. I did an initial build, then made a change to the a branch and waited. Build 2 ran as expected. Then I changed the branch in the script to b, ran another manual build, edited the b branch and waited. Build 4 ran as expected. The SCM polling log showed the expected things. The permalinks file had the expected contents at the end:

          lastFailedBuild -1
          lastStableBuild 4
          lastSuccessfulBuild 4
          lastUnstableBuild -1
          lastUnsuccessfulBuild -1
          

          Adding

          currentBuild.result = 'UNSTABLE'
          

          to the end of the script and doing it all over did not break anything; now the permalinks are

          lastFailedBuild -1
          lastStableBuild 4
          lastSuccessfulBuild 8
          lastUnstableBuild 8
          lastUnsuccessfulBuild 8
          

          as expected. (Yes, lastUnsuccessfulBuild is confusing: JENKINS-21706.)

          I was about to ask whether you would be willing to install an experimental build which merely adds more detailed messages to the SCM polling log that might help narrow down the problem, but

          experimenting with returning our production systems to a potentially broken state isn't really an option

          Of course; but do you have some sort of staging server available where a mirror of at least a representative subset of jobs could be installed, without interfering with production workflows? If not, and markewaite has no further ideas for reproducing, then I am afraid we would need to close this in the absence of any similar reports.


          Mark Waite added a comment -

          I don't have any further ideas to offer.


          Ben Cooksley added a comment -

          Unfortunately we don't have a test environment for this (due to the size of our Jenkins instance and the resources involved in operating even a small number of the jobs on it) and it would need to run jobs in order to try to reproduce this.

          I'll report back if this issue reoccurs, but for now this can be closed. Thanks for investigating.

           


            Assignee: Unassigned
            Reporter: Ben Cooksley (bcooksley)
            Votes: 0
            Watchers: 4