JENKINS-55075

Pipeline: If job fails it will run again on next poll.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: p4-plugin
    • Environment: P4-Plugin 1.9.3, Jenkins 2.138.1

      When a build triggered by polling fails, it runs again on the next poll even when there are no new changes since the last poll. For example:

      (1) Build works.
      (2) No new changes = Next poll no execution.
      (3) New change made = Next poll execution.
      (4) Build fails = Next poll execution.
      (5) Build fails = Next poll execution.
      (6) Build works = Next poll no execution.

       

      Reproduction steps:

      (1) Create a pipeline job with the following Jenkinsfile and set it up to poll every minute (or use the PollNow plugin):

      pipeline {
        agent { label 'master' }

        stages {
          stage("Run script") {
            steps {
              script {
                // Ask whether the build should pass; unticking the box and clicking 'OK' makes it fail.
                def failJob = input(message: 'Do you wish this job to work?',
                                    ok: 'OK',
                                    parameters: [booleanParam(defaultValue: true,
                                        description: 'If you want this to fail, untick Yes? and click OK',
                                        name: 'Yes?')])
                echo "Input Result:" + failJob
                if (failJob == false) {
                  sh 'Arghhhhh'   // deliberately not a real command; forces a FAILURE
                }
                echo "P4_CLIENT is:"
                echo env.P4_CLIENT
              }
            }
          }
        }
      }
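
      Step (1)'s polling can also be declared in the Jenkinsfile itself rather than in the job configuration. A minimal sketch, assuming the standard declarative pollSCM trigger (the cron spec is illustrative); note the trigger only registers once the job has run, so start the first build manually:

      pipeline {
        agent { label 'master' }
        triggers {
          pollSCM('* * * * *')   // poll for new changes every minute
        }
        stages {
          stage('Poll demo') {
            steps {
              echo 'triggered by polling'
            }
          }
        }
      }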
      
      

      (2) Submit a changelist to the polled view.

      (3) Wait for the job to run, go into the console for the job execution, click 'Input requested', untick 'Yes?', then click 'OK'. The job will be marked as 'FAILURE'.

      (4) Wait for next poll or click 'Poll Now'. Job will trigger again.

      (5) On 'Input requested' just click 'OK'. The job will be marked as 'SUCCESS'.

      (6) Wait for next poll or click 'Poll Now'. Job will NOT trigger again.

       

       

          [JENKINS-55075] Pipeline: If job fails it will run again on next poll.

          Karl Wirth added a comment -

          FYI - msmeeth, p4paul


          Nancy Belser added a comment -

          We are experiencing this problem as well. We cannot execute builds so it is a critical problem.


          Karl Wirth added a comment -

          Hi nbelser - Can you confirm what you mean by 'cannot execute builds'? Also, what is the reason the builds are failing? Is it a transient infrastructure problem or bad code?


          Nancy Belser added a comment -

          Hi Karl,

          We had to shut down our Jenkins instance because we cannot get the pipeline project to stop constantly polling Perforce and attempting to build with no changes.


          Paul Allen added a comment -

          Hi Nancy,

          It seems that on Karl's system a bad syncID was causing the polling logic to think there were new changes.  Please can you recreate the Jenkins Job (e.g. create a new job and delete or disable the one continuously polling).

          If you have a support ticket open with Karl, please can you email a copy of your build.xml file for the last failing build and your config.xml.  We will review the data tomorrow and see if you were experiencing the same issue Karl had when upgrading from 1.9.3 -> 1.9.6

          Kind regards,

          Paul
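
          For anyone gathering the same data, a Script Console sketch for locating those files (the job name is a placeholder; paths follow the standard JENKINS_HOME layout):

          // Print where Jenkins stores the job's config.xml and the build.xml
          // of its last failing build. 'my-pipeline-job' is hypothetical.
          def job = Jenkins.instance.getItemByFullName('my-pipeline-job')
          println new File(job.rootDir, 'config.xml')                  // job configuration
          println new File(job.lastFailedBuild.rootDir, 'build.xml')   // failing build record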


          Nancy Belser added a comment -

          Hi Paul, this is happening for 2 pipeline projects which we cannot disable. Do you suggest we recreate the entire pipeline project?


          Paul Allen added a comment (edited) -

          Are you referring to Pipeline Jobs or MultiBranch? If a Pipeline Job, then it is the build history that is potentially causing the issue; deleting the job and then recreating it with the same name, credentials and settings worked for Karl. If you don't want to lose the build history then create a new job, but this will use a new Perforce workspace and will resync all the files (which could be expensive on a large project).

          Please work with Karl via support and he can guide you through the steps and advise if it is best to delete or duplicate the job based on your situation.  In the meantime please send us the build.xml and config.xml data.


          Nancy Belser added a comment -

          This is actually a multibranch project.  We downgraded to p4 version 1.9.5 an hour or so ago and it has stabilized somewhat, but there are still some jobs retriggering. When our developer updates me I will pass on the info.


          Brad Wehmeier added a comment -

          I'm the developer Nancy mentioned. I was able to fix our deployment thanks to the hints here.

          As Nancy mentioned, it was a MultiBranch pipeline project, and we had downgraded to 1.9.5 but that did not fix the issue. I deleted the job history for all of the branches and triggered a multibranch pipeline scan to ensure branch configurations were up to date. Then each branch built once and everything was back to normal. We remain at version 1.9.5.

          In essence, I deleted the project and re-created it, but this way I was able to preserve the configuration and our build number history.
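
          A Script Console sketch of the history wipe Brad describes (the project name is a placeholder, and this is one way to do it rather than necessarily how Brad did):

          // Delete the build history of every branch job in a multibranch project,
          // keeping job configuration and build numbering intact.
          def project = Jenkins.instance.getItemByFullName('my-multibranch-project')
          project.items.each { branch ->
              branch.builds.each { it.delete() }   // drops the stale polling baseline
          }
          // Afterwards, run 'Scan Multibranch Pipeline Now' and build each branch once.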

           


          Paul Allen added a comment -

          Hi Brad, thank you for looking into this with Nancy.  Were you able to take a copy of the failing build.xml file and config.xml before you deleted the Job? 

          I suspect the issue may be a result of changing the Global Library syncID in 1.9.2 -> 1.9.3 from an encoded depot path to a UUID. I'd like to try and find the cause so I can potentially clean up the old data and avoid users having to delete and re-create Jobs.


          Brad Wehmeier added a comment -

          Sorry Paul, I was not able to grab a backup and our older builds with the previous version of the plugin had already been purged by the build retention policy. If I'm able to reproduce this again I will definitely grab those files for you.


          Paul Allen added a comment -

          No problem, I am adding some more debug to help track this issue.  Another possible cause is a duplicate syncID, which I now test and raise a warning for.

          I'll add the debug code against this issue and mark it as closed for the moment.  Feel free to reopen later on if the problem reoccurs.


          Karl Wirth added a comment -

          Hi wbasu - Pinging this issue as it is a big problem for our end users.


          Nancy Belser added a comment -

          This continues to happen to us. Should we create a separate ticket?


          Karl Wirth added a comment -

          Hi nbelser - Thanks, but no. This is the bug tracking system, so the developers should be working off this job, and I have highlighted this one to them.

          Note that if you are a supported Perforce customer and want us to look into the specifics of your problem, you can raise a separate ticket by emailing 'support@perforce.com'. This Jenkins Jira system, however, is the one I use to log bugs and communicate with the developers.


          Karl Wirth added a comment -

          Note for Dev - Please can we look at this as a priority.


          W Basu Perforce added a comment -

          cbopardikar please look into it.

          Karl Wirth added a comment -

          After internal discussion, we know of at least a few customers who would want the build to retry on failure. An ideal solution would therefore be a tickbox on the job to switch between rebuilding on an OS/Jenkins failure and ignoring the failure.


          Mykola Ulianytskyi added a comment (edited) -

          After internal discussion we know of at least a few customers that would want the build to retry on failure.

          An ideal solution is therefore if we could have a tickbox against a job to switch between rebuild on OS/Jenkins fail and ignore fail.

          An SCM plugin should trigger a build only once when new changes are found, and never retry it on failure, because otherwise build loops occur.

          Existing SCM plugins (Git, CVS, SVN, etc.) don't retry builds on failure.
           
          Users can use built-in Jenkins features for:
           
          1) Steps Retry:

          stage('Deploy') {
              steps {
                  retry(3) {
                      sh './deploy.sh'
                  }
              }
          }
          

          https://jenkins.io/doc/pipeline/tour/running-multiple-steps/#timeouts-retries-and-more
           

          2) Entire Pipeline Retry:

          pipeline {
              options {
                  retry(3)
              }
          ...

          https://jenkins.io/doc/book/pipeline/syntax/#options


          Brad Wehmeier added a comment (edited) -

          Customers who want to retry failed builds should use a Jenkins plugin designed for that, e.g. https://wiki.jenkins.io/display/JENKINS/Naginator+Plugin

          If you still decide a checkbox is necessary, please set the default to NOT trigger another build on failure, since that is the behavior of other SCM plugins for Jenkins.


          Karl Wirth added a comment -

          Hi bradleywehmeier and lystor. Thank you very much. That is great feedback.
          FYI - cbopardikar


          Alisdair Robertson added a comment (edited) -

          I've got an issue with the change to polling behaviour introduced by this ticket.

          It seems that in cases where polling shows no changes since the previous build, we now report no changes (good), but we also report no changes when there was a polling error, and take no action to correct or notify about the polling error (bad).

          This has caused a few branches of mine that are set to poll nightly to not be built for a few days before we noticed, with polling logs that look like this for each workspace used in the build (we use multiple p4sync steps in parallel stages):

           
          P4: Polling on: master with:<workspace name>
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.

          From looking at the code, it looks to me as though this is caused by no changes being attached to the previous build for some unknown reason, so https://github.com/jenkinsci/p4-plugin/blob/master/src/main/java/org/jenkinsci/plugins/p4/tagging/TagAction.java#L300 would return an empty ArrayList. However, I haven't actually done any debugging.

          While I understand that constantly triggering builds because of a polling error is not desirable, I find it even more undesirable to skip triggering a build, almost silently and indefinitely, without an administrator being made aware that there is a polling issue rather than simply no changes in the workspace.

          Could there be limited (re)triggering on polling failures, or a notification system, so that an administrator can swoop in to diagnose issues or manually trigger the job anew as necessary in the event of polling failures?

          Note that I explicitly do not want to retry failed builds, because as far as I can tell from the logs, although the build prior to the polling error was a compilation failure, the Perforce syncs in the job all completed successfully.
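
          To illustrate the failure mode described above, a hypothetical paraphrase of the guard (names are illustrative, not the plugin's actual code): if the last build carries no recorded changes, polling reports 'no changes' and the branch is never retriggered.

          // Given the changes recorded on the previous build, decide whether to keep polling.
          def pollingDecision(List previousChanges) {
              if (previousChanges.isEmpty()) {
                  println 'P4: Polling error; no previous change.'
                  return 'NO_CHANGES'    // silently skips triggering, as observed
              }
              return 'CHECK_REMOTE'      // normal path: compare against Perforce
          }

          assert pollingDecision([]) == 'NO_CHANGES'   // a failed build with no recorded changes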


          Karl Wirth added a comment -

          Hi alisdair_robertson - Can you please provide an example of a polling error you are seeing, so we can try it out here (for example, the bad polling log)?

          Thanks in advance,

          Karl


          Alisdair Robertson added a comment -

          Hey p4karl, the only content under the branch job polling log for the last poll is as follows (workspace names changed, but they include node name, job name and stage name):

          Started on 01/05/2019 8:16:00 PM
          P4: Polling on: master with:workspace-1-name
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.
          P4: Polling on: master with:workspace-2-name
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.
          P4: Polling on: master with:workspace-3-name
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.
          P4: Polling on: master with:workspace-4-name
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.
          P4: Polling on: master with:workspace-5-name
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.
          P4: Polling on: master with:workspace-6-name
          P4: Polling: No changes in previous build.
          P4: Polling error; no previous change.
          Done. Took 1.8 sec
          No changes

          I see normal polling output for the other branches of the multibranch pipeline where the most recent build was not a failure.

          This multibranch pipeline never automatically runs the 'scan multibranch pipeline' job; we only run that manually when we have new branches that need to be added. The Jenkinsfile configures explicit SCM polling once each evening.
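
          For context, a minimal declarative sketch of that kind of nightly polling trigger (the cron spec and stage are illustrative, not the actual Jenkinsfile):

          pipeline {
              agent any
              triggers {
                  pollSCM('H 20 * * *')   // poll the SCM once each evening
              }
              stages {
                  stage('Build') {
                      steps {
                          echo 'building'
                      }
                  }
              }
          }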


          Dave Miller added a comment -

          I just upgraded to 1.10.0 to address a different issue, and I'm trying to understand why commit 283834eabea30f31cde59b1ab3b743a01b2f47cb, which claims to be associated with this ticket, is now throwing unactionable severe warnings when it finds duplicate syncIDs. Why do duplicate syncIDs justify a severe log entry, what do they have to do with this ticket, and how should they be addressed? These severe warnings are dramatically clogging our server logs. p4paul?


          Karl Wirth added a comment -

          Hi feelingmimsy - We have had a lot of polling problems recently. Some of them came down to problems in P4-Jenkins, but many came down to customers using the same workspace with different views in the same job. For each sync we store a syncID keyed on the workspace name, and if there are two we could easily be syncing and building the wrong changelists, or missing changelists. Therefore we decided to highlight this with the message you see.

          I'd like to investigate this further but will be asking for some potentially confidential information. Would you be willing to send an email to 'support@perforce.com' for my attention so that I can get this information from you?


          Karl Wirth added a comment -

          I have created the following bug to document what the 'duplicate syncID found' message means and to suggest a messaging improvement:

          JENKINS-58067


          Charusheela Bopardikar added a comment -

          Released in 1.9.7

            Assignee: Charusheela Bopardikar (cbopardikar)
            Reporter: Karl Wirth (p4karl)
            Votes: 4
            Watchers: 12