  2. JENKINS-30383

SynchronousNonBlockingStepExecution should allow restart of idempotent steps

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Minor
    • None
    • wf 1.10
      jenkins 1.609.2

      It appears that if the Jenkins master crashes while it is archiving artifacts, the workflow fails to resume.

      archiveArtifacts should be idempotent, so it should be possible to try again (if the slave / workspace still exists). In this case they should, as I was only using dumb slaves.

      The message should also say which step (class) could not be resumed, as I am just guessing it was the last one before the "Resuming build" message.

      ...
      Running: Stash some files to be used later in the build
      14:23:10 Stashed 1 file(s)
      Running: Allocate node : Body : End
      Running: Allocate node : End
      Running: Change Directory : Start
      14:23:11 Running in /home/jenkins/slave/workspace/automated_release/packaging/target
      Running: Change Directory : Body : Start
      Running: Change Directory : Start
      14:23:11 Running in /home/jenkins/slave/workspace/automated_release/packaging/target/msi
      Running: Change Directory : Body : Start
      Running: Restore files previously stashed
      Running: Change Directory : Body : End
      Running: Change Directory : End
      Running: Change Directory : Body : End
      Running: Change Directory : End
      Running: Change Directory : Start
      14:23:29 Running in /home/jenkins/slave/workspace/automated_release/packaging/target
      Running: Change Directory : Body : Start
      Running: Archive Artifacts
      Resuming build
      Running: Change Directory : Body : End
      Running: Change Directory : End
      Running: Change Directory : Body : End
      Running: Change Directory : End
      Running: General Build Wrapper : Body : End
      Running: General Build Wrapper : End
      Running: Allocate node : Body : End
      Running: Allocate node : End
      Running: End of Workflow
      java.lang.Exception: Resume after a restart not supported for non-blocking synchronous steps
      	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution.onResume(AbstractSynchronousNonBlockingStepExecution.java:73)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl$1.onSuccess(FlowExecutionList.java:176)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl$1.onSuccess(FlowExecutionList.java:172)
      	at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
      	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
      	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
      	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
      	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
      	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:627)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:605)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:546)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:32)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Finished: FAILURE
      


          Jesse Glick added a comment -

          Has been discussed. Certain steps could indeed be idempotent. Not sure about this one; probably yes.


          Anna Tikhonova added a comment -

          Same thing with SCMStep. Are you considering it? jglick teilo

          Jesse Glick added a comment -

          Not currently planning to work on it; one of many useful things to do.


          Jesse Glick added a comment -

          svanoort FYI this code makes it an error currently.


          Vincent Massol added a comment -

          FTR just got the following on XWiki's CI:

          ...
          [main] ➡ Build has failed, sending mails to concerned parties
          [Pipeline] [main] emailext
          Resuming build at Fri Feb 23 18:55:45 CET 2018 after Jenkins restart
          Ready to run at Fri Feb 23 18:55:50 CET 2018
          [Pipeline] [main] }
          [Pipeline] [main] // stage
          [Pipeline] [main] }
          [Pipeline] [main] // node
          [Pipeline] [main] }
          [main] Failed in branch main
          [Pipeline] // parallel
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] End of Pipeline

          GitHub has been notified of this commit’s build result

          java.lang.Exception: Resume after a restart not supported for non-blocking synchronous steps
          	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution.onResume(AbstractSynchronousNonBlockingStepExecution.java:70)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl$1.onSuccess(FlowExecutionList.java:185)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl$1.onSuccess(FlowExecutionList.java:180)
          	at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
          	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
          	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
          	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
          	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
          	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:835)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:813)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:746)
          	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          Finished: FAILURE

          The exception seems related.

          You can see it live at http://ci.xwiki.org/job/XWiki/job/xwiki-platform/job/stable-9.11.x/74/console

          Mohammad Norouzi added a comment -

          What's the workaround for this issue? How can we get our jobs to work? They keep throwing this exception and failing.

          Jesse Glick added a comment -

          As of JENKINS-49707 the general recommendation is to use the retries option in Declarative Pipeline, or for Scripted to wrap node blocks in e.g.

          retry(count: 2, conditions: [agent(), nonresumable()])
          

          which will cover a variety of agent-related problems, including the originally proposed case (controller crashing in the middle of archiveArtifacts), at the cost of potentially redoing some work unnecessarily. This enhancement as originally conceived would offer a more efficient approach (retrying only a single step, using the same agent and workspace) but only covers limited cases (the step is idempotent, and the same agent is able to reconnect), so it does not seem worth the added complexity just to speed up builds that would otherwise have failed in the face of what you hope is an occasional problem.
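A minimal Scripted Pipeline sketch of the recommendation above, assuming a plugin set recent enough to provide the `agent()` and `nonresumable()` retry conditions. The `linux` label, the `make package` command, and the artifact pattern are hypothetical placeholders, not taken from the reporter's job.

```groovy
// Scripted Pipeline sketch; label, build command, and artifact pattern are
// hypothetical placeholders.
retry(count: 2, conditions: [agent(), nonresumable()]) {
    node('linux') {
        // Keep checkout inside the retried block: a retry may land on a
        // different agent with an empty workspace.
        checkout scm
        sh 'make package'
        // If the controller restarts while this step is running, the step
        // cannot resume; nonresumable() lets the enclosing retry rerun the
        // whole node block instead of failing the build.
        archiveArtifacts artifacts: 'target/**/*.msi'
    }
}
```

Because the whole `node` block is rerun, everything inside it should be safe to execute twice, which is the "redoing some work unnecessarily" cost mentioned above.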


          Vincent Massol added a comment -

          jglick Hello. Thanks for looking into this. Isn't JENKINS-49707 about automatic retries? If so, why would I need to care about handling retries myself in our script? (I'm probably missing something; maybe JENKINS-49707 is only about a specific use case.) Would a retry cover the use case of https://issues.jenkins.io/browse/JENKINS-30383?focusedCommentId=330038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-330038 (i.e. Jenkins is restarted and the script is resumed)? Thanks!

          Jesse Glick added a comment -

          why would I need to care about handling retries myself in our script?

          You need to opt into the new behavior—in the case of Scripted Pipeline, by using the retry step explicitly; in the case of Declarative, by using the retries option.

          jenkins is restarted and the script is resumed

          Yes, for the case of a non-durable step such as checkout or archiveArtifacts this is handled by the retry condition nonresumable. (sh, bat, and powershell steps should simply proceed.)
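For Declarative Pipeline, the opt-in might look like the sketch below. It assumes that the `retries` option is accepted inside the `node` form of the `agent` directive; that placement is an assumption to check against the Declarative syntax reference for your plugin version. The label, build command, and artifact pattern are hypothetical placeholders.

```groovy
// Declarative Pipeline sketch; the placement of 'retries' inside node { }
// is an assumption, and the label, command, and pattern are placeholders.
pipeline {
    agent {
        node {
            label 'linux'
            retries 2
        }
    }
    stages {
        stage('Package') {
            steps {
                sh 'make package'
                // Non-durable step: a controller restart here is the case
                // the nonresumable condition is meant to handle.
                archiveArtifacts artifacts: 'target/**/*.msi'
            }
        }
    }
}
```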


            Assignee: Unassigned
            Reporter: teilo James Nord
            Votes: 11
            Watchers: 13
