JENKINS-46961

Pipelines interrupted while starting incorrectly resume after Jenkins restarts and cannot be killed

    • workflow-job 2.40, workflow-cps 2.83

      I have a Multibranch Pipeline job that failed on August 30th 2017 due to a restart of the master or the slave (they are on separate servers).

      I've tried to abort it multiple times. The log shows "Aborted by <user-at-company>" every time I try, but the build keeps running.

      I've also tried restarting both master and slave, but when I do that, I get a message like this in the log:

      "Resuming build at Tue Sep 12 17:49:04 BRT 2017 after Jenkins restart"

      Here are some more log lines:

       

      > git checkout -f a4ab3c46a97093925f401a391b238821f1099417
      First time build. Skipping changelog.
      java.nio.channels.ClosedByInterruptException
      at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
      at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:216)
      at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
      at java.nio.channels.Channels.writeFully(Channels.java:101)
      at java.nio.channels.Channels.access$000(Channels.java:61)
      at java.nio.channels.Channels$1.write(Channels.java:174)
      at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
      at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
      at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
      at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
      at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
      at java.io.BufferedWriter.flush(BufferedWriter.java:254)
      at hudson.util.AtomicFileWriter.flush(AtomicFileWriter.java:97)
      at com.thoughtworks.xstream.core.util.QuickWriter.flush(QuickWriter.java:75)
      Caused: com.thoughtworks.xstream.io.StreamException: : null
      at com.thoughtworks.xstream.core.util.QuickWriter.flush(QuickWriter.java:77)
      at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.endNode(PrettyPrintWriter.java:322)
      at com.thoughtworks.xstream.io.WriterWrapper.endNode(WriterWrapper.java:37)
      at com.thoughtworks.xstream.io.path.PathTrackingWriter.endNode(PathTrackingWriter.java:48)
      at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:83)
      at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
      at com.thoughtworks.xstream.XStream.marshal(XStream.java:1026)
      at com.thoughtworks.xstream.XStream.marshal(XStream.java:1015)
      at com.thoughtworks.xstream.XStream.toXML(XStream.java:988)
      at hudson.XmlFile.write(XmlFile.java:171)
      Caused: java.io.IOException
      at hudson.XmlFile.write(XmlFile.java:174)
      at org.jenkinsci.plugins.workflow.support.storage.SimpleXStreamFlowNodeStorage.storeNode(SimpleXStreamFlowNodeStorage.java:93)
      at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.storeNode(CpsFlowExecution.java:1481)
      at org.jenkinsci.plugins.workflow.cps.FlowHead.newStartNode(FlowHead.java:109)
      at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:487)
      at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:269)
      at hudson.model.ResourceController.execute(ResourceController.java:97)
      at hudson.model.Executor.run(Executor.java:405)
      Finished: FAILURE
      Resuming build at Wed Aug 30 18:32:00 BRT 2017 after Jenkins restart
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Aborted by user-at-company
      Resuming build at Tue Sep 12 17:49:04 BRT 2017 after Jenkins restart
      Resuming build at Wed Sep 13 18:05:24 BRT 2017 after Jenkins restart
      Resuming build at Mon Sep 18 11:07:36 BRT 2017 after Jenkins restart
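
      The pattern in the log above (a "Finished:" line followed by further "Resuming build" and "Aborted by" messages) can be spotted mechanically. The following is a minimal sketch, assuming the log wording matches the lines quoted in this issue; the function name `summarize_build_log` and the returned keys are hypothetical and not part of Jenkins.

```python
import re

# Patterns modeled on the log lines quoted in this issue (an assumption:
# other Jenkins versions may word these messages differently).
RESUME_RE = re.compile(r"^\s*Resuming build at .+ after Jenkins restart\s*$")
ABORT_RE = re.compile(r"^\s*Aborted by \S.*$")
FINISHED_RE = re.compile(r"^\s*Finished: (\w+)\s*$")


def summarize_build_log(log_text):
    """Count resume/abort messages and flag resumes that follow 'Finished:'."""
    resumes = 0
    aborts = 0
    finished_result = None
    resumed_after_finish = False
    for line in log_text.splitlines():
        finished = FINISHED_RE.match(line)
        if finished:
            finished_result = finished.group(1)
        elif RESUME_RE.match(line):
            resumes += 1
            # A resume after the build already reported a result is the
            # symptom described in this issue.
            if finished_result is not None:
                resumed_after_finish = True
        elif ABORT_RE.match(line):
            aborts += 1
    return {
        "resumes": resumes,
        "aborts": aborts,
        "finished_result": finished_result,
        "resumed_after_finish": resumed_after_finish,
    }
```

      A build whose summary has `resumed_after_finish` set to true and a growing `resumes` count across restarts is likely stuck in the state reported here.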

          Meng Xin Zhu added a comment -

          +1, hit this again in the latest LTS, 2.121.2.

          BUILDURL/kill did the trick.

          ivy cassidy added a comment -

          Just ran into this as well.

          Jenkins ver. 2.135++
          workflow-cps 2.54

          Had to use BUILDURL/kill as well.

          Eslam ElHusseiny added a comment -

          I have the same issue.

          Jenkins ver. 2.140

          Pipeline: Groovy/workflow-cps v2.54

          Eyal David added a comment -

          Same issue. Jenkins master: Jenkins ver. 2.89.4
          Pipeline: Groovy/workflow-cps v2.53

          Ivan Udovichenko added a comment -

          Faced this issue today:

          Jenkins ver. 2.190.3
          workflow-cps@2.76
          workflow-cps-global-lib@2.15

          It happened after a Jenkins restart.

          Devin Nusbaum added a comment -

          FWIW, I have been looking into this recently. jenkinsci/workflow-cps-plugin #374 adds a test case that reproduces this kind of problem. We should be able to use it to fix whatever is causing these Pipelines to resume in the first place.

          Devin Nusbaum added a comment - - edited

          Ok, I think that jenkinsci/workflow-job-plugin#167 in combination with the PR in my last comment should fix the issue.

          Devin Nusbaum added a comment -

          Ok, I just released Pipeline: Job plugin version 2.40 and Pipeline: Groovy plugin version 2.83 with some fixes that should prevent this problem from occurring when Pipelines are interrupted while they are still starting. The fixes make sure that the state of such Pipelines is persisted correctly so that they do not resume after Jenkins is restarted. This problem affected Pipelines regardless of what durability level they were using.

          Pipelines that were already stuck in the resuming state will not be fixed by these changes; they will still need to be hard-killed or deleted from the file system if the hard-kill does not work.

          I wasn't able to come up with a reproducer that would cause actual resumption, so I was not able to figure out exactly why the Pipelines that experienced this issue hung after resuming. If anyone is still seeing that issue for new builds after updating to the new versions of the plugins I mentioned, please open a new issue with details and steps to reproduce the problem.
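
          The hard-kill mentioned in this thread is the BUILDURL/kill address. The following is a minimal sketch of preparing such a request from a script. It assumes the endpoint must be called with POST and that the caller supplies a ready-made Authorization header value; the function name `build_kill_request` and the example URL are hypothetical. Depending on the instance's security settings, a CSRF crumb may also be required, which this sketch does not handle.

```python
import urllib.request


def build_kill_request(build_url, auth_header=None):
    """Prepare (but do not send) a POST request to BUILDURL/kill.

    build_url is the URL of the stuck build, with or without a trailing
    slash. auth_header, if given, is used verbatim as the Authorization
    header (an assumption about how the caller authenticates).
    """
    url = build_url.rstrip("/") + "/kill"
    request = urllib.request.Request(url, data=b"", method="POST")
    if auth_header:
        request.add_header("Authorization", auth_header)
    return request
```

          The request can then be sent with `urllib.request.urlopen(request)`. Keeping construction separate from sending makes it possible to inspect the target URL before killing anything.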

          jacob anderson added a comment -

          This continues to be a problem for us in our pipelines. Last night, during the 3/25/2024 Jenkins update, our pipeline stalled after resuming, which caused the restart to stall as well and stalled all subsequent jobs.

          Scheduling project: JOB
          Starting building: JOB #314
          Build JOB #314 completed: SUCCESS
          Pausing (Preparing for shutdown)
          Aborted by user
          Resuming (Shutdown was canceled)
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 503bb8f0-8c16-48d9-948a-02deaee08aba
          Finished: ABORTED

          jacob anderson added a comment -

          For us, the problem was ThinBackup. When it ran, it reset the container, and that interfered with the pipeline execution. Once I disabled the regular ThinBackup jobs, the pipeline was stable again.

            dnusbaum Devin Nusbaum
            elifarley Elifarley