Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-47157

Jenkins pipeline hangs with "cannot start writing logs to a finished node"

      I have a jenkins pipeline that uses the parallel command to execute tests and then at the end I run an "sh" command. Upon examining these pipeline runs which seem "stuck" i find this error message at the end of the console:

      java.lang.IllegalStateException: cannot start writing logs to a finished node org.jenkinsci.plugins.workflow.cps.nodes.StepEndNode[id=922]
      	at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.<init>(LogActionImpl.java:114)
      	at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.stream(LogActionImpl.java:85)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:73)
      	at org.jenkinsci.plugins.workflow.cps.steps.ParallelStep$ResultHandler$Callback.onFailure(ParallelStep.java:122)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:345)
      	at com.cloudbees.groovy.cps.impl.ThrowBlock$1.receive(ThrowBlock.java:68)
      	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Finished: ABORTED

      My guess as to what is going on is that at our company, we scale jenkins up and down during the day. There's a script that will removea  jenkins slave from the master via a script that roughly looks like this:

      |#! /bin/bash -ex
      |cd /var/jenkins_home/war/WEB-INF
      |java -jar jenkins-cli.jar -auth ${username}:${password} -s http://localhost:8080/ delete-node ${nodeName}

      We detach nodes in this way for a variety of reasons but most commonly, it's because a slave node in our Ec2 spot instance fleet request went away and want to detach it from the master.

      Big picture, we are probably doing something wrong. However, it seems like this ticket depicts a Jenkins pipeline in a stuck state.

      Please advise. I am seeing this exact issue almost daily now. There must be some smart way to recover from this. Ideally, if a slave goes away, Jenkins master is smart enough to somehow re-dispatch the work to be done on some other slave which is still up.

          [JENKINS-47157] Jenkins pipeline hangs with "cannot start writing logs to a finished node"

          Sam Van Oort added a comment -

          This looks like a bug around the log handling (which is about to get massively reworked, thankfully), but I'm not clear what triggers this particular case:  do you have a sample pipeline that will generate this issue?  Thanks!

          Sam Van Oort added a comment - This looks like a bug around the log handling (which is about to get massively reworked, thankfully), but I'm not clear what triggers this particular case:  do you have a sample pipeline that will generate this issue?  Thanks!

          Jesse Glick added a comment -

          Whatever the bug in ParallelStep or LogActionImpl or whatever turns out to be, the critical question for me is why this code, if nothing else, did not suffice to croakonProgramEnd and thus force WorkflowRun.GraphL to finish the build.

          Jesse Glick added a comment - Whatever the bug in ParallelStep or LogActionImpl or whatever turns out to be, the critical question for me is why this code , if nothing else, did not suffice to croak → onProgramEnd and thus force WorkflowRun.GraphL to finish the build.

          Jesse Glick added a comment -

          BTW this particular case would not have happened this way in workflow-support 2.15 or later.

          Mentioning the Jenkins version in Environment in cases like this is not often useful. What matters is the versions of core Pipeline plugins.

          Jesse Glick added a comment - BTW this particular case would not have happened this way in workflow-support 2.15 or later. Mentioning the Jenkins version in Environment in cases like this is not often useful. What matters is the versions of core Pipeline plugins.

          Sam Van Oort added a comment -

          jglick I'm not convinced this one is reproducible in modern pipeline plugin versions at all – we were looking at one with a user today, and they proved to be using very old versions of the plugins. 

          piratejohnny Does the issue still occur if you use the latest Pipeline plugin versions?

          Sam Van Oort added a comment - jglick I'm not convinced this one is reproducible in modern pipeline plugin versions at all – we were looking at one with a user today, and they proved to be using very old versions of the plugins.  piratejohnny Does the issue still occur if you use the latest Pipeline plugin versions?

          We had this issue today:

          java.io.IOException: cannot start writing logs to a finished node org.jenkinsci.plugins.workflow.cps.nodes.StepEndNode[id=290] } in CpsFlowExecution[Owner[cab/BF/performance/PR-5/1:cab/BF/performance/PR-5 #1]]
          	at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.<init>(LogActionImpl.java:106)
          	at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.stream(LogActionImpl.java:77)
          	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:73)
          	at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution.retry(WaitForConditionStep.java:102)
          	at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution.access$100(WaitForConditionStep.java:48)
          	at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution$1.apply(WaitForConditionStep.java:91)
          	at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution$1.apply(WaitForConditionStep.java:88)
          	at org.jenkinsci.plugins.workflow.steps.StepExecution$2.apply(StepExecution.java:172)
          	at org.jenkinsci.plugins.workflow.steps.StepExecution$2.apply(StepExecution.java:168)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$StepExecutionIteratorImpl$1.onSuccess(FlowExecutionList.java:222)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$StepExecutionIteratorImpl$1.onSuccess(FlowExecutionList.java:217)
          	at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
          	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
          	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
          	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
          	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
          	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:999)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:977)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
          	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)

          Claas Diederichs added a comment - We had this issue today: java.io.IOException: cannot start writing logs to a finished node org.jenkinsci.plugins.workflow.cps.nodes.StepEndNode[id=290] } in CpsFlowExecution[Owner[cab/BF/performance/PR-5/1:cab/BF/performance/PR-5 #1]] at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.<init>(LogActionImpl.java:106) at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.stream(LogActionImpl.java:77) at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:73) at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution.retry(WaitForConditionStep.java:102) at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution.access$100(WaitForConditionStep.java:48) at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution$1.apply(WaitForConditionStep.java:91) at org.jenkinsci.plugins.workflow.steps.WaitForConditionStep$Execution$1.apply(WaitForConditionStep.java:88) at org.jenkinsci.plugins.workflow.steps.StepExecution$2.apply(StepExecution.java:172) at org.jenkinsci.plugins.workflow.steps.StepExecution$2.apply(StepExecution.java:168) at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$StepExecutionIteratorImpl$1.onSuccess(FlowExecutionList.java:222) at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$StepExecutionIteratorImpl$1.onSuccess(FlowExecutionList.java:217) at com.google.common.util.concurrent.Futures$6.run(Futures.java:975) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253) at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149) at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134) at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170) at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:999) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:977) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748)

          Sam Van Oort added a comment -

          claasd Please confirm what versions of the Pipeline plugin you are using and whether this occurred within a parallel, especially a nested parallel - thanks!

          Sam Van Oort added a comment - claasd Please confirm what versions of the Pipeline plugin you are using and whether this occurred within a parallel, especially a nested parallel - thanks!

          Hi,

          we are using
          pipeline = 2.5
          Pipeline: Supporting APIs = 2.18
          Pipeline: Basic Steps = 2.7
          Pipeline: Step API = 2.14

          The error did indeed occur inside a nested parallel.
          We do a parallel that executes 8 build steps. This is wrapped into a parallel that executes those 8 parallel builds and another tasks that collects logfies.

          Claas Diederichs added a comment - Hi, we are using pipeline = 2.5 Pipeline: Supporting APIs = 2.18 Pipeline: Basic Steps = 2.7 Pipeline: Step API = 2.14 The error did indeed occur inside a nested parallel. We do a parallel that executes 8 build steps. This is wrapped into a parallel that executes those 8 parallel builds and another tasks that collects logfies.

          Sam Van Oort added a comment -

          claasd Okay,that's helpful – what versions of Pipeline: Groovy and Pipeline: Job are you using though? There's been a lot of related changes there.

          Sam Van Oort added a comment - claasd Okay,that's helpful – what versions of Pipeline: Groovy and Pipeline: Job are you using though? There's been a lot of related changes there.

          Pipeline: Groovy = 2.53
          Pipeline: Job = 2.21

          Claas Diederichs added a comment - Pipeline: Groovy = 2.53 Pipeline: Job = 2.21

          Sam Van Oort added a comment -

          claasd Is this reproducible with the latest workflow-job version? We've just done a massive change to how logging is handled and I suspect this resolves it.

          Sam Van Oort added a comment - claasd Is this reproducible with the latest workflow-job version? We've just done a massive change to how logging is handled and I suspect this resolves it.

            Unassigned Unassigned
            piratejohnny Jon B
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: