Jenkins / JENKINS-47868

Pipeline durability hang when slave node disconnected


      My parallel Pipeline job runs primarily on Jenkins slave nodes. In one build, a parallel branch was assigned to a slave node that then disconnected from the Jenkins master because of an issue with our hosting provider. The build hung until I manually intervened. I only noticed it after all the other branches had finished their work and the one remaining branch was still assigned to the disconnected slave. Even though the Jenkins master had many idle slave nodes, that branch kept waiting for the disconnected agent.

      I restarted the instance, and it registered with the Jenkins master again. Only after the slave node reconnected did the build fail. I had expected one of the three outcomes below; instead, I had to intervene manually to free the hung build.

      1.  The branch would have detected the disconnected slave node and run on another available one.

      2.  The branch would have failed immediately when the slave node disconnected, as a freestyle job does.

      3.  The branch and build would have resumed successfully once the slave reconnected.

      I was able to reproduce this issue using the Pipeline code below and disconnecting the slave during the "sleep 15s" step.

      timestamps {
          node("JENKINS-SLAVE-LABEL") {

              sh 'echo "First task"'
              sh 'sleep 15s'
              sh 'echo "Last task"'
          }
      }
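      Until the hang itself is fixed, one possible stopgap for outcomes 1 and 2 is to bound the work with the `timeout` step and wrap it in `retry`, so a branch stuck on a dead agent is aborted and the body runs again on whichever node with the label is free. This is a minimal sketch, not a confirmed fix: the 10-minute limit and retry count of 2 are arbitrary assumptions, and depending on the installed Pipeline plugin versions, `retry` may treat a `timeout` interruption like a user abort and not re-run the body. In the parallel job, each branch would be wrapped the same way.

```groovy
// Hypothetical workaround sketch for the reproduction above; it does not
// address the underlying hang, it only bounds how long a branch can wait.
timestamps {
    // retry: if the body throws, run it again from the top, which requests
    // a fresh node for the label (possibly a different, healthy agent).
    retry(2) {
        // timeout: abort the body if it runs longer than the limit, so a
        // branch waiting on a disconnected agent fails instead of hanging.
        // Assumption: 10 minutes is far longer than a healthy run needs.
        // Caveat: on some plugin versions retry may not re-run a body that
        // was interrupted by timeout.
        timeout(time: 10, unit: 'MINUTES') {
            node("JENKINS-SLAVE-LABEL") {
                sh 'echo "First task"'
                sh 'sleep 15s'
                sh 'echo "Last task"'
            }
        }
    }
}
```

      Because `retry` re-runs the whole body, every step inside it must be safe to repeat; the echo and sleep steps here are, but real build steps may not be.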

       

      Below is the build log from disconnecting the slave during "sleep 15s" and reconnecting it about a minute later.

      [Pipeline] timestamps
      [Pipeline] {
      [Pipeline] node
      23:27:05 Running on JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx) in /home/centos/workspace/JOBNAME
      [Pipeline] {
      [Pipeline] sh
      23:27:13 [JOBNAME] Running shell script
      23:27:14 + echo 'First task'
      23:27:14 First task
      [Pipeline] sh
      23:27:14 [JOBNAME] Running shell script
      23:27:15 + sleep 15s
      23:27:25 Cannot contact JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx): java.io.IOException: remote file operation failed: /home/centos/workspace/JOBNAME at hudson.remoting.Channel@32fe452c:JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx): hudson.remoting.ChannelClosedException: channel is already closed
      [Pipeline] sh
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] End of Pipeline
      Command close created at
          at hudson.remoting.Command.<init>(Command.java:60)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
          at hudson.remoting.Channel.close(Channel.java:1281)
          at hudson.remoting.Channel.close(Channel.java:1263)
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
      Caused: hudson.remoting.Channel$OrderlyShutdown
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
          at hudson.remoting.Channel$1.handle(Channel.java:527)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
      Caused: hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:605)
          at hudson.remoting.Request.call(Request.java:130)
          at hudson.remoting.Channel.call(Channel.java:829)
          at hudson.FilePath.act(FilePath.java:987)
          at hudson.FilePath.act(FilePath.java:976)
          at hudson.FilePath.mkdirs(FilePath.java:1159)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.<init>(FileMonitoringTask.java:113)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:167)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:161)
          at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:90)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:177)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
          at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
          at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)
          at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
          at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:133)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:153)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:157)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
      Caused: java.io.IOException: remote file operation failed: /home/centos/workspace/JOBNAME at hudson.remoting.Channel@32fe452c:JENKINS-SLAVE-NODE-NAME-a (i-xxxxxxxxxxxxxxxxxxx)
          at hudson.FilePath.act(FilePath.java:994)
          at hudson.FilePath.act(FilePath.java:976)
          at hudson.FilePath.mkdirs(FilePath.java:1159)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.<init>(FileMonitoringTask.java:113)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:167)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:161)
          at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:90)
          at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:177)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
          at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
          at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
          at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)
          at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
          at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:133)
          at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:153)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:157)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
          at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
          at WorkflowScript.run(WorkflowScript:6)
          at ___cps.transform___(Native Method)
          at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
          at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
          at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
          at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
          at com.cloudbees.groovy.cps.Next.step(Next.java:83)
          at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
          at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
          at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
          at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
          at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
          at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
          at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
          at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Finished: FAILURE
      

            Assignee: Unassigned
            Reporter: Mike Kozell (mkozell)
            Votes: 2
            Watchers: 11