Jenkins / JENKINS-40825

"Pipe not connected" errors when running multiple builds simultaneously

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: Jenkins 2.60
      Kubernetes plugin 0.12
      Kubernetes 1.5.1 on GKE
      Kubernetes 1.7.0 on AWS

      Hi there,

      We have Jenkins running in Kubernetes with the Kubernetes plugin, and have been experiencing `java.io.IOException: Pipe not connected` errors when running multiple builds simultaneously. This seems to consistently happen when we run 8 or more builds (on the same pipeline). About 50% of the builds will succeed, and the other 50% will fail with the `Pipe not connected` exception. Most of the time it will fail at stage 1, but sometimes at stage 2.

      We're using the following pipeline:

      podTemplate(label: 'mypod', containers: [
        containerTemplate(name: 'debian', image: 'debian', ttyEnabled: true, command: 'cat'),
        containerTemplate(name: 'ubuntu', image: 'ubuntu', ttyEnabled: true, command: 'cat')
      ]) {
        node('mypod') {
          container('debian') {
            stage('stage 1') {
              sh 'echo hello'
              sh 'sleep 30'
              sh 'echo world'
            }
      
            stage('stage 2') {
              sh 'echo hello'
              sh 'sleep 30'
              sh 'echo world'
            }
          }
        }
      }
      

      And this is the log of one such failed build:

      [Pipeline] podTemplate
      [Pipeline] {
      [Pipeline] node
      Still waiting to schedule task
      Waiting for next available executor on mypod
      Running on kubernetes-a0e59102b59b48ad99693ca32b94ab38-11a5bcd7df12e4 in /home/jenkins/workspace/kubernetes-test-3
      [Pipeline] {
      [Pipeline] container
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (stage 1)
      [Pipeline] sh
      [kubernetes-test-3] Running shell script
      Executing shell script inside container [debian] of pod [kubernetes-a0e59102b59b48ad99693ca32b94ab38-11a5bcd7df12e4]
      Executing command: sh -c echo $$ > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/pid'; jsc=durable-7534cabf595ac7f32ca72b4db83e0af1; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/script.sh' > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/jenkins-result.txt' 
      # cd /home/jenkins/workspace/kubernetes-test-3
      sh -c echo $$ > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/pid'; jsc=durable-7534cabf595ac7f32ca72b4db83e0af1; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/script.sh' > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-f201019b/jenkins-result.txt' 
      exit
      # # + echo hello
      hello
      [Pipeline] sh
      [kubernetes-test-3] Running shell script
      Executing shell script inside container [debian] of pod [kubernetes-a0e59102b59b48ad99693ca32b94ab38-11a5bcd7df12e4]
      Executing command: sh -c echo $$ > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/pid'; jsc=durable-7534cabf595ac7f32ca72b4db83e0af1; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/script.sh' > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/jenkins-result.txt' 
      # cd /home/jenkins/workspace/kubernetes-test-3
      sh -c echo $$ > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/pid'; jsc=durable-7534cabf595ac7f32ca72b4db83e0af1; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/script.sh' > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/kubernetes-test-3@tmp/durable-0eb192c0/jenkins-result.txt' 
      exit
      # + sleep 30
      # [Pipeline] sh
      [kubernetes-test-3] Running shell script
      Executing shell script inside container [debian] of pod [kubernetes-a0e59102b59b48ad99693ca32b94ab38-11a5bcd7df12e4]
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      [Pipeline] // container
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // podTemplate
      [Pipeline] End of Pipeline
      java.io.IOException: Pipe not connected
      	at java.io.PipedOutputStream.write(PipedOutputStream.java:140)
      	at java.io.OutputStream.write(OutputStream.java:75)
      	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:125)
      	at hudson.Launcher$ProcStarter.start(Launcher.java:384)
      	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:147)
      	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:61)
      	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:158)
      	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:184)
      	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:126)
      	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
      	at groovy.lang.GroovyObject$invokeMethod.call(Unknown Source)
      	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
      	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
      	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:151)
      	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:21)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:115)
      	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:149)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:146)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:123)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:123)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:123)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:123)
      	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:16)
      	at WorkflowScript.run(WorkflowScript:10)
      	at ___cps.transform___(Native Method)
      	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
      	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
      	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
      	at sun.reflect.GeneratedMethodAccessor521.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
      	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:58)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:154)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:33)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:30)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:30)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:163)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:324)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:78)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:236)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:224)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:63)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Finished: FAILURE
      

      Something seems to be going wrong around https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/ContainerExecDecorator.java#L125.


          Martin Sander added a comment -

          iocanel:
          It looks like I have made a bit of progress (if I am not completely mistaken).

          Are you aware that the DecoratedLauncher returned by decorate is not only used by Jenkins once to execute the original command(s) (sleep in the above example), but re-used to check if the process is still running?

          I.e. launch is called several times, with executions possibly (or maybe even certainly) overlapping.
          So my current assumption is that it is not safe to use members of the wrapping ContainerExecDecorator inside the DecoratedLauncher, especially launcher, watch, and proc.

          I will try to validate this assumption and might send you a (probably crude) pull request.

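Martin's assumption above can be illustrated with a small, self-contained sketch (hypothetical classes, not the plugin's real ones): if the launcher stashes per-invocation state (watch, proc, pipes) in a field of the shared decorator, a re-entrant launch() — e.g. the liveness check — silently replaces the state the first invocation still depends on.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not the plugin's actual classes): shows why per-invocation
// state must not live in fields of a launcher that Jenkins calls repeatedly.
public class PipeSketch {

    /** Stand-in for the per-exec handle (the plugin's watch/proc pair). */
    static final class Exec {
        final int id;
        Exec(int id) { this.id = id; }
    }

    /** Unsafe: a second launch() overwrites the field the first one still uses. */
    static final class SharedStateLauncher {
        private Exec current;                       // shared across invocations
        private final AtomicInteger ids = new AtomicInteger();
        Exec launch() {
            current = new Exec(ids.incrementAndGet());
            return current;                         // later reads of `current` may
        }                                           // observe another invocation's Exec
        Exec currentExec() { return current; }
    }

    /** Safe: all per-invocation state stays inside the returned handle. */
    static final class PerInvocationLauncher {
        private final AtomicInteger ids = new AtomicInteger();
        Exec launch() {
            return new Exec(ids.incrementAndGet()); // nothing shared to clobber
        }
    }

    public static void main(String[] args) {
        SharedStateLauncher unsafe = new SharedStateLauncher();
        Exec first = unsafe.launch();
        unsafe.launch();                            // liveness check re-enters
        System.out.println(first == unsafe.currentExec()); // false: handle clobbered

        PerInvocationLauncher safe = new PerInvocationLauncher();
        System.out.println(safe.launch() != safe.launch()); // true: independent handles
    }
}
```

With overlapping executions the unsafe variant turns into a genuine race; keeping the state local to each launch() call, as in the second variant, is the pattern the pull requests below move toward.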

          Martin Sander added a comment -

          I did some quite extensive testing yesterday, and I was able to get rid of the resource leak (I think).

          Pull request here: https://github.com/jenkinsci/kubernetes-plugin/pull/180. I recommend also viewing it with whitespace changes ignored.

          I don't expect you to merge it like that, but would be happy to get feedback.

          Unfortunately, it does not completely get rid of the "pipe not connected" errors, but

          • it seems to fix the resource leak
          • the "pipe not connected" error seems to fail the build much less often
          • it seems that most of the time it comes from org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep, which
            • runs ps every ten seconds or so to check if the process is still alive
            • prints just a single error to the build log, even if that check fails multiple times (set the logger for that class to FINE to see all failures)
            • luckily does not fail the build if one of those checks fails
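To see all of those failures rather than just the first, the logger Martin mentions can be raised to FINE with plain java.util.logging calls; the same two statements can be pasted into the Jenkins script console (Groovy), or the equivalent configured via Manage Jenkins » System Log:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Raise the durable-task step logger to FINE so every failed liveness check
// is logged, not only the single error that reaches the build log.
public class DurableTaskLogLevel {
    public static void main(String[] args) {
        Logger log = Logger.getLogger(
                "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep");
        log.setLevel(Level.FINE);
        System.out.println(log.getLevel()); // FINE
    }
}
```

Note that a System Log recorder (or a handler at FINE) is still needed for the messages to be captured anywhere.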


          Ioannis Canellos added a comment -

          0x89 Your assumption (that the decorator is called multiple times) is valid and is aligned with what I've seen so far.

          That was where https://github.com/jenkinsci/kubernetes-plugin/pull/177 was aiming (to close() the listeners opened by the liveness checks).

          But it seems that this is affecting us in more ways, and I feel you are on the right track. Let me review your pull request and I'll get back to you.

          Martin Sander added a comment - - edited

          iocanel:

          I might be on the right track, but I think I didn't go far enough.

          It is actually not only the Decorator that is reused: even the Launcher is used more than once, i.e. launch is called repeatedly.
          I will verify this and probably issue another pull request from a different branch tomorrow.


          Martin Sander added a comment -

          New pull request: https://github.com/jenkinsci/kubernetes-plugin/pull/182.

          Jesse Redl added a comment -

          Thanks for the fix. We've re-enabled our multi-container workflows in Jenkins with the Kubernetes plugin after upgrading to the most recent release!

          Andras Kovi added a comment -

          We started seeing this issue again: exceptions.txt

          Jenkins ver. 2.107.3, kubernetes-plugin:1.10.1

          The wait for the started latch is interrupted at org/csanchez/jenkins/plugins/kubernetes/pipeline/ContainerExecDecorator.java:328.
          How is this possible? What is interrupting it, and what parameters may need to be tweaked to get it working?

          The error happens when we spawn a relatively large number of parallel executions, about 25.
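The mechanism behind that stack trace can be reproduced with the JDK alone: a thread blocked on a CountDownLatch.await() (like the plugin's "started" latch) that is interrupted before the latch is counted down ends its wait in InterruptedException. A minimal sketch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// JDK-only illustration of the reported failure mode: an interrupt arriving
// while a thread waits on a latch that nothing ever counts down.
public class LatchInterruptSketch {
    public static void main(String[] args) throws Exception {
        CountDownLatch started = new CountDownLatch(1);   // never counted down here

        Thread waiter = new Thread(() -> {
            try {
                started.await();                          // blocks, like the exec wait
                System.out.println("started");
            } catch (InterruptedException e) {
                // This is the path the reported exceptions take.
                System.out.println("interrupted while waiting for started latch");
            }
        });
        waiter.start();

        TimeUnit.MILLISECONDS.sleep(100);                 // let the waiter block
        waiter.interrupt();                               // e.g. step aborted, or a timeout path
        waiter.join();
    }
}
```

So the question is really what interrupts the waiting thread — e.g. a step being aborted, or the exec connection never reaching "started" because a connection limit was hit, as the next comment suggests.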


          Claes Buckwalter added a comment -

          akovi can you share a simple Pipeline script that reproduces the problem?

          Andras Kovi added a comment -

          Seems like the 'Max connections to Kubernetes API' parameter was set to a very low number, causing this error.

          So, for the record: if you encounter this issue, try raising the 'Max connections to Kubernetes API' config parameter.

          For planning purposes it would still be good to know the relation between this parameter and the possible number of parallel executions in a pipeline.
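That relation does not appear to be documented; as a purely illustrative planning heuristic (an assumption, not derived from the plugin's source), each concurrently running container step holds at least one long-lived exec connection, and the periodic liveness check may briefly open another, so the limit should comfortably exceed peak parallelism:

```java
// Hypothetical sizing heuristic — not an official formula from the plugin.
public class ConnectionBudget {
    static int suggestedMaxConnections(int peakParallelSteps,
                                       int connectionsPerStep,
                                       int headroom) {
        return peakParallelSteps * connectionsPerStep + headroom;
    }

    public static void main(String[] args) {
        // e.g. 25 parallel executions, assuming 2 connections each, plus slack:
        System.out.println(suggestedMaxConnections(25, 2, 10)); // 60
    }
}
```

The assumed connections-per-step count would need to be confirmed against the plugin version in use.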



          Raviteja A added a comment -

          Increasing the 'Max connections to Kubernetes API' parameter didn't resolve the issue for us. We had to restart the master; after that, things started working.


            Assignee: Carlos Sanchez (csanchez)
            Reporter: Steven Oud (soud)
            Votes: 20
            Watchers: 38

              Created:
              Updated:
              Resolved: