Jenkins / JENKINS-72729

java.lang.InterruptedException when executing within docker on remote worker

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • docker-workflow-plugin
    • None

      Background - we run Jenkins on Kubernetes. The deployment has one main controller pod and a variable number of remote worker nodes (in most cases limited to 2 or 4), each running in its own Kubernetes pod and connected to the controller via JNLP4.

      The remote workers execute commands inside Docker containers that are specialized for the task being executed. This setup has been in place for 4+ years, during which we have upgraded to the latest Jenkins LTS and updated plugins a few times.

      What has changed - we are upgrading to a more recent Jenkins LTS (from Jenkins 2.263.2 to 2.426.3), and with it the associated plugins and the underlying Linux software.

      What is the problem - after the upgrade we started running our unit tests to confirm that our testing platform still operates properly and that the upgrade introduced no regressions. The test suite consists of a number of automatically generated pipelines (each representing one test) that are executed concurrently.

      During test execution, Docker connectivity appears to break down on the worker and the pipeline fails with an InterruptedException. If we run a small set of tests, our jobs pass reliably. If we ramp up and run the full suite (something we can do without problems on the pre-upgrade version), most tests fail with the error shown in the logs below.
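
      For context on the exception itself: the top frames of the trace below show java.lang.Object.wait being called from hudson.remoting.Request.call, i.e. the thread was blocked waiting when it was interrupted. The following is a minimal, standalone Java sketch of that general mechanism only. The class and method names are made up for illustration, it does not use any Jenkins code, and it says nothing about which component issued the interrupt in our case.

```java
final class InterruptedWaitDemo {
    // Blocks a worker thread in Object.wait(), interrupts it from outside,
    // and returns the name of the exception the worker observed.
    static String runDemo() {
        final Object lock = new Object();
        final String[] observed = {"not interrupted"};

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    // Stands in for a blocking call that waits for a reply.
                    // The loop guards against spurious wakeups; only an
                    // interrupt can leave it.
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    observed[0] = e.getClass().getName();
                }
            }
        });

        waiter.start();
        // If the interrupt lands before wait() is reached, wait() throws
        // as soon as it is entered, so the outcome is the same either way.
        waiter.interrupt();
        try {
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("demo thread itself was interrupted", e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints java.lang.InterruptedException
    }
}
```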

      What have we tried - we set the system property org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.REMOTE_TIMEOUT to 60 in an attempt to prevent this issue. That appears to have solved an unrelated issue, but this one remains.

      Logs - below are the logs of our pipeline execution. I have removed everything between the start of the pipeline and the failure inside withDockerContainer. The dockerHelper frame in the stack trace is our Groovy helper (dockerHelper.groovy), which provides the function we call to run steps inside the container.

      14:05:01  [Pipeline] Start of Pipeline
      ...
      14:07:59  [Pipeline] withDockerContainer
      14:07:59  persistent-docker-worker-0 seems to be running inside container <our-container-id>
      14:07:59  but /home/jenkins/agent/workspace/integration-test/helmTest could not be found among []
      14:07:59  but /home/jenkins/agent/workspace/integration-test/helmTest@tmp could not be found among []
      14:07:59  $ docker run -t -d -u 1000:1000 -w /home/jenkins/agent/workspace/integration-test/helmTest -v /home/jenkins/agent/workspace/integration-test/helmTest:/home/jenkins/agent/workspace/integration-test/helmTest:rw,z -v /home/jenkins/agent/workspace/integration-test/helmTest@tmp:/home/jenkins/agent/workspace/integration-test/helmTest@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** artifactory.local/jpm/helm-client:3 cat
      14:12:59  [Pipeline] // withDockerContainer
      ...
      14:13:01  Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: be8f9f02-a5fc-4aae-a610-64176b73c3e1
      14:13:01  java.lang.InterruptedException
      14:13:01        at java.base/java.lang.Object.wait(Native Method)
      14:13:01        at hudson.remoting.Request.call(Request.java:177)
      14:13:01        at hudson.remoting.Channel.call(Channel.java:1002)
      14:13:01        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
      14:13:01        at com.sun.proxy.$Proxy272.join(Unknown Source)
      14:13:01        at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1198)
      14:13:01        at hudson.Proc.joinWithTimeout(Proc.java:172)
      14:13:01        at org.jenkinsci.plugins.docker.workflow.client.DockerClient.launch(DockerClient.java:314)
      14:13:01        at org.jenkinsci.plugins.docker.workflow.client.DockerClient.run(DockerClient.java:144)
      14:13:01        at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Execution.start(WithContainerStep.java:200)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:323)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
      14:13:01        at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)
      14:13:01        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
      14:13:01        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
      14:13:01        at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
      14:13:01        at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(Docker.groovy:140)
      14:13:01        at org.jenkinsci.plugins.docker.workflow.Docker.node(Docker.groovy:66)
      14:13:01        at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(Docker.groovy:125)
      14:13:01        at dockerHelper.runOnContainer(dockerHelper:256)
      14:13:01        at ___cps.transform___(Native Method)
      14:13:01        at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
      14:13:01        at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:116)
      14:13:01        at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:85)
      14:13:01        at jdk.internal.reflect.GeneratedMethodAccessor639.invoke(Unknown Source)
      14:13:01        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      14:13:01        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
      14:13:01        at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
      14:13:01        at com.cloudbees.groovy.cps.impl.ClosureBlock.eval(ClosureBlock.java:46)
      14:13:01        at com.cloudbees.groovy.cps.Next.step(Next.java:83)
      14:13:01        at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
      14:13:01        at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
      14:13:01        at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
      14:13:01        at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
      14:13:01        at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
      14:13:01        at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
      14:13:01        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
      14:13:01        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
      14:13:01        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      14:13:01        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
      14:13:01        at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
      14:13:01        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      14:13:01        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
      14:13:01        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      14:13:01        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      14:13:01        at java.base/java.lang.Thread.run(Unknown Source)
      14:13:01  Finished: FAILURE
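
      One detail that can be read directly from the timestamps above: the docker run command is logged at 14:07:59 and the withDockerContainer block closes at 14:12:59. The small Java sketch below just computes that gap from the console timestamps. The class and method names are illustrative only, and the log alone does not establish which timeout, if any, this interval corresponds to.

```java
import java.time.Duration;
import java.time.LocalTime;

final class StepElapsed {
    // Seconds between two HH:mm:ss console timestamps taken on the same day.
    static long secondsBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).getSeconds();
    }

    public static void main(String[] args) {
        // Timestamps copied from the console log above:
        // "$ docker run ..." line -> "[Pipeline] // withDockerContainer" line.
        System.out.println(secondsBetween("14:07:59", "14:12:59")); // prints 300
    }
}
```

      That is, the step sat for exactly five minutes between launching docker run and being torn down.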
      

            Assignee: Unassigned
            Reporter: gmv