JENKINS-47006

durable-task's BourneShellScript.launchWithCookie trips workflow-cps-plugin's 5-minute timeout


    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Minor

      I ran a pipeline job which failed in this way:

      // [...] Start a VM and have the VM connect back as a swarm slave named "example-eff65ede"
      [Pipeline] node
      22:44:19 Running on example-eff65ede in /var/tmp/jenkins/workspace/devops-gate/master/install-os-usher
      [Pipeline] {
      [Pipeline] stage (Unset publishers)
      22:44:19 Entering stage Unset publishers
      22:44:19 Proceeding
      [Pipeline] sh
      [Pipeline] stage (Send Notifications)
      22:49:19 Using the ‘stage’ step without a block argument is deprecated
      22:49:19 Entering stage Send Notifications
      22:49:19 Proceeding
      22:49:19 Sending email to: example@example.com
      [Pipeline] emailext
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] End of Pipeline
      java.lang.InterruptedException
       at java.lang.Object.wait(Native Method)
       at hudson.remoting.Request.call(Request.java:147)
       at hudson.remoting.Channel.call(Channel.java:829)
       at hudson.FilePath.act(FilePath.java:987)
       at hudson.FilePath.act(FilePath.java:976)
       at hudson.FilePath.chmod(FilePath.java:1592)
       at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:101)
       at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:64)
       at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:167)
       at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:224)
       at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:150)
       at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:108)
       at sun.reflect.GeneratedMethodAccessor1275.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
       at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
       at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1218)
       at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1027)
       at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
       at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
       at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
       at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
       at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
       at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:133)
       at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:153)
       at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:157)
       at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
       at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
       at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
       at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
       at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:127)
       at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
       at WorkflowScript.run(WorkflowScript:65)
       at ___cps.transform___(Native Method)
       at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
       at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
       at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
       at sun.reflect.GeneratedMethodAccessor609.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
       at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:103)
       at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
       at sun.reflect.GeneratedMethodAccessor609.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
       at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:60)
       at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
       at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
       at sun.reflect.GeneratedMethodAccessor609.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
       at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
       at com.cloudbees.groovy.cps.Next.step(Next.java:83)
       at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
       at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
       at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
       at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
       at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
       at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19)
       at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35)
       at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32)
       at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
       at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32)
       at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
       Finished: FAILURE
      

      This is interesting. First, the swarm plugin connected back to the master at 22:44:19, so we successfully entered the node block and started running on the slave. Then we entered an sh step. This step appears to have taken a very long time, and exactly 5 minutes later, at 22:49:19, we got an interrupt.
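The "exactly 5 minutes" figure can be checked directly from the two console timestamps. The following is a small sketch; the class and method names are made up for illustration and are not part of Jenkins.

```java
import java.time.Duration;
import java.time.LocalTime;

public class TimestampGap {
    /** Returns the elapsed time between two HH:mm:ss console timestamps on the same day. */
    static Duration gap(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end));
    }

    public static void main(String[] args) {
        // Timestamps taken from the console log above: entering the sh step,
        // and the next line of output after it.
        Duration elapsed = gap("22:44:19", "22:49:19");
        System.out.println(elapsed); // prints PT5M
    }
}
```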

      At the time of the interrupt, we were in hudson.FilePath.chmod(FilePath.java:1592), called from BourneShellScript.launchWithCookie, which was reached via DurableTaskStep$Execution.start. Essentially, we were starting the execution of the BourneShellScript durable task, and as part of its launch method it was trying to chmod the script to 0755. That call seems to have hung, or at the very least taken quite a long time, until the interrupt arrived five minutes later.

      Where could this interrupt have come from? Further up the stack trace, we see org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174). Looking at the source of that method, JENKINS-32986 introduced a 5-minute timer on each instruction executed in the CPS VM thread. This seems likely to be what caused the interrupt.
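To illustrate the suspected mechanism, here is a minimal sketch of a watchdog timer that interrupts a thread stuck in a blocking call. It is not the workflow-cps code; WatchdogSketch, BlockingCall, and interruptedByWatchdog are hypothetical names, and the timeouts are shortened from 5 minutes to milliseconds so the example finishes quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class WatchdogSketch {
    /** Hypothetical stand-in for a blocking remote call such as FilePath.chmod. */
    interface BlockingCall {
        void run() throws InterruptedException;
    }

    /**
     * Runs body on the calling thread and schedules an interrupt of that thread
     * after timeoutMillis. Returns true if the body was interrupted.
     */
    static boolean interruptedByWatchdog(BlockingCall body, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Thread caller = Thread.currentThread();
        ScheduledFuture<?> watchdog =
                timer.schedule(caller::interrupt, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            body.run();
            return false; // finished before the timer fired
        } catch (InterruptedException e) {
            return true; // the timer fired while the body was still blocked
        } finally {
            watchdog.cancel(false);
            timer.shutdownNow();
            Thread.interrupted(); // clear the interrupt flag if it is still set
        }
    }

    public static void main(String[] args) {
        // A call that blocks far longer than the timeout gets interrupted.
        System.out.println(interruptedByWatchdog(() -> Thread.sleep(5_000), 50));
        // A call that returns promptly does not.
        System.out.println(interruptedByWatchdog(() -> { }, 5_000));
    }
}
```

In the stack trace above, the blocked call is Object.wait inside hudson.remoting.Request.call, which is the kind of call that surfaces such an interrupt as java.lang.InterruptedException.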

      I read JENKINS-32986 as well as the related bug JENKINS-42561. The suggestion there seemed to be to use DurableStep. But I am already using a DurableStep here, in particular the durable-task-plugin and workflow-durable-task-step-plugin. The advice given in JENKINS-42561 was to implement Step directly, but it appears that is already done in BourneShellScript.

      Unfortunately, I don't have the logs from the swarm client side (I am trying to get them), but the VM was running on a network quite far away from the Jenkins master. It was also still booting when we started the swarm client, so performance may have been bad because the machine was still booting, because the network was very poor, or both. Either way, it sounds like the Durable Task Plugin needs to account for the possibility that this work may take a long time and do it outside the CPS VM thread.
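The general pattern being asked for, moving slow work off a time-limited thread, might look roughly like the sketch below. This is only an illustration under stated assumptions: OffloadSketch, BACKGROUND, and launchAsync are hypothetical names, and nothing here reflects how durable-task or workflow-cps are actually structured.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadSketch {
    // Hypothetical background pool; daemon threads so the JVM can exit freely.
    static final ExecutorService BACKGROUND = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "background-launch");
        t.setDaemon(true);
        return t;
    });

    /**
     * Submits slow setup work (for example a remote chmod) to the background pool
     * and returns immediately, so the calling thread is never blocked by it.
     */
    static CompletableFuture<String> launchAsync(Callable<String> slowSetup) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return slowSetup.call();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, BACKGROUND);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = launchAsync(() -> {
            Thread.sleep(200); // stands in for a slow remote operation
            return "launched";
        });
        System.out.println("caller returned, done yet? " + result.isDone());
        System.out.println(result.join());
    }
}
```

The caller gets a future back right away and can be notified on completion, so a per-instruction timer on the calling thread would only cover the submission, not the slow remote call.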

            Assignee: Unassigned
            Reporter: Basil Crow