Attempting to abort an input step hangs waiting for metadata

      Observed in a thread dump:

      "Running CpsFlowExecution[Owner[.../...:... #...]]" id=... state=WAITING cpu=70%
          - waiting on <0x...> (a com.google.common.util.concurrent.AbstractFuture$Sync)
          - locked <0x...> (a com.google.common.util.concurrent.AbstractFuture$Sync)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
          at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
          at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.loadExecutions(InputAction.java:69)
          at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.remove(InputAction.java:141)
            - locked org.jenkinsci.plugins.workflow.support.steps.input.InputAction@4f8d2160
          at org.jenkinsci.plugins.workflow.support.steps.input.InputStepExecution.postSettlement(InputStepExecution.java:222)
          at org.jenkinsci.plugins.workflow.support.steps.input.InputStepExecution.doAbort(InputStepExecution.java:191)
          at org.jenkinsci.plugins.workflow.support.steps.input.InputStepExecution.stop(InputStepExecution.java:80)
          at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$1.onSuccess(CpsBodyExecution.java:210)
          at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$1.onSuccess(CpsBodyExecution.java:199)
          at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:568)
          at ...
      

      InputAction.loadExecutions currently needs to use a weak API which forces it to block. See discussion here.
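The difference between a blocking lookup and a callback-based one can be sketched as follows. This is a minimal illustration only: it uses the JDK's CompletableFuture as a stand-in for the Guava future seen in the trace, and the class and method names (BlockingVsCallback, loadBlocking, loadAsync) are hypothetical, not the plugin's code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockingVsCallback {
    /** Parks the calling thread until the future completes; returns null if the timeout expires first. */
    static String loadBlocking(CompletableFuture<String> metadata, long timeoutMillis) {
        try {
            return metadata.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // an untimed get() would simply stay parked here
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Registers a callback and returns at once; the caller is never parked. */
    static void loadAsync(CompletableFuture<String> metadata, List<String> sink) {
        metadata.thenAccept(sink::add);
    }

    public static void main(String[] args) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        List<String> loaded = new CopyOnWriteArrayList<>();

        loadAsync(pending, loaded);                    // returns immediately
        System.out.println(loadBlocking(pending, 50)); // waits, then gives up
        pending.complete("execution-1");               // callback fires now
        System.out.println(loaded);
    }
}
```

The untimed get() in the trace has no such escape hatch, which is why the thread stays in WAITING if the future is never completed.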

          [JENKINS-37154] Attempting to abort an input step hangs waiting for metadata

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/support/steps/input/InputAction.java
          http://jenkins-ci.org/commit/pipeline-input-step-plugin/6efdc1fd5c8abd4daa840f4bc938d901e80cabdd
          Log:
          Noting JENKINS-37154.

          Jesse Glick added a comment -

          Worse is that this seems to cause a pile-on hang in many threads in the stage view for the job, even if only one build is so affected:

          "Handling GET /job/…/wfapi/runs from … : RequestHandlerThread[#…]" id=… state=BLOCKED cpu=87%
              - waiting to lock <0x…> (a org.jenkinsci.plugins.workflow.support.steps.input.InputAction)
                owned by "Running CpsFlowExecution[Owner[…/…:… #…]]" id=…
              at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.getExecutions(InputAction.java:133)
              at com.cloudbees.workflow.rest.external.RunExt.isPendingInput(RunExt.java:386)
              at com.cloudbees.workflow.rest.external.RunExt.initStatus(RunExt.java:403)
              at com.cloudbees.workflow.rest.external.RunExt.createOld(RunExt.java:319)
              at com.cloudbees.workflow.rest.external.RunExt.create(RunExt.java:303)
              at com.cloudbees.workflow.rest.external.JobExt.create(JobExt.java:126)
              at com.cloudbees.workflow.rest.endpoints.JobAPI.doRuns(JobAPI.java:68)
          

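The pile-on pattern in the two traces (one thread waiting on a future while holding the InputAction monitor, other threads BLOCKED on that monitor) can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the plugin's code: PileOnDemo and its methods are hypothetical, and CompletableFuture stands in for the Guava future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

class PileOnDemo {
    private final CompletableFuture<Void> metadata = new CompletableFuture<>();
    private final CountDownLatch insideMonitor = new CountDownLatch(1);

    /** Holds the object's monitor for as long as the future is incomplete. */
    synchronized void removeBlocking() {
        insideMonitor.countDown();
        metadata.join(); // waits while still owning the monitor
    }

    /** Cheap read, but it needs the same monitor. */
    synchronized int getExecutions() {
        return 0;
    }

    /** Returns the reader thread's state while the writer waits inside the monitor. */
    static Thread.State observeReaderState() {
        PileOnDemo action = new PileOnDemo();
        Thread writer = new Thread(action::removeBlocking, "writer");
        Thread reader = new Thread(action::getExecutions, "reader");
        try {
            writer.start();
            action.insideMonitor.await(); // writer now owns the monitor
            reader.start();
            Thread.State state;
            long deadline = System.nanoTime() + 2_000_000_000L;
            // Poll until the reader has reached the monitor (or two seconds pass).
            while ((state = reader.getState()) != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
            action.metadata.complete(null); // let both threads finish
            writer.join();
            reader.join();
            return state;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observeReaderState());
    }
}
```

Every additional reader would queue up the same way, which matches the many request-handler threads stuck in getExecutions.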
          Jesse Glick added a comment -

          It seems that under certain conditions, this hang can occur simply by trying to abort a WorkflowRun paused in input after a restart. It seems to happen only if loadExecutions had not been called beforehand (for example, the UI for the build was never displayed) and the input step was inside some block-scoped step. Even then it does not happen consistently, so evidently a race condition is at play.

          Jesse Glick added a comment -

          Occasional deadlock in a functional test I added:

          "Running CpsFlowExecution[Owner[p/1:p #1]]" #47 daemon prio=5 os_prio=0 tid=0x00007f6c6c024800 nid=0x2da6 waiting on condition [0x00007f6c62778000]
             java.lang.Thread.State: WAITING (parking)
          	at sun.misc.Unsafe.park(Native Method)
          	- parking to wait for  <0x0000000775538078> (a com.google.common.util.concurrent.AbstractFuture$Sync)
          	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
          	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
          	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
          	at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.loadExecutions(InputAction.java:66)
          	- locked <0x000000076dc6f3d8> (a org.jenkinsci.plugins.workflow.support.steps.input.InputAction)
          	at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.remove(InputAction.java:138)
          	- locked <0x000000076dc6f3d8> (a org.jenkinsci.plugins.workflow.support.steps.input.InputAction)
          	at org.jenkinsci.plugins.workflow.support.steps.input.InputStepExecution.postSettlement(InputStepExecution.java:220)
          	at org.jenkinsci.plugins.workflow.support.steps.input.InputStepExecution.doAbort(InputStepExecution.java:188)
          	at org.jenkinsci.plugins.workflow.support.steps.input.InputStepExecution.stop(InputStepExecution.java:80)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:795)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:789)
          	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:150)
          	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
          	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
          	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
          	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
          	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:662)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:649)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:586)
          	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:32)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:745)
          

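One possible reading of this trace: the callback runs on the single-lane CPS VM thread and then waits for a future. If that future can only be completed by work queued on the same lane (an assumption; the trace alone does not prove it), the wait can never end. The sketch below shows that general shape with a plain JDK single-thread executor. SingleLaneDeadlock and innerWaitTimesOut are hypothetical names, and a timeout is used so the example terminates.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SingleLaneDeadlock {
    /**
     * Runs a task on a single-thread executor that queues a second task on the
     * same executor and waits for it. Returns true if that wait timed out.
     */
    static boolean innerWaitTimesOut(long timeoutMillis) {
        ExecutorService lane = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> outer = lane.submit(() -> {
                // Queued behind the task that is currently running on the only thread.
                Future<String> inner = lane.submit(() -> "metadata");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true; // an untimed get() would wait forever
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            lane.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(innerWaitTimesOut(100));
    }
}
```

The inner task cannot start until the outer one returns, so the outer wait always times out; without the timeout the lane would be stuck for good.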
          Jesse Glick added a comment -

          My first attempt was to define

          class FlowExecutionList {
            // …
            public Iterable<FlowExecutionOwner> getOwners() {/* … */}
          }
          class FlowExecutionOwner {
            // …
            public @Nonnull ListenableFuture<FlowExecution> getPromise() {/* … */}
          }
          

          and to call these things from InputAction.onLoad, using Futures.addCallback to nest asynchronous stuff. This failed with a StackOverflowError in spite of WorkflowRun.LOADING_RUNS:

          at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:470)
          at hudson.model.RunMap.retrieve(RunMap.java:224)
          at hudson.model.RunMap.retrieve(RunMap.java:56)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
          at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:367)
          at hudson.model.RunMap.getById(RunMap.java:204)
          at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:723)
          at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.getExecutable(WorkflowRun.java:773)
          at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.onLoad(InputAction.java:57)
          at hudson.model.Run.onLoad(Run.java:346)
          at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:470)
          

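The recursion in this trace (onLoad asks for the run by number, which triggers a load, which calls onLoad again) has the general shape sketched below. This is only an illustration of a re-entrant lazy load and of what an in-progress guard is meant to prevent. ReentrantLoadDemo and its members are hypothetical, the real mechanics of WorkflowRun.LOADING_RUNS are not shown in this issue, and the sketch does not explain why that guard did not help here. A call counter replaces the real StackOverflowError to keep the example safe to run.

```java
import java.util.HashMap;
import java.util.Map;

class ReentrantLoadDemo {
    private final Map<Integer, Object> loaded = new HashMap<>();
    private final Map<Integer, Object> inProgress = new HashMap<>();
    private final boolean guard;
    int loadCalls = 0;

    ReentrantLoadDemo(boolean guard) {
        this.guard = guard;
    }

    Object getByNumber(int n) {
        Object run = loaded.get(n);
        if (run != null) {
            return run;
        }
        if (guard && inProgress.containsKey(n)) {
            return inProgress.get(n); // hand back the instance still being loaded
        }
        return load(n);
    }

    private Object load(int n) {
        if (++loadCalls > 100) {
            // Stand-in for the StackOverflowError seen in the trace.
            throw new IllegalStateException("runaway recursion");
        }
        Object run = new Object();
        inProgress.put(n, run);
        try {
            onLoad(n);          // may call back into getByNumber(n)
            loaded.put(n, run); // only stored after onLoad returns
            return run;
        } finally {
            inProgress.remove(n);
        }
    }

    /** Mimics an action that looks up its own run while the run is loading. */
    private void onLoad(int n) {
        getByNumber(n);
    }

    public static void main(String[] args) {
        ReentrantLoadDemo guarded = new ReentrantLoadDemo(true);
        guarded.getByNumber(1);
        System.out.println("guarded load calls: " + guarded.loadCalls);
        try {
            new ReentrantLoadDemo(false).getByNumber(1);
        } catch (IllegalStateException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
    }
}
```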
          My second attempt was to call loadExecutions in a background thread from onLoaded, in the hope that it would complete before we try to use executions. This sporadically failed, as it seems to have gotten a CpsFlowExecution on which onLoad had not yet been called.
