Jenkins version 2.105 (latest)

      All plugins updated

      I have hundreds of pipeline jobs that show as queued against the master but are not assigned to an executor.  When I check each job, the build has already completed (success or failure).  The queue entries are just stuck.  Restarting the master doesn't seem to help.

      Job logs all show evidence of trying to resume the job after master restart:

      [Pipeline] End of Pipeline
      Resuming build at Tue Feb 06 01:27:30 UTC 2018 after Jenkins restart
      [Pipeline] End of Pipeline

      There are a ton of entries in org.jenkinsci.plugins.workflow.flow.FlowExecutionList.xml.  Are these the stuck jobs?  I tried clearing them and restarting the master, but they come back.
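
      One way to check whether those entries line up with the stuck builds is to list them outside Jenkins. The sketch below is only an illustration: it assumes each child of the root element carries a <job> and an <id> element, which is a guess at the file layout and not something confirmed in this ticket, and the sample job names are made up.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# ASSUMPTION: every entry under the root has <job> and <id> children.
# The job names and build ids below are invented sample data.
SAMPLE = """<?xml version='1.1' encoding='UTF-8'?>
<list>
  <flow-owner>
    <job>folder/build-app</job>
    <id>12</id>
  </flow-owner>
  <flow-owner>
    <job>folder/build-app</job>
    <id>13</id>
  </flow-owner>
  <flow-owner>
    <job>deploy</job>
    <id>7</id>
  </flow-owner>
</list>"""


def list_flow_owners(xml_text):
    """Return (job, build_id) pairs for every entry that has both fields."""
    # Strip the XML declaration so parsing does not depend on the
    # declared XML version or encoding.
    body = re.sub(r"^\s*<\?xml.*?\?>", "", xml_text, count=1)
    root = ET.fromstring(body)
    owners = []
    for entry in root:
        job = entry.findtext("job")
        build_id = entry.findtext("id")
        if job is not None and build_id is not None:
            owners.append((job, build_id))
    return owners


def count_by_job(owners):
    """Count how many listed builds belong to each job."""
    return Counter(job for job, _ in owners)


if __name__ == "__main__":
    for job, n in count_by_job(list_flow_owners(SAMPLE)).most_common():
        print(f"{n:5d}  {job}")
```

      Comparing the resulting (job, build id) pairs against the builds shown as queued on the master would show whether the two lists match.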

      Eventually it seems like the master gets overloaded with these stuck jobs, and stops processing or dispatching jobs to slaves.

          [JENKINS-49389] Completed pipeline jobs queued against master

          John Arnold created issue -
          John Arnold made changes -
          Attachment New: thread_dump.txt [ 41341 ]

          John Arnold added a comment - - edited

          Attached a copy-paste of the /threadDump page, which shows both the list of jobs queued against the master, and all the threads associated. Note, there are a ton of these threads:

           

          org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#514]
          "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#514]" Id=7005 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@c7a8190
           at sun.misc.Unsafe.park(Native Method)
           -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@c7a8190
           at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
           at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
           at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:1129)
           at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:809)
           at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
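
          To put a number on "a ton of these threads", the pasted dump can be summarized by thread pool and state. This is a minimal sketch that assumes every thread header follows the shape of the one above ("name" Id=N Group=G STATE ...); the second and third sample threads are invented for illustration.

```python
import re
from collections import Counter

# Matches header lines such as:
#   "pool name [#514]" Id=7005 Group=main TIMED_WAITING on ...
HEADER = re.compile(
    r'^\s*"(?P<name>[^"]+)"\s+Id=(?P<id>\d+)\s+Group=(?P<group>\S+)\s+(?P<state>[A-Z_]+)'
)
# Trailing " [#514]" style counter that distinguishes threads of one pool.
POOL_SUFFIX = re.compile(r"\s*\[#\d+\]$")

# First header is taken from the dump above; the other two are made up.
SAMPLE = '''\
"org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#514]" Id=7005 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@c7a8190
 at sun.misc.Unsafe.park(Native Method)
"org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#515]" Id=7006 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@c7a8190
 at sun.misc.Unsafe.park(Native Method)
"Hypothetical other thread" Id=12 Group=main RUNNABLE
'''


def summarize_threads(dump_text):
    """Count threads per (pool name, thread state) pair."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            pool = POOL_SUFFIX.sub("", match.group("name"))
            counts[(pool, match.group("state"))] += 1
    return counts


if __name__ == "__main__":
    for (pool, state), n in summarize_threads(SAMPLE).most_common():
        print(f"{n:5d}  {state:<14}  {pool}")
```

          Run against the full thread_dump.txt attachment, this would show how many DurableTaskStep threads exist and whether they are all parked in TIMED_WAITING.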
          

           

           

          John Arnold made changes -
          Component/s New: workflow-durable-task-step-plugin [ 21715 ]

          John Arnold added a comment -

          jglick svanoort Can you take a look? I can provide any data.


          John Arnold added a comment -

          I picked a stuck build, did 'Delete Build' in the GUI, and it still shows up as a stuck/queued build under the master.


          John Arnold added a comment -

          It seems like there should be a check: Jenkins should never attempt to resume a completed job.  It also seems like a resume attempt should time out after some low threshold, say 300 seconds by default.
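
          As a rough illustration of the check proposed here, the logic could look like the sketch below. Everything in it is hypothetical: BuildState, resume_decision and the field names are invented for this example and are not Jenkins APIs; only the 300 second default comes from the comment above.

```python
from dataclasses import dataclass
from typing import Optional

# Default threshold suggested in the comment above.
RESUME_TIMEOUT_SECONDS = 300


@dataclass
class BuildState:
    """Hypothetical view of a build; not a Jenkins class."""
    completed: bool
    # Epoch seconds when the resume attempt began, or None if none was started.
    resume_started_at: Optional[float] = None


def resume_decision(build, now, timeout=RESUME_TIMEOUT_SECONDS):
    """Return 'skip', 'abort' or 'resume' for a build after a restart."""
    # A build that already finished must never be resumed.
    if build.completed:
        return "skip"
    # Give up on a resume attempt that has exceeded the timeout.
    if build.resume_started_at is not None and now - build.resume_started_at > timeout:
        return "abort"
    return "resume"


if __name__ == "__main__":
    print(resume_decision(BuildState(completed=True), now=1000.0))
    print(resume_decision(BuildState(False, resume_started_at=0.0), now=1000.0))
    print(resume_decision(BuildState(False, resume_started_at=900.0), now=1000.0))
```

          The completed check comes first so that a finished build is skipped regardless of how long any earlier resume attempt has been running.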

          Oleg Nenashev made changes -
          Component/s New: pipeline [ 21692 ]

          Oleg Nenashev added a comment -

          Added to the Pipeline scrub queue, CC abayer svanoort

          Sam Van Oort made changes -
          Labels New: maybe-fixed-by-durability-megafix

            Assignee: Unassigned
            Reporter: John Arnold (johnar)