Stopped builds keep displaying in executor status


      Regularly, we have builds that remain listed in the executor status pane after they have stopped (success, failure, or cancellation).

      This has two direct effects:

      1. Executors are locked and can't be used by other builds.
      2. We use cloud nodes (Google Compute Engine) with a retention time. As a result, the VMs are not reclaimed, which unnecessarily increases cost by keeping unused cloud resources alive.
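To see which executors are being held, something like the following could be run from the Script Console. It is only a sketch: `busyExecutors` and `report` are hypothetical helper names, and the code assumes the objects passed in expose `name`, `executors`, `busy`, `number` and `currentExecutable` the way `hudson.model.Computer` and `hudson.model.Executor` do. A busy executor reported with no executable may correspond to the stuck lines described here, but that is an assumption, not a confirmed diagnosis.

```groovy
// Hypothetical helper: one row per busy executor, with whatever it claims to run.
// `computers` is anything exposing name/executors (assumed: Jenkins.get().computers).
def busyExecutors(computers) {
    computers.collectMany { computer ->
        computer.executors
            .findAll { it.busy }
            .collect { executor ->
                [node      : computer.name,
                 number    : executor.number,
                 executable: executor.currentExecutable]
            }
    }
}

// Hypothetical helper: turns the rows into printable lines,
// making a missing (null) executable explicit.
def report(rows) {
    rows.collect { row ->
        "${row.node} #${row.number}: ${row.executable ?: 'no executable (null)'}".toString()
    }
}
```

In the Script Console this could be invoked as `report(busyExecutors(jenkins.model.Jenkins.get().computers)).each { println it }`.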

      I don't know what is causing the issue, but here are some notable points:

      • If I click the red cross (in the executor status pane), the confirmation dialog reads: "Are you sure you want to abort null"
      • The red cross link is <JENKINS_URL>/computer/<NODE_NAME>/executors/0/stopBuild?runExtId= (the runExtId parameter is empty)
      • After deleting the node from the node management UI, it appears as disconnected instead of being removed (the VM itself is properly destroyed), with an executor line "Unknown Pipeline node step" (same red cross URL as above)
      • When accessing "Pipeline Steps" for the build, a request processing error occurs (see attached log: 2022-11-07_08-13-55_starving_nodes.log)
      • Looking at the running threads (see the Script Console script below), I have several org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#<NUMBER>] threads waiting (see output below):
      Thread.getAllStackTraces().each { thread,traces -> 
        println "\n ================================ ${thread.name} [${thread.state}] ================================"
        traces.each { println it }
      }
      
       ================================ org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#91456] [TIMED_WAITING] ================================
      java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)
      java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
      java.base@11.0.16.1/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
      java.base@11.0.16.1/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:1218)
      java.base@11.0.16.1/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.poll(ScheduledThreadPoolExecutor.java:899)
      java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1053)
      java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
      java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829)
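To narrow the dump to the threads of interest, the script above could be reduced to a filter on thread name. This is a minimal sketch using only the JDK `Thread` API; `threadsByState` is a hypothetical helper name, and the `'DurableTaskStep'` fragment is taken from the thread names shown above. Note that a trace parked in `DelayedWorkQueue.poll` under `ThreadPoolExecutor.getTask`, as in the output above, is what an idle scheduled-pool worker waiting for its next task looks like, so the number of such threads may not by itself indicate a stuck step.

```groovy
// Hypothetical helper: live threads whose name contains the fragment, grouped by state.
def threadsByState(String nameFragment) {
    Thread.getAllStackTraces().keySet()
        .findAll { it.name.contains(nameFragment) }
        .groupBy { it.state }
}

// Print a per-state summary followed by the matching thread names.
threadsByState('DurableTaskStep').each { state, threads ->
    println "${state}: ${threads.size()} thread(s)"
    threads.each { println "  ${it.name}" }
}
```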
      

            Assignee:
            Unassigned
            Reporter:
            Logan Mzz
            Archiver:
            Jenkins Service Account
