JENKINS-60507

Pipeline stuck when allocating machine | node block appears to be neither running nor scheduled

      Description

       Our build system sometimes shows the following in the Thread Dump of a Pipeline that is waiting for a free executor:

      Thread #94
      at DSL.node(node block appears to be neither running nor scheduled)
      at WorkflowScript.runOnNode(WorkflowScript:1798)
      at DSL.timeout(body has another 3 hr 14 min to run)
      at WorkflowScript.runOnNode(WorkflowScript:1783)
      at DSL.retry(Native Method)
      at WorkflowScript.runOnNode(WorkflowScript:1781)
      at WorkflowScript.getClosure(WorkflowScript:1901)

       
      Blue Ocean shows the following message, even though the build queue is empty and executors with those labels are available:

      Still waiting to schedule task
      Waiting for next available executor on pr&&prod&&mac&&build
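
      The "queue is empty, executors are available" observation can be cross-checked from the Jenkins script console. The snippet below is a minimal sketch, assuming the standard Jenkins core API (Jenkins.get(), Queue.getItems(), Label.getIdleExecutors()). The label expression is the one from the message above; the output format is only illustrative.

```groovy
import jenkins.model.Jenkins

// Label expression copied from the Blue Ocean message above.
def expr = 'pr&&prod&&mac&&build'
def jenkins = Jenkins.get()

// List everything currently in the build queue and why it is waiting.
def items = jenkins.queue.items
println "Queue items: ${items.length}"
items.each { item ->
    println "  ${item.task.fullDisplayName}: ${item.why}"
}

// Show how many executors match the label expression and how many are idle.
def label = jenkins.getLabel(expr)
if (label == null) {
    println "Label expression could not be resolved: ${expr}"
} else {
    println "Nodes matching ${expr}: ${label.nodes*.displayName}"
    println "Idle executors: ${label.idleExecutors} of ${label.totalExecutors}"
}
```

      If the queue count is zero while the Pipeline still reports that it is waiting for an executor, that would match the "neither running nor scheduled" state shown in the thread dump.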

       

      The job can only be completed by aborting it or by waiting for the timeout step to do its work.
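
      For reference, the frames in the thread dump imply that the node allocation is nested roughly as sketched below: retry outermost, then timeout, then node. This is only a reconstruction from the dump. The function name runOnNode comes from the stack frames, while the retry count, the timeout duration, and the body are placeholders and not the real values.

```groovy
// Sketch of the nesting implied by the thread dump (retry > timeout > node).
// retry(3) and the 4-hour timeout are hypothetical placeholder values.
def runOnNode(String label, Closure body) {
    retry(3) {
        timeout(time: 4, unit: 'HOURS') {
            // The Pipeline hangs here: the node step never gets an executor,
            // so only the surrounding timeout (or a manual abort) ends the wait.
            node(label) {
                body()
            }
        }
    }
}

// Hypothetical usage with the label expression from the Blue Ocean message.
runOnNode('pr&&prod&&mac&&build') {
    echo 'building'
}
```

      With this shape, a stuck node step is only interrupted once the timeout expires, which matches what we see.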

      We first observed it on v2.121.3 (workflow-durable-task-step v2.19). We recently updated to v2.190.1 (workflow-durable-task-step v2.28) and still see pipelines get stuck while waiting for executors.

      The only reference I could find is in the last comment of this issue: https://issues.jenkins-ci.org/browse/JENKINS-42556 and we have not found a way to reproduce the problem. We've noticed this fix made by Jesse Glick, but we are not sure whether it will help us. We tried turning on Anonymous for a week and still saw the problem.

      Please let me know if there is more information or logs I can provide to help track down the cause. Thanks.

      I've attached FINEST-level logs for hudson.model.Queue, though I'm not sure how much they will help.
      Our Jenkins runs on RedHat, on Tomcat/9.0.14 and Java 1.8.0_171.

        Attachments

        1. plugins_versions.txt
          5 kB
          Mihai Stoichitescu
        2. screenshot-1.png
          40 kB
          Mihai Stoichitescu

          Activity

          kdemenkov Konstantin Demenkov added a comment (edited)

          I have the same issue on the latest 2.204.1 LTS. It appears pretty often (10% of jobs) when working with Proxmox slaves over the Proxmox cloud plugin and JNLP. I suspect some incompatibility in the timeout/connection logic between the master and the Proxmox slaves, but I really don't know why it happens.

          stoiky Mihai Stoichitescu added a comment

          We are still being hit by this issue from time to time. Any ideas, workarounds, or help with debugging would be appreciated. Thanks.


            People

            Assignee:
            Unassigned
            Reporter:
            stoiky Mihai Stoichitescu
            Votes:
            1
            Watchers:
            3
