Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Component/s: core, workflow-durable-task-step-plugin
Labels: None
Description
Our build system sometimes shows the following in the Thread Dump of a Pipeline while it is waiting for free executors:
Thread #94
at DSL.node(node block appears to be neither running nor scheduled)
at WorkflowScript.runOnNode(WorkflowScript:1798)
at DSL.timeout(body has another 3 hr 14 min to run)
at WorkflowScript.runOnNode(WorkflowScript:1783)
at DSL.retry(Native Method)
at WorkflowScript.runOnNode(WorkflowScript:1781)
at WorkflowScript.getClosure(WorkflowScript:1901)
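For triage, a saved copy of a Pipeline thread dump can be scanned for this state. The following is only a minimal sketch: the function name `find_stuck_node_steps` and the `SAMPLE_DUMP` constant are hypothetical, and it assumes the dump has been copied as plain text in the same layout as the excerpt above.

```python
# Marker text copied verbatim from the thread dump in this report.
STUCK_MARKER = "node block appears to be neither running nor scheduled"

# Sample input: the thread dump excerpt from this report.
SAMPLE_DUMP = """\
Thread #94
at DSL.node(node block appears to be neither running nor scheduled)
at WorkflowScript.runOnNode(WorkflowScript:1798)
at DSL.timeout(body has another 3 hr 14 min to run)
at WorkflowScript.runOnNode(WorkflowScript:1783)
at DSL.retry(Native Method)
at WorkflowScript.runOnNode(WorkflowScript:1781)
at WorkflowScript.getClosure(WorkflowScript:1901)
"""


def find_stuck_node_steps(thread_dump):
    """Return (thread header, caller frame) pairs for node steps carrying the marker.

    Assumes each thread starts with a 'Thread #N' line and each frame
    starts with 'at ', as in the excerpt above.
    """
    lines = [ln.strip() for ln in thread_dump.splitlines() if ln.strip()]
    results = []
    current_thread = ""
    for i, line in enumerate(lines):
        if line.startswith("Thread #"):
            current_thread = line
        elif line.startswith("at DSL.node(") and STUCK_MARKER in line:
            # The next frame, if any, is the script location that called node.
            caller = ""
            if i + 1 < len(lines) and lines[i + 1].startswith("at "):
                caller = lines[i + 1]
            results.append((current_thread, caller))
    return results


if __name__ == "__main__":
    for thread, caller in find_stuck_node_steps(SAMPLE_DUMP):
        print(thread, "->", caller)
```

A non-empty result only means the dump contains the same message reported here; it does not by itself explain why the node block was never scheduled.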
Blue Ocean shows the following message, even though the build queue is empty and executors with those labels are available:
Still waiting to schedule task
Waiting for next available executor on pr&&prod&&mac&&build
The job can only be completed by aborting it or by waiting for the timeout step to do its work.
We first observed this on v2.121.3 (workflow-durable-task-step v2.19). We recently updated to v2.190.1 (workflow-durable-task-step v2.28) and still see pipelines getting stuck while waiting for executors.
The only reference I could find is the last comment on this issue: https://issues.jenkins-ci.org/browse/JENKINS-42556, and we have no way to reproduce the problem. We noticed this fix made by Jesse Glick, but we are not sure whether it will help us. We tried turning on Anonymous for a week and still saw the problem.
Please let me know if there is any more information or logs I can provide to help track down the cause. Thanks.
I've attached FINEST-level logs for hudson.model.Queue, though I'm not sure how much they will help.
Our Jenkins runs on Red Hat, on Tomcat/9.0.14 and Java 1.8.0_171.
I have the same issue on the latest LTS, 2.204.1. It appears fairly often (10% of jobs) when working with Proxmox slaves connected through the Proxmox cloud plugin and JNLP. I suspect some incompatibility in the timeout or connection logic between the master and the Proxmox slaves, but I really don't know why it happens.