- Bug
- Resolution: Unresolved
- Major
- None
We run jobs that put 600+ builds in the queue. Jenkins starts some of them, then seemingly gets stuck and stops updating anything, even though the resources are ready to run (not blocked).
We use the k8s plugin. One update loop for a label can take a very long time:
Apr 29, 2022 12:02:38 PM FINER hudson.slaves.NodeProvisioner ran update on k8s-beaker in 3378791ms
While this slow update is running, a lock is held and any suggestion to review the label is ignored. The behaviour we notice is a system with a lot of jobs that could run, but Jenkins does not spin up new k8s pods to fulfil the requests until the update eventually completes, and then a bunch of them start. In some of our scenarios it can be stuck like this for about one hour (the update logged above took 3378791 ms, roughly 56 minutes).
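The symptom described above can be modelled with a small, self-contained sketch. This is not the Jenkins `NodeProvisioner` code: the class `LabelReviewSketch`, its method `suggestReview`, and the two counters are hypothetical names invented for illustration. It assumes, as the report describes, that a review request arriving while an update pass holds the lock is simply dropped.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the reported symptom; NOT the actual Jenkins implementation.
public class LabelReviewSketch {
    private final ReentrantLock updateLock = new ReentrantLock();
    final AtomicInteger completedUpdates = new AtomicInteger();
    final AtomicInteger ignoredSuggestions = new AtomicInteger();

    /** Runs one update pass, unless another thread's pass holds the lock. */
    boolean suggestReview(Runnable updatePass) {
        if (!updateLock.tryLock()) {
            // Assumption: a suggestion made during a running update is dropped, not queued.
            ignoredSuggestions.incrementAndGet();
            return false;
        }
        try {
            updatePass.run();
            completedUpdates.incrementAndGet();
            return true;
        } finally {
            updateLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LabelReviewSketch reviewer = new LabelReviewSketch();
        CountDownLatch passStarted = new CountDownLatch(1);
        CountDownLatch releasePass = new CountDownLatch(1);

        // Stand-in for the slow update: it holds the lock until released.
        Thread slow = new Thread(() -> reviewer.suggestReview(() -> {
            passStarted.countDown();
            try {
                releasePass.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        slow.start();
        passStarted.await();

        // New builds arrive while the slow pass is still running: all three are ignored.
        for (int i = 0; i < 3; i++) {
            reviewer.suggestReview(() -> { });
        }

        releasePass.countDown();
        slow.join();

        // Only after the slow pass finishes does a new suggestion get through.
        reviewer.suggestReview(() -> { });
        System.out.println("completed=" + reviewer.completedUpdates.get()
                + " ignored=" + reviewer.ignoredSuggestions.get());
        // prints "completed=2 ignored=3"
    }
}
```

In this model, the longer a single pass takes, the longer queued work goes unnoticed, which matches the observed burst of pods starting once the update completes.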
[JENKINS-68371] NodeProvisioner stuck / slow under load
Component/s | Original: core [ 15593 ] | New: kubernetes-plugin [ 20639 ] |