Type: Bug
Resolution: Unresolved
Priority: Critical
There is a potential deadlock in the KubernetesProvisioningLimits functionality during initialization:
{noformat}
============== Deadlock Found ==============

"jenkins.util.Timer [#3]" id=44 (0x2c) state=WAITING cpu=81%
    - waiting on <0x5bf81c5c> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - locked <0x5bf81c5c> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "Computer.threadPoolForRemoting [#14]" id=257 (0x101)
    at java.base@21.0.7/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@21.0.7/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
    at java.base@21.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
    at java.base@21.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
    at java.base@21.0.7/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
    at java.base@21.0.7/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
    at hudson.model.Queue._withLock(Queue.java:1408)
    at hudson.model.Queue.withLock(Queue.java:1284)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.initInstance(KubernetesProvisioningLimits.java:46)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.register(KubernetesProvisioningLimits.java:78)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.LimitRegistrationResults.register(LimitRegistrationResults.java:29)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud.provision(KubernetesCloud.java:698)
    at hudson.slaves.Cloud.lambda$provision$0(Cloud.java:192)
    at hudson.slaves.Cloud$$Lambda/0x00007a35ad7b7c18.get(Unknown Source)
    at hudson.Util.ifOverridden(Util.java:1553)
    at hudson.slaves.Cloud.provision(Cloud.java:192)
    at PluginClassLoader for kube-agent-management//com.cloudbees.jenkins.plugins.kube.KubernetesNodeProvisionerStrategy.apply(KubernetesNodeProvisionerStrategy.java:128)
    at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:325)
    at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:823)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
    at java.base@21.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base@21.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
    at java.base@21.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@21.0.7/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.7/java.lang.Thread.run(Thread.java:1583)

"Computer.threadPoolForRemoting [#14]" id=257 (0x101) state=BLOCKED cpu=76%
    - waiting to lock <0x07bae317> (a org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits) owned by "jenkins.util.Timer [#3]" id=44 (0x2c)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.unregister(KubernetesProvisioningLimits.java:120)
    at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits$NodeListenerImpl.onDeleted(KubernetesProvisioningLimits.java:169)
    at jenkins.model.NodeListener.lambda$fireOnDeleted$2(NodeListener.java:97)
    at jenkins.model.NodeListener$$Lambda/0x00007a35ad351140.accept(Unknown Source)
    at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
    at jenkins.util.Listeners$$Lambda/0x00007a35acb37708.run(Unknown Source)
    at jenkins.util.Listeners.notify(Listeners.java:70)
    at jenkins.model.NodeListener.fireOnDeleted(NodeListener.java:97)
    at jenkins.model.Nodes.removeNode(Nodes.java:307)
    at jenkins.model.Jenkins.removeNode(Jenkins.java:2197)
    at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda/0x00007a35ac9c1ba0.run(Unknown Source)
    at hudson.model.Queue._withLock(Queue.java:1410)
    at hudson.model.Queue.withLock(Queue.java:1284)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)
    at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda/0x00007a35ac9c1988.run(Unknown Source)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    at jenkins.util.ErrorLoggingExecutorService$$Lambda/0x00007a35ad2c67f0.run(Unknown Source)
    at java.base@21.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.base@21.0.7/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@21.0.7/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.7/java.lang.Thread.run(Thread.java:1583)
{noformat}
This deadlock blocks the Queue lock completely.
The initialization of KubernetesProvisioningLimits synchronizes on the singleton and then acquires the Queue lock. Node terminations, however (which can happen at startup as soon as a RetentionStrategy kicks in), take the Queue lock first and then need the KubernetesProvisioningLimits monitor to unregister the node. The two code paths therefore acquire the same two locks in opposite order, a classic lock-ordering inversion; see the sketch below.
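To make the inverted ordering concrete, here is a minimal standalone sketch of the two acquisition orders. The names are hypothetical stand-ins, not the plugin's actual code: QUEUE_LOCK plays the role of hudson.model.Queue's internal ReentrantLock, and LIMITS plays the role of the KubernetesProvisioningLimits monitor. Run as-is, the two threads park on each other's lock and never finish, mirroring the dump above:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reduction of the deadlock; names are illustrative stand-ins.
public class LockInversionSketch {
    static final ReentrantLock QUEUE_LOCK = new ReentrantLock(); // stands in for hudson.model.Queue's lock
    static final Object LIMITS = new Object();                   // stands in for the KubernetesProvisioningLimits monitor

    public static void main(String[] args) {
        // Provisioning path: synchronized register()/initInstance() -> Queue.withLock(...)
        Thread provisioner = new Thread(() -> {
            synchronized (LIMITS) {        // holds the limits monitor...
                pause();                   // widen the race window for the demo
                QUEUE_LOCK.lock();         // ...then waits for the Queue lock held by the other thread
                try { /* count existing executors */ } finally { QUEUE_LOCK.unlock(); }
            }
        }, "jenkins.util.Timer");

        // Termination path: Queue.withLock(...) -> synchronized unregister() via NodeListener.onDeleted
        Thread terminator = new Thread(() -> {
            QUEUE_LOCK.lock();             // holds the Queue lock...
            try {
                pause();
                synchronized (LIMITS) {    // ...then waits for the limits monitor held by the other thread
                    /* decrement provisioning counters */
                }
            } finally { QUEUE_LOCK.unlock(); }
        }, "Computer.threadPoolForRemoting");

        provisioner.start();
        terminator.start();                // both threads now block forever: deadlock
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}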