Jenkins / JENKINS-75827

Deadlock on KubernetesProvisioningLimits during initialization


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component/s: kubernetes-plugin
    • Labels: None

      There is a potential deadlock in KubernetesProvisioningLimits during initialization:

      ==============
      Deadlock Found
      ==============
      "jenkins.util.Timer [#3]" id=44 (0x2c) state=WAITING cpu=81%
          - waiting on <0x5bf81c5c> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x5bf81c5c> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Computer.threadPoolForRemoting [#14]" id=257 (0x101)
          at java.base@21.0.7/jdk.internal.misc.Unsafe.park(Native Method)
          at java.base@21.0.7/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
          at java.base@21.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
          at java.base@21.0.7/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
          at java.base@21.0.7/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
          at java.base@21.0.7/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
          at hudson.model.Queue._withLock(Queue.java:1408)
          at hudson.model.Queue.withLock(Queue.java:1284)
          at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.initInstance(KubernetesProvisioningLimits.java:46)
          at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.register(KubernetesProvisioningLimits.java:78)
          at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.LimitRegistrationResults.register(LimitRegistrationResults.java:29)
          at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud.provision(KubernetesCloud.java:698)
          at hudson.slaves.Cloud.lambda$provision$0(Cloud.java:192)
          at hudson.slaves.Cloud$$Lambda/0x00007a35ad7b7c18.get(Unknown Source)
          at hudson.Util.ifOverridden(Util.java:1553)
          at hudson.slaves.Cloud.provision(Cloud.java:192)
          at PluginClassLoader for kube-agent-management//com.cloudbees.jenkins.plugins.kube.KubernetesNodeProvisionerStrategy.apply(KubernetesNodeProvisionerStrategy.java:128)
          at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:325)
          at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:823)
          at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
          at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
          at java.base@21.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
          at java.base@21.0.7/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
          at java.base@21.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
          at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
          at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
          at java.base@21.0.7/java.lang.Thread.runWith(Thread.java:1596)
          at java.base@21.0.7/java.lang.Thread.run(Thread.java:1583)
      
      "Computer.threadPoolForRemoting [#14]" id=257 (0x101) state=BLOCKED cpu=76%
          - waiting to lock <0x07bae317> (a org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits)
            owned by "jenkins.util.Timer [#3]" id=44 (0x2c)
          at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits.unregister(KubernetesProvisioningLimits.java:120)
          at PluginClassLoader for kubernetes//org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits$NodeListenerImpl.onDeleted(KubernetesProvisioningLimits.java:169)
          at jenkins.model.NodeListener.lambda$fireOnDeleted$2(NodeListener.java:97)
          at jenkins.model.NodeListener$$Lambda/0x00007a35ad351140.accept(Unknown Source)
          at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
          at jenkins.util.Listeners$$Lambda/0x00007a35acb37708.run(Unknown Source)
          at jenkins.util.Listeners.notify(Listeners.java:70)
          at jenkins.model.NodeListener.fireOnDeleted(NodeListener.java:97)
          at jenkins.model.Nodes.removeNode(Nodes.java:307)
          at jenkins.model.Jenkins.removeNode(Jenkins.java:2197)
          at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
          at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)
          at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda/0x00007a35ac9c1ba0.run(Unknown Source)
          at hudson.model.Queue._withLock(Queue.java:1410)
          at hudson.model.Queue.withLock(Queue.java:1284)
          at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)
          at PluginClassLoader for durable-task//org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda/0x00007a35ac9c1988.run(Unknown Source)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
          at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
          at jenkins.util.ErrorLoggingExecutorService$$Lambda/0x00007a35ad2c67f0.run(Unknown Source)
          at java.base@21.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
          at java.base@21.0.7/java.util.concurrent.FutureTask.run(FutureTask.java:317)
          at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
          at java.base@21.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
          at java.base@21.0.7/java.lang.Thread.runWith(Thread.java:1596)
          at java.base@21.0.7/java.lang.Thread.run(Thread.java:1583)
      

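      Each thread holds the lock the other one needs, so neither can proceed. The Queue lock is therefore never released, and every other thread that needs it blocks as well.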

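      The initialization of KubernetesProvisioningLimits (KubernetesProvisioningLimits.initInstance, reached from KubernetesCloud.provision on the jenkins.util.Timer thread) synchronizes on the object and then acquires the Queue lock.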

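      Node terminations, however, take the Queue lock first and then need the KubernetesProvisioningLimits lock to unregister the node: OnceRetentionStrategy calls AbstractCloudSlave.terminate inside Queue.withLock, which fires NodeListener.onDeleted and then KubernetesProvisioningLimits.unregister. Such terminations can happen at startup, as soon as the retention strategies kick in.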
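      The two opposite lock orders can be reproduced outside Jenkins. The following is a minimal sketch, not the plugin's code: a plain Object monitor stands in for KubernetesProvisioningLimits and a ReentrantLock stands in for the lock behind Queue.withLock; the class name, method names and thread names are hypothetical. A latch forces the interleaving seen in the dump, and the JDK's ThreadMXBean.findDeadlockedThreads() is used to confirm the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDeadlockDemo {

    /** Runs both lock orders concurrently; returns true if the JVM reports a deadlock. */
    static boolean reproduce() {
        // Hypothetical stand-ins: `limits` plays the KubernetesProvisioningLimits monitor,
        // `queueLock` plays the ReentrantLock behind Queue.withLock.
        Object limits = new Object();
        ReentrantLock queueLock = new ReentrantLock();
        // Forces the bad interleaving: each thread takes its first lock before either tries its second.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Provisioning path (cf. initInstance): monitor first, then the Queue lock.
        Thread provisioner = new Thread(() -> {
            synchronized (limits) {
                arriveAndAwait(bothHoldFirstLock);
                queueLock.lock();
                queueLock.unlock();
            }
        }, "provisioner");

        // Termination path (cf. onDeleted -> unregister): Queue lock first, then the monitor.
        Thread terminator = new Thread(() -> {
            queueLock.lock();
            try {
                arriveAndAwait(bothHoldFirstLock);
                synchronized (limits) {
                    // the node would be unregistered here
                }
            } finally {
                queueLock.unlock();
            }
        }, "terminator");

        for (Thread t : new Thread[] {provisioner, terminator}) {
            t.setDaemon(true); // the threads never finish; do not keep the JVM alive
            t.start();
        }

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) { // poll for up to about 5 seconds
            // Detects cycles through both object monitors and ReentrantLocks; null if none.
            long[] deadlocked = mx.findDeadlockedThreads();
            if (deadlocked != null && deadlocked.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Signals arrival, then waits until the other thread has also taken its first lock.
    private static void arriveAndAwait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

      Running it with `java LockOrderDeadlockDemo.java` leaves the two threads stuck in the same shape as "jenkins.util.Timer [#3]" and "Computer.threadPoolForRemoting [#14]" above: one BLOCKED on the monitor, one WAITING on the ReentrantLock. In general, a cycle like this disappears once every path acquires the two locks in the same order.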

            jgarciacloudbees Javier García
            allan_burdajewicz Allan BURDAJEWICZ
            Votes: 0
            Watchers: 3
