Jenkins / JENKINS-73826

Thread deadlock causes Jenkins to become unresponsive after updating the Kubernetes plugin


    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component/s: kubernetes-plugin
    • None

      We are upgrading Jenkins from 2.387.3 to 2.414.3. Immediately after the core upgrade, Jenkins appeared to function normally without any plugin updates.

      As part of the upgrade process, we updated all possible plugins through the UI, which did not display any warnings. Notably, the update included the SnakeYAML plugin. However, upon further investigation, we discovered an issue with our existing Kubernetes plugins, specifically:

      • Kubernetes: 3937.vd7b_82db_e347b_
      • Kubernetes-cli: 1.12.0
      • Kubernetes-client-api: 6.4.1-215.v2ed17097a_8e9
      • Kubernetes-credentials: 0.10.0

      To address this issue, and because we needed to upgrade the Kubernetes plugin anyway, we updated the SSH Credentials, Kubernetes Credentials, Kubernetes, and Kubernetes CLI plugins to the following versions:

      • Kubernetes: 4054.v2da_8e2794884
      • Kubernetes-cli: 1.12.1
      • Kubernetes-client-api: 6.10.0-240.v57880ce8b_0b_2
      • Kubernetes-credentials: 174.va_36e093562d9

      I have attached the plugins.txt files from before and after the Jenkins and plugin update. We used the 2.414.3-lts-rhel-ubi9-jdk17 image for the Jenkins upgrade.

      After running 2-3 jobs, Jenkins becomes unresponsive and displays the following error logs:

      2024-09-27 11:11:11.827+0000 [id=565] WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer [#1] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@1566b90c (owned by Computer.threadPoolForRemoting [#13]):
       at java.base@17.0.8.1/jdk.internal.misc.Unsafe.park(Native Method)
       at java.base@17.0.8.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
       at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
       at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)
       at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
       at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
       at hudson.model.Queue.maintain(Queue.java:1481)
       at hudson.model.Queue$MaintainTask.doRun(Queue.java:2919)
       at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
       at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
       at java.base@17.0.8.1/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
       at java.base@17.0.8.1/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
       at java.base@17.0.8.1/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
       at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)
      , Computer.threadPoolForRemoting [#8] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@1566b90c (owned by Computer.threadPoolForRemoting [#13]):
       at java.base@17.0.8.1/jdk.internal.misc.Unsafe.park(Native Method)
       at java.base@17.0.8.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
       at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
       at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)
       at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
       at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
       at hudson.model.Queue._withLock(Queue.java:1456)
       at hudson.model.Queue.withLock(Queue.java:1314)
       at jenkins.model.Nodes.updateNode(Nodes.java:201)
       at jenkins.model.Jenkins.updateNode(Jenkins.java:2252)
       at hudson.model.Node.save(Node.java:143)
       at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:247)
       at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
       at hudson.slaves.SlaveComputer$$Lambda$999/0x00000008010d5a10.call(Unknown Source)
       at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
       at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
       at java.base@17.0.8.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)
       at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)
      , Computer.threadPoolForRemoting [#13] locked on org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher@517c803b (owned by Computer.threadPoolForRemoting [#8]):
       at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.isLaunchSupported(KubernetesLauncher.java:91)
       at hudson.slaves.SlaveComputer.isLaunchSupported(SlaveComputer.java:247)
       at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.check(OnceRetentionStrategy.java:81)
       at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.check(OnceRetentionStrategy.java:46)
       at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:960)
       at hudson.model.Queue._withLock(Queue.java:1397)
       at hudson.model.Queue.withLock(Queue.java:1271)
       at hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:957)
       at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:147)
       at hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:255)
       at hudson.model.Queue._withLock(Queue.java:1397)
       at hudson.model.Queue.withLock(Queue.java:1271)
       at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
       at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1693)
       at jenkins.model.Nodes$5.run(Nodes.java:279)
       at hudson.model.Queue._withLock(Queue.java:1397)
       at hudson.model.Queue.withLock(Queue.java:1271)
       at jenkins.model.Nodes.removeNode(Nodes.java:270)
       at jenkins.model.Jenkins.removeNode(Jenkins.java:2238)
       at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
       at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)
       at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda$1353/0x00000008013c5220.run(Unknown Source)
       at hudson.model.Queue._withLock(Queue.java:1397)
       at hudson.model.Queue.withLock(Queue.java:1271)
       at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)
       at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda$1352/0x00000008013c4ff8.run(Unknown Source)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
       at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
       at jenkins.util.ErrorLoggingExecutorService$$Lambda$742/0x0000000800e57420.run(Unknown Source)
       at java.base@17.0.8.1/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
       at java.base@17.0.8.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)
       at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)
      2024-09-27 11:12:36.768+0000 [id=575] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #13 from /127.0.0.1:56944 failed: null
      2024-09-27 11:13:40.300+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 5 sec
      2024-09-27 11:13:45.301+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 10 sec
      2024-09-27 11:13:50.302+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 15 sec
      2024-09-27 11:13:55.302+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 20 sec
      2024-09-27 11:14:00.303+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 25 sec
      2024-09-27 11:14:05.303+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 30 sec
      2024-09-27 11:14:10.304+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 35 sec
       Computer.threadPoolForRemoting [#13] locked on org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher@517c803b (owned by Computer.threadPoolForRemoting [#8]):
       at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.isLaunchSupported(KubernetesLauncher.java:91)
      2024-09-27 11:14:15.304+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 40 sec
      2024-09-27 11:14:20.305+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 45 sec
      2024-09-27 11:14:25.306+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 50 sec
      2024-09-27 11:14:30.306+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 55 sec
      2024-09-27 11:14:35.307+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15]] unresponsive for 1 min 0 sec
      2024-09-27 11:14:36.768+0000 [id=587] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #14 from /127.0.0.1:56958 failed: null
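
      Reading the health check output above, the cycle appears to be a lock-order inversion: threadPoolForRemoting #8 owns the KubernetesLauncher monitor and waits for the Queue lock inside KubernetesLauncher.launch, while threadPoolForRemoting #13 owns the Queue lock and waits for the same KubernetesLauncher monitor inside isLaunchSupported. The following is a minimal standalone sketch of that pattern, not the plugin's code. The class LockCycleSketch, the nested Launcher class, and the thread names are hypothetical stand-ins; only the java.util.concurrent and java.lang.management calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockCycleSketch {

    // Hypothetical stand-in for a launcher whose methods are synchronized.
    static class Launcher {
        private final ReentrantLock queueLock; // stand-in for the Queue lock

        Launcher(ReentrantLock queueLock) {
            this.queueLock = queueLock;
        }

        // Holds the launcher monitor, then asks for the queue lock.
        synchronized void launch(CountDownLatch monitorHeld, CountDownLatch queueHeld)
                throws InterruptedException {
            monitorHeld.countDown();
            queueHeld.await();   // wait until the other thread owns the queue lock
            queueLock.lock();    // blocks forever: queue-thread owns it
            queueLock.unlock();
        }

        synchronized boolean isLaunchSupported() {
            return true;
        }
    }

    // Builds the cycle on two daemon threads and returns the names of those
    // threads once the JVM reports them as deadlocked (empty list on timeout).
    static List<String> reproduceAndDetect() throws InterruptedException {
        ReentrantLock queueLock = new ReentrantLock();
        Launcher launcher = new Launcher(queueLock);
        CountDownLatch monitorHeld = new CountDownLatch(1);
        CountDownLatch queueHeld = new CountDownLatch(1);

        Thread a = new Thread(() -> {
            try {
                launcher.launch(monitorHeld, queueHeld);
            } catch (InterruptedException ignored) {
                // sketch only
            }
        }, "launcher-thread");

        Thread b = new Thread(() -> {
            queueLock.lock();    // holds the queue lock first
            try {
                queueHeld.countDown();
                monitorHeld.await();
                launcher.isLaunchSupported(); // blocks: monitor owned by launcher-thread
            } catch (InterruptedException ignored) {
                // sketch only
            } finally {
                queueLock.unlock();
            }
        }, "queue-thread");

        // Daemon threads so the JVM can still exit despite the deadlock.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 100 && names.isEmpty(); i++) {
            Thread.sleep(50);
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids == null) {
                continue;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null
                        && (info.getThreadId() == a.getId() || info.getThreadId() == b.getId())) {
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + reproduceAndDetect());
    }
}
```

      In this sketch the two threads take the same pair of locks in opposite order, which is the usual cause of such a cycle. The general remedies are to acquire the locks in one consistent order or to avoid calling into code that takes the second lock while holding the first; whether either applies to the plugin itself is not something the log alone can confirm.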
      

       

      Kindly help us in resolving the issue as this is blocking our upgrade. 

        1. plugins.txt
          11 kB
          Bhavani
        2. pluginsbeforeupgrade.txt
          5 kB
          Bhavani

            skbarkat444 Barkat Shaik
            bhavani_indukuri Bhavani
            Votes: 0
            Watchers: 11
