[JENKINS-73826] Thread deadlock causing Jenkins to become unresponsive after updating the Kubernetes plugin

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • kubernetes-plugin
    • None

      We are currently in the process of upgrading our Jenkins version from 2.387.3 to 2.414.3. Following the upgrade, Jenkins appears to be functioning normally without requiring any plugin updates.

      As part of the upgrade process, we updated all possible plugins through the UI, which did not display any warnings. Notably, the update included the SnakeYAML plugin. However, upon further investigation, we discovered an issue with our existing Kubernetes plugins, specifically:

      • Kubernetes: 3937.vd7b_82db_e347b_
      • Kubernetes-cli: 1.12.0
      • Kubernetes-client-api: 6.4.1-215.v2ed17097a_8e9
      • Kubernetes-credentials: 0.10.0

      To address this issue, and because we need to upgrade the Kubernetes plugin anyway, we updated the SSH Credentials, Kubernetes Credentials, Kubernetes, and Kubernetes CLI plugins to the following versions:

      • Kubernetes: 4054.v2da_8e2794884
      • Kubernetes-cli: 1.12.1
      • Kubernetes-client-api: 6.10.0-240.v57880ce8b_0b_2
      • Kubernetes-credentials: 174.va_36e093562d9

      I have attached the plugins.txt files from before and after the Jenkins and plugin update. We used the 2.414.3-lts-rhel-ubi9-jdk17 image for the Jenkins upgrade.
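
      A before/after comparison of two plugins.txt files can be scripted. The following is a minimal sketch, not part of the original report: it assumes each line has the form plugin-id:version (the attached files are not shown here, so that format is an assumption), and the class name PluginVersionDiff, its methods, and the lowercase plugin IDs are illustrative. The sample data reuses the versions listed above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: compares two plugin lists in the assumed "plugin-id:version" format.
final class PluginVersionDiff {

    /** Parses "plugin-id:version" lines; lines without a usable colon are skipped. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> versions = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            int colon = trimmed.indexOf(':');
            if (colon <= 0) {
                continue; // blank line, or no plugin id before the colon
            }
            versions.put(trimmed.substring(0, colon), trimmed.substring(colon + 1));
        }
        return versions;
    }

    /** Returns "old -> new" for every plugin whose version differs or that exists on one side only. */
    static Map<String, String> changed(Map<String, String> before, Map<String, String> after) {
        TreeSet<String> ids = new TreeSet<>(before.keySet());
        ids.addAll(after.keySet());
        Map<String, String> result = new TreeMap<>();
        for (String id : ids) {
            String oldVersion = before.getOrDefault(id, "(absent)");
            String newVersion = after.getOrDefault(id, "(absent)");
            if (!oldVersion.equals(newVersion)) {
                result.put(id, oldVersion + " -> " + newVersion);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Versions taken from the lists in this issue; the IDs are assumed spellings.
        List<String> before = List.of(
                "kubernetes:3937.vd7b_82db_e347b_",
                "kubernetes-cli:1.12.0",
                "kubernetes-client-api:6.4.1-215.v2ed17097a_8e9",
                "kubernetes-credentials:0.10.0");
        List<String> after = List.of(
                "kubernetes:4054.v2da_8e2794884",
                "kubernetes-cli:1.12.1",
                "kubernetes-client-api:6.10.0-240.v57880ce8b_0b_2",
                "kubernetes-credentials:174.va_36e093562d9");
        changed(parse(before), parse(after))
                .forEach((id, change) -> System.out.println(id + ": " + change));
    }
}
```

      To run it against real files, the two lists could be loaded with java.nio.file.Files.readAllLines instead of being written inline.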

      After running 2-3 jobs, Jenkins becomes unresponsive and displays the following error logs:

      2024-09-27 11:11:11.827+0000 [id=565] WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer 1 locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@1566b90c (owned by Computer.threadPoolForRemoting 13):

      at java.base@17.0.8.1/jdk.internal.misc.Unsafe.park(Native Method)

      at java.base@17.0.8.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)

      at hudson.model.Queue.maintain(Queue.java:1481)

      at hudson.model.Queue$MaintainTask.doRun(Queue.java:2919)

      at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)

      at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)

      at java.base@17.0.8.1/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)

      at java.base@17.0.8.1/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)

      at java.base@17.0.8.1/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

      at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)

      , Computer.threadPoolForRemoting 8 locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@1566b90c (owned by Computer.threadPoolForRemoting 13):

      at java.base@17.0.8.1/jdk.internal.misc.Unsafe.park(Native Method)

      at java.base@17.0.8.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)

      at hudson.model.Queue._withLock(Queue.java:1456)

      at hudson.model.Queue.withLock(Queue.java:1314)

      at jenkins.model.Nodes.updateNode(Nodes.java:201)

      at jenkins.model.Jenkins.updateNode(Jenkins.java:2252)

      at hudson.model.Node.save(Node.java:143)

      at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:247)

      at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)

      at hudson.slaves.SlaveComputer$$Lambda$999/0x00000008010d5a10.call(Unknown Source)

      at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)

      at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)

      at java.base@17.0.8.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

      at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)

      , Computer.threadPoolForRemoting 13 locked on org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher@517c803b (owned by Computer.threadPoolForRemoting 8):

      at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.isLaunchSupported(KubernetesLauncher.java:91)

      at hudson.slaves.SlaveComputer.isLaunchSupported(SlaveComputer.java:247)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.check(OnceRetentionStrategy.java:81)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.check(OnceRetentionStrategy.java:46)

      at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:960)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:957)

      at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:147)

      at hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:255)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)

      at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1693)

      at jenkins.model.Nodes$5.run(Nodes.java:279)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at jenkins.model.Nodes.removeNode(Nodes.java:270)

      at jenkins.model.Jenkins.removeNode(Jenkins.java:2238)

      at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda$1353/0x00000008013c5220.run(Unknown Source)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda$1352/0x00000008013c4ff8.run(Unknown Source)

      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)

      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)

      at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)

      at jenkins.util.ErrorLoggingExecutorService$$Lambda$742/0x0000000800e57420.run(Unknown Source)

      at java.base@17.0.8.1/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)

      at java.base@17.0.8.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

      at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)

       

      2024-09-27 11:12:36.768+0000 [id=575] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #13 from /127.0.0.1:56944 failed: null

      2024-09-27 11:13:40.300+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 5 sec

      2024-09-27 11:13:45.301+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 10 sec

      2024-09-27 11:13:50.302+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 15 sec

      2024-09-27 11:13:55.302+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 20 sec

      2024-09-27 11:14:00.303+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 25 sec

      2024-09-27 11:14:05.303+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 30 sec

      2024-09-27 11:14:10.304+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 35 sec

      Computer.threadPoolForRemoting 13 locked on org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher@517c803b (owned by Computer.threadPoolForRemoting 8):

      at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.isLaunchSupported(KubernetesLauncher.java:91)

      2024-09-27 11:14:15.304+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 40 sec

      2024-09-27 11:14:20.305+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 45 sec

      2024-09-27 11:14:25.306+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 50 sec

      2024-09-27 11:14:30.306+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 55 sec

      2024-09-27 11:14:35.307+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 1 min 0 sec

      2024-09-27 11:14:36.768+0000 [id=587] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #14 from /127.0.0.1:56958 failed: null
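
      The thread-deadlock health check above reads as a lock-order inversion: threadPoolForRemoting 8 is inside KubernetesLauncher.launch and waits for the Queue lock, while threadPoolForRemoting 13 holds the Queue lock and waits for the KubernetesLauncher monitor in isLaunchSupported. A minimal Java sketch of that pattern follows. It does not use Jenkins or plugin code: the class, method, thread, and variable names are hypothetical stand-ins, and it only assumes that both launcher methods synchronize on the same launcher instance, as the "locked on ... (owned by ...)" lines suggest. The two threads are daemons, so the JVM can still exit after ThreadMXBean.findDeadlockedThreads() has reported them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: two threads acquire a monitor and a ReentrantLock in opposite order.
final class LockOrderInversionDemo {

    /** Returns true once the JVM reports both demo threads as deadlocked (waits up to 10 seconds). */
    static boolean reproduceAndDetect() {
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
        Object launcherMonitor = new Object();         // stand-in for the launcher instance monitor
        CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Mirrors the launch path: holds the launcher monitor, then wants the queue lock.
        Thread launchPath = new Thread(() -> {
            synchronized (launcherMonitor) {
                meet(firstLocksHeld);
                queueLock.lock(); // blocks forever: the other thread owns it
                queueLock.unlock();
            }
        }, "launch-path");

        // Mirrors the terminate path: holds the queue lock, then wants the launcher monitor.
        Thread terminatePath = new Thread(() -> {
            queueLock.lock();
            try {
                meet(firstLocksHeld);
                synchronized (launcherMonitor) {
                    // never reached: the other thread owns the monitor
                }
            } finally {
                queueLock.unlock();
            }
        }, "terminate-path");

        launchPath.setDaemon(true);
        terminatePath.setDaemon(true);
        launchPath.start();
        terminatePath.start();

        // findDeadlockedThreads covers monitors and ownable synchronizers; it returns null if none.
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
        while (System.nanoTime() < deadline) {
            long[] ids = threadBean.findDeadlockedThreads();
            if (ids != null && contains(ids, launchPath.getId()) && contains(ids, terminatePath.getId())) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    /** Signals that the caller holds its first lock, then waits until the other thread does too. */
    private static void meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long wanted) {
        return Arrays.stream(ids).anyMatch(id -> id == wanted);
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduceAndDetect());
    }
}
```

      The usual remedy for this pattern is to acquire the two locks in the same order on every code path, or to avoid taking the second lock while the first is held. Whether later plugin releases do this is not covered here.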

       

      Kindly help us in resolving the issue as this is blocking our upgrade. 


          Mark Waite added a comment -

          "Kindly help us in resolving the issue as this is blocking our upgrade."

          Most of the people that review issues here are interested in helping with issues reported on the current Jenkins release. That increases the chances that the help will also assist others.
          You've chosen to run a Jenkins version that has multiple known critical security vulnerabilities. That increases your risk and decreases the chances of others helping you. If you're able to see the same problems with the most recent release, you should share those details.


          Bhavani added a comment -

          markewaite We are facing many issues while trying to jump from 2.387.3 to the latest Jenkins version, as we are very far behind the latest Jenkins LTS version. For this reason, we are upgrading in smaller version jumps and repeating the process until we reach the latest Jenkins version.

          The process we are following is:

          1) Update the Jenkins version

          2) Update the plugins that don't show any warnings

          3) Deal with the required plugin updates (those with warnings, like the Kubernetes plugin in our case)

          4) Test whether everything looks good

          Please help us understand what is happening in this case, and suggest the best way to upgrade Jenkins.


          Mark Waite added a comment -

          "We are facing many issues, while trying to jump from 2.387.3 to the latest Jenkins version. as we are very much far behind the latest Jenkins LTS version."

          You're upgrading across 16 months of Jenkins releases. There have been many significant changes in those 16 months. The Jenkins project strongly recommends that LTS users upgrade every month. You're repaying 16 months of "upgrade debt".

          "Please help us, in understanding what's happening in this case and also provide us suggestions for best way to upgrade Jenkins."

          Unfortunately, I don't know what's happening in this case.

          The path that you are taking is a valid path for Jenkins upgrade. Multiple steps along the path to the final destination. An alternative is "one giant step" to the final destination. Those paths are described in various posts like this recent reply.


            Assignee: Unassigned
            Reporter: Bhavani (bhavani_indukuri)
            Votes: 0
            Watchers: 8
