Jenkins / JENKINS-73826

Thread deadlock causes Jenkins to become unresponsive after updating the Kubernetes plugin


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • kubernetes-plugin
    • None

      We are upgrading Jenkins from 2.387.3 to 2.414.3. After the core upgrade, and before any plugin updates, Jenkins appeared to function normally.

      As part of the upgrade process, we updated all possible plugins through the UI, which did not display any warnings. Notably, the update included the SnakeYAML plugin. However, upon further investigation, we discovered an issue with our existing Kubernetes plugins, specifically:

      • Kubernetes: 3937.vd7b_82db_e347b_
      • Kubernetes-cli: 1.12.0
      • Kubernetes-client-api: 6.4.1-215.v2ed17097a_8e9
      • Kubernetes-credentials: 0.10.0

      To address this issue, and because we needed to upgrade the Kubernetes plugin anyway, we updated the SSH Credentials, Kubernetes Credentials, Kubernetes, and Kubernetes CLI plugins to the following versions:

      • Kubernetes: 4054.v2da_8e2794884
      • Kubernetes-cli: 1.12.1
      • Kubernetes-client-api: 6.10.0-240.v57880ce8b_0b_2
      • Kubernetes-credentials: 174.va_36e093562d9

      I have attached the plugins.txt files from before and after the Jenkins and plugin update. We used the 2.414.3-lts-rhel-ubi9-jdk17 image for the Jenkins upgrade.
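For anyone comparing the two attached plugin lists, the version changes can be extracted mechanically. The following is only a minimal sketch: it assumes each line of plugins.txt has the form `name:version` (the attachment format is not shown in this report), and the class `PluginVersionDiff`, its methods, and the lowercase plugin names in `main` are hypothetical illustrations built from the versions listed above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Jenkins or any plugin.
final class PluginVersionDiff {

    /** Parses "name:version" lines into an ordered map, skipping blank or malformed lines. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> plugins = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            int colon = trimmed.indexOf(':');
            if (trimmed.isEmpty() || colon < 0) {
                continue; // assumption: anything without a colon is not a plugin entry
            }
            plugins.put(trimmed.substring(0, colon).trim(), trimmed.substring(colon + 1).trim());
        }
        return plugins;
    }

    /** Returns "old -> new" for every plugin whose version differs between the two lists. */
    static Map<String, String> changed(Map<String, String> before, Map<String, String> after) {
        Map<String, String> diff = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String oldVersion = before.get(entry.getKey());
            if (!entry.getValue().equals(oldVersion)) {
                String shownOld = (oldVersion == null) ? "(not installed)" : oldVersion;
                diff.put(entry.getKey(), shownOld + " -> " + entry.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Versions copied from the lists in this report.
        Map<String, String> before = parse(List.of(
                "kubernetes:3937.vd7b_82db_e347b_",
                "kubernetes-cli:1.12.0",
                "kubernetes-client-api:6.4.1-215.v2ed17097a_8e9",
                "kubernetes-credentials:0.10.0"));
        Map<String, String> after = parse(List.of(
                "kubernetes:4054.v2da_8e2794884",
                "kubernetes-cli:1.12.1",
                "kubernetes-client-api:6.10.0-240.v57880ce8b_0b_2",
                "kubernetes-credentials:174.va_36e093562d9"));
        changed(before, after).forEach((name, change) -> System.out.println(name + ": " + change));
    }
}
```

In practice the two maps would be filled from the attached files (for example via `java.nio.file.Files.readAllLines`) rather than from inline lists.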

      After running 2-3 jobs, Jenkins becomes unresponsive and displays the following error logs:

      2024-09-27 11:11:11.827+0000 [id=565] WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer 1 locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@1566b90c (owned by Computer.threadPoolForRemoting 13):

      at java.base@17.0.8.1/jdk.internal.misc.Unsafe.park(Native Method)

      at java.base@17.0.8.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)

      at hudson.model.Queue.maintain(Queue.java:1481)

      at hudson.model.Queue$MaintainTask.doRun(Queue.java:2919)

      at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)

      at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)

      at java.base@17.0.8.1/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)

      at java.base@17.0.8.1/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)

      at java.base@17.0.8.1/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

      at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)

      , Computer.threadPoolForRemoting 8 locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@1566b90c (owned by Computer.threadPoolForRemoting 13):

      at java.base@17.0.8.1/jdk.internal.misc.Unsafe.park(Native Method)

      at java.base@17.0.8.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)

      at java.base@17.0.8.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:938)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)

      at java.base@17.0.8.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)

      at hudson.model.Queue._withLock(Queue.java:1456)

      at hudson.model.Queue.withLock(Queue.java:1314)

      at jenkins.model.Nodes.updateNode(Nodes.java:201)

      at jenkins.model.Jenkins.updateNode(Jenkins.java:2252)

      at hudson.model.Node.save(Node.java:143)

      at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:247)

      at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)

      at hudson.slaves.SlaveComputer$$Lambda$999/0x00000008010d5a10.call(Unknown Source)

      at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)

      at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)

      at java.base@17.0.8.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

      at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)

      , Computer.threadPoolForRemoting 13 locked on org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher@517c803b (owned by Computer.threadPoolForRemoting 8):

      at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.isLaunchSupported(KubernetesLauncher.java:91)

      at hudson.slaves.SlaveComputer.isLaunchSupported(SlaveComputer.java:247)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.check(OnceRetentionStrategy.java:81)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.check(OnceRetentionStrategy.java:46)

      at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:960)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:957)

      at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:147)

      at hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:255)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)

      at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1693)

      at jenkins.model.Nodes$5.run(Nodes.java:279)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at jenkins.model.Nodes.removeNode(Nodes.java:270)

      at jenkins.model.Jenkins.removeNode(Jenkins.java:2238)

      at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$5(OnceRetentionStrategy.java:142)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda$1353/0x00000008013c5220.run(Unknown Source)

      at hudson.model.Queue._withLock(Queue.java:1397)

      at hudson.model.Queue.withLock(Queue.java:1271)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy.lambda$done$6(OnceRetentionStrategy.java:137)

      at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$$Lambda$1352/0x00000008013c4ff8.run(Unknown Source)

      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)

      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)

      at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)

      at jenkins.util.ErrorLoggingExecutorService$$Lambda$742/0x0000000800e57420.run(Unknown Source)

      at java.base@17.0.8.1/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)

      at java.base@17.0.8.1/java.util.concurrent.FutureTask.run(FutureTask.java:264)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

      at java.base@17.0.8.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

      at java.base@17.0.8.1/java.lang.Thread.run(Thread.java:833)
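Reading the dump above: Computer.threadPoolForRemoting 8 holds the KubernetesLauncher monitor (inside launch) and waits for the Queue's ReentrantLock, while Computer.threadPoolForRemoting 13 holds that ReentrantLock (inside Queue.withLock) and waits for the same KubernetesLauncher monitor (inside isLaunchSupported). jenkins.util.Timer 1 is then stuck behind the Queue lock as well. The sketch below is a simplified model of that lock-order inversion, not the plugin's code: the class name, thread names, and the two stand-in locks are hypothetical. It uses the standard `ThreadMXBean.findDeadlockedThreads()` call to show how such a cycle can be detected programmatically.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the cycle in the thread dump; not Jenkins or plugin code.
final class LockOrderInversionSketch {

    /** Starts two daemon threads that take two locks in opposite order and returns the deadlocked thread IDs. */
    static long[] reproduceAndDetect() throws InterruptedException {
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
        Object launcherMonitor = new Object();         // stand-in for the KubernetesLauncher monitor
        CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Models thread 8: holds the launcher monitor, then needs the queue lock.
        Thread launchLike = new Thread(() -> {
            synchronized (launcherMonitor) {
                arriveAndWait(firstLocksHeld);
                queueLock.lock(); // blocks forever: the other thread owns it
                try {
                    // never reached in this model
                } finally {
                    queueLock.unlock();
                }
            }
        }, "launch-like");

        // Models thread 13: holds the queue lock, then needs the launcher monitor.
        Thread terminateLike = new Thread(() -> {
            queueLock.lock();
            try {
                arriveAndWait(firstLocksHeld);
                synchronized (launcherMonitor) {
                    // never reached in this model
                }
            } finally {
                queueLock.unlock();
            }
        }, "terminate-like");

        // Daemon threads so the JVM can still exit although they never finish.
        launchLike.setDaemon(true);
        terminateLike.setDaemon(true);
        launchLike.start();
        terminateLike.start();

        // Poll until the JVM reports a cycle over monitors and ownable synchronizers.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    /** Signals that this thread holds its first lock and waits until the other thread does too. */
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = reproduceAndDetect();
        System.out.println("Deadlocked threads detected: " + ids.length);
    }
}
```

In this model the cycle disappears if both threads acquire the two locks in the same order, or if the launcher monitor is not held while the queue lock is requested. Whether either change is appropriate for the plugin is a question for its maintainers.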

       

      2024-09-27 11:12:36.768+0000 [id=575] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #13 from /127.0.0.1:56944 failed: null

      2024-09-27 11:13:40.300+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 5 sec

      2024-09-27 11:13:45.301+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 10 sec

      2024-09-27 11:13:50.302+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 15 sec

      2024-09-27 11:13:55.302+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 20 sec

      2024-09-27 11:14:00.303+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 25 sec

      2024-09-27 11:14:05.303+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 30 sec

      2024-09-27 11:14:10.304+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 35 sec

      Computer.threadPoolForRemoting 13 locked on org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher@517c803b (owned by Computer.threadPoolForRemoting 8):

      at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.isLaunchSupported(KubernetesLauncher.java:91)

      2024-09-27 11:14:15.304+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 40 sec

      2024-09-27 11:14:20.305+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 45 sec

      2024-09-27 11:14:25.306+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 50 sec

      2024-09-27 11:14:30.306+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 55 sec

      2024-09-27 11:14:35.307+0000 [id=155] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecutionOwner[tfaudit/feature%2FTOOLS-3513-nexusupgradeTest/15:tfaudit/feature%2FTOOLS-3513-nexusupgradeTest #15] unresponsive for 1 min 0 sec

      2024-09-27 11:14:36.768+0000 [id=587] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #14 from /127.0.0.1:56958 failed: null

       

      Kindly help us in resolving the issue as this is blocking our upgrade. 

            Assignee: Unassigned
            Reporter: Bhavani (bhavani_indukuri)