-
Bug
-
Resolution: Not A Defect
-
Major
-
None
-
Jenkins version - Jenkins 2.319.1
Jenkins Master image : jenkins/jenkins:2.319.1-lts-alpine
Jenkins worker image: jenkins/inbound-agent:4.11-1-alpine
Installed plugins:
Kubernetes - 1.30.6
Kubernetes Client API - 5.4.1
Kubernetes Credentials Plugin - 0.9.0
JAVA version on master: openjdk 11.0.13
JAVA version on Agent/worker : openjdk 11.0.14
Jenkins version - Jenkins 2.319.1 Jenkins Master image : jenkins/jenkins:2.319.1-lts-alpine Jenkins worker image: jenkins/inbound-agent:4.11-1-alpine Installed plugins: Kubernetes - 1.30.6 Kubernetes Client API - 5.4.1 Kubernetes Credentials Plugin - 0.9.0 JAVA version on master: openjdk 11.0.13 JAVA version on Agent/worker : openjdk 11.0.14
Hi team,
We are facing issue in jenkins where jenkins agent disconnects(or goes offline) from master while still job is running on agent/worker. We are getting below error(highlighted) and tried below things but issue is still not resolving fully. Jenkins is deployed on EKS.
Error:
-------------------------------------------------------------------------------------------------------------------------
5332191-2022-11-02 13:59:19.734+0000 [id=140214] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2606, Executions with /project-sol-vodafone/ekscomponent/sol-ekscomponent-app-done/jenkins master ⇣
…/sol-ekscomponent-app-done/jenkins on 🌱 master [😰]
❯ kd logs jenkins-0 -c jenkins | grep -a10 -b10 worker-7j4x4
5332191-2022-11-02 13:59:19.734+0000 [id=140214] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2606, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
5332747-2022-11-02 13:59:19.734+0000 [id=140214] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 0 nodes assigned to this Jenkins instance, which we will check
5332921-2022-11-02 13:59:19.734+0000 [id=140214] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
5333061-2022-11-02 13:59:19.734+0000 [id=140214] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
5333220-2022-11-02 14:04:19.733+0000 [id=140239] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5333372-2022-11-02 14:04:19.734+0000 [id=140239] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5333506-2022-11-02 14:04:19.734+0000 [id=140239] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2607, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
5334062-2022-11-02 14:04:19.734+0000 [id=140239] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 0 nodes assigned to this Jenkins instance, which we will check
5334236-2022-11-02 14:04:19.734+0000 [id=140239] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
5334376-2022-11-02 14:04:19.734+0000 [id=140239] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 0 ms
5334535:2022-11-02 14:07:54.573+0000 [id=140290] INFO hudson.slaves.NodeProvisioner#update: worker-7j4x4 provisioning successfully completed. We have now 2 computer(s)
5334695:2022-11-02 14:07:54.675+0000 [id=140291] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes done-jenkins/worker-7j4x4
5334828:2022-11-02 14:07:56.619+0000 [id=140291] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes done-jenkins/worker-7j4x4
5334964-2022-11-02 14:07:58.650+0000 [id=140309] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #97 from /100.122.254.111:42648
5335123-2022-11-02 14:09:19.733+0000 [id=140536] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5335275-2022-11-02 14:09:19.733+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5335409-2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2608, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
5335965-2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 1 nodes assigned to this Jenkins instance, which we will check
5336139-2022-11-02 14:09:19.734+0000 [id=140536] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
5336279-2022-11-02 14:09:19.734+0000 [id=140536] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
5336438-groovy.lang.MissingPropertyException: No such property: envVar for class: groovy.lang.Binding
5336532- at groovy.lang.Binding.getVariable(Binding.java:63)
5336585- at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onGetProperty(SandboxInterceptor.java:271)
–
5394279-2022-11-02 15:09:19.733+0000 [id=141899] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5394431-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5394565-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2620, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
5395121-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 3 nodes assigned to this Jenkins instance, which we will check
5395295-2022-11-02 15:09:19.734+0000 [id=141899] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
5395435-2022-11-02 15:09:19.734+0000 [id=141899] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished DockerContainerWatchdog Asynchronous Periodic Work. 1 ms
5395594-2022-11-02 15:11:59.502+0000 [id=140320] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-100-122-254-111.eu-central-1.compute.internal/100.122.254.111:42648.
5395817-java.util.concurrent.TimeoutException: Ping started at 1667401679501 hasn't completed by 1667401919502
5395920- at hudson.remoting.PingThread.ping(PingThread.java:134)
5395977- at hudson.remoting.PingThread.run(PingThread.java:90)
5396032:2022-11-02 15:11:59.503+0000 [id=141914] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting 5049 for worker-7j4x4 terminated: java.nio.channels.ClosedChannelException
5396231-2022-11-02 15:12:35.579+0000 [id=141933] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started Periodic background build discarder
5396368-2022-11-02 15:12:36.257+0000 [id=141933] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Finished Periodic background build discarder. 678 ms
5396514-2022-11-02 15:14:15.582+0000 [id=141422] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel JNLP4-connect connection from ip-100-122-237-38.eu-central-1.compute.internal/100.122.237.38:55038.
5396735-java.util.concurrent.TimeoutException: Ping started at 1667401815582 hasn't completed by 1667402055582
5396838- at hudson.remoting.PingThread.ping(PingThread.java:134)
5396895- at hudson.remoting.PingThread.run(PingThread.java:90)
5396950-2022-11-02 15:14:15.584+0000 [id=141915] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting 5050 for worker-fjf1p terminated: java.nio.channels.ClosedChannelException
5397149-2022-11-02 15:14:19.733+0000 [id=141950] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$1: Started DockerContainerWatchdog Asynchronous Periodic Work
5397301-2022-11-02 15:14:19.733+0000 [id=141950] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
5397435-2022-11-02 15:14:19.734+0000 [id=141950] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 2621, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
-------------------------------------------------------------------------------------------------------------
Tried below things:
1. Increased idleMinutes to 180 from default
2. Verified that resources are sufficient as per graphana dashboard
3. Changed podRetention to onFailure from Never
4. Changed podRetention to Always from Never
5. Increased readTimeout
6. Increased connectTimeout
7. Increased slaveConnectTimeoutStr
8. Disabled the ping thread from UI via disabling “response time" checkbox from preventive node monitroing
9. Increased activeDeadlineSeconds
10. Verified same java version on master and agent
11. Updated kubernetes and kubernetes API client plugins
Any suggestion or resolutions pls.