Type: Bug
Resolution: Duplicate
Priority: Major
Component/s: kubernetes-plugin
None
Environment: Jenkins 1.50.3, kubernetes plugin 1.14.8
When communication between the JNLP slave and the Jenkins master fails, Pods are left in an Error state and, in some cases, require manual intervention to clean up. Non-JNLP containers are left running and keep consuming resources in Kubernetes.
Expected behavior would be to:
- Fail the job if a suitable Pod can't be started to run the build
- Make sure that any unsuccessful Pods are cleaned up by the kubernetes plugin
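The cleanup described above could be approximated outside the plugin with a small script. This is a minimal sketch, not the plugin's actual implementation: the helper names `failed_agent_pods`, `list_agent_pods`, and `delete_pods` are hypothetical, and it assumes `kubectl` is on the PATH and that agent Pods carry the `jenkins=slave` label and a container named `jnlp`, as in the `kubectl describe` output in this report.

```python
import json
import subprocess


def failed_agent_pods(pod_list, agent_container="jnlp"):
    """Return names of Pods whose agent container terminated with a non-zero exit code.

    `pod_list` is the parsed JSON of `kubectl get pods -o json`.
    """
    failed = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            if status.get("name") != agent_container:
                continue
            terminated = status.get("state", {}).get("terminated")
            if terminated and terminated.get("exitCode", 0) != 0:
                failed.append(pod["metadata"]["name"])
    return failed


def list_agent_pods(namespace):
    """Fetch agent Pods as parsed JSON via kubectl (assumed to be on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace,
         "-l", "jenkins=slave", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def delete_pods(namespace, names):
    """Delete each named Pod; raises CalledProcessError if kubectl fails."""
    for name in names:
        subprocess.run(
            ["kubectl", "delete", "pod", "-n", namespace, name],
            check=True,
        )
```

For the namespace used in this report, a cleanup run would be `delete_pods("jenkins-master-test", failed_agent_pods(list_agent_pods("jenkins-master-test")))`. Keeping the selection logic separate from the `kubectl` calls makes it possible to check it against saved Pod JSON before deleting anything.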
One way to reproduce this issue is to change the Jenkins tunnel value to something invalid.
The issue can also occur when the Jenkins tunnel value is correct.
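To rule out the invalid-tunnel case before digging further, the tunnel value can be checked for name resolution. The following is a rough sketch under stated assumptions: `parse_tunnel` and `tunnel_resolves` are hypothetical helper names, the tunnel is assumed to be written in full as `host:port`, and the check only means something when run from inside the cluster network, where service names such as `jenkins-service` are resolvable.

```python
import socket


def parse_tunnel(tunnel):
    """Split a 'host:port' tunnel value into (host, port)."""
    host, _, port = tunnel.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % tunnel)
    return host, int(port)


def tunnel_resolves(tunnel, resolver=socket.getaddrinfo):
    """Return True if the tunnel host resolves, False if name resolution fails.

    `resolver` is injectable so the check can be exercised without DNS.
    """
    host, port = parse_tunnel(tunnel)
    try:
        resolver(host, port)
    except socket.gaierror:
        return False
    return True
```

A `False` result for a value such as `jenkins-service-bad:50000` corresponds to the unresolved-address failure the agent reports; a `True` result only shows that the name resolves, not that the master accepts the connection.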
When a job runs, the console output shows that it is waiting for a node:
Started by user Admin
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Still waiting to schedule task
‘Jenkins’ doesn’t have label ‘mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a’
Meanwhile, in Kubernetes, Pods are created one after another, and each one is left in an Error state:
~ # kubectl get pod -o wide --watch
NAME                                                     READY   STATUS              RESTARTS   AGE    IP              NODE                 NOMINATED NODE   READINESS GATES
jenkins-deployment-67c96975d-khpvb                       2/2     Running             0          3d4h   10.233.66.66    high-memory-node-5   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-72h55   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-72h55   0/2     Pending             0          0s     <none>          high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-72h55   0/2     ContainerCreating   0          0s     <none>          high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-72h55   2/2     Running             0          4s     10.233.98.99    high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-72h55   1/2     Error               0          5s     10.233.98.99    high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-0gf7v   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-0gf7v   0/2     Pending             0          0s     <none>          high-memory-node-2   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-0gf7v   0/2     ContainerCreating   0          1s     <none>          high-memory-node-2   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-0gf7v   2/2     Running             0          3s     10.233.119.99   high-memory-node-2   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-0gf7v   1/2     Error               0          4s     10.233.119.99   high-memory-node-2   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-6ggw6   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-6ggw6   0/2     Pending             0          0s     <none>          high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-6ggw6   0/2     ContainerCreating   0          1s     <none>          high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-6ggw6   2/2     Running             0          4s     10.233.72.223   high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-6ggw6   1/2     Error               0          5s     10.233.72.223   high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-c7dxb   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-c7dxb   0/2     Pending             0          0s     <none>          high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-c7dxb   0/2     ContainerCreating   0          1s     <none>          high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-c7dxb   2/2     Running             0          3s     10.233.72.224   high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-c7dxb   1/2     Error               0          5s     10.233.72.224   high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-v0tqq   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-v0tqq   0/2     Pending             0          0s     <none>          high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-v0tqq   0/2     ContainerCreating   0          0s     <none>          high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-v0tqq   2/2     Running             0          2s     10.233.78.100   high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-v0tqq   1/2     Error               0          4s     10.233.78.100   high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-pntgv   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-pntgv   0/2     Pending             0          0s     <none>          high-memory-node-1   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-pntgv   0/2     ContainerCreating   0          1s     <none>          high-memory-node-1   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-pntgv   2/2     Running             0          2s     10.233.95.100   high-memory-node-1   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-pntgv   1/2     Error               0          4s     10.233.95.100   high-memory-node-1   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-5ncq0   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-5ncq0   0/2     Pending             0          1s     <none>          high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-5ncq0   0/2     ContainerCreating   0          1s     <none>          high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-5ncq0   2/2     Running             0          4s     10.233.98.100   high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-5ncq0   1/2     Error               0          5s     10.233.98.100   high-memory-node-3   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-dnssr   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-dnssr   0/2     Pending             0          0s     <none>          high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-dnssr   0/2     ContainerCreating   0          0s     <none>          high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-dnssr   2/2     Running             0          3s     10.233.78.101   high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-dnssr   1/2     Error               0          4s     10.233.78.101   high-memory-node-4   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-2tcb1   0/2     Pending             0          0s     <none>          <none>               <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-2tcb1   0/2     Pending             0          0s     <none>          high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-2tcb1   0/2     ContainerCreating   0          1s     <none>          high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-2tcb1   2/2     Running             0          2s     10.233.72.225   high-memory-node-0   <none>           <none>
mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-2tcb1   1/2     Error               0          4s     10.233.72.225   high-memory-node-0   <none>           <none>
Describing one of the Pods shows:
~ # kubectl describe pod mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2
Name:               mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2
Namespace:          jenkins-master-test
Priority:           0
PriorityClassName:  <none>
Node:               high-memory-node-2/10.0.40.14
Start Time:         Mon, 04 Mar 2019 21:35:11 +0000
Labels:             jenkins=slave
                    jenkins/mypod-e549650e-86ea-43b1-946e-95f307b255b9=true
Annotations:        buildUrl: http://jenkins-service:8080/job/testmaxconn/147/
                    kubernetes.io/limit-ranger:
                      LimitRanger plugin set: cpu, memory request for container alpine; cpu, memory limit for container alpine; cpu, memory request for containe...
Status:             Running
IP:                 10.233.119.104
Containers:
  alpine:
    Container ID:  docker://80c1d2d396f1ece3c2c78c2335d78e482cca1899ff2f29047cc4d06892215d04
    Image:         alpine
    Image ID:      docker-pullable://alpine@sha256:b3dbf31b77fd99d9c08f780ce6f5282aba076d70a513a8be859d8d3a4d0c92b8
    Port:          <none>
    Host Port:     <none>
    Command:
      cat
    State:          Running
      Started:      Mon, 04 Mar 2019 21:35:13 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:     25m
      memory:  256Mi
    Environment:
      JENKINS_SECRET:      8e1c8742aa520eebf3ccff2bbf4855c270dbfbe08fea032b8e81dfd921ff9afa
      JENKINS_TUNNEL:      jenkins-service-bad:50000
      JENKINS_AGENT_NAME:  mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2
      JENKINS_NAME:        mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2
      JENKINS_URL:         http://jenkins-service:8080/
      HOME:                /home/jenkins
    Mounts:
      /home/jenkins from workspace-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qjfqj (ro)
  jnlp:
    Container ID:   docker://cc4875a887ec33e3c4504378e9cfc4a55a0179b433305ac686223cc4165a4307
    Image:          jenkins/jnlp-slave:alpine
    Image ID:       docker-pullable://jenkins/jnlp-slave@sha256:8e330d8bc461440c797d543b9872e54b328da3ef3b052180bb5aed33204d4384
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 04 Mar 2019 21:35:13 +0000
      Finished:     Mon, 04 Mar 2019 21:35:14 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:     25m
      memory:  256Mi
    Environment:
      JENKINS_SECRET:      8e1c8742aa520eebf3ccff2bbf4855c270dbfbe08fea032b8e81dfd921ff9afa
      JENKINS_TUNNEL:      jenkins-service-bad:50000
      JENKINS_AGENT_NAME:  mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2
      JENKINS_NAME:        mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2
      JENKINS_URL:         http://jenkins-service:8080/
      HOME:                /home/jenkins
    Mounts:
      /home/jenkins from workspace-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qjfqj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  workspace-volume:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-qjfqj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qjfqj
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                         Message
  ----    ------     ----  ----                         -------
  Normal  Scheduled  30s   default-scheduler            Successfully assigned jenkins-master-test/mypod-e549650e-86ea-43b1-946e-95f307b255b9-51lkb-d16h2 to high-memory-node-2
  Normal  Pulled     28s   kubelet, high-memory-node-2  Container image "alpine" already present on machine
  Normal  Created    28s   kubelet, high-memory-node-2  Created container
  Normal  Started    28s   kubelet, high-memory-node-2  Started container
  Normal  Pulled     28s   kubelet, high-memory-node-2  Container image "jenkins/jnlp-slave:alpine" already present on machine
  Normal  Created    28s   kubelet, high-memory-node-2  Created container
  Normal  Started    28s   kubelet, high-memory-node-2  Started container
The logs from the JNLP container in one of the Error Pods are:
~ # kubectl logs mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-nx55q jnlp
Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Mar 04, 2019 9:22:33 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: mypod-cadb7e5a-1337-414d-abd2-a7ca3653a33a-tlt12-nx55q
Mar 04, 2019 9:22:33 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 04, 2019 9:22:33 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.27
Mar 04, 2019 9:22:33 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Mar 04, 2019 9:22:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-service:8080/]
Mar 04, 2019 9:22:34 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 04, 2019 9:22:34 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Mar 04, 2019 9:22:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins-service-bad
  Agent port:    50000
  Identity:      51:84:00:05:77:43:f0:57:4e:08:d9:22:55:61:c6:8a
Mar 04, 2019 9:22:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 04, 2019 9:22:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-service-bad:50000
Mar 04, 2019 9:22:34 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: null
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
    at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
    at org.jenkinsci.remoting.engine.JnlpAgentEndpoint.open(JnlpAgentEndpoint.java:203)
    at hudson.remoting.Engine.connect(Engine.java:691)
    at hudson.remoting.Engine.innerRun(Engine.java:552)
    at hudson.remoting.Engine.run(Engine.java:474)
All of the Error Pods are still around.
Here is another example of a failed JNLP container; this time the JNLP URL was correct:
# kubectl logs jenkins-slave-g6q4s-5jz7b jnlp
Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Feb 27, 2019 12:52:59 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-g6q4s-5jz7b
Feb 27, 2019 12:52:59 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Feb 27, 2019 12:52:59 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3.26
Feb 27, 2019 12:52:59 AM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Feb 27, 2019 12:52:59 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-service:8080/]
Feb 27, 2019 12:53:00 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Feb 27, 2019 12:53:00 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins-service
  Agent port:    50000
  Identity:      24:e1:66:dc:61:8a:05:cf:65:25:46:d5:59:57:12:be
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-service:50000
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: 24:e1:66:dc:61:8a:05:cf:65:25:46:d5:59:57:12:be
Feb 27, 2019 12:53:00 AM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
INFO: [JNLP4-connect connection to jenkins-service/10.233.13.248:50000] Local headers refused by remote: Unknown client name: jenkins-slave-g6q4s-5jz7b
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Protocol JNLP4-connect encountered an unexpected exception
java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: jenkins-slave-g6q4s-5jz7b
    at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
    at hudson.remoting.Engine.innerRun(Engine.java:614)
    at hudson.remoting.Engine.run(Engine.java:474)
Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: jenkins-slave-g6q4s-5jz7b
    at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
    at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:433)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
    at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
    at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
    at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
    at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
    at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.channels.ClosedChannelException
        ... 7 more
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins-service:50000
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP4-plaintext not supported, skipping
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Protocol JNLP3-connect is not enabled, skipping
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP2-connect not supported, skipping
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP-connect not supported, skipping
Feb 27, 2019 12:53:00 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
    at hudson.remoting.Engine.onConnectionRejected(Engine.java:675)
    at hudson.remoting.Engine.innerRun(Engine.java:639)
    at hudson.remoting.Engine.run(Engine.java:474)
In this case, the Pod was still around in an Error state five days later.
- duplicates JENKINS-54540 Pods stuck in error state is not cleaned up (Open)