Type: Bug
Resolution: Not A Defect
Priority: Blocker
None
Jenkins master version: 2.190.1
Kubernetes Plugin: 1.19.3
It also happened before the upgrade, in:
Jenkins: 2.176.3
K8S plugin: 1.19.0
It happens frequently but not consistently, which makes it very hard to debug.
This is my podTemplate:
podTemplate(
    containers: [
        containerTemplate(
            name: 'build',
            image: 'my_builder:latest',
            command: 'cat',
            ttyEnabled: true,
            workingDir: '/mnt/jenkins'
        )
    ],
    volumes: [
        hostPathVolume(mountPath: '/var/run/docker.sock', hostPath: '/var/run/docker.sock'),
        hostPathVolume(mountPath: '/mnt/jenkins', hostPath: '/mnt/jenkins')
    ],
    yaml: """
spec:
  containers:
  - name: build
    resources:
      requests:
        cpu: "10"
        memory: "10Gi"
  securityContext:
    fsGroup: 995
"""
) {
    node(POD_LABEL) {
        stage("Checkout") {
        }
        // more stages
    }
}
This is the log from the pod:
Inbound agent connected from IP/IP
Waiting for agent to connect (0/100): my_branch
Remoting version: 3.35
This is a Unix agent
Waiting for agent to connect (1/100): my_branch
Agent successfully connected and online
ERROR: Connection terminated
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Logs from Jenkins "cat /var/log/jenkins/jenkins.log":
2019-10-08 14:40:48.171+0000 [id=287] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: branch_name, template=PodTemplate{, name='pod_name', namespace='default', label='label_name', nodeUsageMode=EXCLUSIVE, volumes=[HostPathVolume [mountPath=/var/run/docker.sock, hostPath=/var/run/docker.sock], HostPathVolume [mountPath=/mnt/jenkins, hostPath=/mnt/jenkins]], containers=[ContainerTemplate{name='build', image='my_builder', workingDir='/mnt/jenkins', command='cat', ttyEnabled=true, envVars=[KeyValueEnvVar [getValue()=deploy/.dazelrc, getKey()=RC_FILE]]}], annotations=[org.csanchez.jenkins.plugins.kubernetes.PodAnnotation@aab9c821]}
io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [100000] milliseconds for [Pod] with name:[branch_name] in namespace [default].
	at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:130)
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:134)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
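The [100000] milliseconds in the timeout matches the "(x/100)" counter in the pod log, i.e. the default 100-second agent connect timeout. One thing I could try for slow pod start-up (a sketch only, assuming the kubernetes-plugin's slaveConnectTimeout option is available in this plugin version and applies to this wait) is raising that timeout on the pod template:

// Sketch: raise the agent connect timeout from the default 100 seconds.
// 'slaveConnectTimeout' is assumed to be supported by this plugin version.
podTemplate(
    slaveConnectTimeout: 300, // seconds to wait for the agent to connect
    containers: [
        containerTemplate(name: 'build', image: 'my_builder:latest', command: 'cat', ttyEnabled: true, workingDir: '/mnt/jenkins')
    ]
) {
    node(POD_LABEL) {
        // stages as before
    }
}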
karolgil Hey, we still face this issue once in a while. I've worked on it a lot and described all my actions in this ticket.
These things are NOT related to the issue:
There is one thing that, once fixed, reduced this issue to happening only "once in a while":
BUT we are still facing this issue. I suspect it's related to Jenkins scheduling a job on an EKS node that is going down as part of the autoscaler policy: the job is triggered and, at the same time, the autoscaler marks that same node to be cordoned. I don't have proof for this yet; it's being investigated.
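If the autoscaler theory turns out to be right, one mitigation I could try (a sketch only, assuming the standard cluster-autoscaler safe-to-evict annotation is honored by the autoscaler running in this EKS cluster) is marking the agent pods as not safe to evict, so the autoscaler does not scale down the node while a build is running on it:

// Sketch: annotate the Jenkins agent pod so cluster-autoscaler will not
// remove the node it is running on. Assumes the standard
// cluster-autoscaler.kubernetes.io/safe-to-evict annotation is respected.
podTemplate(
    yaml: """
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: build
    resources:
      requests:
        cpu: "10"
        memory: "10Gi"
""",
    containers: [
        containerTemplate(name: 'build', image: 'my_builder:latest', command: 'cat', ttyEnabled: true, workingDir: '/mnt/jenkins')
    ]
) {
    node(POD_LABEL) {
        // stages as before
    }
}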