Type: Bug
Resolution: Fixed
Priority: Major
None
Since 1.14.6 and its update to Kubernetes client 4.1.2, I have started to observe weird behavior where the plugin provisions two pods for one pod template. The second pod is identical to the first but is spawned exactly 20 seconds later.
I tried version 1.14.5 and it works correctly (I also tested 1.14.2 and 1.14.0).
It also seems that SocketClosed exceptions caused by timeouts on the get operation have become more frequent:
Caused: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [terraform-deploy-fabric-cluster-cec-az-246-hc9xs-2z6rf] in namespace: [jenkins] failed.
Is duplicated by:
- JENKINS-56889 Kubernetes creates too many slaves (Closed)
- JENKINS-56491 overeager provisioning of k8s nodes (Closed)
Links to:
- [JENKINS-56347] Kubernetes plugin provisioning pods twice in 1.14.6
Can you try the PR in https://github.com/jenkinsci/kubernetes-plugin/pull/455 and post the logs?
Same issue here on 1.15.1 and 1.15.2. I will see if I can try the PR and report back. I am seeing a new slave spin up every 10 seconds until one connects.
Logs from the PR in https://github.com/jenkinsci/kubernetes-plugin/pull/455.
This is a build of a single job that spins up 8 slaves:
May 01, 2019 2:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:13:50 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:13:50 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:14:00 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
May 01, 2019 2:14:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-zcf1g
May 01, 2019 2:14:00 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:14:02 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-zcf1g
May 01, 2019 2:14:10 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:14:10 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:14:10 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:14:20 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
May 01, 2019 2:14:20 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-txmw8
May 01, 2019 2:14:20 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:14:21 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-txmw8
May 01, 2019 2:14:30 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:14:30 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:14:30 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:14:40 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 4 computer(s)
May 01, 2019 2:14:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-61lwt
May 01, 2019 2:14:40 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:14:41 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-61lwt
May 01, 2019 2:14:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:14:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:14:50 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:15:00 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 5 computer(s)
May 01, 2019 2:15:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 0
May 01, 2019 2:15:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:15:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-x2gjp
May 01, 2019 2:15:00 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:15:02 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-x2gjp
May 01, 2019 2:15:10 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:15:10 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:15:10 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:15:20 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 6 computer(s)
May 01, 2019 2:15:20 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 0
May 01, 2019 2:15:20 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:15:20 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-8n2hg
May 01, 2019 2:15:20 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:15:22 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-8n2hg
May 01, 2019 2:15:30 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:15:30 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:15:30 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:15:40 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 7 computer(s)
May 01, 2019 2:15:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 0
May 01, 2019 2:15:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:15:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-lpv1t
May 01, 2019 2:15:40 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:15:42 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-lpv1t
May 01, 2019 2:15:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:15:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:15:50 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:16:00 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 8 computer(s)
May 01, 2019 2:16:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 0
May 01, 2019 2:16:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:16:00 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-compute-platform/pd-slave-jpd7x
May 01, 2019 2:16:00 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:16:04 PM hudson.TcpSlaveAgentListener$ConnectionHandler run INFO: Accepted JNLP4-connect connection #1 from /10.234.12.72:32860
May 01, 2019 2:16:05 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-compute-platform/pd-slave-jpd7x
May 01, 2019 2:16:22 PM hudson.TcpSlaveAgentListener$ConnectionHandler run INFO: Accepted JNLP4-connect connection #2 from /10.234.12.73:49150
For comparison, this is from the 1.14.5 plugin:
May 01, 2019 2:36:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 01, 2019 2:36:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 01, 2019 2:36:14 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 01, 2019 2:36:14 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 01, 2019 2:36:24 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
May 01, 2019 2:36:24 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-slave-12gpw in namespace pd-pipedream
May 01, 2019 2:36:24 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for Pod to be scheduled (0/600): pd-slave-12gpw
May 01, 2019 2:36:25 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Container is waiting pd-slave-12gpw [jnlp]: ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={})
May 01, 2019 2:36:25 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for Pod to be scheduled (1/600): pd-slave-12gpw
May 01, 2019 2:36:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (0/250): pd-slave-12gpw
May 01, 2019 2:36:27 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (1/250): pd-slave-12gpw
May 01, 2019 2:36:28 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (2/250): pd-slave-12gpw
May 01, 2019 2:36:29 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (3/250): pd-slave-12gpw
May 01, 2019 2:36:30 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (4/250): pd-slave-12gpw
May 01, 2019 2:36:31 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (5/250): pd-slave-12gpw
May 01, 2019 2:36:32 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (6/250): pd-slave-12gpw
May 01, 2019 2:36:33 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (7/250): pd-slave-12gpw
May 01, 2019 2:36:34 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (8/250): pd-slave-12gpw
May 01, 2019 2:36:35 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (9/250): pd-slave-12gpw
May 01, 2019 2:36:36 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (10/250): pd-slave-12gpw
May 01, 2019 2:36:37 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (11/250): pd-slave-12gpw
May 01, 2019 2:36:38 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (12/250): pd-slave-12gpw
May 01, 2019 2:36:39 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (13/250): pd-slave-12gpw
May 01, 2019 2:36:40 PM hudson.model.AsyncPeriodicWork$1 run INFO: Started DockerContainerWatchdog Asynchronous Periodic Work
May 01, 2019 2:36:40 PM com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute INFO: Docker Container Watchdog has been triggered
May 01, 2019 2:36:40 PM com.nirima.jenkins.plugins.docker.DockerContainerWatchdog$Statistics writeStatisticsToLog INFO: Watchdog Statistics: Number of overall executions: 1, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
May 01, 2019 2:36:40 PM com.nirima.jenkins.plugins.docker.DockerContainerWatchdog loadNodeMap INFO: We currently have 1 nodes assigned to this Jenkins instance, which we will check
May 01, 2019 2:36:40 PM com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute INFO: Docker Container Watchdog check has been completed
May 01, 2019 2:36:40 PM hudson.model.AsyncPeriodicWork$1 run INFO: Finished DockerContainerWatchdog Asynchronous Periodic Work. 2 ms
May 01, 2019 2:36:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (14/250): pd-slave-12gpw
May 01, 2019 2:36:41 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (15/250): pd-slave-12gpw
May 01, 2019 2:36:42 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (16/250): pd-slave-12gpw
May 01, 2019 2:36:43 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (17/250): pd-slave-12gpw
May 01, 2019 2:36:44 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (18/250): pd-slave-12gpw
May 01, 2019 2:36:45 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (19/250): pd-slave-12gpw
May 01, 2019 2:36:46 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (20/250): pd-slave-12gpw
May 01, 2019 2:36:47 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
Omitted for brevity ...
INFO: Waiting for agent to connect (111/250): pd-slave-12gpw
May 01, 2019 2:38:19 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (112/250): pd-slave-12gpw
May 01, 2019 2:38:20 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (113/250): pd-slave-12gpw
May 01, 2019 2:38:20 PM hudson.TcpSlaveAgentListener$ConnectionHandler run INFO: Accepted JNLP4-connect connection #1 from /10.234.12.76:49588
May 01, 2019 2:38:21 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (114/250): pd-slave-12gpw
May 01, 2019 2:38:22 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (115/250): pd-slave-12gpw
May 01, 2019 2:38:23 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (116/250): pd-slave-12gpw
May 01, 2019 2:38:42 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
It looks like the plugin only waits for the pod to be running, not for the agent to actually connect back to Jenkins. The working version (1.14.5) waits for the agent to connect back to Jenkins.
It looks like there needs to be a wait for the agent to connect back to Jenkins here, like the working code here.
But I am certainly no Java expert.
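The diagnosis above can be illustrated with a deliberately simplified toy model. This is NOT Jenkins' real NodeProvisioner or the plugin's provisioning code; the class `OverProvisioningToyModel` and its parameters are hypothetical. It assumes that every provisioning tick (about 10 seconds, matching the log spacing) computes excess workload as queued builds minus agents that are online or still launching, and that a node stops counting as "launching" as soon as its launcher returns. Under that assumption, a launcher that returns once the pod is running leaves a window where the pod counts as neither online nor launching, so each tick in that window provisions another pod until the first agent connects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the over-provisioning described in this issue.
 * Hypothetical and heavily simplified: NOT the real Jenkins
 * NodeProvisioner or kubernetes-plugin logic.
 */
public class OverProvisioningToyModel {

    /**
     * Simulates one queued build waiting for an agent, one tick (~10 s) at a time.
     *
     * @param waitForConnect  true if the launcher only returns once the agent has connected
     * @param podRunningTicks ticks from provisioning until the pod is running
     * @param connectTicks    ticks from provisioning until the agent connects
     * @return number of pods provisioned before the first agent came online
     */
    static int podsProvisioned(boolean waitForConnect, int podRunningTicks, int connectTicks) {
        final int queued = 1;                              // one build waiting for the label
        List<Integer> provisionedAt = new ArrayList<>();   // tick at which each pod was provisioned
        int launchTicks = waitForConnect ? connectTicks : podRunningTicks;

        for (int tick = 0; ; tick++) {
            int online = 0;
            int launching = 0;
            for (int start : provisionedAt) {
                int age = tick - start;
                if (age >= connectTicks) {
                    online++;                              // agent connected: the build can run
                } else if (age < launchTicks) {
                    launching++;                           // launcher still running: counted as capacity
                }
                // Otherwise the launcher has returned but the agent has not connected yet,
                // so (under this model's assumption) the pod counts as neither.
            }
            if (online > 0) {
                return provisionedAt.size();
            }
            if (queued - online - launching > 0) {         // excess workload: provision another pod
                provisionedAt.add(tick);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("launcher returns when pod is running: " + podsProvisioned(false, 2, 12));
        System.out.println("launcher waits for agent to connect:  " + podsProvisioned(true, 2, 12));
    }
}
```

With these assumed timings (pod running after 2 ticks, agent connected after 12), the model provisions 6 pods when the launcher returns early and 1 pod when it waits for the agent, adding one extra pod every other tick. That spacing resembles the 20-second gaps in the 1.15.x log, but the numbers are illustrative only.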
Is this the commit that introduced the issue? https://github.com/jenkinsci/kubernetes-plugin/commit/30488eea66dd773ce57f18d1b79a7410affa476a#diff-d31a23c33d2739a785384f53a13bc682
Thanks for the diagnosis; that seems to be the cause. I have updated the PR at https://github.com/jenkinsci/kubernetes-plugin/pull/455
No problem, just trying to help.
That fixes it!
May 02, 2019 1:18:54 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Excess workload after pending Kubernetes agents: 1
May 02, 2019 1:18:54 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision INFO: Template for label pd-slave: Kubernetes Pod Template
May 02, 2019 1:18:54 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesClientProvider createClient INFO: Created new Kubernetes client: kubernetes io.fabric8.kubernetes.client.DefaultKubernetesClient@5339d085
May 02, 2019 1:18:54 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 02, 2019 1:18:54 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply INFO: Started provisioning Kubernetes Pod Template from kubernetes with 1 executors. Remaining excess workload: 0
May 02, 2019 1:19:04 PM hudson.slaves.NodeProvisioner$2 run INFO: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
May 02, 2019 1:19:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Created Pod: pd-pipedream/pd-slave-g6cwh
May 02, 2019 1:19:04 PM okhttp3.internal.platform.Platform log INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
May 02, 2019 1:19:07 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Pod is running: pd-pipedream/pd-slave-g6cwh
May 02, 2019 1:19:07 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (0/250): pd-slave-g6cwh
May 02, 2019 1:19:08 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (1/250): pd-slave-g6cwh
May 02, 2019 1:19:09 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch INFO: Waiting for agent to connect (2/250): pd-slave-g6cwh
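The fixed log shows a two-phase launch: the launcher first reports "Pod is running", then polls "Waiting for agent to connect (n/250)" about once per second. A minimal sketch of that pattern follows. It is not the plugin's actual `KubernetesLauncher` code: the class `TwoPhaseLaunch`, the helpers `waitFor` and `launch`, and the `BooleanSupplier` callbacks are hypothetical stand-ins for querying pod status through the Kubernetes client and checking whether the agent's Jenkins computer is online. The attempt limits passed in `main` (600 and 250) simply mirror the counters printed in the logs in this thread.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a launcher that waits for the pod to run AND for the
 * agent to connect back. Not the actual KubernetesLauncher implementation.
 */
public class TwoPhaseLaunch {

    /**
     * Polls a condition up to maxAttempts times, sleeping between attempts and
     * logging progress in the "(n/max)" style seen in the plugin log.
     * Returns true as soon as the condition holds, false if it never does.
     */
    static boolean waitFor(String what, BooleanSupplier condition, int maxAttempts,
                           long sleepMillis) throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            System.out.println("Waiting for " + what + " (" + i + "/" + maxAttempts + ")");
            Thread.sleep(sleepMillis);
        }
        return condition.getAsBoolean(); // one last check after the final sleep
    }

    /**
     * Returns only after the agent has connected, so the node is still "launching"
     * until it can actually take the build. Throws if either phase times out.
     */
    static void launch(BooleanSupplier podRunning, BooleanSupplier agentOnline,
                       int maxPodAttempts, int maxAgentAttempts, long sleepMillis)
            throws InterruptedException {
        if (!waitFor("Pod to be running", podRunning, maxPodAttempts, sleepMillis)) {
            throw new IllegalStateException("Pod did not reach Running in time");
        }
        System.out.println("Pod is running");
        // Second phase: the wait that, per this thread, the affected versions skipped.
        if (!waitFor("agent to connect", agentOnline, maxAgentAttempts, sleepMillis)) {
            throw new IllegalStateException("Agent did not connect in time");
        }
        System.out.println("Agent connected");
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster: the pod is running on the 3rd poll, the agent connects on the 4th.
        int[] polls = {0, 0};
        launch(() -> ++polls[0] > 2, () -> ++polls[1] > 3, 600, 250, 10);
    }
}
```

The design point is that `launch` does not return while the agent is still connecting, which keeps the node counted as in-progress capacity instead of letting the next provisioning round treat the queued build as unserved.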
Any idea how long before a new version is released?
We have the same problem here; it started to happen once we added the over-provisioning flags: https://github.com/jenkinsci/kubernetes-plugin/#over-provisioning-flags
We then removed the flags, but it kept happening, so we downgraded to 1.14.5 and now it no longer happens.