
Kubernetes plugin provisioning pods twice in 1.14.6

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • kubernetes-plugin
    • None

      Since 1.14.6 and its update to Kubernetes client 4.1.2, I have been seeing the plugin provision two pods for a single pod template. The second pod is identical to the first but is spawned exactly 20 seconds later.

      I tried version 1.14.5 and it works well (I also tested 1.14.2 and 1.14.0).

      SocketClosed exceptions caused by timeouts on the get operation also seem to have become more frequent:
      Caused: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [terraform-deploy-fabric-cluster-cec-az-246-hc9xs-2z6rf] in namespace: [jenkins] failed.
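
One way to confirm the symptom described above is to list pod creation times per pod template and flag pairs created within a short window of each other. The following is a minimal sketch, not part of the original report: the function name, the `(template, pod_name, created)` tuple layout, and the 30-second window are assumptions for illustration, and how the template name is obtained (for example from pod labels) is left to the reader.

```python
from datetime import datetime, timedelta
from itertools import groupby


def find_duplicate_pods(pods, window=timedelta(seconds=30)):
    """Flag consecutive pods of the same template created within `window`.

    `pods` is an iterable of (template, pod_name, created) tuples, where
    `created` is a datetime. This layout is hypothetical, not a plugin API.
    Returns a list of (template, earlier_pod, later_pod, gap) tuples.
    """
    suspects = []
    # Sort by template, then by creation time, so groupby sees each
    # template's pods together and in chronological order.
    ordered = sorted(pods, key=lambda p: (p[0], p[2]))
    for template, group in groupby(ordered, key=lambda p: p[0]):
        group = list(group)
        for earlier, later in zip(group, group[1:]):
            gap = later[2] - earlier[2]
            if gap <= window:
                suspects.append((template, earlier[1], later[1], gap))
    return suspects


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    start = datetime(2019, 3, 1, 12, 0, 0)
    sample = [
        ("tmpl-a", "tmpl-a-1", start),
        ("tmpl-a", "tmpl-a-2", start + timedelta(seconds=20)),
        ("tmpl-b", "tmpl-b-1", start),
    ]
    for template, first, second, gap in find_duplicate_pods(sample):
        print(f"{template}: {second} created {gap.total_seconds():.0f}s after {first}")
```

A pair flagged with a gap of about 20 seconds would match the behavior reported here; a larger gap more likely reflects two separate builds.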

          [JENKINS-56347] Kubernetes plugin provisioning pods twice in 1.14.6

          Filip Pytloun created issue -

          Filip Pytloun added a comment -

          Any update? This is a serious issue and it also affects the latest version, 1.14.8.

          Carlos Sanchez added a comment -

          Nobody else is experiencing that behavior, so unless you can provide definitions, debug logs, etc., there's not much that can be done.

          Christian Tryti added a comment -

          We had the same problem with 1.14.6 and 1.14.8. Each job we ran created one slave, then a second 20 seconds later. This quickly drained the available resources in our namespace.

          We also avoided the problem by rolling back to 1.14.5.

          Unfortunately we don't have any logs of it right now, but I'll try to find time to reproduce it and get some logs.
          Christian Tryti made changes -
          Attachment New: jenkins.log [ 46319 ]

          Christian Tryti added a comment -

          I reproduced it on Jenkins 2.150.3 with the following plugins (the ones I think might be relevant):

          • Kubernetes-plugin: 1.14.8
          • Kubernetes-pipeline-plugin: 1.6
          • Kubernetes-credentials: 0.4.0

          Our jobs run declarative pipelines. The attached log, jenkins.log, shows 3 slaves being created.

          With kubernetes-plugin 1.14.5, only 1 slave is created for each job.

          Christian Tryti added a comment -

          I can leave that Jenkins instance running in case you want more logs. Shout out if you need more details about the rest of the setup as well.

          Carlos Sanchez added a comment -

          What I see in the logs is that KubernetesCloud.provision is called 3 times, which is controlled by Jenkins, not the plugin.
          It is suspicious, though, that both of you say it started happening in 1.14.6.

          Do you have the settings in https://github.com/jenkinsci/kubernetes-plugin/#over-provisioning-flags ?
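
To answer the question above, one could scan the JVM options that Jenkins was started with for the over-provisioning system properties. The sketch below is illustrative only: the three property names are the ones believed to be listed in the linked README section, while the function name and the sample options string are made up, and where the options string comes from (startup script, container spec, and so on) depends on the installation.

```python
import shlex

# Property names as believed to be listed in the kubernetes-plugin README
# section on over-provisioning flags; verify against the README itself.
OVER_PROVISIONING_PROPS = (
    "hudson.slaves.NodeProvisioner.initialDelay",
    "hudson.slaves.NodeProvisioner.MARGIN",
    "hudson.slaves.NodeProvisioner.MARGIN0",
)


def find_over_provisioning_flags(jvm_args):
    """Return {property: value} for each over-provisioning -D flag present."""
    found = {}
    for token in shlex.split(jvm_args):
        if not token.startswith("-D"):
            continue
        # Split "-Dname=value" into name and value; value is "" if absent.
        name, _, value = token[2:].partition("=")
        if name in OVER_PROVISIONING_PROPS:
            found[name] = value
    return found


if __name__ == "__main__":
    # Hypothetical JVM options string, for illustration only.
    sample = (
        "-Xmx2g "
        "-Dhudson.slaves.NodeProvisioner.initialDelay=0 "
        "-Dhudson.slaves.NodeProvisioner.MARGIN=50"
    )
    flags = find_over_provisioning_flags(sample)
    if flags:
        for name, value in flags.items():
            print(f"{name} = {value}")
    else:
        print("No over-provisioning flags set")
```

An empty result would correspond to the "not set" case, which would rule these flags out as the cause of the extra provision calls.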

          Christian Tryti added a comment -

          Not in that Jenkins instance.

            csanchez Carlos Sanchez
            genunix Filip Pytloun
            Votes: 6
            Watchers: 16
