  2. JENKINS-53427

Agent creation failure because of concurrent attempts to schedule a pod

    • Type: Task
    • Resolution: Unresolved
    • Priority: Critical
    • kubernetes-plugin
    • None
    • Jenkins ver. 2.107.3
      kubernetes-plugin 1.8.4
      Kubernetes 1.8

      The vast majority of the pods have been created properly, but for some of them it looks like there are several concurrent attempts to create a single pod.

      I've just grepped the logs on the master for one such pod, which the plugin tried to create from several threads (Pod and PodTemplate details skipped):

      ../custom/k8s.log:2018-09-03 12:08:12.876+0000 [id=629145] FINE o.c.j.p.k.PodTemplateBuilder#build: Pod built: Pod(apiVersion=v1, kind=Pod, ...)
      ./custom/k8s.log:2018-09-03 12:08:12.876+0000 [id=629145] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev
      ./custom/k8s.log:2018-09-03 12:08:12.969+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: test-0ghpx in namespace dev
      ./custom/k8s.log:2018-09-03 12:08:12.970+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (0/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:14.143+0000 [id=640057] FINE o.c.j.p.k.PodTemplateBuilder#build: Pod built: Pod(...)
      ./custom/k8s.log:2018-09-03 12:08:14.144+0000 [id=640057] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev
      ./custom/k8s.log:2018-09-03 12:08:14.214+0000 [id=640057] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: test-0ghpx, template=PodTemplate{...}
      ./custom/k8s.log:io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://k8s.local:6443/api/v1/namespaces/dev/pods. Message: pods "test-0ghpx" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=pods, name=test-0ghpx, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "test-0ghpx" already exists, metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
      ./custom/k8s.log:2018-09-03 12:08:14.214+0000 [id=640057] FINER o.c.j.p.k.KubernetesLauncher#launch: Removing Jenkins node: test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:14.215+0000 [id=640057] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:14.289+0000 [id=640057] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminated Kubernetes instance for agent dev/test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:14.290+0000 [id=640057] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:14.542+0000 [id=640056] FINE o.c.j.p.k.PodTemplateBuilder#build: Pod built: Pod(...)
      ./custom/k8s.log:2018-09-03 12:08:14.543+0000 [id=640056] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev
      ./custom/k8s.log:2018-09-03 12:08:14.615+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: test-0ghpx in namespace dev
      ./custom/k8s.log:2018-09-03 12:08:14.616+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (0/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:18.976+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (1/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:20.620+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (1/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:24.983+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (2/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:26.625+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (2/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:30.988+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (3/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:32.629+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (3/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:36.993+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (4/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:38.634+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (4/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:42.998+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (5/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:08:44.639+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (5/100): test-0ghpx
      ......
      ./custom/k8s.log:2018-09-03 12:18:01.997+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (98/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:03.559+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (98/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:08.002+0000 [id=629145] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (99/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:09.563+0000 [id=640056] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for Pod to be scheduled (99/100): test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:14.008+0000 [id=629145] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: test-0ghpx, template=PodTemplate{...}
      ./custom/k8s.log:2018-09-03 12:18:14.008+0000 [id=629145] FINER o.c.j.p.k.KubernetesLauncher#launch: Removing Jenkins node: test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:14.008+0000 [id=629145] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:14.009+0000 [id=629145] SEVERE o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:15.568+0000 [id=640056] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: test-0ghpx, template=PodTemplate{...}
      ./custom/k8s.log:2018-09-03 12:18:15.569+0000 [id=640056] FINER o.c.j.p.k.KubernetesLauncher#launch: Removing Jenkins node: test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:15.569+0000 [id=640056] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent test-0ghpx
      ./custom/k8s.log:2018-09-03 12:18:15.570+0000 [id=640056] SEVERE o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: test-0ghpx
      ./slaves/test-0ghpx/slave.log:io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://k8s.local:6443/api/v1/namespaces/dev/pods. Message: pods "test-0ghpx" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=pods, name=test-0ghpx, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods "test-0ghpx" already exists, metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
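One way to spot affected pods in a log like the one above is to count how many distinct thread ids logged "Creating Pod" for the same pod name. The following is a minimal sketch that assumes the log line format shown above; the function names are hypothetical, and the sample lines are copied from the log in this report.

```python
import re
from collections import defaultdict

# Matches e.g. "[id=629145] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev"
CREATE_RE = re.compile(
    r"\[id=(\d+)\] \w+ o\.c\.j\.p\.k\.KubernetesLauncher#launch: "
    r"Creating Pod: (\S+) in namespace"
)


def threads_creating_each_pod(log_text):
    """Map pod name -> set of thread ids that logged 'Creating Pod' for it."""
    threads = defaultdict(set)
    for thread_id, pod in CREATE_RE.findall(log_text):
        threads[pod].add(thread_id)
    return dict(threads)


def pods_with_concurrent_creates(log_text):
    """Keep only the pods that more than one thread tried to create."""
    return {
        pod: ids
        for pod, ids in threads_creating_each_pod(log_text).items()
        if len(ids) > 1
    }


sample_log = """\
./custom/k8s.log:2018-09-03 12:08:12.876+0000 [id=629145] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev
./custom/k8s.log:2018-09-03 12:08:14.144+0000 [id=640057] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev
./custom/k8s.log:2018-09-03 12:08:14.543+0000 [id=640056] FINE o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: test-0ghpx in namespace dev
"""

suspects = pods_with_concurrent_creates(sample_log)
print(suspects)
```

Because the pattern is applied to the whole text rather than line by line, it also tolerates log entries that were pasted onto a single line.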
      

       

       

      So the failure arises like this:

      1. The first thread schedules a pod.
      2. The second thread tries to schedule the pod without checking that it is already scheduled, fails because a pod with the same name already exists, and deletes the corresponding Jenkins node.
      3. The third thread tries to schedule the pod (the node is already terminated).
      4. The first and third threads wait for the pod to be scheduled until the timeout is reached, because scheduling is impossible once the node has been killed.
        We can see logs like this from the jnlp container of this pod:
      Aug 31, 2018 11:17:15 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
      Aug 31, 2018 11:17:15 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
      INFO: Restarting agent via jenkins.slaves.restarter.UnixSlaveRestarter@2eb6111a
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up agent: test-0ghpx
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Aug 31, 2018 11:17:17 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/jenkins/agent/remoting as a remoting work directory
      Both error and output logs will be printed to /home/jenkins/agent/remoting
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.local]
      Aug 31, 2018 11:17:17 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful
        Agent address: jenkins-jnlp.local
        Agent port:    30150
        Identity:      XXXX
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to jenkins-jnlp.local:30150
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: XXXXXX
      Aug 31, 2018 11:17:17 AM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
      INFO: [JNLP4-connect connection to jenkins-jnlp.local/2.2.2.2:30150] Local headers refused by remote: Unknown client name: test-0ghpx
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Protocol JNLP4-connect encountered an unexpected exception
      java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-0ghpx
      at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
      at hudson.remoting.Engine.innerRun(Engine.java:609)
      at hudson.remoting.Engine.run(Engine.java:469)
      Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-0ghpx
      at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
      at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:433)
      at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
      at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
      at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
      at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
      at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
      at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at hudson.remoting.Engine$1$1.run(Engine.java:94)
      at java.lang.Thread.run(Thread.java:748)
      Suppressed: java.nio.channels.ClosedChannelException
      ... 7 more
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to jenkins-jnlp.local:30150
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Server reports protocol JNLP4-plaintext not supported, skipping
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Protocol JNLP3-connect is not enabled, skipping
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Server reports protocol JNLP2-connect not supported, skipping
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Server reports protocol JNLP-connect not supported, skipping
      Aug 31, 2018 11:17:17 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: None of the protocols were accepted
      java.lang.Exception: The server rejected the connection: None of the protocols were accepted
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:670)
      at hudson.remoting.Engine.innerRun(Engine.java:634)
      at hudson.remoting.Engine.run(Engine.java:469)
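The sequence of launch attempts described in the numbered steps can be replayed with a toy model. This is only a sketch under stated assumptions, not the plugin's code: `ToyCluster`, `ToyMaster` and `launch` are hypothetical stand-ins, the three attempts run one after another instead of on real threads, and the cleanup on conflict (remove the Jenkins node, delete the pod) follows what the master log in this report shows.

```python
class AlreadyExists(Exception):
    """Stands in for the HTTP 409 'AlreadyExists' answer of the API server."""


class ToyCluster:
    """Hypothetical cluster that only tracks pod names."""

    def __init__(self):
        self.pods = set()

    def create_pod(self, name):
        if name in self.pods:
            raise AlreadyExists(name)
        self.pods.add(name)

    def delete_pod(self, name):
        self.pods.discard(name)


class ToyMaster:
    """Hypothetical master that only tracks known agent (node) names."""

    def __init__(self, agent_names):
        self.nodes = set(agent_names)

    def remove_node(self, name):
        self.nodes.discard(name)

    def accepts_connection(self, name):
        # An agent whose node is gone is refused ("Unknown client name").
        return name in self.nodes


def launch(cluster, master, name):
    """One launch attempt: create the pod, clean up on a name conflict."""
    try:
        cluster.create_pod(name)
        return "created"
    except AlreadyExists:
        # Assumed cleanup, mirroring the log: the node is removed and the
        # pod that the *first* attempt created is deleted.
        master.remove_node(name)
        cluster.delete_pod(name)
        return "conflict"


cluster = ToyCluster()
master = ToyMaster(["test-0ghpx"])
outcomes = [launch(cluster, master, "test-0ghpx") for _ in range(3)]
pod_exists = "test-0ghpx" in cluster.pods
agent_accepted = master.accepts_connection("test-0ghpx")
print(outcomes, pod_exists, agent_accepted)
```

In this model the pod from the third attempt exists at the end, but its agent can never connect because the node it belongs to was removed by the second attempt.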
      

      For the successful creations I see only one thread doing the job.

      I've just created a POC of how we can mitigate the impact, but it looks much more like a workaround (moreover, a non-thread-safe one) than a proper fix.

       

      It also looks like a similar problem was described here: https://issues.jenkins-ci.org/browse/JENKINS-44042?focusedCommentId=311231&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-311231
      So this issue can probably also be treated as a reproduction of that problem: JENKINS-44042


          Carlos Sanchez added a comment -

          vlatombe do you know what's the concurrency model for cloud provisioner?

          Vincent Latombe added a comment -

          AFAICT from these logs, there are several threads calling KubernetesLauncher#launch.

          However, these are supposed to be initiated by Computer#connect, which includes some control to prevent multiple launches from happening (unless it is triggered using a 'force' attribute).
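The kind of control described in this comment (reuse an in-progress launch unless a reconnect is forced) can be illustrated with a toy model. This is a sketch only and not Jenkins code: `ToyComputer`, its `connect(force=...)` method and `slow_launcher` are hypothetical names, and the real behaviour of Computer#connect should be checked against the Jenkins source.

```python
import threading
from concurrent.futures import Future


class ToyComputer:
    """Hypothetical computer that hands out one launch future at a time."""

    def __init__(self, launcher):
        self._launcher = launcher
        self._lock = threading.Lock()
        self._in_flight = None

    def connect(self, force=False):
        with self._lock:
            busy = self._in_flight is not None and not self._in_flight.done()
            if busy and not force:
                # A launch is already running: reuse it instead of starting another.
                return self._in_flight
            future = Future()
            self._in_flight = future

        def run():
            try:
                future.set_result(self._launcher())
            except Exception as exc:
                future.set_exception(exc)

        threading.Thread(target=run).start()
        return future


release = threading.Event()
launches = []
launches_lock = threading.Lock()


def slow_launcher():
    """Pretend launch that blocks until the demo lets it finish."""
    with launches_lock:
        launches.append(threading.get_ident())
    release.wait(timeout=5)
    return "launched"


computer = ToyComputer(slow_launcher)
first = computer.connect()
second = computer.connect()            # same in-flight launch is returned
forced = computer.connect(force=True)  # bypasses the guard, second launch starts
release.set()
results = (first.result(timeout=5), forced.result(timeout=5))
print(len(launches), results)
```

In this model a forced call is the only way two launches for the same agent run at once, which is why several threads inside KubernetesLauncher#launch for one agent name look unexpected.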

          Carlos Sanchez added a comment -

          this is mitigated in 1.13.9 as it won't wait for deleted pods
          https://github.com/jenkinsci/kubernetes-plugin/blob/master/CHANGELOG.md#1139

          please reopen if still happening in latest version

          Dan Alvizu added a comment -

          We observed this in version 1.17.2 of the kubernetes-plugin. We had reached our maximum pod limit (100) in our kubernetes cluster, all of which had failing JNLP containers. They all had the same error: the master had refused the connection as it had an 'unknown client name'

          Agent JNLP container logs:

          Aug 14, 2019 9:51:40 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Protocol JNLP4-connect encountered an unexpected exception
          java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: proteus-apply-central-v95mf
          at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
          at hudson.remoting.Engine.innerRun(Engine.java:614)
          at hudson.remoting.Engine.run(Engine.java:474)
          Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: proteus-apply-central-v95mf
          at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
          at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:433)
          at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
          at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
          at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
          at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
          at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
          at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
          at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
          at java.lang.Thread.run(Thread.java:748)
          Suppressed: java.nio.channels.ClosedChannelException
          ... 7 more
          "Local headers refused by remote: Unknown client name:" 

           

          Here is the Jenkins master log of a single one of these failing pods (ui-pipeline-shared-7sdt7). Apologies, I did not discover the debug logging settings (https://github.com/jenkinsci/kubernetes-plugin#debugging) during our incident, so I do not have many logs:

          k logs jenkins-icecream-68c6d8497d-xhl9x -n cicd | grep ui-pipeline-shared-7sdt7 (central/ping-services)
          INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-7sdt7
          WARNING: Failed to delete pod for agent cicd/ui-pipeline-shared-7sdt7: not found
          INFO: Disconnected computer ui-pipeline-shared-7sdt7
          ERROR: Failed to delete pod for agent cicd/ui-pipeline-shared-7sdt7: not found
          Disconnected computer ui-pipeline-shared-7sdt7
          INFO: Created Pod: cicd/ui-pipeline-shared-7sdt7
          INFO: Pod is running: cicd/ui-pipeline-shared-7sdt7
          WARNING: Error in provisioning; agent=KubernetesSlave name: ui-pipeline-shared-7sdt7, template=PodTemplate{inheritFrom='', name='ui-pipeline-shared', namespace='cicd', slaveConnectTimeout=1000, label='ui-pipeline-shared', nodeSelector='', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], yamls=[apiVersion: v1
          INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-7sdt7
          SEVERE: Computer for agent is null: ui-pipeline-shared-7sdt7
          FATAL: Computer for agent is null: ui-pipeline-shared-7sdt7
          INFO: [JNLP4-connect connection from 100.96.0.1/100.96.0.1:23036] Refusing headers from remote: Unknown client name: ui-pipeline-shared-7sdt7 

           

          As you can see here the termination of the ui-pipeline-shared-7sdt7 pod happens before the pod is 'created' or 'running'.
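The ordering claim can be checked mechanically on a grepped log like the one above. The following is a small sketch that assumes line order in the log reflects event order (the grepped lines carry no timestamps); the marker strings are taken from the log lines in this comment, and the function names are hypothetical.

```python
# (event kind, substring that identifies it) pairs, taken from the log above.
MARKERS = (
    ("terminating", "Terminating Kubernetes instance for agent "),
    ("created", "Created Pod: "),
    ("running", "Pod is running: "),
)


def lifecycle_events(lines, agent):
    """Return the lifecycle event kinds for one agent, in log order."""
    events = []
    for line in lines:
        if agent not in line:
            continue
        for kind, marker in MARKERS:
            if marker in line:
                events.append(kind)
                break
    return events


def terminated_before_created(events):
    """True if a termination is logged before the pod creation."""
    return (
        "terminating" in events
        and "created" in events
        and events.index("terminating") < events.index("created")
    )


grepped = [
    "INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-7sdt7",
    "WARNING: Failed to delete pod for agent cicd/ui-pipeline-shared-7sdt7: not found",
    "INFO: Disconnected computer ui-pipeline-shared-7sdt7",
    "INFO: Created Pod: cicd/ui-pipeline-shared-7sdt7",
    "INFO: Pod is running: cicd/ui-pipeline-shared-7sdt7",
    "INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-7sdt7",
    "SEVERE: Computer for agent is null: ui-pipeline-shared-7sdt7",
]

events = lifecycle_events(grepped, "ui-pipeline-shared-7sdt7")
print(events, terminated_before_created(events))
```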

           

          Here are full master logs, ungrepped:

          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
          INFO: Template for label ui-pipeline-shared: Kubernetes Pod Template
          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave deleteSlavePod
          WARNING: Failed to delete pod for agent cicd/proteus-apply-central-8h7bv: not found
          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer proteus-apply-central-8h7bv
          ERROR: Failed to delete pod for agent cicd/proteus-apply-central-8h7bv: not found
          Disconnected computer proteus-apply-central-8h7bv
          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-kvc2j
          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave deleteSlavePod
          WARNING: Failed to delete pod for agent null/ui-pipeline-shared-kvc2j: not found
          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer ui-pipeline-shared-kvc2j
          ERROR: Failed to delete pod for agent null/ui-pipeline-shared-kvc2j: not found
          Disconnected computer ui-pipeline-shared-kvc2j
          Aug 14, 2019 11:09:26 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          INFO: Created Pod: cicd/ui-pipeline-shared-kvc2j
          WARNING: Error in provisioning; agent=KubernetesSlave name: proteus-apply-central-8h7bv, template=PodTemplate{inheritFrom='', name='proteus-apply-central', namespace='cicd', slaveConnectTimeout=1000, label='proteus-apply-central', nodeSelector='', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], yamls=[apiVersion: v1 kind: Pod metadata: labels: jenkins: runners annotations: iam.amazonaws.com/role: arn:aws:iam::208980577242:role/devtools_jenkins spec: containers: - name: proteus-apply image: docker.corp.pingidentity.com:5000/devtools/proteus/pipeline-images/proteus-apply:stable imagePullPolicy: Always command: - cat tty: true ]}
          java.lang.IllegalStateException: Node was deleted, computer is null
          at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:149)
          at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
          Aug 14, 2019 11:09:33 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          INFO: Pod is running: cicd/ui-pipeline-shared-kvc2j
          Aug 14, 2019 11:09:33 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          WARNING: Error in provisioning; agent=KubernetesSlave name: ui-pipeline-shared-kvc2j, template=PodTemplate{inheritFrom='', name='ui-pipeline-shared', namespace='cicd', slaveConnectTimeout=1000, label='ui-pipeline-shared', nodeSelector='', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], yamls=[apiVersion: v1 kind: Pod metadata: labels: jenkins: runners annotations: #kube2iam iam.amazonaws.com/role: arn:aws:iam::208980577242:role/devtools_jenkins spec: containers: - command: - cat image: docker.corp.pingidentity.com:5000/ping-base/node-builder:10 imagePullPolicy: Always name: node-builder tty: true - command: - cat # This uses the image built on the 'icecream' feature branch here: https://gitlab.corp.pingidentity.com/platform-pipeline/platform-js-static-analysis-service/tree/icecream. image: docker.corp.pingidentity.com:5000/platform-pipeline/platform-js-static-analysis-service:icecream imagePullPolicy: Always name: platform-js-static-analysis-service tty: true env: - name: PING_SONAR_PASSWORD valueFrom: secretKeyRef: name: sonarqube key: password - command: - cat # This uses the image built on the https://gitlab.corp.pingidentity.com/devtools/icecream/cdn_deployer image: docker.corp.pingidentity.com:5000/devtools/icecream/cdn_deployer:stable imagePullPolicy: Always name: cdn-deploy tty: true ]}
          java.lang.IllegalStateException: Node was deleted, computer is null
          at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:149)
          at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
          Aug 14, 2019 11:09:33 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-kvc2j
          Aug 14, 2019 11:09:33 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          SEVERE: Computer for agent is null: ui-pipeline-shared-kvc2j
          FATAL: Computer for agent is null: ui-pipeline-shared-kvc2j
          Aug 14, 2019 11:09:34 PM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
          INFO: [JNLP4-connect connection from 100.96.0.1/100.96.0.1:28564] Refusing headers from remote: Unknown client name: ui-pipeline-shared-kvc2j
           

           


          Dan Alvizu added a comment -

          During our incident, deleting 1-3 pods would result in more pods coming up with the same error from the JNLP container. 

          However, when we deleted all 100 of them, the next pods to start did not have this error.

          Script used to delete all the failed slaves in our 'cicd' namespace:

          for pod in $(kubectl get pods -n cicd -ojson -ljenkins=slave | jq '.items[].metadata.name' -r); do kubectl delete -n cicd pod "$pod" --wait=false; done 

          I will try to reproduce this issue, but I believe that the high number of running pods is critical here.
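The script above deletes every pod labeled jenkins=slave in the namespace, healthy or not. As a minimal sketch of a narrower selection, the Python below picks out only pods whose agent container has terminated with a non-zero exit code. It assumes the agent container is named `jnlp` and that the failing container exits non-zero; neither is confirmed in this thread. The pod names in the sample are hypothetical, and the field names follow the pod JSON returned by `kubectl get pods -o json`.

```python
def failed_agent_pods(pod_list, container_name="jnlp"):
    """Return names of pods whose agent container exited with a non-zero code.

    pod_list is the parsed JSON of `kubectl get pods -o json`.
    container_name="jnlp" is an assumption about how the agent container is named.
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            if status.get("name") != container_name:
                continue
            terminated = status.get("state", {}).get("terminated")
            # Assumption: a failed agent container reports a non-zero exit code.
            if terminated and terminated.get("exitCode", 0) != 0:
                names.append(pod["metadata"]["name"])
    return names


# Hypothetical sample; pod names and exit codes are made up for illustration.
sample = {
    "items": [
        {
            "metadata": {"name": "example-agent-aaaaa"},
            "status": {"containerStatuses": [
                {"name": "jnlp", "state": {"terminated": {"exitCode": 255}}},
                {"name": "node-builder", "state": {"running": {}}},
            ]},
        },
        {
            "metadata": {"name": "example-agent-bbbbb"},
            "status": {"containerStatuses": [
                {"name": "jnlp", "state": {"running": {}}},
            ]},
        },
    ]
}

for name in failed_agent_pods(sample):
    print(name)
```

The resulting names could then be passed to the same `kubectl delete -n cicd pod "$pod" --wait=false` call used in the script, leaving agents that are still connected untouched.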


          Dan Alvizu added a comment -

          Re-opening as requested in the previous comment.


          Dan Alvizu added a comment -

          Looking through the code base I'm unclear where this could be happening. This is where things are first being terminated:

          org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate

           

          So this must be called incorrectly somewhere. However it can only be called by:

          AbstractCloudSlave.terminate() (hudson.slaves)

          Which in turn can only be called from one of three places – I hope I'm missing something here:

          1. KubernetesLauncher.launch(SlaveComputer, TaskListener) (org.csanchez.jenkins.plugins.kubernetes)
          2. CloudRetentionStrategy.check(AbstractCloudComputer) (hudson.slaves)
          3. AbstractCloudComputer.doDoDelete() (hudson.slaves)
            DeleteNodeCommand.run() (hudson.cli)

           

          #1 can't be happening, because it would require the log statement "Error in provisioning; agent=%s, template=%s" to appear before the observed termination log statement; instead it appears after it (on the second termination).

          #2 can't be happening, as it logs "Disconnecting {0}" with computer.name(), and these messages only appear for "jnlp-" computers.

          #3 can't be happening, as I'm not running any CLI commands at the time.
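The elimination above can be checked mechanically against a log. The sketch below is only an illustration of that reasoning, not plugin code: it takes log messages in chronological order and reports which candidate callers left their expected log line before the first termination of an agent. The function name and the `SIGNATURES` table are hypothetical; the message fragments are the ones quoted in this comment, and the sample messages are abridged from the master log posted earlier in the thread (the template body is elided as `{...}`). Caller #3 has no log signature and is therefore not covered.

```python
# Log fragments each caller would be expected to emit before _terminate runs,
# as described in the comment above. Caller #3 (doDoDelete) has no signature.
SIGNATURES = {
    "KubernetesLauncher.launch": "Error in provisioning; agent=KubernetesSlave name: {agent}",
    "CloudRetentionStrategy.check": "Disconnecting {agent}",
}
TERMINATE = "Terminating Kubernetes instance for agent {agent}"


def callers_before_first_terminate(messages, agent):
    """Return callers whose signature appears before the agent's first termination.

    messages must be in chronological order. Returns None if the agent
    is never terminated in the given messages.
    """
    terminate = TERMINATE.format(agent=agent)
    first = next((i for i, m in enumerate(messages) if terminate in m), None)
    if first is None:
        return None
    earlier = messages[:first]
    return [
        caller
        for caller, signature in SIGNATURES.items()
        if any(signature.format(agent=agent) in m for m in earlier)
    ]


# Abridged from the master log quoted earlier in this thread.
messages = [
    "Template for label ui-pipeline-shared: Kubernetes Pod Template",
    "Disconnected computer proteus-apply-central-8h7bv",
    "Terminating Kubernetes instance for agent ui-pipeline-shared-kvc2j",
    "Disconnected computer ui-pipeline-shared-kvc2j",
    "Created Pod: cicd/ui-pipeline-shared-kvc2j",
    "Pod is running: cicd/ui-pipeline-shared-kvc2j",
    "Error in provisioning; agent=KubernetesSlave name: ui-pipeline-shared-kvc2j, template=PodTemplate{...}",
    "Terminating Kubernetes instance for agent ui-pipeline-shared-kvc2j",
    "Computer for agent is null: ui-pipeline-shared-kvc2j",
]

print(callers_before_first_terminate(messages, "ui-pipeline-shared-kvc2j"))
```

An empty result means neither signature precedes the first termination, which matches the observation that the "Error in provisioning" line only shows up before the second one.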

           


          Mykhailo Zolotarenko added a comment - - edited

          Additional logs to the previous comment from Dan Alvizu

           

          Full JNLP container logs:

          Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up agent: ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Oct 09, 2019 8:58:32 AM hudson.remoting.Engine startEngine
          INFO: Using Remoting version: 3.29
          Oct 09, 2019 8:58:32 AM hudson.remoting.Engine startEngine
          WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [https://jenkins-icecream.some.tools/]
          Oct 09, 2019 8:58:32 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
          INFO: Remoting server accepts the following protocols: [JNLP4-connect, some]
          Oct 09, 2019 8:58:32 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
          INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Agent discovery successful
            Agent address: jenkins-icecream-agent.some.tools
            Agent port:    50000
            Identity:      11:11:11:11:11:11:11:11:11:11:11:11:11
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to jenkins-icecream-agent.some.tools:50000
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: 54:90:09:b3:21:36:c7:d1:0d:3f:a6:b4:51:2c:12:61
          Oct 09, 2019 8:58:32 AM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
          INFO: [JNLP4-connect connection to jenkins-icecream-agent.some.tools/10.75.115.45:50000] Local headers refused by remote: Unknown client name: ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Protocol JNLP4-connect encountered an unexpected exception
          java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: ui-pipeline-shared-jbr6l
          	at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
          	at hudson.remoting.Engine.innerRun(Engine.java:614)
          	at hudson.remoting.Engine.run(Engine.java:474)
          Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: ui-pipeline-shared-jbr6l
          	at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)
          	at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:433)
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
          	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
          	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
          	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
          	at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:48)
          	at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:247)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
          	at java.lang.Thread.run(Thread.java:748)
          	Suppressed: java.nio.channels.ClosedChannelException
          		... 7 more
          
          
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to jenkins-icecream-agent.some.tools:50000
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Server reports protocol JNLP4-plaintext not supported, skipping
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Protocol JNLP3-connect is not enabled, skipping
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Server reports protocol JNLP2-connect not supported, skipping
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Server reports protocol JNLP-connect not supported, skipping
          Oct 09, 2019 8:58:32 AM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: None of the protocols were accepted
          java.lang.Exception: The server rejected the connection: None of the protocols were accepted
          	at hudson.remoting.Engine.onConnectionRejected(Engine.java:682)
          	at hudson.remoting.Engine.innerRun(Engine.java:639)
          	at hudson.remoting.Engine.run(Engine.java:474)
          
          
           
          

           

          Jenkins Master logs:

          Oct 10, 2019 2:41:53 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesClientProvider$PurgeExpiredKubernetesClients
          FINEST: Finished Purge expired KubernetesClients. 0 ms
          Oct 10, 2019 2:41:53 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesClientProvider gracefulClose
          INFO: Closing io.fabric8.kubernetes.client.DefaultKubernetesClient@39b17b63
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          SEVERE: Computer for agent is null: ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: Connected to Kubernetes some.us-east-2.k8s.some.net URL https://api.some.us-east-2.k8s.some.net/
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: Building connection to Kubernetes some.us-east-2.k8s.some.net URL https://api.some.us-east-2.k8s.some.net namespace cicd
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher
          FINER: Removing Jenkins node: ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          WARNING: Error in provisioning; agent=KubernetesSlave name: ui-pipeline-shared-jbr6l, template=PodTemplate{inheritFrom='', name='ui-pipeline-shared', namespace='cicd', slaveConnectTimeout=1000, label='ui-pipeline-shared', nodeSelector='', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], yamls=[apiVersion: v1
          kind: Pod
          metadata:
            labels:
              jenkins: runners
            annotations:
              #kube2iam
              iam.amazonaws.com/role: arn:aws:iam::11111111111:role/some
          spec:
            containers:
            - command:
              - cat
              image: docker.corp.some.com:5000/some/node-builder:10
              imagePullPolicy: Always
              name: node-builder
              tty: true
            - command:
              - cat
              # This uses the image built on the 'icecream' feature branch here: https://gitlab.corp.some.com/platform-pipeline/platform-js-static-analysis-service/tree/icecream.
              image: docker.corp.some.com:5000/platform-pipeline/platform-js-static-analysis-service:icecream
              imagePullPolicy: Always
              name: platform-js-static-analysis-service
              tty: true
            - command:
              - cat
              # This uses the image built on the https://gitlab.corp.some.com/devtools/icecream/cdn_deployer
              image: docker.corp.some.com:5000/devtools/icecream/cdn_deployer:stable
              imagePullPolicy: Always
              name: cdn-deploy
              tty: true
          ]}
          java.lang.IllegalStateException: Node was deleted, computer is null
          	at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:149)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
          	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          INFO: Pod is running: cicd/ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher
          FINE: All containers are running for pod ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher
          FINEST: [MODIFIED] ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:28 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINEST: Planned Kubernetes agents for template "Kubernetes Pod Template": 1
          Oct 09, 2019 8:58:28 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: Connected to Kubernetes some.us-east-2.k8s.some.net URL https://api.some.us-east-2.k8s.some.net/
          Oct 09, 2019 8:58:28 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: Building connection to Kubernetes some.us-east-2.k8s.some.net URL https://api.some.us-east-2.k8s.some.net namespace cicd
          Oct 09, 2019 8:58:28 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
          INFO: Template for label ui-pipeline-shared: Kubernetes Pod Template
          Oct 09, 2019 8:58:28 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
          INFO: Excess workload after pending Kubernetes agents: 1
          Oct 09, 2019 8:58:28 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: In provisioning : []
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher
          FINEST: [MODIFIED] ui-pipeline-shared-jbr6l...
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave deleteSlavePod
          WARNING: Failed to delete pod for agent cicd/ui-pipeline-shared-jbr6l: not found
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          INFO: Created Pod: cicd/ui-pipeline-shared-jbr6l
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
          INFO: Created Pod: cicd/ui-pipeline-shared-kdvwl
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: Connected to Kubernetes some.us-east-2.k8s.some.net URL https://api.some.us-east-2.k8s.some.net/
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud
          FINE: Building connection to Kubernetes some.us-east-2.k8s.some.net URL https://api.some.us-east-2.k8s.some.net namespace cicd
          Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-jbr6l
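The master log above appears to be pasted newest-first, which makes the sequence of events for a single agent hard to follow. As a rough sketch, the Python below parses records in this two-line format (timestamp, class and method on the first line; level and message on the second) and prints one agent's events in chronological order. The function names are hypothetical, the newest-first ordering is an assumption, and the sample is a short excerpt of the log above. Continuation lines such as stack traces and YAML are ignored.

```python
import re
from datetime import datetime

# Header line, e.g. "Oct 09, 2019 8:58:32 AM some.Class method" (method is optional).
HEADER = re.compile(
    r"^(\w{3} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (\S+)(?: (\S+))?$"
)


def parse_records(text):
    """Split log text into records with a timestamp, source and first message line."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p")
            records.append({"time": when, "source": match.group(2), "message": ""})
        elif records and line and not records[-1]["message"]:
            # Keep only the first line after the header ("LEVEL: message").
            records[-1]["message"] = line
    return records


def timeline(records, agent, newest_first=True):
    """Return (time, message) pairs mentioning the agent, oldest first.

    newest_first=True assumes the paste lists the most recent record first,
    so it is reversed before a stable sort to keep same-second order.
    """
    ordered = list(reversed(records)) if newest_first else list(records)
    ordered.sort(key=lambda r: r["time"])
    return [(r["time"], r["message"]) for r in ordered if agent in r["message"]]


# Short excerpt of the master log above, in the order it was pasted.
SAMPLE = """
Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
SEVERE: Computer for agent is null: ui-pipeline-shared-jbr6l
Oct 09, 2019 8:58:32 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-jbr6l
Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Created Pod: cicd/ui-pipeline-shared-jbr6l
Oct 09, 2019 8:58:18 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent ui-pipeline-shared-jbr6l
"""

for when, message in timeline(parse_records(SAMPLE), "ui-pipeline-shared-jbr6l"):
    print(when.strftime("%H:%M:%S"), message)
```

Read this way, and if the newest-first assumption holds, the first "Terminating Kubernetes instance" record for ui-pipeline-shared-jbr6l comes before its "Created Pod" record within the same second.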
          

           

           

           

           


          Mykhailo Zolotarenko added a comment -

          I can reproduce this issue when I disable "Do not allow concurrent builds" and run the pipeline job several times at the same time.

          Our agent uses kube2iam, and I don't know when or why the Jenkins master refused the connection with "Unknown client name". Maybe this is due to kube2iam, which we use for our agent; maybe it is something else.

          Possible workarounds:

          1. Enable the "Do not allow concurrent builds" option. This should help prevent the issue.

          2. Delete the pods in Error state. This should help avoid reaching the maximum pod limit in the Kubernetes cluster.

          3. Stop the current job. This should prevent more Error pods from being generated.

          I hope this helps.

            Assignee: Unassigned
            Reporter: Alex Medvedev (fduch)
            Votes: 6
            Watchers: 16

              Created:
              Updated: