JENKINS-35246

Kubernetes agents not getting deleted in Jenkins after pods are deleted

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: kubernetes-plugin
    • Labels: None

      When you run a failing pipeline job, the Kubernetes node responsible for running the job does not get properly deleted and after a while appears as suspended.

      These are the logs from when the node ab1c5bc857e82 should have been deleted but was not:

      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab17c1549fbe3
      May 18, 2016 4:18:48 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab17c1549fbe3
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab17c1549fbe3
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab183701b37bc
      May 18, 2016 4:18:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab183701b37bc
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab183701b37bc
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1a1b9a6372f
      May 18, 2016 4:18:54 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1a1b9a6372f
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1a1b9a6372f
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1a407748569
      May 18, 2016 4:18:58 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1a407748569
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1a407748569
      May 18, 2016 4:19:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.provisioning.ProvisioningLimitEnforcer@4fddbafb provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.client.cloud.OperationsCenterNodeProvisioningStrategy@2bbfaebd provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINE hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Excess workload 1 detected. (planned capacity=0.005,connecting capacity=0,Qlen=0.148,available=0.032&0,online=0,m=0.5)
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Excess workload after pending Spot instances: 1
      May 18, 2016 4:19:16 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:16 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Kubernetes Pod Template from Openshift with 1 executors. Remaining excess workload: 0
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 declared provisioning complete
      May 18, 2016 4:19:16 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Created Pod: ab1c5bc857e82
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Waiting for Pod to be scheduled (0/100): ab1c5bc857e82
      May 18, 2016 4:19:20 PM FINE hudson.slaves.ChannelPinger install
      Set up a remote ping for ab1c5bc857e82
      May 18, 2016 4:19:20 PM FINE hudson.slaves.ChannelPinger setUpPingForChannel
      Ping thread started for hudson.remoting.Channel@579bb5f6:ab1c5bc857e82 with a 5 minute interval
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 1 is less than the available capacity 1. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 1. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskAccepted
       Computer KubernetesComputer name: 
       slave: 
       taskAccepted
      May 18, 2016 4:19:21 PM FINE hudson.slaves.WorkspaceList acquire
      acquired /home/jenkins/workspace/test
      java.lang.Throwable: from hudson.slaves.WorkspaceList@3e1279c
              at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:183)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:167)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:462)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)
      
      May 18, 2016 4:19:22 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:23 PM FINE hudson.slaves.WorkspaceList _release
      releasing /home/jenkins/workspace/test with lock count 1
      java.lang.Throwable: from hudson.slaves.WorkspaceList@3e1279c
              at hudson.slaves.WorkspaceList._release(WorkspaceList.java:208)
              at hudson.slaves.WorkspaceList.access$300(WorkspaceList.java:46)
              at hudson.slaves.WorkspaceList$1.release(WorkspaceList.java:276)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$Callback.finished(ExecutorStepExecution.java:403)
              at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onFailure(BodyExecutionCallback.java:123)
              at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:286)
              at com.cloudbees.groovy.cps.impl.ValueBoundContinuation.receive(ValueBoundContinuation.java:21)
              at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:277)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:77)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:186)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      May 18, 2016 4:19:23 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskCompleted
       Computer KubernetesComputer name: 
       slave: 
       taskCompleted
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1c5bc857e82
      May 18, 2016 4:19:23 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1c5bc857e82
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1c5bc857e82
      May 18, 2016 4:19:23 PM FINE hudson.slaves.ChannelPinger$2 onClosed
      Terminating ping thread for ab1c5bc857e82
      May 18, 2016 4:19:31 PM INFO hudson.slaves.NodeProvisioner$2 run
      Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.provisioning.ProvisioningLimitEnforcer@4fddbafb provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.client.cloud.OperationsCenterNodeProvisioningStrategy@2bbfaebd provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINE hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Excess workload 1 detected. (planned capacity=0.076,connecting capacity=0,Qlen=0.189,available=0.023&0,online=0,m=0.5)
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Excess workload after pending Spot instances: 1
      May 18, 2016 4:19:43 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:43 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Kubernetes Pod Template from Openshift with 1 executors. Remaining excess workload: 0
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 declared provisioning complete
      May 18, 2016 4:19:43 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Created Pod: ab1cc1126bf11
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Waiting for Pod to be scheduled (0/100): ab1cc1126bf11
      May 18, 2016 4:19:47 PM FINE hudson.slaves.ChannelPinger install
      Set up a remote ping for ab1cc1126bf11
      May 18, 2016 4:19:47 PM FINE hudson.slaves.ChannelPinger setUpPingForChannel
      Ping thread started for hudson.remoting.Channel@398e1d68:ab1cc1126bf11 with a 5 minute interval
      May 18, 2016 4:19:48 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskAccepted
       Computer KubernetesComputer name: 
       slave: 
       taskAccepted
      May 18, 2016 4:19:48 PM FINE hudson.slaves.WorkspaceList acquire
      acquired /home/jenkins/workspace/test
      java.lang.Throwable: from hudson.slaves.WorkspaceList@450e50de
              at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:183)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:167)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:462)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)
      
      May 18, 2016 4:19:49 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:51 PM FINE hudson.slaves.WorkspaceList _release
      releasing /home/jenkins/workspace/test with lock count 1
      java.lang.Throwable: from hudson.slaves.WorkspaceList@450e50de
              at hudson.slaves.WorkspaceList._release(WorkspaceList.java:208)
              at hudson.slaves.WorkspaceList.access$300(WorkspaceList.java:46)
              at hudson.slaves.WorkspaceList$1.release(WorkspaceList.java:276)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$Callback.finished(ExecutorStepExecution.java:403)
              at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onFailure(BodyExecutionCallback.java:123)
              at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:286)
              at com.cloudbees.groovy.cps.impl.ValueBoundContinuation.receive(ValueBoundContinuation.java:21)
              at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:277)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:77)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:186)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      May 18, 2016 4:19:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskCompleted
       Computer KubernetesComputer name: 
       slave: 
       taskCompleted
      May 18, 2016 4:19:51 PM INFO hudson.slaves.NodeProvisioner$2 run
      Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1cc1126bf11
      May 18, 2016 4:19:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1cc1126bf11
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1cc1126bf11
      May 18, 2016 4:19:51 PM FINE hudson.slaves.ChannelPinger$2 onClosed
      Terminating ping thread for ab1cc1126bf11
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      

      To reproduce, execute the following pipeline script on a dynamic node provisioned by the Kubernetes plugin:

      node {
          try {
             sh 'whoami'
          } finally {
             sh 'pwd'
             sh 'echo $HOME'
          }
      }
      

      The whoami step will fail, which triggers the problem.
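
      For reference, a minimal sketch of how such a dynamic agent is typically requested through the Kubernetes plugin's podTemplate step; the label, image, and container arguments below are illustrative assumptions, not the reporter's actual configuration:

      // Hedged sketch: request a dynamic Kubernetes agent via the plugin's
      // podTemplate / containerTemplate steps. Label and image are examples only.
      podTemplate(label: 'k8s-test', containers: [
          containerTemplate(name: 'jnlp',
                            image: 'jenkinsci/jnlp-slave',
                            args: '${computer.jnlpmac} ${computer.name}')
      ]) {
          node('k8s-test') {
              try {
                  sh 'whoami'      // fails in this environment and triggers the issue
              } finally {
                  sh 'pwd'
                  sh 'echo $HOME'
              }
          }
      }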

          [JENKINS-35246] Kubernetes agents not getting deleted in Jenkins after pods are deleted

          Alvaro Lobato created issue -

          Sebastien Vas added a comment -

          I am experiencing the same behavior. It is kind of frustrating to have to manually delete slaves. Restarting Jenkins does not resolve the issue. I am worried that the suspended slaves are still accounted for, or at least creating a slow memory leak.

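          As a manual stop-gap for the piled-up entries, a minimal Script Console sketch, assuming the stale entries show up as offline, idle KubernetesSlave nodes (review it against your own installation before running):

          // Hedged sketch for the Jenkins Script Console: remove Kubernetes agents
          // whose computer is offline and not busy, i.e. the "suspended" leftovers.
          import jenkins.model.Jenkins
          import org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave

          def jenkins = Jenkins.instance
          jenkins.nodes.findAll { it instanceof KubernetesSlave }.each { node ->
              def computer = node.toComputer()
              if (computer != null && computer.isOffline() && computer.countBusy() == 0) {
                  println "Removing stale Kubernetes agent ${node.nodeName}"
                  jenkins.removeNode(node)
              }
          }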

          Carlos Sanchez added a comment -

          Can't reproduce in GCE. A kubectl describe and kubectl logs of the pod that stays running would help.

          netsabes, are you using OpenShift too?


          Sebastien Vas added a comment -

          Hi Carlos. Sorry for the late response, I was out of the country. The slaves are actually gone from the Kubernetes cluster; they are just listed as offline / suspended in Jenkins, and since we keep creating them, they pile up.

          Elmar Weber made changes -
          Attachment New: suspended_pods.png [ 34792 ]

          Elmar Weber added a comment -

          csanchez netsabes I have the same issue. On our side it does not seem to relate to a specific command; from what we gathered it happens when the job on the node (within a pipeline) takes less than 1-2 seconds. In that case the reference to the node does not seem to get cleaned up in Jenkins. If the node actions we are scripting take longer, it is cleanly removed from Jenkins. It always gets removed from the Kubernetes cluster when it ends up in this state. Furthermore, the build image we are using is rather large, around 1.5 GB. On smaller ones we could not reproduce this behaviour.

          According to the Jenkins logs, certain checks / actions are only done once a second; could this be related to why the proper shutdown / kill of the pod is not registered within the plugin?

          I attached a screenshot of the situation. The log entries for all pods that end up like this follow the same pattern as below:

          Nov 10, 2016 12:13:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
          INFO: Created Pod: cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
          INFO: Waiting for Pod to be scheduled (0/100): cupenya-root-docker-1460271f3921
          --
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for slave cupenya-root-docker-1460271f3921
          --
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminated Kubernetes instance for slave cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:50 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          WARNING: Computer.threadPoolForRemoting [#138] for cupenya-root-docker-1460271f3921 terminated
          --
          Nov 10, 2016 12:18:59 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:19:19 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:20:39 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:20:49 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:21:29 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          

          You can maybe try to reproduce it with this image as it is larger, in case that is the issue:
          cupenya/docker-jenkins-slave-cpy-root

          https://github.com/cupenya/docker-jenkins-slave-cpy-root/blob/master/Dockerfile

          I'm happy to provide more details or, in case you can't reproduce this in your environment, show you a live example when this happens.
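
          Based on the timing observation above (node bodies shorter than 1-2 seconds leaving a stale reference), one hedged workaround sketch is simply to keep the node body alive a little longer; the 5-second value is a guess, not a verified threshold:

          // Hedged workaround sketch: pad very short node bodies so the agent
          // outlives the 1-2 second window described above. Duration is a guess.
          node {
              try {
                  sh 'whoami'
              } finally {
                  sleep time: 5, unit: 'SECONDS'   // keep the agent around briefly
              }
          }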


          Elmar Weber added a comment -

          I forgot: they do get cleaned up automatically after a while, though, I think after around 12 hours or so.


          Carlos Sanchez added a comment -

          Are you saying that the Pods don't get deleted in Kubernetes, that the Jenkins slave is not deleted in Jenkins, or both?


          Elmar Weber added a comment -

          To clarify:

          • The pods are terminated and deleted in Kubernetes.
          • The Jenkins slave reference for them is not deleted in Jenkins when they are terminated and deleted in Kubernetes.


            Assignee: Vincent Latombe (vlatombe)
            Reporter: Alvaro Lobato (alobato)
            Votes: 8
            Watchers: 17
