  Jenkins / JENKINS-35246

Kubernetes agents not getting deleted in Jenkins after pods are deleted

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: kubernetes-plugin
    • None

      When you run a failing pipeline job, the Kubernetes node responsible for the job does not get properly deleted and after a while appears as suspended.

      These are the logs from when the node ab1c5bc857e82 was supposed to be deleted but was not:

      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab17c1549fbe3
      May 18, 2016 4:18:48 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab17c1549fbe3
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab17c1549fbe3
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab183701b37bc
      May 18, 2016 4:18:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab183701b37bc
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab183701b37bc
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1a1b9a6372f
      May 18, 2016 4:18:54 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1a1b9a6372f
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1a1b9a6372f
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1a407748569
      May 18, 2016 4:18:58 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1a407748569
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1a407748569
      May 18, 2016 4:19:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.provisioning.ProvisioningLimitEnforcer@4fddbafb provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.client.cloud.OperationsCenterNodeProvisioningStrategy@2bbfaebd provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINE hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Excess workload 1 detected. (planned capacity=0.005,connecting capacity=0,Qlen=0.148,available=0.032&0,online=0,m=0.5)
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Excess workload after pending Spot instances: 1
      May 18, 2016 4:19:16 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:16 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Kubernetes Pod Template from Openshift with 1 executors. Remaining excess workload: 0
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 declared provisioning complete
      May 18, 2016 4:19:16 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Created Pod: ab1c5bc857e82
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Waiting for Pod to be scheduled (0/100): ab1c5bc857e82
      May 18, 2016 4:19:20 PM FINE hudson.slaves.ChannelPinger install
      Set up a remote ping for ab1c5bc857e82
      May 18, 2016 4:19:20 PM FINE hudson.slaves.ChannelPinger setUpPingForChannel
      Ping thread started for hudson.remoting.Channel@579bb5f6:ab1c5bc857e82 with a 5 minute interval
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 1 is less than the available capacity 1. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 1. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskAccepted
       Computer KubernetesComputer name: 
       slave: 
       taskAccepted
      May 18, 2016 4:19:21 PM FINE hudson.slaves.WorkspaceList acquire
      acquired /home/jenkins/workspace/test
      java.lang.Throwable: from hudson.slaves.WorkspaceList@3e1279c
              at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:183)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:167)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:462)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)
      
      May 18, 2016 4:19:22 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:23 PM FINE hudson.slaves.WorkspaceList _release
      releasing /home/jenkins/workspace/test with lock count 1
      java.lang.Throwable: from hudson.slaves.WorkspaceList@3e1279c
              at hudson.slaves.WorkspaceList._release(WorkspaceList.java:208)
              at hudson.slaves.WorkspaceList.access$300(WorkspaceList.java:46)
              at hudson.slaves.WorkspaceList$1.release(WorkspaceList.java:276)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$Callback.finished(ExecutorStepExecution.java:403)
              at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onFailure(BodyExecutionCallback.java:123)
              at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:286)
              at com.cloudbees.groovy.cps.impl.ValueBoundContinuation.receive(ValueBoundContinuation.java:21)
              at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:277)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:77)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:186)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      May 18, 2016 4:19:23 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskCompleted
       Computer KubernetesComputer name: 
       slave: 
       taskCompleted
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1c5bc857e82
      May 18, 2016 4:19:23 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1c5bc857e82
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1c5bc857e82
      May 18, 2016 4:19:23 PM FINE hudson.slaves.ChannelPinger$2 onClosed
      Terminating ping thread for ab1c5bc857e82
      May 18, 2016 4:19:31 PM INFO hudson.slaves.NodeProvisioner$2 run
      Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.provisioning.ProvisioningLimitEnforcer@4fddbafb provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.client.cloud.OperationsCenterNodeProvisioningStrategy@2bbfaebd provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINE hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Excess workload 1 detected. (planned capacity=0.076,connecting capacity=0,Qlen=0.189,available=0.023&0,online=0,m=0.5)
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Excess workload after pending Spot instances: 1
      May 18, 2016 4:19:43 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:43 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Kubernetes Pod Template from Openshift with 1 executors. Remaining excess workload: 0
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 declared provisioning complete
      May 18, 2016 4:19:43 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Created Pod: ab1cc1126bf11
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Waiting for Pod to be scheduled (0/100): ab1cc1126bf11
      May 18, 2016 4:19:47 PM FINE hudson.slaves.ChannelPinger install
      Set up a remote ping for ab1cc1126bf11
      May 18, 2016 4:19:47 PM FINE hudson.slaves.ChannelPinger setUpPingForChannel
      Ping thread started for hudson.remoting.Channel@398e1d68:ab1cc1126bf11 with a 5 minute interval
      May 18, 2016 4:19:48 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskAccepted
       Computer KubernetesComputer name: 
       slave: 
       taskAccepted
      May 18, 2016 4:19:48 PM FINE hudson.slaves.WorkspaceList acquire
      acquired /home/jenkins/workspace/test
      java.lang.Throwable: from hudson.slaves.WorkspaceList@450e50de
              at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:183)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:167)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:462)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)
      
      May 18, 2016 4:19:49 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:51 PM FINE hudson.slaves.WorkspaceList _release
      releasing /home/jenkins/workspace/test with lock count 1
      java.lang.Throwable: from hudson.slaves.WorkspaceList@450e50de
              at hudson.slaves.WorkspaceList._release(WorkspaceList.java:208)
              at hudson.slaves.WorkspaceList.access$300(WorkspaceList.java:46)
              at hudson.slaves.WorkspaceList$1.release(WorkspaceList.java:276)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$Callback.finished(ExecutorStepExecution.java:403)
              at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onFailure(BodyExecutionCallback.java:123)
              at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:286)
              at com.cloudbees.groovy.cps.impl.ValueBoundContinuation.receive(ValueBoundContinuation.java:21)
              at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:277)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:77)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:186)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      May 18, 2016 4:19:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskCompleted
       Computer KubernetesComputer name: 
       slave: 
       taskCompleted
      May 18, 2016 4:19:51 PM INFO hudson.slaves.NodeProvisioner$2 run
      Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1cc1126bf11
      May 18, 2016 4:19:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1cc1126bf11
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1cc1126bf11
      May 18, 2016 4:19:51 PM FINE hudson.slaves.ChannelPinger$2 onClosed
      Terminating ping thread for ab1cc1126bf11
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      

      To reproduce, execute the following pipeline script on a dynamic node provisioned by the Kubernetes plugin:

      node {
          try {
             sh 'whoami'
          } finally {
             sh 'pwd'
             sh 'echo $HOME'
          }
      }
      

      whoami will fail, causing the problem.
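      Until a fix is in place, a common workaround is to remove the lingering offline agents from the Jenkins script console (Manage Jenkins → Script Console). This is a minimal sketch, not part of the plugin: it removes every node whose computer is offline, so if you have agents that are intentionally offline, narrow the filter (for example by matching your pod-name prefix) before running it.

      ```groovy
      import jenkins.model.Jenkins

      // Collect nodes whose computer is offline (the "suspended" agents left behind).
      def stuck = Jenkins.instance.nodes.findAll { node ->
          def computer = node.toComputer()
          computer != null && computer.isOffline()
      }

      // Remove each stuck node from Jenkins; the pods themselves are already gone
      // from the Kubernetes cluster, so only the Jenkins-side reference is cleaned up.
      stuck.each { node ->
          println "Removing node ${node.nodeName}"
          Jenkins.instance.removeNode(node)
      }
      ```

      This only touches Jenkins' node list; it does not talk to the Kubernetes API at all, which matches the symptom reported here (pods already deleted, agent entries left over).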


          Sebastien Vas added a comment -

          I am experiencing the same behavior. It is kind of frustrating to have to manually delete slaves. Restarting Jenkins does not resolve the issue. I am worried that the suspended slaves are still accounted for, or at least creating a slow memory leak.


          Carlos Sanchez added a comment -

          Can't reproduce in GCE.
          A kubectl describe and kubectl logs of the pod that stays running would help.

          netsabes are you using OpenShift too?


          Sebastien Vas added a comment -

          Hi Carlos. Sorry for the late response; I was out of the country. The slaves are actually gone from the Kubernetes cluster; they are just listed as offline / suspended in Jenkins, and since we keep creating them, they pile up.


          Elmar Weber added a comment -

          csanchez netsabes I have the same issue. On our side it does not seem to relate to a specific command, but from what we gathered it happens when the job on the node (within a pipeline) takes less than 1-2 seconds. In that case the reference to the node does not seem to get cleaned from Jenkins. If the node actions we are scripting take longer, it is cleanly removed from Jenkins. It always gets removed from the Kubernetes cluster when it ends in this state. Furthermore, the build image we are using is rather large, around 1.5GB. On smaller ones we could not reproduce this behaviour.

          According to the Jenkins logs, certain checks / actions are only done once a second; could this be related to why the proper shutdown / kill of the pod is not registered within the plugin?

          I attached a screenshot of the situation. The log entries for all pods that end up like this follow the same pattern as below:

          Nov 10, 2016 12:13:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
          INFO: Created Pod: cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
          INFO: Waiting for Pod to be scheduled (0/100): cupenya-root-docker-1460271f3921
          --
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for slave cupenya-root-docker-1460271f3921
          --
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminated Kubernetes instance for slave cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:50 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          WARNING: Computer.threadPoolForRemoting [#138] for cupenya-root-docker-1460271f3921 terminated
          --
          Nov 10, 2016 12:18:59 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:19:19 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:20:39 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:20:49 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:21:29 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          

          You can maybe try to reproduce it with this image as it is larger, in case that is the issue:
          cupenya/docker-jenkins-slave-cpy-root

          https://github.com/cupenya/docker-jenkins-slave-cpy-root/blob/master/Dockerfile

          I'm happy to provide more details or in case you can't reproduce this in your environment show you a live example when this happens.


          Elmar Weber added a comment -

          I forgot: they do, however, get cleaned up automatically after a while, I think around 12 h or so.


          Carlos Sanchez added a comment -

          Are you saying that the pods don't get deleted in Kubernetes, that the Jenkins slave is not deleted in Jenkins, or both?

          Elmar Weber added a comment -

          To clarify:

          • The pods are terminated and deleted in Kubernetes.
          • The Jenkins slave reference to them is not deleted when they are terminated and deleted in Kubernetes.


          Damon Osgood added a comment -

          I am just now setting up Jenkins with Kubernetes slaves, and I do see this problem. I had thought it might have to do with failed builds, but extremely quick builds could also be the key.


          Elmar Weber added a comment -

          Hello, is there any update on this, or an idea where to check?

          If there is anything we can do to help debug this, that would be great. We are currently considering writing a script that cleans up these nodes, but we may as well invest the effort in fixing it in the plugin =) A quick pointer of where to start looking would be appreciated.


          Elmar Weber added a comment - - edited

          We built a simple scriptlet that we scheduled to take care of this, nothing fancy, in case anyone is looking for something similar:

          for (aSlave in hudson.model.Hudson.instance.slaves) {
              if (aSlave.getComputer().isOffline()) {
                  println(aSlave.name + ' Deleted');
                  aSlave.getComputer().doDoDelete();
              }
          }

          From http://stackoverflow.com/questions/24072354/jenkins-is-there-a-way-to-remove-all-offline-nodes-slaves-batch-remove-nod
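          To make the intent of that scriptlet concrete outside a Jenkins script console, here is a self-contained Java sketch of the same iterate-and-delete logic. FakeNode and OfflineCleanup are stand-ins invented purely for illustration; they are not the Jenkins API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for a Jenkins node; NOT the real Jenkins API.
class FakeNode {
    final String name;
    final boolean offline;
    FakeNode(String name, boolean offline) { this.name = name; this.offline = offline; }
}

public class OfflineCleanup {
    // Mirrors the script-console loop: delete every node whose computer is
    // offline, returning the names of the nodes that were removed.
    static List<String> deleteOffline(List<FakeNode> nodes) {
        List<String> deleted = new ArrayList<>();
        for (Iterator<FakeNode> it = nodes.iterator(); it.hasNext(); ) {
            FakeNode n = it.next();
            if (n.offline) {          // aSlave.getComputer().isOffline()
                deleted.add(n.name);  // println(aSlave.name + ' Deleted')
                it.remove();          // aSlave.getComputer().doDoDelete()
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<FakeNode> nodes = new ArrayList<>();
        nodes.add(new FakeNode("cupenya-root-docker-1460271f3921", true));
        nodes.add(new FakeNode("built-in", false));
        System.out.println(deleteOffline(nodes)); // prints [cupenya-root-docker-1460271f3921]
        System.out.println(nodes.size());         // prints 1
    }
}
```

          Note that the original scriptlet deletes every offline node, Kubernetes-provisioned or not; restricting the check to nodes of type KubernetesSlave would be a safer variant.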


          Alexander Dokuchaev added a comment -

          Some notes after my tests on https://github.com/jenkinsci/kubernetes-plugin/commit/91ec6e1f2805496e74e5222849e26227110f7983:

          • The executor is deleted at the end of the job, but a new one is immediately created with the same name (the KubernetesComputer constructor is called a second time in KubernetesSlave.createComputer, after KubernetesSlave._terminate).
          • It does not depend on the status of the build.
          • I'm not sure, but in my case it repeats for short jobs (duration less than 5-15 s); longer ones work correctly.

          I'll try to create a more detailed summary once I learn how to build Jenkins.

          Alexander Dokuchaev added a comment -

          Updated diagram in attachments.

          Found a quick but dirty solution. Diff for https://github.com/chancez/kubernetes-plugin/commit/91ec6e1f2805496e74e5222849e26227110f7983:

           

          --- a/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesSlave.java
          +++ b/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesSlave.java
          @@ -119,6 +119,8 @@ public class KubernetesSlave extends AbstractCloudSlave {
               protected void _terminate(TaskListener listener) throws IOException, InterruptedException {
                   LOGGER.log(Level.INFO, "Terminating Kubernetes instance for slave {0}", name);

          +        this.setNumExecutors(0);
          +
                   Computer computer = toComputer();
                   if (computer == null) {
                       String msg = String.format("Computer for slave is null: %s", name);

           

          With this change, AbstractCIBase.updateComputers does not call KubernetesSlave.createComputer a second time.

           

          Quick tests have not detected any exceptions, errors, or suspended executors.

          Hope that helps you.


          Mathias Rühle added a comment -

          I have the same problem. No pod gets deleted regardless of the task's outcome, and eventually all pods end up in the Error state until the maximum number of concurrent pods allowed is reached, at which point my whole Jenkins comes to a halt. Using the solution suggested by adokuchaev fixes the problem and all pods get deleted.

          adokuchaev, maybe you could provide the code change as a pull request on GitHub?


          Mathias Rühle added a comment -

          I just found issue JENKINS-45910; setting the pod namespace to the same value as the cloud namespace fixes the problem.

          Carlos Sanchez added a comment -

          I have added checks to the tests to ensure the nodes are deleted after execution, and so far it all works as expected: https://github.com/jenkinsci/kubernetes-plugin/pull/201

          Alvaro Lobato added a comment - - edited

          csanchez I could only reproduce it when the job failed. Do you have any failing jobs in the tests?


          Carlos Sanchez added a comment -

          Added a test where the job fails, and it still passes. To be fair, I've seen this error sometimes but couldn't find a reproducer yet.

          Carlos Sanchez added a comment -

          I've seen it again with a very short job just running hostname. The pod is deleted, but the node is left around in Jenkins and marked offline.

          Carlos Sanchez added a comment -

          PR at https://github.com/jenkinsci/kubernetes-plugin/pull/217

          krish2467 added a comment -

          I have been facing an issue with this Kubernetes slave.

          I am using Kubernetes 1.8, and the Kubernetes connection is successful. My slave pod build-fd56n is running in Kubernetes successfully, but the slave build-fd56n is offline in Jenkins.

          Why is my Jenkins job not running? Any ideas, please?

          pod template name: build
          container template: jnlp
          image: host/build-dotnet
          arg: cat
          tty: true

          In Kubernetes only one container is running, named jnlp, with my image. The problem is that my Jenkins is not able to run the job using my image.


            vlatombe Vincent Latombe
            alobato Alvaro Lobato
            Votes: 8
            Watchers: 17