JENKINS-35246

Kubernetes agents not getting deleted in Jenkins after pods are deleted

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: kubernetes-plugin
    • Labels: None

      When you run a failing pipeline job, the Kubernetes node responsible for running the job does not get properly deleted and after a while appears as suspended.

      These are the logs from when the node ab1c5bc857e82 should have been deleted but was not:

      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab17c1549fbe3
      May 18, 2016 4:18:48 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab17c1549fbe3
      May 18, 2016 4:18:48 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab17c1549fbe3
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab183701b37bc
      May 18, 2016 4:18:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab183701b37bc
      May 18, 2016 4:18:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab183701b37bc
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1a1b9a6372f
      May 18, 2016 4:18:54 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1a1b9a6372f
      May 18, 2016 4:18:54 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1a1b9a6372f
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1a407748569
      May 18, 2016 4:18:58 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1a407748569
      May 18, 2016 4:18:58 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1a407748569
      May 18, 2016 4:19:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.provisioning.ProvisioningLimitEnforcer@4fddbafb provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.client.cloud.OperationsCenterNodeProvisioningStrategy@2bbfaebd provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:16 PM FINE hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Excess workload 1 detected. (planned capacity=0.005,connecting capacity=0,Qlen=0.148,available=0.032&0,online=0,m=0.5)
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Excess workload after pending Spot instances: 1
      May 18, 2016 4:19:16 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:16 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Kubernetes Pod Template from Openshift with 1 executors. Remaining excess workload: 0
      May 18, 2016 4:19:16 PM FINER hudson.slaves.NodeProvisioner update
      Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 declared provisioning complete
      May 18, 2016 4:19:16 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Created Pod: ab1c5bc857e82
      May 18, 2016 4:19:16 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Waiting for Pod to be scheduled (0/100): ab1c5bc857e82
      May 18, 2016 4:19:20 PM FINE hudson.slaves.ChannelPinger install
      Set up a remote ping for ab1c5bc857e82
      May 18, 2016 4:19:20 PM FINE hudson.slaves.ChannelPinger setUpPingForChannel
      Ping thread started for hudson.remoting.Channel@579bb5f6:ab1c5bc857e82 with a 5 minute interval
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 1 is less than the available capacity 1. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 1. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:21 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskAccepted
       Computer KubernetesComputer name: 
       slave: 
       taskAccepted
      May 18, 2016 4:19:21 PM FINE hudson.slaves.WorkspaceList acquire
      acquired /home/jenkins/workspace/test
      java.lang.Throwable: from hudson.slaves.WorkspaceList@3e1279c
              at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:183)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:167)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:462)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)
      
      May 18, 2016 4:19:22 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:23 PM FINE hudson.slaves.WorkspaceList _release
      releasing /home/jenkins/workspace/test with lock count 1
      java.lang.Throwable: from hudson.slaves.WorkspaceList@3e1279c
              at hudson.slaves.WorkspaceList._release(WorkspaceList.java:208)
              at hudson.slaves.WorkspaceList.access$300(WorkspaceList.java:46)
              at hudson.slaves.WorkspaceList$1.release(WorkspaceList.java:276)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$Callback.finished(ExecutorStepExecution.java:403)
              at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onFailure(BodyExecutionCallback.java:123)
              at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:286)
              at com.cloudbees.groovy.cps.impl.ValueBoundContinuation.receive(ValueBoundContinuation.java:21)
              at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:277)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:77)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:186)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      May 18, 2016 4:19:23 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskCompleted
       Computer KubernetesComputer name: 
       slave: 
       taskCompleted
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1c5bc857e82
      May 18, 2016 4:19:23 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1c5bc857e82
      May 18, 2016 4:19:23 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1c5bc857e82
      May 18, 2016 4:19:23 PM FINE hudson.slaves.ChannelPinger$2 onClosed
      Terminating ping thread for ab1c5bc857e82
      May 18, 2016 4:19:31 PM INFO hudson.slaves.NodeProvisioner$2 run
      Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.provisioning.ProvisioningLimitEnforcer@4fddbafb provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting com.cloudbees.opscenter.client.cloud.OperationsCenterNodeProvisioningStrategy@2bbfaebd provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Consulting hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 provisioning strategy with state StrategyState{label=null, snapshot=LoadStatisticsSnapshot{definedExecutors=1, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=1}, plannedCapacitySnapshot=0, additionalPlannedCapacity=1}
      May 18, 2016 4:19:43 PM FINE hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Excess workload 1 detected. (planned capacity=0.076,connecting capacity=0,Qlen=0.189,available=0.023&0,online=0,m=0.5)
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Excess workload after pending Spot instances: 1
      May 18, 2016 4:19:43 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:43 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      Started provisioning Kubernetes Pod Template from Openshift with 1 executors. Remaining excess workload: 0
      May 18, 2016 4:19:43 PM FINER hudson.slaves.NodeProvisioner update
      Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@1f4ccee0 declared provisioning complete
      May 18, 2016 4:19:43 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Created Pod: ab1cc1126bf11
      May 18, 2016 4:19:43 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
      Waiting for Pod to be scheduled (0/100): ab1cc1126bf11
      May 18, 2016 4:19:47 PM FINE hudson.slaves.ChannelPinger install
      Set up a remote ping for ab1cc1126bf11
      May 18, 2016 4:19:47 PM FINE hudson.slaves.ChannelPinger setUpPingForChannel
      Ping thread started for hudson.remoting.Channel@398e1d68:ab1cc1126bf11 with a 5 minute interval
      May 18, 2016 4:19:48 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskAccepted
       Computer KubernetesComputer name: 
       slave: 
       taskAccepted
      May 18, 2016 4:19:48 PM FINE hudson.slaves.WorkspaceList acquire
      acquired /home/jenkins/workspace/test
      java.lang.Throwable: from hudson.slaves.WorkspaceList@450e50de
              at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:183)
              at hudson.slaves.WorkspaceList.allocate(WorkspaceList.java:167)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$PlaceholderExecutable.run(ExecutorStepExecution.java:462)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:410)
      
      May 18, 2016 4:19:49 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:51 PM FINE hudson.slaves.WorkspaceList _release
      releasing /home/jenkins/workspace/test with lock count 1
      java.lang.Throwable: from hudson.slaves.WorkspaceList@450e50de
              at hudson.slaves.WorkspaceList._release(WorkspaceList.java:208)
              at hudson.slaves.WorkspaceList.access$300(WorkspaceList.java:46)
              at hudson.slaves.WorkspaceList$1.release(WorkspaceList.java:276)
              at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$Callback.finished(ExecutorStepExecution.java:403)
              at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onFailure(BodyExecutionCallback.java:123)
              at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$FailureAdapter.receive(CpsBodyExecution.java:286)
              at com.cloudbees.groovy.cps.impl.ValueBoundContinuation.receive(ValueBoundContinuation.java:21)
              at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
              at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:164)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:277)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:77)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:186)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:47)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      
      May 18, 2016 4:19:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesComputer taskCompleted
       Computer KubernetesComputer name: 
       slave: 
       taskCompleted
      May 18, 2016 4:19:51 PM INFO hudson.slaves.NodeProvisioner$2 run
      Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminating Kubernetes instance for slave ab1cc1126bf11
      May 18, 2016 4:19:51 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud connect
      Building connection to Kubernetes host Openshift URL https://openshift.default.svc.cluster.local
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Terminated Kubernetes instance for slave ab1cc1126bf11
      May 18, 2016 4:19:51 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
      Disconnected computer ab1cc1126bf11
      May 18, 2016 4:19:51 PM FINE hudson.slaves.ChannelPinger$2 onClosed
      Terminating ping thread for ab1cc1126bf11
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:01 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:11 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:21 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:31 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      May 18, 2016 4:20:41 PM FINER hudson.slaves.NodeProvisioner$2 run
      Queue length 0 is less than the available capacity 0. No provisioning strategy required
      

      To reproduce, execute the following pipeline script on a dynamic node provisioned by the Kubernetes plugin:

      node {
          try {
             sh 'whoami'
          } finally {
             sh 'pwd'
             sh 'echo $HOME'
          }
      }
      

      The whoami step will fail, which triggers the problem.
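
      For reference, a minimal sketch of how such a dynamic agent is typically requested through the Kubernetes plugin's podTemplate step; the label, image, and container arguments below are illustrative assumptions, not the reporter's actual configuration:

      // Hedged sketch: request a dynamic Kubernetes agent via the plugin's
      // podTemplate / containerTemplate steps. Label and image are examples only.
      podTemplate(label: 'k8s-test', containers: [
          containerTemplate(name: 'jnlp',
                            image: 'jenkinsci/jnlp-slave',
                            args: '${computer.jnlpmac} ${computer.name}')
      ]) {
          node('k8s-test') {
              try {
                  sh 'whoami'      // fails in this environment and triggers the issue
              } finally {
                  sh 'pwd'
                  sh 'echo $HOME'
              }
          }
      }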

          [JENKINS-35246] Kubernetes agents not getting deleted in Jenkins after pods are deleted

          Alvaro Lobato created issue -

          Sebastien Vas added a comment -

          I am experiencing the same behavior. It is kind of frustrating to have to manually delete slaves. Restarting Jenkins does not resolve the issue. I am worried that the suspended slaves are still accounted for, or at least creating a slow memory leak.

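          As a manual stop-gap for the piled-up entries, a minimal Script Console sketch, assuming the stale entries show up as offline, idle KubernetesSlave nodes (review it against your own installation before running):

          // Hedged sketch for the Jenkins Script Console: remove Kubernetes agents
          // whose computer is offline and not busy, i.e. the "suspended" leftovers.
          import jenkins.model.Jenkins
          import org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave

          def jenkins = Jenkins.instance
          jenkins.nodes.findAll { it instanceof KubernetesSlave }.each { node ->
              def computer = node.toComputer()
              if (computer != null && computer.isOffline() && computer.countBusy() == 0) {
                  println "Removing stale Kubernetes agent ${node.nodeName}"
                  jenkins.removeNode(node)
              }
          }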

          Carlos Sanchez added a comment -

          Can't reproduce in GCE. A kubectl describe and kubectl logs of the pod that stays running would help.

          netsabes, are you using OpenShift too?


          Sebastien Vas added a comment -

          Hi Carlos. Sorry for the late response, I was out of the country. The slaves are actually gone from the Kubernetes cluster; they are just listed as offline / suspended in Jenkins, and since we keep creating them, they pile up.

          Elmar Weber made changes -
          Attachment New: suspended_pods.png [ 34792 ]

          Elmar Weber added a comment -

          csanchez netsabes I have the same issue. On our side it does not seem to relate to a specific command; from what we gathered it happens when the job on the node (within a pipeline) takes less than 1-2 seconds. In that case the reference to the node does not seem to get cleaned up in Jenkins. If the node actions we are scripting take longer, it is cleanly removed from Jenkins. It always gets removed from the Kubernetes cluster when it ends up in this state. Furthermore, the build image we are using is rather large, around 1.5 GB. On smaller ones we could not reproduce this behaviour.

          According to the Jenkins logs, certain checks / actions are only done once a second; could this be related to why the proper shutdown / kill of the pod is not registered within the plugin?

          I attached a screenshot of the situation. The log entries for all pods that end up like this follow the same pattern as below:

          Nov 10, 2016 12:13:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
          INFO: Created Pod: cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:40 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
          INFO: Waiting for Pod to be scheduled (0/100): cupenya-root-docker-1460271f3921
          --
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for slave cupenya-root-docker-1460271f3921
          --
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminated Kubernetes instance for slave cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:50 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer cupenya-root-docker-1460271f3921
          Nov 10, 2016 12:13:50 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          WARNING: Computer.threadPoolForRemoting [#138] for cupenya-root-docker-1460271f3921 terminated
          --
          Nov 10, 2016 12:18:59 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:19:19 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:20:39 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:20:49 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          Nov 10, 2016 12:21:29 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
          WARNING: Making cupenya-root-docker-1460271f3921 offline because it’s not responding
          --
          

          You can maybe try to reproduce it with this image as it is larger, in case that is the issue:
          cupenya/docker-jenkins-slave-cpy-root

          https://github.com/cupenya/docker-jenkins-slave-cpy-root/blob/master/Dockerfile

          I'm happy to provide more details or, in case you can't reproduce this in your environment, show you a live example when this happens.
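
          Based on the timing observation above (node bodies shorter than 1-2 seconds leaving a stale reference), one hedged workaround sketch is simply to keep the node body alive a little longer; the 5-second value is a guess, not a verified threshold:

          // Hedged workaround sketch: pad very short node bodies so the agent
          // outlives the 1-2 second window described above. Duration is a guess.
          node {
              try {
                  sh 'whoami'
              } finally {
                  sleep time: 5, unit: 'SECONDS'   // keep the agent around briefly
              }
          }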


          Elmar Weber added a comment -

          I forgot: they do get cleaned up automatically after a while, though, I think after around 12 hours or so.


          Carlos Sanchez added a comment -

          Are you saying that the Pods don't get deleted in Kubernetes, that the Jenkins slave is not deleted in Jenkins, or both?


          Elmar Weber added a comment -

          To clarify:

          • The pods are terminated and deleted in Kubernetes.
          • The Jenkins slave reference for them is not deleted in Jenkins when they are terminated and deleted in Kubernetes.


            Assignee: Vincent Latombe (vlatombe)
            Reporter: Alvaro Lobato (alobato)
            Votes: 8
            Watchers: 17
