Bug
Resolution: Unresolved
Minor
 
When a Kubernetes resource quota is reached (for example a pod limit), Jenkins nodes are created and then removed over and over, on every queue cycle:
- Node is created
- Launcher tries to launch the pod and fails with a KubernetesClientException
- Node is removed
 
If the queue has a lot of items, this can considerably slow down the queue maintenance thread and the start of build executions, because each node operation requires the queue lock.
The Kubernetes plugin should adapt better to Kubernetes limits to avoid this behavior.
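One way the plugin could adapt is to stop provisioning for a while after a quota failure instead of retrying on every queue cycle. The following is only a minimal sketch of that idea. `QuotaBackoff` and its methods are hypothetical names and do not exist in the Kubernetes plugin, and the exponential delay is an assumption rather than a proposed design.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: none of these names exist in the Kubernetes plugin.
// Tracks a "do not provision before" timestamp that grows after each quota failure.
final class QuotaBackoff {
    private final Duration initial;
    private final Duration max;
    private Duration current;
    private Instant blockedUntil = Instant.MIN;

    QuotaBackoff(Duration initial, Duration max) {
        this.initial = initial;
        this.max = max;
        this.current = initial;
    }

    // True when no backoff window is active at the given time.
    synchronized boolean mayProvision(Instant now) {
        return !now.isBefore(blockedUntil);
    }

    // Called when pod creation fails with "exceeded quota":
    // block provisioning for the current delay, then double the delay up to max.
    synchronized void recordQuotaFailure(Instant now) {
        blockedUntil = now.plus(current);
        Duration doubled = current.multipliedBy(2);
        current = doubled.compareTo(max) > 0 ? max : doubled;
    }

    // Called when a pod is created successfully: clear the backoff state.
    synchronized void recordSuccess() {
        current = initial;
        blockedUntil = Instant.MIN;
    }
}
```

With such a gate checked before a node is created, a full quota would cost one failed launch per backoff window rather than one node creation and removal per queue cycle.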
Evidence
With a resource quota that sets a pod limit, the following exception is thrown at every pod creation failure:
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: <KUBERNETES_URL>/api/v1/namespaces/<NAMESPACE>/pods. Message: pods "<AGENTS_NAME>" is forbidden: exceeded quota: pod-limit, requested: pods=1, used: pods=300, limited: pods=300. 
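To react to this failure specifically, the quota details can be pulled out of the message text. The sketch below parses the message shape shown above. `PodQuotaError` is a hypothetical helper, not part of the plugin or of the fabric8 client, and it assumes the quota only lists `pods` (messages that list several resources would not match).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Kubernetes plugin or the fabric8 client.
final class PodQuotaError {
    // Matches e.g. "exceeded quota: pod-limit, requested: pods=1, used: pods=300, limited: pods=300".
    // Assumption: only the "pods" resource appears in the message.
    private static final Pattern QUOTA = Pattern.compile(
        "exceeded quota: ([^,]+), requested: pods=(\\d+), used: pods=(\\d+), limited: pods=(\\d+)");

    final String quotaName;
    final int requested;
    final int used;
    final int limited;

    private PodQuotaError(String quotaName, int requested, int used, int limited) {
        this.quotaName = quotaName;
        this.requested = requested;
        this.used = used;
        this.limited = limited;
    }

    // Returns the parsed quota details, or empty if the message is not a pod quota failure.
    static Optional<PodQuotaError> parse(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = QUOTA.matcher(message);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new PodQuotaError(
            m.group(1).trim(),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4))));
    }

    // True when the requested pods do not fit under the limit.
    boolean isExhausted() {
        return used + requested > limited;
    }
}
```

Matching on message text is fragile, so this is better suited to diagnostics or log analysis than to control flow that must stay correct across Kubernetes versions.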
Typically you'd see many threads removing nodes but waiting on the queue lock:
	at hudson.model.Queue._withLock(Queue.java:1408)
	at hudson.model.Queue.withLock(Queue.java:1284)
	at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
	at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1711)
	at jenkins.model.Nodes.removeNode(Nodes.java:297)
	at jenkins.model.Jenkins.removeNode(Jenkins.java:2277)
	at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
	at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:285)
And depending on the load (queue size and number of nodes), executors that try to execute queued tasks are also stuck on the queue lock:
"Executor #0 for <agentName> : executing <jobFullName> #<buildNumber>" ... waiting on condition  [0x00007efd152c3000]
    [...]
	at hudson.model.Queue._withLock(Queue.java:1408)
	at hudson.model.ResourceController.execute(ResourceController.java:104)
	at hudson.model.Executor.run(Executor.java:443)
or:
"Executor #0 for <otherAgentName>" .... waiting on condition  [0x00007efcd4201000]
   [...]
	at hudson.model.Queue._withLock(Queue.java:1469)
	at hudson.model.Queue.withLock(Queue.java:1327)
	at hudson.model.Executor.run(Executor.java:353)
Workaround
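A workaround is to mirror the quota in the Kubernetes Cloud configuration, for example by setting the cloud's container cap to a value no higher than the pod limit, so that Jenkins stops provisioning agents before Kubernetes rejects the pod.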
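The effect of such a limit can be illustrated with simple arithmetic: when the configured cap is at or below the Kubernetes pod limit, the number of new agents is clamped before any pod request is sent. This is a conceptual sketch only. `AgentCap` and `allowedNewAgents` are hypothetical names and this is not the plugin's actual provisioning code.

```java
// Hypothetical illustration of a provisioning cap; not the plugin's actual logic.
final class AgentCap {
    // Number of agents that may still be started without exceeding the cap.
    static int allowedNewAgents(int requested, int running, int cap) {
        int headroom = Math.max(0, cap - running);
        return Math.min(Math.max(0, requested), headroom);
    }
}
```

For instance, with a cap of 300 and 300 agents already running, the result is 0, so no node is created and no node has to be removed again under the queue lock.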