- Bug
- Resolution: Fixed
- Minor
- None
- Jenkins 2.93, Docker Plugin 1.1.1; containers are using JNLP
I have a large Docker swarm (old-style Docker Swarm API in a container) with plenty of capacity (multiple TB of RAM, etc.).
When jobs (a multibranch Pipeline job in this case) request a Docker node (by label), one of these things happens:
1. The node is allocated immediately.
2. The node is not allocated, and the Jenkins logs indicate why (e.g. the swarm is full per the maximums set in the Jenkins configuration).
3. The node is allocated after a significant delay (minutes). The logs do not indicate why; there is no Docker Plugin log activity until the node is allocated.
4. The node is allocated after an extreme delay (I just had one take 77 minutes). The logs show no activity from the Docker Plugin until the node is allocated, even though other jobs have had containers allocated in the meantime (and those events are in the logs). Interestingly, the job sometimes gets its container only once a later build of the same job requests one (they run in parallel), and then the later build waits (forever?).
How can I troubleshoot this behavior, especially case 4?
Because it is intermittent I can't be sure, but it seems to have gotten worse after the Docker Plugin 1.0.x to 1.1.x upgrade (possibly also the Jenkins 2.92 to 2.93 upgrade).
In fact, I have two Jenkins instances, one upgraded to plugin 1.1.1 and the other on 1.1; the one running 1.1 is currently not exhibiting these issues (but it is also under less load).
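One way to get more signal on the silent-delay cases is to raise the relevant loggers to a finer level, e.g. via Manage Jenkins > System Log or the script console. This is a sketch, not from the issue itself; it assumes the docker-plugin logs under the `com.nirima.jenkins.plugins.docker` package and that provisioning decisions are made by `hudson.slaves.NodeProvisioner` (both hold for current plugin/core versions as far as I know):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class EnableProvisioningLogs {
    public static void main(String[] args) {
        // Assumed logger names: com.nirima.jenkins.plugins.docker is the
        // docker-plugin's package; hudson.slaves.NodeProvisioner is the core
        // class that decides when to spin up new agents.
        Logger docker = Logger.getLogger("com.nirima.jenkins.plugins.docker");
        Logger provisioner = Logger.getLogger("hudson.slaves.NodeProvisioner");
        // ALL captures the FINE/FINER messages that explain (or fail to
        // explain) why provisioning is deferred.
        docker.setLevel(Level.ALL);
        provisioner.setLevel(Level.ALL);
        System.out.println(provisioner.getLevel()); // ALL
    }
}
```

With these loggers at FINE or below, the periods of apparent inactivity should at least show whether NodeProvisioner is being consulted and declining to provision, or not being triggered at all.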
[JENKINS-48490] Intermittently slow docker provisioning with no errors
Node allocation is controlled by NodeProvisioner, which is a terrible beast. I have never been able to fully understand how it works or how to tweak it for consistent results.
I'd like to make docker-plugin rely on One-Shot Executors so we can get rid of this stuff, but that is a long-term effort.
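For what it's worth, NodeProvisioner's behavior can be nudged via JVM system properties on the Jenkins controller. The property names below come from Jenkins core; the values are illustrative only (a commonly cited "provision more eagerly" configuration), not a fix confirmed for this issue:

```shell
# Config fragment: start the Jenkins controller with a more aggressive
# NodeProvisioner strategy.
#   initialDelay - how long to wait after startup before provisioning (default 100)
#   MARGIN/MARGIN0 - fudge factors applied to the load-statistics estimate,
#                    which otherwise decays slowly and delays provisioning
java -Dhudson.slaves.NodeProvisioner.initialDelay=0 \
     -Dhudson.slaves.NodeProvisioner.MARGIN=50 \
     -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85 \
     -jar jenkins.war
```

Because the provisioner works off exponentially decayed load averages, a burst of queued builds can take minutes to register as demand; raising the margins makes it react sooner, at the cost of occasionally over-provisioning.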