After some detailed investigation, I am quite sure that the docker-plugin miscounts the running slaves when multiple templates are configured. Here is how to reproduce it:
- Setup a Jenkins with docker-plugin
- Configure a cloud of type 'docker' and set the Capacity limit to a high value (say, 50)
- Configure two templates: A and B.
- Set template A to accept label A
- Set template B to accept label B
- Use the same image for both templates (any simple image will do)
- Set the instance limit of template A to 5
- Set the instance limit of template B to 2
- Create 10 jobs assigned to label A, each running "sleep 60"
- Create 5 jobs assigned to label B, each running "sleep 60"
- Start all the jobs with label A at once (a small Groovy script can do this)
- Wait 10 seconds
- Start all the jobs with label B at once (again, a small Groovy script can do this)
What you will observe is the following:
- The jobs of label A will request new slaves up to the instance limit.
- The jobs with label B will remain in the queue. No new slaves are created for them.
- The jobs with label B will be executed only once all the jobs with label A have completed (and their slaves have been taken offline again).
In the system log you will find the following messages:
Asked to provision 23 slave(s) for: labelB
Jul 25, 2016 4:46:52 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'image', for label: 'labelB', in cloud: 'docker'
Jul 25, 2016 4:46:52 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud addProvisionedSlave
Not Provisioning 'labelB'. Instance limit of '2' reached on server 'docker'
Please note that when this message was logged, not a single slave of the second template was up and running (however, 5 slaves of the first template were up).
Now repeat the experiment with the instance limit of template B set to 6 and the same load. You will observe that exactly one slave of template B is created.
Conclusion: the instance limits of two different templates are not counted separately, although the configuration UI suggests that they are.
Impact: although capacity is available on the Docker server, the different loads are not executed in parallel.
PS: I also tried configuring a second cloud provider (of type docker), thus separating the templates into two sections. However, this did not change the situation: apparently, the "instances used" are counted per URL and not per template...
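
Both observations (no slave for B with a limit of 2, exactly one with a limit of 6, while 5 slaves of A are running) fit a simple model in which every slave running on the Docker host counts against each template's instance limit. Here is a minimal Java sketch of that hypothesis. It is a toy model only, not the docker-plugin's actual code, and the class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of two ways to count instances; NOT the docker-plugin's real implementation. */
final class InstanceLimitModel {

    /** Expected behaviour: only the template's own running slaves count against its limit. */
    static int allowedPerTemplate(Map<String, Integer> running, String template,
                                  int limit, int requested) {
        int own = running.getOrDefault(template, 0);
        return Math.max(0, Math.min(requested, limit - own));
    }

    /** Suspected behaviour: all slaves running on the host count against the template's limit. */
    static int allowedSharedCount(Map<String, Integer> running, int limit, int requested) {
        int total = 0;
        for (int n : running.values()) {
            total += n;
        }
        return Math.max(0, Math.min(requested, limit - total));
    }

    public static void main(String[] args) {
        // State at the time the label B jobs are started: 5 slaves of template A, none of B.
        Map<String, Integer> running = new HashMap<>();
        running.put("A", 5);
        running.put("B", 0);

        int requested = 5; // 5 queued jobs with label B

        for (int limitB : new int[] {2, 6}) {
            System.out.println("limit B = " + limitB
                    + ": per-template count allows "
                    + allowedPerTemplate(running, "B", limitB, requested)
                    + ", shared count allows "
                    + allowedSharedCount(running, limitB, requested));
        }
    }
}
```

With the shared count the model allows 0 new slaves for a limit of 2 and 1 new slave for a limit of 6, which matches what I observed. With a per-template count it would allow 2 and 5 respectively.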
Thanks for checking!