[JENKINS-36919] Docker Instance Capacity counted across templates

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: docker-plugin
    • Labels: None
    • Environment:
      SUSE Linux Enterprise 12 SP1
      jdk1.8.0_60
      Jenkins ver. 1.635
      apache tomcat 7.0.65
      docker-plugin 0.16.0

      Hi folks,

      After some detailed investigation, I am quite sure that the docker-plugin miscounts the running slaves when multiple templates are configured. Here is how you can reproduce it:

      1. Set up a Jenkins with docker-plugin
      2. Configure a cloud of type 'docker' and set the Capacity limit to a high value (let's say 50 or so)
      3. Configure two templates: A and B.
      4. Set template A to accept label A
      5. Set template B to accept label B
      6. Use the same image for both templates (any simple image will do)
      7. Set the instance limit of template A to 5
      8. Set the instance limit of template B to 2
      9. Create 10 jobs assigned to label A, each running "sleep 60"
      10. Create 5 jobs assigned to label B, each running "sleep 60"
      11. Start all the jobs of label A at once (you may run a small groovy script for that)
      12. Wait 10s
      13. Start all the jobs of label B at once (you may run a small groovy script for that)

      What you will observe is the following:

      • The jobs with label A will request new slaves up to the instance limit.
      • The jobs with label B will remain in the queue. No new slaves are created.
      • The jobs with label B will be executed only once all the jobs with label A have completed (and their slaves have been taken offline again).

      If you look into the system log, you will find the following messages:

      Asked to provision 23 slave(s) for: labelB
      Jul 25, 2016 4:46:52 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
      Will provision 'image', for label: 'labelB', in cloud: 'docker'
      Jul 25, 2016 4:46:52 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud addProvisionedSlave
      Not Provisioning 'labelB'. Instance limit of '2' reached on server 'docker'

      Please note that at the time of that message, not a single slave of the second template was up and running (however, 5 of the first template were up).

      Repeat the experiment with the instance limit of template B set to 6 and the same kind of load. You will observe that exactly one slave of template B is created.

      In short: the instance limits of two different templates are not counted separately (although the configuration UI suggests that they are).

      Impact: Though capacity is available on the docker server, the different loads are not executed in parallel.
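
The two observations above (no slave with a limit of 2, exactly one slave with a limit of 6) are what you would expect if the limit is compared against a count of all running containers of the same image. The following is only a toy model of that arithmetic, not the plugin's code; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the observed behaviour; not the plugin's actual code. */
class InstanceCapModel {

    /** Counts running containers by image name only, ignoring which template started them. */
    static int countByImage(List<String> runningImages, String image) {
        int count = 0;
        for (String running : runningImages) {
            if (running.equals(image)) {
                count++;
            }
        }
        return count;
    }

    /** Slots left for a template if its instance limit is checked against the per-image count. */
    static int remainingSlots(List<String> runningImages, String image, int instanceLimit) {
        return Math.max(0, instanceLimit - countByImage(runningImages, image));
    }

    public static void main(String[] args) {
        // Template A (limit 5) has already started five containers of "image".
        List<String> running = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            running.add("image");
        }
        // Template B uses the same image, so it inherits A's count.
        System.out.println(remainingSlots(running, "image", 2)); // limit 2: prints 0
        System.out.println(remainingSlots(running, "image", 6)); // limit 6: prints 1
    }
}
```

With five containers of template A running, template B gets 0 slots at a limit of 2 and 1 slot at a limit of 6, which matches what the system log and the slave list showed.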

      PS: I also tried to configure a second cloud provider (of type docker), thus separating the templates into two sections. However, this did not change the situation either: Apparently, the "instances used" are counted per URL and not per template...

      Thanks for checking!


          Nico Schmoigl added a comment -

          It might be helpful to set the strategy to "Docker Cloud Retention Strategy" with an idle timeout of 1 min. That might make the effect easier to see (for example, you can tell the two types of templates apart by setting the "# of executors" to 2 on one of them).


          Nico Schmoigl added a comment -

          Setting up a second Jenkins server and associating it with the same docker host as the first one reveals another interesting aspect of this issue.
          Scenario:

          • Configure the second instance to make use of the same image as the first Jenkins server (label A). Set the Instance Capacity to 5.
          • On the first Jenkins, start the jobs with label A, creating 5 instances of the image on the docker server (instance capacity set accordingly).
          • On the second Jenkins, start the dummy jobs for label A. Observe that no slave is started!
          • Wait until the jobs on the first server are finished; observe that the second Jenkins server may now start instances.

          This means that the instance capacity is not just counted "across templates", but is bound only to the image name, even across Jenkins servers. (Apparently the docker server is queried for how many instances of a certain image are running, and the result is not cross-checked against the list of instances the local Jenkins server has started.)


          Nico Schmoigl added a comment -

          Here's an attempted workaround:

          • create a second tag (with another name) for the same image (i.e. two tags pointing to the same image hash)
          • use that "second name" for the same image in the template of the docker-plugin configuration

          Instance capacity is then, at least, counted per template (but it can still be cannibalized by other clients running containers from the same images).


          Nico Schmoigl added a comment (edited) -

          Here's a variant of the same issue mentioned above, which should be simpler to reproduce:

          • Configure a Cloud Docker host with a Container cap high enough that it does not come into play (e.g. 10 or so).
          • Configure a template with an image that Jenkins can start, instance limit: 1, assign a label to it (for example: "docker")
          • Create a job with some dummy build step, associate it to the label.
          • Build the job and observe that it builds fine.
          • Go to your command line and start a container of the same image. Keep the container running.
          • Build the same job again. Observe that no container will be provisioned, but the following message can be read in the system log:

          Jul 30, 2016 9:10:30 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
          Will provision 'jenkins-1', for label: 'docker', in cloud: 'dummy'
          Jul 30, 2016 9:10:30 PM INFO com.nirima.jenkins.plugins.docker.DockerCloud addProvisionedSlave
          Not Provisioning 'jenkins-1'. Instance limit of '1' reached on server 'dummy'

          (In this case, the image in the configuration was "jenkins-1", the label was "docker" and the name of the cloud I configured was "dummy".)
          Apparently, the already running container is assumed to have been started by Jenkins, which is not the case. Thus, all running containers based on a certain image count towards the "instance limit".

          Why is this a problem in my view? If two Jenkins servers share the same Docker host, they cannot use the same image, because execution is then no longer predictable. Thus, "image sharing" is not possible (we may now start to argue whether this is a bug or a feature – in my case the effect is devastating, as I will be using a Docker Swarm cluster with up to 10 Jenkins servers attached...)


          Nico Schmoigl added a comment -

          The root cause of it all seems to be in com.nirima.jenkins.plugins.docker.DockerCloud.countCurrentDockerSlaves()


          Nico Schmoigl added a comment -

          For a proposed fix for this issue, see also https://github.com/jenkinsci/docker-plugin/pull/409


          Nicolas De Loof added a comment -

          countCurrentDockerSlaves identifies containers by the image they are running, which was the only option when docker was at 0.x and this plugin was created. In 2017 this is much better implemented using labels.

          Nicolas De Loof added a comment -

          Unfortunately, the docker-java implementation of label filters (com.github.dockerjava.core.util.FiltersEncoder) relies on JacksonJaxbJsonProvider, and here comes dependency hell :'(

          Nico Schmoigl added a comment -

          What about the (ugly) idea of fetching all containers and then filtering them manually by their labels? Or would retrieving all containers each time be too costly...?

          NB: PR #409 was closed (and rejected) by you some time back... Should we revisit the suggestion there once more?
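
A minimal sketch of the "fetch everything, filter by hand" idea, to make the suggestion concrete. It deliberately avoids docker-java: `ContainerInfo` is a hypothetical stand-in for whatever a "list all containers" call returns, and the label key `example.jenkins-instance` is invented for this example. It says nothing about how costly the list call itself would be.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch of client-side label filtering; all names here are hypothetical. */
class ManualLabelFilter {

    /** Simplified stand-in for one entry of a "list all containers" result. */
    static final class ContainerInfo {
        final String id;
        final String image;
        final Map<String, String> labels;

        ContainerInfo(String id, String image, Map<String, String> labels) {
            this.id = id;
            this.image = image;
            this.labels = labels;
        }
    }

    /** Keeps only the containers whose label {@code key} equals {@code value}. */
    static List<ContainerInfo> filterByLabel(List<ContainerInfo> all, String key, String value) {
        List<ContainerInfo> matches = new ArrayList<>();
        for (ContainerInfo container : all) {
            if (value.equals(container.labels.get(key))) {
                matches.add(container);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String key = "example.jenkins-instance"; // hypothetical label key
        List<ContainerInfo> all = Arrays.asList(
            new ContainerInfo("c1", "jenkins-1", Collections.singletonMap(key, "jenkins-one")),
            new ContainerInfo("c2", "jenkins-1", Collections.singletonMap(key, "jenkins-two")),
            // Started by hand on the command line: same image, no labels.
            new ContainerInfo("c3", "jenkins-1", Collections.<String, String>emptyMap()));

        // Only c1 belongs to "jenkins-one", although all three share the image.
        System.out.println(filterByLabel(all, key, "jenkins-one").size()); // prints 1
    }
}
```

The filter itself is a single pass over the list, so any real cost would come from retrieving the full container list on every provisioning attempt.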

           


          pjdarton added a comment -

          FYI: I coded a workaround to the FiltersEncoder issue mentioned above, albeit by using a local definition of that code in the plugin. This is basically a bug in docker-java, and it's not limited to just that class, but I'm not convinced it's worthwhile trying to fix docker-java.

          I spent a bit of time looking into this myself (but for other reasons).
          The cause of the problem is that the plugin is using the template's image name as a means of identifying the template, but the image name is not unique.
          What we need is to give each template its own unique name.
          We could then set a label on each container we start specifying the "template name" so that we can recognize them later.
          Lastly, we change the `countCurrentDockerSlaves()` code to use the "template name" instead of "image name".

          Note: Ideally, we'd also name the docker slaves (both in Jenkins and in docker) after that name too, instead of calling them all "docker-.............." - I have multiple docker hosts and templates that I'd like to differentiate by name, which is why I started looking into this.
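
The three steps above (unique template name, label on each container, count by template name) could look roughly like the sketch below. This is an illustration under assumptions, not plugin code: the label key `example.jenkins-template` and all class and method names are hypothetical, and containers are reduced to their label maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch of per-template counting via a container label; names are hypothetical. */
class TemplateNameCap {

    /** Hypothetical label key under which the template name is stored. */
    static final String TEMPLATE_LABEL = "example.jenkins-template";

    /** Labels to attach to a container when it is started for the given template. */
    static Map<String, String> labelsFor(String templateName) {
        return Collections.singletonMap(TEMPLATE_LABEL, templateName);
    }

    /** Counts running containers that carry the given template name, regardless of image. */
    static int countForTemplate(List<Map<String, String>> runningLabels, String templateName) {
        int count = 0;
        for (Map<String, String> labels : runningLabels) {
            if (templateName.equals(labels.get(TEMPLATE_LABEL))) {
                count++;
            }
        }
        return count;
    }

    /** True if the template is still below its own instance limit. */
    static boolean mayProvision(List<Map<String, String>> runningLabels,
                                String templateName, int instanceLimit) {
        return countForTemplate(runningLabels, templateName) < instanceLimit;
    }

    public static void main(String[] args) {
        List<Map<String, String>> running = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            running.add(labelsFor("template-A")); // five slaves of template A
        }
        running.add(Collections.<String, String>emptyMap()); // container started by hand

        System.out.println(mayProvision(running, "template-B", 2)); // prints true
        System.out.println(mayProvision(running, "template-A", 5)); // prints false
    }
}
```

Because the count no longer looks at the image, templates A and B can share an image without sharing a limit, and containers started outside Jenkins are ignored.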


            Assignee: Unassigned
            Reporter: Nico Schmoigl (eagle_rainbow)
            Votes: 3
            Watchers: 6