
global instance cap not respected for spot instances

      When the build queue piles up, new instances are launched, apparently one per waiting job. I've attached a screenshot of my configuration. The instance cap per AMI doesn't have any effect either.

      I believe this may be related to spot instances not coming up with the Jenkins slave tags, and so not being counted correctly. Though the global cap should apply to the overall number of running instances, no?

          [JENKINS-29851] global instance cap not respected for spot instances

          Ryan Aslett added a comment -

          This keeps biting us. We had a global instance cap of 30 and a per-cloud cap of 30, as well as cc28xl caps at AWS. It turns out that our AWS caps applied only to on-demand instances, and so did the Jenkins caps, i.e. we had 90 machines running at once.

          Archie Brentano added a comment -

          I think this is major, since it causes money to be spent.

          Ryan Aslett added a comment - - edited

          I've narrowed this down to what I think is the problem in countCurrentEC2Slaves:

          First, it only loops through the reservations from describeInstances: https://github.com/jenkinsci/ec2-plugin/blob/master/src/main/java/hudson/plugins/ec2/EC2Cloud.java#L232-232

          This will only count spot requests that have been fulfilled, not pending spot requests as they have not yet become an instance.

          What is probably needed is a call to describeSpotInstanceRequests() to get the spot instances, and additionally increment n if there are pending spot requests. Either that or there needs to be an entirely separate set of checks for the maximum number of spot requests.

          We have a cron that can sometimes send as many as 100 jobs at once, and we end up with spikes of 100 machines provisioned even though we set our limit to 20. It would be ideal if we could use spot instances + limits on the number running + pending.
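
          The counting change proposed in this comment can be sketched roughly as follows. This is a minimal, self-contained illustration, not the plugin's actual code: the class SpotAwareCount, the method countTowardCap, and the simplified SpotState enum are hypothetical names, and the sketch assumes that fulfilled spot requests already show up as instances in the describeInstances result while open (pending) requests do not.

```java
import java.util.List;

/** Hypothetical stand-in for the counting logic; not the actual EC2Cloud code. */
final class SpotAwareCount {

    /** Simplified spot request states. OPEN means pending: no instance exists yet. */
    enum SpotState { OPEN, ACTIVE, CANCELLED, CLOSED, FAILED }

    /**
     * Counts everything that should be held against the instance cap.
     *
     * @param instancesFromReservations instances already visible via describeInstances
     *                                  (assumed to include fulfilled spot requests)
     * @param spotRequests              states of the requests that a call such as
     *                                  describeSpotInstanceRequests would return
     */
    static int countTowardCap(int instancesFromReservations, List<SpotState> spotRequests) {
        int n = instancesFromReservations;
        for (SpotState state : spotRequests) {
            // Only pending requests are added: ACTIVE ones are assumed to be
            // counted already as instances, and the rest will never start.
            if (state == SpotState.OPEN) {
                n++;
            }
        }
        return n;
    }
}
```

          With three visible instances and spot requests in the states OPEN, ACTIVE, OPEN, CANCELLED, this count is 5 rather than 3, so a burst of pending requests would be held against the cap before any of them is fulfilled.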

          James Judd added a comment -

          We have run into this as well. Probably won't have time to look into this before January.

          Francis Upton added a comment -

          Should be resolved in 1.30.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Francis Upton IV
          Path:
          src/main/java/hudson/plugins/ec2/EC2Cloud.java
          src/main/java/hudson/plugins/ec2/EC2SpotSlave.java
          src/main/java/hudson/plugins/ec2/SlaveTemplate.java
          http://jenkins-ci.org/commit/ec2-plugin/f85b37b8f3ace611191c8b3507d03dcbda999a55
          Log:
          JENKINS-29851 Global instance cap not calculated for spot instances correctly
          JENKINS-32397 Spot instance AMI counting has problems
          JENKINS-32398 Remove spot instance nodes when requests are canceled

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Francis Upton IV
          Path:
          src/main/java/hudson/plugins/ec2/EC2Cloud.java
          http://jenkins-ci.org/commit/ec2-plugin/848fec3b9a473386173a7faabae25ef911d54b36
          Log:
          JENKINS-29851 Global instance cap not calculated for spot instances correctly (fixed NPE)

          A B added a comment -

          Our instance cap is set to 1, but seems to be completely ignored. Not uncommon for 10 instances to be launched simultaneously. Is there anything required in our EC2 environment / AMI in order for the instance cap to be respected?

          A B added a comment -

          Our workaround is to manually launch instances before sending the jobs to Jenkins. This is a luxury we have, based on our configuration, that others may not have. But I wanted to share it anyway.

          If the instances are already idle (or in use), then the instance cap is respected and additional slaves are not started. This suggests to me that, when the instance cap is checked, instances that are still booting are being ignored.
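
          The behaviour suspected in this comment can be illustrated with a small sketch of a cap check that does count booting instances. It is only an illustration under stated assumptions: CapCheck and allowedLaunches are hypothetical names, not part of the ec2-plugin, and the sketch assumes the caller can tell how many instances are running and how many are still booting.

```java
/** Hypothetical cap check; not taken from the ec2-plugin. */
final class CapCheck {

    /**
     * Returns how many new instances may be launched without exceeding the cap.
     *
     * @param cap        configured instance cap
     * @param running    instances that are idle or in use
     * @param booting    instances that were launched but are not yet online
     * @param queuedJobs jobs currently waiting for an executor
     */
    static int allowedLaunches(int cap, int running, int booting, int queuedJobs) {
        // Booting instances consume headroom too; leaving them out is what would
        // let every queued job trigger its own launch.
        int headroom = Math.max(0, cap - (running + booting));
        return Math.min(headroom, Math.max(0, queuedJobs));
    }
}
```

          With a cap of 1 and 10 queued jobs, the first check allows one launch. Once that instance is booting, the next check allows none. If booting instances were left out of the sum, each check would still see headroom of 1, which would match the 10 simultaneous launches described above.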

            francisu Francis Upton
            arienkock Arien Kock