[JENKINS-55618] When using spot instances, requests 3-4 nodes when one is needed

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Blocker
    • Component: ec2-plugin

      When a worker node needs to be requested, more than one request can end up being made.

      The plugin requests one spot instance, then quickly checks whether it is up (a node needs a minute or so to come up when using spot instances), and fires off another request, then another, until one of the spot instances is finally up in time. That one becomes the Jenkins slave; the rest linger until the Spot Marketplace kills them.

      Seems to be 1.42 specific: I reverted to version 1.41 and it seems to work fine.

      This is how it looks in the Jenkins log file
      (note the very short delays between requests; only sir-bd78adpq makes it into the Jenkins slave list):

      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-dvfib1pn
      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:17:40 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.EC2Cloud$1 call
      WARNING: SlaveTemplate{ami='ami-XXXXX', labels=''}. Node terminated is neither pending, neither running, its {2}. Terminate provisioning
      Jan 15, 2019 2:17:48 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting to provision slave needed by excess workload of 1 units
      Jan 15, 2019 2:17:49 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Launching ami-XXXXX for template current_ami

      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-bieg943m
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:17:50 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.EC2Cloud$1 call
      WARNING: SlaveTemplate{ami='ami-XXXXX', labels=''}. Node terminated is neither pending, neither running, its {2}. Terminate provisioning
      Jan 15, 2019 2:17:58 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting to provision slave needed by excess workload of 1 units
      Jan 15, 2019 2:17:59 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Launching ami-XXXXX for template current_ami

      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-5mtr9erp
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:18:00 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.EC2Cloud$1 call
      WARNING: SlaveTemplate{ami='ami-XXXXX', labels=''}. Node terminated is neither pending, neither running, its {2}. Terminate provisioning
      Jan 15, 2019 2:18:08 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting to provision slave needed by excess workload of 1 units
      Jan 15, 2019 2:18:09 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Launching ami-XXXXX for template current_ami
      Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-bd78adpq
      Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:18:10 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:18:21 PM hudson.plugins.ec2.EC2Cloud$1 call
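
The "really short delays" in the log above can be measured rather than eyeballed. The following is a minimal sketch, not part of the plugin: it assumes the log uses the two-line layout shown above (a timestamp header line followed by a message line), and the function names `spot_requests` and `gaps_in_seconds` are made up for this illustration. The sample text contains only the `provisionSpot` lines copied from the log.

```python
import re
from datetime import datetime

# Header line, e.g. "Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.SlaveTemplate provisionSpot"
HEADER = re.compile(r"^\s*(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")
# Message line that carries the spot request id
SPOT_ID = re.compile(r"Spot instance id in provision: (sir-\w+)")


def spot_requests(log_text):
    """Return (timestamp, spot_request_id) pairs in log order."""
    requests = []
    current_ts = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Remember the timestamp; the message follows on a later line.
            current_ts = datetime.strptime(header.group(1), "%b %d, %Y %I:%M:%S %p")
            continue
        found = SPOT_ID.search(line)
        if found and current_ts is not None:
            requests.append((current_ts, found.group(1)))
    return requests


def gaps_in_seconds(requests):
    """Seconds between consecutive spot requests."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(requests, requests[1:])]


SAMPLE = """\
Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-dvfib1pn
Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-bieg943m
Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-5mtr9erp
Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-bd78adpq
"""

if __name__ == "__main__":
    reqs = spot_requests(SAMPLE)
    print([request_id for _, request_id in reqs])
    print(gaps_in_seconds(reqs))
```

On this excerpt it finds four spot requests spaced 10 seconds apart, which is well under the minute or so a spot node needs to come up.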


          Vladislav Naumov created issue -
          Sylvain LEBEAUPIN made changes -
          Attachment New: jenkins.ec2.142.txt [ 45741 ]

          Sylvain LEBEAUPIN added a comment -

          I have exactly the same issue with version 1.42.

          In my settings, I configured only 1 executor (advanced section) for the spot instance.

          It worked fine with the previous version, 1.40.1.

          Since the update to 1.42:

          • I see only 1 slave (good);
          • several spot instances are started (~4-5, argh);
          • the Jenkins logs show a provisioning attempt every 10 s (jenkins.ec2.142.txt).

          Alistair Gilbert added a comment - We are also facing this issue.

          Harry McLaughlin added a comment - Also facing the same issue; this is quite a problem, as the entire benefit of spot instances is reduced cost for the user.
          Rinat Khairullin made changes -
          Priority Original: Major [ 3 ] New: Blocker [ 1 ]

          Rinat Khairullin added a comment - We are experiencing the same issue and are totally unable to use spot instances, as the plugin creates a huge number of instances, limited only by the cap value.

          Vladislav Naumov added a comment - Seems to be fixed in 1.43

          Shubham Mishra added a comment - vnaum, I used version 1.43 but I am still getting the same issue. Does anyone have any clue how to resolve this? Please help.

          FABRIZIO MANFREDI added a comment -

          The existing ec2-plugin algorithm for raising nodes is quite reactive and does not wait for a node to come online. For historical reasons it tries to follow the load peak as closely as possible (helped by the fact that Linux nodes are billed per second).

          To fix your problem we need to implement a new algorithm, or an option to wait for nodes to come online. That will not be possible before 1.46/7; we don't have enough capacity for it at the moment.

          One question: how long does your node take to be ready for use?
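
The difference between the reactive behaviour described in this comment and an "option to wait for online nodes" can be illustrated with a toy model. This is only a sketch under stated assumptions, not the plugin's code: `simulate`, `boot_seconds`, `poll_seconds` and `count_pending` are hypothetical names, the 10-second poll interval is taken from the log timestamps in this issue, and the 35-second boot time is an arbitrary illustrative value.

```python
def simulate(boot_seconds, poll_seconds, count_pending):
    """Return how many spot requests are made before one node is online.

    Toy model: a single queued build needs a single node. Every
    poll_seconds the provisioner evaluates the excess workload; a request
    placed at time t yields an online node at t + boot_seconds.
    """
    request_times = []
    now = 0
    while True:
        # Stop as soon as any requested node has finished booting.
        if any(now - t >= boot_seconds for t in request_times):
            return len(request_times)
        pending = len(request_times)
        # Reactive strategy: pending requests are ignored, so the workload
        # still looks unserved. Pending-aware strategy: requests that are
        # still booting count as capacity already on its way.
        excess = 1 - (pending if count_pending else 0)
        if excess > 0:
            request_times.append(now)
        now += poll_seconds


if __name__ == "__main__":
    # Assumed numbers: 35 s boot time, 10 s between provisioning checks.
    print(simulate(boot_seconds=35, poll_seconds=10, count_pending=False))
    print(simulate(boot_seconds=35, poll_seconds=10, count_pending=True))
```

With these assumed numbers the reactive strategy places four requests before the first node is usable, while the pending-aware strategy places one. The longer the boot time relative to the poll interval, the more surplus requests the reactive strategy makes, which is consistent with the reports above that only the instance cap limits it.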

            thoulen FABRIZIO MANFREDI
            vnaum Vladislav Naumov
            Votes: 13
            Watchers: 19