
When using spot instances, requests 3-4 nodes when one is needed

    • New Feature
    • Resolution: Unresolved
    • Blocker
    • ec2-plugin

      When it is time to request a worker node, more than one can be requested.

      The plugin requests one node, then rapidly checks whether it is up (a node needs a minute or so to come up when using spot instances), and fires off another request, then another, until one of the spot instances finally comes up in time. That one becomes the Jenkins slave; the rest linger until the Spot Marketplace kills them.

      This seems to be specific to 1.42: I reverted to version 1.41 and it seems to work fine.

      This is how it looks in the Jenkins log file
      (notice the really short delays; only sir-bd78adpq makes it into the Jenkins slave list):

      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-dvfib1pn
      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:17:40 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.EC2Cloud$1 call
      WARNING: SlaveTemplate{ami='ami-XXXXX', labels=''}. Node terminated is neither pending, neither running, its {2}. Terminate provisioning
      Jan 15, 2019 2:17:48 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting to provision slave needed by excess workload of 1 units
      Jan 15, 2019 2:17:49 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Launching ami-XXXXX for template current_ami
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-bieg943m
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:17:50 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.EC2Cloud$1 call
      WARNING: SlaveTemplate{ami='ami-XXXXX', labels=''}. Node terminated is neither pending, neither running, its {2}. Terminate provisioning
      Jan 15, 2019 2:17:58 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting to provision slave needed by excess workload of 1 units
      Jan 15, 2019 2:17:59 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Launching ami-XXXXX for template current_ami
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-5mtr9erp
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:18:00 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.EC2Cloud$1 call
      WARNING: SlaveTemplate{ami='ami-XXXXX', labels=''}. Node terminated is neither pending, neither running, its {2}. Terminate provisioning
      Jan 15, 2019 2:18:08 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting to provision slave needed by excess workload of 1 units
      Jan 15, 2019 2:18:09 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Launching ami-XXXXX for template current_ami
      Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
      INFO: Spot instance id in provision: sir-bd78adpq
      Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: SlaveTemplate{ami='ami-XXXXX', labels=''}. Attempting provision finished, excess workload: 0
      Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.EC2Cloud provision
      INFO: We have now 1 computers, waiting for 1 more
      Jan 15, 2019 2:18:10 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
      INFO: Started provisioning current_ami (ami-XXXXX) from ec2-ec2 cloud with 1 executors. Remaining excess workload: 0
      Jan 15, 2019 2:18:21 PM hudson.plugins.ec2.EC2Cloud$1 call
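
For anyone checking whether their own Jenkins log shows the same pattern, here is a minimal Python sketch that pulls out every spot request id and the time between consecutive requests. It is not part of the plugin. The function name `spot_requests` and the two regular expressions are my own, and the sample is trimmed from the log above. It assumes the timestamp format shown in this issue and an English locale for month names and AM/PM.

```python
import re
from datetime import datetime

# Patterns are assumptions based on the log lines quoted in this issue.
SPOT_ID = re.compile(r"Spot instance id in provision: (sir-\w+)")
TIMESTAMP = re.compile(r"^\s*(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)")


def spot_requests(log_text):
    """Return (timestamp, request_id) pairs, one per spot request in the log."""
    found = []
    last_seen = None  # most recent timestamp line above the current line
    for line in log_text.splitlines():
        stamp = TIMESTAMP.match(line)
        if stamp:
            last_seen = datetime.strptime(stamp.group(1), "%b %d, %Y %I:%M:%S %p")
        request = SPOT_ID.search(line)
        if request:
            found.append((last_seen, request.group(1)))
    return found


# Trimmed sample: only the provisionSpot lines from the log above.
SAMPLE = """\
Jan 15, 2019 2:17:40 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-dvfib1pn
Jan 15, 2019 2:17:50 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-bieg943m
Jan 15, 2019 2:18:00 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-5mtr9erp
Jan 15, 2019 2:18:10 PM hudson.plugins.ec2.SlaveTemplate provisionSpot
INFO: Spot instance id in provision: sir-bd78adpq
"""

requests = spot_requests(SAMPLE)
# Seconds between each request and the next one.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(requests, requests[1:])]

for when, request_id in requests:
    print(when, request_id)
print("seconds between requests:", gaps)
```

Several request ids only a few seconds apart for a single queued build is the symptom described in this issue.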

          [JENKINS-55618] When using spot instances, requests 3-4 nodes when one is needed

          Rinat Khairullin added a comment - We are experiencing the same issue and are totally unable to use spot instances, because the plugin creates a huge number of instances, limited only by the cap value.

          Vladislav Naumov added a comment - Seems to be fixed in 1.43

          Shubham Mishra added a comment - vnaum, I used version 1.43 but I am still getting the same issue. Does anyone have any clue how to resolve this issue? Please help.

          FABRIZIO MANFREDI added a comment - The existing ec2-plugin algorithm for raising nodes is quite reactive and does not wait for the node to come online. This is for historical reasons: it follows the load peak as closely as possible (and Linux nodes are billed by the second).

          To fix your problem we need to implement a new algorithm, or an option to wait for nodes to come online. That will not be possible before 1.46/7, as we don't have enough capacity for it at the moment.

          One question: how long does your node take to be ready for use?
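
To illustrate the "wait for online nodes" idea mentioned above, here is a minimal Python sketch of a provisioning decision that counts nodes that have been requested but are not yet online. This is not the ec2-plugin's code and not a proposed patch. The function `nodes_to_request` and its parameters are hypothetical names used only for illustration.

```python
def nodes_to_request(excess_workload, pending_nodes, executors_per_node=1):
    """Return how many new nodes to request.

    excess_workload    -- executors still needed by queued builds
    pending_nodes      -- nodes already requested but not yet online (hypothetical input)
    executors_per_node -- executors each new node will provide
    """
    if executors_per_node < 1:
        raise ValueError("executors_per_node must be at least 1")
    # Capacity that is already on its way and should not be requested again.
    expected_capacity = pending_nodes * executors_per_node
    shortfall = excess_workload - expected_capacity
    if shortfall <= 0:
        return 0
    # Ceiling division: enough nodes to cover the remaining shortfall.
    return -(-shortfall // executors_per_node)


# One queued build, nothing requested yet: request one node.
print(nodes_to_request(excess_workload=1, pending_nodes=0))
# Same build ten seconds later, spot request still pending: request nothing.
print(nodes_to_request(excess_workload=1, pending_nodes=1))
```

Under this assumption, a spot request that is still being fulfilled counts as capacity, so repeated provisioning rounds for the same queued build would not issue further requests.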

          Shubham Mishra added a comment - My node is already in the active state, but I am not sure why the slave spot instances are not created. Here is the error log:

          SlaveTemplate{ami='ami-XXXXXXX', labels='SPOT_TEST1'}. Attempting to provision slave needed by excess workload of 1 units
          May 28, 2019 4:10:40 PM INFO hudson.plugins.ec2.SlaveTemplate provisionSpot
          Launching ami-XXXXXXX for template ci-tools_test_SPOT_TEST
          May 28, 2019 4:10:41 PM INFO hudson.plugins.ec2.SlaveTemplate provisionSpot
          Spot instance id in provision: sir-4cqg8y5j
          May 28, 2019 4:10:41 PM INFO hudson.plugins.ec2.EC2Cloud provision
          SlaveTemplate{ami='ami-XXXXXXX', labels='SPOT_TEST1'}. Attempting provision finished, excess workload: 0
          May 28, 2019 4:10:41 PM INFO hudson.plugins.ec2.EC2Cloud provision
          We have now 13 computers, waiting for 1 more
          May 28, 2019 4:10:41 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
          Started provisioning EC2 (AWS) - ci-tools_test_SPOT_TEST from ec2-AWS with 1 executors. Remaining excess workload: 0
          May 28, 2019 4:10:47 PM WARNING hudson.plugins.ec2.EC2Cloud$1 call
          SlaveTemplate{ami='ami-XXXXXXX', labels='SPOT_TEST1'}. Node shutting-down is neither pending, neither running, its {2}. Terminate provisioning

          Vladislav Naumov added a comment -

          > how long takes your node to be ready for use ?

          Up to a minute with spot instances. It used to be much worse a few years ago, when it could take up to 5 minutes.

          Most of this time is spent in the Spot Marketplace: an on-demand instance comes up 2-3 times faster from the same AMI.

          Shubham Mishra added a comment - Can someone help me with this issue? I am stuck and not able to resolve it.

          vnaum, could you please help me to resolve this issue?

          Shubham Mishra added a comment - Hi sjvl37, I am also getting the same error in my error logs while using ec2 plugin version 1.43.

          If I change the version to 1.40, would the spot instances work fine?

          Please suggest.

          Vladislav Naumov added a comment -

          > If I change the version to 1.40,would the spot instance work fine?

          Try rolling back to 1.41 first.
          It worked fine for me.
          But then again, 1.43 works for me, too.

          Shubham Mishra added a comment - Thanks vnaum, I will check and test it.

          BTW, do we need to downgrade the Jenkins version as well? As of now I am using Jenkins version 2.138 and ec2 plugin version 1.43.

          Please suggest.

            thoulen FABRIZIO MANFREDI
            vnaum Vladislav Naumov
            Votes: 13
            Watchers: 19