JENKINS-23787

EC2-plugin not spooling up stopped nodes - "still in the queue ... all nodes of label ... are offline"

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: ec2-plugin
    • Environment: Jenkins 1.572, EC2 plugin 1.21, Node Iterator API Plugin 1.5

      The Jenkins EC2 plugin no longer launches stopped nodes. Unfortunately I'm not sure exactly when it stopped working - I wasn't sure that was the issue until later, due to unrelated issues caused by too many nodes spawning and having to be killed.

      If I manually start a stopped EC2 node that a build is waiting on, using Manage Jenkins -> Manage Nodes, the build proceeds.

      Builds succeed when the EC2 plugin spawns a new node for the first time. It's only a problem if the node is stopped for idleness - the plugin doesn't seem to restart it.

      Builds get stuck with output like:

      Triggering bdr_linux » x64,debian7
      Triggering bdr_linux » x86,amazonlinux201209
      Triggering bdr_linux » x86,debian6
      Triggering bdr_linux » x64,amazonlinux201209
      Configuration bdr_linux » x86,amazonlinux201209 is still in the queue: Amazon Linux 2012.09 EBS 32-bit  (i-b848fbfa) is offline
      Configuration bdr_linux » x86,amazonlinux201209 is still in the queue: All nodes of label ‘amazonlinux201209&&x86’ are offline
      

      where there's at least one node with that label stopped, ready to start and use.

      There's no sign that any attempt is made to start the node.

          Magnar Sveen added a comment - - edited

          We had this same issue. Or the same symptoms at least. If your issue is with matrix build, then it's not the same. But I'll add this here in case someone else stumbles over the same problem:

          The latest stable release of Jenkins just looked at the instance AMI ID to determine if any slaves were running. Since our master had the same AMI as our slave, Jenkins figured we had reached our maximum amount of slaves (1).

          Having hit the maximum amount of slaves, it never tried provisioning any new slaves - and the EC2 plugin fires up stopped slaves as part of the provision call.

          Upgrading to a SNAPSHOT version of Jenkins solved the issue. It now adds a tag to all slaves, and uses that when counting.
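
The counting problem described in this comment can be sketched roughly as follows. This is only an illustration and not the plugin's actual code: the `Instance` record, the method names, the tag key, and the AMI IDs are all hypothetical. It models a master and no slaves, with an instance cap of 1, and compares counting by AMI ID alone against counting by AMI ID plus a marker tag.

```java
import java.util.List;
import java.util.Map;

public class SlaveCountSketch {
    // Hypothetical minimal model of an EC2 instance (not the AWS SDK type).
    record Instance(String id, String amiId, Map<String, String> tags) {}

    // Hypothetical tag key, chosen for illustration only.
    static final String SLAVE_TAG = "jenkins_slave_type";

    // Counts every instance started from the given AMI, slave or not.
    static long countByAmi(List<Instance> instances, String amiId) {
        return instances.stream()
                .filter(i -> i.amiId().equals(amiId))
                .count();
    }

    // Counts only instances from the given AMI that also carry the marker tag.
    static long countByAmiAndTag(List<Instance> instances, String amiId) {
        return instances.stream()
                .filter(i -> i.amiId().equals(amiId))
                .filter(i -> i.tags().containsKey(SLAVE_TAG))
                .count();
    }

    // Provisioning is attempted only while the count is below the cap.
    static boolean mayProvision(long current, int cap) {
        return current < cap;
    }

    public static void main(String[] args) {
        String ami = "ami-00000000"; // placeholder AMI ID
        List<Instance> instances = List.of(
                new Instance("i-master", ami, Map.of()),            // master, same AMI, untagged
                new Instance("i-other", "ami-11111111", Map.of())); // unrelated instance
        int cap = 1;
        System.out.println("AMI only:  may provision = "
                + mayProvision(countByAmi(instances, ami), cap));
        System.out.println("AMI + tag: may provision = "
                + mayProvision(countByAmiAndTag(instances, ami), cap));
    }
}
```

With counting by AMI alone, the master is mistaken for a slave, the cap appears to be reached, and the provision call (which is also what restarts stopped slaves) never runs. With the tag check, the master is ignored and provisioning may proceed.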

          Ebrahim Moshaya added a comment -

          I'm having the same issue with Jenkins 1.626 and ec2 plugin 1.29. None of the offline ec2 slaves are starting up when the matrix job starts. Instead I have to manually start them up. Does anyone have a solution for this?

          Francis Upton added a comment -

          I have just made a fix for this in the unreleased 1.30. If you can get the SNAPSHOT version of the ec2-plugin, this is hopefully fixed. Please give it a try and let me know.

          Ebrahim Moshaya added a comment -

          francisu I found the main issue was with using the ec2 plugin together with the Throttle Concurrent Builds plugin:

          https://wiki.jenkins-ci.org/display/JENKINS/Throttle+Concurrent+Builds+Plugin

          If I set a job to be the only job executing on a node with 5 executors, other triggered jobs wait in the queue for that job to finish instead of spinning up another instance that's in a stopped state.

          Francis Upton added a comment -

          Thanks for letting me know. I will close this one and reopen my duplicated report, as that has to do with hooking up to EC2 instances that are actually running (even though the corresponding Jenkins slave is offline).

          Ted Xiao added a comment - - edited

          I think the commit https://github.com/jenkinsci/ec2-plugin/commit/c9d69a5da8c9be094701d4c191ba7b1d06c200c9 breaks the plugin: users cannot launch multiple instances from the same AMI, because DescribeInstancesRequest now returns all instances instead of only stopped ones, and no instance can be launched if any is already running.

          The line below was removed:
          diFilters.add(new Filter("instance-state-name").withValues(InstanceStateName.Stopped.toString(), InstanceStateName.Stopping.toString()));
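
To illustrate what the removed filter changes, here is a minimal sketch that does not use the AWS SDK. The `Instance` record and the `describe` helper are hypothetical stand-ins for the real request. Only the state names (`stopped`, `stopping`, `running`) correspond to EC2 instance state names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StateFilterSketch {
    // Hypothetical minimal model of an EC2 instance (not the AWS SDK type).
    record Instance(String id, String amiId, String state) {}

    // Stand-in for a describe call: always filters by AMI, and filters by
    // state only when a non-empty set of state names is supplied.
    static List<Instance> describe(List<Instance> all, String amiId, Set<String> states) {
        return all.stream()
                .filter(i -> i.amiId().equals(amiId))
                .filter(i -> states.isEmpty() || states.contains(i.state()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String ami = "ami-00000000"; // placeholder AMI ID
        List<Instance> all = List.of(
                new Instance("i-1", ami, "running"),
                new Instance("i-2", ami, "stopped"));

        // With the state filter: only restartable instances come back.
        List<Instance> filtered = describe(all, ami, Set.of("stopped", "stopping"));
        // Without it: running instances are returned as well.
        List<Instance> unfiltered = describe(all, ami, Set.of());

        System.out.println("with state filter:    " + filtered.size());
        System.out.println("without state filter: " + unfiltered.size());
    }
}
```

If the caller treats every returned instance as a candidate to restart, or as one counting against the cap, the unfiltered result behaves differently from the filtered one as soon as any instance from that AMI is running.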

          Francis Upton added a comment -

          @Ted, I don't see how that line broke it. Note that I changed EC2Cloud to look up any running (or pending) instances and check that they are known. Only if they are known will they count against the existing instances. I will test it again though, to try multiple instances with the same AMI. The reason the filter was removed is that we want to see if there is a running instance that's not known, and if so, use that one.
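
The logic described in this comment might look something like the sketch below. It is an assumption-laden illustration, not the actual EC2Cloud code: the `Instance` record, `countAgainstCap`, and `findUnknownActive` are hypothetical names, and "known" is modelled as a plain set of instance IDs that Jenkins already tracks.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class KnownInstanceSketch {
    // Hypothetical minimal model of an EC2 instance (not the AWS SDK type).
    record Instance(String id, String state) {}

    // An instance is treated as active when it is running or pending.
    static boolean isActive(Instance i) {
        return i.state().equals("running") || i.state().equals("pending");
    }

    // Only active instances that Jenkins already knows count against the cap.
    static long countAgainstCap(List<Instance> instances, Set<String> knownIds) {
        return instances.stream()
                .filter(KnownInstanceSketch::isActive)
                .filter(i -> knownIds.contains(i.id()))
                .count();
    }

    // An active instance that Jenkins does not know is a candidate to attach to
    // instead of launching a new one.
    static Optional<Instance> findUnknownActive(List<Instance> instances, Set<String> knownIds) {
        return instances.stream()
                .filter(KnownInstanceSketch::isActive)
                .filter(i -> !knownIds.contains(i.id()))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Instance> instances = List.of(
                new Instance("i-known", "running"),
                new Instance("i-orphan", "running"),
                new Instance("i-idle", "stopped"));
        Set<String> knownIds = Set.of("i-known", "i-idle");

        System.out.println("count against cap: " + countAgainstCap(instances, knownIds));
        System.out.println("unknown active:    "
                + findUnknownActive(instances, knownIds).map(Instance::id).orElse("none"));
    }
}
```

In this model a stopped instance never counts against the cap, and a running instance that Jenkins has lost track of is picked up rather than ignored, which is the reason given for dropping the state filter.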

          Francis Upton added a comment -

          I believe I have now fixed this.

          Francis Upton added a comment -

          Should be released in 1.30

          Francis Upton added a comment -

          Fixed a problem with this change: it was not respecting the instance caps for on-demand nodes.

            Assignee: Francis Upton (francisu)
            Reporter: Craig Ringer (ringerc)