
Plugin is not reusing stopped AWS EC2 instances

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: ec2-plugin
    • Environment: Plugin version 1.42

      Using 1.39 of the plugin, the behaviour I observed was that, with the instance cap set to 2 and 2 slaves already provisioned (i.e. in the stopped EC2 state rather than terminated), when a slave was required for a pending queued build the plugin would simply start one of the 2 stopped instances.

       

      Now, on 1.42, the plugin appears to leave the previous 2 EC2 instances in the stopped state on AWS and provision a brand new slave, exceeding the instance cap of 2 and leaving 3 EC2 instances (albeit 2 stopped and only 1 running).

      For now, my workaround was to go back to 1.39.

          [JENKINS-55492] Plugin is not reusing stopped AWS EC2 instances

          FABRIZIO MANFREDI added a comment -

          Were the nodes in the stopped state created before the update?

          Between 1.39 and 1.41 the tag labeling changed; if the nodes were created before the upgrade, the label associated with them is not recognized by the new version.

          What does the log report in terms of the number of stopped instances?

          Sascha Kettler added a comment -

          I might be experiencing the same issue. Do you also have multiple subnet IDs configured?

          At least in my case this is the culprit: the DescribeInstancesRequest that looks for reusable instances filters by ONE subnetId equal to chooseSubnetId() - which returns one of the defined subnets in a round-robin fashion.

          As I have a subnet per AZ, the first instance is started (and, on idle, stopped) in AZ a. Then, when we need an instance again, chooseSubnetId() has moved on to a different subnet in AZ b - no instances can be found there, so a new one is created.

          What should happen instead is that the DescribeInstancesRequest filters for ANY of the defined subnetIds, as sketched below.
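          For illustration, here is a minimal sketch of that lookup using the AWS SDK for Java v1, which the plugin builds on. The subnet IDs and the tag filter value are placeholders rather than the plugin's actual configuration; the point is that the DescribeInstancesRequest filters on all configured subnet IDs and on the stopped/stopping states, instead of on the single subnet returned by the round-robin chooser:

              import java.util.Arrays;
              import java.util.List;

              import com.amazonaws.services.ec2.AmazonEC2;
              import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
              import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
              import com.amazonaws.services.ec2.model.DescribeInstancesResult;
              import com.amazonaws.services.ec2.model.Filter;
              import com.amazonaws.services.ec2.model.Instance;
              import com.amazonaws.services.ec2.model.Reservation;

              public class FindReusableAgents {
                  public static void main(String[] args) {
                      // All subnet IDs configured for the template (placeholder values).
                      List<String> configuredSubnetIds =
                              Arrays.asList("subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333");

                      AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

                      // Look for reusable stopped agents across ALL configured subnets,
                      // not just the one picked by the round-robin subnet chooser.
                      DescribeInstancesRequest request = new DescribeInstancesRequest().withFilters(
                              new Filter("subnet-id").withValues(configuredSubnetIds),
                              new Filter("instance-state-name").withValues("stopped", "stopping"),
                              // Placeholder tag identifying instances launched from this template.
                              new Filter("tag:jenkins_slave_type").withValues("demand_my-template"));

                      DescribeInstancesResult result = ec2.describeInstances(request);
                      for (Reservation reservation : result.getReservations()) {
                          for (Instance instance : reservation.getInstances()) {
                              System.out.println("Reusable stopped instance: " + instance.getInstanceId()
                                      + " in subnet " + instance.getSubnetId());
                          }
                      }
                  }
              }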

          Dirk Heinrichs added a comment -

          Same here: fresh install, current plugin version. The AMI template is configured to stop instead of terminate, instances are stopped on idle, but when a new job is started a new instance is launched instead of starting a stopped one.

          Anton Goorin added a comment -

          I've been observing this behaviour for more than a year. I installed 4 fresh servers, and all of them leave stopped nodes behind as garbage.
          For now I just delete them manually once in a while.

          Evan added a comment -

          2.0.4 of this plugin still has this issue. Multi subnet/AZ config: I turned on "stop instance" instead of allowing instances to terminate on Friday, came back on Tuesday, and had 185 stopped instances plus 40 running.

          Sebastian Opel added a comment -

          evanrich408 Does it work for you when you have only 1 subnet configured in the plugin?
          I have only 1 subnet configured and it doesn't start the stopped instances. It tries to connect via SSH, but the instance remains stopped in AWS EC2.

          Or are we talking about just the presence of multiple subnets in the VPC?

          Sebastian Opel added a comment -

          evanrich408 Sorry, my problem was that I had set the wrong availability zone: I configured "eu-central-1" instead of "eu-central-1a".

          So with a single subnet configured it works.
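          As a side note for anyone hitting the same misconfiguration: "eu-central-1" names the region, while the plugin's availability zone field expects a zone name such as "eu-central-1a". A quick, illustrative way to list the valid zone names is to query them with the AWS SDK for Java v1 - the region choice here is an example and credentials setup is assumed:

              import com.amazonaws.regions.Regions;
              import com.amazonaws.services.ec2.AmazonEC2;
              import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
              import com.amazonaws.services.ec2.model.AvailabilityZone;

              public class ListZones {
                  public static void main(String[] args) {
                      // "eu-central-1" is the region; its zones are "eu-central-1a", "eu-central-1b", ...
                      AmazonEC2 ec2 = AmazonEC2ClientBuilder.standard()
                              .withRegion(Regions.EU_CENTRAL_1)
                              .build();
                      for (AvailabilityZone zone : ec2.describeAvailabilityZones().getAvailabilityZones()) {
                          System.out.println(zone.getZoneName() + " (" + zone.getState() + ")");
                      }
                  }
              }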

            Assignee: thoulen (FABRIZIO MANFREDI)
            Reporter: davidgoate (David Goate)
            Votes: 3
            Watchers: 10
