    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component/s: ec2-plugin
    • Labels: None
    • Environment: Jenkins ver. 2.176.1, 2.204.2
      ec2 plugin 1.43, 1.44, 1.45, 1.49.1
    • Released As: ec2 1.51

      Sometimes after a Jenkins restart the plugin won't be able to spawn more agents.

      The plugin will just loop on this:

      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
      May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
      May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
      

      If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it.

      It seems like there is some mismatch in the plugin logic. The part responsible for calculating the number of instances and checking the cap sees the EC2 instance. However the part responsible for picking up running EC2 instances doesn't seem to be able to find it.

      We use a single subnet, security group and VPC (I've seen some reports about this causing problems).

      We use an instanceCap = 1 setting because we are testing the plugin; this might make the problem more visible than it would be with a higher cap.
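
The mismatch described above can be illustrated with a small, self-contained sketch. It is not ec2-plugin code: the class `OrphanCheck`, its methods, and the instance ID are hypothetical. It assumes the cap check counts every matching instance that EC2 reports, while only instances registered as Jenkins nodes can actually run builds.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the suspected mismatch; not taken from the ec2 plugin.
final class OrphanCheck {

    // Orphans: instances EC2 reports for the template that Jenkins has no node for.
    static Set<String> findOrphans(Set<String> reportedByEc2, Set<String> knownToJenkins) {
        Set<String> orphans = new TreeSet<>(reportedByEc2);
        orphans.removeAll(knownToJenkins);
        return orphans;
    }

    // Capacity left if the cap check counts every instance EC2 reports (an assumption).
    static int remainingCapacity(int instanceCap, Set<String> reportedByEc2) {
        return Math.max(0, instanceCap - reportedByEc2.size());
    }

    public static void main(String[] args) {
        Set<String> ec2 = Set.of("i-hypothetical");   // survived the Jenkins restart
        Set<String> nodes = Set.of();                  // Jenkins lost track of it
        System.out.println("orphans: " + findOrphans(ec2, nodes));
        System.out.println("remaining capacity: " + remainingCapacity(1, ec2));
    }
}
```

Under these assumptions, instanceCap = 1 and one instance that is not registered as a node give one orphan and zero remaining capacity, which would be consistent with the "no capacity for instances: 0" message in the log.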

          [JENKINS-57795] Orphaned EC2 instances after Jenkins restart

          Jakub Bochenski created issue -

          Jakub Bochenski added a comment - thoulen it would be nice to at least get some pointers on how to debug this further or work around it

          Jakub Bochenski added a comment - raihaan maybe you would care to respond?

          FABRIZIO MANFREDI added a comment -

          Can you tell me which version you are using?

          There is a bug in the calculation, but it should not affect your case.

          What is the configuration of your pool?

          Do you have more than one pool with the same description, AMI and tags?

          Can you try with 2?

          Jakub Bochenski added a comment - edited

          This has been happening since at least 1.43, and it just happened on 1.44.

          I have only one EC2 cloud configured, but I also have an ECS cloud (the two use separate agent labels).

          This is our cloud configuration, done via a Groovy script:

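          // Imports needed to run this as a standalone Groovy script.
          import com.amazonaws.services.ec2.model.InstanceType
          import hudson.model.Node
          import hudson.plugins.ec2.AmazonEC2Cloud
          import hudson.plugins.ec2.EC2Tag
          import hudson.plugins.ec2.SlaveTemplate
          import hudson.plugins.ec2.UnixData

          // `config` (defined elsewhere in the script) holds the site-specific settings.
          final cloud = new AmazonEC2Cloud(
                  'ec2',
                  false,
                  config.ec2_access_key,
                  config.ec2_region,
                  config.ec2_ssh_key,
                  config.ec2_instance_cap,
                  [
                          new SlaveTemplate(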
                                  config.ec2_ami_id,
                                  '',
                                  null,
                                  config.ec2_security_groups,
                                  '/tmp',
                                  InstanceType.fromValue(config.ec2_instance_type),
                                  false,
                                  config.ec2_label,
                                  Node.Mode.NORMAL,
                                  "ec2 (${config.ec2_ami_id})",
                                  '',
                                  '/tmp',
                                  '',
                                  '1',
                                  config.ec2_remote_user,
                                  new UnixData(null, null, null, null),
                                  '',
                                  false,
                                  config.ec2_subnet_id,
                                  [
                                          Name: 'acme', 
                                          Contact : 'acme@acme.com',
                                  ].collect{ new EC2Tag(it.key,it.value) },
                                  '30',
                                  false,
                                  '',
                                  config.ec2_arn_role,
                                  true,
                                  false,
                                  false,
                                  '1800',
                                  false,
                                  '',
                                  false,
                                  false,
                                  false,
                                  false
                          )],
                  config.ec2_arn_role,
                  ''
          )


          Jakub Bochenski added a comment -

          "Can you try with 2?"

          If I reproduce the issue with instance cap = 1 and then increase the cap to 2, a new agent is spawned (but only one).

          I am now trying to reproduce this with 2 instances getting orphaned.

          I also tried setting the instance cap on the slave template to 2 (it was blank before); that doesn't seem to help.

          Jakub Bochenski added a comment -

          I'm now getting this situation with instance cap = 2. I have two matching instances on EC2, and both are active.
          The plugin is looping with the message above, and no agents are available for the builds.

          When I terminated one of the instances, something interesting happened: Jenkins was able to pick up the other instance and reconnect it.

          SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0
          Jun 27, 2019 11:35:07 AM WARNING hudson.plugins.ec2.EC2Cloud provision
          Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}
          Jun 27, 2019 11:35:16 AM INFO hudson.plugins.ec2.EC2Cloud provision
          SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units
          Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Considering launching
          Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice
          AMI had xvda
          Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice
          {DeleteOnTermination: true,SnapshotId: snap-0b70f104d64ae4a48,VolumeSize: 8,VolumeType: gp2,Encrypted: false,}
          Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate
          Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Looking for existing instances with describe-instance: {Filters: SNAP
          Jun 27, 2019 11:35:18 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. checkInstance: i-0e454aea630ccb88f. true - Instance is not connected to Jenkins

          Jakub Bochenski added a comment - The log above suggests there may be an "off by one" error: the plugin won't attempt to reconnect existing instances when it is at the instance cap.
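
One way to picture the suspected "off by one" behaviour is the ordering sketch below. It is a guess at the control flow, not the plugin's implementation: `ProvisionOrder`, both methods, and the second instance ID are hypothetical. It assumes that every running instance for the template is unconnected, and that the cap check runs before the search for existing instances, so at the cap the search is never reached.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the suspected ordering problem; not ec2-plugin code.
final class ProvisionOrder {
    static final String NEW_INSTANCE = "<new instance>";

    // Suspected order: the cap is checked first, so at the cap nothing is reused.
    static Optional<String> capCheckFirst(int cap, List<String> unconnected) {
        if (unconnected.size() >= cap) {
            return Optional.empty();               // "no capacity for instances: 0"
        }
        return unconnected.isEmpty()
                ? Optional.of(NEW_INSTANCE)        // below the cap, nothing to reuse
                : Optional.of(unconnected.get(0)); // below the cap, reconnect
    }

    // Alternative order: try to reconnect first, then apply the cap to new launches.
    static Optional<String> reuseFirst(int cap, List<String> unconnected) {
        if (!unconnected.isEmpty()) {
            return Optional.of(unconnected.get(0));
        }
        return cap > 0 ? Optional.of(NEW_INSTANCE) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> two = List.of("i-0e454aea630ccb88f", "i-hypothetical");
        List<String> one = List.of("i-0e454aea630ccb88f");
        System.out.println(capCheckFirst(2, two)); // stuck at the cap
        System.out.println(capCheckFirst(2, one)); // one terminated: reconnects
        System.out.println(reuseFirst(2, two));    // would reconnect even at the cap
    }
}
```

In this model, a cap of 2 with two unconnected instances yields nothing, and terminating one of them lets the remaining instance be picked up, which mirrors the sequence in the log above. Whether the plugin really orders its checks this way is not confirmed here.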

            Assignee: thoulen FABRIZIO MANFREDI
            Reporter: jbochenski Jakub Bochenski
            Votes: 0
            Watchers: 10