[JENKINS-57795] Orphaned EC2 instances after Jenkins restart

Jakub Bochenski added a comment - 2019-06-25 13:42

thoulen it would be nice to at least get some pointers on how to debug this further or work around it

Jakub Bochenski added a comment - 2019-06-26 13:56

raihaan maybe you would care to respond?

FABRIZIO MANFREDI added a comment - 2019-06-26 15:02

Hi,

Can you tell me which version you are using?

There is a bug in the calculation, but it should not affect your case.

What is the configuration of your pool?

Do you have more than one pool with the same description, AMI and tags?

Can you try with 2?

Jakub Bochenski added a comment - 2019-06-26 15:48 - edited

This has been happening since at least 1.43, and it just happened on 1.44.

I have only one EC2 cloud configured, but I also have an ECS cloud (they use separate agent labels).

This is our cloud configuration done via groovy script:

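// Imports the script needs; `config` is assumed to be defined earlier in the script.
import com.amazonaws.services.ec2.model.InstanceType
import hudson.model.Node
import hudson.plugins.ec2.AmazonEC2Cloud
import hudson.plugins.ec2.EC2Tag
import hudson.plugins.ec2.SlaveTemplate
import hudson.plugins.ec2.UnixData

final cloud = new AmazonEC2Cloud(
        'ec2',
        false,
        config.ec2_access_key,
        config.ec2_region,
        config.ec2_ssh_key,
        config.ec2_instance_cap,
        [
                new SlaveTemplate(
                        config.ec2_ami_id,
                        '',
                        null,
                        config.ec2_security_groups,
                        '/tmp',
                        InstanceType.fromValue(config.ec2_instance_type),
                        false,
                        config.ec2_label,
                        Node.Mode.NORMAL,
                        "ec2 (${config.ec2_ami_id})",
                        '',
                        '/tmp',
                        '',
                        '1',
                        config.ec2_remote_user,
                        new UnixData(null, null, null, null),
                        '',
                        false,
                        config.ec2_subnet_id,
                        // Tags applied to every instance launched from this template
                        [
                                Name: 'acme',
                                Contact: 'acme@acme.com',
                        ].collect { new EC2Tag(it.key, it.value) },
                        '30',
                        false,
                        '',
                        config.ec2_arn_role,
                        true,
                        false,
                        false,
                        '1800',
                        false,
                        '',
                        false,
                        false,
                        false,
                        false
                )],
        config.ec2_arn_role,
        ''
)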

Jakub Bochenski added a comment - 2019-06-27 10:47

> Can you try with 2 ?

If I reproduce the issue with instance cap = 1, then increase the cap to 2 I will get a new agent spawned (but only 1)

Now trying to reproduce this with 2 instances getting orphaned.

I also tried setting instance cap on slave template to 2 (it was blank before) – doesn't seem to help

Jakub Bochenski added a comment - 2019-06-27 11:38

I'm now getting this situation with instance cap = 2. I have two matching instances on EC2, both are active.
Plugin is looping with above message, with no agents available for the builds.

Now when I terminated one of the instances an interesting thing happened. Jenkins was able to pick up the other instance and reconnect it

SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0

Jun 27, 2019 11:35:07 AM WARNING hudson.plugins.ec2.EC2Cloud provision

Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}

Jun 27, 2019 11:35:16 AM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Considering launching

Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice

AMI had xvda

Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice

{DeleteOnTermination: true,SnapshotId: snap-0b70f104d64ae4a48,VolumeSize: 8,VolumeType: gp2,Encrypted: false,}

Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate

Jun 27, 2019 11:35:17 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. Looking for existing instances with describe-instance: {Filters: SNAP

Jun 27, 2019 11:35:18 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker docker-bakery'}. checkInstance: i-0e454aea630ccb88f. true - Instance is not connected to Jenkins

Jakub Bochenski added a comment - 2019-06-27 11:40

The above looks like there may be some "off by one" error: the plugin won't attempt to reconnect instances when it is at the instance cap.

Jakub Bochenski added a comment - 2019-06-27 12:11

I double checked this.
If I'm at cap = 1 with 1 orphaned instance and increase the cap to 2 then the plugin will spawn a new instance.
If I'm at cap = 2 with 2 orphaned instances and terminate one of the instances manually then the plugin will reconnect the other instance
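
Both observations fit a cap check that runs before any attempt to look at existing instances. The sketch below is only a toy model of that hypothesis, not the ec2-plugin's code: the class and method names are made up for illustration, and it assumes that every matching instance on EC2 counts against the cap whether or not it is connected to Jenkins.

```java
// Toy model of the hypothesis discussed above; not the ec2-plugin's actual code.
final class CapacityGate {

    /** Slots left when every matching EC2 instance is counted, connected to Jenkins or not. */
    static int remainingCapacity(int instanceCap, int matchingInstancesOnEc2) {
        return Math.max(0, instanceCap - matchingInstancesOnEc2);
    }

    /** True if provisioning gets past the cap check, i.e. reaches the step that looks at existing instances. */
    static boolean passesCapCheck(int instanceCap, int matchingInstancesOnEc2) {
        return remainingCapacity(instanceCap, matchingInstancesOnEc2) > 0;
    }

    public static void main(String[] args) {
        System.out.println(passesCapCheck(1, 1)); // false: cap = 1, one orphan
        System.out.println(passesCapCheck(2, 1)); // true: cap raised to 2, or one of two orphans terminated
        System.out.println(passesCapCheck(2, 2)); // false: cap = 2, two orphans
    }
}
```

In this model an orphaned instance that fills the cap can never be reconnected, because the step that would find it is never reached. The model says nothing about whether the plugin then reattaches the orphan or launches a new instance.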

Jakub Bochenski added a comment - 2019-07-04 13:35 - edited

thoulen raihaan I know this is OSS and there is no SLA. Still, could you tell me if and when I can expect any assistance from you?

FABRIZIO MANFREDI added a comment - 2019-07-05 09:05

I believe I found the problem. I am trying to get the fix into 1.44.2, which should be released in a couple of days.

One more question: what do you mean by orphaned? In the stopped state, or no longer in the Jenkins interface?

Did you apply all the IAM role permissions requested on the ec2 plugin page?

Jakub Bochenski added a comment - 2019-07-05 12:36 - edited

> One more question what do you mean with orphaned, stop state or no longer in the jenkins interface ?

It's not available as an agent in Jenkins. It's still in the running state when I check the status in the AWS console.

> Did you apply the all the IAM role requested specify in the ec2 plugin page ?

I believe I did. This is a random error, so it doesn't happen every time; e.g. the instances do get terminated after the idle timeout.

Jakub Bochenski added a comment - 2019-08-05 13:30

> I believe I found the problem, I trying to put in the 1.44.2 that should be release in a couple of days.

thoulen it's been a month now and I can't see any new releases after 1.44.1. Any updates?

FABRIZIO MANFREDI added a comment - 2019-08-10 19:11

Can you test 1.45?

Jakub Bochenski added a comment - 2019-08-12 09:23

thoulen the same problem is happening with 1.45

Raihaan Shouhell added a comment - 2019-09-02 03:44

I'm not sure how to replicate this issue :/

Jakub Bochenski added a comment - 2019-09-02 09:36

raihaan I've provided the full configuration via a groovy script above. What else do you need?

Raihaan Shouhell added a comment - 2019-09-02 09:40

I have made a cloud, set the instanceCap to 1 and restarted without running into the orphaning problem. Is there a way to reproduce this consistently from your end? Also what is the number of instances that get run in your setup?

cedric lecoz added a comment - 2019-09-05 13:15 - edited

Hi all,
I upgraded our Jenkins last week; the ec2 plugin moved from 1.43 to 1.45.
This issue had already been seen on 1.43, but rarely.
On the new version I get at least one occurrence a day (the upgrade was core + plugin, everything was a couple of months old).

Going through the logs to try to figure out what was happening before I found this ticket, I saw the following traces. I'm adding them here in case they help debug the problem.
In all cases, my EC2 instance is started correctly; it's just that Jenkins doesn't see it.

When it works:

Sep 05, 2019 9:25:53 AM hudson.plugins.ec2.EC2Cloud provision
INFO: SlaveTemplate{ami='ami-****', labels='build-yocto-persistent'}. Attempting provision finished, excess workload: -1
Sep 05, 2019 9:25:53 AM hudson.plugins.ec2.EC2Cloud provision
INFO: We have now 27 computers, waiting for 1 more
Sep 05, 2019 9:25:53 AM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning EC2 (ec2-project) - build-yocto-persistent from ec2-ec2-project with 2 executors. Remaining excess workload: -1
INFO: SlaveTemplate{ami='ami-****', labels='build-yocto-persistent'} Node EC2 (ec2-project) - build-yocto-persistent (i-****) moved to RUNNING state in 5 seconds and is ready to be connected by Jenkins
Sep 05, 2019 9:25:58 AM hudson.plugins.ec2.EC2Cloud log
INFO: Launching instance: i-****
Sep 05, 2019 9:25:58 AM hudson.plugins.ec2.EC2Cloud log
Sep 05, 2019 9:25:58 AM hudson.plugins.ec2.EC2Cloud log
INFO: Connecting to 10.1.0.234 on port 22, with timeout 10000.
Sep 05, 2019 9:26:03 AM hudson.slaves.NodeProvisioner$2 run
INFO: EC2 (ec2-project) - build-yocto-persistent provisioning successfully completed. We have now 27 computer(s)
Sep 05, 2019 9:26:03 AM com.tsystems.sbs.LogFileFilterOutputStream <init>

When it does not work:

Sep 05, 2019 11:51:13 AM hudson.plugins.ec2.EC2Cloud provision
INFO: SlaveTemplate{ami='ami-****', labels='build-yocto-persistent'}. Attempting provision finished, excess workload: -1
Sep 05, 2019 11:51:13 AM hudson.plugins.ec2.EC2Cloud provision
INFO: We have now 27 computers, waiting for 1 more
Sep 05, 2019 11:51:13 AM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning EC2 (ec2-project) - build-yocto-persistent from ec2-ec2-project with 2 executors. Remaining excess workload: -1
Sep 05, 2019 11:51:13 AM hudson.plugins.ec2.EC2Cloud$1 call
WARNING: SlaveTemplate{ami='ami-****', labels='build-yocto-persistent'}. Node stopped is neither pending, neither running, its {2}. Terminate provisioning
Sep 05, 2019 11:51:14 AM hudson.plugins.repo.ChangeLog saveChangeLog
INFO: No logs found

In that case the "Node stopped is neither pending, neither running ..." trace popped up in less than a second, instead of the 5 seconds it takes when there is no problem.

Another observation I made is in my CloudTrail logs.
When it works, I can see the following calls to AWS:

  - 09:25:53 StartInstances, using the i-**** instance ID.
  - 09:25:53 DescribeInstances, using the i-**** instance ID, as seen in the following requestParameters:
    "requestParameters": {
        "instancesSet": {
            "items": [
                {
                    "instanceId": "i-****"
                }
            ]
        },
        "filterSet": {}
    },
  - 09:25:55 CreateGrant (for decryption)
  - 09:25:58 DescribeInstances, using the i-**** instance ID, as seen in the above requestParameters.
  - ...

When it does not work:

  - 11:51:13 StartInstances, using the i-**** instance ID.
  - 11:51:14 DescribeInstances, using the i-**** instance ID, as seen in the above requestParameters.
  - 11:51:15 CreateGrant (for decryption)
  - 11:51:19 DescribeInstances with empty parameters, as seen in the following requestParameters:
    "requestParameters": {
        "instancesSet": {},
        "filterSet": {}
    },

I did a bit more testing: every time I reproduce the issue, there is no correct DescribeInstances call (one with the instanceId) after the first one.
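
In the failing trace, the second DescribeInstances call carried an empty `instancesSet` and an empty `filterSet`. A DescribeInstances request with no instance IDs and no filters is not scoped to a single instance, so it cannot answer "what state is this instance in?". One defensive option, sketched below with hypothetical names that are not taken from the plugin, is to refuse to build such a lookup at all, so the problem surfaces at the call site instead of as a confusing result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical guard, for illustration only; not part of the ec2-plugin.
final class DescribeGuard {

    /** Returns an immutable copy of the IDs, or fails fast when the lookup would be unscoped. */
    static List<String> requireInstanceIds(List<String> instanceIds) {
        if (instanceIds == null || instanceIds.isEmpty()) {
            throw new IllegalArgumentException(
                    "refusing an unscoped DescribeInstances lookup: no instance ID given");
        }
        return Collections.unmodifiableList(new ArrayList<>(instanceIds));
    }
}
```

A caller would pass the result of `requireInstanceIds` to whatever builds the actual request. Whether the empty request seen in CloudTrail comes from a missing ID or from a different code path is not established in this thread.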

EDIT: Like Jakub in the next comment, I reproduce the issue on instances with cap=1. Those instances have specifics like a second drive, so the cap needs to be 1.
On generic instances with cap > 1 I haven't seen the problem.

All the best,

Cedric

Jakub Bochenski added a comment - 2019-09-05 14:10

> Is there a way to reproduce this consistently from your end? Also what is the number of instances that get run in your setup?

It happens quite often after restart but I have no way to reproduce it 100%.

Maybe this is a hint: the instance counting logic sees the EC2 machine (since it says there is no capacity), but the node attaching logic can't connect it for some reason.

> Also what is the number of instances that get run in your setup?

I'm not sure if I understand. The instance cap is 1 so we have at most 1 instance.

Raihaan Shouhell added a comment - 2019-09-12 02:46

From this log
```
WARNING: SlaveTemplate{ami='ami-****', labels='build-yocto-persistent'}. Node stopped is neither pending, neither running, its {2}. Terminate provisioning
```
It says that the node has been stopped. By the way, are you using on-demand slaves or spot instances?
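
The literal `{2}` in that warning looks like an unfilled message placeholder. One possible reading, which is not confirmed anywhere in this thread, is that the message was formatted with fewer arguments than the pattern expects, so the word "stopped" after "Node" would be the instance state rather than a node name. The pattern and argument order below are hypothetical reconstructions from the log line, not taken from the plugin source; the sketch only shows how `java.text.MessageFormat` behaves when an argument index is out of range.

```java
import java.text.MessageFormat;

// Hypothetical reconstruction for illustration; the real pattern in the plugin may differ.
final class PlaceholderDemo {

    static final String PATTERN =
            "{0}. Node {1} is neither pending, neither running, its {2}. Terminate provisioning";

    /** Formats the pattern; placeholders without a matching argument are left as-is. */
    static String render(Object... args) {
        return MessageFormat.format(PATTERN, args);
    }

    public static void main(String[] args) {
        // Two arguments for three placeholders: "{2}" stays in the output.
        System.out.println(render("TEMPLATE", "stopped"));
        // Three arguments: every placeholder is filled.
        System.out.println(render("TEMPLATE", "i-0abc", "stopped"));
    }
}
```

If that reading is right, the warning still reports the state as stopped, which matches the interpretation above.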

cedric lecoz added a comment - 2019-09-12 11:28

Hi raihaan,
That's what it says, but the EC2 instance was alive: I could ssh in and work away, it's just that Jenkins was not aware of it. My instances are on demand.
Attached is jenkins_201909121030.log, a log with a bit more data. Two different EC2 instances had the issue (or a similar one). The first one (aws-audit-ec2) had an EC2 instance that ran and was terminated an hour before (so it was still showing as terminated in my EC2 console). For the second one, the EC2 instance already existed and was just stopped. I tried to clean the log as best I could, but I have too many jobs running on other EC2 instances, so it's noisy.
BR,
Cedric.

Raihaan Shouhell added a comment - 2019-09-13 06:52

sirzic jbochenski could someone test https://ci.jenkins.io/job/Plugins/job/ec2-plugin/job/PR-397/2/artifact/org/jenkins-ci/plugins/ec2/1.46-rc1050.a8a95e8dd7f5/ec2-1.46-rc1050.a8a95e8dd7f5.hpi and see if this issue still occurs? This retries on missing instances instead of giving up immediately.

cedric lecoz added a comment - 2019-09-15 08:48

Hi raihaan,
I updated to that new version this morning, and the few tests I did were OK. I added a job which will start / destroy / start an EC2 instance using the plugin every 15 minutes, and asked the team to ping me if they see the problem happen. If we don't see the problem, I will try to update this ticket by next Thursday.
BR,
Cedric.

cedric lecoz added a comment - 2019-09-16 10:12

Hi raihaan,

Reproduced it twice this morning; attached is jenkins.temp_dsl.log, one of the logs.
The plugin manager still shows 1.46-rc1050.a8a95e8dd7f5 for the EC2 plugin.

C.

Raihaan Shouhell added a comment - 2019-09-17 04:32 - edited

[^ec2.hpi] sirzic

For your latest issue, the linked HPI should solve it. The issue you seem to see occurs when starting from a stopped instance: due to the eventual consistency of the AWS APIs, the plugin occasionally sees a freshly started instance as stopped. As a result, I added a retry for newly started instances to deal with this.
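
The general idea of retrying a lookup until an eventually consistent API catches up can be sketched as follows. This is not the code from PR-397 or PR-398; the class, the method name `retryUntilPresent`, and the attempt and delay parameters are assumptions made up for illustration.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Generic retry sketch, for illustration only; not the ec2-plugin's implementation.
final class EventualConsistencyRetry {

    /**
     * Calls the lookup up to maxAttempts times, sleeping delayMillis between attempts,
     * and returns the first non-empty result. Returns empty if every attempt comes back
     * empty or the thread is interrupted while waiting.
     */
    static <T> Optional<T> retryUntilPresent(Supplier<Optional<T>> lookup, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

The lookup supplier would wrap whatever call reports the instance, returning empty while the instance is missing or still shown in a stale state. Giving up after a bounded number of attempts keeps a genuinely dead instance from blocking provisioning forever.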

cedric lecoz added a comment - 2019-09-17 07:54

OK, thanks. I will try ASAP, but that may not be before the weekend; Jenkins is slightly too busy during the week.

cedric lecoz added a comment - 2019-09-18 09:27 - edited

hi raihaan,
Is the ec2.hpi plugin you attached here the same as the one built by https://github.com/jenkinsci/ec2-plugin/pull/398 ?
It's easier to add to my CI env (automated) when the plugin comes directly from ci.jenkins.io, and easier to track too.

I am asking because it does not look like PR-398 includes what I tested from PR-397.

tks,
C/

Raihaan Shouhell added a comment - 2019-09-18 09:29

sirzic yes, it is. I attached it directly because CI was struggling to build it yesterday. I have removed the attachment.

cedric lecoz added a comment - 2019-09-19 11:32

Hi raihaan,
Using the 1.46-rc1050.43f9773eed95 plugin, I reproduced the issue when starting a new EC2 instance after the previous one was terminated (the case I believe was fixed in PR-397). See the attached log start_fresh_1.46-rc1050.43f9773eed95.txt.

Issue from a stopped slave has not yet been reproduced.

BR,
Cedric.

cedric lecoz added a comment - 2019-09-24 06:25

Hi,
Status update: since last week I have not reproduced the issue when starting a stopped instance.
I have reproduced the issue a dozen times when the previous instance was terminated.

I just saw there was a new PR-399 1.46-rc1052.8c6d855421ac associated to this ticket, so I pushed it to our Jenkins, will keep you updated.
C/

cedric lecoz added a comment - 2019-09-26 10:41

Same problem (starting a new instance when the previous instance has been terminated), seen on PR-399.
C/

Jakub Bochenski added a comment - 2019-09-30 09:35 - edited

I'm still seeing issues in 1.45. The instance is in the running state, but the plugin doesn't seem to be able to connect it.

I can't test it on 1.46 because of ~~JENKINS-59564~~

Sep 30, 2019 9:32:03 AM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Considering launching

Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice

AMI had xvda

Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice

{DeleteOnTermination: true,SnapshotId: snap-0f2d5ab1c6f918116,VolumeSize: 8,VolumeType: gp2,Encrypted: false,}

Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate

Sep 30, 2019 9:32:04 AM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Looking for existing instances with describe-instance: {Filters: [{Name: image-id,Values: [ami-02769bd03e603e42f]}, {Name: instance-type,Values: [t3.micro]}, {Name: key-name,Values: [j4a-bochja]}, {Name: subnet-id,Values: [subnet-0a167eb56a247e891]}, {Name: instance.group-id,Values: [sg-0a4f5e5ac5bb602e4]}, {Name: tag:Name,Values: [ew1-j4a-jenkins-slave-ec2]}, {Name: tag:DeploymentName,Values: [ew1-j4a]}, {Name: tag:CostCenter,Values: [31505]}, {Name: tag:DeploymentType,Values: [dev]}, {Name: tag:DeploymentGroup,Values: [ew1-j4a]}, {Name: tag:jenkins_server_url,Values: [https://acme.com/]}, {Name: tag:jenkins_slave_type,Values: [demand_ec2 (ami-02769bd03e603e42f)]}],InstanceIds: [],}

Sep 30, 2019 9:32:05 AM INFO hudson.plugins.ec2.CloudHelper getInstance

Unexpected number of reservations reported by EC2 for instance id 'i-0571c7e8c36a6b783', expected 1 result, found []. Instance seems to be dead.

Sep 30, 2019 9:32:05 AM WARNING hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Exception during provisioning
com.amazonaws.AmazonClientException: Unexpected number of reservations reported by EC2 for instance id 'i-0571c7e8c36a6b783', expected 1 result, found []. Instance seems to be dead.
	at hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:54)
	at hudson.plugins.ec2.CloudHelper.getInstanceWithRetry(CloudHelper.java:25)
	at hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:566)
	at hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:165)
	at hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:56)
	at hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:1104)
	at hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:773)
	at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:745)
	at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:585)
	at hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:540)
	at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:589)
	at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:615)
	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:62)
	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:807)
	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Sep 30, 2019 9:32:13 AM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Sep 30, 2019 9:32:14 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0

Sep 30, 2019 9:32:14 AM WARNING hudson.plugins.ec2.EC2Cloud provision

Can't raise nodes for SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}

Sep 30, 2019 9:32:23 AM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Sep 30, 2019 9:32:24 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave

SlaveTemplate{ami='ami-02769bd03e603e42f', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0

Jakub Bochenski added a comment - 2020-02-24 13:25

We are still seeing issues with spawning instances after restart.
Do you plan to work on resolving this? thoulen raihaan

Raihaan Shouhell added a comment - 2020-02-25 08:40

Hey Jakub, I haven't had any time to look into this. Do you have multiple clouds enabled? It seems weird that your instance can't be found with just the instance ID. On top of that, the instance is in the RUNNING state, as you say.

Jakub Bochenski added a comment - 2020-02-25 15:03 - edited

We have only 1 cloud active; you can find the groovy script to configure it above (I have disabled the ECS cloud to test this).
I'm on version 1.49.1 of the plugin and Jenkins ver. 2.204.2.

I was able to reproduce this with just:

1. Start the Jenkins master instance.
2. Trigger a job that asks for the EC2-powered agent.
3. Wait for the job to finish, then stop Jenkins.
4. Start Jenkins again.
5. Trigger a job that asks for the EC2-powered agent. Now it won't be able to hook up to the running instance again.

The log output looks like this:

Feb 25, 2020 2:55:21 PM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Feb 25, 2020 2:55:21 PM INFO com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory addProxyConfig

Configuring Proxy. Proxy Host: ew1-internal-proxy.services-ci-infra-services.fsapi.com Proxy Port: 8080

Feb 25, 2020 2:55:22 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0

Feb 25, 2020 2:55:22 PM WARNING hudson.plugins.ec2.EC2Cloud provision

Can't raise nodes for SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}

Feb 25, 2020 2:55:22 PM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Feb 25, 2020 2:55:22 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0

Feb 25, 2020 2:55:22 PM WARNING hudson.plugins.ec2.EC2Cloud provision

Can't raise nodes for SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}

Feb 25, 2020 2:55:31 PM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Feb 25, 2020 2:55:31 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Cannot provision - no capacity for instances: 0

Feb 25, 2020 2:55:31 PM WARNING hudson.plugins.ec2.EC2Cloud provision

Can't raise nodes for SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}

After I terminate the instance manually in the EC2 Console, the log is:

Feb 25, 2020 3:18:01 PM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units

Feb 25, 2020 3:18:01 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Considering launching

Feb 25, 2020 3:18:01 PM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice

AMI had xvda

Feb 25, 2020 3:18:01 PM INFO hudson.plugins.ec2.SlaveTemplate setupRootDevice

{DeleteOnTermination: true,SnapshotId: snap-040ff84a8c849bc15,VolumeSize: 8,VolumeType: gp2,Encrypted: false}

Feb 25, 2020 3:18:01 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate

Feb 25, 2020 3:18:02 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Looking for existing instances with describe-instance: {Filters: [{Name: image-id,Values: [ami-028d96c69234f9d1a]}, {Name: instance-type,Values: [t3.micro]}, {Name: key-name,Values: [j4a-ec2-ssh-key]}, {Name: subnet-id,Values: [subnet-0eeb1506910488624]}, {Name: instance.group-id,Values: [sg-092080629f55910cb]}, {Name: tag:Name,Values: [ew1-j4a-jenkins-slave-ec2]}, {Name: tag:jenkins_server_url,Values: [https://mastermaster.j4a.services-ci-infra-services.acme.com/]}, {Name: tag:DeploymentName,Values: [ew1-j4a]}, {Name: tag:Contact,Values: [production-cloud-unity@acme.com]}, {Name: tag:jenkins_slave_type,Values: [demand_ec2 (ami-028d96c69234f9d1a)]}, {Name: tag:CostCenter,Values: [31505]}, {Name: tag:DeploymentType,Values: [dev]}, {Name: tag:DeploymentGroup,Values: [ew1-j4a]}],InstanceIds: [],}

Feb 25, 2020 3:18:02 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. checkInstance: i-02af1b5b6a87aad30.. false - Instance is terminated or shutting down

Feb 25, 2020 3:18:03 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Return instance: {AmiLaunchIndex: 0,ImageId: ami-028d96c69234f9d1a,InstanceId: i-046afc69c32c1acdd,InstanceType: t3.micro,KeyName: j4a-ec2-ssh-key,LaunchTime: Tue Feb 25 15:18:03 UTC 2020,Monitoring: {State: disabled},Placement: {AvailabilityZone: eu-west-1a,GroupName: ,Tenancy: default,},PrivateDnsName: ip-10-20-4-66.eu-west-1.compute.internal,PrivateIpAddress: 10.20.4.66,ProductCodes: [],PublicDnsName: ,State: {Code: 0,Name: pending},StateTransitionReason: ,SubnetId: subnet-0eeb1506910488624,VpcId: vpc-0facb2e34ac58e041,Architecture: x86_64,BlockDeviceMappings: [],ClientToken: 152cf4fd-4f06-428c-9f18-60f681886744,EbsOptimized: false,Hypervisor: xen,ElasticGpuAssociations: [],ElasticInferenceAcceleratorAssociations: [],NetworkInterfaces: [{Attachment: {AttachTime: Tue Feb 25 15:18:03 UTC 2020,AttachmentId: eni-attach-05888fedc412b1b40,DeleteOnTermination: true,DeviceIndex: 0,Status: attaching},Description: ,Groups: [{GroupName: ew1-j4a-jenkins-sg-EC2SSlaveSecurityGroup-1C3WU317MF58C,GroupId: sg-092080629f55910cb}],Ipv6Addresses: [],MacAddress: 0a:eb:20:fd:16:60,NetworkInterfaceId: eni-07ced70b3de03e95f,OwnerId: 403015111228,PrivateDnsName: ip-10-20-4-66.eu-west-1.compute.internal,PrivateIpAddress: 10.20.4.66,PrivateIpAddresses: [{Primary: true,PrivateDnsName: ip-10-20-4-66.eu-west-1.compute.internal,PrivateIpAddress: 10.20.4.66}],SourceDestCheck: true,Status: in-use,SubnetId: subnet-0eeb1506910488624,VpcId: vpc-0facb2e34ac58e041,InterfaceType: interface}],RootDeviceName: xvda,RootDeviceType: ebs,SecurityGroups: [{GroupName: ew1-j4a-jenkins-sg-EC2SSlaveSecurityGroup-1C3WU317MF58C,GroupId: sg-092080629f55910cb}],SourceDestCheck: true,StateReason: {Code: pending,Message: pending},Tags: [{Key: jenkins_server_url,Value: https://mastermaster.j4a.services-ci-infra-services.fsapi.com/}, {Key: jenkins_slave_type,Value: demand_ec2 (ami-028d96c69234f9d1a)}, {Key: CostCenter,Value: 31505}, {Key: DeploymentName,Value: ew1-j4a}, {Key: DeploymentType,Value: dev}, {Key: Name,Value: ew1-j4a-jenkins-slave-ec2}, {Key: DeploymentGroup,Value: ew1-j4a}, {Key: Contact,Value: production-cloud-unity@f-secure.com}],VirtualizationType: hvm,CpuOptions: {CoreCount: 1,ThreadsPerCore: 2},CapacityReservationSpecification: {CapacityReservationPreference: open,},Licenses: [],MetadataOptions: {State: pending,HttpTokens: optional,HttpPutResponseHopLimit: 1,HttpEndpoint: enabled}}

Feb 25, 2020 3:18:03 PM INFO hudson.plugins.ec2.EC2Cloud provision

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting provision finished, excess workload: 0

Feb 25, 2020 3:18:03 PM INFO hudson.plugins.ec2.EC2Cloud provision

We have now 1 computers, waiting for 1 more

Feb 25, 2020 3:18:18 PM INFO hudson.plugins.ec2.EC2Cloud$1 call

SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'} Node EC2 (ec2) - ec2 (ami-028d96c69234f9d1a) (i-046afc69c32c1acdd) moved to RUNNING state in 15 seconds and is ready to be connected by Jenkins

Feb 25, 2020 3:18:21 PM INFO hudson.plugins.ec2.EC2RetentionStrategy start

Start requested for EC2 (ec2) - ec2 (ami-028d96c69234f9d1a) (i-046afc69c32c1acdd)

Feb 25, 2020 3:18:21 PM INFO hudson.plugins.ec2.EC2Cloud log

Launching instance: i-046afc69c32c1acdd

Feb 25, 2020 3:18:21 PM INFO hudson.plugins.ec2.EC2Cloud log

bootstrap()

Feb 25, 2020 3:18:21 PM INFO hudson.plugins.ec2.EC2Cloud log

Getting keypair...

Jakub Bochenski added a comment - 2020-02-25 15:16 - edited

BTW I didn't think about it earlier, but shouldn't the plugin actually terminate the EC2 instance on Jenkins shutdown? Otherwise it could stay there indefinitely

Jakub Bochenski added a comment - 2020-02-25 15:34 - edited

Actually, I've noticed another problematic thing. After I manually terminate the instance, the plugin spawns 4 new instances that get terminated immediately before it finally gets the fifth one up. I thought it was a fluke at first, but it seems to be consistently reproducible.
Log: https://gist.github.com/jakub-bochenski/c24b1f8e24e7be77aa2522df2c8caaed

It seems the plugin just terminates the instance for no reason:

Feb 25, 2020 3:18:30 PM INFO hudson.plugins.ec2.EC2Cloud log

Launching remoting agent (via Trilead SSH2 Connection):  java  -jar /tmp/remoting.jar -workDir /opt/jenkins

Feb 25, 2020 3:18:32 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate

Terminated EC2 instance (terminated): i-046afc69c32c1acdd

Feb 25, 2020 3:18:32 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate

Removed EC2 instance from jenkins master: i-046afc69c32c1acdd

Also notice this, despite instance cap=1:

Feb 25, 2020 3:18:21 PM INFO hudson.slaves.NodeProvisioner lambda$update$6

EC2 (ec2) - ec2 (ami-028d96c69234f9d1a) provisioning successfully completed. We have now 2 computer(s)

Jakub Bochenski added a comment - 2020-02-25 15:54

I tried this a few more times. So far it's reproducible 100% (which is in a way good)

Raihaan Shouhell added a comment - 2020-02-26 00:01

Plugin can't terminate all instances on shutdown / restart simply because that can cause shutdown to stall.

Do you have any logs from the retention strategy? I see in your logs that the instances were stopped, but I'm not certain why.

Jakub Bochenski added a comment - 2020-02-26 11:05 - edited

> Do you have any logs from the retention strategy? I see in your logs that the instances were stopped, but I'm not certain why.

I'm not sure what you mean. I have pasted all of the Jenkins log output here already.
Do you want me to enable DEBUG level logging for some components?
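For reference, Jenkins logs through java.util.logging, which has no DEBUG level; FINE is the closest equivalent. Below is a minimal sketch of raising the level for the plugin's loggers. It assumes they live under `hudson.plugins.ec2`, as the class names in the log output above suggest; the class and method names in the sketch itself are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class Ec2LoggingSketch {
    // Assumption: the plugin's loggers are named after its package, as the
    // class names in the pasted log output suggest.
    static final String EC2_PLUGIN_LOGGER = "hudson.plugins.ec2";

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost.
    static final Logger EC2_LOGGER = Logger.getLogger(EC2_PLUGIN_LOGGER);

    // Lower the threshold so FINE (debug-like) records pass this logger.
    static Logger enableFineLogging() {
        EC2_LOGGER.setLevel(Level.FINE);
        return EC2_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = enableFineLogging();
        System.out.println(logger.getName() + " level: " + logger.getLevel());
    }
}
```

Setting the logger level alone may not be enough to see the records: each handler has its own threshold, and the default console handler only prints INFO and above. On a running Jenkins, a log recorder configured under the system log page for the same logger name is probably the more practical route.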

Jakub Bochenski added a comment - 2020-03-06 15:36

raihaan I have filed a separate issue about the agents dying during launch, as it happens independently of this issue.

Pierson Yieh added a comment - 2020-04-07 22:44 - edited

We've also seen this behavior before, though we're not sure how to reproduce it. We saw it when we hit our max AWS request limit: Jenkins started losing track of nodes and couldn't spin up new ones, because the orphaned nodes were still being counted towards the max instance count even though they weren't showing up in the Jenkins UI.

I'm able to "simulate" the "losing track of nodes" by running a Groovy script on the Jenkins Master that manually removes the node from the Jenkins object. We're also looking into implementing a feature to automatically re-attach these orphaned nodes to Jenkins.

Update: it seems SlaveTemplate.checkInstance() finds our orphan nodes, and we're able to re-attach them to the Jenkins Master. Not sure why they weren't getting re-attached in the past.

Jakub Bochenski added a comment - 2020-04-08 11:30

I was able to resolve my problem in the same way as described in https://issues.jenkins-ci.org/browse/JENKINS-61370?focusedCommentId=388247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-388247

The swallowing of useful error output is a big issue that should be fixed.

Pierson Yieh added a comment - 2020-04-08 20:02

jbochenski Which problem was solved: orphan nodes not getting reconnected, or agents dying during launch? Our issue is orphan nodes not getting re-attached to their respective Jenkins Masters.

Pierson Yieh added a comment - 2020-04-14 19:39 - edited

We've identified the cause of our issue. The orphan re-attachment logic is tied to the EC2Cloud's provision method, but the issue occurs when the actual number of existing AWS nodes has hit an instance cap (i.e. no more nodes can be provisioned). Because we've hit the instance cap, provisioning isn't even attempted and the orphan re-attachment logic is never triggered. Submitted a PR here: https://github.com/jenkinsci/ec2-plugin/pull/448
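
To make the ordering problem described above concrete, here is a minimal, self-contained sketch in plain Java. It is not the ec2-plugin's code: the class `OrphanReattachSketch`, its fields, and both `provision...` methods are hypothetical names invented for illustration, and the "re-attach first" variant is only one possible reordering, not necessarily what the linked PR does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the behaviour described in the comment.
// None of these names are taken from the ec2-plugin source.
class OrphanReattachSketch {
    private final int instanceCap;
    final List<String> awsInstances = new ArrayList<>(); // instances AWS reports as running
    final List<String> jenkinsNodes = new ArrayList<>(); // nodes Jenkins currently tracks
    private int launched = 0;

    OrphanReattachSketch(int instanceCap) {
        this.instanceCap = instanceCap;
    }

    // Re-attach every running instance that Jenkins has lost track of.
    int reattachOrphans() {
        int reattached = 0;
        for (String id : awsInstances) {
            if (!jenkinsNodes.contains(id)) {
                jenkinsNodes.add(id);
                reattached++;
            }
        }
        return reattached;
    }

    private void launchNew() {
        String id = "i-new-" + (++launched);
        awsInstances.add(id);
        jenkinsNodes.add(id);
    }

    // Ordering as described: the cap check returns early, so the orphan
    // check below it is never reached once the cap is hit.
    boolean provisionCapFirst() {
        if (awsInstances.size() >= instanceCap) {
            return false; // "no capacity": orphans stay orphaned
        }
        if (reattachOrphans() == 0) {
            launchNew();
        }
        return true;
    }

    // Assumed alternative: look for orphans before consulting the cap.
    boolean provisionReattachFirst() {
        if (reattachOrphans() > 0) {
            return true;
        }
        if (awsInstances.size() >= instanceCap) {
            return false;
        }
        launchNew();
        return true;
    }

    public static void main(String[] args) {
        OrphanReattachSketch capFirst = new OrphanReattachSketch(1);
        capFirst.awsInstances.add("i-orphan");
        System.out.println("cap first:      " + capFirst.provisionCapFirst());

        OrphanReattachSketch reattachFirst = new OrphanReattachSketch(1);
        reattachFirst.awsInstances.add("i-orphan");
        System.out.println("reattach first: " + reattachFirst.provisionReattachFirst());
    }
}
```

With a cap of 1 and one orphaned instance, the cap-first ordering refuses to provision and leaves the orphan untracked, while the re-attach-first ordering picks the orphan up without launching anything new.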

Manoj added a comment - 2020-10-22 00:47

Hi, we still face the exact same issue with Jenkins (2.222.4) and the EC2 plugin (1.50.2.1). We see it mostly with Windows instances, which are built using the Groovy EC2 config injected as an init script. Is there any update on whether the fix has been released, or a timeline for it? Looking at this shows it is not yet released. An update would be helpful.

Manoj added a comment - 2020-11-04 01:41

Today I tried version 1.53 and the issue is not resolved.

SlaveTemplate{ami='ami-038e073abe89730b3', labels='win2016dlp'}. Attempting to provision slave needed by excess workload of 1 units
Nov 04, 2020 12:36:01 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami='ami-038e073abe89730b3', labels='win2016dlp'}. Cannot provision - no capacity for instances: 0
Nov 04, 2020 12:36:01 PM WARNING hudson.plugins.ec2.EC2Cloud provision Can't raise nodes for SlaveTemplate{ami='ami-038e073abe89730b3', labels='win2016dlp'}

However, Jenkins identified the node, and the node details screen shows the "Launch Agent" button, but the agent is not running.

Please note, this is a Windows agent.

Raihaan Shouhell added a comment - 2020-11-08 11:24

manojtr the identification of the agent is what this ticket is about, so I will assume this is resolved. Could you open a new ticket sharing more details about your situation?

Manoj added a comment - 2020-11-16 00:30

raihaan but I think the description of this issue says exactly what I described above. Do I still need to open another ticket? Sorry, I am confused: what do you mean by the identification of the agent?
