Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53920

After upgrade to 1.40 the slaves that are stopped via stopOnTerminate are not started again

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved (View Workflow)
    • Blocker
    • Resolution: Fixed
    • ec2-plugin
    • None
    • Jenkins 2.144 and ec2-plugin 1.40 or 1.40.1. Amazon region eu-west-1.

    Description

      The slave stops after idle time correctly. But then on a build starting, instead of starting the stopped slave, it creates a new slave instead. If the number of stopped slaves is the instance cap, then it can't start or create any slaves and the build hangs forever.

      Reverting to 1.39 everything works as expected.

      This are the logs on 1.39:

      No existing instance found - created new instance: {AmiLaunchIndex: 0,ImageId:
      ami-123123213123,InstanceId: i-087a3e1ca8c5fbbce,InstanceType:
      t3.medium,KeyName: developer,LaunchTime: Fri Oct 05 10:12:00 UTC
      2018,Monitoring: {State: disabled},Placement: {AvailabilityZone:
      eu-west-1a,GroupName: ,Tenancy: default,},PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,ProductCodes: [],PublicDnsName: ,State: {Code: 0,Name:
      pending},StateTransitionReason: ,SubnetId: subnet-111111111,VpcId:
      vpc-22222222,Architecture: x86_64,BlockDeviceMappings: [],ClientToken:
      123,EbsOptimized: true,Hypervisor:
      xen,ElasticGpuAssociations: [],NetworkInterfaces: [{Attachment: {AttachTime: Fri
      Oct 05 10:12:00 UTC 2018,AttachmentId:
      eni-attach-123,DeleteOnTermination: true,DeviceIndex: 0,Status:
      attaching},Description: ,Groups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],Ipv6Addresses:
      [],MacAddress: 12:21:12:12:12,NetworkInterfaceId:
      eni-123,OwnerId: 123123,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,PrivateIpAddresses: [{Primary: true,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0}],SourceDestCheck: true,Status: in-use,SubnetId:
      subnet-111111111,VpcId: vpc-222222}],RootDeviceName:
      /dev/xvda,RootDeviceType: ebs,SecurityGroups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],SourceDestCheck:
      true,StateReason: {Code: pending,Message: pending},Tags: [{Key:
      jenkins_slave_type,Value: demand_Slave}],VirtualizationType: hvm,CpuOptions:
      {CoreCount: 1,ThreadsPerCore: 2}}
      
      ----
      
      Found stopped instance - starting it: {AmiLaunchIndex: 0,ImageId:
      ami-123123213123,InstanceId: i-087a3e1ca8c5fbbce,InstanceType:
      t3.medium,KeyName: developer,LaunchTime: Fri Oct 05 10:12:00 UTC
      2018,Monitoring: {State: disabled},Placement: {AvailabilityZone:
      eu-west-1a,GroupName: ,Tenancy: default,},PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,ProductCodes: [],PublicDnsName: ,State: {Code: 80,Name:
      stopped},StateTransitionReason: User initiated (2018-10-05 10:23:12
      GMT),SubnetId: subnet-111111111,VpcId:
      vpc-222222,Architecture: x86_64,BlockDeviceMappings: [{DeviceName:
      /dev/xvda,Ebs: {AttachTime: Fri Oct 05 10:12:01 UTC 2018,DeleteOnTermination:
      true,Status: attached,VolumeId: vol-0c6543588e4595d61}}],ClientToken:
      123,EbsOptimized: true,EnaSupport:
      true,Hypervisor: xen,ElasticGpuAssociations: [],NetworkInterfaces: [{Attachment:
      {AttachTime: Fri Oct 05 10:12:00 UTC 2018,AttachmentId:
      eni-attach-123,DeleteOnTermination: true,DeviceIndex: 0,Status:
      attached},Description: ,Groups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],Ipv6Addresses:
      [],MacAddress: 12:21:12:12:12,NetworkInterfaceId:
      eni-123,OwnerId: 123123,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,PrivateIpAddresses: [{Primary: true,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0}],SourceDestCheck: true,Status: in-use,SubnetId:
      subnet-111111111,VpcId: vpc-222222}],RootDeviceName:
      /dev/xvda,RootDeviceType: ebs,SecurityGroups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],SourceDestCheck:
      true,StateReason: {Code: Client.UserInitiatedShutdown,Message:
      Client.UserInitiatedShutdown: User initiated shutdown},Tags: [{Key:
      jenkins_slave_type,Value: demand_Slave}],VirtualizationType: hvm,CpuOptions:
      {CoreCount: 1,ThreadsPerCore: 2}} result:{StartingInstances: [{CurrentState:
      {Code: 0,Name: pending},InstanceId: i-087a3e1ca8c5fbbce,PreviousState: {Code:
      80,Name: stopped}}]}
      
      

      As you can see 1.39 prints "Found stopped instance".
      On 1.40 I get this:

      SlaveTemplate{ami='ami-123123213123', labels='ec2'}. Looking for existing
      instances with describe-instance: {Filters: [{Name: image-id,Values:
      [ami-123123213123]}, {Name: instance-type,Values: [t3.medium]}, {Name:
      key-name,Values: [developer]}, {Name: availability-zone,Values: [eu-west-1a]},
      {Name: subnet-id,Values: [subnet-111111111]}, {Name:
      instance.group-id,Values: [sg-111111111]}, {Name:
      tag:jenkins_slave_type,Values: [demand_Slave]}],InstanceIds: [],}
      
      SlaveTemplate{ami='ami-123123213123', labels='ec2'}. Return instance:
      {AmiLaunchIndex: 0,ImageId: ami-123123213123,InstanceId:
      i-0adb33f8b01edbeb9,InstanceType: t3.medium,KeyName: developer,LaunchTime: Fri
      Oct 05 11:33:38 UTC 2018,Monitoring: {State: disabled},Placement:
      {AvailabilityZone: eu-west-1a,GroupName: ,Tenancy: default,},PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,ProductCodes: [],PublicDnsName: ,State: {Code: 0,Name:
      pending},StateTransitionReason: ,SubnetId: subnet-111111111,VpcId:
      vpc-222222,Architecture: x86_64,BlockDeviceMappings: [],ClientToken:
      123,EbsOptimized: true,Hypervisor:
      xen,ElasticGpuAssociations: [],NetworkInterfaces: [{Attachment: {AttachTime: Fri
      Oct 05 11:33:38 UTC 2018,AttachmentId:
      eni-attach-03d008e758569bfb0,DeleteOnTermination: true,DeviceIndex: 0,Status:
      attaching},Description: ,Groups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],Ipv6Addresses:
      [],MacAddress: 12:21:12:12:12,NetworkInterfaceId:
      eni-123,OwnerId: 123123,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,PrivateIpAddresses: [{Primary: true,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0}],SourceDestCheck: true,Status: in-use,SubnetId:
      subnet-111111111,VpcId: vpc-222222}],RootDeviceName:
      /dev/xvda,RootDeviceType: ebs,SecurityGroups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],SourceDestCheck:
      true,StateReason: {Code: pending,Message: pending},Tags: [{Key:
      jenkins_slave_type,Value: demand_Slave}],VirtualizationType: hvm,CpuOptions:
      {CoreCount: 1,ThreadsPerCore: 2}}
      
      ---
      
      SlaveTemplate{ami='ami-123123213123', labels='ec2'}. Looking for existing
      instances with describe-instance: {Filters: [{Name: image-id,Values:
      [ami-123123213123]}, {Name: instance-type,Values: [t3.medium]}, {Name:
      key-name,Values: [developer]}, {Name: availability-zone,Values: [eu-west-1a]},
      {Name: subnet-id,Values: [subnet-111111111]}, {Name:
      instance.group-id,Values: [sg-111111111]}, {Name:
      tag:jenkins_slave_type,Values: [demand_Slave]}],InstanceIds: [],}
      
      SlaveTemplate{ami='ami-123123213123', labels='ec2'}. checkInstance:
      i-0adb33f8b01edbeb9.. false - found existing corresponding Jenkins slave:
      i-0adb33f8b01edbeb9
      
      SlaveTemplate{ami='ami-123123213123', labels='ec2'}. Return instance:
      {AmiLaunchIndex: 0,ImageId: ami-123123213123,InstanceId:
      i-0640e006ce9d0f32f,InstanceType: t3.medium,KeyName: developer,LaunchTime: Fri
      Oct 05 11:45:34 UTC 2018,Monitoring: {State: disabled},Placement:
      {AvailabilityZone: eu-west-1a,GroupName: ,Tenancy: default,},PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,ProductCodes: [],PublicDnsName: ,State: {Code: 0,Name:
      pending},StateTransitionReason: ,SubnetId: subnet-111111111,VpcId:
      vpc-222222,Architecture: x86_64,BlockDeviceMappings: [],ClientToken:
      123,EbsOptimized: true,Hypervisor:
      xen,ElasticGpuAssociations: [],NetworkInterfaces: [{Attachment: {AttachTime: Fri
      Oct 05 11:45:34 UTC 2018,AttachmentId:
      eni-attach-123,DeleteOnTermination: true,DeviceIndex: 0,Status:
      attaching},Description: ,Groups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],Ipv6Addresses:
      [],MacAddress: 12:21:12:12:12,NetworkInterfaceId:
      eni-123,OwnerId: 123123,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0,PrivateIpAddresses: [{Primary: true,PrivateDnsName:
      ip-10-0-0-0.eu-west-1.compute.internal,PrivateIpAddress:
      10.0.0.0}],SourceDestCheck: true,Status: in-use,SubnetId:
      subnet-111111111,VpcId: vpc-222222}],RootDeviceName:
      /dev/xvda,RootDeviceType: ebs,SecurityGroups: [{GroupName:
      system-dev-default-instances,GroupId: sg-111111111}],SourceDestCheck:
      true,StateReason: {Code: pending,Message: pending},Tags: [{Key:
      jenkins_slave_type,Value: demand_Slave}],VirtualizationType: hvm,CpuOptions:
      {CoreCount: 1,ThreadsPerCore: 2}}
      
      

      When it says "Return instance" above, it returns a new instance, instead of the one stopped.

      I am still going to go through the changes on Github and see if I can understand where this regression was introduced.

      PS: I changed IDs on logs to hide amazon details

      Attachments

        Issue Links

          Activity

            michelzanini Michel Zanini added a comment -

            thoulen,
            I have added the permission and tested.
            Still, the 1.41-SNAPSHOT build fails to start the slave, this time it goes in a loop with this:

            Nov 06, 2018 11:42:02 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
            SlaveTemplate{ami='ami-0773a0bac9a5fb3bb', labels='ec2'}. Cannot provision - no capacity for instances: 0
            Nov 06, 2018 11:42:02 AM WARNING hudson.plugins.ec2.EC2Cloud provision
            Can't raise nodes for SlaveTemplate{ami='ami-0773a0bac9a5fb3bb', labels='ec2'}
            
            michelzanini Michel Zanini added a comment - thoulen , I have added the permission and tested. Still, the 1.41-SNAPSHOT build fails to start the slave, this time it goes in a loop with this: Nov 06, 2018 11:42:02 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami= 'ami-0773a0bac9a5fb3bb' , labels= 'ec2' }. Cannot provision - no capacity for instances: 0 Nov 06, 2018 11:42:02 AM WARNING hudson.plugins.ec2.EC2Cloud provision Can 't raise nodes for SlaveTemplate{ami=' ami-0773a0bac9a5fb3bb ', labels=' ec2'}
            michelzanini Michel Zanini added a comment -

            I also tested with 1.41 released version.
            I had the same problem:

            Cannot provision - no capacity for instances: 0
            

            This is happening due to the fact I have instance cap of 2 and already 2 slaves are created and at STOPPED state. It still tries to create another instead of starting the ones STOPPED. Increasing the cap to 3 make a new box be created.
            I am not sure what I am doing wrong since same thing works on 1.39.

            This is my role permissions:

            {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Sid": "VisualEditor",
                        "Effect": "Allow",
                        "Action": [
                            "ec2:DescribeSpotInstanceRequests",
                            "ec2:CancelSpotInstanceRequests",
                            "ec2:GetConsoleOutput",
                            "ec2:RequestSpotInstances",
                            "ec2:RunInstances",
                            "ec2:StartInstances",
                            "ec2:StopInstances",
                            "ec2:TerminateInstances",
                            "ec2:CreateTags",
                            "ec2:DeleteTags",
                            "ec2:DescribeInstances",
                            "ec2:DescribeKeyPairs",
                            "ec2:DescribeRegions",
                            "ec2:DescribeImages",
                            "ec2:DescribeAvailabilityZones",
                            "ec2:DescribeSecurityGroups",
                            "ec2:DescribeSubnets",
                            "iam:PassRole",
                            "iam:ListInstanceProfilesForRole"
                        ],
                        "Resource": "*"
                    }
                ]
            }
            
            michelzanini Michel Zanini added a comment - I also tested with 1.41 released version. I had the same problem: Cannot provision - no capacity for instances: 0 This is happening due to the fact I have instance cap of 2 and already 2 slaves are created and at STOPPED state. It still tries to create another instead of starting the ones STOPPED. Increasing the cap to 3 make a new box be created. I am not sure what I am doing wrong since same thing works on 1.39. This is my role permissions: { "Version" : "2012-10-17" , "Statement" : [ { "Sid" : "VisualEditor" , "Effect" : "Allow" , "Action" : [ "ec2:DescribeSpotInstanceRequests" , "ec2:CancelSpotInstanceRequests" , "ec2:GetConsoleOutput" , "ec2:RequestSpotInstances" , "ec2:RunInstances" , "ec2:StartInstances" , "ec2:StopInstances" , "ec2:TerminateInstances" , "ec2:CreateTags" , "ec2:DeleteTags" , "ec2:DescribeInstances" , "ec2:DescribeKeyPairs" , "ec2:DescribeRegions" , "ec2:DescribeImages" , "ec2:DescribeAvailabilityZones" , "ec2:DescribeSecurityGroups" , "ec2:DescribeSubnets" , "iam:PassRole" , "iam:ListInstanceProfilesForRole" ], "Resource" : "*" } ] }

            Yes, that is a bug in the instance cap with the Stopped instances that are counted as a running machine. I will try to put the fix in the 1.42 

            thoulen FABRIZIO MANFREDI added a comment - Yes, that is a bug in the instance cap with the Stopped instances that are counted as a running machine. I will try to put the fix in the 1.42 
            paulbovbel Paul Bovbel added a comment -

            Did this make it into 1.42? still a blocking issue.

            paulbovbel Paul Bovbel added a comment - Did this make it into 1.42? still a blocking issue.

            i think so. for us it is working again with the latest version.

            tuky Tobias Krönke added a comment - i think so. for us it is working again with the latest version.

            People

              thoulen FABRIZIO MANFREDI
              michelzanini Michel Zanini
              Votes:
              9 Vote for this issue
              Watchers:
              21 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: