Jenkins / JENKINS-62939

Launching EC2 slaves is taking a very long time

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: ec2-plugin
    • Labels: None
    • Operating System: AWS EC2, Amazon Linux and Amazon Linux 2
      JRE/JDK: 8
      Jenkins version: 2.235.1 (LTS)
      Amazon EC2 plugin version: 1.50.3
      Amazon Web Services SDK version: 1.11.799

      We have been using the Amazon EC2 plugin for quite some time. In the past it launched EC2 slaves very quickly; now it takes a very long time, over 7 minutes for each slave.

       

      We have lots of logs that look like:

      INFO: The instance EC2 (us-east-1) - Java 11 Slave (i-xyz) has a blank console. Maybe the console is yet not available. If enough time has passed, consider changing the key verification strategy or the AMI used by one printing out the host key in the instance console

      The delay occurs with both the check-new-hard and the check-new-soft host key verification strategies.

       

      I'm not sure how the EC2 plugin performs this check, but I opened the EC2 web console and ran "Actions -> Instance Settings -> Get System Log".

      I see lines like this a full 5 minutes before the EC2 plugin can verify the keys and connect:

       

      <14>Jul 2 21:09:05 ec2: 
      <14>Jul 2 21:09:05 ec2: #############################################################
      <14>Jul 2 21:09:05 ec2: ----BEGIN SSH HOST KEY FINGERPRINTS----
      <14>Jul 2 21:09:05 ec2: 256 SHA256:XYZ no comment (ECDSA)
      <14>Jul 2 21:09:05 ec2: 256 SHA256:XYZ no comment (ED25519)
      <14>Jul 2 21:09:05 ec2: 2048 SHA256:XYZ no comment (RSA)
      <14>Jul 2 21:09:05 ec2: ----END SSH HOST KEY FINGERPRINTS----
      <14>Jul 2 21:09:05 ec2: #############################################################
      ----BEGIN SSH HOST KEY KEYS----
      ecdsa-sha2-nistp256 XYZ
      ssh-ed25519 XYZ
      ssh-rsa XYZ

       

      To be clear, the slaves do eventually connect, and they do so consistently. The problem is that the time it takes has increased greatly.
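To illustrate what a 'check-new-hard' style comparison needs from the system log above, here is a minimal Java sketch. It is not the ec2-plugin's implementation: the class and method names (`ConsoleHostKeys`, `extractHostKeys`, `isTrusted`) are hypothetical, and it assumes the key block is closed by an "END SSH HOST KEY KEYS" line (the excerpt above is cut off before it). A blank or incomplete console is treated as "not available yet", which is the situation the INFO message above reports.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConsoleHostKeys {

    /**
     * Returns "type base64" pairs found between the BEGIN/END SSH HOST KEY KEYS markers.
     * Returns an empty list while the console is blank or the block is not complete yet.
     */
    static List<String> extractHostKeys(String consoleOutput) {
        if (consoleOutput == null || consoleOutput.trim().isEmpty()) {
            return Collections.emptyList(); // blank console: nothing to compare against yet
        }
        List<String> keys = new ArrayList<>();
        boolean inBlock = false;
        for (String rawLine : consoleOutput.split("\\R")) {
            String line = rawLine.trim();
            // Match on the marker words only, so the number of dashes around them does not matter.
            if (line.contains("BEGIN SSH HOST KEY KEYS")) {
                inBlock = true;
            } else if (line.contains("END SSH HOST KEY KEYS")) {
                return keys; // complete block found
            } else if (inBlock && !line.isEmpty()) {
                String[] parts = line.split("\\s+");
                if (parts.length >= 2) {
                    keys.add(parts[0] + " " + parts[1]); // drop any trailing comment
                }
            }
        }
        return Collections.emptyList(); // END marker not seen: the output may still be partial
    }

    /** Hypothetical check-new-hard style decision: accept the offered key only if the console printed it. */
    static boolean isTrusted(List<String> consoleKeys, String offeredType, String offeredBase64) {
        return consoleKeys.contains(offeredType + " " + offeredBase64);
    }

    public static void main(String[] args) {
        String console = String.join("\n",
            "<14>Jul 2 21:09:05 ec2: ----END SSH HOST KEY FINGERPRINTS----",
            "----BEGIN SSH HOST KEY KEYS----",
            "ecdsa-sha2-nistp256 XYZ",
            "ssh-ed25519 XYZ",
            "ssh-rsa XYZ",
            "----END SSH HOST KEY KEYS----");
        List<String> keys = extractHostKeys(console);
        System.out.println(keys);                              // [ecdsa-sha2-nistp256 XYZ, ssh-ed25519 XYZ, ssh-rsa XYZ]
        System.out.println(isTrusted(keys, "ssh-rsa", "XYZ")); // true
    }
}
```

Requiring the END marker is a deliberate choice in this sketch: a console read part-way through boot could otherwise yield an incomplete key list.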

          Aviad Raviv-Vash added a comment -

          For me, the time between the initial "Looking for existing instances with describe-instance" message and the actual launch is around 2 minutes (about 1 minute 46 seconds in the excerpt below).

          The "Looking for existing instances with describe-instance" line:

          2020-08-24 11:47:15.823+0000 [id=15]	INFO	hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{ami='ami-08420dffedda7a5cd', labels='EC2-Crack'}. Looking for existing instances with describe-instance: {Filters: [{Name: image-id,Values: [ami-08420dffedda7a5cd]}, {Name: instance-type,Values: [g4dn.xlarge]}, {Name: key-name,Values: [Jenkins(upstairs)]}, {Name: subnet-id,Values: [subnet-01301a49c7ea7b6d8]}, {Name: instance.group-id,Values: [sg-0d3d199dfbfcf9588, sg-0d83883b5163f515d]}, {Name: tag:jenkins_slave_type,Values: [demand_EC2-CRACK]}, {Name: tag:jenkins_server_url,Values: [http://192.168.20.144/]}, {Name: tag:Name,Values: [EC2-Cracker-agent]}],InstanceIds: [],}

          The next line:

          2020-08-24 11:49:01.540+0000 [id=15]	INFO	hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{ami='ami-08420dffedda7a5cd', labels='EC2-Crack'}. There is no spot capacity available matching your request, falling back to on-demand instance.
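The gap between these two log lines can be measured directly from their timestamps. A small Java sketch using `java.time` (the class name `ProvisionGap` and the pattern constant are hypothetical, chosen to match the timestamp format shown above):

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class ProvisionGap {
    // Matches the timestamp prefix of the Jenkins log lines, e.g. "2020-08-24 11:47:15.823+0000".
    private static final DateTimeFormatter JENKINS_LOG_TS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSZ");

    /** Elapsed time between two log timestamps. */
    static Duration gap(String start, String end) {
        return Duration.between(OffsetDateTime.parse(start, JENKINS_LOG_TS),
                                OffsetDateTime.parse(end, JENKINS_LOG_TS));
    }

    public static void main(String[] args) {
        Duration d = gap("2020-08-24 11:47:15.823+0000", "2020-08-24 11:49:01.540+0000");
        System.out.println(d);            // PT1M45.717S
        System.out.println(d.toMillis()); // 105717
    }
}
```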

          saurabh deshpande added a comment -

          We ran into the same issue a few months ago and started using Stopped Agents as a workaround. The latest test I ran was with ec2 1.50.2.1, and on average it still takes about 5 more minutes to connect to EC2, even after the AWS Console shows that the agent is running and its status checks have passed.

          • Is there any plan to address this issue in the near future?
          • If not, do you know which earlier ec2-plugin version works? I am wondering how far back we should roll. dvenable aviadcye thoulen, any thoughts on this?

          Andy added a comment - edited

          The documentation states that the 'Check New Hard' strategy uses the Instance Console to look for the SSH Host Key (source):

          This strategy checks the SSH host key provided by the instance with the key printed out in the instance console during the instance initialization

          The AWS EC2 documentation on the Instance Console mentions (source):

          The posted output is not continuously updated; only when it is likely to be of the most value.

          I can verify this by using the AWS SDK to retrieve the console output for a recently started instance. When using the `GetConsoleOutputRequest` object (consistent with the ec2-plugin approach here) to retrieve the console output containing the host keys, the output only becomes available around 5 minutes after the instance has started. As far as I can tell, that is the root cause of this defect.

          There is a `withLatest` function (source) on the `GetConsoleOutputRequest` object; I've tested and confirmed that the latest console output is retrieved immediately (bypassing this 5-minute wait) when it is used. However, `withLatest` is only supported on EC2 instances built on the Nitro System (source), and an exception is thrown when attempting to read the 'latest' console of an instance type that is not Nitro-based. Unfortunately, I can't find a way to use the AWS SDK to determine whether an instance type is Nitro-based, which would be needed to apply the `withLatest` functionality conditionally.

          I thought this information was worth sharing if useful to anybody else working on a fix for this.
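A minimal Java sketch of the fallback this suggests: request the latest console output first and, if the request is rejected, retry without it. To keep it self-contained, the EC2 call is hidden behind a hypothetical `ConsoleSource` interface; with the AWS SDK for Java 1.x the real call would roughly be `ec2.getConsoleOutput(new GetConsoleOutputRequest(instanceId).withLatest(latest)).getDecodedOutput()`. Which exception a non-Nitro instance raises is an assumption here, so the sketch catches any `RuntimeException`.

```java
public class LatestConsoleFallback {

    /** Hypothetical stand-in for the EC2 GetConsoleOutput call, with or without latest=true. */
    interface ConsoleSource {
        String fetch(String instanceId, boolean latest);
    }

    /** Ask for the latest console output first; if that is rejected, fall back to the periodically posted output. */
    static String readConsole(ConsoleSource source, String instanceId) {
        try {
            return source.fetch(instanceId, true);
        } catch (RuntimeException rejected) {
            // Assumption: non-Nitro instance types reject latest=true,
            // so retry without it and accept the slower, buffered output.
            return source.fetch(instanceId, false);
        }
    }

    public static void main(String[] args) {
        ConsoleSource xenLike = (id, latest) -> {
            if (latest) {
                throw new UnsupportedOperationException("latest not supported");
            }
            return "buffered output for " + id;
        };
        ConsoleSource nitroLike = (id, latest) -> (latest ? "live" : "buffered") + " output for " + id;
        System.out.println(readConsole(xenLike, "i-xyz"));   // buffered output for i-xyz
        System.out.println(readConsole(nitroLike, "i-xyz")); // live output for i-xyz
    }
}
```

Catching every `RuntimeException` also hides throttling or permission errors; a real fix would probably inspect the service error code, or avoid the failing call altogether by checking the instance type first.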


          Mamadou Barry added a comment -

          We are facing a similar issue after upgrading to the latest version.


          Mamadou Barry added a comment -

          When are we going to get a fix for this?


          Patrik Boström added a comment - edited

          andyshearer, thanks for the information, it was really helpful! I managed to figure out that it is possible to check whether an instance type uses the Nitro hypervisor by calling DescribeInstanceTypes.

          Opened a PR with a suggested fix:
          https://github.com/jenkinsci/ec2-plugin/pull/707

          It only speeds up Nitro-based instance types that use the 'Check New Hard' strategy.
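The approach described above can be sketched as a per-instance-type check on the `hypervisor` value reported by DescribeInstanceTypes. This is a hypothetical Java illustration, not the code from PR 707: the lookup function stands in for the API call (with the AWS SDK for Java 1.x, roughly `describeInstanceTypes(...)` followed by `InstanceTypeInfo.getHypervisor()`), and the table in `main` is sample data.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class NitroCheck {
    // Hypothetical lookup: instance type -> hypervisor name such as "nitro" or "xen".
    private final Function<String, String> hypervisorLookup;
    // Assumption: an instance type's hypervisor does not change, so one lookup per type is cached.
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    NitroCheck(Function<String, String> hypervisorLookup) {
        this.hypervisorLookup = hypervisorLookup;
    }

    /** True if the console should be requested with latest=true for this instance type. */
    boolean supportsLatestConsole(String instanceType) {
        return cache.computeIfAbsent(instanceType, type -> {
            String hypervisor = hypervisorLookup.apply(type);
            return hypervisor != null && hypervisor.toLowerCase(Locale.ROOT).equals("nitro");
        });
    }

    public static void main(String[] args) {
        Map<String, String> sample = new HashMap<>();
        sample.put("g4dn.xlarge", "nitro");
        sample.put("m4.large", "xen");
        NitroCheck check = new NitroCheck(sample::get);
        System.out.println(check.supportsLatestConsole("g4dn.xlarge")); // true
        System.out.println(check.supportsLatestConsole("m4.large"));    // false
    }
}
```

An unknown hypervisor (a `null` lookup) is treated as "not Nitro", so the sketch falls back to the slower but supported console read rather than risking the exception described earlier.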


            thoulen FABRIZIO MANFREDI
            dvenable David Venable
            Votes: 7
            Watchers: 13