• Bug
    • Resolution: Fixed
    • Minor
    • ec2-plugin
    • Jenkins 2.34, Ubuntu 16.04 (updated)

      There is an initial set-up issue.
      I have set "Launch Timeout in seconds = 45" in the config.
      SSH probing starts as soon as the instance is started. While the system is booting there is a period when SSH is not yet running, and if you ssh in to the server you get an "ssh: connect to host ec2-xxx.eu-west-1.compute.amazonaws.com port 22: Connection refused" error.
      During boot the plugin reports a timeout a few times, but once the IP is reachable and SSH is not yet ready it trips over an unexpected error and stops, leaving the server without a running client and with jobs stuck in the queue.
      I have to manually re-launch the client connection to get it started.

      INFO: Connecting to ec2-xxx.eu-west-1.compute.amazonaws.com on port 22, with timeout 10000.
      Dec 05, 2016 3:16:04 PM null
      INFO: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
      Dec 05, 2016 3:16:04 PM null
      INFO: Waiting for SSH to come up. Sleeping 5.
      ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
      java.lang.NullPointerException
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.bootstrap(EC2UnixLauncher.java:309)
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.launch(EC2UnixLauncher.java:131)
      	at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:122)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:261)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      Why is the configured timeout not used, so that the instance can boot normally before connecting?
      And why does this error stop the launch instead of continuing to try to set up the slave?
      This happens when you use the external ssh option.
      When you use the internal one it works fine.
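
      The behaviour asked for above (keep probing until the configured launch timeout, and treat "Connection refused" during boot as retryable) could look roughly like the sketch below. This is not the plugin's code. The class name, method name and parameters are made up for illustration, and it only checks that the TCP port accepts connections, not that the SSH key exchange completes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class WaitForPortSketch {

    /**
     * Hypothetical helper: polls host:port until a TCP connection succeeds
     * or launchTimeoutMillis has elapsed. Returns true if the port opened in time.
     */
    public static boolean waitForPort(String host, int port,
                                      long launchTimeoutMillis, long sleepMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        while (true) {
            try (Socket socket = new Socket()) {
                // Per-attempt connect timeout of 1 second (an arbitrary choice).
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true;
            } catch (IOException e) {
                // Connection refused, connect timeout or unresolved host:
                // all are expected while the instance is still booting.
            }
            long waitedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (waitedMillis >= launchTimeoutMillis) {
                return false; // give up only once the configured timeout is used up
            }
            Thread.sleep(sleepMillis);
        }
    }
}
```

      With the settings from this report the call would be something like waitForPort(host, 22, 45_000, 5_000), and a false result can then be handled explicitly instead of surfacing as an unexpected error.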

          [JENKINS-40223] Unable to connect on ec2 launch

          Vojtech Mencl added a comment -

          The same issue for me in Jenkins 2.36. It is failing (randomly - sometimes it works) for both system ssh and internal ssh.

          Vojtech Mencl added a comment -

          Any news regarding this issue? I can't check in our environment, but I think it has something to do with the fact that I am using hostname access instead of IPs, the same as in the error described above. It might be that DNS is not resolving the name as quickly as the plugin asks for it. Name resolution should be retried in a loop when it fails, up to some threshold time.
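
          The retry loop suggested in this comment might be sketched as follows. It is only an illustration of the idea, not a patch for the plugin: ResolveWithRetrySketch, Resolver and resolveWithRetry are hypothetical names, and the threshold and sleep values are left to the caller.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveWithRetrySketch {

    /** Hypothetical hook so the lookup can be swapped out; InetAddress::getByName fits it. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Retries name resolution until it succeeds or thresholdMillis has passed.
     * The last UnknownHostException is rethrown once the threshold is exceeded.
     */
    public static InetAddress resolveWithRetry(String host, Resolver resolver,
                                               long thresholdMillis, long sleepMillis)
            throws UnknownHostException, InterruptedException {
        long deadline = System.nanoTime() + thresholdMillis * 1_000_000L;
        while (true) {
            try {
                return resolver.resolve(host);
            } catch (UnknownHostException e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // threshold reached, report the failure
                }
                Thread.sleep(sleepMillis); // name may not be published yet, try again
            }
        }
    }
}
```

          A real lookup would pass InetAddress::getByName as the resolver, for example resolveWithRetry(hostname, InetAddress::getByName, 30_000, 1_000).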

          Vojtech Mencl added a comment -

          I have probably found where the problem is. As a workaround, just set Launch Timeout in seconds to 0. The problem is in src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java:

          line 321:

          if (timeout > 0 && waitTime > timeout) {
              throw new AmazonClientException("Timed out after " + (waitTime / 1000)
                  + " seconds of waiting for ssh to become available. (maximum timeout configured is "
                  + (timeout / 1000) + ")");
          }
          

          This exception is thrown when the timeout is reached (which is why the timeout should currently be set to 0) and it is not caught, so execution reaches the finally block on line 309:

          } finally {
              bootstrapConn.close();
          }
          

          There bootstrapConn is null, so a NullPointerException is thrown. francisu, can you fix that?
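
          One way to avoid this kind of NPE is to guard the close() call in the finally block, so that the original timeout exception is what gets reported. The sketch below only illustrates that pattern under simplified assumptions. FakeConnection and Connector are hypothetical stand-ins, not the plugin's types, and this is not the change that was actually committed.

```java
import java.io.Closeable;
import java.io.IOException;

public class NullSafeCloseSketch {

    /** Hypothetical stand-in for the SSH connection object. */
    public static class FakeConnection implements Closeable {
        public boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical connector; it may throw before any connection exists. */
    public interface Connector {
        FakeConnection connect() throws IOException;
    }

    public static String bootstrap(Connector connector) {
        FakeConnection bootstrapConn = null;
        try {
            // If connect() throws (for example on timeout), bootstrapConn stays null.
            bootstrapConn = connector.connect();
            return "connected";
        } catch (IllegalStateException | IOException e) {
            // The real cause is handled here instead of being masked by an NPE.
            return "failed: " + e.getMessage();
        } finally {
            // Guard: only close a connection that was actually opened.
            if (bootstrapConn != null) {
                bootstrapConn.close();
            }
        }
    }
}
```

          Without the null check, a failure inside connect() would be replaced by a NullPointerException from the finally block, which matches the stack trace in the report.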


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Felix Belzunce Arcos
          Path:
          src/main/java/hudson/plugins/ec2/EC2ComputerLauncher.java
          src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java
          http://jenkins-ci.org/commit/ec2-plugin/7e8d576fc719e00a00db45bed730f462c60d0a65
          Log:
          [FIXED JENKINS-40223] Avoid NPE when ssh connection fails and terminate agents under any Exception (#234)

          • JENKINS-40223 Avoid NPE when ssh connection fails and terminate agents under any Exception before the idlePeriod happens.

          • JENKINS-40223 Manage the NPE

            fbelzunc Félix Belzunce Arcos
            johansmits Johan Smits
            Votes: 2
            Watchers: 5