Jenkins / JENKINS-63261

EC2 Plugin: Linux Slave Terminates on java.io.EOFException: unexpected stream termination


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.249
      EC2 Plugin 1.50.3

      Master:
      Windows 2019 (also hosted on AWS)

      Slave (via SSH):
      AMI: Ubuntu Linux [ami-003634241a8fcdec0] (099720109477/ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20200408)

      Description

      After the plugin launches the Linux slave and connects to it successfully over SSH, the connection is terminated as soon as it attempts to launch the remoting agent.

      Logs:

      INFO: The SSH key ssh-ed25519 a0:c5:cc:b7:f8:3a:63:a5:58:92:a8:a8:b4:42:c9:44 has been successfully checked against the instance console for connections to EC2 (ubuntu-slave) - AWS Linux Jenkins Slave (i-0e5f54a8345c92453)
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connected via SSH.
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: connect fresh as root
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connecting to 10.27.252.14 on port 22, with timeout 10000.
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connection allowed after the host key has been verified
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connected via SSH.
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Creating tmp directory (/tmp) if it does not exist
      Jul 31, 2020 1:11:37 AM hudson.plugins.ec2.EC2Cloud
      INFO: Verifying: java -fullversion
      bash: java: command not found
      Jul 31, 2020 1:11:37 AM hudson.plugins.ec2.EC2Cloud
      INFO: Installing: sudo yum install -y java-1.8.0-openjdk.x86_64
      sudo: yum: command not found
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      WARNING: Failed to install: sudo yum install -y java-1.8.0-openjdk.x86_64
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      INFO: Verifying: which scp
      /usr/bin/scp
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      INFO: Copying remoting.jar to: /tmp
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      INFO: Launching remoting agent (via Trilead SSH2 Connection):  java  -jar /tmp/remoting.jar -workDir /home/ubuntu/
      ERROR: unexpected stream termination
      java.io.EOFException: unexpected stream termination
      	at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:415)
      	at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:360)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:428)
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.launchScript(EC2UnixLauncher.java:267)
      	at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
      	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

          Activity

          brunoqueiros Bruno Queiros added a comment -

          Seems related to https://issues.jenkins-ci.org/browse/JENKINS-46837

          I wonder if someone is going to pick this up

          suthsc Scott Sutherland added a comment -

          Sam Shuzawa, would you happen to be using STS?

          sam31897 Sam Shuzawa added a comment -

          Scott Sutherland, nope, we're not.

          jstoddart Jamie Stoddart added a comment -

          Any updates on this? We are now seeing this after a Jenkins upgrade:

          Jenkins - 2.249.2

          ec2-plugin - 1.51

          tsumardi thomas sumardi added a comment - - edited

          OK, we're having the same issue, but with native SSH, because both connection modes go through the same code here: https://github.com/jenkinsci/ec2-plugin/blob/8702cc4a89bd7c096bba07211e222cde6d931f5d/src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java#L252

          What we need is perhaps retries. I believe the problem is that the plugin checks that the SSH connection is up and then runs the section linked above; when that step fails, it is not retried.

          Would the change below work?

          private static final String SSH_READINESS_SLEEP_MS = "jenkins.ec2.sshreadinessSleepMs";
          private static final String SSH_READINESS_TRIES = "jenkins.ec2.sshreadinessTries";
          
          ...
          
          private static int sshReadinessSleepMs = 1000;
          private static int sshReadinessTries = 10;
          
          static {
              ...
              prop = System.getProperty(SSH_READINESS_TRIES);
              if (prop != null)
                  sshReadinessTries = Integer.parseInt(prop);
              prop = System.getProperty(SSH_READINESS_SLEEP_MS);
              if (prop != null)
                  sshReadinessSleepMs = Integer.parseInt(prop);
          }
          
          ...
          
          // Retry the agent launch up to sshReadinessTries times, sleeping between attempts.
          // The try/catch sits inside the loop so that a failed attempt leads to another one.
          int tries = sshReadinessTries;
          while (tries-- > 0) {
              try {
                  if (slaveTemplate != null && slaveTemplate.isConnectBySSHProcess()) {
                      File identityKeyFile = createIdentityKeyFile(computer);
                      try {
                          // Obviously the master must have an installed ssh client.
                          // Depending on the strategy selected on the UI, we set the StrictHostKeyChecking flag
                          String sshClientLaunchString = String.format("ssh -o StrictHostKeyChecking=%s -i %s %s@%s -p %d %s", slaveTemplate.getHostKeyVerificationStrategy().getSshCommandEquivalentFlag(), identityKeyFile.getAbsolutePath(), node.remoteAdmin, getEC2HostAddress(computer, template), node.getSshPort(), launchString);
                          logInfo(computer, listener, "Launching remoting agent (via SSH client process): " + sshClientLaunchString);
                          CommandLauncher commandLauncher = new CommandLauncher(sshClientLaunchString, null);
                          commandLauncher.launch(computer, listener);
                      } finally {
                          if (!identityKeyFile.delete()) {
                              LOGGER.log(Level.WARNING, "Failed to delete identity key file");
                          }
                      }
                  } else {
                      logInfo(computer, listener, "Launching remoting agent (via Trilead SSH2 Connection): " + launchString);
                      final Session sess = conn.openSession();
                      sess.execCommand(launchString);
                      computer.setChannel(sess.getStdout(), sess.getStdin(), logger, new Listener() {
                          @Override
                          public void onClosed(Channel channel, IOException cause) {
                              sess.close();
                              conn.close();
                          }
                      });
                  }
                  break; // launched without an exception, stop retrying
              } catch (IOException e) {
                  if (tries == 0) {
                      throw e; // out of attempts, report the last failure
                  }
                  logInfo(computer, listener, "Node SSH still not ready, retrying...");
                  Thread.sleep(sshReadinessSleepMs);
              }
          }
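
          The retry idea in this proposal can be tried in isolation, outside the plugin. The following is a minimal sketch, not ec2-plugin code: `RetryLaunch`, `LaunchAttempt` and `launchWithRetries` are hypothetical names, and the simulated attempt only stands in for opening the session and negotiating the channel. It assumes that a failed attempt surfaces as an `IOException`, as the `EOFException` in the log above does.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical helper, not part of ec2-plugin: retries a launch step a bounded number of times. */
final class RetryLaunch {

    /** One launch attempt; stands in for opening a session and negotiating the channel. */
    @FunctionalInterface
    interface LaunchAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxTries times, sleeping sleepMs after each failure.
     * Returns the number of attempts used; rethrows the last IOException when
     * every attempt fails.
     */
    static int launchWithRetries(LaunchAttempt attempt, int maxTries, long sleepMs)
            throws IOException, InterruptedException {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        for (int used = 1; ; used++) {
            try {
                attempt.run();
                return used; // success, stop retrying
            } catch (IOException e) {
                if (used >= maxTries) {
                    throw e; // out of attempts, surface the last failure
                }
                Thread.sleep(sleepMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated agent launch: fails twice with the EOFException from the log, then succeeds.
        int used = launchWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new EOFException("unexpected stream termination");
            }
        }, 10, 10);
        System.out.println("Launched after " + used + " attempt(s)");
    }
}
```

          The loop catches the exception inside each iteration, so a failed attempt leads to another one, and it returns on the first success. Whether the existing SSH connection can be reused after a failed launch is not established in this ticket, so a real change might need to reconnect before retrying.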
          
          tsumardi thomas sumardi added a comment - - edited

          This happens to us at agent launch, not during a build, on 1.51. The agent launch fails and is not retried, and the "idle time" setting does not clean up the offline nodes. For cleanup, I am going to try the recent 1.55 release, which includes the fix for https://issues.jenkins.io/browse/JENKINS-61789, with "launch timeout" and/or "idle timeout" as the cleanup triggers. For the case in this ticket, I assume I can set "launch timeout" to trigger the cleanup, correct? The issue is also hard to reproduce: we only see it when launching many builds over a period of a few hours, and a few of them eventually fail.

          tsumardi thomas sumardi added a comment -

          Upgraded ec2-plugin to 1.55 and have been running it for 3 days. With idle timeout, it seems to clean up offline nodes properly whenever a disconnect happens, which 1.51 did not. Retries would still be nice, but proper cleanup helps.


            People

            Assignee:
            thoulen FABRIZIO MANFREDI
            Reporter:
            sam31897 Sam Shuzawa
            Votes:
            0
            Watchers:
            4
