[JENKINS-63261] EC2 Plugin: Linux Slave Terminates on java.io.EOFException: unexpected stream termination

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • ec2-plugin
    • None
    • Jenkins 2.249
      EC2 Plugin 1.50.3

      Master:
      Windows 2019 (also hosted on AWS)

      Slave (via SSH):
      AMI: Ubuntu Linux [ami-003634241a8fcdec0] (099720109477/ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20200408)

      After the Linux slave is launched and the SSH connection succeeds, the connection is terminated when the plugin attempts to launch the remoting agent.

       

      Logs:

      INFO: The SSH key ssh-ed25519 a0:c5:cc:b7:f8:3a:63:a5:58:92:a8:a8:b4:42:c9:44 has been successfully checked against the instance console for connections to EC2 (ubuntu-slave) - AWS Linux Jenkins Slave (i-0e5f54a8345c92453)
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connected via SSH.
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: connect fresh as root
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connecting to 10.27.252.14 on port 22, with timeout 10000.
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connection allowed after the host key has been verified
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Connected via SSH.
      Jul 31, 2020 1:11:34 AM hudson.plugins.ec2.EC2Cloud
      INFO: Creating tmp directory (/tmp) if it does not exist
      Jul 31, 2020 1:11:37 AM hudson.plugins.ec2.EC2Cloud
      INFO: Verifying: java -fullversion
      bash: java: command not found
      Jul 31, 2020 1:11:37 AM hudson.plugins.ec2.EC2Cloud
      INFO: Installing: sudo yum install -y java-1.8.0-openjdk.x86_64
      sudo: yum: command not found
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      WARNING: Failed to install: sudo yum install -y java-1.8.0-openjdk.x86_64
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      INFO: Verifying: which scp
      /usr/bin/scp
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      INFO: Copying remoting.jar to: /tmp
      Jul 31, 2020 1:11:38 AM hudson.plugins.ec2.EC2Cloud
      INFO: Launching remoting agent (via Trilead SSH2 Connection):  java  -jar /tmp/remoting.jar -workDir /home/ubuntu/
      ERROR: unexpected stream termination
      java.io.EOFException: unexpected stream termination
      	at hudson.remoting.ChannelBuilder.negotiate(ChannelBuilder.java:415)
      	at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:360)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:428)
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.launchScript(EC2UnixLauncher.java:267)
      	at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
      	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      


          Bruno Queiros added a comment -

          Seems related to https://issues.jenkins-ci.org/browse/JENKINS-46837

          I wonder if someone is going to pick this up


          Scott Sutherland added a comment -

          sam31897 Would you happen to be using STS?

          Sam Shuzawa added a comment -

          suthsc Nope, we're not


          Jamie Stoddart added a comment -

          Any updates on this? We are now getting this after a Jenkins upgrade:

          Jenkins - 2.249.2

          ec2-plugin - 1.51

          thomas sumardi added a comment - - edited

          OK, we're having the same issue, but with native SSH, because it goes through the same code here: https://github.com/jenkinsci/ec2-plugin/blob/8702cc4a89bd7c096bba07211e222cde6d931f5d/src/main/java/hudson/plugins/ec2/ssh/EC2UnixLauncher.java#L252

          What we need is perhaps retries. I believe the problem is that the plugin checks the SSH connection and then runs the section linked above, and when that section fails there are no retries.

          Would something like the code below work?

          private static final String SSH_READINESS_SLEEP_MS = "jenkins.ec2.sshreadinessSleepMs";
          private static final String SSH_READINESS_TRIES = "jenkins.ec2.sshreadinessTries";
          
          ...
          
          // Defaults, overridable through the system properties named above.
          private static int sshReadinessSleepMs = 1000;
          private static int sshReadinessTries = 10;

          static {
              ...
              prop = System.getProperty(SSH_READINESS_TRIES);
              if (prop != null)
                  sshReadinessTries = Integer.parseInt(prop);
              prop = System.getProperty(SSH_READINESS_SLEEP_MS);
              if (prop != null)
                  sshReadinessSleepMs = Integer.parseInt(prop);
          }
          
          ...
          
          // The try/catch sits inside the loop so that a failed launch is actually retried.
          int tries = sshReadinessTries;
          while (tries-- > 0) {
              try {
                  if (slaveTemplate != null && slaveTemplate.isConnectBySSHProcess()) {
                      File identityKeyFile = createIdentityKeyFile(computer);
                      try {
                          // Obviously the master must have an installed ssh client.
                          // Depending on the strategy selected on the UI, we set the StrictHostKeyChecking flag
                          String sshClientLaunchString = String.format("ssh -o StrictHostKeyChecking=%s -i %s %s@%s -p %d %s", slaveTemplate.getHostKeyVerificationStrategy().getSshCommandEquivalentFlag(), identityKeyFile.getAbsolutePath(), node.remoteAdmin, getEC2HostAddress(computer, template), node.getSshPort(), launchString);
                          logInfo(computer, listener, "Launching remoting agent (via SSH client process): " + sshClientLaunchString);
                          CommandLauncher commandLauncher = new CommandLauncher(sshClientLaunchString, null);
                          commandLauncher.launch(computer, listener);
                      } finally {
                          if (!identityKeyFile.delete()) {
                              LOGGER.log(Level.WARNING, "Failed to delete identity key file");
                          }
                      }
                  } else {
                      logInfo(computer, listener, "Launching remoting agent (via Trilead SSH2 Connection): " + launchString);
                      final Session sess = conn.openSession();
                      sess.execCommand(launchString);
                      computer.setChannel(sess.getStdout(), sess.getStdin(), logger, new Listener() {
                          @Override
                          public void onClosed(Channel channel, IOException cause) {
                              sess.close();
                              conn.close();
                          }
                      });
                  }
                  break; // launch succeeded, stop retrying
              } catch (Exception e) {
                  if (tries == 0) {
                      throw e; // out of attempts: surface the failure instead of swallowing it
                  }
                  logInfo(computer, listener, "Node SSH still not ready, retrying...");
                  Thread.sleep(sshReadinessSleepMs);
              }
          }
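
The retry idea in this comment can be looked at in isolation from the plugin code. The following is a minimal, self-contained sketch and not the plugin's actual implementation: `RetrySketch`, `retry`, and its parameter names are hypothetical, and the agent launch is stood in for by an arbitrary `Callable`. It shows the attempt and the catch both sitting inside the loop, so that a failure leads to a sleep and another attempt, with the last failure rethrown rather than swallowed.

```java
import java.io.EOFException;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of ec2-plugin: retries an action a bounded number of times. */
class RetrySketch {

    /**
     * Runs the action up to maxTries times, sleeping sleepMs after each failed attempt
     * except the last. Returns the first successful result; if every attempt fails,
     * the exception from the final attempt is rethrown.
     */
    static <T> T retry(int maxTries, long sleepMs, Callable<T> action) throws Exception {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Interruption is not a transient failure: restore the flag and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxTries) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated launch: fails twice with an EOFException, then succeeds.
        String result = retry(10, 100, () -> {
            if (++calls[0] < 3) {
                throw new EOFException("unexpected stream termination");
            }
            return "agent connected";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Whether retrying is appropriate in the real launcher depends on the failure being transient; a missing `java` binary on the agent, for example, would fail identically on every attempt.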
          


          thomas sumardi added a comment - - edited

          This happens to us at agent startup, not during a build, on 1.51. Launching the agent fails and there are no retries; also, using "idle time" does not clean up the offline nodes. For cleanup, I'm going to try the recent 1.55 release, which includes the https://issues.jenkins.io/browse/JENKINS-61789 fix, with "launch timeout" and/or "idle timeout" as the cleanup triggers. For the case in this ticket, I assume I can set "launch timeout" for cleanup, correct? This issue is also hard to reproduce: we can only reproduce it by launching many builds over a period of a few hours, and a few of them eventually fail.


          thomas sumardi added a comment -

          Upgraded ec2-plugin to 1.55. It has been running for 3 days and seems to clean up offline nodes properly whenever a disconnect happens, compared to 1.51 using idle timeout. Having retries would be nice, but proper cleanup helps.

          Harry King added a comment -

          In the above log, you can see java is not installed (or not in the path) -
          java: command not found
          You can then see the script attempts to fix using
          INFO: Installing: sudo yum install -y java-1.8.0-openjdk.x86_64
          sudo: yum: command not found
          That fails because the above instance is Ubuntu-based and doesn't ship with yum.

          This can be remedied by either specifying a custom tool location for the JDK, adding a tool installer which is apt based or by including a JDK in the AMI.
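
The mismatch described in this comment (a yum command sent to an Ubuntu host) can be illustrated with a small sketch. This is not how ec2-plugin installs Java: `JavaInstallSketch` and `installCommand` are hypothetical names, and the apt package name `openjdk-8-jdk-headless` is an assumption chosen for illustration. The yum command is the one from the log above. The sketch picks an install command based on which package manager is present and returns nothing when neither is found, rather than issuing a command that cannot succeed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration, not ec2-plugin code: choose a JDK install command per package manager. */
class JavaInstallSketch {

    /**
     * Returns an install command matching the first supported package manager found in
     * availableCommands, or an empty Optional if neither yum nor apt-get is available.
     */
    static Optional<String> installCommand(Set<String> availableCommands) {
        if (availableCommands.contains("yum")) {
            // Command taken from the log in this issue.
            return Optional.of("sudo yum install -y java-1.8.0-openjdk.x86_64");
        }
        if (availableCommands.contains("apt-get")) {
            // Package name is an assumption for illustration.
            return Optional.of("sudo apt-get install -y openjdk-8-jdk-headless");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An Ubuntu-like host: apt-get is present, yum is not.
        Set<String> ubuntuLike = new HashSet<>(Arrays.asList("apt-get", "scp", "bash"));
        System.out.println(installCommand(ubuntuLike).orElse("no supported package manager found"));
    }
}
```

Baking a JDK into the AMI, as the comment suggests, avoids this decision at launch time altogether.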


            thoulen FABRIZIO MANFREDI
            sam31897 Sam Shuzawa
            Votes: 0
            Watchers: 5