Jenkins / JENKINS-70832

ERROR: Socket connection to SSH server was lost

      Jenkins keeps crashing or terminating the agent connection:

       

      Mar 20, 2023 8:03:26 PM org.jvnet.winp.Native loadByUrl
      WARNING: DLL and EXE are inconsistenly present on disk
      ERROR: Connection terminated
      java.io.EOFException
          at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2926)
          at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3421)
          at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:959)
          at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:397)
          at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
          at hudson.remoting.Command.readFrom(Command.java:142)
          at hudson.remoting.Command.readFrom(Command.java:128)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
      Caused: java.io.IOException: Unexpected termination of the channel
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
      ERROR: Socket connection to SSH server was lost
      java.net.SocketException: Connection reset
          at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323)
          at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
          at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
          at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
          at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
          at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
          at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
          at com.trilead.ssh2.crypto.cipher.CipherInputStream.readPlain(CipherInputStream.java:105)
          at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:251)
          at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:706)
          at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
          at java.base/java.lang.Thread.run(Thread.java:833)
      Agent JVM has not reported exit code before the socket was lost
      [03/20/23 20:11:09] [SSH] Connection closed.

       

      Please give us a mechanism to reconnect and retry.

       

      Code used to create the SSH agent:

      import hudson.model.Node
      import hudson.plugins.sshslaves.SSHLauncher
      import hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy
      import hudson.slaves.DumbSlave
      import hudson.slaves.RetentionStrategy
      import jenkins.model.Jenkins

      def createJenkinsAgent(ip, nodeName, labelNode) {
          def j = Jenkins.get()
          def hostKeyVerificationStrategy = new NonVerifyingKeyVerificationStrategy()
          def launcher = new SSHLauncher(
              ip,                  // Host
              22,                  // Port
              "ssh_credentials",   // Credentials ID
              "-Xmx512m -Xms512m -Dhudson.remoting.Launcher.pingIntervalSec=-1", // JVM options
              null,                // Java path
              null,                // Prefix start agent command
              null,                // Suffix start agent command
              60,                  // Connection timeout in seconds
              10,                  // Maximum number of retries
              15,                  // Seconds to wait between retries
              hostKeyVerificationStrategy // Host key verification strategy
          )
          def agent = new DumbSlave(nodeName, "/jenkins", launcher)
          agent.nodeDescription = "Windows ARM Machine"
          agent.numExecutors = 4
          agent.labelString = labelNode
          agent.mode = Node.Mode.EXCLUSIVE
          // Demand(inDemandDelay, idleDelay), both in minutes
          agent.retentionStrategy = new RetentionStrategy.Demand(1, 2)
          j.addNode(agent)
      }


          Rene added a comment -

          This is very critical for us; please provide a mechanism to retry.

           


          Mark Waite added a comment -

          I switched from using an SSH agent on my Windows ARM machine to an inbound agent instead. The SSH agent connection would not reliably remain open on the Windows ARM machine. SSH agents work very well on my Windows 10 AMD64 machines, but not on my one Windows ARM machine.

          Inbound agent connections have been much more reliable from the Windows ARM machine.


          Rene added a comment -

          Hi markewaite, it happens even on non-ARM machines. I need to create nodes at runtime, and I would love to have a way to reconnect after a disconnection without affecting the build.


          Markus Winter added a comment -

          A prerequisite for your builds not being affected by agent disconnections is that you use Pipelines, though I think even then it is not guaranteed. A Freestyle job will always fail when the agent disconnects.

          Outbound agents should reconnect automatically if they are kept always online. As you're using the on-demand retention strategy, I'm not 100% sure how it will behave, but I would assume Jenkins should still be able to continue a running Pipeline.

          For inbound agents, I think the agent's Java process will try to reconnect on its own right away.

          If you have frequent disconnections you might have a bad network connection.
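          For Pipeline jobs, a minimal sketch of the "reconnect and retry" behaviour requested in this issue, assuming a Jenkins installation recent enough that the retry step accepts a conditions parameter with the agent() condition (check the Pipeline: Basic Steps documentation for the versions you run). The label 'windows-arm' and the bat command are hypothetical placeholders. This does not resume an interrupted bat step: when the agent connection is lost, the whole node block runs again from the start on an agent with that label.

```groovy
// Scripted Pipeline sketch (Jenkinsfile fragment).
// Assumption: the installed Pipeline plugins support retry(conditions: [agent()]).
// 'windows-arm' is a hypothetical label; use the label passed as labelNode.
retry(count: 3, conditions: [agent()]) {
    // If the connection to the agent is lost while this block runs,
    // retry allocates a node with this label again and re-runs the block.
    node('windows-arm') {
        bat 'dir'
    }
}
```

          Because conditions are given, ordinary build failures (for example a bat command exiting with a non-zero code) are not retried; only failures that the agent() condition attributes to the agent going away are.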


          Rene added a comment -

          Hi mawinter69, I believe that once an agent is connected, whether on demand or created at runtime, it should behave the same way as a permanent agent, since the mechanisms are the same. The problem happens during bat execution on Windows, and the bat steps are not risky: they just run dir or download a file. I would like a mechanism to reconnect; it is failing very often.


          Ivan Fernandez Calvo added a comment -

          Infrastructure issues are not related to the plugin. The SSH library provides the reliability measures needed to keep connections alive. Anything else should be managed at the system level, by configuring the SSH server to be reliable.

            Assignee: Unassigned
            Reporter: Rene (jrpally)
            Votes: 0
            Watchers: 4