JENKINS-49309

All SSH slaves unexpectedly disconnect when one job finishes

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • ssh-slaves-plugin
    • None

      Using SSH-based slaves, if you run two or more agents on a slave and have concurrent builds running, all of the jobs unexpectedly fail due to SSH disconnections when one of them finishes.

      Example output from a job that was running when another finished:

      Parameter C_MEMSTYLE bound to: 2 - type: integer 
       Parameter C_OPTIMIZATION bound to: 2 - type: integer 
       Parameter C_MEM_INIT_PREFIX bound to: MainDesign_rs_encoder_0_0 - type: string 
       Parameter C_ELABORATION_DIR bound to: ./ - type: string 
       Parameter C_XDEVICEFAMILY bound to: kintex7 - type: string 
       Parameter C_FAMILY bound to: kintex7 - type: string 
       Connection to 127.0.0.1 closed by remote host.
       [Pipeline] }
       [Pipeline] // script
       [Pipeline] }
       [Pipeline] // withEnv
       [Pipeline] }
       [Pipeline] // stage
       [Pipeline] stage
       [Pipeline] { (Deployment)
       Stage 'Deployment' skipped due to earlier failure(s)
      

      This is persistent and happens regularly.
      I've tried making two slaves with one agent each (both pointing to the same physical slave), but the problem persists.

      This is an issue for us because builds take 3 hours on high-powered machines and it's not feasible to run them one after another; we need to run them in parallel.

      Jenkins ver. 2.73.3
      ssh slaves plugin 1.24

      Attached is a screenshot of the basic SSH slave setup.


          Ivan Fernandez Calvo added a comment -

          Connection to 127.0.0.1 closed by remote host.

          127.0.0.1 is the loopback device. Could you attach the config file of this Agent? Is it running on the same Jenkins instance? Did you set the Xmx and Xms JVM parameters?

          Dion Gonano added a comment - - edited

          ifernandezcalvo Can you be more specific about which config you mean, or how to get it? I attached a picture of the slave config page in Jenkins.

          What are Xmx and Xms?


          Ivan Fernandez Calvo added a comment -

          The config.xml of the Agent is returned by JENKINS_URL/computer/NAME_OF_AGENT/config.xml

          For Xms and Xmx, see https://stackoverflow.com/questions/14763079/what-are-the-xms-and-xmx-parameters-when-starting-jvms
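A minimal sketch of fetching that file from a script rather than a browser. The URL pattern is the one given above; everything else is an assumption for illustration: the function names are hypothetical, and HTTP basic authentication with a user name and API token is assumed to be how the instance is secured.

```python
import base64
import urllib.request
from urllib.parse import quote


def agent_config_url(jenkins_url, agent_name):
    """Build JENKINS_URL/computer/NAME_OF_AGENT/config.xml."""
    base = jenkins_url.rstrip("/")
    return f"{base}/computer/{quote(agent_name, safe='')}/config.xml"


def fetch_agent_config(jenkins_url, agent_name, user, api_token):
    """Download the agent's config.xml as text.

    Assumes basic auth with a user name and API token; adjust this
    if the Jenkins instance is secured differently.
    """
    request = urllib.request.Request(agent_config_url(jenkins_url, agent_name))
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `agent_config_url("http://jenkins.example/", "vagrant")` gives `http://jenkins.example/computer/vagrant/config.xml`, where the host name is a placeholder.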

          Dion Gonano added a comment -

          If Xms and Xmx are the Java memory parameters, I haven't configured anything. I thought Jenkins SSH'd in and started the slave JVM with the correct params.


          Dion Gonano added a comment -

          slave configuration

          <slave>
           <name>vagrant</name>
           <description/>
           <remoteFS>/media/disk1/jenkins/</remoteFS>
           <numExecutors>1</numExecutors>
           <mode>NORMAL</mode>
           <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
           <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.24">
             <host>192.168.11.10</host>
             <port>22</port>
             <credentialsId>vagrant</credentialsId>
             <maxNumRetries>0</maxNumRetries>
             <retryWaitTime>0</retryWaitTime>
             <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy">
               <requireInitialManualTrust>false</requireInitialManualTrust>
             </sshHostKeyVerificationStrategy>
           </launcher>
           <label>vagrant-slave</label>
           <nodeProperties/>
          </slave>


          Dion Gonano added a comment -

          I've been running it with one executor for a while now, with no disconnection issues.


          Ivan Fernandez Calvo added a comment -

          Try setting the JVM options to

          -Xmx512m -Xms512m

          and increase the number of executors. I think the slave.jar process is running into an out-of-memory error.
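One way to confirm that such a change actually reached the agent definition is to read the executor count and JVM options back out of the agent's config.xml. The sketch below is illustrative only: the function name is hypothetical, and it assumes the SSH launcher stores the options in a `jvmOptions` element, which the configuration posted earlier in this thread does not contain.

```python
import xml.etree.ElementTree as ET


def summarize_agent_config(config_xml):
    """Return the agent name, executor count, and JVM options from a config.xml string."""
    root = ET.fromstring(config_xml)
    launcher = root.find("launcher")
    # Assumption: JVM options live in <launcher><jvmOptions>; absent means none set.
    jvm_options = launcher.findtext("jvmOptions") if launcher is not None else None
    return {
        "name": root.findtext("name"),
        "executors": int(root.findtext("numExecutors", default="0")),
        "jvm_options": (jvm_options or "").split(),
    }


# Abbreviated version of the configuration posted above, with the
# assumed jvmOptions element added for illustration.
SAMPLE = """
<slave>
  <name>vagrant</name>
  <numExecutors>2</numExecutors>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.24">
    <host>192.168.11.10</host>
    <jvmOptions>-Xmx512m -Xms512m</jvmOptions>
  </launcher>
</slave>
"""

summary = summarize_agent_config(SAMPLE)
print(summary["executors"], summary["jvm_options"])
```

An empty `jvm_options` list means the agent JVM starts with default heap settings.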

          Dion Gonano added a comment -

          Done, I'll let you know how it goes.


          Dion Gonano added a comment -

          ifernandezcalvo No luck, still disconnected, same message.

           


          Ivan Fernandez Calvo added a comment -

          Recently we detected disconnections that are related to the https://wiki.jenkins.io/display/JENKINS/Slave+To+Master+Access+Control setting. We do not have the agent logs here, but if they show a serialization warning, you should try disabling this feature and report a bug against the plugin that contains the class that fails to serialize. See https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#selenium-grid-agents-failed-to-connect

          The warning would be something like this one, with a different class:

          Apr 03, 2019 9:46:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
          WARNING: Attempt to (de-)serialize anonymous class hudson.plugins.selenium.configuration.DirectJsonInputConfiguration$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
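A small sketch of checking saved agent logs for that warning. The pattern is taken from the sample line above; the function name is hypothetical, and the sketch assumes the log has already been read into a string from wherever the agent logs are kept.

```python
import re

# Matches the warning text shown above and captures the class name
# that sits between "anonymous class " and the following semicolon.
ANON_WARNING = re.compile(
    r"WARNING: Attempt to \(de-\)serialize anonymous class (?P<cls>\S+?);"
)


def find_anonymous_class_warnings(log_text):
    """Return the distinct class names reported in serialization warnings, sorted."""
    return sorted({m.group("cls") for m in ANON_WARNING.finditer(log_text)})


sample_log = (
    "Apr 03, 2019 9:46:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn\n"
    "WARNING: Attempt to (de-)serialize anonymous class "
    "hudson.plugins.selenium.configuration.DirectJsonInputConfiguration$1; "
    "see: https://jenkins.io/redirect/serialization-of-anonymous-classes/\n"
)

classes = find_anonymous_class_warnings(sample_log)
print(classes)
```

Each class name found points at the plugin to report the bug against, as suggested in the comment above.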

            ifernandezcalvo Ivan Fernandez Calvo
            dgonano Dion Gonano