[JENKINS-5055] server rejected connection: already connected to master

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component/s: remoting
    • None

      After some idle time (no jobs running, master and slaves idle), the master showed a slave as offline.

      On the slave, I see an error pop-up window saying:
      ...
      java.lang.Exception: The server rejected the connection: nlvhtcnxp1dt361 is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.run(Engine.java:191)
      ...

      After I click OK on the pop-up window, the Hudson slave app terminates.
      Restarting the Hudson slave app manually seems to work fine.


          tomdevries created issue -

          Kohsuke Kawaguchi added a comment -

          The root cause of the issue appears to be that the socket connection between the slave and the master is lost in such a way that the master doesn't notice. So when the slave connects back, the master treats it as a bogus attempt, since it believes the slave is still connected.

          Do you have a NAT/firewall between the master and the slave?

          One fix could be to have the master check whether the slave is alive before rejecting the new incoming connection, but this may take tens of seconds, as it can involve packet retransmission. Another possibility might be to let the slave send in some token so that the master can verify that the reconnection comes from the same slave it believes is currently connected.

          Still thinking about how to fix this.
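          The first option, checking whether the existing connection is still alive before rejecting a new one, could look roughly like this minimal Java sketch. It is not Hudson's code: `LivenessCheckSketch`, `AgentChannel`, `AgentRegistry`, `accept`, and the ping timeout are all hypothetical names. The timeout bounds the wait mentioned above, so a half-open connection is declared dead after a fixed interval instead of whenever TCP retransmission finally gives up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: ping the existing connection before rejecting a duplicate. */
public class LivenessCheckSketch {

    /** Hypothetical view of an established master-to-slave channel. */
    interface AgentChannel {
        /** Blocks until the slave answers, or throws if the connection is broken. */
        void ping() throws Exception;

        void close();
    }

    static class AgentRegistry {
        private final Map<String, AgentChannel> channels = new HashMap<>();
        // Daemon threads, so a ping stuck on a dead socket never keeps the JVM alive.
        private final ExecutorService pinger = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "liveness-ping");
            t.setDaemon(true);
            return t;
        });
        private final long pingTimeoutMillis;

        AgentRegistry(long pingTimeoutMillis) {
            this.pingTimeoutMillis = pingTimeoutMillis;
        }

        /** Returns true if the incoming channel for this slave name is accepted. */
        synchronized boolean accept(String name, AgentChannel incoming) {
            AgentChannel existing = channels.get(name);
            if (existing != null && isAlive(existing)) {
                incoming.close(); // the old connection really is live: reject the newcomer
                return false;
            }
            if (existing != null) {
                existing.close(); // half-open connection the master never noticed was lost
            }
            channels.put(name, incoming);
            return true;
        }

        /** A ping that fails or exceeds the timeout counts as a dead connection. */
        private boolean isAlive(AgentChannel channel) {
            Future<Object> reply = pinger.submit(() -> {
                channel.ping();
                return null;
            });
            try {
                reply.get(pingTimeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException | ExecutionException e) {
                reply.cancel(true); // interrupt the pinging thread
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

          In this sketch `accept` holds the registry lock while it pings, so one slow check delays every other slave's connection attempt; a real implementation would probably run the check outside the lock.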

          vkodocha added a comment -

          I have exactly the same issue with our setup. We have a master node running on Mac OS X and a Windows XP slave running in VMware on the same machine. The Hudson version is 1.351, but the problem has been appearing ever since we first installed this system. It occurs at least once a day.

          The network communication between the XP machine and the Mac goes through NAT.

          One workaround would be an option to disable the dialog, so that I could write a small test app on the Windows slave that checks whether the slave is still running and restarts it.
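          A minimal sketch of the watchdog described above, assuming the dialog can be suppressed so that the slave process actually exits instead of waiting for someone to click OK. `SlaveWatchdog` and `supervise` are hypothetical names; the slave's command line is passed in as arguments, and nothing here prescribes what that command is.

```java
import java.util.concurrent.Callable;

/** Hypothetical watchdog: relaunch the slave whenever its process exits. */
public class SlaveWatchdog {

    /**
     * Calls launchOnce up to maxRuns times, pausing backoffMillis between runs,
     * and returns the exit code of the last run.
     */
    static int supervise(Callable<Integer> launchOnce, int maxRuns, long backoffMillis) throws Exception {
        int exitCode = -1;
        for (int run = 1; run <= maxRuns; run++) {
            exitCode = launchOnce.call();
            System.err.println("slave exited with code " + exitCode + " (run " + run + ")");
            if (run < maxRuns) {
                Thread.sleep(backoffMillis); // avoid a tight restart loop while the master is unreachable
            }
        }
        return exitCode;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java SlaveWatchdog <command that starts the slave...>");
            return;
        }
        // Reuse the same command line for every launch; the slave's output goes to this console.
        ProcessBuilder slave = new ProcessBuilder(args).inheritIO();
        supervise(() -> slave.start().waitFor(), Integer.MAX_VALUE, 30_000);
    }
}
```

          Passing the launch step in as a `Callable` keeps the restart policy separate from process handling, so the loop can be exercised without starting a real slave.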


          vkodocha added a comment -

          These two issues seem to be the same.

          vkodocha made changes -
          Link New: This issue duplicates JENKINS-5973 [ JENKINS-5973 ]

          tapiomtr added a comment (edited) -

          We also have the same kind of problem.

          What I noticed is that our Linux-based Hudson slave had started an svn checkout, but it got stuck for some reason. At the same time, the main Hudson server reported "There are more SCM polling activities scheduled than handled, so the threads are not keeping up with the demands".
          So I tried to restart that Linux-based Hudson slave, but it couldn't start up because the main Hudson server still thought the slave was connected. Now I have to wait about 30 minutes for the main Hudson server to find out that the slave is gone; until then it shows the slave as idle, even though it isn't running at all.

          The only way to solve this problem is to restart the main Hudson server first.

          Hudson master running on:
          Red Hat Linux on a VMware VM

          Hudson slave running on:
          Red Hat Linux on a VMware VM

          Alan Harder made changes -
          Component/s New: master-slave [ 15489 ]

          SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/TcpSlaveAgentListener.java
          core/src/main/java/hudson/slaves/SlaveComputer.java
          remoting/src/main/java/hudson/remoting/Engine.java
          http://hudson-labs.org/commit/core/68ed742227891a3f716e4e479388c36876bb935a
          Log:
          [FIXED JENKINS-5055] allow the same JNLP slave to reconnect without getting rejected.
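          The token idea from earlier in the thread, letting the slave prove that a reconnect comes from the same slave the master thinks is connected, might be sketched as below. This is an illustration under assumptions, not the code in commit 68ed742 (its log only says the same JNLP slave may now reconnect without being rejected); `ReconnectTokenSketch`, `Listener`, `connect`, and the token scheme are hypothetical. The master hands out a fresh random token on each successful connect; a later attempt for the same name that presents that token replaces the stale connection, and any other attempt is rejected as before.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of token-based reconnection on the master side. */
public class ReconnectTokenSketch {

    /** What the master remembers about the connection it believes is live. */
    static final class Session {
        final String token;
        final Runnable closeConnection;

        Session(String token, Runnable closeConnection) {
            this.token = token;
            this.closeConnection = closeConnection;
        }
    }

    static final class Listener {
        private final Map<String, Session> sessions = new HashMap<>();
        private final SecureRandom random = new SecureRandom();

        /**
         * Handles a connection attempt from the slave called name. presentedToken is the
         * token from the slave's previous session, or null on a first connect. Returns the
         * token to use on the next reconnect, or null if the attempt is rejected (the caller
         * then closes the new socket).
         */
        synchronized String connect(String name, String presentedToken, Runnable closeConnection) {
            Session current = sessions.get(name);
            if (current != null) {
                if (!sameToken(current.token, presentedToken)) {
                    return null; // someone else claims this name: reject, as before
                }
                current.closeConnection.run(); // same slave is back: drop the half-open connection
            }
            String token = newToken(); // rotate, so an old token is useless after a reconnect
            sessions.put(name, new Session(token, closeConnection));
            return token;
        }

        /** Constant-time comparison, so response timing does not leak the token. */
        private static boolean sameToken(String expected, String presented) {
            return presented != null && MessageDigest.isEqual(
                    expected.getBytes(StandardCharsets.UTF_8),
                    presented.getBytes(StandardCharsets.UTF_8));
        }

        private String newToken() {
            byte[] bytes = new byte[16];
            random.nextBytes(bytes);
            return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        }
    }
}
```

          Unlike the liveness check, this decision is immediate: no ping has to time out, because the token alone distinguishes a legitimate reconnect from a second client using the same name. Removing a session when its connection closes normally is left out for brevity.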
          SCM/JIRA link daemon made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          Kohsuke Kawaguchi made changes -
          Link New: This issue is duplicated by JENKINS-5355 [ JENKINS-5355 ]

            Assignee: Unassigned
            Reporter: tomdevries
            Votes: 9
            Watchers: 21