JENKINS-5055

server rejected connection: already connected to master

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: remoting
    • Labels:
      None

      Description

      After some idle time (no jobs running, master and slaves idle), the master showed a slave as offline.

      On the slave, I see an error pop-up window saying:
      ...
      java.lang.Exception: The server rejected the connection: nlvhtcnxp1dt361 is
      already connected to this master. Rejecting this connection. at
      hudson.remoting.engine.Run(Engine.java:191)
      ...

      After clicking OK in the pop-up window, the Hudson slave app terminates.
      Restarting the Hudson slave app manually seems to work fine.

            Activity

            tomdevries tomdevries created issue -
            kohsuke Kohsuke Kawaguchi added a comment -

            The root cause appears to be that the socket connection between the slave and the master is lost in such a way that the master doesn't notice. So when the slave connects back, the master treats it as a bogus attempt, since it believes the slave is already connected.

            Do you have a NAT/firewall between the master and the slave?

            One fix could be to have the master check whether the slave is alive before rejecting the new incoming connection, but this may take tens of seconds, as it can involve packet retransmission. Another possibility might be to let the slave send in some token so that the master can verify that the reconnection comes from the same slave it believes is currently connected.

            Still thinking about how to fix this.
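
The first idea in this comment (check whether the existing connection is still alive before rejecting a new one) could look roughly like the sketch below. It is only an illustration, not the code that was committed for this issue: `AgentLink`, `AgentRegistry`, `ping` and `register` are hypothetical names that do not exist in Hudson or Jenkins, and the ping is assumed to return false once its timeout expires.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the master's view of one agent connection. */
interface AgentLink {
    /** Returns true if the peer answered a ping within timeoutMillis. */
    boolean ping(long timeoutMillis);

    /** Releases the underlying socket. */
    void close();
}

/** Hypothetical registry holding at most one link per agent name. */
class AgentRegistry {
    private final Map<String, AgentLink> links = new HashMap<>();
    private final long pingTimeoutMillis;

    AgentRegistry(long pingTimeoutMillis) {
        this.pingTimeoutMillis = pingTimeoutMillis;
    }

    /**
     * Tries to register an incoming link under the given agent name.
     * Returns true if the link was accepted, false if a live link
     * with the same name already exists.
     */
    synchronized boolean register(String name, AgentLink incoming) {
        AgentLink existing = links.get(name);
        if (existing != null) {
            if (existing.ping(pingTimeoutMillis)) {
                // The old connection really is alive: reject the duplicate.
                return false;
            }
            // The old connection is stale (for example, silently dropped
            // by a NAT device): discard it and accept the newcomer.
            existing.close();
        }
        links.put(name, incoming);
        return true;
    }
}
```

The liveness check runs while the registry lock is held, so a reconnect attempt can block for up to the ping timeout. That is the delay the comment warns about.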

            vkodocha vkodocha added a comment -

            I have exactly the same issue with our setup. We have a master node running on Mac OS X and a Windows XP slave running in VMware on the same machine. The Hudson version is 1.351, but the problem has been appearing since we first installed this system. It occurs at least once a day.

            The network communication between the XP guest and the Mac goes through NAT.

            One workaround would be an option to disable the dialog, so that I could write a small test app on the Windows slave that checks whether the slave is still running and restarts it.
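
A watchdog of the kind described here might be sketched as follows. This is an assumption-laden illustration: `AgentWatchdog`, `checkOnce` and `forCommand` are made-up names, and it only helps if the slave process actually exits on failure (that is, with the dialog disabled or the agent running headless), because the check relies on the process no longer being alive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical watchdog: restarts the agent whenever it is found not running. */
class AgentWatchdog {
    private final BooleanSupplier isRunning;
    private final Runnable restart;

    AgentWatchdog(BooleanSupplier isRunning, Runnable restart) {
        this.isRunning = isRunning;
        this.restart = restart;
    }

    /** Performs one check; returns true if a restart was triggered. */
    boolean checkOnce() {
        if (isRunning.getAsBoolean()) {
            return false;
        }
        restart.run();
        return true;
    }

    /** Builds a watchdog that launches and monitors the given command line. */
    static AgentWatchdog forCommand(List<String> command) {
        AtomicReference<Process> current = new AtomicReference<>();
        return new AgentWatchdog(
            () -> {
                Process p = current.get();
                return p != null && p.isAlive();
            },
            () -> {
                try {
                    // Start a fresh agent process and remember it for later checks.
                    current.set(new ProcessBuilder(command).inheritIO().start());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    }
}
```

In practice `checkOnce()` would be called periodically, for example from a `ScheduledExecutorService`, with the command list being whatever launches the slave on that machine.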

            vkodocha vkodocha added a comment -

            These two issues seem to be the same.

            vkodocha vkodocha made changes -
            Field Original Value New Value
            Link This issue duplicates JENKINS-5973 [ JENKINS-5973 ]
            tapiomtr tapiomtr added a comment - - edited

            We also have the same kind of problem.

            What I noticed is that our Linux-based Hudson slave had started an svn checkout, but it jammed for some reason. At the same time the main Hudson indicated that "There are more SCM polling activities scheduled than handled, so the threads are not keeping up with the demands".
            So I tried to restart that Linux-based Hudson slave, but it couldn't start up because the main Hudson still thought that the slave was connected. I have now waited about 30 minutes for the main Hudson to find out that the slave is gone; it still shows the slave as idle, although it's not running at all.

            The only way to solve this problem is to restart the main Hudson server first.

            Hudson master running on:
            Redhat Linux running on VMware VM

            Hudson slave running on:
            Redhat Linux running on VMware VM

            mindless Alan Harder made changes -
            Component/s master-slave [ 15489 ]
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in hudson
            User: Kohsuke Kawaguchi
            Path:
            changelog.html
            core/src/main/java/hudson/TcpSlaveAgentListener.java
            core/src/main/java/hudson/slaves/SlaveComputer.java
            remoting/src/main/java/hudson/remoting/Engine.java
            http://hudson-labs.org/commit/core/68ed742227891a3f716e4e479388c36876bb935a
            Log:
            [FIXED JENKINS-5055] allow the same JNLP slave to reconnect without getting rejected.

            scm_issue_link SCM/JIRA link daemon made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            kohsuke Kohsuke Kawaguchi made changes -
            Link This issue is duplicated by JENKINS-5355 [ JENKINS-5355 ]
            rradha Renuka Radhakrishnan added a comment -

            I still see the same issue in Jenkins 1.652. Our Windows slaves (running Windows Server 2012) fail to connect to the master after the daily restart, because the master thinks the slave is still connected. Note that this doesn't happen every time; there seems to be some timing involved.

            Mar 08, 2016 6:01:44 AM hudson.remoting.jnlp.Main createEngine
            INFO: Setting up slave: ...
            Mar 08, 2016 6:01:44 AM hudson.remoting.jnlp.Main$CuiListener <init>
            INFO: Jenkins agent is running in headless mode.
            Mar 08, 2016 6:01:45 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Locating server among .......
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Handshaking
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connecting to jenkins:......
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Trying protocol: JNLP2-connect
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Server didn't accept the handshake: ..... is already connected to this master. Rejecting this connection.
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connecting to jenkins:....
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Trying protocol: JNLP-connect
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Server didn't accept the handshake: ..... is already connected to this master. Rejecting this connection.
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connecting to jenkins:....
            Mar 08, 2016 6:01:47 AM hudson.remoting.jnlp.Main$CuiListener error
            SEVERE: The server rejected the connection: None of the protocols were accepted
            java.lang.Exception: The server rejected the connection: None of the protocols were accepted
                    at hudson.remoting.Engine.onConnectionRejected(Engine.java:286)
                    at hudson.remoting.Engine.run(Engine.java:262)

            hujirong Jirong Hu added a comment -

            Same issue with Jenkins ver. 1.638, Windows 2012 R2.

            Mar 11, 2016 10:16:37 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
            INFO: Accepted connection #313231 from /172.21.81.18:12801
            Mar 11, 2016 10:16:37 AM jenkins.slaves.JnlpSlaveHandshake error
            WARNING: TCP slave agent connection handler #313231 with /172.21.81.18:12801 is aborted: portal_jendevslave_1 is already connected to this master. Rejecting this connection.

            dwooster Douglas Wooster made changes -
            Link This issue is related to JENKINS-28492 [ JENKINS-28492 ]
            dwooster Douglas Wooster added a comment - - edited

            We are seeing this issue in Jenkins 1.574, after rebooting the slave machine.
            Issue is intermittent - perhaps 1/3 of the time.
            Slaves are Linux, RHEL 6.4
            The slave is started from a local /etc/init.d script.
            Master is also RHEL 6.4 running on Tomcat 7.0.63.

            JENKINS-28492 was opened last year for the same error message, so I linked the two JIRAs.

            gtirloni Giovanni Tirloni added a comment - - edited

            We're seeing this with Jenkins 2.14 after restarting the master.

            Workaround is to manually kill the java process running on the slave.

            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 135139 ] JNJira + In-Review [ 186930 ]
            mmitche Matthew Mitchell added a comment -

            I have some other data (Jenkins 6.42.1 running on Ubuntu 16.04 Java 8).

            • This only happens on nodes that were connected via jnlp by running a command on the node (java -jar ...) and is not limited to Windows. Seen on OSX too.
            • The failure is always preceded by: INFO: Ping failed. Terminating the channel <node name>
            • The ping fails for many machines within a few seconds.
            • Only those machines connected via manual jnlp remoting call fail ping
            • In the server we have running, the nodes running that don't fail are connected via ssh and are Linux, FreeBSD, etc.
            • The nodes that fail and don't fail exist in the same locations (VMs in Azure)

            So the interesting data point here I think is the fact that the ping doesn't fail on the machines connected via SSH channel. Is it that they aren't using the JNLP remoting protocol?

            cecchisandrone Alessandro Dionisi added a comment -

            Did you try to play with -Dhudson.slaves.ChannelPinger.pingInterval property?

            mmitche Matthew Mitchell added a comment -

            Woops I was wondering where this comment went. Was supposed to go to another issue. And yes. It doesn't have anything to do with the ping rate.


              People

              Assignee:
              Unassigned Unassigned
              Reporter:
              tomdevries tomdevries
              Votes:
              9
              Watchers:
              21