
The server rejected the connection: *** is already connected to this master. Rejecting this connection.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component/s: core, remoting
    • Labels: None
    • Environment: Windows 7, Jenkins 1.613

      I am running Jenkins 1.613.

      I see this in the slave's log:
      Apr 15, 2015 10:15:02 AM com.youdevise.hudson.slavestatus.SlaveListener call
      INFO: Slave-status listener starting
      Apr 15, 2015 10:15:02 AM com.youdevise.hudson.slavestatus.SocketHTTPListener waitForConnection
      INFO: Slave-status listener ready on port 3141
      Apr 15, 2015 1:41:01 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: 192.168.161.8
      Apr 15, 2015 1:41:01 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Apr 15, 2015 1:41:01 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://192.168.95.37:8080/jenkins/
      Apr 15, 2015 1:41:01 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to 192.168.95.37:19994
      Apr 15, 2015 1:41:01 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Apr 15, 2015 1:41:01 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: 192.168.161.8 is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: 192.168.161.8 is already connected to this master. Rejecting this connection.
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:306)
      at hudson.remoting.Engine.run(Engine.java:276)

        1. channelclosedstack.txt
          1.30 MB
        2. jenkins-slave.err.log
          293 kB
        3. Slave errors
          2 kB


          Matthew Mitchell added a comment -

          FYI, when this problem occurs the number of threads starts to climb from a steady state of ~800-1500 up to 11k. Most of the extra threads look like they are doing nothing (just a single frame on a thread named Thread-NNNNNN).

          mariem baccar added a comment -

          You will find below some explanation of my case, which may help you:
          - Before installing the plugin "Docker Slave v1.0.5", the slave operated correctly without any problems.
          - After installing this plugin, I encountered many problems. One of them concerns the slave: I always get the message "slave agent is already connected". For more details, see the attached full message.
          In fact, the problem is related to JENKINS-39078: there is a bug in Docker Slave Plugin 1.0.5 (fix: https://github.com/jenkinsci/docker-slaves-plugin/commit/451929125fd8ff39c6f84c30476c26cccb912140).
          After discovering that this recently installed plugin was the source of all my new problems in Jenkins 2.19.1 LTS, I uninstalled it and all my problems were resolved.


          Matthew Mitchell added a comment -

          Do you use Job DSL too? I think there might be something there that is related as well.

          Based on the data seen, I think this may be a deadlock.

          Oleg Nenashev added a comment - edited

          Another known case where this issue happens is an OutOfMemory exception - JENKINS-30823


          Matthew Mitchell added a comment -

          Yep, that correlates with what I've seen too. What is odd about this is that the actual usage doesn't appear that large: a heap dump taken before the failure doesn't show a heap that is anywhere close to full (3 GB used of 32 GB). But I do see lots of "unable to allocate new native thread" errors, etc., and the number of threads grows very large from the steady state.

          Could this be related to a deadlock of some sort, where we start to spawn threads until we die?

          Oleg Nenashev added a comment -

          Maybe. I think we firstly need to implement correct handling of Errors and RuntimeExceptions in the core in order to avoid the case when the channel object leaks after the failure. It should help with some cases

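The teardown handling described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual Jenkins core code: the point is that the listeners which clear the channel reference must run in a finally block, so that even a RuntimeException or an Error (e.g. OutOfMemoryError) thrown during channel failure cannot skip them and leak the channel object.

```java
// Hypothetical sketch, not the real hudson.remoting API: teardown runs the
// termination listener in a finally block so cleanup cannot be skipped by
// an unchecked exception or Error thrown during the close action.
class ChannelCleanupSketch {
    interface TerminationListener { void onClosed(); }

    static void terminateChannel(Runnable closeAction, TerminationListener listener) {
        try {
            closeAction.run();       // may throw RuntimeException or Error
        } finally {
            listener.onClosed();     // always fires, so the channel reference is cleared
        }
    }
}
```

Without the finally block, an Error escaping closeAction would leave the stale channel registered, which is exactly the leak being discussed.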

          Matthew Mitchell added a comment -

          Right, for sure. I think there are a few cases:

          1) Cases where plugins/core use computer.getChannel(). These work fine, since if the listeners run properly the channel is set to null, and the code can do the right thing at that point depending on what its function is.
          2) Cases where the channel object is held on an object (ping thread, etc.) but there is error handling. These appear okay (the ping thread exits).
          3) Cases where the channel object is held on an object but there is no error handling. These need fixing.

          I don't know of any real cases of #3. #2 appears okay. #1 is a problem, because the listeners which should null out the channel on the computer object don't actually run. That is why we see the "already connected" issue.
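The case-1 pattern above can be sketched as follows. These are simplified, hypothetical stand-ins for Jenkins' Computer/Channel types, not the real hudson.model classes: callers guard on getChannel() returning null once the disconnect listeners have run, so a channel that is never nulled out makes the master reject every reconnect attempt with the "already connected" message from this issue.

```java
// Hypothetical stand-ins for Jenkins' Computer/Channel, illustrating why a
// leaked (never-nulled) channel blocks reconnection.
class DemoChannel {
    private final String agentName;
    DemoChannel(String agentName) { this.agentName = agentName; }
    String getAgentName() { return agentName; }
}

class DemoComputer {
    private volatile DemoChannel channel;

    // Listener path: runs on disconnect and nulls out the channel (case #1).
    void onDisconnected() { this.channel = null; }

    void setChannel(DemoChannel c) {
        // Mirrors the rejection seen in the log: if the old channel was
        // never cleared, the new connection attempt is refused.
        if (this.channel != null) {
            throw new IllegalStateException(channel.getAgentName()
                + " is already connected to this master. Rejecting this connection.");
        }
        this.channel = c;
    }

    DemoChannel getChannel() { return channel; }
}
```

If onDisconnected() never runs (because an Error escaped before the listeners fired), setChannel() keeps throwing on every reconnect, which matches the symptom reported here.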

          Matthew Mitchell added a comment -

          This also appears to perhaps happen "near" a failure:

          https://issues.jenkins-ci.org/browse/JENKINS-33358

          Oleg Nenashev added a comment -

          Some bits have been addressed in JENKINS-39835


          Oleg Nenashev added a comment -

          Fixed in 2.50


            Assignee: Oleg Nenashev
            Reporter: gaffey he
            Votes: 9
            Watchers: 17