
Slaves reconnecting after restarting are rejected because Hudson thinks the slave already connected

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: core
    • Labels: None
    • Environment: Windows 2003 SE SP2, 2GB VMware VM

      [20:02] <roxspring> grrr - just noticed that the hudson has put the slaves offline
      [20:02] <roxspring> am getting this sort of thing an awful lot lately
      [20:07] <roxspring> typical slave logs go like this: (see below)
      [20:08] <roxspring> then the slave services won't start up "Error 1067: The process terminated unexpectedly"
      [20:09] <roxspring> seemingly another rejected connection
      [20:11] <@kohsuke> I know this issue
      [20:11] <@kohsuke> It's because Hudson thinks the slave is still connected even though it's not any more
      [20:11] <@kohsuke> We need to fix this
      [20:11] <@kohsuke> roxspring: if you can file a ticket, that would be great.

      15-Mar-2010 14:35:01 hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel channel
      java.net.SocketException: Connection reset
      	at java.net.SocketInputStream.read(Unknown Source)
      	at java.io.BufferedInputStream.fill(Unknown Source)
      	at java.io.BufferedInputStream.read(Unknown Source)
      	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
      	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
      	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
      	at java.io.ObjectInputStream.readObject0(Unknown Source)
      	at java.io.ObjectInputStream.readObject(Unknown Source)
      	at hudson.remoting.Channel$ReaderThread.run(Channel.java:856)
      15-Mar-2010 14:35:01 hudson.remoting.Request$2 run
      SEVERE: Failed to send back a reply
      java.net.SocketException: Connection reset by peer: socket write error
      	at java.net.SocketOutputStream.socketWrite0(Native Method)
      	at java.net.SocketOutputStream.socketWrite(Unknown Source)
      	at java.net.SocketOutputStream.write(Unknown Source)
      	at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
      	at java.io.BufferedOutputStream.write(Unknown Source)
      	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Unknown Source)
      	at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(Unknown Source)
      	at java.io.ObjectOutputStream.writeFatalException(Unknown Source)
      	at java.io.ObjectOutputStream.writeObject(Unknown Source)
      	at hudson.remoting.Channel.send(Channel.java:417)
      	at hudson.remoting.Request$2.run(Request.java:282)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:58)
      	at java.lang.Thread.run(Unknown Source)
      15-Mar-2010 14:35:01 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      

      then the server starts up again

      15-Mar-2010 14:35:51 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      15-Mar-2010 14:35:51 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      15-Mar-2010 14:35:52 com.youdevise.hudson.slavestatus.SlaveListener call
      INFO: Slave-status listener starting
      15-Mar-2010 14:35:52 com.youdevise.hudson.slavestatus.SlaveListener$1 run
      SEVERE: Could not listen on port
      java.net.BindException: Address already in use: JVM_Bind
      	at java.net.PlainSocketImpl.socketBind(Native Method)
      	at java.net.PlainSocketImpl.bind(Unknown Source)
      	at java.net.ServerSocket.bind(Unknown Source)
      	at java.net.ServerSocket.<init>(Unknown Source)
      	at java.net.ServerSocket.<init>(Unknown Source)
      	at com.youdevise.hudson.slavestatus.SocketHTTPListener.waitForConnection(SlaveListener.java:129)
      	at com.youdevise.hudson.slavestatus.SlaveListener$1.run(SlaveListener.java:63)
      	at com.youdevise.hudson.slavestatus.Daemon.go(Daemon.java:16)
      	at com.youdevise.hudson.slavestatus.SlaveListener.call(SlaveListener.java:83)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:114)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:270)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:58)
      	at java.lang.Thread.run(Unknown Source)
      17-Mar-2010 16:14:42 hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Hudson agent is running in headless mode.
      

      Then when connecting...

      17-Mar-2010 16:14:42 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      17-Mar-2010 16:14:42 hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: hb-slave-trunk1 is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: hb-slave-trunk1 is already connected to this master. Rejecting this connection.
      	at hudson.remoting.Engine.run(Engine.java:191)
      
      
      

          [JENKINS-5973] Slaves reconnecting after restarting are rejected because Hudson thinks the slave already connected

          Kohsuke Kawaguchi added a comment -

          As a reminder to myself, the problem is that under some circumstances a TCP connection can be broken in such a way that one peer doesn't notice it right away.

          So the slave should send some unique identifier so that the master can verify that the same slave is reconnecting (and thus that the existing connection is no good).

          This in turn involves adding some extensibility to the bootstrap protocol of the slave connection.
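
The idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hudson's actual handshake code: `SlaveRegistry`, `Connection`, and the `token` parameter are hypothetical names, and the token stands in for whatever unique identifier the slave would send during the handshake. If a slave name is already registered and the same token shows up again, the master treats the old connection as stale and replaces it; a different token is rejected as before.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical master-side registry of slave connections (not Hudson's real code). */
final class SlaveRegistry {

    /** A connection the master believes is open, plus the identifier the slave presented. */
    static final class Connection {
        final String token;
        private volatile boolean closed;

        Connection(String token) {
            this.token = token;
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    private final Map<String, Connection> connections = new ConcurrentHashMap<>();

    /**
     * Handles a handshake from a slave.
     * Same name and same token: assume the old TCP connection died unnoticed and replace it.
     * Same name and different token: some other process claims the name, so reject.
     */
    synchronized Connection connect(String slaveName, String token) {
        Connection existing = connections.get(slaveName);
        if (existing != null && !existing.isClosed()) {
            if (!existing.token.equals(token)) {
                throw new IllegalStateException(
                        slaveName + " is already connected to this master. Rejecting this connection.");
            }
            // The same slave is reconnecting, so the existing connection is no good.
            existing.close();
        }
        Connection fresh = new Connection(token);
        connections.put(slaveName, fresh);
        return fresh;
    }
}
```

With a registry like this, a restarted slave that presents the identifier it used before would take over its own slot instead of being rejected, while an unrelated process using the same slave name would still be turned away.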

          bsrinath added a comment - edited

          Until this is permanently fixed in Hudson, here is a possible workaround for anyone still struggling with it:

          During the Slave startup (we use JNLP), instead of starting it up as below:

           
          javaws http://your.hudson.com:8080/computer/nameofslave/slave-agent.jnlp 
          

          the slave can be started from a startup script as follows (Windows):

           
          java -jar hudson-cli.jar -s http://your.hudson.com:8080/ delete-node nameofslave
          
          if %errorlevel%==0 ( 
             javaws http://your.hudson.com:8080/computer/nameofslave/slave-agent.jnlp  
          ) 
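
For setups where a batch file is inconvenient, the same two-step startup could be sketched in Java. This is a minimal illustration, assuming `hudson-cli.jar`, `java`, and `javaws` are reachable from the working directory and PATH; `SlaveLauncher` and `CommandRunner` are hypothetical names introduced here, and the commands are the ones from the script above. Note that a later comment in this thread reports the CLI rejecting the node-name argument, so the `delete-node` step may not work on every version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher mirroring the batch workaround: delete the node, then start the agent. */
final class SlaveLauncher {

    /** Runs a command and returns its exit code; injectable so the logic can be tried without Hudson. */
    interface CommandRunner {
        int run(List<String> command);
    }

    static List<String> deleteNodeCommand(String hudsonUrl, String slaveName) {
        return Arrays.asList("java", "-jar", "hudson-cli.jar", "-s", hudsonUrl, "delete-node", slaveName);
    }

    /** hudsonUrl is expected to end with a slash, as in the original script. */
    static List<String> launchCommand(String hudsonUrl, String slaveName) {
        return Arrays.asList("javaws", hudsonUrl + "computer/" + slaveName + "/slave-agent.jnlp");
    }

    /** Starts the agent only if delete-node exits with 0, like the %errorlevel% check. */
    static boolean start(CommandRunner runner, String hudsonUrl, String slaveName) {
        if (runner.run(deleteNodeCommand(hudsonUrl, slaveName)) != 0) {
            return false;
        }
        runner.run(launchCommand(hudsonUrl, slaveName));
        return true;
    }

    /** Runner that executes a real process and waits for it to finish. */
    static int runProcess(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + command, e);
        }
    }
}
```

A real invocation would look like `SlaveLauncher.start(SlaveLauncher::runProcess, "http://your.hudson.com:8080/", "nameofslave")`.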
          


          track added a comment -

          java -jar hudson-cli.jar -s http://your.hudson.com:8080/ delete-node nameofslave

          still results in

          No argument is allowed: nameofslave
          java -jar hudson-cli.jar delete-node args...
          Deletes a node

          I have replaced nameofslave with my slave name. Another issue?


          Kohsuke Kawaguchi added a comment -

          Merging with JENKINS-5055.

            Assignee: Unassigned
            Reporter: roxspring
            Votes: 7
            Watchers: 11