Bug
Resolution: Fixed
Critical
None
I have a Hudson setup with a single master PC (running Windows XP) and 12 slave PCs (most running Windows XP, a few running Windows 2000), currently on Hudson version 1.334.
The Hudson master runs as a Windows service. The Hudson slaves run as Java clients, started through 'javaws http://master:8080/computer/slavename/slave-agent.jnlp'. I also tried to run the slaves as a service, but I never managed to get the right permissions that way.
The jobs are written as shell commands and executed on the slaves using Cygwin sh.
The Hudson slave pool is partitioned this way:
* 3 slave PCs contain TriMedia-based hardware boards. Jobs compile
test suites on the PC and run the executables on those boards.
* The other 9 are used for PC jobs. Jobs compile test suites on the
PC and run the executables on a TriMedia simulator on the PC.
I labeled the group of 9 'windows'. To run a test suite with different configurations (optimisation level, library set, etc.), I use matrix jobs on my 'windows' queue. Using matrix jobs, I ran into bug 936, so I applied the workaround by adding -Dhudson.model.Hudson.flyweightSupport=true to C:\hudson\hudson.xml on the master.
After some idle time (no jobs running, master and slaves idle), the master showed a slave as offline.
On the slave, I see an error pop-up window saying:
...
java.lang.Exception: The server rejected the connection: nlvhtcnxp1dt361 is already connected to this master. Rejecting this connection.
    at hudson.remoting.engine.Run(Engine.java:191)
...
After clicking OK on the pop-up window, the Hudson slave app terminates.
Restarting the Hudson slave app manually seems to work fine.
- duplicates
  JENKINS-5973 Slaves reconnecting after restarting are rejected because Hudson thinks the slave already connected (Closed)
- is duplicated by
  JENKINS-5355 Disconnected slaves cannot reconnect again (Closed)
- is related to
  JENKINS-28492 The server rejected the connection: *** is already connected to this master. Rejecting this connection. (Resolved)
The root cause appears to be that the socket connection between the slave and the master is lost in such a way that the master doesn't notice. So when the slave connects back, the master treats it as a bogus attempt because it believes the slave is already connected.
Do you have a NAT/firewall between the master and the slave?
One fix could be to have the master check whether the slave is alive before rejecting the new incoming connection, but this may take tens of seconds as it can involve packet retransmission. Another possibility might be to let the slave send some token so that the master can verify that the reconnection comes from the same slave it believes is currently connected.
Still thinking about how to fix this.
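To make the two options above concrete, here is a rough sketch of how the master-side check might look. This is not Hudson's actual code: the class name AgentRegistry, the connect method, the session token, and the isAlive probe are all hypothetical names introduced for illustration only. The idea is that a matching token lets the slave take over its own stale connection immediately, and the slow liveness probe is only used when no valid token is presented.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical master-side registry of connected slaves; not Hudson's real API. */
class AgentRegistry {
    /** Session token for each slave name the master believes is connected. */
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    /** Assumed liveness probe for the existing connection (e.g. a ping with a timeout). */
    private final Predicate<String> isAlive;

    AgentRegistry(Predicate<String> isAlive) {
        this.isAlive = isAlive;
    }

    /**
     * Handles a connection attempt from the named slave.
     * Returns a fresh session token if accepted, or null if rejected.
     */
    synchronized String connect(String name, String presentedToken) {
        String current = sessions.get(name);
        if (current != null) {
            // Token option: a matching token shows this is the same slave coming back,
            // so the old (probably dead) connection can be replaced without probing.
            boolean sameSlave = current.equals(presentedToken);
            // Liveness option: without a valid token, reject only if the existing
            // connection still answers. This probe is the potentially slow part.
            if (!sameSlave && isAlive.test(name)) {
                return null;
            }
        }
        String fresh = UUID.randomUUID().toString();
        sessions.put(name, fresh);
        return fresh;
    }
}
```

In this sketch the slave would have to keep the token it received on its last successful connect and present it when reconnecting. A slave that was restarted and lost its token would fall back to the liveness probe path.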