- Bug
- Resolution: Duplicate
- Major
- None
- Jenkins 2.19.3
We have an issue where Windows slaves go offline every time our infrastructure team patches them. The scenario is:
- The machines get patched with the latest Windows patches.
- This triggers a reboot.
- The slave service shuts down with a log entry in the jenkins-slave.wrapper log to the effect of:
2017-03-27 07:50:19 - Shutdown exception
Message:A system shutdown is in progress. (Exception from HRESULT: 0x8007045B)
Stacktrace:
  at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
  at System.Management.ManagementScope.InitializeGuts(Object o)
  at System.Management.ManagementScope.Initialize()
  at System.Management.ManagementObjectSearcher.Initialize()
  at System.Management.ManagementObjectSearcher.Get()
  at winsw.WrapperService.GetChildPids(Int32 pid)
  at winsw.WrapperService.StopProcessAndChildren(Int32 pid)
  at winsw.WrapperService.StopIt()
  at winsw.WrapperService.OnShutdown()
- The slave restarts and we see this in the jenkins-slave_<date>.err log:
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: sv20-jenddb-001
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://jenkins.core.cvent.org/]
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.core.cvent.org:55087
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server reports protocol JNLP3-connect not supported, skipping
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: sv20-jenddb-001 is already connected to this master. Rejecting this connection.
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.core.cvent.org:55087
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP-connect
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: sv20-jenddb-001 is already connected to this master. Rejecting this connection.
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.core.cvent.org:55087
Mar 27, 2017 7:52:52 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
  at hudson.remoting.Engine.onConnectionRejected(Engine.java:380)
  at hudson.remoting.Engine.run(Engine.java:352)
We then go in and restart the slave service manually and everything is fine.
What seems to be happening is that when the slave service shuts down due to a system shutdown request, it fails to notify the master that it is shutting down. As a result, when it starts back up after the reboot, the master still thinks the slave is connected and refuses the new connection. By the time we get in to restart the service manually, the master has realized the slave is offline, so the reconnection works at that point.
It seems there are two possible solutions here:
- The slave should notify the master that it is shutting down so that the master will not still think it is 'online'.
- The master, when it receives a connection request for a slave that it thinks is 'online', should verify that the old connection is really still active before refusing the new one.
Or do both?
Note that we can reproduce this simply by rebooting a Windows slave. It always fails to reconnect as described.
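To make the two proposals concrete, here is a minimal sketch of the behavior we are asking for. It is a toy model in Python, not Jenkins or remoting code: `Registry`, `Connection`, `verify_before_reject`, and the `is_alive` probe are all hypothetical names invented for illustration, and the probe stands in for whatever liveness check the master could actually perform on the old channel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Connection:
    """Hypothetical stand-in for the master's record of one slave channel."""
    agent: str
    # Probe that returns True only if the remote end still answers.
    is_alive: Callable[[], bool]


@dataclass
class Registry:
    """Toy model of the master's table of connected slaves (an assumption)."""
    verify_before_reject: bool = False
    connections: Dict[str, Connection] = field(default_factory=dict)

    def connect(self, agent: str, is_alive: Callable[[], bool]) -> str:
        existing = self.connections.get(agent)
        if existing is not None:
            if self.verify_before_reject and not existing.is_alive():
                # Second proposal: the old channel is stale, so drop it
                # and fall through to accept the new connection.
                del self.connections[agent]
            else:
                # Behavior seen in the log: the name is taken, so refuse.
                return "rejected: already connected"
        self.connections[agent] = Connection(agent, is_alive)
        return "accepted"

    def disconnect(self, agent: str) -> None:
        """First proposal: the slave announces shutdown and is removed."""
        self.connections.pop(agent, None)
```

With `verify_before_reject=False` the model reproduces what we see after a reboot: the second `connect` for the same name is refused even though the first channel is dead. With the flag set, a dead channel is replaced, while a slave whose old channel still answers is rejected as before.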
- duplicates
- JENKINS-22692 Jenkins Windows-Slave throwing exception on shutdown causes connection reset issues
- Resolved