We're also seeing much the same thing - the Jenkins server is saying that the (JNLP) slave dropped the connection.
We've been looking into this and we've found that (at least in our system) the problem is coming from the Slave's Windows OS itself. On the slave, we are seeing the following error logged:
java.io.IOException: An established connection was aborted by the software in your host machine.
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:55)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:235)
at sun.nio.ch.IOUtil.read(IOUtil.java:209)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:409)
at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:77)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:121)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:115)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:86)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Some searching later, we identified that "An established connection was aborted by the software in your host machine" is the Windows socket error WSAECONNABORTED (10053).
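For anyone who wants to check for this error in their own tooling, here is a minimal sketch in Python (purely illustrative; it is not part of Jenkins or its remoting library, and the helper name is_connection_aborted is made up for this example). It assumes the usual Python behaviour where OSError carries the Winsock code in .winerror on Windows, and falls back to the portable errno value elsewhere.

```python
import errno

# Winsock error code for WSAECONNABORTED, as quoted above.
WSAECONNABORTED = 10053


def is_connection_aborted(exc):
    """Return True if exc looks like a locally aborted connection.

    Hypothetical helper for illustration only.
    """
    if not isinstance(exc, OSError):
        return False
    # On Windows, OSError exposes the Winsock code as .winerror.
    if getattr(exc, "winerror", None) == WSAECONNABORTED:
        return True
    # Fallback: compare against the portable errno constant.
    return exc.errno == errno.ECONNABORTED
```

In Python this condition normally surfaces as ConnectionAbortedError, so catching that exception type directly is usually enough; the explicit check is only useful when you want to log the numeric code.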
Much investigation later, we've found that various Windows services running on the Slave's OS are logging (in the Windows event log) that they're restarting at the exact same time, and (more interestingly!) we've also seen that the DHCP lease was renewed at the exact time that the slave reported the connection had died.
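The kind of timestamp matching described above can be sketched as follows. This is only an illustration in Python: the function name events_near, the 5-second tolerance and all of the timestamps are made up, and getting the real times out of the slave log and the Windows event log is left to whatever export you have available.

```python
from datetime import datetime, timedelta


def events_near(disconnects, events, tolerance=timedelta(seconds=5)):
    """For each disconnect time, list the event names logged within `tolerance` of it."""
    result = []
    for dropped_at in disconnects:
        nearby = [name for when, name in events if abs(when - dropped_at) <= tolerance]
        result.append((dropped_at, nearby))
    return result


# Hypothetical timestamps, invented for illustration only.
disconnects = [datetime(2000, 1, 1, 12, 0, 3)]
events = [
    (datetime(2000, 1, 1, 12, 0, 1), "DHCP lease renewed"),
    (datetime(2000, 1, 1, 12, 0, 2), "service restarted"),
    (datetime(2000, 1, 1, 9, 30, 0), "unrelated event"),
]

for dropped_at, nearby in events_near(disconnects, events):
    print(dropped_at.isoformat(), nearby)
```

A disconnect that consistently lines up with the same kind of event (a lease renewal, say) is a hint worth following, although a match in time is not proof of cause.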
- Note: Windows handles TCP connections differently to other operating systems. If Windows decides that the physical network layer has gone down (however briefly), it actively kills (with a WSAECONNABORTED) every TCP connection that was being routed over that network. This turns a transient outage, which normal TCP retransmissions would otherwise ride out without the user even noticing, into an application-level outage: the TCP connection closes and the application is forced to deal with it, usually by reporting that the connection has failed and it's "game over". This is why a brief network outage that should have no operational impact results in a flurry of service restarts as every service tries to handle its lost connections. Windows' (mis?)handling of this scenario has been like this for so long that I doubt Microsoft would be willing to change it now.
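To make "forcing the application to deal with it" concrete, here is a rough sketch in Python of what an application can do instead of giving up on the first abort. It is an assumption-laden illustration, not how Jenkins or its remoting layer behaves: call_with_reconnect is a made-up name, and `operation` is assumed to be a callable that opens a fresh connection each time it is invoked.

```python
import time


def call_with_reconnect(operation, attempts=3, delay=1.0, sleep=time.sleep):
    """Run operation(), retrying if the connection is aborted or reset.

    Hypothetical helper: `operation` must establish its own connection on
    every call, because the old socket is unusable after an abort.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionAbortedError, ConnectionResetError):
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            sleep(delay * attempt)  # simple linear backoff before retrying
```

This only papers over the symptom, of course; it does nothing about whatever is convincing Windows that the network went away.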
So I think that something, somewhere deep within Windows, is making Windows believe that it has lost the network layer. Problem is, I don't (yet) know what's doing it - all I see is a lot of symptoms of it doing it, not the root cause.
"ipconfig /release && ipconfig /renew" will cause this (even if you immediately get the same IP address back), as will unplugging/replugging your Cat5 cable or disconnecting/reconnecting your WiFi connection, or power-saving on your NIC, or reconfiguring your NIC and thus forcing a reload of the driver, or...
We've yet to find the root cause in our setup, but investigations are ongoing.
Why are you so sure about this? Pinging does not keep a connection open.