Bug
Resolution: Fixed
Major
remoting:3071.v7e9b_0dc08466, 2.375.1
Static agents using WebSockets are sporadically disconnected and do not reconnect.
Agent thread dumps show a potential deadlock:
    java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
        - parking to wait for <0x00000000c7f230f0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(java.base@11.0.16/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.16/AbstractQueuedSynchronizer.java:885)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.16/AbstractQueuedSynchronizer.java:1039)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.16/AbstractQueuedSynchronizer.java:1345)
        at java.util.concurrent.CountDownLatch.await(java.base@11.0.16/CountDownLatch.java:232)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusFuture.get(TyrusFuture.java:53)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint$Basic.processFuture(TyrusRemoteEndpoint.java:149)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint$Basic.sendBinary(TyrusRemoteEndpoint.java:131)
        at hudson.remoting.Engine$1AgentEndpoint$Transport.write(Engine.java:646)
        at hudson.remoting.AbstractByteBufferCommandTransport.write(AbstractByteBufferCommandTransport.java:303)
        at hudson.remoting.Channel.send(Channel.java:765)
        - locked <0x00000000c5f65b08> (a hudson.remoting.Channel)
        at hudson.remoting.Request$2.run(Request.java:389)
        at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
        at hudson.remoting.InterceptingExecutorService$$Lambda$90/0x000000084016c440.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(java.base@11.0.16/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.16/ThreadPoolExecutor.java:1128)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.16/ThreadPoolExecutor.java:628)
        at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:126)
        at hudson.remoting.Engine$1$$Lambda$91/0x000000084016b840.run(Unknown Source)
        at java.lang.Thread.run(java.base@11.0.16/Thread.java:829)

       Locked ownable synchronizers:
        - <0x00000000c7f231d0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
And
    java.lang.Thread.State: BLOCKED (on object monitor)
        at hudson.remoting.Channel.terminate(Channel.java:1068)
        - waiting to lock <0x00000000c5f65b08> (a hudson.remoting.Channel)
        at hudson.remoting.Channel$1.terminate(Channel.java:620)
        at hudson.remoting.AbstractByteBufferCommandTransport.terminate(AbstractByteBufferCommandTransport.java:314)
        at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:629)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1235)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:110)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:481)
        - locked <0x00000000c5f6c7c0> (a io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:244)
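The thread dump does not flag this as a deadlock, but it appears to be one. The first thread holds the `Channel` monitor (`0x00000000c5f65b08`) while parked in `TyrusFuture.get`, waiting for `sendBinary` to complete. The second thread holds the Tyrus `ProtocolHandler` monitor (`0x00000000c5f6c7c0`) in `close` and is blocked in `Channel.terminate`, waiting to lock that same `Channel`.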
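
To make the suspected pattern concrete, here is a minimal standalone sketch in plain Java. It is not remoting or Tyrus code: `channel`, `handler`, and `sendDone` are hypothetical stand-ins for the `Channel` monitor, the `ProtocolHandler` monitor, and the `TyrusFuture` latch. It also assumes that the pending send can only complete after the close path finishes, which the traces above suggest but do not prove. Under that assumption both threads stall, and because the first one is parked on a latch rather than blocked on a monitor, the JVM's deadlock detection reports nothing.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchMonitorStall {

    /** Reproduces the stall and returns the observed thread states. */
    static String reproduce() {
        final Object channel = new Object();  // hypothetical stand-in for the Channel monitor
        final Object handler = new Object();  // hypothetical stand-in for the ProtocolHandler monitor
        final CountDownLatch sendDone = new CountDownLatch(1);   // stand-in for the TyrusFuture latch
        final CountDownLatch writerReady = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            synchronized (channel) {          // like Channel.send
                writerReady.countDown();
                try {
                    sendDone.await();         // like TyrusFuture.get: parks while holding channel
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "writer");

        Thread closer = new Thread(() -> {
            synchronized (handler) {          // like ProtocolHandler.close
                synchronized (channel) {      // like Channel.terminate: blocks, writer owns channel
                    // termination logic would run here
                }
                sendDone.countDown();         // ASSUMPTION: the send completes only after close returns
            }
        }, "closer");

        writer.setDaemon(true);
        closer.setDaemon(true);
        try {
            writer.start();
            writerReady.await();              // make sure writer owns channel before closer starts
            closer.start();

            // Poll until both threads have reached their stalled states (5 s safety limit).
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (System.nanoTime() < deadline
                    && !(writer.getState() == Thread.State.WAITING
                         && closer.getState() == Thread.State.BLOCKED)) {
                Thread.sleep(10);
            }
            String writerState = writer.getState().name();
            String closerState = closer.getState().name();

            // A latch has no owner thread, so there is no lock cycle for the JVM to find.
            long[] cycle = ManagementFactory.getThreadMXBean().findDeadlockedThreads();

            writer.interrupt();               // break the stall so the demo can exit cleanly
            writer.join();
            closer.join();
            return "writer=" + writerState + " closer=" + closerState
                    + " detected=" + (cycle != null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

The sketch only escapes the stall because the writer is interrupted at the end. In the traces above nothing interrupts the thread parked in `TyrusFuture.get`, which would be consistent with the agent never reconnecting.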