Jenkins / JENKINS-68122

Agent connection broken (randomly) with error java.util.concurrent.TimeoutException (regression in 2.325)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core
    • Environment: Jenkins 2.332.1 on Ubuntu 18.04 with OpenJDK 11.0.14,
      Amazon EC2 plugin 1.68
    • Released As: 2.343, 2.332.3

      After upgrading Jenkins from 2.319.2 to 2.332.1, we started experiencing broken EC2 agent connections with a ping thread timeout error:

      java.util.concurrent.TimeoutException: Ping started at 1648107727099 hasn't completed by 1648107967100
              at hudson.remoting.PingThread.ping(PingThread.java:132)
              at hudson.remoting.PingThread.run(PingThread.java:88)

      This happens randomly, and the build job hangs at the pipeline git checkout stage. When the agent connection breaks, we can re-launch the agent and it reconnects, but the build job seems unable to reach the agent any more and just stalls until cancelled. While this is happening, other EC2 agents keep running, and an OS-level ping from the master to the agent in question still gets a response. We tried disabling "Response Time" in Preventive Node Monitoring (Manage Nodes and Clouds); that only delays the broken connection from 2 missed pings to 5 or 6, because the master keeps monitoring disk space, swap, and so on. Killing the job and rebuilding succeeds most of the time (sometimes it gets stuck on the same broken connection).


          Kapa Wo created issue -

          Kapa Wo added a comment - edited

          jstack on the failed agent found one deadlock:

          Found one Java-level deadlock:
          =============================
          "RemoteInvocationHandler [#1]":
            waiting to lock monitor 0x00007fcf28002980 (object 0x0000000624c21510, a hudson.util.RingBufferLogHandler),
            which is held by "Channel reader thread: channel"
          "Channel reader thread: channel":
            waiting to lock monitor 0x00007fcf28006580 (object 0x0000000624c00ce0, a hudson.remoting.RemoteClassLoader),
            which is held by "pool-1-thread-1 for channel id=10591761"
          "pool-1-thread-1 for channel id=10591761":
            waiting to lock monitor 0x00007fcf28002980 (object 0x0000000624c21510, a hudson.util.RingBufferLogHandler),
            which is held by "Channel reader thread: channel"

          Java stack information for the threads listed above:
          ===================================================
          "RemoteInvocationHandler [#1]":
                  at hudson.util.RingBufferLogHandler.publish(RingBufferLogHandler.java:78)
                  - waiting to lock <0x0000000624c21510> (a hudson.util.RingBufferLogHandler)
                  at java.util.logging.Logger.log(java.logging@11.0.14/Logger.java:979)
                  at java.util.logging.Logger.doLog(java.logging@11.0.14/Logger.java:1006)
                  at java.util.logging.Logger.log(java.logging@11.0.14/Logger.java:1051)
                  at hudson.remoting.RemoteInvocationHandler$Unexporter.reportStats(RemoteInvocationHandler.java:702)
                  at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:594)
                  at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.14/Executors.java:515)
                  at java.util.concurrent.FutureTask.run(java.base@11.0.14/FutureTask.java:264)
                  at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:121)
                  at java.lang.Thread.run(java.base@11.0.14/Thread.java:829)
          "Channel reader thread: channel":
                  at hudson.util.RingBufferLogHandler.publish(RingBufferLogHandler.java:78)
                  - locked <0x0000000624c21510> (a hudson.util.RingBufferLogHandler)
                  at java.util.logging.Logger.log(java.logging@11.0.14/Logger.java:979)
                  at java.util.logging.Logger.doLog(java.logging@11.0.14/Logger.java:1006)
                  at java.util.logging.Logger.log(java.logging@11.0.14/Logger.java:1092)
                  at hudson.remoting.Channel$1.handle(Channel.java:608)
                  at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:81)
          "pool-1-thread-1 for channel id=10591761":
                  at hudson.util.RingBufferLogHandler.publish(RingBufferLogHandler.java:78)
                  - waiting to lock <0x0000000624c21510> (a hudson.util.RingBufferLogHandler)
                  at java.util.logging.Logger.log(java.logging@11.0.14/Logger.java:979)
                  at java.util.logging.Logger.doLog(java.logging@11.0.14/Logger.java:1006)
                  at java.util.logging.Logger.log(java.logging@11.0.14/Logger.java:1092)
                  at hudson.remoting.RemoteClassLoader.prefetchClassReference(RemoteClassLoader.java:387)
                  - locked <0x0000000624ad56f0> (a java.util.Collections$SynchronizedMap)
                  at hudson.remoting.RemoteClassLoader.loadWithMultiClassLoader(RemoteClassLoader.java:253)
                  at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:223)
                  at java.lang.ClassLoader.loadClass(java.base@11.0.14/ClassLoader.java:589)
                  - locked <0x0000000624c00ce0> (a hudson.remoting.RemoteClassLoader)
                  at java.lang.ClassLoader.loadClass(java.base@11.0.14/ClassLoader.java:522)
                  at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:123)
                  at hudson.node_monitors.SwapSpaceMonitor$MonitorTask.call(SwapSpaceMonitor.java:118)
                  at hudson.remoting.UserRequest.perform(UserRequest.java:211)
                  at hudson.remoting.UserRequest.perform(UserRequest.java:54)
                  at hudson.remoting.Request$2.run(Request.java:376)
                  at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
                  at hudson.remoting.InterceptingExecutorService$$Lambda$44/0x0000000840098840.call(Unknown Source)
                  at java.util.concurrent.FutureTask.run(java.base@11.0.14/FutureTask.java:264)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.14/ThreadPoolExecutor.java:1128)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.14/ThreadPoolExecutor.java:628)
                  at java.lang.Thread.run(java.base@11.0.14/Thread.java:829)

          Found 1 deadlock.
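          The cycle above runs between the synchronized RingBufferLogHandler.publish() and the class-loader monitor taken by ClassLoader.loadClass(): one thread logs while it is resolving a class, the other resolves a class while it is logging. A minimal, self-contained sketch of that lock-ordering pattern (plain Java, not Jenkins code; the two Object locks only stand in for the handler and the RemoteClassLoader):

          public class LogVsClassLoadDeadlock {
              // Stand-ins for the two monitors named in the jstack output above.
              private static final Object LOG_HANDLER = new Object();   // hudson.util.RingBufferLogHandler
              private static final Object CLASS_LOADER = new Object();  // hudson.remoting.RemoteClassLoader

              public static void main(String[] args) {
                  // Like "Channel reader thread": holds the log handler monitor
                  // (synchronized publish) and then needs the class loader to
                  // resolve a type referenced while handling the record.
                  Thread reader = new Thread(() -> {
                      synchronized (LOG_HANDLER) {
                          pause();
                          synchronized (CLASS_LOADER) { }
                      }
                  }, "channel-reader");

                  // Like "pool-1-thread-1": holds the class loader monitor
                  // (synchronized loadClass) and then logs the prefetch, which
                  // needs the log handler monitor.
                  Thread loader = new Thread(() -> {
                      synchronized (CLASS_LOADER) {
                          pause();
                          synchronized (LOG_HANDLER) { }
                      }
                  }, "pool-1-thread-1");

                  reader.start();
                  loader.start();
                  // With the pauses, each thread ends up blocked on the monitor the
                  // other one holds, and jstack reports one Java-level deadlock.
              }

              private static void pause() {
                  try { Thread.sleep(200); } catch (InterruptedException ignored) { }
              }
          }

          The essential point is the inverted lock order: logging from inside a synchronized class-loading path, while the logging path itself can trigger class loading.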
          
          

          Kapa Wo made changes -
          Component/s New: ssh-slaves-plugin [ 15578 ]
          Labels New: plugin slave
          Kapa Wo made changes -
          Summary Original: EC2 Agent connection broken (randomly) with error java.util.concurrent.TimeoutException
                  New: Slave connection broken (randomly) with error java.util.concurrent.TimeoutException

          Kapa Wo added a comment -

          Disabled the ping thread on both server and agent per the documentation, and disabled node monitoring in the global settings. Still got the error randomly.
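          For reference, "disable the ping thread on both server and agent per the documentation" presumably maps to the remoting/ChannelPinger system properties; a sketch, with the property names taken from the Jenkins "features controlled by system properties" page (defaults and disable values should be verified against the running version):

          # controller side (Jenkins JVM options): a value below 1 disables the controller-to-agent ChannelPinger
          -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=-1

          # agent side (agent.jar launch): 0 disables the agent-to-controller PingThread
          java -Dhudson.remoting.Launcher.pingIntervalSec=0 -jar agent.jar ...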

          This is a Unix agent
          WARNING: An illegal reflective access operation has occurred
          WARNING: Illegal reflective access by jenkins.slaves.StandardOutputSwapper$ChannelSwapper to constructor java.io.FileDescriptor(int)
          WARNING: Please consider reporting this to the maintainers of jenkins.slaves.StandardOutputSwapper$ChannelSwapper
          WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
          WARNING: All illegal access operations will be denied in a future release
          Evacuated stdout
          Agent successfully connected and online
          ERROR: Failed to monitor for Architecture
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          ERROR: Failed to monitor for Free Swap Space
          ERROR: Failed to monitor for Clock Difference
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          ERROR: Failed to monitor for Response Time
          ERROR: Failed to monitor for Free Disk Space
          ERROR: Failed to monitor for Free Temp Space
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          ERROR: Failed to monitor for Architecture
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          ERROR: Failed to monitor for Clock Difference
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          ERROR: Failed to monitor for Free Temp Space
          ERROR: Failed to monitor for Response Time
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          ERROR: Failed to monitor for Free Disk Space
          ERROR: Failed to monitor for Free Swap Space
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:321)
          	at hudson.remoting.Request$1.get(Request.java:240)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
          	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
          	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
          
          


          Robert Andersson added a comment -

          We're seeing what I believe is the same bug. It started after we upgraded to 2.332.1 last week.

          19 out of 20 times the agent goes into a locked state sometime after it has connected to the master but before it receives a job to execute.

          Pretty much a total showstopper for us; Jenkins is unusable at the moment. Really hoping there is some workaround for this, like downgrading to the previous agent.jar perhaps?

          I've taken a number of thread dumps of hanging agent.jar processes. They're all slightly different. What they all have in common is multiple threads stuck waiting:
          at hudson.util.RingBufferLogHandler.publish(RingBufferLogHandler.java:78)

          I'll upload a couple of the dumps, but they're really all much the same as what kapawo has uploaded.

          Robert Andersson made changes -
          Attachment New: java_deadlock_dump_3 [ 57623 ]
          Attachment New: java_deadlock_dump_2 [ 57624 ]
          Attachment New: java_deadlock_dump_1 [ 57625 ]
          Luca Naldini made changes -
          Component/s New: google-admin-sdk-plugin [ 23744 ]
          Luca Naldini made changes -
          Component/s Original: google-admin-sdk-plugin [ 23744 ]

          Jorge Torres Martinez added a comment - edited

          We are also hitting this while testing an upgrade from 2.319.3 to 2.332.1, when trying to add agents via the swarm-plugin. Our pipeline hangs in the node step and this deadlock is found on the agent's side:

          Found one Java-level deadlock:
          =============================
          "pool-1-thread-16":
            waiting to lock monitor 0x00007f2f04007ab8 (object 0x000000008d73ebe8, a hudson.remoting.RemoteClassLoader),
            which is held by "pool-1-thread-8 / waiting for JNLP4-connect connection to <Jenkins Controller> id=58"
          "pool-1-thread-8 / waiting for JNLP4-connect connection to <Jenkins Controller> id=58":
            waiting to lock monitor 0x00007f2f040035f8 (object 0x00000000f1d1ba30, a hudson.util.RingBufferLogHandler),
            which is held by "pool-1-thread-16"
          
          
          Java stack information for the threads listed above:
          ===================================================
          "pool-1-thread-16":
          	at hudson.util.RingBufferLogHandler.publish(RingBufferLogHandler.java:78)
          	- locked <0x00000000f1d1ba30> (a hudson.util.RingBufferLogHandler)
          	at java.util.logging.Logger.log(Logger.java:738)
          	at java.util.logging.Logger.doLog(Logger.java:765)
          	at java.util.logging.Logger.log(Logger.java:831)
          	at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Writer.run(BIONetworkLayer.java:184)
          	- locked <0x000000008d6cfea8> (a org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Writer)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:122)
          	at hudson.remoting.Engine$1$$Lambda$12/910059620.run(Unknown Source)
          	at java.lang.Thread.run(Thread.java:748)
          "pool-1-thread-8 / waiting for JNLP4-connect connection to <Jenkins Controller> id=58":
          	at hudson.util.RingBufferLogHandler.publish(RingBufferLogHandler.java:78)
          	- waiting to lock <0x00000000f1d1ba30> (a hudson.util.RingBufferLogHandler)
          	at java.util.logging.Logger.log(Logger.java:738)
          	at java.util.logging.Logger.doLog(Logger.java:765)
          	at java.util.logging.Logger.log(Logger.java:851)
          	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:486)
          	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:246)
          	- locked <0x000000008d6cfc08> (a java.lang.Object)
          	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:198)
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doSend(ProtocolStack.java:700)
          	at org.jenkinsci.remoting.protocol.ApplicationLayer.write(ApplicationLayer.java:156)
          	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.write(ChannelApplicationLayer.java:325)
          	at hudson.remoting.AbstractByteBufferCommandTransport.write(AbstractByteBufferCommandTransport.java:301)
          	at hudson.remoting.Channel.send(Channel.java:766)
          	- locked <0x000000008d6a2ef8> (a hudson.remoting.Channel)
          	at hudson.remoting.Request.call(Request.java:167)
          	- locked <0x00000000dd8c3f08> (a hudson.remoting.RemoteInvocationHandler$RPCRequest)
          	- locked <0x000000008d6a2ef8> (a hudson.remoting.Channel)
          	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:289)
          	at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
          	at hudson.remoting.RemoteClassLoader.prefetchClassReference(RemoteClassLoader.java:348)
          	at hudson.remoting.RemoteClassLoader.loadWithMultiClassLoader(RemoteClassLoader.java:253)
          	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:223)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
          	- locked <0x000000008d73ebe8> (a hudson.remoting.RemoteClassLoader)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
          	at java.lang.Class.getDeclaringClass0(Native Method)
          	at java.lang.Class.getDeclaringClass(Class.java:1235)
          	at java.lang.Class.getEnclosingClass(Class.java:1277)
          	at java.lang.Class.getSimpleBinaryName(Class.java:1443)
          	at java.lang.Class.getSimpleName(Class.java:1309)
          	at java.lang.Class.isAnonymousClass(Class.java:1411)
          	at org.jenkinsci.remoting.util.AnonymousClassWarnings.doCheck(AnonymousClassWarnings.java:76)
          	at org.jenkinsci.remoting.util.AnonymousClassWarnings.lambda$check$0(AnonymousClassWarnings.java:66)
          	at org.jenkinsci.remoting.util.AnonymousClassWarnings$$Lambda$26/541145666.run(Unknown Source)
          	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
          	at hudson.remoting.InterceptingExecutorService$$Lambda$27/1987554325.call(Unknown Source)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:122)
          	at hudson.remoting.Engine$1$$Lambda$12/910059620.run(Unknown Source)
          	at java.lang.Thread.run(Thread.java:748)
          
          
          Found 1 deadlock. 

          Looking at the code (SSLEngineFilterLayer.java#L486 and BIONetworkLayer.java#L184), it seems to me that this problem presents itself when the log level is set to FINEST. I have been able to work around it by modifying my logging.properties.
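          A sketch of that logging.properties workaround, assuming the agent JVM is started with -Djava.util.logging.config.file pointing at the file (a standard java.util.logging property); the logger names below are taken from the packages in the stack traces and may need adjusting:

          # logging.properties: keep java.util.logging above FINEST on the agent so the
          # remoting code paths shown in the deadlock (publishing to RingBufferLogHandler
          # while a class is being loaded) are never entered for FINEST-level records.
          .level = INFO
          hudson.remoting.level = INFO
          org.jenkinsci.remoting.level = INFO
          handlers = java.util.logging.ConsoleHandler
          java.util.logging.ConsoleHandler.level = INFO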


            Assignee: Basil Crow
            Reporter: Kapa Wo
            Votes: 5
            Watchers: 11

              Created:
              Updated:
              Resolved: