JENKINS-68122

Agent connection broken (randomly) with error java.util.concurrent.TimeoutException (regression in 2.325)


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • Jenkins 2.332.1 on Ubuntu 18.04 with OpenJDK 11.0.14
      Amazon EC2 plugin 1.68
    • 2.343, 2.332.3

      After upgrading Jenkins from 2.319.2 to 2.332.1, we started seeing EC2 agent connections break with a ping thread timeout error:

      java.util.concurrent.TimeoutException: Ping started at 1648107727099 hasn't completed by 1648107967100
          at hudson.remoting.PingThread.ping(PingThread.java:132)
          at hudson.remoting.PingThread.run(PingThread.java:88)
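The two numbers in the exception message look like epoch timestamps in milliseconds, so the gap between them indicates how long the ping was allowed to run before it was declared failed. The following is a minimal sketch of that arithmetic, assuming the message format shown above; the class and method names (`PingTimeoutMessage`, `elapsed`) are hypothetical helpers for illustration and are not part of Jenkins or the remoting library.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PingTimeoutMessage {
    // Matches the message text seen in the report; assumes both values are epoch milliseconds.
    private static final Pattern MESSAGE =
            Pattern.compile("Ping started at (\\d+) hasn't completed by (\\d+)");

    /** Returns the time between the ping start and the reported deadline. */
    static Duration elapsed(String message) {
        Matcher m = MESSAGE.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized ping timeout message: " + message);
        }
        long startedAt = Long.parseLong(m.group(1));
        long deadline = Long.parseLong(m.group(2));
        return Duration.ofMillis(deadline - startedAt);
    }

    public static void main(String[] args) {
        String msg = "java.util.concurrent.TimeoutException: "
                + "Ping started at 1648107727099 hasn't completed by 1648107967100";
        Duration d = elapsed(msg);
        // Prints: 240001 ms (4 min)
        System.out.println(d.toMillis() + " ms (" + d.toMinutes() + " min)");
    }
}
```

For the message in this report the gap is 240001 ms, so the ping had been outstanding for about four minutes when the timeout was raised.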

      The disconnect happens randomly, and the build job hangs at the pipeline Git checkout stage. When the agent connection breaks, we can relaunch the agent and it reconnects, but the build job can no longer access the agent and stalls until it is cancelled. While this happens, other EC2 agents keep running, and an OS-level ping from the master to the affected agent still gets a response. We tried disabling "Response Time" under Preventive Node Monitoring (Manage Nodes and Clouds). This only delays the broken connection from 2 missed pings to 5 or 6, as the master continues to monitor disk space, swap... Killing the job and rebuilding succeeds most of the time (sometimes it gets stuck on the same broken connection).
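An OS-level ping succeeding while the application-level ping times out is consistent with the remote side being reachable but unable to run the ping task, for example because its worker thread is blocked. The sketch below illustrates that general pattern with plain `java.util.concurrent` classes. It is an assumption-laden illustration, not the remoting library's implementation: the class name `AppLevelPingSketch`, the `ping` helper, and the single-threaded executor standing in for the agent are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AppLevelPingSketch {
    /** Submits a no-op task and reports whether it completed within the timeout. */
    static boolean ping(ExecutorService remote, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Callable<Void> noOp = () -> null;
        Future<Void> reply = remote.submit(noOp);
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The "remote" side is alive but never got to run the task in time.
            reply.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the agent: one worker thread that runs submitted tasks.
        ExecutorService agent = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            System.out.println("healthy agent: " + ping(agent, 200));

            // Block the only worker thread, as a stuck or deadlocked thread would.
            agent.submit(() -> { release.await(); return null; });
            System.out.println("blocked agent: " + ping(agent, 200));
        } finally {
            release.countDown();
            agent.shutdownNow();
        }
    }
}
```

In this sketch the second ping fails even though the executor itself is still running, which mirrors the symptom described above: the host answers at the network level while the application-level request never completes.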

        1. java_deadlock_dump_1
          30 kB
          Robert Andersson
        2. java_deadlock_dump_2
          37 kB
          Robert Andersson
        3. java_deadlock_dump_3
          18 kB
          Robert Andersson
        4. threaddump_showing_classloader_sync_hang.txt
          19 kB
          Robert Andersson

            Basil Crow (basil)
            Kapa Wo (kapawo)
            Votes: 5
            Watchers: 11