
Agent connection broken (randomly) with error java.util.concurrent.TimeoutException (regression in 2.325)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core
    • Environment: Jenkins 2.332.1 on Ubuntu 18.04 with OpenJDK 11.0.14
      Amazon EC2 plugin 1.68
    • Released As: 2.343, 2.332.3

      After upgrading Jenkins from 2.319.2 to 2.332.1, we started experiencing broken EC2 agent connections with a ping thread timeout error:

      java.util.concurrent.TimeoutException: Ping started at 1648107727099 hasn't completed by 1648107967100
          at hudson.remoting.PingThread.ping(PingThread.java:132)
          at hudson.remoting.PingThread.run(PingThread.java:88)
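The two numbers in the exception message are easier to reason about once decoded. The following is a minimal sketch, assuming they are epoch timestamps in milliseconds (as `System.currentTimeMillis()` would produce); the class name `PingTimeoutDecoder` and its methods are hypothetical helpers, not part of Jenkins or remoting. For the message above, the gap between the two values is 240001 ms, roughly four minutes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the message quoted in this report.
public class PingTimeoutDecoder {
    // Matches the message format seen in the stack trace above.
    private static final Pattern MESSAGE =
            Pattern.compile("Ping started at (\\d+) hasn't completed by (\\d+)");

    /** Returns {start, deadline}; assumes both values are epoch milliseconds. */
    public static Instant[] parse(String message) {
        Matcher m = MESSAGE.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized ping message: " + message);
        }
        return new Instant[] {
            Instant.ofEpochMilli(Long.parseLong(m.group(1))),
            Instant.ofEpochMilli(Long.parseLong(m.group(2)))
        };
    }

    /** Time between the ping start and the moment the timeout was reported. */
    public static Duration elapsed(String message) {
        Instant[] t = parse(message);
        return Duration.between(t[0], t[1]);
    }

    public static void main(String[] args) {
        String msg = "Ping started at 1648107727099 hasn't completed by 1648107967100";
        Instant[] t = parse(msg);
        System.out.println("started (UTC):  " + t[0]);
        System.out.println("deadline (UTC): " + t[1]);
        System.out.println("elapsed:        " + elapsed(msg).toMillis() + " ms");
    }
}
```

Converting the timestamps to UTC makes it easier to line the failure up with controller and agent logs from the same period.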

      This happens randomly, and the build job hangs at the pipeline Git checkout stage. When the agent connection breaks, we can relaunch the agent and it reconnects, but the build job no longer seems able to access the agent and stalls until it is cancelled. While this happens, other EC2 agents keep running, and an OS-level ping from the master to the affected agent still gets a response. We tried disabling "Response Time" under Preventive Node Monitoring (Manage Nodes and Clouds). This only delays the broken connection from 2 missed pings to 5 or 6, as the master continues to monitor disk space, swap, and so on. Killing the job and rebuilding succeeds most of the time (sometimes it gets stuck on the same broken connection).
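The observation that an OS-level ping still succeeds while the remoting ping times out fits how an application-level ping generally behaves: it only completes if the remote process actually runs the request. The sketch below is a generic illustration of that idea, not the implementation of `hudson.remoting.PingThread`; the class `AppLevelPingSketch`, its `ping` method, and the single-threaded executor standing in for the agent are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Generic illustration only; the "agent" is modelled as a local executor.
public class AppLevelPingSketch {
    /** Submits a no-op to the agent and waits up to timeoutMillis for it to run. */
    public static boolean ping(ExecutorService agent, long timeoutMillis) {
        Future<?> reply = agent.submit(() -> { });
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The no-op never ran: the agent is reachable but not processing work.
            reply.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService agent = Executors.newSingleThreadExecutor();
        System.out.println("idle agent answers:  " + ping(agent, 1000));

        // Occupy the agent's only thread, as a blocked or deadlocked thread would.
        CountDownLatch release = new CountDownLatch(1);
        agent.submit(() -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("stuck agent answers: " + ping(agent, 200));

        release.countDown();
        agent.shutdown();
    }
}
```

In this model the host and process stay alive throughout, which is why a network-level check would keep succeeding even though the application-level ping fails.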

          [JENKINS-68122] Agent connection broken (randomly) with error java.util.concurrent.TimeoutException (regression in 2.325)

          Kapa Wo created issue -
          Kapa Wo made changes -
          Component/s New: ssh-slaves-plugin [ 15578 ]
          Labels New: plugin slave
          Kapa Wo made changes -
          Summary Original: EC2 Agent connection broken (randomly) with error java.util.concurrent.TimeoutException New: Slave connection broken (randomly) with error java.util.concurrent.TimeoutException
          Robert Andersson made changes -
          Attachment New: java_deadlock_dump_3 [ 57623 ]
          Attachment New: java_deadlock_dump_2 [ 57624 ]
          Attachment New: java_deadlock_dump_1 [ 57625 ]
          Luca Naldini made changes -
          Component/s New: google-admin-sdk-plugin [ 23744 ]
          Luca Naldini made changes -
          Component/s Original: google-admin-sdk-plugin [ 23744 ]
          Jorge Torres Martinez made changes -
          Component/s New: remoting [ 15489 ]
          Component/s New: swarm-plugin [ 15741 ]
          Luca Naldini made changes -
          Component/s New: google-compute-engine-plugin [ 23530 ]
          Basil Crow made changes -
          Component/s New: core [ 15593 ]
          Component/s Original: ec2-plugin [ 15625 ]
          Component/s Original: google-compute-engine-plugin [ 23530 ]
          Component/s Original: remoting [ 15489 ]
          Component/s Original: ssh-slaves-plugin [ 15578 ]
          Component/s Original: swarm-plugin [ 15741 ]
          Labels Original: plugin slave New: lts-candidate regression
          Robert Andersson made changes -
          Jesse Glick made changes -
          Assignee New: Jesse Glick [ jglick ]

            Assignee: Basil Crow
            Reporter: Kapa Wo
            Votes: 5
            Watchers: 11

              Created:
              Updated:
              Resolved: