Jenkins / JENKINS-63520

Agent remoting deadlock after reboot

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting

    Description

      When we upgrade and reboot the Jenkins agents, they sometimes hang on startup. We have about 50 agents and we upgrade and reboot them twice a day. Roughly 1 in 100 reboots leaves an agent stuck on startup.

      On the Jenkins master, the hung agent's log shows this error:

      ERROR: Connection terminated
      java.nio.channels.ClosedChannelException
      	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
      	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      On the hung agent, we took the attached jstack thread dump, which reports a deadlock: two threads appear to be waiting on each other. Once an agent hits this deadlock, it never finishes connecting to the master, so the master cannot use it as a node.
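
For readers less familiar with this failure mode, here is a minimal sketch of the general pattern: two threads that take two locks in opposite order, plus the standard ThreadMXBean check that finds the resulting cycle. This is not the remoting code and says nothing about which locks are involved in this issue. The class name, lock names, and thread names are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockSketch {
    // Hypothetical locks; they stand in for whatever monitors the real threads hold.
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    /** Starts two daemon threads that acquire the two locks in opposite order. */
    static void startDeadlockedPair() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(LOCK_A, LOCK_B, bothHoldFirstLock), "sketch-thread-1");
        Thread t2 = new Thread(() -> lockInOrder(LOCK_B, LOCK_A, bothHoldFirstLock), "sketch-thread-2");
        t1.setDaemon(true); // daemon threads so the stuck pair does not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread also holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds this lock and is waiting for ours.
            }
        }
    }

    /** Polls until the JVM reports deadlocked threads; returns how many, or 0 on timeout. */
    static int countDeadlockedThreads(long timeoutMillis) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                for (ThreadInfo info : bean.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                                + " held by " + info.getLockOwnerName());
                    }
                }
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlockedPair();
        System.out.println("deadlocked threads: " + countDeadlockedThreads(5000));
    }
}
```

In the sketch, acquiring the locks in the same order in both threads removes the cycle. Whether a similar ordering problem is behind the agent hang would need to be confirmed from the attached thread dump.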

      Could the difference in Java versions contribute to this problem? The master runs Java 1.8.0_252-8u252-b09-1~18.04-b09, whereas the agents run Java 1.8.0_265-8u265-b01-0.

        Activity

          jthompson Jeff Thompson added a comment -

          ulrich_post, what have you observed? Is the proposed fix working well for you?

          ulrich_post Uli Post added a comment -

          Hi Jeff, we have not seen this issue during the last 2 weeks.

          bhartshorn Brandon added a comment -

          Hi Jeff & Uli,

          Thanks for the fix and testing. I'm part of the team that reported the issue, but we don't have bandwidth to test a custom build at the moment. Is it looking like this patch will make it into a release soon?

          jthompson Jeff Thompson added a comment -

          It's expected to be in the next release (4.8) as soon as we can get the release out.

          bhartshorn Brandon added a comment -

          I see 4.8 made it into the weekly release ~2 weeks ago, and this issue has the lts-candidate label. I also see that LTS 2.289.1 is due in less than a week, so it obviously won't include this fix. Can I put in a request that it be backported soon? Again, many thanks!


          People

            Assignee: jthompson Jeff Thompson
            Reporter: anhuong Anh Uong
            Votes: 2
            Watchers: 5
