  1. Jenkins
  2. JENKINS-63520

Agent remoting deadlock after reboot


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
    • Environment:

      Description

      When we upgrade and reboot the Jenkins agents, they sometimes hang on startup. We have about 50 agents and upgrade/reboot them twice a day. Roughly 1 reboot in 100 leaves an agent stuck on startup.

      On the Jenkins master, we see this error message in the hung agent's log:

      ERROR: Connection terminated
      java.nio.channels.ClosedChannelException
      	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
      	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      On the hung agent, the attached jstack thread dump reports a deadlock: two threads appear to be waiting on each other. Once the agent hits this deadlock, it never finishes connecting to the master, and the master cannot use it as a node.
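For readers unfamiliar with what jstack flags in a case like this, here is a minimal sketch of a two-thread lock-ordering deadlock and of how the JVM can report it in-process through `ThreadMXBean`. It is not the remoting code: the class name `DeadlockSketch`, the thread names, and the two lock objects are hypothetical, and the locks actually involved are the ones shown in the attached thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; none of these names come from Jenkins remoting.
class DeadlockSketch {

    // Starts two daemon threads that take two monitors in opposite order,
    // then polls the JVM until it reports them as deadlocked.
    static long[] provokeAndDetect() {
        final Object lockA = new Object();
        final Object lockB = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> grab(lockA, lockB, bothHoldFirst), "sketch-1");
        Thread t2 = new Thread(() -> grab(lockB, lockA, bothHoldFirst), "sketch-2");
        // Daemon threads so the stuck pair does not keep the JVM alive.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                // Returns null while no monitor deadlock exists.
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null) {
                    return ids;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[0];
    }

    // Takes 'first', waits until the other thread also holds its first lock,
    // then blocks forever trying to take 'second'.
    private static void grab(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached once both threads hold their first lock.
            }
        }
    }

    public static void main(String[] args) {
        long[] ids = provokeAndDetect();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                System.out.println(info.getThreadName() + " is waiting for "
                        + info.getLockName() + " held by " + info.getLockOwnerName());
            }
        }
    }
}
```

Running `main` prints one line per deadlocked thread, naming the lock it waits for and the thread that holds it, which is the same kind of cycle a jstack dump marks as a deadlock.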

      Could the difference in Java versions contribute to this problem? The master runs version 1.8.0_252-8u252-b09-1~18.04-b09, whereas the agents run version 1.8.0_265-8u265-b01-0.

        Attachments

          Activity

          jthompson Jeff Thompson added a comment -

          Uli Post, what have you observed? Is the proposed fix working well for you?

          ulrich_post Uli Post added a comment -

          Hi Jeff, we have not seen this issue during the last 2 weeks.

          bhartshorn Brandon added a comment -

          Hi Jeff & Uli,

          Thanks for the fix and testing. I'm part of the team that reported the issue, but we don't have bandwidth to test a custom build at the moment. Is it looking like this patch will make it into a release soon?

          jthompson Jeff Thompson added a comment -

          It's expected to be in the next release (4.8) as soon as we can get the release out.

          bhartshorn Brandon added a comment -

          I see 4.8 made it into weekly ~2 weeks ago, and this issue has the lts-candidate label. I also see that LTS 2.289.1 is due in less than a week, so obviously won't include this. Can I put in a request that it be backported soon? Again, many thanks!


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            anhuong Anh Uong
            Votes:
            2
            Watchers:
            5

              Dates

              Created:
              Updated: