  1. Jenkins
  2. JENKINS-63520

Agent remoting deadlock after reboot

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels: lts-candidate
    • Environment:

      Description

      When we upgrade and reboot the Jenkins agents, they sometimes hang on startup. We have about 50 agents and upgrade/reboot them twice a day. Roughly 1 in 100 reboots leaves an agent stuck on startup.

      On the Jenkins master, we see this error message from the hung agent's logs:

      ERROR: Connection terminated
      java.nio.channels.ClosedChannelException
      	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
      	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      The attached jstack thread dump from the hung agent reports a deadlock: two threads appear to be waiting on each other. Once the deadlock occurs, the agent never finishes connecting to the master, so the master cannot use the agent as a node.
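
      For anyone trying to catch this state without running jstack by hand, the same deadlock report can be obtained from inside a JVM through the standard java.lang.management API. The following is only a minimal sketch: the class name DeadlockCheck and the method describeDeadlocks are hypothetical and are not part of Jenkins or remoting, and it assumes the check runs inside the affected process (for the agent, that would mean the agent JVM itself or a JMX connection to it).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins or remoting.
public class DeadlockCheck {

    /** Returns one line per deadlocked thread in this JVM; empty if none. */
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers monitors and java.util.concurrent locks; null means no deadlock.
        long[] ids = mx.findDeadlockedThreads();
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            out.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> found = describeDeadlocks();
        if (found.isEmpty()) {
            System.out.println("No deadlock detected");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

      This only reports a deadlock; it does not explain or resolve it. The attached jstack dump remains the authoritative evidence for this issue.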

      Could the difference in Java versions contribute to this problem? The master runs version 1.8.0_252-8u252-b09-1~18.04-b09, whereas the agents run version 1.8.0_265-8u265-b01-0.

        Attachments

          Activity

          anhuong Anh Uong created issue -
          docwhat Christian Höltje made changes -
          Field Original Value New Value
          Description
          docwhat Christian Höltje made changes -
          Description
          docwhat Christian Höltje made changes -
          Description
          oleg_nenashev Oleg Nenashev made changes -
          Labels lts-candidate

            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            anhuong Anh Uong
            Votes:
            2
            Watchers:
            5

              Dates

              Created:
              Updated: