• Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • remoting
    • None
    • Jenkins 1.625.3
      Windows slaves (JNLP)

      We observed deadlock situations similar to those described in remoting#36.

      However, in these cases the other thread involved in the deadlock had this stack trace:

      "NioChannelHub keys=3 gen=41: Computer.threadPoolForRemoting [#2]" id=224 (0xe0) state=BLOCKED cpu=76%
          - waiting to lock <0x28a9d2ba> (a hudson.remoting.Channel)
            owned by "Computer.threadPoolForRemoting [#5] for XXXXXXX" id=249 (0xf9)
          at hudson.remoting.Channel.terminate(Channel.java:833)
          at hudson.remoting.Channel$1.terminate(Channel.java:509)
          at hudson.remoting.AbstractByteArrayCommandTransport$1.terminate(AbstractByteArrayCommandTransport.java:71)
          at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:637)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
      

      That is, an abort caused by a CancelledKeyException (see JENKINS-24050).

      The fix already merged in remoting#36 does not seem to cover all cases: if there are no writable bytes and no one is reading (i.e. the channel is in an abnormal state), the write loop may never exit, which keeps the deadlock in place.

      Since abort starts by closing both ends of the NIO channel, an additional check of the closed state can be introduced in the loop to give the writer a way out.
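      The proposed fix can be sketched with a simplified bounded buffer. This is a hypothetical illustration, not the actual org.jenkinsci.remoting.nio.FifoBuffer code: the class name BoundedByteBuffer and its write/read/close methods are assumptions. It shows a writer that blocks while the buffer is full and the extra closed check inside the loop that lets it escape once close() is called, instead of waiting forever when no reader drains the buffer.

```java
import java.io.IOException;

// Hypothetical, simplified bounded FIFO byte buffer. NOT the real
// org.jenkinsci.remoting.nio.FifoBuffer; it only illustrates the closed check.
final class BoundedByteBuffer {
    private final Object lock = new Object();
    private final byte[] data;
    private int size = 0;           // bytes currently buffered, guarded by lock
    private boolean closed = false; // guarded by lock

    BoundedByteBuffer(int capacity) {
        this.data = new byte[capacity];
    }

    // Blocks while the buffer is full; throws once the buffer has been closed.
    void write(byte[] buf, int start, int len) throws InterruptedException, IOException {
        synchronized (lock) {
            while (len > 0) {
                // Additional closed check inside the loop: without it, a full
                // buffer that no one drains keeps the writer waiting forever.
                if (closed) {
                    throw new IOException("closed during write operation");
                }
                int chunk = Math.min(len, data.length - size);
                if (chunk == 0) {
                    lock.wait(); // releases lock; woken by read() or close()
                    continue;
                }
                System.arraycopy(buf, start, data, size, chunk);
                size += chunk;
                start += chunk;
                len -= chunk;
                lock.notifyAll(); // tell other threads waiting on lock that state changed
            }
        }
    }

    // Removes up to out.length bytes from the front; returns how many were copied.
    int read(byte[] out) {
        synchronized (lock) {
            int n = Math.min(out.length, size);
            System.arraycopy(data, 0, out, 0, n);
            System.arraycopy(data, n, data, 0, size - n);
            size -= n;
            lock.notifyAll(); // wake writers waiting for free space
            return n;
        }
    }

    // Marks the buffer closed and wakes every blocked writer.
    void close() {
        synchronized (lock) {
            closed = true;
            lock.notifyAll();
        }
    }
}
```

      Because close() sets the flag and calls notifyAll() while holding the same lock, a writer blocked in wait() is woken and re-checks closed before it could wait again.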

          [JENKINS-32825] Deadlock in Channel Abort

          Code changed in jenkins
          User: Andres Rodriguez
          Path:
          src/main/java/org/jenkinsci/remoting/nio/FifoBuffer.java
          http://jenkins-ci.org/commit/remoting/3ea38c8b95a8d4ca8e245fc0999d7e4a9a6b1424
          Log:
          JENKINS-32825 Deadlock in Channel Abort

          Additional `closed` state check introduced to break writing loop.


          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          src/main/java/org/jenkinsci/remoting/nio/FifoBuffer.java
          http://jenkins-ci.org/commit/remoting/8c652545dfd694537119943a14875db2d4e8437c
          Log:
          Merge pull request #71 from andresrc/JENKINS-32825

          JENKINS-32825 Deadlock in Channel Abort

          Compare: https://github.com/jenkinsci/remoting/compare/08a19d8962b2...8c652545dfd6


          Oleg Nenashev added a comment (edited)

          This change does not fully solve the issue, since the receive buffer may never be freed. We also depend on notifyAll() calls, which may not arrive in time.
          See JENKINS-25218 for details.
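          The concern about depending on notifyAll() can be illustrated with a generic sketch: if the state a waiter depends on changes on a path that never calls notifyAll(), an untimed wait() never returns. A common mitigation is a bounded wait that re-checks the condition periodically. The helper below is hypothetical (TimedGuard, awaitCondition and pollMillis are illustrative names, not remoting code), and it does not address the receive buffer never being freed.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of remoting: waits for a condition but
// re-checks it at least every pollMillis, so a missing notifyAll() only
// delays the waiter instead of blocking it forever.
final class TimedGuard {
    private final Object lock = new Object();

    void awaitCondition(BooleanSupplier done, long pollMillis) throws InterruptedException {
        synchronized (lock) {
            while (!done.getAsBoolean()) {
                // Returns on notifyAll(), on timeout, or on a spurious wakeup;
                // the loop re-checks the condition in every case.
                lock.wait(pollMillis);
            }
        }
    }

    // Wakes waiters immediately; optional, since waiters also poll.
    void signal() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }
}
```

          The trade-off is latency versus wakeups: a missed notification costs at most about pollMillis, at the price of periodic wakeups while the condition stays false.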


            Oleg Nenashev (oleg_nenashev)
            Andres Rodriguez (andresrc)
            Votes: 0
            Watchers: 3
