
Channel hangs due to the infinite loop in FifoBuffer within the lock

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core, remoting
    • Labels: None

      I noticed the following "dead lock" that prevents NioChannelHub from serving any channels, which breaks all the slaves.

      NioChannelHub thread is blocked:
      
          "NioChannelHub keys=2 gen=185197: Computer.threadPoolForRemoting [#3]" daemon prio=10 tid=0x00007f872c021800 nid=0x1585 waiting for monitor entry [0x00007f86ce2ba000]
             java.lang.Thread.State: BLOCKED (on object monitor)
      	    at hudson.remoting.Channel.terminate(Channel.java:792)
      	    - waiting to lock <0x00007f874ef76658> (a hudson.remoting.Channel)
      	    at hudson.remoting.Channel$2.terminate(Channel.java:483)
      	    at hudson.remoting.AbstractByteArrayCommandTransport$1.terminate(AbstractByteArrayCommandTransport.java:72)
      	    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:203)
      	    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:597)
      	    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	    at java.lang.Thread.run(Thread.java:662)
      
      ... because of this guy:
      
          "Computer.threadPoolForRemoting [#216] for mac" daemon prio=10 tid=0x00007f86dc0d6800 nid=0x3f34 in Object.wait() [0x00007f87442f1000]
             java.lang.Thread.State: WAITING (on object monitor)
      	    at java.lang.Object.wait(Native Method)
      	    - waiting on <0x00007f874ef76810> (a org.jenkinsci.remoting.nio.FifoBuffer)
      	    at java.lang.Object.wait(Object.java:485)
      	    at org.jenkinsci.remoting.nio.FifoBuffer.write(FifoBuffer.java:336)
      	    - locked <0x00007f874ef76810> (a org.jenkinsci.remoting.nio.FifoBuffer)
      	    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.writeBlock(NioChannelHub.java:215)
      	    at hudson.remoting.AbstractByteArrayCommandTransport.write(AbstractByteArrayCommandTransport.java:83)
      	    at hudson.remoting.Channel.send(Channel.java:545)
      	    - locked <0x00007f874ef76658> (a hudson.remoting.Channel)
      	    at hudson.remoting.Request$2.run(Request.java:342)
      	    - locked <0x00007f874ef76658> (a hudson.remoting.Channel)
      	    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	    at java.lang.Thread.run(Thread.java:662)
      

      Full thread dump is here
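
The two dumps above look like a nested monitor lockout rather than a classic lock cycle: the writer keeps the Channel monitor while it waits inside FifoBuffer.write, and the NioChannelHub thread cannot get past Channel.terminate while that monitor is held. The sketch below reproduces only that pattern in isolation. `NestedMonitorLockout`, `Buffer`, and `spaceAvailable` are hypothetical stand-ins, not the remoting classes, and the main thread breaks the lockout from outside so that the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class NestedMonitorLockout {

    /** Hypothetical stand-in for FifoBuffer: a writer waits until space is available. */
    static final class Buffer {
        boolean spaceAvailable; // guarded by this Buffer's monitor
    }

    /** Returns {writer state, hub state} as observed while the lockout is in effect. */
    public static Thread.State[] reproduce() {
        final Object channel = new Object(); // stand-in for the Channel monitor
        final Buffer buffer = new Buffer();
        final CountDownLatch writerInside = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            synchronized (channel) {         // analogous to Channel.send()
                synchronized (buffer) {      // analogous to FifoBuffer.write()
                    writerInside.countDown();
                    while (!buffer.spaceAvailable) {
                        try {
                            buffer.wait();   // releases the buffer monitor only
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }, "writer");

        Thread hub = new Thread(() -> {
            synchronized (channel) {         // analogous to Channel.terminate(): blocks here
                synchronized (buffer) {
                    buffer.spaceAvailable = true;
                    buffer.notifyAll();
                }
            }
        }, "hub");

        writer.setDaemon(true);
        hub.setDaemon(true);
        try {
            writer.start();
            writerInside.await();            // writer now holds the channel monitor
            hub.start();

            // Poll until both threads reach the states seen in the dump (or give up).
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            Thread.State w, h;
            do {
                Thread.sleep(10);
                w = writer.getState();
                h = hub.getState();
            } while (!(w == Thread.State.WAITING && h == Thread.State.BLOCKED)
                    && System.nanoTime() < deadline);

            // Break the lockout from a third thread so the demo can finish.
            synchronized (buffer) {
                buffer.spaceAvailable = true;
                buffer.notifyAll();
            }
            writer.join(5000);
            hub.join(5000);
            return new Thread.State[] { w, h };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the demo", e);
        }
    }

    public static void main(String[] args) {
        Thread.State[] s = reproduce();
        System.out.println("writer=" + s[0] + " hub=" + s[1]);
    }
}
```

In this sketch the writer is reported as WAITING and the hub as BLOCKED, the same pair of states as in the dumps. Without the outside notification neither thread would make progress, because the only thread that would signal the buffer is the one stuck behind the channel monitor.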

          [JENKINS-25218] Channel hangs due to the infinite loop in FifoBuffer within the lock

          Kohsuke Kawaguchi created issue -
          Oleg Nenashev made changes -
          Component/s New: remoting [ 15489 ]

          Also seeing similar live-locks with the following thread holding the lock:

          "Computer.threadPoolForRemoting [#173620] : IO ID=7851 : seq#=7850" #2080996 daemon prio=5 os_prio=0 tid=0x0000000088306800 nid=0x15e0 in Object.wait() [0x00000000f209f000]
             java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:502)
              at org.jenkinsci.remoting.nio.FifoBuffer.write(FifoBuffer.java:336)
              - locked <0x000000044d3408b8> (a org.jenkinsci.remoting.nio.FifoBuffer)
              at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.writeBlock(NioChannelHub.java:220)
              at hudson.remoting.AbstractByteArrayCommandTransport.write(AbstractByteArrayCommandTransport.java:83)
              at hudson.remoting.Channel.send(Channel.java:576)
              - locked <0x000000044d3407a0> (a hudson.remoting.Channel)
              at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:260)
              at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
          

          The NioChannelHub thread has the same stack trace (modulo line numbers)
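
Matching the "waiting to lock" address in one stack trace against the "locked" address in another can also be done programmatically from inside the affected JVM. The following is a rough sketch using the standard `java.lang.management` API; the class name `BlockedThreadReport` and its methods are hypothetical, and `demo()` only exists to exercise the report on a throwaway thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {

    /** One line per BLOCKED thread: "<thread> waits for <lock> held by <owner>". */
    public static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            lines.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    /** Blocks a helper thread on a monitor held by the caller, then reports. */
    public static List<String> demo() {
        final Object lock = new Object();
        Thread helper = new Thread(() -> {
            synchronized (lock) { }
        }, "blocked-demo");
        helper.setDaemon(true);
        synchronized (lock) {
            helper.start();
            while (helper.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            return describeBlockedThreads();
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Applied to the situation in this issue, such a report would point from the BLOCKED NioChannelHub thread to the thread that owns the Channel monitor. That owner is itself in WAITING state inside FifoBuffer.write, so it would not appear as a blocked thread in the report.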

          Stephen Connolly added a comment -
          Stephen Connolly made changes -
          Link New: This issue is related to JENKINS-23043 [ JENKINS-23043 ]
          Stephen Connolly made changes -
          Link New: This issue is related to JENKINS-28826 [ JENKINS-28826 ]
          Stephen Connolly made changes -
          Link New: This issue is related to JENKINS-20947 [ JENKINS-20947 ]
          Stephen Connolly made changes -
          Link New: This issue is related to JENKINS-24155 [ JENKINS-24155 ]
          Stephen Connolly made changes -
          Link New: This issue is duplicated by JENKINS-23043 [ JENKINS-23043 ]
          R. Tyler Croy made changes -
          Workflow Original: JNJira [ 159130 ] New: JNJira + In-Review [ 179883 ]
          Oleg Nenashev made changes -
          Assignee New: Oleg Nenashev [ oleg_nenashev ]

            Assignee: Oleg Nenashev (oleg_nenashev)
            Reporter: Kohsuke Kawaguchi (kohsuke)
            Votes: 7
            Watchers: 13
