
Preliminary FifoBuffer termination can cause outage of all JNLP1/2 agents

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • remoting
    • None

      This improvement should help with the triangulation of JENKINS-31050

      Background: I was analysing JIRA issues related to the NIOHub fatal channel termination, which causes mass disconnection of agents. It appears that the SingleLaneExecutor is not used entirely correctly there...

      TL;DR: A single packet sent to a channel with a pending shutdown may terminate all remoting channels that use the JNLP1, JNLP2, CLI, and CLI2 protocols. JNLP4 does not seem to be affected.
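
      The failure class described in the TL;DR can be illustrated with a toy model. This is only a sketch under stated assumptions: `HubIsolationSketch`, `ToyChannel`, and `dispatch` are hypothetical names, and nothing here is taken from the remoting code or from the actual fix. It shows how an error raised for one channel with a pending shutdown can take down every channel served by the same dispatch loop unless the error is contained per channel.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; none of these names come from Jenkins remoting. */
public class HubIsolationSketch {

    /** Toy channel: delivering a packet after shutdown was requested fails. */
    static final class ToyChannel {
        final String name;
        boolean closePending; // shutdown requested but not yet completed
        boolean open = true;

        ToyChannel(String name) {
            this.name = name;
        }

        void deliver(String packet) throws IOException {
            if (closePending) {
                throw new IOException("buffer already closed on " + name);
            }
        }
    }

    /**
     * One pass of a shared dispatch loop over all channels.
     * With isolate == false, a failure on one channel escapes the loop
     * and every channel is torn down. Returns the number of open channels.
     */
    static int dispatch(List<ToyChannel> channels, boolean isolate) {
        try {
            for (ToyChannel ch : channels) {
                try {
                    ch.deliver("ping");
                } catch (IOException e) {
                    if (!isolate) {
                        // Assumed failure mode: the error is treated as fatal for the hub.
                        throw new IllegalStateException("hub failure", e);
                    }
                    ch.open = false; // close only the offending channel
                }
            }
        } catch (IllegalStateException fatal) {
            for (ToyChannel ch : channels) {
                ch.open = false; // the whole hub goes down
            }
        }
        int alive = 0;
        for (ToyChannel ch : channels) {
            if (ch.open) {
                alive++;
            }
        }
        return alive;
    }

    /** Three channels, the middle one with a pending shutdown. */
    static List<ToyChannel> sampleChannels() {
        List<ToyChannel> list = new ArrayList<>();
        list.add(new ToyChannel("agent-1"));
        ToyChannel closing = new ToyChannel("agent-2");
        closing.closePending = true;
        list.add(closing);
        list.add(new ToyChannel("agent-3"));
        return list;
    }

    public static void main(String[] args) {
        System.out.println("survivors without isolation: " + dispatch(sampleChannels(), false));
        System.out.println("survivors with isolation:    " + dispatch(sampleChannels(), true));
    }
}
```

      In this toy model the unisolated run leaves no channel open, while the isolated run loses only the channel that was already shutting down.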

          [JENKINS-40491] Preliminary FifoBuffer termination can cause outage of all JNLP1/2 agents

          Oleg Nenashev created issue -
          Oleg Nenashev made changes -
          Link New: This issue is related to JENKINS-31050 [ JENKINS-31050 ]
          Oleg Nenashev made changes -
          Epic Link New: JENKINS-38833 [ 175240 ]
          Oleg Nenashev made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          Oleg Nenashev made changes -
          Status Original: Resolved [ 5 ] New: Closed [ 6 ]
          Oleg Nenashev made changes -
          Assignee New: Oleg Nenashev [ oleg_nenashev ]
          Resolution Original: Fixed [ 1 ]
          Status Original: Closed [ 6 ] New: Reopened [ 4 ]
          Oleg Nenashev made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
          Raghu Pallikonda made changes -
          Comment [ Hi [~oleg_nenashev]

            I was getting the 'Agent offline during the build' error while running Jenkins v2.19.1 on the Jenkins master and jenkins-slave v2.62 on the slave pod.
          After reading about your fix, I upgraded Jenkins to v2.37 and the slave to jenkins-slave 3.4 (remoting 3.4). Now I am getting the error below:


          {code:java}
          Caused by: java.io.IOException: Unexpected EOF while receiving the data from the channel. FIFO buffer has been already closed
          at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:617)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
          Caused by: org.jenkinsci.remoting.nio.FifoBuffer$CloseCause: Buffer close has been requested
          at org.jenkinsci.remoting.nio.FifoBuffer.close(FifoBuffer.java:426)
          at org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport.closeR(NioChannelHub.java:332)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:565)
          ... 6 more
          {code}

          Let me know if I need to provide more details.
          ]
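
          The trace in the comment above has two parts: the `IOException` raised when the closed buffer is used, and a nested `FifoBuffer$CloseCause` whose frames point at the place where the close was requested. A minimal sketch of that diagnostic pattern follows. `CloseCauseSketch` and `ToyBuffer` are hypothetical names used for illustration and are not the remoting implementation.

```java
import java.io.IOException;

/** Hypothetical sketch of a "record who closed me" pattern; not the remoting FifoBuffer. */
public class CloseCauseSketch {

    /** Marker throwable whose stack trace captures where close() was called. */
    static final class CloseCause extends Exception {
        CloseCause(String message) {
            super(message);
        }
    }

    static final class ToyBuffer {
        private CloseCause closeCause; // null while the buffer is open

        void close() {
            if (closeCause == null) {
                // The stack trace is captured here, at the first close request.
                closeCause = new CloseCause("Buffer close has been requested");
            }
        }

        void write(byte[] data) throws IOException {
            if (closeCause != null) {
                // Chain the recorded close site so both call paths appear in the trace.
                throw new IOException("buffer has been already closed", closeCause);
            }
            // A real buffer would store the data here.
        }
    }

    public static void main(String[] args) {
        ToyBuffer buf = new ToyBuffer();
        buf.close();
        try {
            buf.write(new byte[] {1});
        } catch (IOException e) {
            e.printStackTrace(); // shows the failing write and, as "Caused by", the earlier close()
        }
    }
}
```

          Read this way, the "Caused by" frames in a trace of this shape indicate which code path requested the close, and the outer frames indicate which later operation ran into the closed buffer.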
          Oleg Nenashev made changes -
          Description Original: This improvement should help with the triangulation of JENKINS-31050 New: This improvement should help with the triangulation of JENKINS-31050

          Background: I was analysing JIRA issues related to the NIOHub fatal channel termination causing massive disconnection of agents. It appears that the SingleLaneExecutor is not completely correctly used there...

          TL;DR: A single packet sent to the channel with pending shutdown may cause the termination of all remoting channels in JNLP1, JNLP2, CLI, and CLI2 protocols. JNLP4 does not seem to be affected.
          Oleg Nenashev made changes -
          Summary Original: Improve diagnostics of the preliminary FifoBuffer termination New: Preliminary FifoBuffer termination can cause outage of all JNLP1/2 agents
          Oleg Nenashev made changes -
          Issue Type Original: Improvement [ 4 ] New: Bug [ 1 ]

            Assignee: Oleg Nenashev (oleg_nenashev)
            Reporter: Oleg Nenashev (oleg_nenashev)
            Votes: 0
            Watchers: 4
