Computer offline by ping thread leaves the channel half open

      Reproducer:

      Launch a local agent over the SSH or command launcher and suspend its process with kill -TSTP $PID. The agent stops responding; Jenkins eventually notices and closes its connection with a clear exception.

      Actual behavior:

      • The channel is never disassociated from its computer, so long-running operations and other clients that only check computer.channel != null keep using it and throw exceptions all over the place. EDIT: The computer is not even marked temporarily offline, and the situation does not seem to improve after all monitors have run, as they all choke on the closed channel.
      • The channel is stuck in the middle of the closing procedure: it is outClosed but not inClosed. The other end does not send the close command, for obvious reasons, so the channel is never fully closed. I speculate that this is specifically why SlaveComputer#closeChannel() is not called, which causes the previous problem.
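To make the two points above concrete, here is a minimal, self-contained sketch of the suspected failure mode. It does not use Jenkins or remoting classes: ToyChannel, ToyComputer and every method on them are hypothetical stand-ins, and the rule "cleanup runs only after both directions are closed" encodes the speculation from the report, not confirmed behavior.

```java
import java.io.IOException;

public class HalfClosedChannelDemo {
    /** Hypothetical stand-in for a channel; tracks both close directions separately. */
    static final class ToyChannel {
        private boolean outClosed;
        private boolean inClosed;

        /** The local side gives up: the outgoing direction closes immediately. */
        void closeOutgoing() { outClosed = true; }

        /** Would only happen once the peer's close command arrives. */
        void peerAcknowledgedClose() { inClosed = true; }

        boolean isFullyClosed() { return outClosed && inClosed; }

        String call(String request) throws IOException {
            if (outClosed) throw new IOException("channel is already closed");
            return "ok: " + request;
        }
    }

    /** Hypothetical stand-in for a computer that drops its channel only on full close. */
    static final class ToyComputer {
        private ToyChannel channel;

        void setChannel(ToyChannel c) { channel = c; }
        ToyChannel getChannel() { return channel; }

        /** Assumption from the report: cleanup is tied to the channel being fully closed. */
        void onChannelStateChanged() {
            if (channel != null && channel.isFullyClosed()) channel = null;
        }
    }

    /** Returns what a client that only null-checks the channel observes after a ping timeout. */
    static String simulatePingTimeout() {
        ToyComputer computer = new ToyComputer();
        ToyChannel ch = new ToyChannel();
        computer.setChannel(ch);

        ch.closeOutgoing();               // ping thread gives up; the suspended agent never replies
        computer.onChannelStateChanged(); // no-op: inClosed is still false

        if (computer.getChannel() == null) return "offline";
        try {
            return computer.getChannel().call("ls");
        } catch (IOException e) {
            return "exception: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulatePingTimeout()); // prints "exception: channel is already closed"
    }
}
```

In this model the null check passes even though the channel can no longer carry a call, which is the "exceptions all over the place" symptom.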

      Expected behavior:

      • The broken/half-closed/fully-closed channel is disassociated from the computer, which will therefore appear disconnected to all possible clients.
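One way to get the expected behavior is to detach the channel from the computer first and only then close it, recording the reason the computer went offline. The sketch below illustrates that ordering under stated assumptions; ToyChannel, ToyComputer, onPingTimeout and getOfflineCause are hypothetical names for illustration, not the Jenkins API or the actual patch.

```java
import java.io.IOException;

public class DisconnectOnPingTimeoutDemo {
    /** Hypothetical stand-in for a channel whose close may never complete. */
    static final class ToyChannel {
        private boolean outClosed;

        void closeOutgoing() { outClosed = true; }
        boolean isOutClosed() { return outClosed; }
    }

    /** Hypothetical stand-in for a computer that owns at most one channel. */
    static final class ToyComputer {
        private ToyChannel channel;
        private String offlineCause;

        void setChannel(ToyChannel c) { channel = c; }
        ToyChannel getChannel() { return channel; }
        String getOfflineCause() { return offlineCause; }
        boolean isOnline() { return channel != null; }

        /** Detach first, then close: clients never see the half-closed channel. */
        void onPingTimeout(IOException cause) {
            ToyChannel dead = channel;
            channel = null;                                   // computer appears disconnected from here on
            offlineCause = "Ping timeout: " + cause.getMessage();
            if (dead != null) dead.closeOutgoing();           // may stay half closed; no longer reachable
        }
    }

    public static void main(String[] args) {
        ToyComputer computer = new ToyComputer();
        ToyChannel ch = new ToyChannel();
        computer.setChannel(ch);

        computer.onPingTimeout(new IOException("no response from agent"));

        System.out.println(computer.isOnline());        // false
        System.out.println(computer.getOfflineCause()); // Ping timeout: no response from agent
    }
}
```

With this ordering, a client that only checks the channel for null sees the computer as offline even if the peer never acknowledges the close.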

          [JENKINS-46680] Computer offline by ping thread leaves the channel half open

          Oliver Gondža added a comment - Fix proposed.

          SCM/JIRA link daemon added a comment - Code changed in jenkins
          User: Oliver Gondža
          Path:
          core/src/main/java/hudson/slaves/ChannelPinger.java
          core/src/test/java/hudson/slaves/ChannelPingerTest.java
          test/src/test/java/hudson/slaves/PingThreadTest.java
          http://jenkins-ci.org/commit/jenkins/dbb5e443b96ddc7472207862e9e60d807666f72c
          Log:
          JENKINS-46680 Disconnect computer on ping timeout (#3005)

          • JENKINS-46680 Reproduce in unittest
          • [FIX JENKINS-46680] Reset SlaveComputer channel before closing it on ping timeout
          • JENKINS-46680 Attach channel termination offline cause on ping timeouts

          Oliver Gondža added a comment - Postponing backport to 2.73.3 as it is fairly new for .2 and I would like to see this soaked properly.

          SCM/JIRA link daemon added a comment - Code changed in jenkins
          User: Oliver Gondža
          Path:
          core/src/main/java/hudson/slaves/ChannelPinger.java
          core/src/test/java/hudson/slaves/ChannelPingerTest.java
          test/src/test/java/hudson/slaves/PingThreadTest.java
          http://jenkins-ci.org/commit/jenkins/06b0cd637c79728d7a9b552c36ca59f5c0260e26
          Log:
          JENKINS-46680 Disconnect computer on ping timeout (#3005)

          • JENKINS-46680 Reproduce in unittest
          • [FIX JENKINS-46680] Reset SlaveComputer channel before closing it on ping timeout
          • JENKINS-46680 Attach channel termination offline cause on ping timeouts

          (cherry picked from commit dbb5e443b96ddc7472207862e9e60d807666f72c)

            olivergondza Oliver Gondža