
Computer offline by ping thread leaves the channel half open

      Reproducer:

      Launch a local agent over the ssh/command launcher and suspend its process with kill -TSTP $PID. The agent stops responding; Jenkins eventually notices and closes its connection with a clear exception.

      Actual behavior:

      • The channel is never disassociated from its computer, so long-running operations and other clients that only check computer.channel != null keep using it and throw exceptions all over the place. EDIT: The computer is not even marked temporarily offline, and the situation does not seem to improve after all monitors have run, as they all choke on the closed channel.
      • The channel is stuck in the middle of its closing procedure: it is outClosed but not inClosed. The other end does not send the close command, for obvious reasons, so the channel is never fully closed. I speculate that this is specifically why SlaveComputer#closeChannel() is not called, which causes the previous problem.

      Expected behavior:

      • The broken/half-closed/fully-closed channel is disassociated from the computer, which therefore appears disconnected to all possible clients.
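
The difference between the actual and the expected behavior can be sketched with a toy model. This is only an illustration under stated assumptions: ToyChannel, ToyComputer, closeLocally() and onPingTimeout() are hypothetical names invented for this sketch and are not the Jenkins or remoting classes. The sketch assumes that a locally closed channel rejects further calls while the suspended peer never acknowledges the close.

```java
import java.io.IOException;

public class HalfClosedChannelSketch {

    /** Hypothetical stand-in for a remoting channel; not the Jenkins API. */
    static class ToyChannel {
        boolean outClosed; // this side has stopped sending
        boolean inClosed;  // the peer has acknowledged the close

        /** Local close only: a suspended peer never answers, so inClosed stays false. */
        void closeLocally() {
            outClosed = true;
        }

        String call(String request) throws IOException {
            if (outClosed) {
                throw new IOException("channel is closing or closed");
            }
            return "ok: " + request;
        }
    }

    /** Hypothetical stand-in for a computer that holds a channel reference. */
    static class ToyComputer {
        private volatile ToyChannel channel;
        private volatile String offlineCause;

        void connect(ToyChannel c) {
            channel = c;
            offlineCause = null;
        }

        ToyChannel getChannel() {
            return channel;
        }

        /** Mirrors clients that only check for a non-null channel. */
        boolean isOnline() {
            return channel != null;
        }

        String getOfflineCause() {
            return offlineCause;
        }

        /** Expected behavior: drop the reference first, then close the channel. */
        void onPingTimeout(String cause) {
            ToyChannel c = channel;
            channel = null;
            offlineCause = cause;
            if (c != null) {
                c.closeLocally();
            }
        }
    }

    public static void main(String[] args) {
        ToyComputer computer = new ToyComputer();
        ToyChannel channel = new ToyChannel();
        computer.connect(channel);

        // Actual behavior: the channel is closed locally but stays attached.
        channel.closeLocally();
        System.out.println("online after local close: " + computer.isOnline());
        try {
            computer.getChannel().call("ping");
        } catch (IOException e) {
            System.out.println("call failed: " + e.getMessage());
        }

        // Expected behavior: the computer is disassociated from the channel.
        computer.onPingTimeout("ping timeout");
        System.out.println("online after ping timeout: " + computer.isOnline());
    }
}
```

In this model, a client that only checks for a non-null channel still sees the computer as online after the local close and fails on its next call. Once the reference is dropped, the same check reports the computer as disconnected.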

          [JENKINS-46680] Computer offline by ping thread leaves the channel half open

          Oliver Gondža created issue -
          Oliver Gondža made changes -
          Description: added the "EDIT" sentence to the first "Actual behavior" item; the rest of the description is unchanged.
          Oliver Gondža made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Oliver Gondža made changes -
          Assignee New: Oliver Gondža [ olivergondza ]
          Oliver Gondža made changes -
          Labels New: robustness

          Oliver Gondža added a comment - Fix proposed.
          Oliver Gondža made changes -
          Remote Link New: This issue links to "PR 3005 (Web Link)" [ 17645 ]

          SCM/JIRA link daemon added a comment -
          Code changed in jenkins
          User: Oliver Gondža
          Path:
          core/src/main/java/hudson/slaves/ChannelPinger.java
          core/src/test/java/hudson/slaves/ChannelPingerTest.java
          test/src/test/java/hudson/slaves/PingThreadTest.java
          http://jenkins-ci.org/commit/jenkins/dbb5e443b96ddc7472207862e9e60d807666f72c
          Log:
          JENKINS-46680 Disconnect computer on ping timeout (#3005)

          • JENKINS-46680 Reproduce in unittest
          • [FIX JENKINS-46680] Reset SlaveComputer channel before closing it on ping timeout
          • JENKINS-46680 Attach channel termination offline cause on ping timeouts

          SCM/JIRA link daemon made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
          Daniel Beck made changes -
          Status Original: Resolved [ 5 ] New: In Review [ 10005 ]
