Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-47965

IOHub may hang infinitely on Windows if Selector does not get data

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Critical Critical
    • remoting
    • None
    • Win 10, amd64, Oracle JDK 8u131
      Remoting 3.14

      I discovered it during the cleanup of JENKINS-38696.

      Windows implementation of NIO Channel Selector does not request "Select now" in https://github.com/jenkinsci/remoting/blob/6165d6fb6a71a7f4fce52ce5fc4cac479052ce04/src/main/java/org/jenkinsci/remoting/protocol/IOHub.java#L436 and hence calls selector#select()... and this method has no timeout.

      When the code gets into this branch, Selector will be waiting infinitely without calling selector#isOpen(). Such implementation relies on the Selector implementation, which shout interrupt the select() wait if the selector is being closed. But apparently it does not no my machine (Win 10, amd64, Oracle JDK 8u131)

      I propose to just add wait timeout and make the IOHub implementation to loop in the logic.

          [JENKINS-47965] IOHub may hang infinitely on Windows if Selector does not get data

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          Jenkinsfile
          src/main/java/org/jenkinsci/remoting/protocol/IOHub.java
          http://jenkins-ci.org/commit/remoting/a916c6474de134874818c9847253fb163d360dbc
          Log:
          [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows. (#221)

          • [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows.

          Windows implementation of NIO Channel Selector does not request "Select now" in https://github.com/jenkinsci/remoting/blob/6165d6fb6a71a7f4fce52ce5fc4cac479052ce04/src/main/java/org/jenkinsci/remoting/protocol/IOHub.java#L436 and hence calls selector#select()... and this method has no timeout.
          When the code gets into this branch, Selector will be waiting infinitely without calling selector#isOpen(). Such implementation relies on the Selector implementation, which shout interrupt the select() wait if the selector is being closed. But apparently it does not no my machine (Win 10, amd64, Oracle JDK 8u131)

          This change adds wait timeout and makes the IOHub implementation to loop in the logic.

          • JENKINS-47965 - Lame implementation of IOHub Selector wakeup thread for Windows
          • JENKINS-47965 - Replace Thread.sleep() by object wait()/notify()
          • JENKINS-47965 - Address comments from @jtnord, run Selector Watcher in Unix systems as well

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: Jenkinsfile src/main/java/org/jenkinsci/remoting/protocol/IOHub.java http://jenkins-ci.org/commit/remoting/a916c6474de134874818c9847253fb163d360dbc Log: [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows. (#221) [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows. Windows implementation of NIO Channel Selector does not request "Select now" in https://github.com/jenkinsci/remoting/blob/6165d6fb6a71a7f4fce52ce5fc4cac479052ce04/src/main/java/org/jenkinsci/remoting/protocol/IOHub.java#L436 and hence calls selector#select()... and this method has no timeout. When the code gets into this branch, Selector will be waiting infinitely without calling selector#isOpen(). Such implementation relies on the Selector implementation, which shout interrupt the select() wait if the selector is being closed. But apparently it does not no my machine (Win 10, amd64, Oracle JDK 8u131) This change adds wait timeout and makes the IOHub implementation to loop in the logic. JENKINS-47965 - Lame implementation of IOHub Selector wakeup thread for Windows JENKINS-47965 - Polish the implementation JENKINS-38696 - Add Windows back to CI and make @jtnord happy JENKINS-47965 - Simplify the logic JENKINS-47965 - Replace Thread.sleep() by object wait()/notify() JENKINS-47965 - Cleanup the locking logic JENKINS-47965 - Address comments from @jtnord, run Selector Watcher in Unix systems as well

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/src/main/java/hudson/Launcher.java
          core/src/main/java/hudson/model/Computer.java
          core/src/main/java/hudson/slaves/ChannelPinger.java
          core/src/main/java/hudson/slaves/SlaveComputer.java
          core/src/main/java/jenkins/FilePathFilter.java
          core/src/main/java/jenkins/slaves/StandardOutputSwapper.java
          pom.xml
          test/src/test/java/hudson/bugs/JnlpAccessWithSecuredHudsonTest.java
          test/src/test/java/jenkins/security/Security218CliTest.java
          http://jenkins-ci.org/commit/jenkins/cb3990a4d6094260bea4571e7079fd0e3949047f
          Log:
          Update to Remoting 3.15 and Cleanup issues in Channel#current() usages (#3145)

          Pulls in fixes for: JENKINS-48133, JENKINS-48055, JENKINS-37566, JENKINS-48309, JENKINS-47965, JENKINS-48130, JENKINS-37670, JENKINS-37566, JENKINS-46724

          This change also adds some missing null/closing channel checks in the core.
          In some cases the change prevents spawning threads if the channel is in the invalid state.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java core/src/main/java/hudson/model/Computer.java core/src/main/java/hudson/slaves/ChannelPinger.java core/src/main/java/hudson/slaves/SlaveComputer.java core/src/main/java/jenkins/FilePathFilter.java core/src/main/java/jenkins/slaves/StandardOutputSwapper.java pom.xml test/src/test/java/hudson/bugs/JnlpAccessWithSecuredHudsonTest.java test/src/test/java/jenkins/security/Security218CliTest.java http://jenkins-ci.org/commit/jenkins/cb3990a4d6094260bea4571e7079fd0e3949047f Log: Update to Remoting 3.15 and Cleanup issues in Channel#current() usages (#3145) Pulls in fixes for: JENKINS-48133 , JENKINS-48055 , JENKINS-37566 , JENKINS-48309 , JENKINS-47965 , JENKINS-48130 , JENKINS-37670 , JENKINS-37566 , JENKINS-46724 This change also adds some missing null/closing channel checks in the core. In some cases the change prevents spawning threads if the channel is in the invalid state.

          Oleg Nenashev added a comment -

          The change has been released in Remoting 3.15 and integrated towards Jenkins 2.98

          Oleg Nenashev added a comment - The change has been released in Remoting 3.15 and integrated towards Jenkins 2.98

            Unassigned Unassigned
            oleg_nenashev Oleg Nenashev
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: