Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-47965

IOHub may hang infinitely on Windows if Selector does not get data

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved (View Workflow)
    • Critical
    • Resolution: Fixed
    • remoting
    • None
    • Win 10, amd64, Oracle JDK 8u131
      Remoting 3.14

    Description

      I discovered it during the cleanup of JENKINS-38696.

      Windows implementation of NIO Channel Selector does not request "Select now" in https://github.com/jenkinsci/remoting/blob/6165d6fb6a71a7f4fce52ce5fc4cac479052ce04/src/main/java/org/jenkinsci/remoting/protocol/IOHub.java#L436 and hence calls selector#select()... and this method has no timeout.

      When the code gets into this branch, Selector will be waiting infinitely without calling selector#isOpen(). Such implementation relies on the Selector implementation, which shout interrupt the select() wait if the selector is being closed. But apparently it does not no my machine (Win 10, amd64, Oracle JDK 8u131)

      I propose to just add wait timeout and make the IOHub implementation to loop in the logic.

      Attachments

        Issue Links

          Activity

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            Jenkinsfile
            src/main/java/org/jenkinsci/remoting/protocol/IOHub.java
            http://jenkins-ci.org/commit/remoting/a916c6474de134874818c9847253fb163d360dbc
            Log:
            [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows. (#221)

            • [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows.

            Windows implementation of NIO Channel Selector does not request "Select now" in https://github.com/jenkinsci/remoting/blob/6165d6fb6a71a7f4fce52ce5fc4cac479052ce04/src/main/java/org/jenkinsci/remoting/protocol/IOHub.java#L436 and hence calls selector#select()... and this method has no timeout.
            When the code gets into this branch, Selector will be waiting infinitely without calling selector#isOpen(). Such implementation relies on the Selector implementation, which shout interrupt the select() wait if the selector is being closed. But apparently it does not no my machine (Win 10, amd64, Oracle JDK 8u131)

            This change adds wait timeout and makes the IOHub implementation to loop in the logic.

            • JENKINS-47965 - Lame implementation of IOHub Selector wakeup thread for Windows
            • JENKINS-47965 - Replace Thread.sleep() by object wait()/notify()
            • JENKINS-47965 - Address comments from @jtnord, run Selector Watcher in Unix systems as well
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: Jenkinsfile src/main/java/org/jenkinsci/remoting/protocol/IOHub.java http://jenkins-ci.org/commit/remoting/a916c6474de134874818c9847253fb163d360dbc Log: [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows. (#221) [JENKINS-47965, JENKINS-38696] - IOHub should not wait infinitely for Selector in Windows. Windows implementation of NIO Channel Selector does not request "Select now" in https://github.com/jenkinsci/remoting/blob/6165d6fb6a71a7f4fce52ce5fc4cac479052ce04/src/main/java/org/jenkinsci/remoting/protocol/IOHub.java#L436 and hence calls selector#select()... and this method has no timeout. When the code gets into this branch, Selector will be waiting infinitely without calling selector#isOpen(). Such implementation relies on the Selector implementation, which shout interrupt the select() wait if the selector is being closed. But apparently it does not no my machine (Win 10, amd64, Oracle JDK 8u131) This change adds wait timeout and makes the IOHub implementation to loop in the logic. JENKINS-47965 - Lame implementation of IOHub Selector wakeup thread for Windows JENKINS-47965 - Polish the implementation JENKINS-38696 - Add Windows back to CI and make @jtnord happy JENKINS-47965 - Simplify the logic JENKINS-47965 - Replace Thread.sleep() by object wait()/notify() JENKINS-47965 - Cleanup the locking logic JENKINS-47965 - Address comments from @jtnord, run Selector Watcher in Unix systems as well

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            core/src/main/java/hudson/Launcher.java
            core/src/main/java/hudson/model/Computer.java
            core/src/main/java/hudson/slaves/ChannelPinger.java
            core/src/main/java/hudson/slaves/SlaveComputer.java
            core/src/main/java/jenkins/FilePathFilter.java
            core/src/main/java/jenkins/slaves/StandardOutputSwapper.java
            pom.xml
            test/src/test/java/hudson/bugs/JnlpAccessWithSecuredHudsonTest.java
            test/src/test/java/jenkins/security/Security218CliTest.java
            http://jenkins-ci.org/commit/jenkins/cb3990a4d6094260bea4571e7079fd0e3949047f
            Log:
            Update to Remoting 3.15 and Cleanup issues in Channel#current() usages (#3145)

            Pulls in fixes for: JENKINS-48133, JENKINS-48055, JENKINS-37566, JENKINS-48309, JENKINS-47965, JENKINS-48130, JENKINS-37670, JENKINS-37566, JENKINS-46724

            This change also adds some missing null/closing channel checks in the core.
            In some cases the change prevents spawning threads if the channel is in the invalid state.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java core/src/main/java/hudson/model/Computer.java core/src/main/java/hudson/slaves/ChannelPinger.java core/src/main/java/hudson/slaves/SlaveComputer.java core/src/main/java/jenkins/FilePathFilter.java core/src/main/java/jenkins/slaves/StandardOutputSwapper.java pom.xml test/src/test/java/hudson/bugs/JnlpAccessWithSecuredHudsonTest.java test/src/test/java/jenkins/security/Security218CliTest.java http://jenkins-ci.org/commit/jenkins/cb3990a4d6094260bea4571e7079fd0e3949047f Log: Update to Remoting 3.15 and Cleanup issues in Channel#current() usages (#3145) Pulls in fixes for: JENKINS-48133 , JENKINS-48055 , JENKINS-37566 , JENKINS-48309 , JENKINS-47965 , JENKINS-48130 , JENKINS-37670 , JENKINS-37566 , JENKINS-46724 This change also adds some missing null/closing channel checks in the core. In some cases the change prevents spawning threads if the channel is in the invalid state.
            oleg_nenashev Oleg Nenashev added a comment -

            The change has been released in Remoting 3.15 and integrated towards Jenkins 2.98

            oleg_nenashev Oleg Nenashev added a comment - The change has been released in Remoting 3.15 and integrated towards Jenkins 2.98

            People

              Unassigned Unassigned
              oleg_nenashev Oleg Nenashev
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: