• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core, remoting
    • None

      As of 1.520, AbstractNodeMonitorDescriptor monitors nodes sequentially. As the number of slaves goes up, a monitoring pass takes a long time to complete, and this also makes monitoring susceptible to a hang.

      While a ping thread is there to detect unresponsive nodes, its interval is 10 minutes and its timeout is 4 minutes, so a few unresponsive nodes can quickly push the total running time of node monitoring beyond the default monitoring cycle of 1 hour.

      A better approach is to make asynchronous remoting calls to all the slaves at once, then wait for the results to come back. This way, we still get results from the nodes that are functioning, even when others are unresponsive.
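
The fan-out-then-wait idea can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual Jenkins change: `ParallelMonitorSketch` and `monitorAll` are hypothetical names, each node check is modeled as an ordinary `Callable` rather than a real remoting call, and the sketch uses one thread per node for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Jenkins: runs one check per node in parallel
// and bounds the whole pass by a single timeout.
final class ParallelMonitorSketch {

    // Returns one entry per node. The value is empty if the check timed out,
    // threw an exception, or returned null.
    static <T> Map<String, Optional<T>> monitorAll(
            Map<String, Callable<T>> checks, long timeout, TimeUnit unit)
            throws InterruptedException {
        Map<String, Optional<T>> results = new LinkedHashMap<>();
        if (checks.isEmpty()) {
            return results;
        }

        // Keep names and tasks in the same order so results can be matched up.
        List<String> names = new ArrayList<>(checks.keySet());
        List<Callable<T>> tasks = new ArrayList<>();
        for (String name : names) {
            tasks.add(checks.get(name));
        }

        // Assumption: one thread per node; a real implementation would cap this.
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            // Start every check at once; tasks still running at the deadline
            // are cancelled, so one hung node cannot stall the others.
            List<Future<T>> futures = pool.invokeAll(tasks, timeout, unit);
            for (int i = 0; i < names.size(); i++) {
                Future<T> future = futures.get(i);
                Optional<T> value = Optional.empty();
                if (!future.isCancelled()) {
                    try {
                        value = Optional.ofNullable(future.get());
                    } catch (ExecutionException e) {
                        // The check itself failed; report "no result" for this node.
                    }
                }
                results.put(names.get(i), value);
            }
        } finally {
            pool.shutdownNow();
        }
        return results;
    }
}
```

With this shape, the total running time is bounded by the single timeout passed to `monitorAll` rather than by the sum of per-node timeouts, which is the property the sequential loop lacks.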

          [JENKINS-18438] Node monitoring should run in parallel

          Kohsuke Kawaguchi created issue -
          Kohsuke Kawaguchi made changes -
          Link New: This issue is related to JENKINS-18152 [ JENKINS-18152 ]
          SCM/JIRA link daemon made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          Philippe Jandot made changes -
          Assignee New: Kohsuke Kawaguchi [ kohsuke ]
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
          sogabe made changes -
          Link New: This issue is related to JENKINS-18671 [ JENKINS-18671 ]
          Jesse Glick made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
          R. Tyler Croy made changes -
          Workflow Original: JNJira [ 149745 ] New: JNJira + In-Review [ 193273 ]

            Assignee: Kohsuke Kawaguchi
            Reporter: Kohsuke Kawaguchi
            Votes: 2
            Watchers: 9
