Node monitoring should run in parallel



      As of 1.520, AbstractNodeMonitorDescriptor monitors nodes sequentially. As the number of slaves goes up, a full pass takes a long time to complete, and a single unresponsive node can hang the entire monitoring run.

      While a ping thread is there to detect unresponsive nodes, its interval is 10 minutes and its timeout is 4 minutes, so a few unresponsive nodes can quickly push the total running time of node monitoring beyond the default monitoring cycle of 1 hour.

      A better approach is to make asynchronous remoting calls to all the slaves at once, then wait for the results to come back. This way, we still get results from the nodes that are functioning, even when some nodes are unresponsive.
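
      The idea can be sketched roughly as follows. This is only an illustration built on plain java.util.concurrent, not the actual Jenkins remoting code: the class ParallelNodeMonitor, the method monitorAll, and the use of a Callable as a stand-in for a per-node remoting call are all hypothetical names chosen for this example. Every probe is started at once, and the whole pass is bounded by a single timeout, so a hung node costs at most that timeout rather than stalling the nodes behind it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: probe all nodes in parallel with one overall timeout. */
public class ParallelNodeMonitor {

    /**
     * Starts every probe at once and waits at most timeoutMillis for the whole batch.
     * Nodes whose probe failed or did not finish in time are mapped to null.
     * Each Callable stands in for an asynchronous call to one node (an assumption).
     */
    public static <T> Map<String, T> monitorAll(Map<String, Callable<T>> probes,
                                                long timeoutMillis)
            throws InterruptedException {
        Map<String, T> results = new LinkedHashMap<>();
        if (probes.isEmpty()) {
            return results;
        }

        // One thread per node so that no probe waits behind a hung one.
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        try {
            List<String> names = new ArrayList<>(probes.keySet());
            List<Callable<T>> tasks = new ArrayList<>();
            for (String name : names) {
                tasks.add(probes.get(name));
            }

            // Returns futures in the same order as the tasks; anything still
            // running when the timeout expires is cancelled.
            List<Future<T>> futures =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);

            for (int i = 0; i < names.size(); i++) {
                Future<T> future = futures.get(i);
                T value = null;
                if (!future.isCancelled()) {
                    try {
                        value = future.get(); // already complete, does not block
                    } catch (ExecutionException e) {
                        // The probe itself failed; report no data for this node.
                    }
                }
                results.put(names.get(i), value);
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

      With this shape, the total running time of a pass is bounded by the timeout passed to monitorAll, independent of how many nodes are unresponsive. A real implementation would presumably use the remoting layer's own asynchronous calls instead of a thread per node.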

            Assignee:
            Kohsuke Kawaguchi
            Reporter:
            Kohsuke Kawaguchi
            Archiver:
            Jenkins Service Account

              Created:
              Updated:
              Resolved:
              Archived: