  1. Jenkins
  2. JENKINS-18438

Node monitoring should run in parallel


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core, remoting
    • Labels: None

      As of 1.520, AbstractNodeMonitorDescriptor monitors nodes sequentially. As the number of slaves goes up, monitoring takes a long time to complete, and it also becomes susceptible to hangs.

      While a ping thread exists to detect unresponsive nodes, its interval is 10 minutes and its timeout is 4 minutes, so a few unresponsive nodes can quickly push the total running time of node monitoring beyond the default monitoring cycle of 1 hour.

      A better approach is to make asynchronous remoting calls to all the slaves at once, then wait for the results to come back. This way, we still get results from the slaves that are functioning, even if others hang.
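      A minimal sketch of the fan-out-then-collect idea follows. It is not the Jenkins implementation: plain java.util.concurrent stands in for the remoting calls, and ParallelNodeMonitor, monitorAll, probe, and timeoutMillis are hypothetical names chosen for illustration. Every probe is submitted up front, and all results are collected against a single shared deadline, so one hung node costs at most one timeout rather than one timeout per node.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical helper, not part of Jenkins core or remoting.
class ParallelNodeMonitor {

    /**
     * Starts one probe per node at once, then waits for the answers under a
     * single shared deadline. A node that fails or does not answer in time
     * maps to null instead of delaying the results of the other nodes.
     */
    static <T> Map<String, T> monitorAll(List<String> nodes,
                                         Function<String, T> probe,
                                         long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Phase 1: fan out. Nothing blocks here.
            Map<String, Future<T>> pending = new LinkedHashMap<>();
            for (String node : nodes) {
                Callable<T> task = () -> probe.apply(node);
                pending.put(node, pool.submit(task));
            }

            // Phase 2: collect. The deadline is shared by all nodes.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            Map<String, T> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<T>> entry : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    results.put(entry.getKey(), entry.getValue().get(remaining, TimeUnit.NANOSECONDS));
                } catch (TimeoutException | ExecutionException e) {
                    entry.getValue().cancel(true); // give up on this node only
                    results.put(entry.getKey(), null);
                }
            }
            return results;
        } finally {
            pool.shutdownNow(); // interrupt any probes that are still running
        }
    }
}
```

      With three nodes, one of which never answers, a call such as monitorAll(nodes, probe, 500) returns after roughly the timeout with values for the two healthy nodes and null for the hung one. In the sequential approach the hung node would instead block every node queued behind it.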

            Assignee: Kohsuke Kawaguchi (kohsuke)
            Reporter: Kohsuke Kawaguchi (kohsuke)