  1. Jenkins
  2. JENKINS-18438

Node monitoring should run in parallel

      Description

      As of 1.520, AbstractNodeMonitorDescriptor monitors nodes sequentially. As the number of slaves goes up, a monitoring pass takes a long time to complete, and it also makes monitoring susceptible to hangs.

      While a ping thread is there to detect unresponsive nodes, its interval is 10 minutes and its timeout is 4 minutes, so a few unresponsive nodes can quickly push the total running time of node monitoring beyond the default monitoring cycle of 1 hour.

      A better approach is to make asynchronous remoting calls to all the slaves at once, then wait for the results to come back. This way, we still get results from the nodes that are functioning, even when others are unresponsive.
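
      The approach described above can be sketched roughly as follows. This is only an illustration, not the code from the eventual fix: it uses a plain java.util.concurrent thread pool as a stand-in for asynchronous remoting calls, and the names ParallelMonitorSketch, monitorAll, and probes are hypothetical. All probes start at once and share a single deadline, so one hung node costs at most one timeout rather than one timeout per node.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a thread pool stands in for asynchronous remoting calls.
public class ParallelMonitorSketch {

    /**
     * Starts every probe at once, then collects the results against one
     * shared deadline. Nodes that fail or miss the deadline map to null.
     */
    public static <T> Map<String, T> monitorAll(Map<String, Callable<T>> probes,
                                                long timeout, TimeUnit unit)
            throws InterruptedException {
        if (probes.isEmpty()) {
            return new LinkedHashMap<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(probes.size());
        try {
            // Phase 1: fire off all probes without waiting for any of them.
            Map<String, Future<T>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<T>> e : probes.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }

            // Phase 2: wait for each result, but never past the shared deadline.
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            Map<String, T> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<T>> e : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    results.put(e.getKey(), e.getValue().get(remaining, TimeUnit.NANOSECONDS));
                } catch (ExecutionException | TimeoutException ex) {
                    // The probe failed or is unresponsive: record "no data" and move on.
                    e.getValue().cancel(true);
                    results.put(e.getKey(), null);
                }
            }
            return results;
        } finally {
            // Interrupt anything still running so hung probes do not leak threads.
            pool.shutdownNow();
        }
    }
}
```

      With this shape, the total running time of a pass is bounded by the timeout regardless of how many nodes are unresponsive, which is the property the sequential loop lacks.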

            Activity

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            changelog.html
            core/src/main/java/hudson/FilePath.java
            core/src/main/java/hudson/model/Node.java
            core/src/main/java/hudson/model/Slave.java
            core/src/main/java/hudson/node_monitors/AbstractAsyncNodeMonitorDescriptor.java
            core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
            core/src/main/java/hudson/node_monitors/ArchitectureMonitor.java
            core/src/main/java/hudson/node_monitors/ClockMonitor.java
            core/src/main/java/hudson/node_monitors/DiskSpaceMonitor.java
            core/src/main/java/hudson/node_monitors/DiskSpaceMonitorDescriptor.java
            core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java
            core/src/main/java/hudson/node_monitors/SwapSpaceMonitor.java
            core/src/main/java/hudson/node_monitors/TemporarySpaceMonitor.java
            core/src/main/java/jenkins/model/Jenkins.java
            core/src/test/java/hudson/slaves/NodeListTest.java
            http://jenkins-ci.org/commit/jenkins/735713801b130fe247cf17bbca7b4561e41b1d13
            Log:
            [FIXED JENKINS-18438]

            Node monitoring should run in parallel to reduce the total round-trip
            time in large instances.

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #2671
            [FIXED JENKINS-18438] (Revision 735713801b130fe247cf17bbca7b4561e41b1d13)

            Result = SUCCESS
            kohsuke : 735713801b130fe247cf17bbca7b4561e41b1d13
            Files :

            • core/src/test/java/hudson/slaves/NodeListTest.java
            • core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
            • core/src/main/java/hudson/FilePath.java
            • core/src/main/java/hudson/node_monitors/TemporarySpaceMonitor.java
            • core/src/main/java/jenkins/model/Jenkins.java
            • core/src/main/java/hudson/model/Node.java
            • core/src/main/java/hudson/node_monitors/ClockMonitor.java
            • core/src/main/java/hudson/node_monitors/DiskSpaceMonitorDescriptor.java
            • core/src/main/java/hudson/node_monitors/DiskSpaceMonitor.java
            • core/src/main/java/hudson/node_monitors/ArchitectureMonitor.java
            • core/src/main/java/hudson/node_monitors/SwapSpaceMonitor.java
            • core/src/main/java/hudson/model/Slave.java
            • core/src/main/java/hudson/node_monitors/AbstractAsyncNodeMonitorDescriptor.java
            • changelog.html
            • core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java
            vlatombe Vincent Latombe added a comment -

            I already commented on https://github.com/jenkinsci/jenkins/commit/735713801b130fe247cf17bbca7b4561e41b1d13, but this change breaks time-based monitoring (clock difference and response time) and causes some slaves to time out, whereas they used to connect in an acceptable time before the change.

            zfil Philippe Jandot added a comment -

            This change breaks the node monitoring report.
            See the previous comments.

            jglick Jesse Glick added a comment -

            JENKINS-18671 was filed separately; leave this closed and use linked issues for any regressions.


              People

              Assignee:
              kohsuke Kohsuke Kawaguchi
              Reporter:
              kohsuke Kohsuke Kawaguchi
              Votes:
              2
              Watchers:
              9
