[JENKINS-2548] Node does not come back online after disk space cleared

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • remoting
    • None
    • Platform: All, OS: All

      We are using Hudson as a single master server, and it went offline because less
      than 1GB of disk space was available.

      After we clear some disk space, Hudson does not come back online until we restart
      the servlet container. Could it not detect that enough disk space is available
      again and come back online automatically?
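      A minimal sketch of the behaviour the reporter asks for, assuming a periodic check of free
      space against the 1GB threshold mentioned above. The node is flagged offline when usable
      space drops below the threshold, and the flag is cleared again once space is freed, with no
      restart. The class `DiskSpaceCheck` and its methods are hypothetical illustrations, not
      Hudson's API; only `java.io.File.getUsableSpace()` is a standard JDK call.

```java
import java.io.File;

// Hypothetical illustration only; not Hudson/Jenkins code.
public class DiskSpaceCheck {
    // Threshold from the report: less than 1GB free takes the node offline.
    static final long THRESHOLD_BYTES = 1024L * 1024 * 1024;

    private boolean offline = false;

    /** Re-evaluates the node state from the latest free-space reading. */
    public boolean update(long usableBytes) {
        // Both directions are handled on every check: offline when space is low,
        // and back online automatically once enough space has been freed.
        offline = usableBytes < THRESHOLD_BYTES;
        return offline;
    }

    /** Convenience overload that reads the usable space of a directory. */
    public boolean update(File dir) {
        return update(dir.getUsableSpace());
    }

    public boolean isOffline() {
        return offline;
    }

    public static void main(String[] args) {
        DiskSpaceCheck check = new DiskSpaceCheck();
        System.out.println(check.update(512L * 1024 * 1024));      // true: below 1GB
        System.out.println(check.update(5L * 1024 * 1024 * 1024)); // false: space cleared
    }
}
```

      The key design point is that the check re-derives the state each time instead of only ever
      setting "offline", which is what would otherwise require a restart to undo.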


          manderson23 created issue -
          Andrew Bayer made changes -
          Assignee New: Andrew Bayer [ abayer ]

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Andrew Bayer
          Path:
          changelog.html
          core/src/main/java/hudson/node_monitors/AbstractDiskSpaceMonitor.java
          core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
          core/src/main/resources/hudson/node_monitors/Messages.properties
          http://jenkins-ci.org/commit/jenkins/e38e687d5b66238f406d1e3642a3d5f6a02aaeb2
          Log:
          [FIXED JENKINS-2548] Slaves taken offline for low disk space will now
          come back online when disk space becomes available.

          SCM/JIRA link daemon made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          dogfood added a comment -

          Integrated in jenkins_main_trunk #1334
          [FIXED JENKINS-2548] Slaves taken offline for low disk space will now

          Andrew Bayer : e38e687d5b66238f406d1e3642a3d5f6a02aaeb2
          Files :

          • core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
          • core/src/main/java/hudson/node_monitors/AbstractDiskSpaceMonitor.java
          • changelog.html
          • core/src/main/resources/hudson/node_monitors/Messages.properties


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/node_monitors/AbstractDiskSpaceMonitor.java
          core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
          core/src/main/resources/hudson/node_monitors/Messages.properties
          http://jenkins-ci.org/commit/jenkins/706b2dfd71904224399e52843233c12e219803e4
          Log:
          Revert "[FIXED JENKINS-2548] Slaves taken offline for low disk space will now"

          This reverts commit e38e687d5b66238f406d1e3642a3d5f6a02aaeb2.

          Compare: https://github.com/jenkinsci/jenkins/compare/e38e687...706b2df

          Kohsuke Kawaguchi made changes -
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

          dogfood added a comment -

          Integrated in jenkins_main_trunk #1335
          Revert "[FIXED JENKINS-2548] Slaves taken offline for low disk space will now"

          Kohsuke Kawaguchi : 706b2dfd71904224399e52843233c12e219803e4
          Files :

          • core/src/main/java/hudson/node_monitors/AbstractNodeMonitorDescriptor.java
          • changelog.html
          • core/src/main/resources/hudson/node_monitors/Messages.properties
          • core/src/main/java/hudson/node_monitors/AbstractDiskSpaceMonitor.java


          Andrew Bayer added a comment -

          kohsuke - what would be the best way to record in the DiskSpace OfflineCause which specific monitor is the reason? Subclassing it further, or adding a flag of some sort?


          Kohsuke Kawaguchi added a comment -

          I think we need Computers to treat NodeMonitors as something special. We can have Computers remember the set of NodeMonitors that are raising a red flag, and isOffline() would check whether this set is empty. This leaves the "temporarily offline" concept for the administrator's use alone.

          This also means NodeMonitors should have a backdoor to raise/drop this red flag, and existing NodeMonitors should be modified to use this mechanism so that automatic on/off and administrative manual on/off will not collide with each other.

          I think such a distinction is the only way to make it work correctly in the presence of multiple node monitors reporting problems.
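          A minimal sketch of the proposal above, assuming each monitor is identified by a name and
          that a node counts as offline when either the administrator has set "temporarily offline"
          or at least one monitor still has its flag raised. `MonitoredComputer`, `raiseFlag`,
          `dropFlag` and the monitor name strings are hypothetical, chosen for illustration; they are
          not the Jenkins `Computer` or `NodeMonitor` API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposal; names are illustrative, not Jenkins API.
public class MonitoredComputer {
    // Monitors currently reporting a problem ("raising a red flag").
    private final Set<String> redFlags = new HashSet<>();
    // Administrator-controlled "temporarily offline" switch, kept separate.
    private boolean temporarilyOffline = false;

    /** Backdoor for a node monitor to report a problem. */
    public void raiseFlag(String monitor) {
        redFlags.add(monitor);
    }

    /** Backdoor for a node monitor to clear only its own problem. */
    public void dropFlag(String monitor) {
        redFlags.remove(monitor);
    }

    /** Manual on/off, reserved for the administrator. */
    public void setTemporarilyOffline(boolean value) {
        temporarilyOffline = value;
    }

    /** Offline if the admin says so, or if any monitor still has a flag raised. */
    public boolean isOffline() {
        return temporarilyOffline || !redFlags.isEmpty();
    }

    public Set<String> getRedFlags() {
        return Collections.unmodifiableSet(redFlags);
    }

    public static void main(String[] args) {
        MonitoredComputer c = new MonitoredComputer();
        c.raiseFlag("diskSpace");
        c.raiseFlag("tempSpace");
        c.dropFlag("diskSpace");
        System.out.println(c.isOffline()); // true: another monitor still reports a problem
        c.dropFlag("tempSpace");
        c.setTemporarilyOffline(true);
        System.out.println(c.isOffline()); // true: admin switch is independent of monitors
        c.setTemporarilyOffline(false);
        System.out.println(c.isOffline()); // false
    }
}
```

          Because each monitor only ever removes its own entry, clearing disk space cannot bring a
          node back online while another monitor, or the administrator, still wants it offline.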


            Assignee: Andrew Bayer (abayer)
            Reporter: manderson23
            Votes: 8
            Watchers: 9
