Disk Space AbstractNodeMonitorDescriptor is unreasonably slow to update


      DiskSpaceMonitorDescriptor (used to check free space on the temp and workspace partitions) inherits (with a few intermediate classes) from AbstractNodeMonitorDescriptor, whose default scheduling interval is one hour. That means if an agent runs out of space it could take an hour before Jenkins detects the problem and takes the node offline. (There are some other code paths – such as onConnect – that can trigger an update, but I believe one hour remains the worst case.)
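
      To make the worst case concrete, here is a toy model of a fixed-interval check. It is not Jenkins code: the class and method names are made up for illustration, and it assumes checks fire at exact multiples of the interval and ignores the other update triggers such as onConnect.

```java
import java.time.Duration;

/** Toy model of a fixed-interval monitor; hypothetical, not part of Jenkins. */
final class DetectionLatency {

    /**
     * Returns how long a failure goes unnoticed when checks run at
     * 0, interval, 2 * interval, ... and the failure happens at failureTime.
     */
    static Duration delayUntilDetected(Duration interval, Duration failureTime) {
        long i = interval.toMillis();
        long f = failureTime.toMillis();
        // Round the failure time up to the first check at or after it.
        long nextCheck = ((f + i - 1) / i) * i;
        return Duration.ofMillis(nextCheck - f);
    }

    public static void main(String[] args) {
        Duration justAfterCheck = Duration.ofMillis(1);
        // Disk fills up right after a check: almost the whole interval passes.
        System.out.println(delayUntilDetected(Duration.ofHours(1), justAfterCheck));
        System.out.println(delayUntilDetected(Duration.ofMinutes(1), justAfterCheck));
    }
}
```

      Under these assumptions a disk that fills up just after a check stays undetected for nearly the full interval, so the interval itself is effectively the worst-case detection delay.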

      I've run into this multiple times: a job fills up an agent, a subsequent job fails, yet the agent is still marked as online.

      I believe one hour is not a reasonable modern value for such a quick check, but I am unsure how to proceed:

      • Change AbstractNodeMonitorDescriptor to a "more reasonable" default?
      • Make this a configurable value?
      • Make it a configurable value per check?
      • A fancy dynamic scheduler with backoff?
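
      As a rough sketch of the "configurable per check" option, the lookup could be as small as the following. Everything here is an assumption for illustration: the property prefix, the class name, and the one-minute fallback are hypothetical and do not correspond to any existing Jenkins setting or API.

```java
import java.time.Duration;

/** Hypothetical per-check interval lookup; not an existing Jenkins API. */
final class MonitorIntervals {

    /** Assumed fallback, matching the one-minute default suggested in this issue. */
    static final Duration DEFAULT_INTERVAL = Duration.ofMinutes(1);

    /** Made-up system property prefix, e.g. -Dmonitor.interval.seconds.diskSpace=30 */
    static final String PROPERTY_PREFIX = "monitor.interval.seconds.";

    /** Returns the configured interval for a check, or the default if unset or invalid. */
    static Duration intervalFor(String checkName) {
        // Long.getLong returns null when the property is missing or not a number.
        Long seconds = Long.getLong(PROPERTY_PREFIX + checkName);
        if (seconds == null || seconds <= 0) {
            return DEFAULT_INTERVAL;
        }
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY_PREFIX + "diskSpace", "30");
        System.out.println(intervalFor("diskSpace"));    // overridden for this check
        System.out.println(intervalFor("responseTime")); // falls back to the default
    }
}
```

      A single global setting would be the same lookup without the check name in the key. Falling back to the default on a missing, non-numeric, or non-positive value keeps a typo in the configuration from disabling the check.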

      My own inclination is that one minute would be a reasonable and unsurprising default.

            Assignee:
            Unassigned
            Reporter:
            Chris Burroughs
            Archiver:
            Jenkins Service Account
