[JENKINS-67979] Durable task fails to stop (cleanly) in case of disk full

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: durable-task-plugin
    • Labels: None
    • Environment: Jenkins LTS 2.319.3
      Plugin: Durable Task 493.v195aefbb0ff2

      A job using Docker is running and consuming more and more disk space.
      This results in the Jenkins disk partition filling up.
      Affected Jenkins setup: no separate build agent; Docker (/var/lib/docker) and Jenkins (/var/lib/jenkins) share the same partition.

      If the job's step had exited and its script had terminated, the next step would have freed the disk space.
      This is likely NOT a trivial problem to solve, since executing the next step might itself fail (no disk space to create the temporary script for the next command?).
      As far as I can tell, the process was stopped properly, BUT Jenkins did not acknowledge that the process had failed and instead logged a lot of errors that consumed even more log/disk space.

      Suggested improvement: better handling of nearly-full disks, e.g. not starting build jobs when less than x GiB of disk space is free.
      Docker was called from the command line, so the Jenkins Docker plugin is not involved (IMHO).
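The suggested improvement can be sketched as a pre-build check. This is a minimal Python sketch under stated assumptions, not an existing Jenkins feature: `has_enough_free_space` and `check_before_build` are hypothetical helpers, and the threshold stays a parameter because the report leaves "x GiB" open. It relies only on `shutil.disk_usage` from the standard library.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_enough_free_space(free_bytes: int, min_free_gib: float) -> bool:
    """Pure threshold test: is `free_bytes` at least `min_free_gib` GiB?"""
    return free_bytes >= min_free_gib * GIB


def check_before_build(path: str, min_free_gib: float) -> None:
    """Raise RuntimeError if the filesystem holding `path` is below the threshold.

    Hypothetical helper: meant to run before a build starts, so the job fails
    fast with a clear message instead of filling the partition part-way through.
    """
    free = shutil.disk_usage(path).free  # free bytes on the filesystem holding `path`
    if not has_enough_free_space(free, min_free_gib):
        raise RuntimeError(
            f"refusing to start build: {free / GIB:.1f} GiB free on {path}, "
            f"need at least {min_free_gib} GiB"
        )
```

For example, a wrapper could call `check_before_build("/var/lib/jenkins", 10)` before invoking docker, where `/var/lib/jenkins` is the path from this report and 10 GiB is an arbitrary stand-in for x. Keeping the comparison in a separate pure function makes the threshold logic testable without a real disk.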

      I will now change the Jenkins setup and isolate the Docker daemon's disk usage. I just thought it might help to report the problem.
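To confirm the isolation afterwards, one option is to compare the device IDs of the two directories: paths on the same mounted filesystem normally report the same `st_dev`. This is a minimal Python sketch; `same_filesystem` is a hypothetical helper, and layouts such as btrfs subvolumes can report different device IDs while still sharing disk space, so treat the result as a hint rather than proof.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev).

    Hypothetical helper: paths on the same mounted filesystem share st_dev,
    so True here means filling the disk under one path also affects the other.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Under this sketch, `same_filesystem("/var/lib/docker", "/var/lib/jenkins")` returning True would correspond to the shared-partition layout described above.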

      Excerpt from the build job's log showing the problem:

      wrapper script does not seem to be touching the log file in /var/lib/jenkins/jobs/jobname/workspace/source/docker/builder@tmp/durable-e1e330b5
      23:52:49 (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
      23:52:49 Cannot contact : java.io.IOException: No space left on device
      23:57:49 wrapper script does not seem to be touching the log file in /var/lib/jenkins/jobs/jobname/workspace/source/docker/builder@tmp/durable-e1e330b5
      23:57:49 (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
      00:02:49 wrapper script does not seem to be touching the log file in /var/lib/jenkins/jobs/jobname/workspace/source/docker/builder@tmp/durable-e1e330b5
       ...
      09:57:51  wrapper script does not seem to be touching the log file in /var/lib/jenkins/jobs/jobname/workspace/source/docker/builder@tmp/durable-e1e330b5
      09:57:51  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
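The repeated warning says the log file has not been touched recently. As a simplified model of that kind of heartbeat check (a sketch, not the plugin's actual code), the staleness test reduces to comparing the file's modification time against an interval. The 300-second default below is an assumption inferred from the five-minute spacing of the warnings; 86400 seconds (24 hours) is the value the warning itself suggests for laggy filesystems. `heartbeat_is_stale` and `log_file_is_stale` are hypothetical names.

```python
import os
import time

# Assumed default, inferred from the five-minute spacing of the warnings in the log.
DEFAULT_HEARTBEAT_CHECK_INTERVAL_S = 300
# Value suggested by the warning text for extremely laggy filesystems (24 hours).
LAGGY_FS_INTERVAL_S = 86400


def heartbeat_is_stale(last_touch: float, now: float, interval_s: float) -> bool:
    """Return True if the last modification is more than `interval_s` seconds old."""
    return now - last_touch > interval_s


def log_file_is_stale(log_path: str,
                      interval_s: float = DEFAULT_HEARTBEAT_CHECK_INTERVAL_S) -> bool:
    """Apply the staleness test to a real file, using its modification time."""
    return heartbeat_is_stale(os.path.getmtime(log_path), time.time(), interval_s)
```

Under this model, raising the interval to 86400 s only delays when a stale heartbeat is reported; it does nothing about the full disk itself.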
      

      I don't think this is security-critical, but it could enable a denial of service on Jenkins instances that build public pull requests (I don't assume that is a usual use case or one covered by the threat model, which e.g. trusts developers not to be evil).

      I can provide more information on request.

          Mark Waite added a comment:

          Jenkins has facilities that significantly reduce the risks that you identify. Those facilities include:

          • Always build on agents, not the controller
          • Configure the "Preventive Node Monitoring" (http://jenkins.example.com/computer/configure) on your Jenkins controller to take an agent offline automatically if disk space falls below a threshold you define

          Additional agent monitoring is available from the monitoring plugin and from the open telemetry plugin.
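The free-space threshold in the second suggestion can be illustrated with a small model of the decision such a monitor makes. This is a sketch, not Jenkins' implementation: `parse_size` and `nodes_to_take_offline` are hypothetical helpers, the threshold strings are assumed to look like "1GiB" or "500MiB" with binary (1024-based) units, and the node names in the usage example are made up.

```python
import re
from typing import Dict, List

# Binary multipliers (assumption: "1GiB"-style thresholds, 1024-based units).
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse sizes such as '1GiB', '500MiB' or '2048' into bytes (hypothetical helper)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:i?B)?\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = m.groups()
    return int(float(number) * _UNITS[unit.upper()])


def nodes_to_take_offline(free_bytes_by_node: Dict[str, int], threshold: str) -> List[str]:
    """Return, sorted, the names of nodes whose free space is below `threshold`."""
    limit = parse_size(threshold)
    return sorted(name for name, free in free_bytes_by_node.items() if free < limit)
```

For example, `nodes_to_take_offline({"built-in": 512 * 1024**2, "agent-1": 50 * 1024**3}, "1GiB")` returns `["built-in"]`, since only that node has less than 1 GiB free.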

            Assignee: Unassigned
            Reporter: jo da (joda)
            Votes: 0
            Watchers: 3