Jenkins / JENKINS-22205

Jenkins "Failed to rotate log" error: unable to delete .nfsxxxxx file when the Build Record Root Directory is on a NAS


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Component/s: core
    • Labels:
      None
      Description

      Summary:
      ========
      When Jenkins results are stored on a NAS location, we see very frequent errors (about 100 per day) when log rotation runs after a job completes. This appears to be a race condition with a very small window (4 seconds or less) that occurs when using NAS storage.

      The net effect is that log rotation does not appear to work properly, at least from the Jenkins perspective, so the number of build records keeps growing over time.

      The typical symptom, as seen in the Jenkins console, is:

      Mar 15, 2014 10:38:32 PM hudson.model.Run execute
      SEVERE: Failed to rotate log
      java.io.IOException: Unable to delete /zzz/jenkinsarchive/SomeJob/builds/.2014-03-15_20-32-56/.nfs000000000118864100003187
      	at hudson.Util.deleteFile(Util.java:254)
      	at hudson.Util.deleteRecursive(Util.java:301)
      	at hudson.Util.deleteContentsRecursive(Util.java:203)
      	at hudson.Util.deleteRecursive(Util.java:300)
      	at hudson.model.Run.delete(Run.java:1452)
      	at hudson.tasks.LogRotator.perform(LogRotator.java:124)
      	at hudson.model.Job.logRotate(Job.java:440)
      	at hudson.model.Run.execute(Run.java:1739)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:231)
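The frames from Run.delete down to Util.deleteFile show a recursive delete that gives up as soon as one entry cannot be removed. The sketch below is a simplified model of that chain, not the actual hudson.Util code; the class RecursiveDelete is hypothetical and only the method names mirror the trace. It illustrates how a single undeletable .nfs* entry aborts the whole deletion with an "Unable to delete" IOException.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical, simplified model of the delete chain seen in the stack trace. */
public class RecursiveDelete {

    /** Deletes a file, or a directory together with everything under it. */
    static void deleteRecursive(File f) throws IOException {
        // Recurse into real directories only; a symlink is deleted as a link.
        if (f.isDirectory() && !Files.isSymbolicLink(f.toPath())) {
            deleteContentsRecursive(f);
        }
        deleteFile(f);
    }

    /** Deletes every child of a directory, but not the directory itself. */
    static void deleteContentsRecursive(File dir) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // listing failed (not a directory, or an I/O error)
        }
        for (File child : children) {
            deleteRecursive(child);
        }
    }

    /** Deletes one file or empty directory, throwing if it is still there afterwards. */
    static void deleteFile(File f) throws IOException {
        // On NFS, a .nfsXXXX entry that some process still holds open typically
        // cannot be removed, and its parent directory cannot be removed while it exists.
        if (!f.delete() && f.exists()) {
            throw new IOException("Unable to delete " + f.getPath());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RecursiveDelete <path>");
            return;
        }
        deleteRecursive(new File(args[0]));
    }
}
```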
      

      More Details:
      ==============
      We have a good-sized Jenkins installation with several hundred jobs. Because of the archive size requirements and the large number of jobs, we archive results to a NAS location.

      The Jenkins Build Record Root Directory option is configured to /zzz/jenkinsarchive/${ITEM_FULLNAME}/builds

      Here, /zzz/jenkinsarchive is a symlink to a mount point, such as /mnt/foo.com/export/foo/bar/jenkinsarchive

      So records are written to /zzz/jenkinsarchive/SomeJob/builds/..., which actually resolves to /mnt/foo.com/export/foo/bar/jenkinsarchive/SomeJob/builds/...
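To check where a job's build records physically live, the configured path can be expanded and its symlinks resolved. The following is a minimal sketch, assuming that ${ITEM_FULLNAME} is the only variable in the template (it does not reproduce Jenkins's own expansion logic); the class BuildRecordPath and its methods are hypothetical. Path.toRealPath() follows the /zzz/jenkinsarchive symlink and throws if the path does not exist.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: where does a job's build-record directory really live? */
public class BuildRecordPath {

    /** Expands the single ${ITEM_FULLNAME} placeholder used in this report. */
    static Path expand(String template, String itemFullName) {
        return Paths.get(template.replace("${ITEM_FULLNAME}", itemFullName));
    }

    /** Resolves symlinks to the physical location; throws if the path does not exist. */
    static Path physical(Path configured) throws IOException {
        return configured.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        Path builds = expand("/zzz/jenkinsarchive/${ITEM_FULLNAME}/builds", "SomeJob");
        System.out.println(builds);           // /zzz/jenkinsarchive/SomeJob/builds
        System.out.println(physical(builds)); // e.g. the /mnt/foo.com/... mount path
    }
}
```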

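      My understanding of these .nfs files is that they appear on the NAS when a file is unlinked while an NFS client (presumably Jenkins, in this case) still has it open: instead of removing the file, the client renames it to .nfsXXXX and removes it only after the last open handle is closed.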
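This NFS client behavior (often called a "silly rename") can be tried outside Jenkins. The sketch below is a hypothetical test program for a POSIX system, assuming it is pointed at a directory on the same NFS mount: it keeps a file open, deletes it, lists any .nfs* entries left behind, and then tries to remove the directory. On a local disk the directory removal is expected to succeed; on an NFS mount it is expected to fail while the file is still open, which would match the error above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reproduction of NFS "silly rename" behaviour (POSIX systems). */
public class SillyRenameDemo {

    /** Deletes an open file and returns any .nfs* names left behind in its directory. */
    static List<String> run(Path parent) throws IOException {
        Path dir = Files.createTempDirectory(parent, "build-");
        Path log = dir.resolve("log");
        List<String> leftovers = new ArrayList<>();
        try (OutputStream out = Files.newOutputStream(log)) {
            out.write("still open\n".getBytes(StandardCharsets.UTF_8));
            Files.delete(log); // unlink while the stream is still open
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, ".nfs*")) {
                for (Path p : ds) {
                    leftovers.add(p.getFileName().toString());
                }
            }
            try {
                Files.delete(dir);
                System.out.println("deleted " + dir + " while the file was still open");
            } catch (DirectoryNotEmptyException e) {
                System.out.println("cannot delete " + dir + ", it still contains " + leftovers);
            }
        }
        // After close, the NFS client should remove the .nfs* entry itself,
        // possibly after a short delay, so this cleanup may still fail.
        try {
            Files.deleteIfExists(dir);
        } catch (DirectoryNotEmptyException e) {
            System.out.println(dir + " is not empty yet; retry later");
        }
        return leftovers;
    }

    public static void main(String[] args) throws IOException {
        Path parent = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(".nfs entries seen: " + run(parent));
    }
}
```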

      Using lsof -n -P -N -u someUser -a, I can see that normally two files are in use on the NAS: one at
      jenkinsarchive/SomeJob/builds/2014-03-15_19-19-58/log
      and one at
      jenkinsarchive/SomeJob/builds/2014-03-15_19-19-58/timestamper/timestamps

      but for very short periods of time, when log rotation runs, I see a file appear at a path like
      jenkinsarchive/SomeJob/builds/.2014-03-15_19-19-58/.nfs000000000009e0cc000003c4

      It appears to exist for only a couple of seconds.

      So it seems that something in Jenkins keeps these files open, which (per NFS semantics) prevents their deletion at the time of the stack trace above.

            Activity

            sroth Steve Roth added a comment:

            We have seen this issue for a while, including with the most recent Jenkins version, 1.554.

            sroth Steve Roth added a comment:

            Using lsof, I've noticed that most of the time there are files open at NAS paths of the form .../jenkinsarchive/SomeJob/builds/2014-03-15_21-00-56/..., but for very short periods of time there are files open at NAS paths of the form .../jenkinsarchive/SomeJob/builds/.2014.../

            I interpret this as the build directory being renamed from the first form to the second, dot-prefixed form. I wonder whether this rename accounts for the .nfs file behavior.
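The dot-prefixed paths in the stack trace (for example builds/.2014-03-15_20-32-56) suggest that Run.delete renames a build directory to a hidden name before deleting it recursively. A sketch of that rename-then-delete pattern follows; it is inferred from the paths and the trace rather than copied from Jenkins, and RenameThenDelete is a hypothetical name. The rename step only moves the directory; the recursive delete afterwards is where a still-open file would leave a .nfs* entry behind, so the dot-directory would survive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: hide a build directory by renaming it, then delete it. */
public class RenameThenDelete {

    static void deleteBuild(Path buildDir) throws IOException {
        // builds/2014-03-15_19-19-58 -> builds/.2014-03-15_19-19-58
        Path hidden = buildDir.resolveSibling("." + buildDir.getFileName());
        // Throws FileAlreadyExistsException if a dot-directory was left over
        // by an earlier failed deletion.
        Files.move(buildDir, hidden);

        List<Path> paths;
        try (Stream<Path> walk = Files.walk(hidden)) {
            // Reverse order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            // On NFS, removing a still-open file leaves a .nfsXXXX entry, so deleting
            // its parent directory then throws and the hidden directory survives.
            Files.delete(p);
        }
    }
}
```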

            sroth Steve Roth added a comment:

            In case it is relevant: across our jobs, I see a large number of these builds/.2014-xxx directories (approximately 16,000) that have existed for days or weeks. I suspect this is not expected, so I wanted to mention it.
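One way to track this number over time is to count dot-prefixed directories in each job's builds directory. A minimal sketch, assuming the /zzz/jenkinsarchive/<job>/builds layout described above with jobs directly under the archive root (jobs inside folders would be nested deeper); LeftoverBuildDirs is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical scan: leftover ".<timestamp>" build directories per job. */
public class LeftoverBuildDirs {

    /** Maps job name to the number of dot-prefixed directories under <job>/builds. */
    static Map<String, Integer> scan(Path archiveRoot) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> jobs = Files.newDirectoryStream(archiveRoot, Files::isDirectory)) {
            for (Path job : jobs) {
                Path builds = job.resolve("builds");
                if (!Files.isDirectory(builds)) {
                    continue;
                }
                int n = 0;
                try (DirectoryStream<Path> entries = Files.newDirectoryStream(builds, ".*")) {
                    for (Path e : entries) {
                        if (Files.isDirectory(e)) {
                            n++;
                        }
                    }
                }
                if (n > 0) {
                    counts.put(job.getFileName().toString(), n);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = scan(Paths.get(args.length > 0 ? args[0] : "/zzz/jenkinsarchive"));
        counts.forEach((job, n) -> System.out.println(job + ": " + n));
        System.out.println("total: " + counts.values().stream().mapToInt(Integer::intValue).sum());
    }
}
```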

            sroth Steve Roth added a comment:

            Given the location of the .nfsxxxx file, jenkinsarchive/SomeJob/builds/.2014-03-15_19-19-58/.nfs000000000009e0cc000003c4,
            it appears that the original file (the one that was removed) was located in the dot-date directory jenkinsarchive/SomeJob/builds/.2014-03-15_19-19-58

            jglick Jesse Glick added a comment:

            Symptom sounds similar to JENKINS-12753.

            sroth Steve Roth added a comment:

            I have not seen this issue in more than a month, so I am closing it as Cannot Reproduce (CNR). Thanks!


              People

              Assignee:
              Unassigned
              Reporter:
              sroth Steve Roth
              Votes:
              0
              Watchers:
              3
