• Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: ec2-plugin
    • Environment: Jenkins 1.164.1 - 2.204.1
      EC2 Plugin: 1.42 - 1.49.1

      When the GitHub Org Folder Scan runs and tries to clean up old branch and PR builds, we see errors like ".nfsXXXXX: Device or Resource Busy". See the stack trace below:

      FATAL: Failed to recompute children of Bluton Pull Requests » bluton
      jenkins.util.io.CompositeIOException: Unable to delete '/opt/apache/.jenkins/jobs/Bluton Pull Requests/jobs/bluton/branches/Docker123'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts.
          at jenkins.util.io.PathRemover.forceRemoveRecursive(PathRemover.java:95)
          at hudson.Util.deleteRecursive(Util.java:294)
          at hudson.Util.deleteRecursive(Util.java:283)
          at hudson.model.AbstractItem.performDelete(AbstractItem.java:792)
          at org.jenkinsci.plugins.workflow.job.WorkflowJob.performDelete(WorkflowJob.java:652)
          at hudson.model.AbstractItem.delete(AbstractItem.java:776)
          at hudson.model.Job.delete(Job.java:677)
          at com.cloudbees.hudson.plugins.folder.computed.ComputedFolder.updateChildren(ComputedFolder.java:290)
          at com.cloudbees.hudson.plugins.folder.computed.FolderComputation.run(FolderComputation.java:165)
          at jenkins.branch.MultiBranchProject$BranchIndexing.run(MultiBranchProject.java:1025)
          at hudson.model.ResourceController.execute(ResourceController.java:97)
          at hudson.model.Executor.run(Executor.java:429)
      jenkins.util.io.CompositeIOException: Unable to remove file /opt/apache/.jenkins/jobs/Bluton Pull Requests/jobs/bluton/branches/Docker123/builds/41/.nfs00000000080f52850000009d
          at jenkins.util.io.PathRemover.removeOrMakeRemovableThenRemove(PathRemover.java:248)
          at jenkins.util.io.PathRemover.tryRemoveFile(PathRemover.java:201)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:212)
          at jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:222)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:211)
          at jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:222)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:211)
          at jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:222)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:211)
          at jenkins.util.io.PathRemover.forceRemoveRecursive(PathRemover.java:92)
          at hudson.Util.deleteRecursive(Util.java:294)
          at hudson.Util.deleteRecursive(Util.java:283)
          at hudson.model.AbstractItem.performDelete(AbstractItem.java:792)
          at org.jenkinsci.plugins.workflow.job.WorkflowJob.performDelete(WorkflowJob.java:652)
          at hudson.model.AbstractItem.delete(AbstractItem.java:776)
          at hudson.model.Job.delete(Job.java:677)
          at com.cloudbees.hudson.plugins.folder.computed.ComputedFolder.updateChildren(ComputedFolder.java:290)
          at com.cloudbees.hudson.plugins.folder.computed.FolderComputation.run(FolderComputation.java:165)
          at jenkins.branch.MultiBranchProject$BranchIndexing.run(MultiBranchProject.java:1025)
          at hudson.model.ResourceController.execute(ResourceController.java:97)
          at hudson.model.Executor.run(Executor.java:429)
      java.nio.file.FileSystemException: /opt/apache/.jenkins/jobs/Bluton Pull Requests/jobs/bluton/branches/Docker123/builds/41/.nfs00000000080f52850000009d: Device or resource busy
          at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
          at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
          at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
          at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
          at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
          at java.nio.file.Files.deleteIfExists(Files.java:1165)
          at jenkins.util.io.PathRemover.removeOrMakeRemovableThenRemove(PathRemover.java:233)
          at jenkins.util.io.PathRemover.tryRemoveFile(PathRemover.java:201)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:212)
          at jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:222)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:211)
          at jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:222)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:211)
          at jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:222)
          at jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:211)
          at jenkins.util.io.PathRemover.forceRemoveRecursive(PathRemover.java:92)
          at hudson.Util.deleteRecursive(Util.java:294)
          at hudson.Util.deleteRecursive(Util.java:283)
          at hudson.model.AbstractItem.performDelete(AbstractItem.java:792)
          at org.jenkinsci.plugins.workflow.job.WorkflowJob.performDelete(WorkflowJob.java:652)
          at hudson.model.AbstractItem.delete(AbstractItem.java:776)
          at hudson.model.Job.delete(Job.java:677)
          at com.cloudbees.hudson.plugins.folder.computed.ComputedFolder.updateChildren(ComputedFolder.java:290)
          at com.cloudbees.hudson.plugins.folder.computed.FolderComputation.run(FolderComputation.java:165)
          at jenkins.branch.MultiBranchProject$BranchIndexing.run(MultiBranchProject.java:1025)
          at hudson.model.ResourceController.execute(ResourceController.java:97)
          at hudson.model.Executor.run(Executor.java:429)


      When I look at the open files of the Jenkins Java process, I see a number of .nfs files, all of which appear to be related to EC2 instances managed by the EC2 Plugin. (On NFS, a file that is deleted while a process still holds it open is renamed to a hidden .nfsXXXX file until the last handle is closed, so these entries point to leaked file descriptors.)


      root@foo:/proc/15266/fd# ls -l | grep -i \.nfs
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 101 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-03fd7b1343b0782b8)/.nfs00000000080a6a830000009a
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 108 -> /srv/jenkins/jobs/Bluton Pull Requests/jobs/bluton/branches/Docker123/builds/41/.nfs00000000080f52850000009d
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 109 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-07baa1e7f3d95a94e)/.nfs00000000080ae75e00000095
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 110 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-019098927c3f4c8a9)/.nfs000000000805bd9000000097
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1413 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-03460b045be5a9faa)/.nfs000000000807291b00000088
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1423 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-0cfbcbfdf761f9d07)/.nfs000000000807599500000094
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1450 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-029f40ef47d320089)/.nfs000000000807203f00000087
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1454 -> /srv/jenkins/jobs/Bluton Pull Requests/jobs/tools/branches/PR-799/builds/24/.nfs0000000008056f6500000092
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1463 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-0cfbcbfdf761f9d07)/.nfs000000000807598f00000093
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1487 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-05c354bbe104d7f4b)/.nfs0000000008017b5b00000090
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1497 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-05f3b733d4a484bb1)/.nfs00000000080429ae0000008d
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1536 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-0afc0ecc2e3f5e1f6)/.nfs00000000080752030000009b
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1559 -> /srv/jenkins/jobs/FOO Org (synapse)/jobs/bluton/branches/TCAR12345-Bla.7k9tn5.onfiguration/builds/2/.nfs00000000080ae4fc000000a1
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 1562 -> /srv/jenkins/jobs/FOO Org (synapse)/jobs/bluton/branches/TCAR12345-Bla.7k9tn5.onfiguration/builds/2/.nfs00000000080ae4fd000000a2
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 66 -> /srv/jenkins/jobs/Bluton Pull Requests/jobs/tools/branches/PR-799/builds/24/.nfs0000000008081d1b00000091
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 67 -> /srv/jenkins/jobs/FOO Org (synapse)/jobs/bluton/branches/TCAR12345-Bla.7k9tn5.onfiguration/builds/1/.nfs00000000080b960d0000009f
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 70 -> /srv/jenkins/jobs/FOO Org (synapse)/jobs/bluton/branches/TCAR12345-Bla.7k9tn5.onfiguration/builds/1/.nfs00000000080b9611000000a0
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 79 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-07baa1e7f3d95a94e)/.nfs00000000080ae75800000096
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 83 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-05f3b733d4a484bb1)/.nfs00000000080429aa0000008e
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 84 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-019098927c3f4c8a9)/.nfs000000000805bd9600000098
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 87 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-019098927c3f4c8a9)/.nfs000000000805bd8f00000099
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 93 -> /srv/jenkins/logs/slaves/foo.bluton-large (i-00db25a98b6ffee4c)/.nfs00000000080b9e7800000089
      l-wx------ 1 tomcat tomcat 64 Sep 4 08:58 95 -> /srv/jenkins/jobs/Bluton Pull Requests/jobs/bluton/branches/Docker123/builds/41/.nfs00000000080f52840000009c
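
      For reference, counting these handles from inside the JVM could look roughly like the sketch below (assumes a Linux /proc filesystem; the class name is made up and not part of Jenkins):

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.stream.Stream;

      // Rough sketch: count how many descriptors of this JVM currently point at
      // ".nfs*" files by reading /proc/self/fd (one symlink per open descriptor).
      public class NfsHandleCount {
          public static void main(String[] args) throws IOException {
              long count;
              try (Stream<Path> fds = Files.list(Paths.get("/proc/self/fd"))) {
                  count = fds.filter(fd -> {
                      try {
                          // the link target is the path of the open file
                          Path name = Files.readSymbolicLink(fd).getFileName();
                          return name != null && name.toString().startsWith(".nfs");
                      } catch (IOException e) {
                          return false; // descriptor closed while we were listing
                      }
                  }).count();
              }
              System.out.println("Open .nfs handles: " + count);
          }
      }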

          [JENKINS-59233] EC2 Plugin leaking file handles

          John Lengeling added a comment -

          Any update on fixing this ticket? Still seeing this issue running EC2 plugin v1.49.1 and Jenkins 2.204.1.

          Brent added a comment -

          I am having a similar issue on my end, but with `slave.log`

          Jenkins Version 2.222.1

          EC2 Plugin v1.49.1


          I did some digging into the code, and `slave.log` is opened in the SlaveComputer.java constructor. There is no `try-with-resources` here, as would be typical when opening an OutputStream; the code appears to rely on the `kill` method being called on the Computer object (which EC2Computer is). That method is called as part of the cleanup code in `updateComputerList` in AbstractCIBase: https://github.com/jenkinsci/jenkins/blob/223f41b371b3f55d4f54b759e2a7b0ecbe0744e7/core/src/main/java/hudson/model/AbstractCIBase.java#L243


          The issue occurs when a number of computers are disowned in rapid succession due to failures to connect. Most likely the OutputStream is discarded before its resources (such as the file descriptor) can be released correctly.
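
          Roughly, the pattern looks like the sketch below (made-up names, not the actual SlaveComputer code):

          import java.io.FileOutputStream;
          import java.io.IOException;
          import java.io.OutputStream;

          // Rough sketch of the pattern described above: the log stream is opened in
          // the constructor and kept as a field, so it cannot be wrapped in
          // try-with-resources; it is only closed if kill() is ever called.
          class ComputerLogSketch {
              private final OutputStream log;

              ComputerLogSketch(String logPath) throws IOException {
                  // opened for the lifetime of the computer; no try-with-resources
                  this.log = new FileOutputStream(logPath, true);
              }

              void kill() throws IOException {
                  // the only place the descriptor is released; if the computer is
                  // disowned without this ever being called, the open handle leaks
                  this.log.close();
              }
          }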


          Raihaan Shouhell added a comment -

          Are the instances terminated in AWS?

          John Lengeling added a comment -

          Yes, they are terminated through the 5 minute idle time.

          Raihaan Shouhell added a comment -

          BTW, the problem seems unrelated to the EC2 plugin; we do not open the builds.

          If the instance is terminated through the idle timeout, it calls an internal terminate function, which calls removeNode, which will close the log: https://github.com/jenkinsci/ec2-plugin/blob/61d5aaad3e13fa12416dbe5d8a049468344ca15b/src/main/java/hudson/plugins/ec2/EC2OndemandSlave.java#L92 via https://github.com/jenkinsci/jenkins/blob/23bfee2fa93279e061553a1394dd6e5ddb967695/core/src/main/java/jenkins/model/Nodes.java#L287
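
          Sketched out, that chain looks roughly like the code below (hypothetical names, not the linked plugin or core code):

          import java.io.FileOutputStream;
          import java.io.IOException;
          import java.io.OutputStream;

          // Self-contained sketch of the cleanup chain described above: idle timeout
          // -> terminate -> removeNode -> kill(), and only kill() releases the
          // slave.log descriptor. Any path that skips this chain leaves the handle open.
          public class CleanupChainSketch {

              static class ComputerSketch {
                  private final OutputStream slaveLog;

                  ComputerSketch(String logPath) throws IOException {
                      slaveLog = new FileOutputStream(logPath, true);
                  }

                  void kill() throws IOException {
                      slaveLog.close(); // the only place the descriptor is released
                  }
              }

              static class NodeSketch {
                  private final ComputerSketch computer;

                  NodeSketch(ComputerSketch computer) {
                      this.computer = computer;
                  }

                  void onIdleTimeout() throws IOException {
                      terminateInstance();
                      removeNode();
                  }

                  private void terminateInstance() {
                      // placeholder for the cloud API call that terminates the instance
                  }

                  private void removeNode() throws IOException {
                      computer.kill();
                  }
              }

              public static void main(String[] args) throws IOException {
                  NodeSketch node = new NodeSketch(new ComputerSketch("/tmp/slave-sketch.log"));
                  node.onIdleTimeout(); // the log handle is released because the full chain runs
              }
          }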

          Is the number of open files increasing to the point where Jenkins needs to be restarted?


          Brent added a comment -

          My issue happens when the connection times out. The EC2 instance is spun up, but Jenkins fails to connect to it, leading to termination of the instance. This happens in rapid succession across multiple instances until the build queue demand is resolved. The Jenkins master node keeps a stale `slave.log` file handle for each terminated instance. These build up to a critical level and require a full restart of Jenkins.


          John Lengeling added a comment -

          raihaan, file handles are leaked every time an EC2 node spins up and shuts down.

          Yes, we have to restart Jenkins periodically to clean up the .nfs file handles. I have systems with 130K-500K leaked file handles. This is very annoying when you have 50 Jenkins servers and every one has this issue.

          I assumed this was an issue with the EC2 plugin since most of the handles relate to the EC2 slave log (for example: /srv/jenkins/logs/slaves/foo.bluton-large (i-07baa1e7f3d95a94e)/.nfs00000000080ae75800000096). If it isn't the EC2 Plugin, please assign the ticket to the appropriate component.

          Jonas Lind added a comment -

          brentspector, we're seeing the same symptoms with open file descriptors on "slave.log" files, and my analysis of the source code matches yours.

          We're using the kubernetes-plugin rather than the ec2-plugin, but it's so similar to what you're describing that I'm sure it's the same thing.

          Did you make any further progress on fixing or working around the issue?


          Brent added a comment -

          jonaslind, we updated to version 1.56 of the EC2 plugin (and are currently running 1.63), and have seen a significant reduction in this issue.


          Jonas Lind added a comment -

          Just for reference in case someone reads this thread and wonders what happened:

          I've created JENKINS-69534 to report the issue we're seeing.


            Assignee: FABRIZIO MANFREDI (thoulen)
            Reporter: John Lengeling (johnlengeling)
            Votes: 3
            Watchers: 5
