
"too many files open": file handles leak, job output file not closed

      Jenkins seems to keep an open file handle to the log file (job output) for every single build, even those that have been discarded by the "Discard old builds" policy.

       

      This is a sample of the lsof output (the whole file is attached):

      java 8870 jenkins 941w REG 252,0 1840 1332171 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50063/log (deleted)
      java 8870 jenkins 942w REG 252,0 2023 402006 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50044/log (deleted)
      java 8870 jenkins 943w REG 252,0 2193 1332217 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/50101/log
      java 8870 jenkins 944w REG 252,0 2512 1332247 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/50106/log
      java 8870 jenkins 945w REG 252,0 1840 1703994 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50067/log (deleted)
      java 8870 jenkins 946w REG 252,0 2350 1332230 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50092/log (deleted)
      java 8870 jenkins 947w REG 252,0 1840 402034 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50049/log (deleted)
      java 8870 jenkins 948w REG 252,0 1840 927855 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50080/log (deleted)
      java 8870 jenkins 949w REG 252,0 2195 1332245 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50095/log (deleted)
      java 8870 jenkins 950w REG 252,0 2326 1332249 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/50107/log
      java 8870 jenkins 952w REG 252,0 2195 1332227 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/50102/log
      java 8870 jenkins 953w REG 252,0 2154 1332254 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/50109/log
      java 8870 jenkins 954w REG 252,0 2356 1332282 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/50105/log
      

       

          [JENKINS-45057] "too many files open": file handles leak, job output file not closed

          Oleg Nenashev added a comment -

          Which Build Discarder do you use in your job?


          Bruno Bonacci added a comment -

          I'm using the default build discarder.


          Daniel Beck added a comment -

          Please provide a list of installed plugins, and a sample configuration file of an affected job. Does this happen with all jobs?


          Jonas Jonsson added a comment - - edited

          Here's a very simple way of getting this:

          Create a simple FreeStyle job (NOTHING else but default settings) in Jenkins that only contains the following System Groovy Script:
          /*
           * See if Jenkins/Groovy leaves files open.
           */
          import hudson.model.*

          def thr = Thread.currentThread()
          def build = thr?.executable
          def jobName = build.parent.builds[0].properties.get("envVars").get("JOB_NAME")
          def jobNr = build.parent.builds[0].properties.get("envVars").get("BUILD_NUMBER")
          println "This is " + jobName + " running for the $jobNr:th time"
           

          That's it.  For every time I run this job, I get three (3!!) new open files in /proc/$PID_OF_JENKINS that point to the "log" file of the job.

          Linux (Ubuntu-14.04.5 LTS) 4.4.0-79 kernel
          Java version: 1.8.0_131-b11
          Jenkins-version: 2.66
          Groovy-plugin: 2.0
          System groovy version: 1.8.6


          Jonas Jonsson added a comment -

          The rationale for JENKINS-42934 was to avoid using close() on files; was this change taken a bit too far?


          Jonas Jonsson added a comment -

          Our problems started when stepping the Jenkins-version from 2.51 to 2.58.

          Currently our production Jenkins must be restarted after about ten days.


          Daniel Beck added a comment -

          jonasatwork It would be helpful if you could narrow this down further.


          Jonas Jonsson added a comment -

          I also notice that at the same time (May 3) we also updated the Groovy plugin from 1.30 to 2.0. A co-worker has tried the script above on another Jenkins, running 2.37 but with Groovy 1.30, without having this issue.

          Unfortunately I will probably not have more time before August to try to find out what's going on.


          Daniel Beck added a comment -

          "Our problems started when stepping the Jenkins-version from 2.51 to 2.58."

          and

          "tried the script above on another Jenkins, running 2.37 but with Groovy 1.30, without having this issue"

          This does not look like adding useful data. Are any of the versions wrong here?


          Jonas Jonsson added a comment -

          We (my colleagues and I) are starting to believe that this is caused by the Groovy plugin. I'll know more tomorrow morning.


          Mike Delaney added a comment -

          I'm seeing this as well on Jenkins 2.60.1 LTS on Ubuntu 14.04. When using Jenkins 2.46.2 LTS


          Oleg Nenashev added a comment -

          It may possibly happen if Groovy overrides the build log appenders via log decorators, but I am not sure why the Groovy plugin would need that.


          Jonas Jonsson added a comment -

          From a colleague:  The problem doesn't exist in Jenkins-2.51 (and Groovy-2.0).  Jenkins-2.52 has the problem.

          "Hi Jonas, I didn't have a problem with Jenkins 2.51 and Groovy 2.0, but the problem occurred with Jenkins 2.52 and Groovy 2.0. I will downgrade Groovy to a previous version and try these two versions of Jenkins to work out the differences. Regards"


          Daniel Beck added a comment -

          Notably there's absolutely nothing of interest in 2.52: Just a major overhaul of the German localization, other localization fixes, removal of the most incomplete localizations, and this one change in the actual code:

          https://github.com/jenkinsci/jenkins/compare/jenkins-2.51...jenkins-2.52#diff-9fafdcd0712c5a5dab3acb4ea168515aR272

          So this seems to be unrelated to core.


          Adam Leggo added a comment -

          I have found a solution for the code Jonas provided; I am not sure if it fixes the problem for Bruno, since no Groovy example has been provided.

          Problem code:

          import hudson.model.*

          def thr = Thread.currentThread()
          def build = thr?.executable
          def jobName = build.parent.builds[0].properties.get("envVars").get("JOB_NAME")
          def jobNr = build.parent.builds[0].properties.get("envVars").get("BUILD_NUMBER")
          println "This is " + jobName + " running for the $jobNr:th time"

           

          Fixed code:

          import hudson.model.*

          def jobName = build.environment.get("JOB_NAME")
          def jobNr = build.environment.get("BUILD_NUMBER")
          println "This is " + jobName + " running for the $jobNr:th time"

           

          No open files found after the fixed job is run.

          The build object is already available for the script to use, so getting it from the currentThread causes a problem. Not sure why.


          Bruno Bonacci added a comment -

          Hi jonasatwork, I've tried your test and what I get is 4 new open files rather than the 3 you suggested.

           

          This is the output of the diff between two lsof executions, interleaved by one job run with your code:

          > java 19008 jenkins 587r REG 252,0 503 395865 /data/jenkins/jobs/automation/jobs/test-open-files/builds/7/log
          > java 19008 jenkins 589r REG 252,0 503 395865 /data/jenkins/jobs/automation/jobs/test-open-files/builds/7/log
          > java 19008 jenkins 590r REG 252,0 503 395865 /data/jenkins/jobs/automation/jobs/test-open-files/builds/7/log
          > java 19008 jenkins 592r REG 252,0 503 395865 /data/jenkins/jobs/automation/jobs/test-open-files/builds/7/log
          


          Adam Leggo added a comment -

          Hi bbonacci,

          Can you post an example of your emr-termination-policy groovy code?
          Please provide a list of installed plugins and a sample configuration file of an affected job.


          Bruno Bonacci added a comment - - edited

          Hi adamleggo,
          the emr-termination-policy job is a Freestyle job with a simple (bash) shell script.
          So I've been digging and I've narrowed down the problem.
          It looks like when the option "Use secret text(s) or file(s)" is active, the file handle leaks.

          Steps to reproduce:

          1. create free style project
          2. add one step with shell script running "echo test"
          3. click on Use secret text(s) or file(s)
          4. save job
          5. count numbers of open files with lsof -p <pid> | wc -l
          6. build job
          7. count numbers of open files with lsof -p <pid> | wc -l
          8. repeat last two steps.

          In my environment one file handle (the build log) is always leaked.


          Bruno Bonacci added a comment - - edited

          The "secrets" extensions has a feature for which if the secrets appear as output in the log they are replaced with "*******".
          I guess somewhere in there, the log file isn't closed properly and the file handle leaks.
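
          To illustrate the pattern being guessed at here, below is a minimal, hypothetical Java sketch (not the actual Credentials Binding code) of a line-masking wrapper whose close() never reaches the wrapped build-log stream. If the decorator registered for the build behaved like this, the log's file descriptor would stay open after the build finishes, matching the lsof output above.

          import java.io.ByteArrayOutputStream;
          import java.io.FilterOutputStream;
          import java.io.IOException;
          import java.io.OutputStream;
          import java.nio.charset.StandardCharsets;

          /**
           * Hypothetical secret-masking stream, for illustration only: it buffers each
           * line, replaces the secret with asterisks, and forwards the result.
           */
          class MaskingOutputStream extends FilterOutputStream {
              private final ByteArrayOutputStream line = new ByteArrayOutputStream();
              private final String secret;

              MaskingOutputStream(OutputStream out, String secret) {
                  super(out);
                  this.secret = secret;
              }

              @Override
              public void write(int b) throws IOException {
                  line.write(b);
                  if (b == '\n') {
                      String masked = line.toString(StandardCharsets.UTF_8.name()).replace(secret, "********");
                      out.write(masked.getBytes(StandardCharsets.UTF_8));
                      line.reset();
                  }
              }

              @Override
              public void close() throws IOException {
                  flush();
                  // Suspected leak pattern: out.close() is never called, so the underlying
                  // build-log file descriptor stays open until the stream is finalized.
              }
          }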


          Mike Delaney added a comment -

          I see this behavior without using the "secrets" extension.


          Abhishek Mukherjee added a comment - - edited

          We're also seeing this behavior on our Jenkins master running 2.60.1. Happy to provide any relevant information if I can be of help, just not sure what to get. Just to put it in perspective, we're having to restart our master every ~4 days for one of our very busy jobs, with the FD limit already increased to 10k. I believe we are seeing the same thing as Bruno, as we also have secrets bound to these jobs.


          Abhishek Mukherjee added a comment -

          We appear to have upgraded the relevant plugin (Credentials Binding Plugin) to 1.12, if that's relevant.

          kyle evans added a comment - - edited

          We are also seeing this behavior on Jenkins 2.54 with credentials binding plugin 1.12.

           

          Edit: this also seems to be the same issue: https://issues.jenkins-ci.org/browse/JENKINS-43199

          Also, there is a discussion around a pull request here: https://github.com/jenkinsci/credentials-binding-plugin/pull/37


          Oleg Nenashev added a comment -

          There are more and more reports in JENKINS-43199, and the maintainer declines to apply the hotfix in his plugin. So we may have to fix it in the core ("we" === "Jenkins community", feel free to contribute).


          mishal shah added a comment -

          Is it possible to downgrade a plugin to resolve the FD leak? Has anyone tried downgrading from Credentials Binding plugin 1.12 to 1.11, and did it help resolve this issue? Thanks! We have to restart our Jenkins once a day.


          Abhishek Mukherjee added a comment -

          My team has tried downgrading to 1.11, but it did not help. We're hoping to take our first stab at making a pull request for this sometime this week. We've never done any Jenkins core changes, however, so no promises.

          mishal shah added a comment -

          abhishekmukherg Good luck! Looking forward to your fix, and thanks!


          Andreas Mandel added a comment -

          We are also hit hard by this, running LTS 2.60.1. I wonder if this blocker was fixed in 2.60.2? The changelog does not look promising, but how can an LTS version be released with this known issue? We do NOT have the Credentials Binding Plugin installed at all.

          mishal shah added a comment - - edited

          bbonacci, what is the process for getting this issue escalated so it is fixed soon?


          Oleg Nenashev added a comment -

          shahmishal This is an open-source project; there is no escalation process. The best way to help with this issue is to Participate and Contribute. In the Jenkins community we always encourage it.

          P.S.: If you want to do escalations, there are companies offering commercial support.

           


          mishal shah added a comment -

          oleg_nenashev Thanks!


          Alex Raddas added a comment -

          I am also encountering this issue in our production environment; the master hits 16k open FDs every 5 hours, which requires a restart of the Jenkins service before it gets there.


          Volodymyr Sobotovych added a comment -

          We started seeing the issue after upgrading Jenkins from 2.46.2 LTS to 2.60.2 LTS and the SSH Slaves plugin from 1.9 to 1.20. The Credentials Binding plugin was NOT upgraded.

          If the bug is in the Credentials Binding plugin, why didn't it appear before?

          Daniel Beck added a comment -

          It would be interesting to know whether this started between 2.52 (unaffected) and 2.53 (affected). If so, JENKINS-42934 would be a likely culprit. jonasatwork reported that 2.52 was the first to be affected; I wonder whether that report was off by one.


          Sascha Retter added a comment -

          Currently I have no answer for that question.

          What I can say is that the problem is reproducible on 2.60.2 but it isn't on 2.50.

          On both versions, if I start a build of a job with the above-mentioned Groovy, the number of file handles used by the Jenkins process increases, but on 2.50 it also decreases after a while (it seems not immediately after the job finishes), whereas on 2.60.2 it only increases and never decreases.

          I'll try to find some time to check for 2.52 and 2.53 or ask a colleague to do so.


          Carles Capdevila added a comment - - edited

          EDIT: Earlier in this comment I was saying that the problem could be reproduced in 2.52. I was wrong. I accidentally shuffled the war's name and didn't notice the version. I apologize to anyone who took the time to verify this.

           

          I tested this on 2.52 and 2.53 as saretter asked me:

          In 2.52 I could not reproduce the problem.

          In 2.53 I could reproduce it.

          As danielbeck suggested, https://issues.jenkins-ci.org/browse/JENKINS-42934 might be related to this.


          Baptiste Mathus added a comment -

          bbonacci carlescapdevila it would be great if you could use git bisect to find out even more precisely which commit introduced this. If you're unclear on how to use it, I can provide/write documentation for it. Thanks!

          Daniel Beck added a comment -

          "use git bisect to find out even more precisely which commit introduced this"

          Or just test https://github.com/jenkinsci/jenkins/commit/bde09f70afaf10d5e1453c257058a56b07556e8e, which is assumed to break, and https://github.com/jenkinsci/jenkins/commit/0ddf2d5be77072264845a5f4cf197d91d32e4695, which is assumed not to break, to begin with, and see whether that's the cause.


          Carles Capdevila added a comment - - edited

          Tested https://github.com/jenkinsci/jenkins/commit/bde09f70afaf10d5e1453c257058a56b07556e8e and it did indeed break; this one, https://github.com/jenkinsci/jenkins/commit/0ddf2d5be77072264845a5f4cf197d91d32e4695, was OK.

           

          By the way, I'm using Windows' "handle -s -p <jenkinsPID>" command to detect the file handles. The https://wiki.jenkins.io/display/JENKINS/File+Leak+Detector+Plugin does not show anything when the builds are over, but handle does show an increase in open files long after the builds are over.

           

          UPDATE: Tested https://github.com/jenkinsci/jenkins/commit/a3ef5b6048d66e59e48455b48623e30c14be8df4  - OK

          and then the next one, https://github.com/jenkinsci/jenkins/commit/f0cd7ae8ff269dd738e3377a62f3fbebebf9aef6, has the issue, so this commit introduces the leak.


          Daniel Beck added a comment -

          stephenconnolly PTAL

          Stephen Connolly added a comment -

          carlescapdevila any chance you could try the attached patch against HEAD (probably easy to apply to most versions) and see if that resolves the issue? Seems like there may be some paths where the run's log stream does not get closed correctly.

          jenkins-45057.patch

          Carles Capdevila added a comment - - edited

          Tested against HEAD with the patch applied, and no luck. Also tested against 2.53 and 2.60.1 (both of them with the patch), and it's the same: the leak doesn't go away. Thank you very much for the effort nevertheless.

          EDIT: I'm reproducing the issue according to jonasatwork's comment: https://issues.jenkins-ci.org/browse/JENKINS-45057?focusedCommentId=304877&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-304877 (but in my case I'm using Windows, so I check the file usage with the "handle" command). So could it be due to some kind of interaction with the Groovy Plugin?


          Stephen Connolly added a comment -

          carlescapdevila so one interesting thing is that most of the file handles look OK, except for the emr-termination-policy files.

          There are 406 file handles open of the type:

          java    8870 jenkins  991w   REG              252,0        2194 1332198 /data/jenkins/jobs/automation/jobs/emr-termination-policy/builds/.50086/log (deleted)

          So these are file handles open on a file that appears to be deleted!

          406 of them to be precise:

          $ grep "(deleted)" filesopen.txt | wc -l
               406

          And all but two of them are for emr-termination-policy:

          $ grep "(deleted)" filesopen.txt | grep emr-termination-policy | wc -l
               404
          $ grep "(deleted)" filesopen.txt | grep optimus | wc -l
                 2

          When we look at the file handles, these are WRITE file handles, so the file handle has to be opened inside Run.execute()

          Just to confirm, these two jobs are Freestyle jobs and not Pipeline jobs?

           


          Stephen Connolly added a comment -

          Hmmm... digging some more, the Run.delete() method will do a rename of the build directory from /XXX/ to /.XXX, so this looks very much like delete() is not waiting for the running job to complete.


          Stephen Connolly added a comment -

          Hmmm, I wonder if emr-termination-policy has:

          • a short log rotation period,
          • runs almost continually
          • uses a Notifier that perhaps should be a Publisher?

          What Post-build actions do you have configured, carlescapdevila?


          Stephen Connolly added a comment -

          bbonacci sorry, just realized that you are the one with the emr-termination-policy job.


          Stephen Connolly added a comment -

          So https://github.com/jenkinsci/jenkins/pull/2953 should fix the credentials binding plugin issue in any case... though I think that it should be part of the contract of plugins annotating the console that they pass the close through, so in a sense it is a defensive core change and jglick should just merge https://github.com/jenkinsci/credentials-binding-plugin/pull/37


          Jonas Jonsson added a comment -

          Hi, finally back after some holiday

          Yes, the job is a freestyle job, just as I wrote.  I took a quick look at the changes that were done in JENKINS-42934 and noticed that in a few places the "close()" calls on created files have been removed completely; earlier they lived in some "finally{}" blocks in the code.  The reason for the change is https://bugs.openjdk.java.net/browse/JDK-8080225. I don't interpret that as meaning the calls to close() should be removed just because of this change.


          Stephen Connolly added a comment -

          jonasatwork any pointers to the cases where you believe a handle is escaping?


          Stephen Connolly added a comment -

          jonasatwork keep in mind that we moved from

          InputStream is = ...;
          try {
            ...
          } finally {
            is.close();
          }

          to try-with-resources:

          try (InputStream is = ...) {
            ...
          }

          So expect those close calls to be handled by try-with-resources

           


          Jonas Jonsson added a comment -

          Sorry, I'm not really up to date with Java...


          Stephen Connolly added a comment -

          OK, so the Groovy leak appears to be an issue with Stapler!!!

          Groovy is querying the properties and discovers the `getLogText()` method, which results in Stapler opening a read handle using a FileInputStream... which is then pending finalization, at which point the file handle will be released...

          IOW this is a replica of JENKINS-42934, only against Stapler... it being https://github.com/stapler/stapler/blob/3ac71dce264da052186956ef06b772a91ca74d5e/core/src/main/java/org/kohsuke/stapler/framework/io/LargeText.java#L457-L467 that is responsible for the leak!!!
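
          As a hedged illustration of the pattern described above (this is not the Stapler source itself), the sketch below opens a FileInputStream on a log file and never closes it explicitly, relying on finalization to release the descriptor. Until the GC happens to run finalizers, every call leaks one open handle, which matches the behaviour reported here for getLogText().

          import java.io.File;
          import java.io.FileInputStream;
          import java.io.IOException;
          import java.io.InputStream;

          class LeakyLogReader {
              /** Counts the bytes in the log, but leaks the file handle it opened. */
              static long length(File log) throws IOException {
                  InputStream in = new FileInputStream(log); // handle opened here
                  long n = 0;
                  while (in.read() != -1) {
                      n++;
                  }
                  // No in.close(): the descriptor is only released when the
                  // FileInputStream is eventually finalized by the GC.
                  return n;
              }
          }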


          Jesse Glick added a comment -

          Well I suppose the workaround is to use the less obtuse

          def jobName = build.parent.builds[0].envVars.JOB_NAME

          If you are using a sandboxed script, well DefaultGroovyMethods.getProperties(Object) is already blacklisted so you could not make this mistake to begin with.


          Andreas Mandel added a comment -

          Looks like we had been hit by a different side effect of the identified change. Every OutputStream returned from an instance of hudson.console.ConsoleLogFilter must close the wrapped OutputStream when close() is called. It could be that this was expected before as well, but as of core 2.53 it leads to a leak of file handles if you miss this.

          This change fixed the issue for our plugin: https://github.com/SoftwareBuildService/log-file-filter-plugin/commit/c1148435a454aa5a3a72bab05c3a6996ea5f42f5
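
          A minimal sketch of that pattern, assuming the Jenkins core 2.x ConsoleLogFilter and LineTransformationOutputStream APIs (this is not the actual log-file-filter-plugin commit): the decorated stream overrides close() and propagates it to the wrapped build-log stream so the handle is released when the build finishes.

          import hudson.console.ConsoleLogFilter;
          import hudson.console.LineTransformationOutputStream;
          import hudson.model.Run;
          import java.io.IOException;
          import java.io.OutputStream;

          public class ClosePropagatingLogFilter extends ConsoleLogFilter {
              @Override
              public OutputStream decorateLogger(Run build, final OutputStream logger)
                      throws IOException, InterruptedException {
                  return new LineTransformationOutputStream() {
                      @Override
                      protected void eol(byte[] b, int len) throws IOException {
                          // a real filter would rewrite or mask the line here
                          logger.write(b, 0, len);
                      }

                      @Override
                      public void close() throws IOException {
                          super.close();   // flush any trailing partial line
                          logger.close();  // propagate close so the log file handle is released
                      }
                  };
              }
          }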


          Oleg Nenashev added a comment - - edited

          It should be solved by https://github.com/jenkinsci/jenkins/pull/2954 in 2.73.


          Jesse Glick added a comment -

          Should be fixed, but needs verification if there is a reproducible test case.


          Oleg Nenashev added a comment -

          It didn't get into 2.60.3 since it was fixed/integrated too late. It will be a candidate for the next baseline


          Oleg Nenashev added a comment -

          As jglick says, the patch in Credentials Binding 1.13 has been released, so the partial fix can be applied via the plugin update.

           


            Assignee: Jesse Glick
            Reporter: Bruno Bonacci
            Votes: 13
            Watchers: 28