[JENKINS-23248] CLI calls are causing file descriptor leaks.

Craig Phillips created issue -

I have a job that dynamically creates jobs using the CLI. Since installing this job, which verifies the existence of jobs by calling 'get-job', I have noticed that Jenkins is leaking file descriptors. The job makes around 40 CLI calls per build and runs on every CVS commit. I have a separate job set up to monitor the number of FDs in /proc/$jenkins_pid/fd. Forcing garbage collection in the JVM doesn't release the FDs, so the only cure is to restart Jenkins before the count reaches the open-file ulimit. I have raised my ulimit to 65356 so I don't have to restart so frequently. I restarted Jenkins at 7:49 this morning and the file descriptor count is currently at 6147; it is now 12:10 in the afternoon, so it has been leaking steadily at approximately 1500 FDs per hour.
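(For context, the monitoring job is nothing more than counting entries under /proc; something along these lines, where the PID lookup is illustrative rather than my exact script:)

    # Illustrative FD-count check; how JENKINS_PID is obtained will vary per install.
    JENKINS_PID=$(pgrep -f -o jenkins.war)
    ls /proc/"$JENKINS_PID"/fd | wc -l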

Kohsuke Kawaguchi added a comment -

When you look at the ls -la /proc/$PID/fd output, can you tell which files are left open? Can you take a diff between two points in time and give us the delta?

Craig Phillips added a comment -

    --- /tmp/jenk_fd.1 2014-06-01 21:03:34.006155887 +0100
    +++ /tmp/jenk_fd.2 2014-06-01 21:09:27.053382015 +0100
    @@ -52342 +52341,0 @@
    -l-wx------ 1 kcc users 64 Jun 1 20:45 57103 -> socket:[282820885]
    @@ -55657 +55655,0 @@
    -lr-x------ 1 kcc users 64 Jun 1 20:59 60087 -> /user1/jenkins/jobs/scm-poll-jenkins-branch-monitor/builds/2014-06-01_21-01-01/log
    @@ -55662 +55660 @@
    -lr-x------ 1 kcc users 64 Jun 1 21:01 60091 -> pipe:[282533977]
    +lr-x------ 1 kcc users 64 Jun 1 21:01 60091 -> socket:[283188722]
    @@ -55747 +55745 @@
    -lrwx------ 1 kcc users 64 Jun 1 21:03 60169 -> socket:[282813330]
    +lrwx------ 1 kcc users 64 Jun 1 21:03 60169 -> socket:[283188811]
    @@ -55807 +55804,0 @@
    -lrwx------ 1 kcc users 64 Jun 1 21:03 60222 -> socket:[282819646]
    @@ -55824,0 +55822 @@
    +lrwx------ 1 kcc users 64 Jun 1 21:04 60239 -> socket:[282821225]
    @@ -55825,0 +55824,5 @@
    +lrwx------ 1 kcc users 64 Jun 1 21:04 60240 -> socket:[282821552]
    +lrwx------ 1 kcc users 64 Jun 1 21:05 60241 -> socket:[282821858]
    +l-wx------ 1 kcc users 64 Jun 1 21:05 60242 -> socket:[282947156]
    +lrwx------ 1 kcc users 64 Jun 1 21:07 60243 -> socket:[282947065]
    +lr-x------ 1 kcc users 64 Jun 1 21:05 60244 -> socket:[283309559]

This is just a small sample, with the two snapshots taken a few minutes apart. I can send a larger delta if need be. The leak is now at over 60000 files, as you can see from the line numbers in the diff.
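(For completeness, the snapshots were produced along these lines; the exact commands are my reconstruction rather than a paste from history:)

    # Reconstruction of how the two snapshots and the delta were taken:
    ls -la /proc/$JENKINS_PID/fd > /tmp/jenk_fd.1
    # ... wait several minutes while builds run ...
    ls -la /proc/$JENKINS_PID/fd > /tmp/jenk_fd.2
    diff -U0 /tmp/jenk_fd.1 /tmp/jenk_fd.2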


Craig Phillips added a comment -

Running garbage collection in the JVM doesn't clear them down either. I think I mentioned this already.

          Daniel Beck added a comment -

          Jenkins version?


Craig Phillips added a comment -

1.564

Craig Phillips added a comment -

Just upgraded to 1.565 and the issue still persists.

Craig Phillips added a comment -

Here is a delta with approximately an hour between each snapshot.

Craig Phillips made changes -
Attachment New: jkfd.txt [ 26025 ]

Craig Phillips added a comment -

I know what the cause was. I found this article: https://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build

I'd say the article needs updating. It assumes that the bug/feature only affects subprocesses that hold pipes open after becoming detached from the main process. This is not entirely true, since I was able to reproduce the same effect by running the CLI utility as a foreground process from a job spawned on the Jenkins server. The CLI's input and output were attached directly to the build pipeline; there was no detaching or backgrounding going on, simply execute and return the exit code. However, what I think may be happening (I am a Java noob, so this is complete speculation) is that the same Java bug/feature that causes the problem for build processes was also exhibited by the JVM spawned to run the CLI utility. I suspect this created some kind of circular file descriptor reference inside the JVM, preventing EOF from being transmitted when the CLI utility exited.
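(To illustrate the inheritance at play; this is my own illustration, not from the article, and the job name and URL are placeholders:)

    # Illustration only: file descriptors are inherited across fork/exec.
    exec 9>/tmp/held-open.log                            # fd 9 is now open in the build shell
    java -jar jenkins-cli.jar -s "$JENKINS_URL" get-job myjob
    # The CLI JVM above inherited fd 9 as well, and holds
    # /tmp/held-open.log open until that process exits.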

I have cured the problem completely by modifying my wrapper script to explicitly close all descriptors apart from stdin, stdout and stderr on invocation, before running the Java utility. After running the utility, stdin, stdout and stderr are then explicitly closed before the shell script exits. Since making this change, the file descriptor count has been stable at 823 open descriptors in total.
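A minimal sketch of that wrapper, assuming bash and a placeholder CLI invocation (this is the shape of the fix, not my script verbatim):

    #!/bin/bash
    # Close every inherited descriptor above 2 before running the CLI.
    # bash's {varname}>&- form closes the fd whose number is in $fd.
    for path in /proc/$$/fd/*; do
        fd=${path##*/}
        if [ "$fd" -gt 2 ] && [ -e "$path" ]; then
            exec {fd}>&-
        fi
    done

    # Placeholder invocation; JENKINS_URL and the arguments are assumptions.
    java -jar jenkins-cli.jar -s "$JENKINS_URL" "$@"
    rc=$?

    # Explicitly close stdin, stdout and stderr before the script exits.
    exec 0<&- 1>&- 2>&-
    exit $rc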

          My guess is that the CLI utility should be doing something similar.


Assignee: Kohsuke Kawaguchi (kohsuke)
Reporter: Craig Phillips (iwonbigbro)
Votes: 0
Watchers: 14
