Hudson 1.284 quickly dies due to 'Too many open files'

      Very soon after upgrading from 1.283 to 1.284, Hudson became unresponsive, and
      all builds started failing with exceptions like:

      Feb 18, 2009 2:46:35 PM hudson.triggers.SCMTrigger$Runner runPolling
      SEVERE: Failed to record SCM polling
      java.io.FileNotFoundException: /buildareas/lh3/hudson/jobs/lh3-int-INTEL-ALL/scm-polling.log (Too many open files)
      at java.io.FileOutputStream.open(Native Method)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
      at hudson.util.StreamTaskListener.<init>(StreamTaskListener.java:68)
      at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:385)
      at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:426)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
      at java.util.concurrent.FutureTask.run(FutureTask.java:123)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
      at java.lang.Thread.run(Thread.java:595)

      ls -la /proc/<hudson pid>/fd output looks like (most of the 1024 file
      descriptors look like temp files):

      hudson argon 64 Feb 18 14:42 1011 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/c0f5eae8.tmp
      l-wx------ 1 hudson argon 64 Feb 18 14:42 1012 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/4795c97a.tmp
      l-wx------ 1 hudson argon 64 Feb 18 14:42 1013 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/5db8770.tmp
      l-wx------ 1 hudson argon 64 Feb 18 14:42 1014 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/bcee8a1a.tmp
      l-wx------ 1 hudson argon 64 Feb 18 14:42 1015 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/8ab5224.tmp
      l-wx------ 1 hudson argon 64 Feb 18 14:42 1016 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/da564121.tmp
      l-wx------ 1 hudson argon 64 Feb 18 14:42 1017 -> /buildareas/lh3/hudson/jobs/lh3-dev-archive-INTEL-ALL/builds/2009-02-18_13-35-37/workspace-files/e3edbcfa.tmp
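
      To see at a glance where the descriptors are going, the listing above can be
      summarized per directory. The following is only a rough sketch and not part of
      Hudson: the class name FdSummary and its methods are made up for illustration,
      it assumes a Linux /proc filesystem, and it needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hudson.
public class FdSummary {

    /** Counts descriptor targets per parent directory; non-path targets share one bucket. */
    static Map<String, Integer> countByParent(Iterable<String> targets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String target : targets) {
            String key;
            if (!target.startsWith("/")) {
                key = "(non-file)"; // sockets, pipes and similar entries
            } else {
                Path parent = Paths.get(target).getParent();
                key = parent == null ? "/" : parent.toString();
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Reads the symlink targets under a /proc/<pid>/fd directory (Linux only). */
    static List<String> readTargets(Path fdDir) throws IOException {
        List<String> targets = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
            for (Path entry : entries) {
                try {
                    targets.add(Files.readSymbolicLink(entry).toString());
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Pass the Hudson pid as the first argument; defaults to this JVM itself.
        String pid = args.length > 0 ? args[0] : "self";
        Path fdDir = Paths.get("/proc", pid, "fd");
        countByParent(readTargets(fdDir))
            .forEach((dir, n) -> System.out.println(n + "\t" + dir));
    }
}
```

      Run as the hudson user (or root) with the Hudson pid as the argument. A single
      workspace-files directory holding most of the descriptors would point at
      whatever code writes those .tmp files.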

      sudo /usr/sbin/lsof |grep hudson output looks like:

      java 19623 hudson 474w REG 0,21 1945 5574866 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/ddac3f83.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 475w REG 0,21 3862 5574868 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/87467c29.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 476w REG 0,21 1617 5574869 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/aaaec5a5.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 477w REG 0,21 5333 2419662 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/de42030e.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 478w REG 0,21 7299 2419663 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/f2996b97.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 479w REG 0,21 4000 2419664 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/88f7fa2c.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 480w REG 0,21 6024 2419665 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/325358f9.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 481w REG 0,21 1731 2419666 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/a9447082.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 482w REG 0,21 12711 2419667 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/78014870.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 483w REG 0,21 1680 2419668 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/503b6c8.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 484w REG 0,21 1299 2419669 /buildareas/lh3/hudson/jobs/lh3-dev-tasking-INTEL/builds/2009-02-18_13-02-29/workspace-files/445d54b5.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 485w REG 0,21 6004 18348749 /buildareas/lh3/hudson/jobs/lh3-dev-results-INTEL-ALL/builds/2009-02-18_13-03-00/workspace-files/4ba6c88d.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 486w REG 0,21 61559 18348745 /buildareas/lh3/hudson/jobs/lh3-dev-results-INTEL-ALL/builds/2009-02-18_13-03-00/workspace-files/f66174f.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 487w REG 0,21 152820 18348746 /buildareas/lh3/hudson/jobs/lh3-dev-results-INTEL-ALL/builds/2009-02-18_13-03-00/workspace-files/d32ae088.tmp (apollo1a:/vol/vol1/buildareas)
      java 19623 hudson 488w REG 0,21 71104 18348747 /buildareas/lh3/hudson/jobs/lh3-dev-results-INTEL-ALL/builds/2009-02-18_13-03-00/workspace-files/c8ad4ec5.tmp (apollo1a:/vol/vol1/buildareas)

      I'm running several RHEL5 blades in a master/slave configuration, and recently
      experienced a memory leak that was resolved by adding a finalizer
      (http://www.nabble.com/Possible-memory-leak-in-hudson.remoting.ExportTable-td12000299.html).
      Another "leaked file descriptors" Hudson issue was caused by finalizers, so I
      suspect these issues may be related. I'll experiment with garbage collection to
      see if this is the case.
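
      For context on why finalizers matter here: a stream that is never closed
      explicitly keeps its descriptor until the garbage collector happens to finalize
      the object, which can be long after the 1024 limit is reached. The sketch below
      shows the general alternative of closing each stream deterministically. It is
      not Hudson's code or its actual fix; TempFileWriter and writeTemp are
      hypothetical names, and try-with-resources requires Java 7 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not taken from Hudson.
public class TempFileWriter {

    /** Writes data to a new .tmp file in dir and closes the stream before returning. */
    static Path writeTemp(Path dir, byte[] data) throws IOException {
        Path tmp = Files.createTempFile(dir, "ws", ".tmp");
        // try-with-resources releases the descriptor as soon as the block exits,
        // even if write() throws, instead of waiting for a finalizer to run.
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(data);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("workspace-files");
        // Far more files than a 1024-descriptor limit allows to be open at once.
        for (int i = 0; i < 5000; i++) {
            writeTemp(dir, new byte[] {1, 2, 3});
        }
        System.out.println("wrote 5000 temp files under " + dir);
    }
}
```

      Because every descriptor is released before the next file is opened, the loop
      stays well under the limit regardless of when garbage collection runs.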

            Assignee: Alan Harder
            Reporter: bbarlow
            Archiver: Jenkins Service Account