
[JENKINS-64669] Jenkins 2.263.2 (LTS): "Too many open files" error; files not closed after downloading archived artifacts

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: core
    • Versions: 2.276 and 2.263.3

      The Jenkins master does not close files / release file descriptors after archived artifacts have been downloaded. When the OS hard limit for open files has been reached, the Jenkins master starts to fail, returning 404 responses for several components of the UI, and eventually becomes unresponsive.

      In the Jenkins system log, an increasing number of "Too many open files" messages can be observed in various contexts:

      2021-01-17 03:12:56.755+0000 [id=9]     WARNING o.e.j.server.AbstractConnector#handleAcceptFailure:
      java.io.IOException: Too many open files
              at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
              at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:421)
              at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:249)
              at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388)
              at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
              at java.lang.Thread.run(Thread.java:748)
      2021-01-17 03:12:57.888+0000 [id=104]   WARNING o.j.p.s.EventHistoryStore$DeleteStaleHistoryTask#run: Error deleting stale/expired events from EventHistoryStore.
      java.nio.file.FileSystemException: /var/lib/jenkins/logs/sse-events: Too many open files
              at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
              at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
              at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
              at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
              at java.nio.file.Files.newDirectoryStream(Files.java:457)
              at org.jenkinsci.plugins.ssegateway.EventHistoryStore.createDirectoryStream(EventHistoryStore.java:176)
              at org.jenkinsci.plugins.ssegateway.EventHistoryStore.deleteAllFilesInDir(EventHistoryStore.java:221)
              at org.jenkinsci.plugins.ssegateway.EventHistoryStore.deleteStaleHistory(EventHistoryStore.java:196)
              at org.jenkinsci.plugins.ssegateway.EventHistoryStore$DeleteStaleHistoryTask.run(EventHistoryStore.java:270)
              at java.util.TimerThread.mainLoop(Timer.java:555)
              at java.util.TimerThread.run(Timer.java:505)
      

      The bug has been observed on multiple production masters with slightly different configurations (the major components are all of the versions specified in the environment above). The bug has also been reproduced in a newly installed VM running Jenkins with the default plugins, as well as in the official Jenkins Docker image jenkins/jenkins:lts.

      The open files are not affected by a forced garbage collection (.../gc). A full .../safeRestart is needed to close the files and release the file descriptors. Archived artifacts can be viewed via the "view" link without leaking resources.
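
      To illustrate the kind of pattern that produces this symptom (a generic sketch only - the class and method names below are made up, and this is not claimed to be the actual Jenkins core code path), a stream that is copied to the HTTP response without try-with-resources keeps its descriptor open whenever the copy is aborted, while the guarded version releases it on every exit path:

          import java.io.IOException;
          import java.io.InputStream;
          import java.io.OutputStream;
          import java.nio.file.Files;
          import java.nio.file.Path;

          public class ArtifactCopyExample {

              // Leaky pattern: if the copy throws (e.g. the client disconnects
              // mid-download), close() is never reached and the descriptor stays
              // open until the process is restarted.
              static void serveArtifactLeaky(Path artifact, OutputStream response) throws IOException {
                  InputStream in = Files.newInputStream(artifact);
                  byte[] buf = new byte[8192];
                  int n;
                  while ((n = in.read(buf)) != -1) {
                      response.write(buf, 0, n);
                  }
                  in.close();
              }

              // Safe pattern: try-with-resources closes the stream on every exit
              // path, releasing the descriptor even when the copy fails.
              static void serveArtifactSafe(Path artifact, OutputStream response) throws IOException {
                  try (InputStream in = Files.newInputStream(artifact)) {
                      byte[] buf = new byte[8192];
                      int n;
                      while ((n = in.read(buf)) != -1) {
                          response.write(buf, 0, n);
                      }
                  }
              }
          }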

        Attachments:
        1. 404s.png (204 kB)
        2. file-leak-detector_logs.zip (926 kB)
        3. jenkins.zip (23 kB)
        4. jobs_to_reproduce.tgz (2 kB)
        5. Plot.png (23 kB)

          Jesper Andersson added a comment:

          Added some jobs used to reproduce the bug in a fresh "vanilla" setup. The jobs are based on the "diagnosis" part of this article https://wiki.jenkins.io/display/JENKINS/I%27m+getting+too+many+open+files+error . The plot-job requires the Plot plugin.
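
          For reference, the attached jobs presumably just drive repeated downloads of an archived artifact over HTTP (per the note below, the two test environments were downloading artifacts from each other). A rough standalone sketch of the same idea, with a placeholder job URL and anonymous read access assumed:

              import java.io.InputStream;
              import java.net.HttpURLConnection;
              import java.net.URL;

              public class DownloadArtifactsRepro {
                  public static void main(String[] args) throws Exception {
                      // Placeholder artifact URL on a test master.
                      URL artifact = new URL("http://localhost:8080/job/test-job/lastSuccessfulBuild/artifact/out.bin");
                      byte[] buf = new byte[8192];
                      for (int i = 0; i < 1000; i++) {
                          HttpURLConnection conn = (HttpURLConnection) artifact.openConnection();
                          try (InputStream in = conn.getInputStream()) {
                              while (in.read(buf) != -1) {
                                  // discard the payload; only the master-side descriptors matter
                              }
                          } finally {
                              conn.disconnect();
                          }
                      }
                      // While this runs, the master's open-file count can be watched with
                      // "lsof -p <jenkins-pid> | wc -l" or the MXBean sketch further below.
                  }
              }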

          Also added a jenkins.log, a File Leak Detector plugin trace log, and an error dump from a crash in the test/reproduction environment.


          The default "Max open files" hard limit per process in CentOS is 4096, which can be reached rather quickly. The containerized version seems to have a limit of 1048576 open files, which will take longer - if nothing else breaks on the way. The plot of files kept open by the Jenkins master Java process shows the same pattern however, so the resource leakage is there as well.

          (The plateau at 1600 open files was caused by the VM-based master crashing; the two test environments were downloading artifacts from each other.)
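
          For anyone who wants to sample the same number the plot is based on without lsof, HotSpot/OpenJDK exposes the process's open descriptor count via an MXBean on Unix-like systems. A small standalone sketch (the class name is made up; the same calls can be adapted for the Jenkins script console):

              import com.sun.management.UnixOperatingSystemMXBean;
              import java.lang.management.ManagementFactory;
              import java.lang.management.OperatingSystemMXBean;

              public class FdCount {
                  public static void main(String[] args) {
                      OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
                      if (os instanceof UnixOperatingSystemMXBean) {
                          UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                          // Descriptors currently held by this JVM process versus the OS
                          // hard limit (4096 by default on CentOS, as noted above).
                          System.out.println("open FDs: " + unix.getOpenFileDescriptorCount()
                                  + " / max: " + unix.getMaxFileDescriptorCount());
                      } else {
                          System.out.println("open FD count not available on this platform");
                      }
                  }
              }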


            Assignee: Unassigned
            Reporter: Jesper Andersson (njesper)
            Votes: 2
            Watchers: 4
