Jenkins / JENKINS-56625

Too many open file descriptors from embedded build status usage


Details

    • Released As: v2.0.1

    Description

      We have some users who monitor many Jenkins jobs using a custom HTML dashboard full of 100+ embedded build status links, refreshing once a minute. After upgrading the Embeddable Build Status plugin from 1.9 to 2.0, we started seeing Tomcat/Jenkins run into a lot of "too many open files" errors like the following:

      Mar 19, 2019 6:25:55 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
      SEVERE: Socket accept failed
      java.io.IOException: Too many open files
              at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
              at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
              at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
              at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:482)
              at java.lang.Thread.run(Thread.java:748)
      
      Mar 19, 2019 6:25:56 AM sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop
      WARNING: RMI TCP Accept-9009: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=9009] throws
      java.net.SocketException: Too many open files (Accept failed)
              at java.net.PlainSocketImpl.socketAccept(Native Method)
              at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
              at java.net.ServerSocket.implAccept(ServerSocket.java:545)
              at java.net.ServerSocket.accept(ServerSocket.java:513)
              at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:405)
              at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:377)
              at java.lang.Thread.run(Thread.java:748)
      

      We looked into the Tomcat process's open FDs using lsof and found that /srv/jenkins/plugins/embeddable-build-status/fonts/verdana.ttf is the file being held open so many times:

      # lsof -p $(pgrep java) | awk '{ print $9 }' | sort | uniq -c | sort -n | tail
            2 /usr/java/jre1.8.0_202/lib/ext/sunjce_provider.jar
            2 /usr/java/jre1.8.0_202/lib/ext/sunpkcs11.jar
            2 /usr/java/jre1.8.0_202/lib/jce.jar
            2 /usr/java/jre1.8.0_202/lib/jsse.jar
            2 /usr/java/jre1.8.0_202/lib/resources.jar
            2 /usr/java/jre1.8.0_202/lib/rt.jar
            5 /dev/urandom
            8 anon_inode
           16 pipe
          836 /srv/jenkins/plugins/embeddable-build-status/fonts/verdana.ttf
      

      This is a lower-end example; we have seen up to 6200 open verdana.ttf file descriptors out of the 8192 Tomcat and system-wide limits.

      To reproduce this, I used a test Jenkins system that didn't have any open verdana.ttf FDs, wrote a quick loop to hammer a buildStatus link, and watched Tomcat's total file descriptor count shoot up. I downgraded the plugin to 1.9 and confirmed it didn't have this problem, so this is new behavior in 2.0, probably due to the custom text features.
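      A minimal sketch of such a hammer loop in Java (the host and job name are hypothetical; this is not the exact script used):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BadgeHammer {
    // Fetch a badge URL 'count' times and return how many requests got HTTP 200.
    // While this runs, watch the server-side FD count in another shell, e.g.:
    //   watch 'ls /proc/$(pgrep java)/fd | wc -l'
    static int hammer(String badgeUrl, int count) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(badgeUrl)).build();
        int ok = 0;
        for (int i = 0; i < count; i++) {
            // Discard the body; the point is making the server render a badge per hit.
            HttpResponse<Void> resp =
                    client.send(req, HttpResponse.BodyHandlers.discarding());
            if (resp.statusCode() == 200) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) throws Exception {
        // e.g. java BadgeHammer 'http://jenkins.example.com/buildStatus/icon?job=some-job'
        if (args.length == 1) {
            System.out.println(hammer(args[0], 500));
        }
    }
}
```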

      Jenkins hitting the open file descriptor limit is not always fatal in low doses, but when it does reach 8k it often leaves the system unresponsive for extended periods of time. These open FDs do get cleaned up when garbage collection runs; however, in our use case garbage collection does not always release them before the 8k limit is hit.

      We run lots of Jenkins systems and have seen this problem on more than one of them. I bring this up to point out that we don't have one massive monolithic Jenkins; we've scaled horizontally to spread jobs across systems, yet we're still encountering this problem on at least two of our production instances.

      Attachments

        Activity

          diginc Adam BH created issue -
          oleg_nenashev Oleg Nenashev added a comment - I believe it comes from here: https://github.com/jenkinsci/embeddable-build-status-plugin/blob/ac894bdf0953c82bbd193005f9e9cff121b77ae2/src/main/java/org/jenkinsci/plugins/badge/StatusImage.java#L182-L198 . Indeed, streams are not handled correctly there when baseUrl comes from a file.
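          The leak pattern pointed at here can be demonstrated standalone (an illustrative sketch, not the plugin's actual code): a stream opened per request and never closed pins one FD until the stream object happens to be garbage collected, whereas try-with-resources releases the FD immediately. The demo counts FDs via /proc/self/fd, so it is Linux-only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FdLeakDemo {
    // Count this process's open file descriptors (Linux-only, via /proc).
    static long openFds() throws IOException {
        try (Stream<Path> s = Files.list(Path.of("/proc/self/fd"))) {
            return s.count();
        }
    }

    // Returns {leakedDelta, closedDelta}: FD growth with and without closing.
    static long[] demo(int n) throws IOException {
        Path f = Files.createTempFile("verdana-stand-in", ".ttf");
        long before = openFds();

        // Leak pattern: open a stream per "request" and never close it.
        // Each iteration pins one FD until the stream object is GC'd.
        InputStream[] leaked = new InputStream[n];
        for (int i = 0; i < n; i++) {
            leaked[i] = Files.newInputStream(f);
        }
        long leakedDelta = openFds() - before;

        for (InputStream in : leaked) {
            in.close();
        }

        // Fix pattern: try-with-resources closes each stream deterministically,
        // so the FD count does not grow with request volume.
        for (int i = 0; i < n; i++) {
            try (InputStream in = Files.newInputStream(f)) {
                in.read();
            }
        }
        long closedDelta = openFds() - before;

        Files.delete(f);
        return new long[] { leakedDelta, closedDelta };
    }

    public static void main(String[] args) throws IOException {
        long[] d = demo(50);
        System.out.println("leaked=" + d[0] + " closed=" + d[1]);
    }
}
```

          Closing deterministically (or loading the font once and caching it) keeps the FD count flat regardless of request volume, instead of depending on GC timing the way the 2.0 behavior did.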
          diginc Adam BH added a comment -

          This has become higher priority for us so I made the fix required and tested a SNAPSHOT hpi

          https://github.com/jenkinsci/embeddable-build-status-plugin/pull/42


          thomas_dee Thomas Döring added a comment -

          diginc Thank you. I just merged your pull request. A v2.0.1 bugfix release will be published soon.
          thomas_dee Thomas Döring made changes -
          Labels: embeddable-build-status plugin → embeddable-build-status plugin v2.0.1
          thomas_dee Thomas Döring made changes -
          Assignee: Antonio Muñiz [ amuniz ] → Thomas Döring [ thomas_dee ]
          Resolution: Fixed [ 1 ]
          Status: Open [ 1 ] → Fixed but Unreleased [ 10203 ]

          thomas_dee Thomas Döring added a comment -

          Thanks to diginc and oleg_nenashev
          thomas_dee Thomas Döring made changes -
          Released As: v2.0.1
          Status: Fixed but Unreleased [ 10203 ] → Resolved [ 5 ]
          markewaite Mark Waite made changes -
          Status: Resolved [ 5 ] → Closed [ 6 ]

          People

            Assignee: thomas_dee Thomas Döring
            Reporter: diginc Adam BH
            Votes: 2
            Watchers: 5
