Type: Bug
Resolution: Fixed
Priority: Major
Environment: Jenkins ver. 2.164.1, Embeddable Build Status 2.0
Fixed in: v2.0.1
We have some users who monitor many Jenkins jobs with a custom HTML dashboard containing 100+ embedded build status links, refreshing once a minute. After upgrading from Embeddable Build Status 1.9 to 2.0 we started seeing Tomcat/Jenkins run into a lot of "too many open files" errors like the following:
Mar 19, 2019 6:25:55 AM org.apache.tomcat.util.net.NioEndpoint$Acceptor run
SEVERE: Socket accept failed
java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:482)
    at java.lang.Thread.run(Thread.java:748)

Mar 19, 2019 6:25:56 AM sun.rmi.transport.tcp.TCPTransport$AcceptLoop executeAcceptLoop
WARNING: RMI TCP Accept-9009: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=9009] throws
java.net.SocketException: Too many open files (Accept failed)
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:405)
    at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:377)
    at java.lang.Thread.run(Thread.java:748)
We looked into the Tomcat process's open FDs using lsof and found that /srv/jenkins/plugins/embeddable-build-status/fonts/verdana.ttf is what is being held open so many times:
# lsof -p $(pgrep java) | awk '{ print $9 }' | sort | uniq -c | sort -n | tail
2 /usr/java/jre1.8.0_202/lib/ext/sunjce_provider.jar
2 /usr/java/jre1.8.0_202/lib/ext/sunpkcs11.jar
2 /usr/java/jre1.8.0_202/lib/jce.jar
2 /usr/java/jre1.8.0_202/lib/jsse.jar
2 /usr/java/jre1.8.0_202/lib/resources.jar
2 /usr/java/jre1.8.0_202/lib/rt.jar
5 /dev/urandom
8 anon_inode
16 pipe
836 /srv/jenkins/plugins/embeddable-build-status/fonts/verdana.ttf
This is a relatively low sample; we have seen up to 6200 open verdana.ttf file descriptors out of the 8192 Tomcat and system-wide limits.
To reproduce this I used a test Jenkins system that didn't have any of the open verdana.ttf FDs, wrote a quick loop to hammer a buildStatus link, and watched the Tomcat process's total file descriptor count shoot up. I downgraded the plugin to 1.9 and confirmed it didn't have this problem, so this is new behavior in 2.0, probably due to the custom text features.
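For anyone else trying to reproduce this, the "quick loop" was along these lines. This is a minimal sketch, not the exact script we used; the host, port, and job name are placeholders, and it assumes the standard badge URL form /buildStatus/icon?job=&lt;name&gt;:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class BadgeHammer {
    // Fires `n` sequential GETs at `badgeUrl` and returns how many came back 200.
    // On the affected plugin version, watching lsof while this runs shows the
    // verdana.ttf descriptor count climbing with each request.
    static int hammer(String badgeUrl, int n) throws Exception {
        int ok = 0;
        for (int i = 0; i < n; i++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(badgeUrl).openConnection();
            try {
                if (conn.getResponseCode() == 200) {
                    ok++;
                    conn.getInputStream().readAllBytes(); // drain the badge body
                }
            } finally {
                conn.disconnect();
            }
        }
        return ok;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL -- substitute your Jenkins host and job name.
        String url = args.length > 0 ? args[0]
                : "http://localhost:8080/buildStatus/icon?job=test-job";
        System.out.println(hammer(url, 200) + " requests returned 200");
    }
}
```

Running `lsof -p <tomcat pid> | grep -c verdana.ttf` in another terminal while the loop runs is how we watched the count climb on 2.0 and stay flat on 1.9.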
Jenkins hitting the open file descriptor limit is not always fatal in low doses, but when it does hit 8k it often leaves the system unresponsive for extended periods of time. These open FDs do get cleaned up when garbage collection runs; however, in our use case garbage collection does not always release the FDs before the 8k limit is reached.
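The GC behavior above is consistent with the classic "open a stream per request and let the finalizer close it" leak pattern. This is only our working theory, not something confirmed against the plugin source, but the pattern and its fix can be demonstrated with a Linux-only sketch that counts the process's own descriptors via /proc/self/fd (the temp file stands in for the plugin's verdana.ttf):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FdLeakSketch {
    // Linux-specific: each entry under /proc/self/fd is one open descriptor.
    static long openFds() throws IOException {
        try (var entries = Files.list(Path.of("/proc/self/fd"))) {
            return entries.count();
        }
    }

    // Returns {FDs pinned by the leaky loop, FDs pinned by the fixed loop}.
    static long[] demo() throws IOException {
        Path ttf = Files.createTempFile("verdana", ".ttf"); // stand-in for the font file
        long before = openFds();

        // Leaky pattern: open the file on every render, rely on GC/finalization
        // to close it. Each iteration pins one descriptor until a GC happens.
        List<InputStream> leaked = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            leaked.add(new FileInputStream(ttf.toFile())); // never closed explicitly
        }
        long during = openFds();

        // Fixed pattern: try-with-resources releases each descriptor immediately.
        for (int i = 0; i < 100; i++) {
            try (InputStream in = new FileInputStream(ttf.toFile())) {
                in.read(); // simulate rendering from the font
            }
        }
        long after = openFds();

        for (InputStream in : leaked) in.close();
        Files.delete(ttf);
        return new long[] { during - before, after - during };
    }

    public static void main(String[] args) throws IOException {
        long[] d = demo();
        System.out.println("leaky loop pinned " + d[0] + " FDs, closed loop pinned " + d[1]);
    }
}
```

Whatever the actual change in v2.0.1 was, closing explicitly instead of waiting for GC is the usual fix for this class of leak.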
We run lots of Jenkins systems and have seen this problem on more than one of them. I bring this up to point out that we don't have one massive monolithic Jenkins; we've scaled horizontally to spread jobs out across systems, yet we're still encountering this problem on at least two of our production instances.