50+ SSH slaves running Ubuntu 12.04.
We can run for about 2 days and then the system basically becomes unresponsive and needs to be restarted.
Everything was going fine for many months on version 1.5.46, but once we went to 1.5.66 or 1.5.68 we began to see "too many open files" exceptions until Jenkins needed a restart. Attached are the stack traces. I didn't choose a specific component because I wasn't sure which one to log it against. I also don't see any existing open issues that match this. I hope opening this was valid.
I've attached a list of all files opened (using lsof) by jenkins as well as the jenkins log containing the stack traces.
I only included a couple of the stack traces because I have a feeling the output of lsof may be more illuminating.
Happy to provide anything more you need. This is one of my first defects so I hope I'm giving enough detail without it already being a known issue.
We did an experiment where we repeatedly ran the Jenkins CLI with the option to set a build description. We did this because our Jenkins jobs use it heavily and we suspected it might be related. Every Jenkins CLI command we executed generated another open file reference that doesn't appear to get closed.
It corresponds to entries in lsof like this:
java 1230 jenkins 3772u sock 0,7 0t0 53062 can't identify protocol
It doesn't look like these files are ever getting closed. If you look in that lsof file you'll see roughly 11,000 lines just like the one above.
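In case it helps anyone reproduce the count, here is a rough Python sketch of how the leaked entries could be tallied from saved lsof output. It is not part of the attachments; the function name count_unidentified_sockets is made up for illustration, and it assumes the lsof rows look like the sample line above (command first, PID second, and "can't identify protocol" at the end of the row).

```python
from collections import Counter


def count_unidentified_sockets(lsof_text, command="java"):
    """Count, per PID, the lsof rows ending in "can't identify protocol".

    Hypothetical helper: assumes whitespace-separated lsof rows with the
    command name in the first column and the PID in the second.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and rows from other commands.
        if len(fields) < 3 or fields[0] != command:
            continue
        if line.rstrip().endswith("can't identify protocol"):
            counts[fields[1]] += 1
    return counts


# Sample row copied from the lsof output quoted above.
sample = "java 1230 jenkins 3772u sock 0,7 0t0 53062 can't identify protocol\n"
leaked = count_unidentified_sockets(sample)
```

Running this against lsof output captured before and after a batch of CLI calls should show whether the count for the Jenkins PID grows by roughly one per call, which is what we observed by eye.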