Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
Component/s: ssh-agent-plugin
Description
After running for 2-3 days, Jenkins jobs no longer launch.
The console output usually just says that fetching from git failed, but sometimes contains other unusual errors.
The Jenkins system log reports:
java.lang.OutOfMemoryError: unable to create new native thread
I was able to get a heap dump, but cannot post it because it may contain sensitive data.
In VisualVM analysis of the heap dump, I noticed that there are almost 1000 instances of AgentServer and AgentServer$1. The threads don't show up in the thread monitor, but are still referenced somehow.
Unfortunately the parent references are numerous and hard to decipher. The proximate parent is the ThreadGroup.threads array in the main ThreadGroup instance. This seems unlikely to be the true root cause.
I also noticed about the same number of ThreadLocalMap instances, so the leak may be related to incorrect use of ThreadLocal.
Attached a screenshot of the AgentServer$1 instances in VisualVM, and the jenkins system log.
Please let me know if there is any other analysis I can provide.
I am entering this bug as a blocker because I don't currently have a workaround. I am using Jenkins in conjunction with an external PHP application that needs to post jobs to the Jenkins build queue. To work around the problem, I would therefore need to implement a controlled shutdown process and restart Jenkins at a daily or semi-daily interval. That in turn requires the calling application to retry, which is probably a good idea anyway but is not yet implemented.
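The caller-side retry described above could be sketched roughly as follows. This is only an illustration, written in Python rather than PHP; `retry_with_backoff`, `submit`, and the delay values are hypothetical names and numbers, and `submit` stands in for whatever call posts the job to the build queue. It assumes that a Jenkins restart shows up to the caller as a connection-level error (`OSError` and its subclasses).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    submit: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call submit() until it succeeds or the attempts are used up.

    submit is a hypothetical zero-argument callable that posts one job.
    Only OSError (connection refused/reset, URLError, ...) is retried;
    any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return submit()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The total wait should be sized to cover the expected restart window; with the defaults above the caller gives up after about 15 seconds of waiting, which may be too short for a real restart.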
Attachments
Issue Links
duplicates: JENKINS-27555 ssh-agent plugin leaking file descriptors leaving behind jenkinsXXXXXX.jnr socket files (Resolved)
I do generally see the Stopped line on builds, but I don't watch every build. I checked with grep across files modified in the last 3 days and found a small discrepancy: 20 more files contain a "Started" line than a "Stopped" line:
[/var/lib/jenkins/jobs]$ find . -mtime -3 -type f > /tmp/recent_logs
[/var/lib/jenkins/jobs]$ grep -l '[ssh-agent] Started.' `cat /tmp/recent_logs` > /tmp/agent-started-logs
[/var/lib/jenkins/jobs]$ grep -l '[ssh-agent] Stopped.' `cat /tmp/recent_logs` > /tmp/agent-stopped-logs
[/var/lib/jenkins/jobs]$ wc -l /tmp/agent-st*
2578 /tmp/agent-started-logs
2558 /tmp/agent-stopped-logs
5136 total
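To see which builds account for the difference, rather than only the totals, a per-file check along these lines may help. It is a sketch, not what was actually run: `find_unstopped_logs` and its parameters are hypothetical names. It matches the two messages as literal strings, because in the grep commands above the unescaped `[ssh-agent]` is read by grep as a bracket expression rather than as literal text (`grep -F` would avoid that).

```python
import os
import time
from pathlib import Path

STARTED = "[ssh-agent] Started."
STOPPED = "[ssh-agent] Stopped."


def find_unstopped_logs(root, max_age_days=3.0, now=None):
    """Return recent regular files under root that contain STARTED but not STOPPED.

    Roughly mirrors `find . -mtime -3 -type f`, then compares the two
    messages per file using plain substring matching.
    """
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    stray = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # `find -type f` does not report symlinks
            try:
                if path.stat().st_mtime < cutoff:
                    continue  # older than the time window
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable or vanished file: skip it
            if STARTED in text and STOPPED not in text:
                stray.append(path)
    return sorted(stray)
```

Calling `find_unstopped_logs("/var/lib/jenkins/jobs")` would list the candidate logs. Like the grep version, this works per file, so a log with several "Started" lines but only one "Stopped" line would not be flagged, and builds still running at the time of the check will appear as strays.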
Regarding use of SSH Agent, it is configured for all builds, since the git plugin fails to work in my environment if SSH Agent is not running. (I spent a few hours trying to debug this months ago, but don't really remember the details.) Most of the builds don't require an agent other than for the git plugin.