Bug
Resolution: Unresolved
Minor
None
We recently noticed that our whole Jenkins instance goes completely unresponsive for tens of minutes at a time, which freezes all jobs.
During these episodes, the Jenkins log fills with messages like the following:
20240201 09:46:37.511+0000 [id=172] INFO o.j.p.w.s.concurrent.Timeout#lambda$ping$0: Running CpsFlowExecution[Owner[<REDACTED>]] unresponsive for 12 min
When we pause the threads and review their stacks, we see the following:
- All unresponsive threads are blocked waiting for the job queue (see Running CpsFlowExecution.txt)
- The thread Periodic Jenkins queue maintenance holds the lock on the job queue (see AtmostOneTaskExecutor.txt)
- That thread is itself blocked waiting for a reply from the LDAP server
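For anyone who wants to reproduce this kind of analysis without attaching a debugger, the sketch below lists every thread that is waiting on a lock together with the thread that currently owns it, using the standard java.lang.management API. The class and method names (BlockedThreadReport, describeBlockedThreads) are made up for illustration, and the code only inspects the JVM it runs in, so against Jenkins it would have to be adapted (for example, run from inside the controller JVM).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins: reports which thread owns
// the lock that each blocked or waiting thread is stuck on.
public class BlockedThreadReport {

    static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // true, true: include locked monitors and ownable synchronizers
        // (such as ReentrantLock) in the dump.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // getLockOwnerName() is null when the thread is not waiting
            // on a lock that another thread owns.
            if (info.getLockOwnerName() != null) {
                lines.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeBlockedThreads().forEach(System.out::println);
    }
}
```

In a situation like the one described here, one would expect many lines pointing at the same owner thread, which is then the one whose stack is worth reading first.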
The culprit is probably the LDAP server, but it doesn't seem robust that LDAP queries have no configured timeout. According to Oracle's documentation, the timeout can be set using the property com.sun.jndi.ldap.connect.timeout.
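As a rough illustration of what such a configuration could look like at the JNDI level, here is a minimal sketch that builds an LDAP environment with timeouts set. It is not how the Jenkins LDAP plugin is wired; the class name, method name, and values are made up for the example. It also sets com.sun.jndi.ldap.read.timeout, which is the JNDI property that bounds the wait for a reply on an established connection, since connect.timeout on its own only covers establishing the connection.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical helper, not taken from Jenkins or the LDAP plugin.
public class LdapTimeoutSketch {

    // Builds a JNDI environment for the JDK's LDAP provider with both
    // timeouts set. Values are in milliseconds and passed as strings.
    static Hashtable<String, String> ldapEnvironment(String url, int connectMs, int readMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        // Upper bound on establishing the connection.
        env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(connectMs));
        // Upper bound on waiting for a response to an LDAP request.
        env.put("com.sun.jndi.ldap.read.timeout", String.valueOf(readMs));
        return env;
    }

    public static void main(String[] args) {
        // Placeholder URL; no connection is opened here.
        System.out.println(ldapEnvironment("ldap://ldap.example.invalid:389", 5000, 10000));
    }
}
```

The resulting table would be passed to new InitialDirContext(env). With the read timeout in place, a stalled LDAP server should make the lookup fail with an exception after the configured delay instead of holding the queue lock indefinitely.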