Bug
Resolution: Unresolved
Major
None
Jenkins 2.107.3
We have seen a couple of jobs whose executors appear to have become stuck while the build was starting.
For example, one build (#428) has the following console output:
Started by upstream project "<folder>/<job1>" build number 181
originally caused by:
 Started by upstream project "<folder>/<job2>" build number 852
 originally caused by:
 Started by upstream project "<folder>/<job3>" build number 1569
 originally caused by:
 Started by user <user>
 Replayed #1568
Resume disabled by user, switching to high-performance, low-durability mode.
[Pipeline] node
Still waiting to schedule task
Aborted by <user>
Click here to forcibly terminate running steps
Click here to forcibly kill entire build
Hard kill!
Finished: ABORTED
According to the build history of that job, the build has been aborted. However, Build Executor Status shows that an executor is still allocated to it and believes the build is still running. Trying to abort it from Build Executor Status does nothing.
This happened across 4 jobs which were run in parallel, triggered by another job and were started within 2 seconds of each other.
I could not see any blocked threads on the node, and restarting the Jenkins agent service did not help.
There are some blocked threads on the master; see threadDump.txt. This looks like some kind of deadlock condition triggered by multiple jobs starting within a short space of time. (I can provide a fuller thread dump if useful, but it would take time to obfuscate.)
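For anyone triaging a dump like this, here is a minimal sketch of pulling out the blocked threads and grouping them by the thread that holds the lock they are waiting on. It assumes thread header lines in the `java.lang.management.ThreadInfo` style (`"name" Id=N BLOCKED on <lock> owned by "<owner>" Id=M`); the attached threadDump.txt may be laid out differently, in which case the regular expression needs adjusting. The function names are hypothetical helpers, not part of Jenkins.

```python
import re
from collections import defaultdict

# Assumed header shape (ThreadInfo style); adjust if the dump differs:
#   "thread name" Id=12 BLOCKED on some.Lock@1a2b owned by "owner name" Id=34
HEADER = re.compile(
    r'^"(?P<name>.+?)".*?\bBLOCKED\b'
    r'(?: on (?P<lock>\S+))?'
    r'(?: owned by "(?P<owner>.+?)")?'
)


def blocked_threads(dump_text):
    """Return a (thread, lock, owner) tuple for each thread reported as BLOCKED.

    lock and owner are None when the header line does not include them.
    """
    found = []
    for line in dump_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            found.append(
                (match.group("name"), match.group("lock"), match.group("owner"))
            )
    return found


def waiters_by_owner(dump_text):
    """Map each lock-owning thread to the list of threads blocked behind it."""
    owners = defaultdict(list)
    for name, _lock, owner in blocked_threads(dump_text):
        owners[owner].append(name)
    return dict(owners)
```

Calling `waiters_by_owner(open("threadDump.txt").read())` would then show which thread most of the others are queued behind. If two threads each appear as the owner blocking the other, that points at a true deadlock rather than a single long-held lock.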
Note that killing the job does not appear to be what caused this. The hang seems to have occurred at build startup, before or while the build was being allocated a node, and killing the job did not help.