Type: Bug
Resolution: Unresolved
Priority: Critical
Environment:
Jenkins 2.19.1 LTS
Jenkins 2.19.2 LTS
Jenkins 2.26
Durable Task Plugin 1.12
I hate to file a general "core" bug, because I wish I could direct this to the correct component. Unfortunately, I cannot identify which component is hanging or why, so I do not know where to direct this problem.
This problem started about 2 weeks ago, around the time we began adding new Pipeline builds to our build server, so it could be related to one of the Pipeline plugins.
The behavior is the following:
- Once or twice a day, all builds on all build slaves hang. The console log of each build stops advancing and stays stuck at the last line executed / last line returned.
- Once this occurs, builds cannot be stopped. Clicking Stop changes neither the build status nor the console log output.
- New builds do not start. They sit in the queue, but the slaves are never started.
- The UI continues to function, so it is still possible to view configuration, get thread dumps, etc.
The only resolution is to restart the Jenkins server.
We use the vCenter plugin to start all build slaves dynamically. However, we have been using this configuration for months, and the problem only started recently.
We have reproduced this on both the latest Jenkins release (2.26) and Jenkins LTS 2.19.1.
I am attaching a threaddump of the server at the time of one of these hangs.
I can provide any other information that might help in diagnosing this problem.
- is duplicated by
  - JENKINS-22824 Jenkins freezes at startup on ensureLoad call. (Resolved)
- is related to
  - JENKINS-38834 Freestyle jobs hang in 2.19.1 on Windows 10 Nodes (Resolved)
  - JENKINS-36088 Use NIO rather than JNR whenever possible (Resolved)
  - JENKINS-19445 Jobs randomly stuck with "building remotely on slave-name" message (Reopened)
  - JENKINS-16070 Deadlock using Windows native calls (Resolved)
- links to
[JENKINS-39179] All builds hang, JNA load deadlock on Windows slave
Field | Original | New
Component/s | | durable-task-plugin [ 18622 ]
Component/s | core [ 15593 ] |
Environment | Jenkins 2.19.1 LTS and Jenkins 2.26 | Jenkins 2.19.1 LTS and Jenkins 2.26 Durable Task Plugin 1.12
Summary | All builds hang, Builds cannot be stopped, only restart solves | All builds hang, Builds cannot be stopped, hung FileMonitoringCleanup
Summary | All builds hang, Builds cannot be stopped, hung FileMonitoringCleanup | All builds hang, hung FileMonitoringTask.cleanup / get attributes on Windows 10
Component/s | | core [ 15593 ]
Component/s | core [ 15593 ] |
I have been tracking this down with /threadDump to figure out what was hung, and the problem seems to lie in the attempt to deleteRecursive within the FileMonitoringController on Windows 10.
Here is an example part of the thread dump that led me to look there:
To summarize the steps that I took:
If you look at the thread dump on the machine that was hung (taken before I logged in and killed it), it looks like this:
That is very interesting, because we had not had this problem before, and just last week I updated these Windows 10 machines to the Windows 10 Anniversary Update.
That was right around the time our "all builds hang" problem started.
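Picking out which threads in a saved /threadDump are sitting in deleteRecursive can be automated. Below is a minimal Java sketch. It assumes the dump was saved as plain text in the usual layout, where each thread starts on a line beginning with its quoted name and each stack frame is an indented "at ..." line. The class name ThreadDumpScan and the sample thread and frame names are hypothetical; they are not taken from the dumps in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: scans the text of a thread dump and reports which
 * threads have a stack frame containing a given substring.
 */
public final class ThreadDumpScan {

    private ThreadDumpScan() {}

    /** Names of threads with at least one "at ..." frame containing needle, in dump order. */
    public static List<String> threadsMatching(String dump, String needle) {
        Set<String> matches = new LinkedHashSet<>();
        String current = null; // thread whose frames are currently being read
        for (String rawLine : dump.split("\\R")) {
            if (rawLine.startsWith("\"")) {
                // Header line. Assumption: the thread name is the text between the first two quotes.
                int end = rawLine.indexOf('"', 1);
                current = end > 0 ? rawLine.substring(1, end) : null;
            } else {
                String line = rawLine.trim();
                if (current != null && line.startsWith("at ") && line.contains(needle)) {
                    matches.add(current);
                }
            }
        }
        return new ArrayList<>(matches);
    }

    public static void main(String[] args) {
        // Made-up sample dump, not copied from this issue.
        String sample = String.join("\n",
            "\"Executor #0 for win10-agent\" Id=101 Group=main RUNNABLE",
            "\tat example.Workspace.deleteRecursive(Workspace.java)",
            "",
            "\"Jenkins cron thread\" Id=7 Group=main TIMED_WAITING",
            "\tat java.lang.Object.wait(Native Method)");
        System.out.println(threadsMatching(sample, "deleteRecursive"));
        // prints [Executor #0 for win10-agent]
    }
}
```

Run against a dump taken during a hang with a needle such as "deleteRecursive" or "FileMonitoring", this would list the candidate stuck threads. Comparing two dumps taken a few minutes apart would show whether the same threads stay stuck in the same frames.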
I think this could be two problems: