Type: Bug
Resolution: Unresolved
Priority: Critical
Environment: Jenkins 2.150, Windows 7, cygwin
After the shell script executes successfully, the workers remain stuck for 10 minutes on the final "exit 0".
There hasn't been any other failure that I could find: all the jobs run exactly as planned, they just don't seem to exit.
The fact that the jobs remain stuck for exactly 600 seconds makes me think of a timeout of some sort.
Reverting to 2.138 fixed the issue, which is why I am marking it as a regression.
[JENKINS-55106] Build stuck on final "exit 0"
We are seeing this issue as well on version 2.150.1 running on Windows Server 2012 R2. Builds that took 4 minutes prior to the upgrade were taking 18 minutes afterward. We have reverted to version 2.138.3, which resolved the issue.
If there's information that I can provide to help pin this down, please let me know.
Reverting Jenkins to version 2.138.3 fixed the issue. I hope it is fixed in the next Jenkins LTS version.
Thank you Sean.
Confirmed. The same issue occurs with 2.150.1 on Windows Server 2003, JDK 8. As you can see, there is a delay of exactly 10 minutes before the finish:
19:59:42 D:\Jenkins\jobs\product\workspace>echo done
19:59:42 done
19:59:42
19:59:42 D:\Jenkins\jobs\product\workspace>exit 0
20:09:44 Finished: SUCCESS
We have reverted to the 2.138.2 version.
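As a quick check on the log excerpt above, the gap between the "exit 0" line and the "Finished" line can be computed from the two timestamps. This is only a small sketch; the class and method names are made up for illustration, and it assumes both timestamps fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

final class LogDelay {
    // Seconds elapsed between two same-day HH:mm:ss console timestamps.
    static long delaySeconds(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).getSeconds();
    }

    public static void main(String[] args) {
        // Timestamps copied from the console output quoted above.
        System.out.println(delaySeconds("19:59:42", "20:09:44")); // prints 602
    }
}
```

The result is 602 seconds, which is consistent with a 600-second wait plus a couple of seconds of ordinary overhead.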
I can also confirm this on Windows Server 2012 R2, JDK 8, with Jenkins 2.150.2. After the Build portion of the configuration has completed, there is a 10 minute delay before the Post-build Actions begin.
I reverted back to 2.138.4.
I tried upgrading my instance to 2.164 and I can still reproduce it. I'll revert again to 2.138 for the moment.
I'll lose access to this instance soon (~3 weeks) so if anyone needs me to try stuff, now's the time.
Added "-DSoftKillWaitSeconds=0" to jenkins.xml, before the -jar option. Jobs now execute normally on version 2.150.2.
Reference: https://stackoverflow.com/questions/54039226/jenkins-hangs-between-build-and-post-build/54072987#54072987
https://issues.jenkins-ci.org/browse/JENKINS-55422?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
Thank you very much for pointing this out, Guru. We tried this and are now running 2.150.2 without the delay.
Have a great day!
Still doesn't work properly, had to roll back from 2.164.1 LTS to 2.138.4 LTS. Workaround mentioned above doesn't work for me either, builds are getting stuck.
"Workaround mentioned above doesn't work for me either"
Are you sure you applied it correctly? Check /systemInfo to see whether the system property is defined in Jenkins.
danielbeck yes, it's applied properly and still various builds are randomly getting stuck.
kredens While you're waiting for the build to finish, check what Jenkins is doing: https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
Does this work for Jenkins slaves? This fixed our master instance, but it seems like a similar issue is present with jobs that run on slaves.
Adding `-DSoftKillWaitSeconds=0` to jenkins-slave.xml and restarting the service adds it to the command line, but it doesn't seem to have any effect. Any ideas?
We also experience this, but only on slave machines, and adding -DSoftKillWaitSeconds=0 has no effect on slave nodes.
We also experience this issue with jobs that run on slave machines. Adding -DSoftKillWaitSeconds did not affect the issue.
Tried rolling back to Jenkins version 2.150.3, but the bug was still there.
Then, rolled back to Jenkins version 2.138.4, and the bug is now gone.
We will have to stay on 2.138.4 until this bug is resolved.
To clarify, are you setting the system property on agent processes? I.e. as additional launch arguments to java -jar agent.jar?
I only set it on the master launch process. From what I have read, it has no effect on slaves.
Right, Josh wrote that. Would still like explicit confirmation from someone affected that setting it doesn't work, including confirmation that it appears correctly on the URL /computer/name_here/systemInfo in the list of system properties, since it's easy to get the Java invocation wrong.
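To illustrate why the position of the option matters, here is a rough sketch of how a java command line is split: -D options placed before -jar become system properties, while anything after the jar file is passed to the application as an ordinary argument and never shows up under system properties. The class and method names are hypothetical, the parsing is a deliberately simplified model of the launcher, and the example command lines leave out the other arguments a real agent launch would need.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LaunchArgs {
    // Simplified model of the java launcher: -Dkey=value tokens that appear
    // before "-jar" become system properties; everything after the jar file
    // is handed to the application as plain arguments.
    static Map<String, String> systemProperties(List<String> commandLine) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String token : commandLine) {
            if (token.equals("-jar")) {
                break; // later tokens are the jar path and program arguments
            }
            if (token.startsWith("-D")) {
                int eq = token.indexOf('=');
                String key = eq < 0 ? token.substring(2) : token.substring(2, eq);
                String value = eq < 0 ? "" : token.substring(eq + 1);
                props.put(key, value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        List<String> before = List.of("java", "-DSoftKillWaitSeconds=0", "-jar", "agent.jar");
        List<String> after = List.of("java", "-jar", "agent.jar", "-DSoftKillWaitSeconds=0");
        System.out.println(systemProperties(before)); // {SoftKillWaitSeconds=0}
        System.out.println(systemProperties(after));  // {}
    }
}
```

In the second form the property would not be expected to appear on the systemInfo page, which is one way a seemingly applied workaround can silently do nothing.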
Here's the slave command line:
And from jenkins-slave.xml
I'm pretty confident that this invocation is correct, as it's copied from our master agent where this parameter is working fine.
Any progress with this? I understand that there is a workaround, but shouldn't the commit that broke it be looked at to at least see why it's broken?
We've had this problem for 6 months or more, and have been searching high and low for a solution, without finding this issue.
Just applied the workaround on one of our agents, and immediately cut down the build-time of one of our jobs by 25 minutes!!!!!!!!!!
I can't wait to see how much server time will be freed by this, but it looks like a LOT!
This issue is preventing me from upgrading my Jenkins, and the plugin to Jenkins version gap is getting harder and harder to deal with.
Is this issue going to be looked at? And has anyone had success with a workaround for a Jenkins instance that uses only slave machines?
Just wanted to add a "me too" to Andy Lin's comment.
I used to be very diligent about keeping my Jenkins and all the plugins up to date.
This bug, however, has everything stuck with what works using Jenkins 2.138.4.
loafloaf, what scenario are you encountering this under? I had a similar problem when using MS Visual Studio on a slave.
In my case the problem is that the slave waits for remote processes to close, and has a timeout of ~2 minutes per process. I found that I had parallel compiles enabled and 6 remote VS compile sessions on the slave. When the build finished, those VS processes would not go away, and every 2 minutes Jenkins would kill one of them.
I learned that MS causes compile processes to stick around once they are started. The idea being that when a new compile is needed it can grab one of the idle processes. However, in my case Jenkins doesn't need/want any more compiles and is stuck waiting for the VS processes to go away.
There is a flag that can be used at the command line to tell VS not to keep the processes alive. Details can be found in a similar issue I logged, JENKINS-59400.
rocha_stratovan, I do use MS Visual Studio on some of my slave machines, but I don't think it does parallel compilation. I'll be sure to try out what you suggested. Do you experience the issue if you don't do parallel compiles?
I have Mac slave machines for the other half. The solution you had might apply in some way so I'll have to investigate if xcodebuild also does something similar with lingering processes. Thanks!
loafloaf, I didn't seem to notice it when I did simple compiles without parallel compilation. Although I honestly would expect there to be at least a 2 minute delay even if there is just one compile process. But I don't know.
Good luck.
Well, time passes, and some jobs still get stuck on the FINAL stage for about two minutes before finally letting go. How hard can it be to fix this?
danielbeck I'm not a developer on this project, nor am I using it by choice. Also, it used to work fine until someone changed something and can't be bothered to fix it.
We are seeing the same issue in regular builds and pull requests: the build gets stuck on exit 0 for more than 5 minutes before it reports the status.