Type: Bug
Resolution: Fixed
Priority: Critical
Environment: Jenkins 1.553 or later on CentOS 6.3 with Oracle Java 7 JDK
Starting with version 1.553, Jenkins no longer seems to kill running processes after a build failure.
We have several jobs that start a Tomcat instance and run various end-to-end tests; if the build fails, Jenkins doesn't execute the shutdown scripts, and we rely on the process killer to clean up the Tomcat instance.
This can be reproduced more easily by creating a freestyle job with two shell build steps: the first starts a simple background command such as "nohup sleep 10000 &", the second runs "/bin/false". After the job exits, the sleep process is still running. Prior to version 1.553, it would have been killed.
There are no log messages to indicate a problem.
I can reproduce this on CentOS 6, Red Hat EL 5 and Red Hat EL 4, both with the job running on the master and on a slave node. I also tested with both 32-bit and 64-bit Oracle Java 7 JDKs.
We're using the built-in Winstone container.
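For context, the process killer identifies leftover processes by their environment: processes launched by a build inherit variables such as BUILD_ID, and after the build Jenkins kills any process whose environment still carries that build's values. The sketch below models only that matching step as a pure function, assuming the environments have already been read from the OS. The class and method names (CookieMatcher, pidsToKill) are hypothetical and are not Jenkins' API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "kill processes that carry the build's cookie" rule. */
class CookieMatcher {

    /** True if env contains every key/value pair of the cookie. */
    static boolean hasMatchingEnvVars(Map<String, String> env, Map<String, String> cookie) {
        for (Map.Entry<String, String> e : cookie.entrySet()) {
            if (!e.getValue().equals(env.get(e.getKey()))) {
                return false; // missing or different value: not launched by this build
            }
        }
        return true;
    }

    /** Sorted pids whose environment matches; an empty cookie matches nothing (a guard added in this sketch). */
    static List<Integer> pidsToKill(Map<Integer, Map<String, String>> envByPid, Map<String, String> cookie) {
        List<Integer> result = new ArrayList<>();
        if (cookie.isEmpty()) {
            return result;
        }
        for (Map.Entry<Integer, Map<String, String>> p : envByPid.entrySet()) {
            if (hasMatchingEnvVars(p.getValue(), cookie)) {
                result.add(p.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> procs = Map.of(
                100, Map.of("BUILD_ID", "42", "HOME", "/var/lib/jenkins"), // e.g. the nohup'd sleep
                101, Map.of("BUILD_ID", "41"),                              // started by another build
                102, Map.of());                                             // environment read came back empty
        System.out.println(pidsToKill(procs, Map.of("BUILD_ID", "42"))); // [100]
    }
}
```

If reading a process's environment silently yields nothing (pid 102 above), that process can never match and is left running, which is consistent with the symptom in this report.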
- depends on: JENKINS-26048 Jenkins no longer cleaning up child processes when build stopped - as of 1.587 (Reopened)
- is related to: JENKINS-28968 Aborting builds does not kill surefire sub-process (Reopened)
[JENKINS-22641] Jenkins no longer kills running processes after job fails
An update: apparently it doesn't matter whether the job fails; a failure is not necessary to reproduce it. The problem is only critical for us with failed jobs, since a failed job doesn't run the final step that shuts down the running servlet container.
I can confirm: this is affecting me too, with both succeeding and failing jobs. One of our jobs relies on the process-reaping mechanism to clean up: it starts a redis-server daemon before the build and relies on Jenkins killing it afterwards. That worked fine on Jenkins 1.554 and stopped working after the upgrade to 1.558; the issue still occurs on 1.559. Ubuntu 12.04, x86_64, Oracle JDK 7.
I was able to reproduce the issue in 1.553, and 1.552 wasn't affected. So...
$ git bisect bad
4c32649db54a4a4f6162793179143fb12ae9e521 is the first bad commit
commit 4c32649db54a4a4f6162793179143fb12ae9e521
Author: christ66 <...>
Date:   Thu Feb 20 21:50:14 2014 -0800

    Upgrade to commons-io 2.4

    In order to maintain backwards compatibility we need to keep IOUtils the same as in commons-io 1.4. This code is backwards compatible, however most of the methods have been deprecated and should instead use the org.apache.commons.io.IOUtils version instead.

:040000 040000 d42c82833d3b7661a5fdd449e8b436817079b889 4c49b27df1a7adcbc2c58ea9d15535936916fdaa M core
:100644 100644 3c88f96220b54a8690cea0d272cd4447bfa98ffb cebccfcc1a20211b732fe96aaf90725ca0194fc8 M pom.xml
$ git bisect log
git bisect start
# bad: [1f28286ee683470a0b0a15b9425046292cc7d2a5] [maven-release-plugin] prepare release jenkins-1.553
git bisect bad 1f28286ee683470a0b0a15b9425046292cc7d2a5
# good: [0738314cb054f287cb1c66232672a57140987f2b] [maven-release-plugin] prepare release jenkins-1.552
git bisect good 0738314cb054f287cb1c66232672a57140987f2b
# good: [b09f828706c9a16b8acc4158f2161117f9359047] [JENKINS-21159] Noting #663 merge.
git bisect good b09f828706c9a16b8acc4158f2161117f9359047
# bad: [5d91ed77f7402133fd7c9e77b1a6676c6695ec5a] Merge branch 'JENKINS-20965' of github.com:christ66/jenkins
git bisect bad 5d91ed77f7402133fd7c9e77b1a6676c6695ec5a
# bad: [8482756860d8cb838e20f7007c241aa77f2524b7] typo in changelog
git bisect bad 8482756860d8cb838e20f7007c241aa77f2524b7
# good: [1b2ac716ba14945fd0654376d44a35feef62d531] Slave started from JNLP can now install itself as systemd service.
git bisect good 1b2ac716ba14945fd0654376d44a35feef62d531
# bad: [3b472e5b867a4664f2cd40633cb197d09f470a16] Merge pull request #1135 from christ66/commons-io-up
git bisect bad 3b472e5b867a4664f2cd40633cb197d09f470a16
How I reproduced it:
Built and ran Jenkins on an OS X master: mvn -DskipTests=true clean verify
Added an SSH slave (Linux CentOS 6)
Created a freestyle job. Two shell build steps: one nohup sleep 10000 &, the other /bin/false.
After every build, ran ps aux | grep sleep on the slave. If the sleep process was there, killed it and ran git bisect bad on the master; otherwise git bisect good. Repeat.
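The manual check in the last step could be scripted. This is a minimal sketch, assuming Java 9 or later on the node that ran the job (newer than the JDK 7 used in this report) and that ProcessHandle can see the leftover process's command line; the class and method names are hypothetical. Matching the command line directly also avoids the ps aux | grep sleep pitfall of matching the grep process itself.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for the bisect loop: is the leaked "sleep 10000" still alive on this node? */
class LeakCheck {

    /** True for a command line such as "sleep 10000" or "/bin/sleep 10000" (but not "grep sleep"). */
    static boolean isReproSleep(String commandLine) {
        String[] argv = commandLine.trim().split("\\s+");
        boolean isSleep = argv[0].equals("sleep") || argv[0].endsWith("/sleep");
        return argv.length == 2 && isSleep && argv[1].equals("10000");
    }

    /** Pids of visible processes whose command line matches; processes without command-line info are skipped. */
    static List<Long> findLeakedSleeps() {
        return ProcessHandle.allProcesses()
                .filter(ph -> ph.info().commandLine().map(LeakCheck::isReproSleep).orElse(false))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> leaked = findLeakedSleeps();
        // A leftover sleep means the process killer did not run, so this revision is "bad".
        System.out.println(leaked.isEmpty() ? "good" : "bad (leaked pids " + leaked + ")");
    }
}
```

The leaked process still has to be killed by hand before the next build, exactly as in the manual procedure.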
I can report the same issue. The problem did not occur in version 1.556, but after the upgrade to version 1.563 it occurs with both succeeding and failing jobs.
I'm trying to find out the cause of the different versions reported here. It would help if you reported all of the following in an unambiguous format:
- last known good Jenkins version
- first known broken Jenkins version
- OS of the master and whether this issue appears with jobs executed there
- OS of all relevant slaves (i.e. slaves the issue appears, or is known not to appear on), and how they're started (e.g. JNLP, SSH slaves)
- Whether anything is reported in the logs
- The simplest job that reproduces the issue. Do two shell build steps, one nohup sleep 10000 & and the other /bin/false, reproduce it? If so, on which nodes? More or fewer than your real issue?
We may be experiencing the same bug in our Jenkins setup. I have seen orphaned processes left behind by Jenkins before, but now it has become a plague.
- last known good Jenkins version: no idea
- first known broken Jenkins version: 1.554.1
- OS of the master and whether this issue appears with jobs executed there: Ubuntu 12.04.4 LTS, does not run jobs
- OS of all relevant slaves (i.e. slaves the issue appears, or is known not to appear on), and how they're started (e.g. JNLP, SSH slaves): Ubuntu 10.04, 12.04, 13.10; OS X 10.7 (SSH slaves)
- Whether anything is reported in the logs: could not find anything relevant
I have the feeling this only happens with aborted jobs, but I can't confirm it at the moment.
I was able to reproduce this issue locally and I am working on a fix.
I created a pull request: https://github.com/jenkinsci/jenkins/pull/1322 to resolve this issue.
This looks like a regression in the commons-io library. I logged IO-453 to track the change in commons-io.
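As we understand it, FileUtils.readFileToByteArray in commons-io 2.4 sizes its read from File.length(), while Linux /proc pseudo-files such as /proc/<pid>/environ report a length of 0 even though reading them returns data. The sketch below contrasts a read that trusts a reported length with one that reads to end-of-stream. Both helpers are hypothetical, not commons-io code, and a ByteArrayInputStream stands in for the /proc file so the example runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ProcRead {

    /** Reads exactly reportedLength bytes, like a reader that trusts File.length(). */
    static byte[] readUsingReportedLength(InputStream in, long reportedLength) {
        try {
            byte[] buf = new byte[(int) reportedLength];
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("stream ended after " + off + " bytes");
                }
                off += n;
            }
            return buf; // with reportedLength == 0 nothing is read at all
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads until end-of-stream, ignoring any reported length; safe for /proc pseudo-files. */
    static byte[] readToEof(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample NUL-separated environment, as /proc/<pid>/environ would return it.
        byte[] environ = "BUILD_ID=42\0HOME=/root\0".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUsingReportedLength(new ByteArrayInputStream(environ), 0).length); // 0
        System.out.println(readToEof(new ByteArrayInputStream(environ)).length);                  // 23
    }
}
```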
schristou: I merged your fix into a private build of 1.554.3 and verified this fixes the issue as described in my earlier comment from 09/May/14 11:42 PM. Unpatched 1.554.3 doesn't kill 'sleep' on the slave, patched does.
Code changed in jenkins
User: christ66
Path:
core/src/main/java/hudson/util/ProcessTree.java
http://jenkins-ci.org/commit/jenkins/410f06adfa798d29118c77ed01c5c02fc207cb02
Log:
[FIXED JENKINS-22641] FileUtils.readFileToByteArray behavior has changed in the latest version of commons-io.
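The fix touches ProcessTree.java, which on Linux reads per-process files under /proc. /proc/<pid>/environ holds a process's environment as NUL-separated KEY=VALUE entries. A minimal sketch of turning those bytes into a map follows; the class name and the UTF-8 decoding are assumptions, not Jenkins' code. An empty byte array, which is what a length-0 read yields, parses to an empty map, so no build cookie can match it.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the contents of /proc/<pid>/environ. */
class EnvironParser {

    /** Splits NUL-separated KEY=VALUE entries; entries without '=' (or with an empty key) are ignored. */
    static Map<String, String> parse(byte[] environ) {
        Map<String, String> env = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i <= environ.length; i++) {
            boolean endOfEntry = i == environ.length || environ[i] == 0;
            if (!endOfEntry) {
                continue;
            }
            if (i > start) {
                String entry = new String(environ, start, i - start, StandardCharsets.UTF_8);
                int eq = entry.indexOf('=');
                if (eq > 0) {
                    // Only the first '=' separates key from value; values may contain '='.
                    env.put(entry.substring(0, eq), entry.substring(eq + 1));
                }
            }
            start = i + 1;
        }
        return env;
    }

    public static void main(String[] args) {
        byte[] environ = "BUILD_ID=42\0HOME=/root\0".getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(environ));     // {BUILD_ID=42, HOME=/root}
        System.out.println(parse(new byte[0])); // {}
    }
}
```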
Code changed in jenkins
User: christ66
Path:
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/0d4c0cb6b274bfa18c810fcf761e3ad8b27ceb34
Log:
Add test unit for JENKINS-22641
Code changed in jenkins
User: christ66
Path:
core/src/main/java/hudson/util/ProcessTree.java
http://jenkins-ci.org/commit/jenkins/9ac68b92dcc0bc093c7983cdb0aab72342165df4
Log:
Merge branch 'JENKINS-22641' of github.com:christ66/jenkins into JENKINS-22641
Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/util/ProcessTree.java
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/f03e67a49bf01343635c60e894b9617483a03011
Log:
Merge branch 'JENKINS-22641' of github.com:christ66/jenkins
Code changed in jenkins
User: Jesse Glick
Path:
changelog.html
http://jenkins-ci.org/commit/jenkins/14a147eda8eebfd78c1cc53e1d59232d720de6b7
Log:
JENKINS-22641 Noting merge of #1322.
Compare: https://github.com/jenkinsci/jenkins/compare/a8756c6c0ddc...14a147eda8ee
Integrated in jenkins_main_trunk #3538
[FIXED JENKINS-22641] FileUtils.readFileToByteArray behavior has changed in the latest version of commons-io. (Revision 410f06adfa798d29118c77ed01c5c02fc207cb02)
Add test unit for JENKINS-22641 (Revision 0d4c0cb6b274bfa18c810fcf761e3ad8b27ceb34)
JENKINS-22641 Noting merge of #1322. (Revision 14a147eda8eebfd78c1cc53e1d59232d720de6b7)
Result = SUCCESS
schristou88 : 410f06adfa798d29118c77ed01c5c02fc207cb02
Files :
- core/src/main/java/hudson/util/ProcessTree.java
schristou88 : 0d4c0cb6b274bfa18c810fcf761e3ad8b27ceb34
Files :
- test/src/test/java/hudson/util/ProcessTreeKillerTest.java
Jesse Glick : 14a147eda8eebfd78c1cc53e1d59232d720de6b7
Files :
- changelog.html
Code changed in jenkins
User: Jesse Glick
Path:
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/0f3574bbc109935e37bbd473c2a9e7b7625e3ced
Log:
Merge branch 'JENKINS-22641' of github.com:christ66/jenkins
Compare: https://github.com/jenkinsci/jenkins/compare/37c5bc2fd861...0f3574bbc109
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
aggregator/src/test/java/org/jenkinsci/plugins/workflow/steps/durable_task/ShellStepTest.java
http://jenkins-ci.org/commit/workflow-plugin/e74fd8349c15568354fc3934010713503d22761f
Log:
added a test bug this is blocked by JENKINS-22641
Code changed in jenkins
User: christ66
Path:
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/86c44d8f5fa7d98246bda76d9fb22fd60a4f8530
Log:
Add test unit for JENKINS-22641
(cherry picked from commit 0d4c0cb6b274bfa18c810fcf761e3ad8b27ceb34)
This issue has returned as of version 1.587 (through the current version, 1.593).
It was fixed in 1.565.3 through 1.586.
Same behavior as the original bug: it leaves processes running when the job is killed.
dbogardus: then your issue is probably a distinct bug with a similar symptom but potentially distinct preconditions and root cause. Better to file it as a new ticket, with any steps to reproduce you can muster, and mark it as “blocking” this one.
Dilip M: That's probably JENKINS-26048. The Maven project type works differently from freestyle.
Code changed in jenkins
User: Jesse Glick
Path:
pom.xml
src/test/java/org/jenkinsci/plugins/durabletask/BourneShellScriptTest.java
http://jenkins-ci.org/commit/durable-task-plugin/fc48f447763cb69d6ebf7c67e54e476d9e903bbd
Log:
Added test for stop.
JENKINS-22641 means that this does not work in 1.554.3, so need to bump up the dependency to 1.565.3.
I can confirm this issue with 1.558 on Debian 7.4 and Ubuntu 12.04 with OpenJDK and Oracle JDK 6/7, on both master and slave.
I can also confirm that downgrading to 1.552 solved it (I didn't have time to bisect the intermediate versions).
Thanks for reporting the issue! I thought I was misunderstanding the ProcessTreeKiller feature.