[JENKINS-22641] Jenkins no longer kills running processes after job fails

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: core
    • Environment: Jenkins 1.553 or later on CentOS 6.3 with Oracle Java 7 JDK

      Starting with version 1.553, Jenkins no longer seems to kill running processes after a build failure.

      We have several jobs that start a Tomcat instance and run various end-to-end tests. If the build fails, Jenkins doesn't execute the shutdown scripts, so we rely on the process killer to clean up the Tomcat instance.

      This can be reproduced more easily by creating a freestyle job with two shell build steps: the first starts a simple background command such as "nohup sleep 10000 &", and the second runs "/bin/false". After the job exits, the sleep process is still running. Prior to version 1.553, it would have been killed.
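      The behavior this reproduction relies on can be shown without Jenkins at all: a backgrounded process outlives the shell step that started it, so unless something kills it afterwards, it keeps running. The Python sketch below is an illustration added here, not part of the original report. It emulates the two build steps with `subprocess`, assuming a POSIX system with `sh`, `nohup`, `sleep` and `/bin/false`; the helper names `run_build_steps` and `is_alive` are hypothetical.

```python
import os
import signal
import subprocess


def run_build_steps() -> int:
    """Emulate the two build steps and return the PID of the backgrounded sleep."""
    # Step 1: background a long-running process, like `nohup sleep 10000 &`.
    # Its output goes to /dev/null so it does not hold our capture pipe open;
    # `echo $!` reports its PID.
    step1 = subprocess.run(
        ["sh", "-c", "nohup sleep 10000 >/dev/null 2>&1 & echo $!"],
        capture_output=True, text=True, check=True,
    )
    leftover_pid = int(step1.stdout.strip())
    # Step 2: fail the "build", like `/bin/false`.
    step2 = subprocess.run(["/bin/false"])
    print("step 2 exit status:", step2.returncode)
    return leftover_pid


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


if __name__ == "__main__":
    pid = run_build_steps()
    # Both steps have exited, yet nothing has stopped the background sleep.
    print("leftover sleep still running:", is_alive(pid))
    # Clean up by hand, standing in for what the process killer should do.
    os.kill(pid, signal.SIGTERM)
```

      Running it reports a non-zero exit status for step 2 and shows the sleep as still running; the final `os.kill` stands in for the cleanup the reporter expected Jenkins to perform.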

      There are no log messages to indicate a problem.

      I can reproduce this on CentOS 6, Red Hat EL 5 and Red Hat EL 4, both with a job running on the local master and with one running on a slave node. I also tested with both 32-bit and 64-bit Oracle Java 7 JDKs.

      We're using the built-in Winstone container.

          Todd Perry created issue -

          Andrei Neculau added a comment -

          I can confirm this issue with 1.558 on Debian 7.4 and Ubuntu 12.04 with OpenJDK/Oracle JDK 6 and 7, on both master and slave.

          I can also confirm that downgrading to 1.552 solved it (I didn't have time to do a "bisect" on the intermediate versions).

          Thanks for reporting the issue! I thought I was misunderstanding the ProcessTreeKiller feature.
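          For context on the ProcessTreeKiller feature mentioned above: Jenkins identifies a build's leftover processes by environment variables it sets for the build, which is why overriding BUILD_ID (e.g. BUILD_ID=dontKillMe) is the commonly cited way to keep a daemon alive after a build ends. The Python sketch below imitates that matching idea on Linux by reading /proc/<pid>/environ. It is not Jenkins's actual code; the marker variable HYPOTHETICAL_BUILD_COOKIE and the function names are invented for illustration.

```python
import os
import signal
from typing import Dict, Iterator, List


def read_environ(pid: int) -> Dict[str, str]:
    """Parse /proc/<pid>/environ (NUL-separated KEY=VALUE entries) on Linux."""
    # /proc files report a size of 0, so read to EOF instead of trusting the size.
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skips the empty trailing entry
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_marked_processes(marker: Dict[str, str]) -> Iterator[int]:
    """Yield PIDs whose /proc environment contains every pair in `marker`."""
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            env = read_environ(int(name))
        except OSError:
            continue  # exited meanwhile, or owned by another user
        if all(env.get(k) == v for k, v in marker.items()):
            yield int(name)


def kill_marked(marker: Dict[str, str]) -> List[int]:
    """Send SIGTERM to every marked process except ourselves; return their PIDs."""
    killed: List[int] = []
    for pid in find_marked_processes(marker):
        if pid == os.getpid():
            continue
        try:
            os.kill(pid, signal.SIGTERM)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already gone
    return killed
```

          For example, `kill_marked({"HYPOTHETICAL_BUILD_COOKIE": "build-42"})` sends SIGTERM to every readable process started with that variable set to that value and returns their PIDs. Matching on an injected variable rather than on parent/child links is what lets such a killer find daemons that were re-parented after `nohup ... &`.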

          Todd Perry added a comment -

          An update: apparently it doesn't matter whether the job fails or not; a failing build step is not necessary to reproduce the issue. The problem is only critical for us with failed jobs, since a failed job doesn't run the final step that shuts down the running servlet container.

          Maciej Pasternacki added a comment (edited) -

          I can confirm: this is affecting me too, with both succeeding and failing jobs. One of the jobs has been relying on the process reaping mechanism to clean up (it starts a redis-server daemon before the build and relies on Jenkins killing it afterwards). That was working fine on Jenkins 1.554 and stopped working after the upgrade to 1.558; the issue still occurs on 1.559. Ubuntu 12.04, x86_64, Oracle JDK 7.

          Daniel Beck added a comment (edited) -

          I was able to reproduce the issue in 1.553, and 1.552 wasn't affected. So...

          $ git bisect bad
          4c32649db54a4a4f6162793179143fb12ae9e521 is the first bad commit
          commit 4c32649db54a4a4f6162793179143fb12ae9e521
          Author: christ66 <...>
          Date:   Thu Feb 20 21:50:14 2014 -0800
          
              Upgrade to commons-io 2.4
              
              In order to maintain backwards compatibility we need to keep IOUtils the
              same as in commons-io 1.4. This code is backwards compatible, however
              most of the methods have been deprecated and should instead use the
              org.apache.commons.io.IOUtils version instead.
          
          :040000 040000 d42c82833d3b7661a5fdd449e8b436817079b889 4c49b27df1a7adcbc2c58ea9d15535936916fdaa M	core
          :100644 100644 3c88f96220b54a8690cea0d272cd4447bfa98ffb cebccfcc1a20211b732fe96aaf90725ca0194fc8 M	pom.xml
          $ git bisect log
          git bisect start
          # bad: [1f28286ee683470a0b0a15b9425046292cc7d2a5] [maven-release-plugin] prepare release jenkins-1.553
          git bisect bad 1f28286ee683470a0b0a15b9425046292cc7d2a5
          # good: [0738314cb054f287cb1c66232672a57140987f2b] [maven-release-plugin] prepare release jenkins-1.552
          git bisect good 0738314cb054f287cb1c66232672a57140987f2b
          # good: [b09f828706c9a16b8acc4158f2161117f9359047] [JENKINS-21159] Noting #663 merge.
          git bisect good b09f828706c9a16b8acc4158f2161117f9359047
          # bad: [5d91ed77f7402133fd7c9e77b1a6676c6695ec5a] Merge branch 'JENKINS-20965' of github.com:christ66/jenkins
          git bisect bad 5d91ed77f7402133fd7c9e77b1a6676c6695ec5a
          # bad: [8482756860d8cb838e20f7007c241aa77f2524b7] typo in changelog
          git bisect bad 8482756860d8cb838e20f7007c241aa77f2524b7
          # good: [1b2ac716ba14945fd0654376d44a35feef62d531] Slave started from JNLP can now install itself as systemd service.
          git bisect good 1b2ac716ba14945fd0654376d44a35feef62d531
          # bad: [3b472e5b867a4664f2cd40633cb197d09f470a16] Merge pull request #1135 from christ66/commons-io-up
          git bisect bad 3b472e5b867a4664f2cd40633cb197d09f470a16

          How I reproduced it:

          Built and ran Jenkins on OS X master: mvn -DskipTests=true clean verify
          Added an SSH slave (Linux CentOS 6)
          Created a freestyle job with two shell build steps: one nohup sleep 10000 &, the other /bin/false.
          After every build, ran ps aux | grep sleep on the slave. If the process was there, killed it and ran git bisect bad on the master; otherwise, git bisect good. Repeated.
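          The manual check in the last step could be scripted. `git bisect run <cmd>` treats exit code 0 as good, 125 as skip, and any other code from 1 to 127 as bad, so a check that returns 1 when a leftover `sleep 10000` is found fits that convention. The Python sketch below covers only that check on a Linux node (it scans /proc/<pid>/cmdline and kills what it finds). Rebuilding Jenkins and triggering the job for each bisect step is left out, and the names `find_by_cmdline` and `check_and_clean` are hypothetical.

```python
import os
import signal
from typing import List, Sequence

# The command line the test job leaves behind (`nohup` execs `sleep 10000`).
TARGET_ARGV = ("sleep", "10000")


def find_by_cmdline(argv: Sequence[str]) -> List[int]:
    """Return PIDs whose /proc/<pid>/cmdline matches `argv` exactly (Linux)."""
    wanted = list(argv)
    pids: List[int] = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # cmdline is NUL-separated with a trailing NUL; it is empty for zombies.
        args = [a.decode(errors="replace") for a in raw.rstrip(b"\0").split(b"\0")]
        if args == wanted:
            pids.append(int(name))
    return pids


def check_and_clean(argv: Sequence[str] = TARGET_ARGV) -> int:
    """Kill leftovers and return a `git bisect run` exit code: 0 good, 1 bad."""
    leftovers = find_by_cmdline(argv)
    for pid in leftovers:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone
    return 1 if leftovers else 0
```

          A wrapper script handed to `git bisect run` would, after the build has run on the node, finish with `sys.exit(check_and_clean())`, which replaces the "kill it and mark bad, else mark good" decision above.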

          Daniel Beck made changes -
          Labels New: lts-candidate
          Steven Christou made changes -
          Assignee New: Steven Christou [ schristou ]

          Jan Řezníček added a comment -

          I can report the same issue. The problem did not occur in version 1.556, but after upgrading to version 1.563 it occurs (with both succeeding and failing jobs).
          Steven Christou made changes -
          Assignee Original: Steven Christou [ schristou ]

          Daniel Beck added a comment -

          I'm trying to find out the cause for the different versions reported here. It would help if you report all of the following in an unambiguous format:

          • last known good Jenkins version
          • first known broken Jenkins version
          • OS of the master and whether this issue appears with jobs executed there
          • OS of all relevant slaves (i.e. slaves the issue appears on, or is known not to appear on), and how they're started (e.g. JNLP, SSH slaves)
          • Whether anything is reported in the logs
          • The simplest job that reproduces the issue. Does a job with two shell build steps, one nohup sleep 10000 &, the other /bin/false, reproduce it? If so, on which nodes? Are more or fewer nodes affected than with your real issue?

            Assignee: Steven Christou (schristou)
            Reporter: Todd Perry (toadnik17)
            Votes: 12
            Watchers: 23