
Aborting builds does not kill surefire sub-process

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core, maven-plugin
    • Labels: None

      I have a test that (unfortunately) occasionally hangs waiting on an external dependency. I recently noticed that if the test is aborted, the surefire instance remains running on the slave machine!

      This does not happen when running "sleep 50000" in a command window (i.e. this is killed with the job).

          [JENKINS-28968] Aborting builds does not kill surefire sub-process

          Daniel Beck added a comment -

          Is this a freestyle project with Maven build step, or a Maven project?


          Ryan Desmond added a comment -

          I noticed it with a maven project.


          Radek Antoniuk added a comment -

          I remember having a very similar issue with Maven builds in Bamboo, so I'm not sure whether this is actually connected to Maven + Surefire itself.

          Ryan Desmond added a comment - - edited

          I worked out a baseline procedure to recreate the problem.

          Steps:
          1. Create a new Maven Project
          2. Add a shell pre-step to set up the workspace. It should have the contents of JENKINS-28968.txt
          3. Add the goal "test" to the build step
          4. Run the build and, after the console prints "Now sleeping", abort it.

          Expected Results:
          1. Surefire is no longer running on the slave machine

          Actual Results:
          1. Surefire remains running

          $ ps aux | grep sure
          [user] 4222 4.7 0.1 6102248 31612 ? Sl 10:12 0:00 java -jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefirebooter449566822541979931.jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire7684357083774633779tmp /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire_01856690741005869733tmp
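The manual `ps aux | grep sure` check above can be scripted when verifying the reproduction repeatedly. The following is a minimal sketch, assuming the standard 11-column `ps aux` layout; the helper name `find_surefire_pids` is hypothetical and not part of Jenkins or Surefire. It reports only the forked `java` process, not the `/bin/sh -c` wrapper.

```python
from typing import List


def find_surefire_pids(ps_output: str) -> List[int]:
    """Return PIDs of Surefire fork JVMs found in `ps aux` text (hypothetical helper)."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        if "surefirebooter" not in command:
            continue
        # Keep only lines whose executable is java; this skips the sh wrapper.
        if command.split()[0].endswith("java"):
            pids.append(int(fields[1]))
    return pids
```

Feeding it the output captured after aborting the build should yield an empty list once the bug is fixed.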


          Ryan Desmond added a comment - - edited

          I did notice that it killed the immediate subprocess.

          Just before aborting:

          $ ps aux | grep sure
          [user] 4220 0.0 0.0 113120 1188 ? S 10:12 0:00 /bin/sh -c cd /home/ussuser/jenkins/workspace/sleeptest && /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.51-2.4.5.5.el7.x86_64/jre/bin/java -jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefirebooter449566822541979931.jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire7684357083774633779tmp /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire_01856690741005869733tmp
          [user] 4222 13.0 0.1 6102248 31668 ? Sl 10:12 0:00 java -jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefirebooter449566822541979931.jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire7684357083774633779tmp /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire_01856690741005869733tmp

          After aborting:

          $ ps aux | grep sure
          [user] 4222 5.2 0.1 6102248 31612 ? Sl 10:12 0:00 java -jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefirebooter449566822541979931.jar /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire7684357083774633779tmp /home/[user]/jenkins/workspace/sleeptest/target/surefire/surefire_01856690741005869733tmp
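The before/after listings show the general pattern: the `/bin/sh -c` wrapper (PID 4220) is killed, while the `java` process it started (PID 4222) survives. The sketch below reproduces that pattern on a POSIX system with `sleep` standing in for the Surefire JVM, then cleans up by signalling the whole process group. It is an illustration under those assumptions, not how Jenkins implements its process cleanup; `is_alive` and `demo` are hypothetical names.

```python
import os
import signal
import subprocess


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def demo() -> bool:
    # start_new_session=True gives the shell its own session and process group,
    # so everything it spawns can later be signalled together.
    shell = subprocess.Popen(
        ["sh", "-c", "sleep 60 & echo $!; wait"],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,
    )
    grandchild_pid = int(shell.stdout.readline())  # PID printed by `echo $!`
    pgid = shell.pid  # the session leader's PID is also the process group ID

    shell.kill()  # kill only the immediate child, like the abort described above
    shell.wait()
    survived = is_alive(grandchild_pid)  # the grandchild keeps running

    try:
        os.killpg(pgid, signal.SIGKILL)  # signal every remaining group member
    except ProcessLookupError:
        pass
    shell.stdout.close()
    return survived
```

Calling `demo()` returns whether the grandchild outlived its parent shell. Killing by process group is only one possible approach, and it misses any process that moves itself into a different group.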


          Ryan Desmond added a comment -

          I think this is a duplicate of JENKINS-26048. Closing and moving the conversation there.


          Ryan Desmond added a comment -

          I think this is a duplicate, closing and moving the conversation there.


          Ing. Christoph Obexer added a comment -

          jglick Are pipelines also using this code to clean up runaway child processes? I just had a screen session escape with a VM from a pipeline job.

          Oleg Nenashev added a comment -

          Reopening this ticket according to the discussion in JENKINS-26048. We need to confirm whether this is a generic issue and not something specific to Maven projects.

          Don Bogardus added a comment - 11 hours ago - edited
          Verified this is still happening in latest jenkins version on CentOS 6. 
          For the test I'm running java tests with the failsafe plugin, which launches multiple jvms for concurrent execution and many phantomjs processes. 
          1.586 - When build is stopped, all child processes are stopped
          1.587 - When build is stopped, all child processes keep running
          2.66.1 - When build is stopped, all child processes keep running
          
          
          Daniel Beck added a comment - 11 hours ago
          Don Bogardus Are you able to reproduce this issue with a simpler environment? What are the complete steps to reproduce from scratch?
          When you write 2.66.1, do you mean 2.46.1?
          
          
          Don Bogardus added a comment - 10 hours ago - edited
          Daniel Beck , I used the latest weekly version - 2.66.1 . (It's just what came down from yum)
          I am able to reproduce it with the simple java/mvn/surefire scenario created by Ryan Desmond :
          https://issues.jenkins-ci.org/browse/JENKINS-28968?focusedCommentId=230600&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-230600
          With this small example, 1.586 cleans up the surefire process, and 2.66.1 does not. 

           

           


          Oleg Nenashev added a comment -

          CC aheritier just in case he knows the cause


          Arnaud Héritier added a comment -

          I think tibor17 worked on such issues in surefire recently.

          On Windows we have this bug, https://issues.apache.org/jira/browse/SUREFIRE-1261, but I think I saw several issues related to surefirebooter too.

          On the Jenkins side, I never managed to reproduce or prove it, but I'm fairly sure this happens when there is a slave disconnection (especially with old remoting).

          Don Bogardus added a comment - - edited

          Just to make sure this is part of the discussion. The problem has come and gone in previous jenkins versions. I installed and tested many versions of jenkins - 

          First noticed in 1.553 

          Fixed in version 1.565.2 (Bug - https://issues.jenkins-ci.org/browse/JENKINS-22641)

          Came back in 1.587, and is still present in weekly 2.66.1 

          (Tested on both master and slaves, with the same results.)

          All of these tests were for the maven/surefire scenario. I tried a few things to reproduce the bug outside of this scenario and was not able to. 

           


          jchatham added a comment -

          This issue is probably related to JENKINS-28125, and I have encountered similar problems with non-surefire sub-processes not dying when a build is stopped.


          boris ivan added a comment -

          Hoping this can get some attention. We have an extensive test suite that does some destructive things regarding the initialization of a large test topology. If I realize that I need to abort the run, it would be great if it really aborted. Instead, Jenkins reports it as aborted, but the entire suite is still running in the background, and I need to scramble to remember the IP address and credentials for the slave agent, so I can remote desktop to it, launch task manager, and kill all java processes.

          This bug has existed for years.


          Oleg Nenashev added a comment -

          borisivan Which Java do you use on the agent? If you use 32-bit Java on a 64-bit platform, this is expected behavior. There are other cases, such as Cygwin binaries, where it is also expected.


          boris ivan added a comment -

          It's 64 bit Windows (have seen this on Windows 7, Windows 10), and 64 bit Java being executed in a 64 bit powershell window, to load slave.jar from the command line.

          As far as the maven job goes, I think that starts with a 64 bit version of java too, but will try and make sure. But as far as loading the jenkins slave agent goes, it's definitely being loaded via 64 bit java.


          Martin Gerdes added a comment -

          We have this problem too.

          For some as yet unresolved reason we have surefire and mavenInstallation processes which never finish (developers are still trying to determine the cause).

          Because of that, this bug is pretty terrible for us: when developers stop a job in Jenkins, the surefire and mavenInstallation processes remain, consuming memory and CPU until the server becomes unresponsive or OOM events occur.

          Environment:

          Jenkins ver. 2.73.1 running in a docker instance (jenkins/jenkins:lts, which uses Debian Version 9.1)
          Used Java:
            surefire: Java SE Development Kit 7u80 (installed from within Jenkins)
            mavenInstallation: openjdk-8-jdk:amd64           8u141-b15-1~deb9u1 (system wide java installation in the docker container)


          Oleg Nenashev added a comment -

          OK, so it happens in Linux as well. Interesting...


          Martin Gerdes added a comment -

          It also definitely is not a case of mixing 32 and 64bit java versions:

          /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -version
          openjdk version "1.8.0_141"
          OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-1~deb9u1-b15)
          OpenJDK 64-Bit Server VM (build 25.141-b15, mixed mode)

          /var/jenkins_home/tools/hudson.model.JDK/JDK-7/bin/java -version
          java version "1.7.0_80"
          Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
          Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

          Could it be because we are mixing java 7 and java 8 here?

          Any other ideas of what to change to avoid being affected by this bug?
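The 32-bit versus 64-bit check in the comment above can be automated across agents. This is a minimal sketch, assuming a HotSpot-based JVM whose `java -version` banner names a "64-Bit" VM as in the output quoted above; both function names are hypothetical.

```python
import subprocess


def is_64bit_jvm(version_output: str) -> bool:
    """Return True if the `java -version` banner reports a 64-Bit VM."""
    return "64-Bit" in version_output


def java_version_output(java_path: str) -> str:
    """Run `<java_path> -version` and return its banner text."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr
```

For example, `is_64bit_jvm(java_version_output("/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java"))` would check the system-wide installation mentioned above, provided that path exists on the machine.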


          Nirmit Srivastava added a comment - - edited

          Is there any solution to the above problem? We are facing a similar issue where the surefire booter process keeps running on a Linux slave machine.

          Jenkins version being used : Jenkins ver. 2.136


            Assignee: Unassigned
            Reporter: Ryan Desmond (rddesmond)
            Votes: 6
            Watchers: 16