• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • None
    • any
    • 2.141

      Using freestyle projects to execute bash shell scripts works fine. But cancelling a Jenkins job seems to use SIGKILL, so the script cannot perform cleanup operations or free resources.

      SIGKILL cannot be handled by the shell.

      SIGINT/SIGTERM are not used by Jenkins.

      Preferred: SIGINT -> wait 5 seconds -> SIGKILL
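
A minimal sketch of the escalation requested above, written as a bash function. The name graceful_kill, its arguments, and the one-second polling interval are assumptions made for illustration; this is not something Jenkins provides.

```shell
#!/bin/bash
# Hypothetical helper: ask a process to stop, escalate to SIGKILL if it does not.
# Usage: graceful_kill PID [GRACE_SECONDS] [SIGNAL]
# Note: bash makes background jobs of a non-interactive script ignore SIGINT,
# so pass TERM as the third argument when trying this against such a child.
graceful_kill() {
    local pid=$1 grace=${2:-5} sig=${3:-INT} waited=0
    kill -s "$sig" "$pid" 2>/dev/null || return 0    # process is already gone
    # Poll once per second until the process exits or the grace period ends.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -s KILL "$pid" 2>/dev/null
            return 1                                 # had to escalate
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0                                         # exited within the grace period
}
```

The return value distinguishes a clean stop (0) from a forced kill (1), so a caller could log which path was taken.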

          [JENKINS-17116] graceful job termination

          Markus Breuer created issue -

          Martin d'Anjou added a comment - edited

          I created this freestyle job, but the traps are never invoked when hitting [x] to "stop" the job.

          #!/bin/bash
          echo "Starting $0"
          echo "Listing traps"
          trap -p
          echo "Setting trap"
          trap 'echo SIGTERM; kill $pid; exit 15;' SIGTERM
          trap 'echo SIGINT; kill $pid; exit 2;' SIGINT
          echo "Listing traps again"
          trap -p
          echo "Sleeping"
          sleep 10 & pid=$!
          echo "Waiting"
          wait $pid
          echo "Exit status: $?"
          echo "Ending"
          

          It looks as if Jenkins is using kill -9, but it cannot be, since the rest of the script is still executed (exit status 143 is 128 + 15, which means the sleep was terminated by SIGTERM):

          Listing traps
          Setting trap
          Listing traps again
          trap -- 'echo SIGINT; kill $pid; exit 2;' SIGINT
          trap -- 'echo SIGTERM; kill $pid; exit 15;' SIGTERM
          Sleeping
          Waiting
          Build was aborted
          Aborted by d'Anjou, Martin
          Build step 'Groovy Postbuild' marked build as failure
          Recording test results
          Exit status: 143
          Ending
          

          Is it possible that Jenkins disables the traps?
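
To read an exit status such as the 143 in the log above, a small decoder can help. This is only a sketch: describe_status is a hypothetical name, and it relies on the shell convention that a status above 128 means "128 plus the signal number".

```shell
#!/bin/bash
# Hypothetical helper: turn a wait/exit status into a readable description.
describe_status() {
    local status=$1
    if [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
        # kill -l N prints the name of signal number N.
        echo "killed by SIG$(kill -l "$((status - 128))")"
    else
        echo "exited with status $status"
    fi
}

describe_status 143   # prints "killed by SIGTERM"
```

In the log above the status comes from wait, so it describes how the background sleep ended, not whether the script's own traps ran.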


          Martin d'Anjou added a comment -

          Making this a major issue because there is no way a freestyle job can clean up after itself.
          Martin d'Anjou made changes -
          Priority Original: Minor [ 4 ] New: Major [ 3 ]

          torbent added a comment -

          I am struggling with this as well! There is documentation which states that Jenkins uses SIGTERM to kill processes, but I too am having a hard time trapping it. One of the problems I have is that even if my script might trap the TERM, Jenkins appears to not wait for termination of the process(es) it has started. It's a bit difficult, then, to know whether the traps work or not when I cannot see the output.

          You should be aware that the bash build scripts are usually invoked with -e, which may "break" your error handling. Jenkins will list all of the processes you have started, including the sleep, and send a TERM to all of them. Your sleep then fails (before you can kill it), causing the rest of the script to fail. It looks like you may have worked around that to get the "Ending" text out, but it caught me and may confuse others trying to reproduce the problem.

          The "list all of the processes" part involves an environment variable called BUILD_ID. See https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller

          By using a set +e (and maybe BUILD_ID=ignore – so many experiments lately) I have managed to make my script ignore TERM, which can consistently lead to an orphaned bash. Jenkins is certain the build is aborted, but the script keeps running. I can kill the script (behind Jenkins) with -9, however.
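
The interaction between -e and a terminated child described above can be reproduced without Jenkins. The sketch below is an assumption-laden stand-in: the kill line plays the role of the TERM that Jenkins sends to every process of the build, and the variable names are made up.

```shell
#!/bin/bash
set -e                       # same errexit behaviour as a script launched with -e

sleep 30 &
pid=$!
kill -TERM "$pid"            # stand-in for the TERM sent by the process tree killer

status=0
wait "$pid" || status=$?     # the || keeps errexit from aborting the script here
echo "child status: $status" # prints "child status: 143"
```

With a bare `wait "$pid"` instead, errexit would end the script at that line and nothing after it would run, which matches the confusion described in this comment.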


          Martin d'Anjou added a comment -

          When the shell script starts with the shebang:

          #!/bin/bash
          set -o
          echo $-

          I get:

          allexport             off
          braceexpand           on
          emacs                 off
          errexit               off
          errtrace              off
          functrace             off
          hashall               on
          histexpand            off
          history               off
          ignoreeof             off
          interactive-comments  on
          keyword               off
          monitor               off
          noclobber             off
          noexec                off
          noglob                off
          nolog                 off
          notify                off
          nounset               off
          onecmd                off
          physical              off
          pipefail              off
          posix                 off
          privileged            off
          verbose               off
          vi                    off
          xtrace                off
          hB

          When the shell script does not start with the shebang:

          set -o
          echo $-

          I get:

          + set -o
          allexport             off
          braceexpand           on
          emacs                 off
          errexit               on
          errtrace              off
          functrace             off
          hashall               on
          histexpand            off
          history               off
          ignoreeof             off
          interactive-comments  on
          keyword               off
          monitor               off
          noclobber             off
          noexec                off
          noglob                off
          nolog                 off
          notify                off
          nounset               off
          onecmd                off
          physical              off
          pipefail              off
          posix                 on
          privileged            off
          verbose               off
          vi                    off
          xtrace                on
          + echo ehxB
          ehxB

          Conclusion: Jenkins forces -ex when there is no shebang (#!/bin/bash) line, so you can control at least that part.
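
The two `echo $-` results above (hB and ehxB) can be checked mechanically. A small sketch, where describe_flags is a hypothetical helper name and the flag string is passed in explicitly so the result does not depend on how the function itself is invoked:

```shell
#!/bin/bash
# Hypothetical helper: report errexit (-e) and xtrace (-x) from a "$-" style flag string.
describe_flags() {
    local flags=$1 errexit=off xtrace=off
    case $flags in *e*) errexit=on ;; esac
    case $flags in *x*) xtrace=on ;; esac
    echo "errexit=$errexit xtrace=$xtrace"
}

describe_flags "$-"      # flags of the shell running this script
describe_flags ehxB      # prints "errexit=on xtrace=on"
```

A build step that must behave the same with or without a shebang line could also set the options it wants explicitly at the top (for example `set +e +x`) instead of relying on how it was launched.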

          Martin d'Anjou added a comment -

          First point: Changing the value of the BUILD_ID variable to bypass the tree killer is a bad idea: it changes the meaning of BUILD_ID. It would have been better to use a different variable name to express the "don't kill me" idea (hint: if the user sets DONTKILLME=true, then don't kill it).

          Second point: Changing BUILD_ID has no effect on the example script shown in the first comment: it seems Jenkins disables the traps. I tried setting BUILD_ID in a job parameter and in the environment injection plugin to no avail.

          Here are 2 scenarios explaining why Jenkins must not intercept the signals and must let the freestyle jobs handle their own termination:
          1) the freestyle job needs a way to remove temporary files it might have created
          2) the freestyle job needs a way to kill remote processes it might have created

          I feel scenario 2 needs an explanation: Say the freestyle job spawned a process on a remote host, and disconnected from that remote host. There is no way for the process tree killer to find the connection between the freestyle job bash script and the remote process; only the freestyle job script can kill the remote job. This is why signals must be propagated and not intercepted.
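
Scenario 1 above is the classic use of an EXIT trap. The sketch below assumes the signal actually reaches the script and that the script is given time to finish, which is exactly what this issue asks for. run_job is a hypothetical wrapper, and the remote cleanup for scenario 2 is shown only as a comment with made-up names.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command with a scratch directory that is always removed.
# The function body is a subshell, so the traps do not leak into the caller.
run_job() (
    workdir=$(mktemp -d)
    trap 'rm -rf "$workdir"' EXIT     # cleanup on any exit path
    trap 'exit 143' TERM              # turn TERM into an exit so the EXIT trap fires
    trap 'exit 130' INT
    # Scenario 2 would add its own step to the EXIT trap, for example a
    # hypothetical: ssh "$remote_host" "kill $remote_pid"
    echo "$workdir"                   # printed so the caller can see which directory was used
    "$@"                              # the job body
)

run_job true
```

Bash runs the TERM trap only after the current foreground command returns, so a long-running job body is better started in the background and waited on, as in the first comment's script.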
          Martin d'Anjou made changes -
          Link New: This issue is related to JENKINS-3105 [ JENKINS-3105 ]

          Martin d'Anjou added a comment -

          After experimenting some more, it seems Jenkins cuts the ties to the child process too soon after sending the TERM signal. Sometimes, when the job runs on the master, I do see the message from the SIGTERM trap, but most of the time I don't. This makes it hard to tell what really happens. It looks like Jenkins simply needs to wait for the job process to close its stdout/stderr before it stops listening to the job.

          On IRC (May 8, 2013), there was a discussion on changing SIGTERM to SIGTERM -> wait 10 sec -> SIGKILL, but I would prefer this delay to be configurable or even optional, as the cleanup done by a properly behaving job could take more than 10 seconds (it takes a few minutes in my case because of a very large number of small files to clean up on NFS).

          Here are loosely related but different requests:
          JENKINS-11995
          JENKINS-11996

          Owen Mehegan added a comment -

          This may explain a problem I've been seeing. When a user cancels a build while a Ruby 'bundle install' operation is happening, the job exits but the bundle process goes into a zombie-ish state (not literally a zombie process but it never exits), no longer a child of the Jenkins process. I have to kill it manually, and sometimes it freaks out and consumes a lot of resources on the box as well. I'm not sure if we need a bigger/different hammer here, or what.


            Assignee: Unassigned
            Reporter: Markus Breuer (markusb)
            Votes: 38
            Watchers: 52
