• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • None
    • any
    • 2.141

      Using freestyle projects to execute bash shell scripts works fine, but cancelling a Jenkins job seems to use SIGKILL. As a result, the script cannot perform cleanup operations or free resources.

      SIGKILL cannot be handled by the shell.

      SIGINT/SIGTERM are not used by Jenkins.

      Preferred: SIGINT -> wait 5 seconds -> SIGKILL
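
      The preferred sequence above can be sketched in plain shell. This is only an illustration of the escalation idea, not how Jenkins implements it; `graceful_stop` and its arguments are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch only: graceful_stop is a hypothetical helper, not a Jenkins API.
# Usage: graceful_stop PID [SIGNAL] [GRACE_SECONDS]
graceful_stop() {
	pid=$1
	sig=${2:-INT}
	grace=${3:-5}

	# Ask the process to stop; if it is already gone there is nothing to do.
	kill -s "$sig" "$pid" 2>/dev/null || return 0

	# Poll once per second. kill -0 only tests whether the PID still exists;
	# it also succeeds for an exited child that has not been reaped yet.
	while [ "$grace" -gt 0 ]; do
		kill -0 "$pid" 2>/dev/null || return 0
		sleep 1
		grace=$((grace - 1))
	done

	# Grace period is over: SIGKILL cannot be trapped or ignored.
	kill -s KILL "$pid" 2>/dev/null
	return 0
}
```

      With the defaults, `graceful_stop "$pid"` sends SIGINT and allows 5 seconds; `graceful_stop "$pid" TERM 5` would use SIGTERM instead.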

          [JENKINS-17116] graceful job termination

          Oleg Nenashev added a comment -

          deepchip Leaking of processes is unrelated to this fix.

          Usual causes:

          • You use 32-bit Java on a 64-bit machine
          • You use tool wrappers like Cygwin which mess up the process tree in Windows (see the Cygwin Process Killer plugin)
          • The processes are spawned without inheriting Build reference variables, so the library cannot pick them up if the parent processes are already aborted, and the process is orphaned

          I suggest creating a separate issue if none of the above applies to your case.

           

           

           


          Victor Magana added a comment -

          Hello, I'm seeing an error in the hudson.util.ProcessTree logger: "External Ctrl+C execution failed for process pid=3872. Ctrl+C process exited with code -1073741515: Failed to attach to the console". Is there any option/parameter that needs to be set for this to attach and send the Ctrl+C signal? I'm running Jenkins server version 2.150 on Windows 7 x64, with a Windows batch job on the local master that executes a Python script. I also ran it as an Execute Python Script job and got the same error. Thanks for any help.

           

          Failed to send CTRL+C to pid=3872
          org.jvnet.winp.WinpException: External Ctrl+C execution failed for process pid=3872. Ctrl+C process exited with code -1073741515: Failed to attach to the console (see the AttachConsole WinAPI call). error=0 at winp.cpp:59

          at org.jvnet.winp.Native.sendCtrlC(Native Method)
          at org.jvnet.winp.Native.sendCtrlC(Native.java:90)
          at org.jvnet.winp.WinProcess.sendCtrlC(WinProcess.java:93)
          at hudson.util.ProcessTree$WindowsOSProcess.killSoftly(ProcessTree.java:538)
          at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:517)
          at hudson.util.ProcessTree.killAll(ProcessTree.java:168)
          at hudson.Proc$LocalProc.destroy(Proc.java:384)
          at hudson.Proc$LocalProc.join(Proc.java:357)
          at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
          at hudson.model.Build$BuildExecution.build(Build.java:206)
          at hudson.model.Build$BuildExecution.doRun(Build.java:163)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
          at hudson.model.Run.execute(Run.java:1810)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          at hudson.model.ResourceController.execute(ResourceController.java:97)
          at hudson.model.Executor.run(Executor.java:429)


          Martin d'Anjou added a comment - - edited

          The TERM signal is trapped by the freestyle script when the job runs on the Jenkins master, but when it runs on a node, the signal is not received (or not sent?).


          Oliver Smith added a comment -

          I have created the following demo script:

          #!/bin/sh -ex
          
          trap cleanup "TERM"
          set +x
          
          cleanup() {
          	echo "Caught signal, cleaning up..."
          	exit 1
          }
          
          echo "Sleeping..."
          
          while true; do
          	sleep 0.1
          done
          
          # should not get here due to while true
          echo "EOF"
          

          When run in a terminal without Jenkins, it catches the signal as expected (e.g. with "pkill -TERM trapscript.sh"):

          $ ./trapscript.sh
          + trap cleanup TERM
          + set +x
          Sleeping...
          Caught signal, cleaning up...
          

          On Jenkins 2.150.2, it does not run the cleanup function:

          [TEST_trap_in_jenkins_job] $ /bin/sh -ex /tmp/jenkins5365212366501463498.sh
          + trap cleanup TERM
          + set +x
          Sleeping...
          Build was aborted
          Aborted by Oliver Smith
          Terminated
          Finished: ABORTED
          

          The server is configured to run all jobs on nodes, so this might be the same problem that deepchip pointed out above:
          when it runs on a node, the signal is not received (or not sent?).

          It would be great if somebody could look into this, thanks!


          Martin d'Anjou added a comment -

          pvtuan10, do you know how we could debug the communication between master and agents? It seems like the Unix kill signal is not sent or received by the agent.

          Owen Mehegan added a comment -

          deepchip possibly a question for jthompson.


          Uwe Teichmann added a comment -

          I'm using Jenkins 2.198 and JRE jre-1.8.0-openjdk. I created a freestyle job based on the script by Oliver Smith. The build step is defined as:

          sudo su - oracle<<eof
          ./shell_signal_handling
          eof
          

          If I cancel the job, the process and its child process get killed correctly, but the console output is:

          Started by user Uwe Teichmann
          Running as SYSTEM
          Building on the agent "elendil" (oracle) in workspace /vorlons/jenkins/elendil/workspace/Training/FreeStyle/Shell_Signal_Handling
          [Shell_Signal_Handling] $ /bin/bash -xe /tmp/jenkins10950746844547967776.sh
          + sudo su - oracle
          Sleeping...
          Build was aborted
          Aborted by Uwe Teichmann
          Finished: ABORTED
          

          If I change the build step to

          sudo su - oracle<<eof
          ./shell_signal_handling 1>./shell_signal_handling.log 2>&1
          eof
          

          the log file contains

          Sleeping...
          Caught signal, cleaning up...
          

          My conclusion: the behaviour is correct, because:

          • The SIGTERM causes the parent process and its children to stop and exit correctly.
          • The output of the shell script appears in the job's console output only as long as the connection is active. Once killed, the child processes can no longer send their output to the parent. To see their output, we need the log file.
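
          The second point can be illustrated with a variant of Oliver Smith's demo script in which the handler appends to a file instead of relying on the console stream. This is a rough sketch under that assumption; `trap_demo` and its log-file argument are hypothetical names, not part of any script in this thread.

```shell
#!/bin/sh
# Sketch only: trap_demo and its log-file argument are illustrative names.
trap_demo() {
	log=$1

	cleanup() {
		# Append to a file so the message survives even if nothing is
		# reading the script's stdout any more.
		echo "Caught signal, cleaning up..." >>"$log"
		exit 1
	}
	trap cleanup TERM

	echo "Sleeping..." >>"$log"
	while true; do
		sleep 1
	done
}
```

          Calling `trap_demo ./shell_signal_handling.log` from the build step would leave both messages in the file after the job is cancelled, provided the TERM signal actually reaches the script.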


          Martin d'Anjou added a comment -

          finrodkosh, are you running the job on the master or on a remote node? The problem manifests itself when the job runs on a remote node.

          Uwe Teichmann added a comment -

          The test was done on a local machine, where Jenkins Master and Slave communicate via JNLP.


          Uwe Teichmann added a comment -

          At work we use Jenkins 2.138.1. The problem can be reproduced for both freestyle jobs and pipelines. The test case is run between two different servers using SLES 12.4 and OpenJDK 1.8.0_191-b12.


            Assignee: Unassigned
            Reporter: Markus Breuer (markusb)
            Votes: 38
            Watchers: 52