- Bug
- Resolution: Unresolved
- Critical
- None
- Jenkins 2.303.1
Setting the BUILD_ID environment variable of a process to dontKillMe does not prevent that process from being killed by ProcessTreeKiller, contrary to what is described in the link.
Consider a freestyle project with the following build script:

    #!/bin/bash
    set -x
    exec 2>/tmp/abort-test.debug
    exec 1>&2
    trap 'echo "Got INTR"; exit 1' SIGINT
    trap 'echo "Got TERM"; exit 1' SIGTERM
    trap 'echo "Got HUP"; exit 1' SIGHUP
    BUILD_ID=dontKillMe /usr/bin/sleep 300 &
    wait
    sleep 5

In this script, setting BUILD_ID=dontKillMe does not make /usr/bin/sleep immune to the ProcessTreeKiller when the job is aborted.
Here's my proof...
Start the job running and then observe which processes are part of the job by finding which processes have the /tmp/abort-test.debug file open:
    # fuser /tmp/abort-test.debug
    /tmp/abort-test.debug: 37173 37177
    # ps -p37173,37177 fw
      PID TTY      STAT   TIME COMMAND
    37173 ?        S      0:00 /bin/bash /tmp/jenkins759647078472167760.sh
    37177 ?        S      0:00  \_ /usr/bin/sleep 300
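Before going further, it may be worth confirming that the sleep process really carries the overridden variable. The following is a minimal sketch, assuming a Linux agent with /proc mounted and a caller that is allowed to read the target's environ file (same user or root). The helper name proc_env_var is hypothetical and not part of Jenkins.

```shell
#!/bin/bash
# Hypothetical helper: print the value of environment variable NAME
# as recorded in /proc/PID/environ (Linux only).
proc_env_var() {
    local pid=$1 name=$2
    # The environ file is NUL-separated; turn it into lines and
    # print the value of the first entry that matches NAME=.
    tr '\0' '\n' < "/proc/${pid}/environ" | sed -n "s/^${name}=//p" | head -n 1
}
```

With the PIDs from the listing above, `proc_env_var 37177 BUILD_ID` should print the value the sleep process was started with, and `proc_env_var 37173 BUILD_ID` the value Jenkins gave the job script.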
Now find the parent of the script to find the Jenkins executor process:
    # ps -ef | grep 37173
    lcl_bui+ 37173 12777  0 19:18 ?        00:00:00 /bin/bash /tmp/jenkins759647078472167760.sh
    lcl_bui+ 37177 37173  0 19:18 ?        00:00:00 /usr/bin/sleep 300
    # ps -p 12777
      PID TTY          TIME CMD
    12777 ?        08:57:24 java
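The parent lookup above can also be scripted. This is a rough sketch, assuming a ps that supports the POSIX `-o ppid=` and `-p` options. The function name ancestors is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print PID, then its parent, grandparent, and so on,
# one per line, stopping when the parent PID is 0 or the lookup fails.
ancestors() {
    local pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        echo "$pid"
        # "ppid=" suppresses the header; strip the padding spaces.
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}
```

Running `ancestors 37177` while the job is active should list the sleep process, the job script, and then the java executor process among its ancestors.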
Now attach strace to the executor:
strace -o /tmp/java.strace -f -p 12777 -e trace=\!futex,sched_yield
and then kill the job from Jenkins. Wait a few seconds and then observe what's in the file strace wrote and see that it did indeed kill the /usr/bin/sleep process:
    ...
    37152 kill(37177, SIGTERM) = 0
    37152 stat("/proc/37177/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37177/status", 0x7f3af3bc8090) = -1 ENOENT (No such file or directory)
    37152 prctl(PR_SET_NAME, "pool-1-thread-4"...) = 0
    37152 write(24, "\v+", 2) = 2
    37152 write(24, "\254\355\0\5sr\0\33hudson.remoting.UserRequ"..., 2859) = 2859
    12949 <... read resumed>"\7\363", 8192) = 2
    12949 read(0, "\254\355\0\5sr\0\30hudson.remoting.Response"..., 8192) = 2035
    12949 read(0, <unfinished ...>
    37152 prctl(PR_SET_NAME, "pool-1-thread-4"...) = 0
Followed by the main job script:
    37152 kill(37173, SIGTERM) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    12838 mprotect(0x7f3d4a8f7000, 4096, PROT_READ) = 0
    12838 mprotect(0x7f3d4a8f7000, 4096, PROT_READ|PROT_WRITE) = 0
    12838 mprotect(0x7f3d4a8f8000, 4096, PROT_NONE) = 0
    12838 mprotect(0x7f3d4a8f8000, 4096, PROT_READ) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    12949 <... read resumed>"\6\352", 8192) = 2
    12949 read(0, "\254\355\0\5sr\0\33hudson.remoting.UserRequ"..., 8192) = 1770
    12949 read(0, <unfinished ...>
    37234 prctl(PR_SET_NAME, "pool-1-thread-4"...) = 0
    37234 write(24, "\6+", 2) = 2
    37234 write(24, "\254\355\0\5sr\0\30hudson.remoting.Response"..., 1579) = 1579
    37234 prctl(PR_SET_NAME, "pool-1-thread-4"...) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 stat("/proc/37173/status", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
    37152 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=37173, si_uid=1101, si_status=1, si_utime=0, si_stime=0} ---
    37152 restart_syscall(<... resuming interrupted stat ...> <unfinished ...>
    37174 <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 37173
    37152 <... restart_syscall resumed>) = -1 ETIMEDOUT (Connection timed out)
We can confirm this by looking at the xtrace output from the job:
    # cat /tmp/abort-test.debug
    + exec
    + trap 'echo "Got INTR"; exit 1' SIGINT
    + trap 'echo "Got TERM"; exit 1' SIGTERM
    + trap 'echo "Got HUP"; exit 1' SIGHUP
    + wait
    + BUILD_ID=dontKillMe
    + /usr/bin/sleep 300
    + sleep 5
    ++ echo 'Got TERM'
    Got TERM
    ++ exit 1
The important point here is that the /usr/bin/sleep process was killed by the executor even though it had its BUILD_ID set to dontKillMe.
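For anyone reproducing this, the kill() calls can be pulled out of a large strace file instead of reading it by eye. The sketch below assumes the `strace -f -o` line format shown above (thread ID, then the syscall). The function name kills_in_strace is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list every kill() call recorded in a strace -f
# output file as "calling-thread target-pid signal", one per line.
kills_in_strace() {
    local file=$1
    # Match lines such as: 37152 kill(37177, SIGTERM) = 0
    # A leading "-" on the target (process-group kill) is kept.
    sed -n 's/^\([0-9][0-9]*\) kill(\(-\{0,1\}[0-9][0-9]*\), \([A-Z0-9_]*\)).*/\1 \2 \3/p' "$file"
}
```

Run as `kills_in_strace /tmp/java.strace`. If the target list includes the PID of the process started with BUILD_ID=dontKillMe, the executor signalled it despite the override.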