• Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • None
    • Environment: Fedora 26, Jenkins ver. 2.60.2, java-1.8.0-openjdk

      The simple pipeline below sporadically hangs after completing the last "sh" step. The command has already completed and can't be seen in the process list, but the pipeline is still in the running state and won't finish.

      There are two nodes, "builder" and "runner", which (for testing) were both set up to run on localhost (via ssh). Jenkins did ~15 builds of the pipeline below before running into this problem; there are no other jobs/builds.

      I'll try to keep the system in this failed state for a couple of days, in case anyone has tips on what further data would be useful to gather:
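
      For anyone hitting the same state, a hedged sketch of commands that are usually worth running on the affected node while the step appears hung (the workspace path is a placeholder, not this job's actual path):

      # Hypothetical data-gathering commands, run as the agent user on the node.
      # Replace <workspace> with the job's workspace directory.
      ps auxwwf | grep -B1 -A2 "[d]urable"     # surviving durable-task wrapper processes, with parents/children
      ls -la <workspace>@tmp/durable-*/        # wrapper control files (script.sh, jenkins-log.txt, jenkins-result.txt)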

      Jenkinsfile:

      node('builder') {
          stage('Build/Fetch') {
              git ...
              sh '''curl -O http://file/skt/sktrc
                    curl -O http://file/skt/default.config'''
          }
          stage('Build') {
              sh '''./skt.py -vv --state --rc sktrc merge
                    ./skt.py -vv --state --rc sktrc build
                    ./skt.py -vv --state --rc sktrc publish'''
              sktrc = readFile 'sktrc'
              sh '''./skt.py -vv --state --rc sktrc cleanup'''
          }
      }

      node('runner') {
          stage('Test/Fetch') {
              git 'http://git/skt.git'
              sh '''curl -O http://filejob.xml'''
              writeFile file: 'sktrc', text: "${sktrc}"
          }
          stage('Test') {
              sh '''PATH="/home/worker/bin:$PATH" ./skt.py -vv --state --rc sktrc run --wait
                    ./skt.py -vv --state --rc sktrc cleanup'''
          }
      }
      

      Pipeline threadDump:

      Thread #12
      	at WorkflowScript.run(WorkflowScript:36)
      	at DSL.stage(Native Method)
      	at WorkflowScript.run(WorkflowScript:35)
      	at DSL.node(running on runner_localhost)
      	at WorkflowScript.run(WorkflowScript:29)
      

          [JENKINS-46283] pipeline hangs after executing sh step command

          Reinhold Füreder added a comment - edited

          I may have experienced this issue ("pipeline hangs after executing sh step command") as well – jglick, maybe it is more similar to JENKINS-37730 though, because the thread dump looks more similar (note the "DSL.sh(completed process..." frame). In my case an admittedly accidental, but at least gentle, Jenkins restart happened DURING the 'sh' step execution (thus without a prior 'Manage Jenkins > Prepare for Shutdown'):

          • the symptom is that the pipeline build is still hanging around, although the process started by the 'sh' step has exited (including all spawned child processes; it is actually an ssh call, but let's try to ignore the g(l)ory details...)
          • the thread dump for the build is:
            Thread #12
            	at DSL.sh(completed process (code 0) in /var/lib/jenkins/workspace/Recovery/ACME@tmp/durable-6572e136; recurrence period: 0ms)
            	at WorkflowScript.run(WorkflowScript:35)
            	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.delegateAndExecute(jar:file:/var/lib/jenkins/plugins/pipeline-model-definition/WEB-INF/lib/pipeline-model-definition.jar!/org/jenkinsci/plugins/pipeline/modeldefinition/ModelInterpreter.groovy:136)
            	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.executeSingleStage(jar:file:/var/lib/jenkins/plugins/pipeline-model-definition/WEB-INF/lib/pipeline-model-definition.jar!/org/jenkinsci/plugins/pipeline/modeldefinition/ModelInterpreter.groovy:490)
            	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.catchRequiredContextForNode(jar:file:/var/lib/jenkins/plugins/pipeline-model-definition/WEB-INF/lib/pipeline-model-definition.jar!/org/jenkinsci/plugins/pipeline/modeldefinition/ModelInterpreter.groovy:258)
            	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.catchRequiredContextForNode(jar:file:/var/lib/jenkins/plugins/pipeline-model-definition/WEB-INF/lib/pipeline-model-definition.jar!/org/jenkinsci/plugins/pipeline/modeldefinition/ModelInterpreter.groovy:256)
            	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.executeSingleStage(jar:file:/var/lib/jenkins/plugins
            
          • the build log shows:
            ...
            06:09:07 TASK: [Fetch *** home from *** on ***] ***************** 
            Resuming build at Wed Nov 22 06:50:27 CET 2017 after Jenkins restart
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Pausing
            Still paused
            Resuming
            Resuming build at Wed Nov 22 07:23:14 CET 2017 after Jenkins restart
            Waiting to resume part of Recovery » ACME 20171122-060700-revUNKNOWN: Jenkins is about to shut down
            Still paused
            Resuming
            
            Pausing
            
            Resuming
            
            • As you can see I got impatient and played around with pause/resume
          • Extension based on JENKINS-41482:
            • It is a master-only scenario with no slaves
              • But the process hierarchy started by the 'sh' step was still running for a very long time after the Jenkins (master) had restarted
          • Questions:
            • Is my scenario even supposed to work (i.e. is it supported)?
            • Would it be necessary to (1) use the 'Manage Jenkins > Prepare for Shutdown' approach first, (2) wait for all currently running steps to finish, and (3) only then restart Jenkins (master)?


          Jesse Glick added a comment -

          I am not sure what jstancek’s issue is; nothing apparently to do with the sh step. reinholdfuereder’s issue is unrelated and I think a duplicate of something open in workflow-cps-plugin.


          Jan Stancek added a comment -

          I haven't seen this issue in the last couple of months - not sure if it got fixed or I've just been lucky.

          I'm OK if you want to close this as "insufficient data".


          Reinhold Füreder added a comment -

          Same for me: (a) not seen anymore – admittedly I also did not try to provoke it; and (b) OK for closing


          Steve Boardwell added a comment -

          I can confirm this is still happening in Jenkins 2.89.4 with all plugins up to date at the time of writing.

          In our situation we have one step to start a web server, followed by a step to wait until the web server is ready.

          node {
              stage('Running Tests') {
                  setupWebserver()
                  checkStart('http://webserver:8080')
              }
          }
          
          def checkStart(String url) {
              sh """
              ...
              ...curl command in a for loop etc...
              ...
              echo 'Web server up and running!!!'
              """
          }
          
          def setupWebserver() {
              sh """#!/bin/bash
              ...
              ...some pre-steps...
              ...
              webServer/bin/server.sh restart
              """
          }
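
          For context, a hedged sketch of what the elided readiness loop in checkStart might look like (the URL, retry count and sleep interval are illustrative placeholders, not the actual values used):

          #!/bin/bash
          # Hypothetical readiness loop: poll the URL until it answers HTTP 200, or give up.
          url='http://webserver:8080'     # placeholder, as in the example above
          for i in $(seq 1 60); do
              code=$(curl --write-out '%{http_code}' -o /dev/null -qsSL "$url" || true)
              if [ "$code" = "200" ]; then
                  echo 'Web server up and running!!!'
                  exit 0
              fi
              sleep 5
          done
          echo 'Web server did not come up in time' >&2
          exit 1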
          

          In the logs you can see that the server is started...

          18:20:20 Starting jetty using port 8080
          18:20:20 Stopping Jetty: OK
          18:20:25 Starting Jetty: STARTED Jetty Sat Mar  3 18:20:24 CET 2018 under PID 21653
          

          On the agent, you can see that the process was started successfully and that the web server is available:

          # check for PID
          jenkins@p-agent:~$ ps aux | grep [2]1653 | awk '{ print $1" "$2 }'
          jenkins 21653
          
          # check for url
          jenkins@p-agent:~$ curl --write-out '%{http_code}\n' -o /dev/null -qsSL http://localhost:8080
          200
          

          Looking for durable tasks, I could not find any processes containing the word 'durable':

          jenkins@p-agent:~$ ps auxww | grep [d]urable | awk '{ print $1" "$2 }'
          jenkins@p-agent:~$
          

          Question: would it be worth adding a pre-script to our sh scripts (sketched below) to:

          • print out the current PID of the durable task
          • print out the current parent directory of the durable task
          • do a ps aux | grep <parent directory> for the running script
          • add a trap to print when the script is exiting
            • which signals would best be handled in this case? EXIT, INT, TERM, HUP?
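
          A minimal sketch of such a pre-script, assuming bash (directory detection relies on BASH_SOURCE, and the trapped signals follow the discussion further down in this thread):

          #!/bin/bash
          # Hypothetical diagnostic pre-script: print the durable task's PID and directory,
          # list processes referencing that directory, and log when the script exits.
          myPid=$$
          myDir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
          echo "durable task PID: $myPid"
          echo "durable task dir: $myDir"
          ps auxww | grep "$myDir" | grep -v grep || true
          trap 'echo "script exiting (EXIT/HUP/INT/TERM)"' EXIT HUP INT TERM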


          Steve Boardwell added a comment - edited

          Hi jglick / jstancek / reinholdfuereder

          I think I have the cause, or at least one of the possible causes, and can reproduce the hanging agent in principle. It has to do with the parent process being killed. Why that happens I cannot say, but perhaps the OS does it when resources are low (for example in a dockerized agent environment).

          I also have a bit of a hacky workaround – feedback and improvements welcome.

          Disclaimer: I'm using the exit codes as calculated by bash in my solution, as well as the bash built-in for finding the current directory. You'd need to take account of this if using a different shell.

          It's a bit of a long explanation but here goes...

          Summary

          • investigated how signals can affect shell scripts
          • investigated the processes involved
            • discovered an unexpected sibling process
          • found I could reproduce the behaviour by killing the parent process
          • scripted a workaround for the case that the parent process is missing

          Long version

          Investigated what signals can do to a shell script.

          Using the following script I tested what signals did to a bash script. I wanted to find out:

          • which signals would cause a script to exit immediately
          • which signals would first send an EXIT signal before exiting
          • which signals would do nothing, etc.
          [jenkins@jenkins-server] ~ $ cat /tmp/test.sh
          #!/bin/bash
          
          set -euo pipefail
          latestSignalRc=
          
          # register all known traps
          typeset -i sig=1
          while (( sig < 65 )); do
              trap "signum=${sig};test ${sig}" "$sig"
              let sig=sig+1
          done
          trap "test EXIT" "EXIT"
          trap "test ERR" "ERR"
          test() {
              local rc=$?
              if [[ "$1" != "EXIT" ]] && [[ "$1" != "ERR" ]]; then
          	#echo "Non EXIT or ERR. Making latestSignalRc from signum '$signum'."
          	latestSignalRc=$(( $signum + 128 ))
              fi
              echo "Got sig: ${1:-n/a}, signum: '${signum:-n/a}', rc: $rc, latestSignalRc: ${latestSignalRc:-n/a}"
              # Reset to a default signal handler.
              trap - $1
              unset signum
              if [[ "$1" != "EXIT" ]] && [[ "$1" != "ERR" ]]; then
          	# kill process with signal
          	kill -$1 $$
              else
                  # if we receive an error, reset the EXIT trap
                  # because we are leaving now anyway
                  [[ "$1" == "ERR" ]] && trap - EXIT
          	exit ${latestSignalRc:-$rc}
              fi
          }
          
          sleep 0.1
          if [[ "ERR" == "${1:-}" ]]; then
               ls /tmp/nnnn &> /dev/null
          elif [ -n "${1:-}" ]; then
              kill -$1 $$
          fi
          echo "After kill..."
          

          Testing looked something like this:

          [jenkins@jenkins-server] ~ $ for i in ERR 1 2 3 6 15; do echo "-----------------------------"; (/tmp/test.sh $i; echo "Exited: $?"); done
          -----------------------------
          Got sig: ERR, signum: 'n/a', rc: 2, latestSignalRc: n/a
          Exited: 2
          -----------------------------
          Got sig: 1, signum: '1', rc: 0, latestSignalRc: 129
          Exited: 129
          -----------------------------
          Got sig: 2, signum: '2', rc: 0, latestSignalRc: 130
          Got sig: EXIT, signum: 'n/a', rc: 0, latestSignalRc: 130
          Exited: 130
          -----------------------------
          Got sig: 3, signum: '3', rc: 0, latestSignalRc: 131
          After kill...
          Got sig: EXIT, signum: 'n/a', rc: 0, latestSignalRc: 131
          Exited: 131
          -----------------------------
          Got sig: 6, signum: '6', rc: 0, latestSignalRc: 134
          Exited: 134
          -----------------------------
          Got sig: 15, signum: '15', rc: 0, latestSignalRc: 143
          Exited: 143
          

          I settled on catching 1 (SIGHUP), 2 (SIGINT), 6 (SIGABRT) and 15 (SIGTERM), since they (a) could be caught, and (b) caused the script to exit.

          Investigated the processes involved

          At first I simply grepped the processes using derived values:

          node('agent') {
              sh '''#!/bin/bash
                  set -euo pipefail
                  set +x
                  function onExit() {
                      local exitCode=\$?
                      echo ">>>> Exiting now with exitCode \$exitCode"
                      echo ">>>> Could place exitCode in \$myResultFile"
                      echo "one last log" >> "\$myDir/jenkins-log.txt"
                      echo \$exitCode > \$myResultFile
                      sleep 5
                      echo "Still running..."
                      exit \$exitCode
                  }
                  trap onExit EXIT
                  
                  myPid=\$\$
                  myDir="\$( cd "\$( dirname "\${BASH_SOURCE[0]}" )" && pwd )"
                  myResultFile="\$myDir/jenkins-result.txt"
                  myResultFileGrep="\$myDir/jenkins-result.tx[t]"
                  echo "my pid is \$myPid"
                  echo "my result file is \$myResultFile"
                  echo '----------------------------------'
                  ps -eaf | head -n 1
                  echo '--------- script PID -------------'
                  ps -eaf | grep [d]urable | grep \$myPid
                  echo '--------- script result file -------------'
                  myCmdsParentPid=\$(ps aux | grep "\$myResultFileGrep" | awk '{ print \$3 }' | sort -u )
                  ps -eaf | grep "\$myResultFileGrep"
                  echo '----------- script commands parent pid -----------'
                  ps -eaf | grep "[j]enkins.*\$myCmdsParentPid"
                  echo '---------------------------------'
                  echo "hello from \$(hostname)"
              '''
          }
          

          Resulting in something like:

          ...
          my pid is 10781
          my result file is /var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-result.txt
          UID        PID  PPID  C STIME TTY          TIME CMD
          --------- script PID -------------
          jenkins  10781 10778  0 14:02 ?        00:00:00 /bin/bash /var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/script.sh
          --------- script result file -------------
          jenkins  10778 12979  0 14:02 ?        00:00:00 sh -c { while [ -d '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0' -a \! -f '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-result.txt' ]; do touch '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-log.txt'; sleep 3; done } & jsc=durable-14f8a02757bd1625e0536d94affe2a93; JENKINS_SERVER_COOKIE=$jsc '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/script.sh' > '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-log.txt' 2>&1; echo $? > '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-result.txt'; wait
          jenkins  10780 10778  0 14:02 ?        00:00:00 sh -c { while [ -d '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0' -a \! -f '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-result.txt' ]; do touch '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-log.txt'; sleep 3; done } & jsc=durable-14f8a02757bd1625e0536d94affe2a93; JENKINS_SERVER_COOKIE=$jsc '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/script.sh' > '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-log.txt' 2>&1; echo $? > '/var/lib/jenkins/workspace/EXP-signal-tester@tmp/durable-e19db7e0/jenkins-result.txt'; wait
          ...
          

          Question: I still do not know why there are two instances of the same command "sh -c ...."

           

          I used a recursePid function to follow the trail and realised that the second sh -c ... process was actually a sibling of the script.sh process rather than a parent.

          function recursePid() {
              local currentPid=$1
              local pidEntry=$(ps --no-headers -f --pid $currentPid)
              pidParent=$(ps --no-headers -o ppid --pid $currentPid | xargs)
              echo "--------- Current: $currentPid-------------"
              echo "$pidEntry"
              if [ $pidParent -ne 1 ]; then
                  local pidSiblings=$(ps --no-headers -f --ppid $pidParent | grep -v $currentPid)
                  if [ -n "$pidSiblings" ]; then
                      echo "--------- has following siblings -------------"
                      echo "$pidSiblings"
                      pidSibling=$(echo "$pidSiblings" | awk '{ print $2 }')
                      echo "Siblings pid = $pidSibling"
                 fi
                  echo "Sending signal to parent ${SIGNAL}"
                  sleep 3
                  kill -${SIGNAL:-0} $pidParent
                  sleep 3
                  echo "Parent killed..."
                  if ps --no-headers --pid $pidParent; then 
                      echo "Parent still there"
                  else 
                      echo "Parent gone"
                  fi
                  #recursePid $pidParent
              fi
          }
          

          So, the process tree looks like:

          slave.jar process
            |__ parent "sh -c ..." process
                 |__ sibling "sh -c ..." process
                 |__ "script.sh" process
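
          For readability, here is an annotated paraphrase of the wrapper command captured in the ps output above ($DIR is shorthand for the .../workspace/<job>@tmp/durable-<id> directory and <cookie> for the server cookie value; the real command is the long "sh -c ..." line shown earlier):

          # Annotated paraphrase of the durable-task wrapper command, per the ps output above.
          { while [ -d "$DIR" -a ! -f "$DIR/jenkins-result.txt" ]; do  # keep-alive loop: touch the log every 3s
                touch "$DIR/jenkins-log.txt"                           # so Jenkins can tell the wrapper is alive
                sleep 3
            done } &                                                   # backgrounded: this fork is the "sibling" sh
          JENKINS_SERVER_COOKIE=durable-<cookie> \
              "$DIR/script.sh" > "$DIR/jenkins-log.txt" 2>&1           # run the user's script, capturing its output
          echo $? > "$DIR/jenkins-result.txt"                          # write the exit code Jenkins is waiting for
          wait                                                         # let the keep-alive loop notice the result file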
          

          Found I could reproduce the behaviour by killing the parent process

          I tested sending various signals to the script.sh and sibling sh -c ... processes, but:

          • script.sh - worked as expected
          • sibling sh -c ... - the signals didn't seem to have any effect

          Moving up one level, I sent the TERM signal to the parent, which caused the agent to hang.

          In the build log...

          14:37:15 Single pid = 90
          14:37:15 Sending signal to parent TERM
          14:37:21 So far, I've got signal: 'EXIT', signum: 'n/a', exitCode: '0', latestSignalRc: 'n/a'
          14:37:21 >>>> Exiting now with exitCode 0
          14:37:21 >>>> Could place exitCode in /home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-result.txt
          14:37:21 I am placing this log directly into the log file...
          <spinning-wheel>
          

          On the agent only the sibling "sh -c ..." process remains...

          jenkins@cd8c03e15e58:~$ ps aux | grep "[s]h -c"
          jenkins      90  0.0  0.0   4512   928 ?        S    14:37   0:00 sh -c { while [ -d '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a' -a \! -f '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-result.txt' ]; do touch '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-log.txt'; sleep 3; done } & jsc=durable-190a2420fb163bce1cd2a8d2213b499c; JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/script.sh' > '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-log.txt' 2>&1; echo $? > '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-result.txt'; wait
          

          The sibling process is touching jenkins-log.txt every 3 seconds, meaning Jenkins can't determine that the step is hung.

          Entering an exit code into jenkins-result.txt causes the build to continue (see the timestamps; the manual step itself is sketched after the log excerpt):

          14:37:21 >>>> Could place exitCode in /home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-result.txt
          14:37:21 I am placing this log directly into the log file...
          
          
          ...
          placed the exit code into the jenkins-result.txt now
          ...
          
          
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] node
          14:44:14 Running on Jenkins in /var/lib/jenkins/workspace/EXP-signal-test
          [Pipeline] {
          [Pipeline] tool
          [Pipeline] retry
          [Pipeline] {
          [Pipeline] node
          14:44:14 Running on Jenkins in /var/lib/jenkins/workspace/EXP-signal-tester@2
          [Pipeline] {
          [Pipeline] tool
          [Pipeline] sh
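
          For reference, the manual recovery shown above boils down to writing an exit code into the wrapper's result file on the agent. A hedged example, using the durable directory from this run's output (substitute the directory of your own hung build; a non-zero value such as 143 marks the step as failed rather than falsely successful):

          # Manually unstick the hung sh step by supplying the result file the wrapper is waiting for.
          echo 143 > '/home/jenkins/workspace/EXP-signal-tester@tmp/durable-889afc3a/jenkins-result.txt'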
          

          Scripted a workaround for the case that the parent is missing

          I experimented further by catching and resending the signal as in the examples above. However, some signals such as TERM did not allow the script to write its exit code to the jenkins-result.txt file.

          So, with the knowledge that

          • we can catch the common signals in the script which could cause the script to exit
          • we can determine the parent pid and its status

          I scripted the following to echo the 'would-be' exit code into the appropriate jenkins-result.txt in the case of the parent process being killed.

          The best way to explain how it works is by using an example. Please check the following Jenkinsfile job:

          def bashPreText(def script, def quiet = false, def login = false) {
              String verboseFlag = quiet ? '' : 'set -x'
              String loginFlag = login ? '-l' : ''
              return '''#!/bin/bash ''' + loginFlag + '''
          
          set -euo pipefail # fail fast and fail on unset variables
          
          # don't print the pre-text stuff 
          set +x
          
          # some shell options
          shopt -s globstar
          shopt -s expand_aliases
          
          # traps
          typeset -i sig=1
          for sig in 1 2 6 15; do
              trap "signum=\${sig};handleSignal \${sig}" "\$sig"
          done
          trap "handleSignal EXIT" "EXIT"
          trap "handleSignal ERR" "ERR"
          
          
          function handleSignal() {
              local exitCode=\$?
              local signal=\$1
              
              if [[ "\$signal" != "EXIT" ]] && [[ "\$signal" != "ERR" ]]; then
                  # TODO: account for non-bash exit codes (ksh = signum + 256) 
              	latestSignalRc=\$(( \$signum + 128 ))
              fi
              local finalExitCode=\${latestSignalRc:-\$exitCode}
          
              echo "SIGNAL STATUS: 
              signal: '\${signal:-n/a}'
              signum: '\${signum:-n/a}'
              exitCode: '\$exitCode'
              latestSignalRc: '\${latestSignalRc:-n/a}'
              finalExitCode: '\${finalExitCode}'"
          
              # don't trap EXIT if already in ERR
              [[ "\$signal" != "EXIT" ]] && trap - EXIT 
              
              # React if no parent found
              local currentPidParent=
              currentPidParent=$(ps --no-headers -o ppid --pid \$myPid | xargs)
              if [ \$pidParent -ne \$currentPidParent ]; then
                  echo "WARNING: Parent process missing..."
                  if [[ "true" == "\$ACTIVATE_WORKAROUND" ]]; then
                      echo "Activating workaround - writing the exitCode directly into the '\$myResultFile'."
                      echo \$finalExitCode > \$myResultFile
                  else 
                      echo "Not activating workaround - script has probably hung by now. Fix by aborting or by writing the exitCode directly into the '\$myResultFile'."
                  fi
              fi
              exit \$finalExitCode
          }
          
          myPid=\$\$
          pidParent=\$(ps --no-headers -o ppid --pid \$myPid | xargs)
          myDir="\$( cd "\$( dirname "\${BASH_SOURCE[0]}" )" && pwd )"
          myResultFile="\$myDir/jenkins-result.txt"
          
          
          # verbose flag
          ''' + verboseFlag + '''
          ''' +
          script + '''
          '''.trim().stripIndent()
              }
          
          def bash(Map vars = [:]) {
              vars.script = bashPreText(vars.script, vars.quiet, vars.login)
              sh(vars)
          }
          /* Convenience overload */
          def bash(String script) {
              return bash(script: script)
          }
          
          pipeline {
              agent any
              options {
                  skipDefaultCheckout()
                  timestamps()
                  disableConcurrentBuilds()
                  buildDiscarder(logRotator(numToKeepStr:'30'))
              }
              parameters {
          		booleanParam(defaultValue: true, description: 'Activate the workaround', name: 'ACTIVATE_WORKAROUND')
          		string(defaultValue: '', description: 'The signal to send to the SCRIPT (int or HUP, TERM, etc)', name: 'SIGNAL_SCRIPT')
          		string(defaultValue: 'TERM', description: 'The signal to send to the PARENT (int or HUP, TERM, etc)', name: 'SIGNAL_PARENT')
          	}
              stages {
                  stage('Test') {
                      steps {
                          script {
                  bash quiet: true, script: '''
          
          echo "Starting script with...
              myPid=\$myPid
              pidParent=\$pidParent
              myDir="\$myDir"
              myResultFile="\$myResultFile"
          "
          
          if [ -n "\${SIGNAL_PARENT:-}" ]; then
              echo "Sending signal \${SIGNAL_PARENT} to parent"
              sleep 0.1
              kill -\${SIGNAL_PARENT:-0} \$pidParent
              sleep 0.1
          fi
          
          echo "Middle of script..."
          
          if [ -n "\${SIGNAL_SCRIPT:-}" ]; then
              if [[ "ERR" == "\${SIGNAL_SCRIPT}" ]]; then
                  echo "Failing with an ERR"
                  ls /bla/bla/bla
              else
                  echo "Sending signal \${SIGNAL_SCRIPT} to script"
                  kill -\${SIGNAL_SCRIPT:-0} \$myPid
              fi
          fi
          
          echo "End of script..."
          '''
                          }
                      }
                  }            
              }
          }
          

          Final Workaround

          The final workaround for me was to put the traps and the handleSignal function as a kind of pretext in a vars/bash.groovy, as in the job above (NOTE: don't forget to remove the ACTIVATE_WORKAROUND == true condition), and to use it from my global library.

          Hope this helps find a solution to the problem, even though it is more of a hacky workaround than a real fix.


          Robin Rosenberg added a comment -

          One thing I noted was that the first part of the script in our case was executed in one workspace, and the command that was hanging was executed in another workspace, with the @tmp suffix. Since the @tmp workspace didn't contain the right stuff, the command in the shell script failed.

           

           


          Jesse Glick added a comment -

          Skimming this, sounds like it could be a dupe of JENKINS-50892. Needs to be determined if there is a non-contrived way to reproduce part of the controller script being killed but not the rest of it; and, either way, whether there is a safe way to ensure that the controller script lives or dies atomically. It seems that use of { rather than ( does not suffice to avoid creation of a cloned sh process.
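
          A hedged illustration of that last point (a generic shell demo, not Jenkins code): backgrounding a brace group still forks a second copy of the shell, and both copies show up in ps with the same "sh -c ..." command line.

          # Backgrounding a { ...; } group forks the shell; the child keeps the parent's command line,
          # so ps shows two identical "sh -c ..." entries while the background loop is alive.
          sh -c '{ while true; do sleep 1; done; } &
                 ps -ef | grep "[w]hile true"
                 kill $!'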


          Jesse Glick added a comment -

          Most likely a dupe. Please use JENKINS-50892 for discussion.

          lostinberlin for some background: the second copy of sh is the stuff inside curly braces which is touching the log file. (That is a way for Jenkins to tell the difference between a process which just declines to produce output for a long time, as opposed to the whole computer having been rebooted and all these processes are dead.)

          As to why the first copy of sh is getting killed to begin with, your guess is as good as mine. You suggested that low resources in a container could trigger some processes to be killed, but why one and not the other?


            Assignee: Unassigned
            Reporter: Jan Stancek (jstancek)
            Votes: 1
            Watchers: 8
