JENKINS-55308: intermittent "terminated" messages using sh in Pipelines

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

      Testing Jenkins 2.138.2 LTS, we see Jenkins Pipelines that use sh intermittently print the following message in the console log …

       

      sh: line 1:  4449 Terminated              sleep 3

       

       … and sometimes this … 

       

      sh: line 1: 13136 Terminated              { while [ ( -d /proc/$pid -o ! -d /proc/$$ ) -a -d '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0' -a ! -f '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0/jenkins-result.txt' ]; do touch '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0/jenkins-log.txt'; sleep 3; done; }

       

      Jenkins master runs from a Docker image based on jenkins/jenkins:2.138.2-alpine with specific plugins baked into the image by /usr/local/bin/install-plugins.sh

      The message originates in durable-task-plugin, which must be a dependency of one of the plugins.txt plugins.  

      Two important observations:

      1) The issue does not occur when starting with the base jenkins/jenkins:2.138.2-alpine image and manually installing plugins via UI. That might suggest the issue is around how install-plugins.sh installs plugins and/or dependencies. 

      2) The issue does not occur on our production image, which is also 2.138.2-alpine + plugins, built 2018-10-11. Rebuilding the same image from the same Dockerfile results in different installed plugins. This makes me think results using install-plugins.sh are not deterministic.
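      [Editorial note] One way to make install-plugins.sh builds more reproducible is to pin exact versions in plugins.txt instead of bare plugin IDs, since bare IDs resolve to whatever version is latest at build time. A minimal sketch (plugin IDs and versions here are illustrative, not a recommendation):

      ```shell
      # Sketch: pin exact plugin versions in plugins.txt so that
      # /usr/local/bin/install-plugins.sh resolves the same artifacts on
      # every rebuild. The install-plugins.sh script in the jenkins/jenkins
      # image accepts plugin-id:version lines; bare IDs float to latest.
      cat > plugins.txt <<'EOF'
      workflow-job:2.25
      durable-task:1.29
      workflow-durable-task-step:2.30
      EOF

      # Every line should carry an explicit version.
      grep -c ':' plugins.txt   # → 3
      ```

      This does not fix the "Terminated" message itself, but it removes the nondeterminism observed in point 2.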


          Jonas Lindström added a comment - - edited

          We run the jenkins-alpine docker image and we also get these messages. I tried switching to the Debian image (2.150.2) and the intermittent "terminated" messages are still there, so I don't think it's specifically related to the OS.

          This seems to have been introduced after v1.22 of durable-task-plugin. I guess the error is lurking somewhere in this line in BourneShellScript.java:

                      cmd = String.format("pid=$$; { while [ \\( -d /proc/$pid -o \\! -d /proc/$$ \\) -a -d '%s' -a \\! -f '%s' ]; do touch '%s'; sleep 3; done } & jsc=%s; %s=$jsc %s '%s' > '%s' 2> '%s'; echo $? > '%s.tmp'; mv '%s.tmp' '%s'; wait", 
          ... more args ...

          It's difficult to downgrade to 1.22 since so many other plugins depend on later versions of this plugin.


          Jean-Paul G added a comment - - edited

          Not specific to docker.

          Have the very same issue when running sh step with parallel, on ssh slave with Jenkins 2.156


          Adam vonNieda added a comment -

          I've got the same issue when running sh step without parallel, on ssh slave with Jenkins 2.150.2 LTS, I'm not running on docker.


          Connor Yu added a comment -

          I am seeing this as well. Upgraded to 2.46.3 yesterday and started seeing this in my simple pipeline. 

          sh: line 1: 27323 Terminated              sleep 3
          

           


          Karl Parry added a comment -

          I am seeing this as well. We have just seen it randomly. We do get it when we run a custom clean-up script that deletes the @tmp/@script/@script@tmp folders:

          sh: line 1: 27323 Terminated              sleep 3


          Matthew Hall added a comment -

          Also seeing this. Jenkins 2.150.3 and durable-task 1.29, which was an upgrade from 1.25.


          Mike Kozell added a comment -

          I upgraded Pipeline: Job (workflow-job) plugin from 2.25 to 2.31 and now see the issue.  I'm running durable-task 1.27.


          Devin Nusbaum added a comment -

          The sleep 3 is part of the execution of a shell step. A background process touches a specific file on the agent every 3 seconds, and the Jenkins master checks the timestamp on that file as a proxy to know whether the script is still running or not. It seems based on the reports here that something is causing that process to be killed on some systems, but I don't have any ideas about what it could be offhand.

          Does anyone have a Pipeline that can reproduce the issue consistently? If so, please add it here (simplifying as much as possible) and specify the agent OS and sh implementation (e.g., dash, bash) for the agents running the sh steps.
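          [Editorial note] The heartbeat scheme described above can be sketched in a few lines of shell. This is a simplified illustration, not the plugin's actual wrapper (which lives in BourneShellScript.java); the directory and file names are hypothetical:

          ```shell
          # Simplified sketch of the durable-task heartbeat described above.
          tmpdir=$(mktemp -d)
          log="$tmpdir/jenkins-log.txt"
          result="$tmpdir/jenkins-result.txt"

          # Heartbeat: touch the log every 3 seconds until the result file
          # appears, so the master can use the log's mtime as a liveness proxy.
          { while [ ! -f "$result" ]; do touch "$log"; sleep 3; done; } &

          # The user's sh step runs here; its exit code is written atomically
          # (tmp file + mv) so the master never reads a half-written result.
          sh -c 'echo hello' > "$log" 2>&1
          echo $? > "$result.tmp"
          mv "$result.tmp" "$result"
          wait
          cat "$result"   # → 0
          ```

          The `sleep 3` that shows up in the "Terminated" messages is the sleep inside this background loop.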


          Devin Nusbaum added a comment -

          mayqueen Thanks for posting your Pipeline! Here are some further questions to try to help narrow down what could be happening:

          • Just to confirm, your agents are running OS X?
          • You say you see the issue constantly, but can you be more specific (do you see the issue every other build, every 10th build, ...)?
          • Approximately how many parallel tasks are you running at once (10s, 100s, ...)?
          • How resource-intensive is your shell script (is it possibly triggering memory or CPU limits that would cause the Kernel to start killing processes)?
          • How are the agents connected to the Jenkins master (SSH, JNLP), and are they static machines or are they dynamically provisioned by the Jenkins master through something like the Docker Swarm or EC2 plugins?
          • If they are static machines, are they running anything else or are they only used as Jenkins agents? Assuming they are static, I would see if there are any particular agents where the shell script seems to be getting terminated consistently based on previous failures, then log onto those machines and check general system logs and running processes to see if there is anything that looks like it might be responsible for killing the process.


          Natalia Natalia added a comment - - edited

          dnusbaum

          • My agents are Apple nodes, MacMinis
          • every run
          • I have around 20 nodes gathered in one label and it's just one same task for all of the nodes
          • shell code exactly like the above - keeping only an empty sh """ """ produces the error in every run, so I would say the script contents don't matter at all
          • agents are connected through ssh, they are static physical machines in the racks
          • they are only used as Jenkins agents

          I'm not sure if you are getting the problem - the shell code is executed, basically it all works as expected. The only 'not-working' part for me is the noise in the log. It started when we upgraded core and plugins to the newest version.

          EDIT: I also have other nodes provisioned in the cloud (manually), connected through ssh, just tested on these and it's the same issue.

          # cat /etc/redhat-release
          CentOS Linux release 7.6.1810 (Core)
          $ /bin/bash --version
          GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
          Copyright (C) 2011 Free Software Foundation, Inc.
          License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
          
          This is free software; you are free to change and redistribute it.
          There is NO WARRANTY, to the extent permitted by law.
          


          Devin Nusbaum added a comment -

          the shell code is executed, basically it all works as expected. The only 'not-working' part for me is the noise in the log.

          Thanks for clarifying, I didn't realize that things were working apart from the log message and that you were able to reproduce the issue by literally running a blank sh step. I'll try to create a unit test that can reproduce the problem based on your info when I have some time.


          Vladimir Eremeev added a comment -

          I see this error on Jenkins 2.164.1

          One of my pipeline jobs started failing today for an unclear reason.

          Both successful and failed jobs contain "Terminated" in their console logs.

          Vennila Perumal added a comment -

          hi,

          I am also having the same problem: the console log contains the unwanted "sh: line 1:  xxx Terminated              sleep 3" message.

          May I know how to get rid of this?

          Jenkins version: 2.150

          Jonas Lindström added a comment - - edited

          mkozell was on to something: the bug is not just in durable-task-plugin, although the symptoms come from there. It is introduced when you upgrade workflow-job. I have managed to pinpoint it to a specific version.

          The following setup does not trigger the bug (relevant plugins listed). All are at latest version at the time of writing, except workflow-job:

          • durable-task 1.29
          • workflow-api 2.33
          • workflow-durable-task-step 2.30
          • workflow-step-api 2.19
          • workflow-support 3.3
          • workflow-job 2.25

          Upgrading workflow-job to 2.27 or later triggers the bug. (2.26 does not exist.)


          Avi Kivity added a comment -

          I am seeing the same problem, parallel jobs running on Linux agents.  I can share the pipeline code if it will help.


          Martin Wiklundh added a comment -

          We see the same problem intermittently, in the afternoons only. It appeared without any known change in our Jenkins setup. Network/storage issues may produce the same error.

          Kieron Kierzo added a comment -

          Hey,

          Had this issue on my Mac build.

          My fix was to set the JAVA_HOME variable, which was missing on the machine.

          I created this file on the Mac: "~/.bash_profile"

          and added this line:

          export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home

          Rebooting the Mac fixed it.

          Your "jdk-11.0.2.jdk" folder may be different, so just replace that with whatever is in there.

          If you run the command:

          echo ${JAVA_HOME}

          and it's empty, this could be the cause.

          Hope this helps others.

          Cheers, Kieron.


          Jakov Sosic added a comment -

          It seems that adding $JAVA_HOME in Global properties indeed fixes this!

          Since I run CentOS 7, this is what I've added:

          Name: JAVA_HOME

          Value: /usr/lib/jvm/jre-1.8.0-openjdk


          Cuong Tran added a comment -

          Was this property added to the agent or the master?


          Wen Zhou added a comment -

          if the bug is really introduced by the workflow-job plugin, can we expect a fix in the new release?

           


          Pablo Rodriguez added a comment - - edited

          The issue can be reproduced by running this simple Pipeline:

          Pipeline script

          pipeline {
            agent { label 'nodelinux' }
            stages {
              stage('build') {
                steps {
                  echo "Hello World!"
                  sh "echo Hello from the shell"
                }
              }
            }
          }

          Console output:

          [Pipeline] Start of Pipeline
          [Pipeline] node
          [Pipeline] {
          [Pipeline] stage
          [Pipeline] { (build)
          [Pipeline] echo
          Hello World!
          [Pipeline] sh
          + echo Hello from the shell
          Hello from the shell
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          sh: line 1: 18285 Terminated sleep 3
          Finished: SUCCESS
          

          Environment:

          • CloudBees Jenkins Platform - Master 2.176.2.3

          • workflow-job:2.32 'Pipeline: Job'

          It started happening after workflow-job was updated from 2.25 to 2.32


          Carroll Chiou added a comment - - edited

          The sleep 3 process was introduced when the heartbeat check feature was added in version 1.16 of durable-task, which translates to 2.18 of workflow-durable-task-step, which translates to 2.27 of workflow-job.

          Unless I missed a comment, it looks like the pipeline behaves as expected apart from the "Terminated sleep 3" line.

          At least in parogui’s example, this might be a race condition between the sleep 3 process and when the script completes. Like dnusbaum mentioned earlier, the sleep 3 process is used to touch the output log file to show the script is still alive. However, when the script completes, it will write to a separate result file. A watcher service is checking for that result file every 100ms. Once that result file is found, results are transmitted and everything related to that specific step’s workspace is purged. It might be possible that the output file gets cleaned up right after the sleep 3 process checks if the file still exists, but before it gets touched again?

          There is a new release of durable-task (1.31) that removes the sleep 3 process so this line won't pop up anymore.

          Update: I have not been able to reproduce this issue, so I can't say for certain that it is resolved. Technically, it should be, but it is possible that the new version just changes the behavior of this bug.
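          [Editorial note] The suspected race described above — cleanup removing the step's control files between the heartbeat's existence check and its next touch — can be mimicked in a few lines. This is purely illustrative; the real wrapper and the 100ms result-file watcher live in durable-task, and the directory name here is made up:

          ```shell
          # Illustration of the suspected race: the heartbeat tests for its
          # control directory, then sleeps, leaving a ~3-second window in
          # which cleanup can delete the directory out from under it.
          ctrl=$(mktemp -d)

          { while [ -d "$ctrl" ]; do touch "$ctrl/jenkins-log.txt" 2>/dev/null; sleep 3; done; } &
          hb=$!

          rm -rf "$ctrl"      # cleanup wins the race while the loop sits in `sleep 3`

          # When the loop's sleep is reaped after the directory has vanished,
          # the shell can report "NNN Terminated  sleep 3" even though the
          # step itself succeeded — matching the harmless noise in this issue.
          kill "$hb" 2>/dev/null
          wait "$hb" 2>/dev/null
          echo "step finished"
          ```

          Whether this exact interleaving is what users are hitting is unconfirmed, but it is consistent with the message appearing only at the end of otherwise successful builds.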


          A Alvarez added a comment -

          Can confirm that after upgrading durable-task to the latest version, the Terminated messages are gone from our jobs.


          Carroll Chiou added a comment -

          aal I'm curious: if you are running 1.33, are you passing in the FORCE_BINARY_WRAPPER=true system property?


          A Alvarez added a comment -

          Hi carroll, not as far as I'm aware, although this could be a default or coming from another plugin. All we did was upgrade the plugin, and the messages disappeared from the console output when running shell commands within Pipeline scripts.


          Carroll Chiou added a comment -

          aal thanks, that's good to know


            Assignee: Carroll Chiou
            Reporter: gc875
            Votes: 24
            Watchers: 44