JENKINS-50379

Jenkins kills long running sh script with no output

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: durable-task-plugin
    • Labels: None
    • Environment: Jenkins ver. 2.107.1 on CentOS 7

      I have a Jenkins pipeline that runs a shell script that takes about 5 minutes and generates no output. The job fails and I'm seeing the following in the output:

      wrapper script does not seem to be touching the log file in /home/jenkins/workspace/job_Pipeline@2@tmp/durable-595950a5
       (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
       script returned exit code -1
      

      Based on JENKINS-48300 it seems that Jenkins is intentionally killing my script while it is still running. IMHO it is a bug for Jenkins to assume that a shell script will generate output every n seconds for any finite n. As a workaround I've set -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL to one hour. But what happens when I have a script that takes an hour and one minute!?


          Evan Ward created issue -

          Daniel Beck added a comment -

          Does it work when you run "echo whatever ; yourscript.sh" instead of just yourscript.sh?


          Lee Webb added a comment -

          Travis does this sort of thing too: if there's no output for a while, it assumes the process is hung and stops the build.

          If you don't want to change the Jenkins configuration, something like the following shell snippet can help.

          It runs your long-running process in the background and echoes dots to the console for as long as it is still running:

          # suppress command output unless there is a failure
          function quiet() {
            local tmp cmd_pid ret ts_start ts_end ts_elapsed xtrace='' errexit=''
            tmp=$(mktemp) || return  # temp file that captures the command's output
            # switch off xtrace/errexit while polling; both are restored before returning
            if [[ $- =~ x ]]; then set +x; xtrace=1; fi
            if [[ $- =~ e ]]; then set +e; errexit=1; fi
            echo -ne "quiet running: ${*} "
            ts_start=$(date +%s)
            "${@}" > "${tmp}" 2>&1 &
            cmd_pid=$!
            # kill -0 only probes whether the PID is alive; it replaces the
            # OS-specific ps calls and works the same on Linux and elsewhere
            while kill -0 "${cmd_pid}" 2>/dev/null; do
              echo -ne '.'
              sleep 3
            done
            wait "${cmd_pid}"
            ret=$?
            ts_end=$(date +%s)
            ts_elapsed=$(( ts_end - ts_start ))
            if [ "${ret}" -eq 0 ]; then
              echo " finished with code ${ret} in ${ts_elapsed} secs, last lines were:"
              tail -n 4 "${tmp}"
            else
              echo  # end the line of dots before dumping the full output
              cat "${tmp}"
            fi
            rm -f "${tmp}"
            if [ -n "${errexit}" ]; then set -e; fi
            if [ -n "${xtrace}" ]; then set -x; fi
            return "${ret}"  # return the exit status of the command
          }
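
A variation on the same idea, for cases where the command's own output should stay visible: the sketch below runs the command in the foreground and prints a heartbeat line from a background loop. The names with_heartbeat and HEARTBEAT_SECS are hypothetical and not part of Jenkins or any plugin. It only keeps the console active; nothing in this thread confirms that console output alone satisfies the plugin's log-file check.

```shell
# Hypothetical helper (not from Jenkins): run a command in the foreground
# while a background loop prints one heartbeat line every HEARTBEAT_SECS seconds.
with_heartbeat() {
  local interval="${HEARTBEAT_SECS:-60}"
  local ret=0
  (
    # On TERM, stop the pending sleep so no stray process is left behind.
    trap 'kill "${sleep_pid}" 2>/dev/null; exit 0' TERM
    while true; do
      # sleep runs in the background so that "wait" can be interrupted by the trap
      sleep "${interval}" >/dev/null 2>&1 &
      sleep_pid=$!
      wait "${sleep_pid}"
      echo "[heartbeat] still running: $*"
    done
  ) &
  local hb_pid=$!
  # Run the real command; capture its status even when errexit is active.
  "$@" || ret=$?
  # Stop the heartbeat loop and reap it before returning.
  kill "${hb_pid}" 2>/dev/null || true
  wait "${hb_pid}" 2>/dev/null || true
  return "${ret}"
}
```

For example, HEARTBEAT_SECS=30 with_heartbeat ./yourscript.sh prints a line every 30 seconds and returns the exit status of yourscript.sh.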

           


          Evan Ward added a comment -

          danielbeck the script initially generates some output to show that it started and then generates no output for a long time. I think this has the same effect as your suggestion of using echo.


          Daniel Beck added a comment -

          evanward1 I expect so. Thanks for the clarification.


          Jacob Keller added a comment -

          I see this issue on scripts which do generate some output, but it happens that parts of the script take some time to run: in my case I'm compiling a kernel module and even when the make output is sent to the console, sometimes individual steps take longer than the timeout...


          shreedhara H added a comment -

          Hi evanward1,
          We are facing the same issue. Could you please tell us how to change the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL setting?


          Evan Ward added a comment -

          Set it in the JVM arguments of the Jenkins master process.
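
As a rough illustration of where such a JVM argument could go: the sketch below assumes an RPM install on CentOS 7 that reads /etc/sysconfig/jenkins as a shell file and passes JENKINS_JAVA_OPTIONS to the JVM. The file location, the HEARTBEAT_SECS helper variable, and the jenkins.war path are assumptions, and 3600 assumes the value is in seconds, in line with the =300 suggested in the log message.

```shell
# Sketch of /etc/sysconfig/jenkins (assumed location for RPM installs on CentOS 7).
# HEARTBEAT_SECS is a hypothetical helper variable; 3600 assumes seconds (one hour).
HEARTBEAT_SECS=3600
JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=${HEARTBEAT_SECS}"

# Equivalent when starting Jenkins by hand (the path to jenkins.war is an assumption):
# java ${JENKINS_JAVA_OPTIONS} -jar /path/to/jenkins.war
```

Jenkins has to be restarted for a changed system property to take effect.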


          Max Ivanch added a comment -

          Hi there,

          I have the same issue when running tasks that put a heavy load on the disk. I set the durable task plugin to the "none" option and tried HEARTBEAT_CHECK_INTERVAL, but neither worked for me. As a workaround I created an additional mount point on the Jenkins slave, but IMHO I would prefer an option to disable the check altogether.


          lei rou added a comment -

          I sometimes have the same problem. How do I change HEARTBEAT_CHECK_INTERVAL?


            Assignee: Unassigned
            Reporter: Evan Ward (evanward1)
            Votes: 16
            Watchers: 35