  1. Jenkins
  2. JENKINS-50379

Jenkins kills long running sh script with no output


    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: durable-task-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.107.1 on CentOS 7

      Description

      I have a Jenkins pipeline that runs a shell script that takes about 5 minutes and generates no output. The job fails and I'm seeing the following in the output:

      wrapper script does not seem to be touching the log file in /home/jenkins/workspace/job_Pipeline@2@tmp/durable-595950a5
       (--JENKINS-48300--: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
       script returned exit code -1
      

      Based on JENKINS-48300 it seems that Jenkins is intentionally killing my script while it is still running. IMHO it is a bug for Jenkins to assume that a shell script will generate output every n seconds for any finite n. As a workaround I've set -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL to one hour. But what happens when I have a script that takes an hour and one minute!?

            Activity

            jekeller Jacob Keller added a comment -

            > The problem is not that your script stops producing output for a while. That is perfectly normal and supported. The problem is that a side process which is supposed to be detecting this fact and touching the log file every three seconds is either not running, or not producing the right timestamp as observed by the Jenkins agent JVM.

            Right, so it sounds like we need to investigate why the side process that should be touching the log file isn't working properly.
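
To make the mechanism described in the quote concrete, here is a minimal sketch of a heartbeat plus a staleness check. It is not the durable-task plugin's actual code: the function names `start_heartbeat` and `is_stale` and their parameters are hypothetical, and the real plugin does this with a shell wrapper and a check from the agent JVM. The point it illustrates is that the check depends only on the log file's modification time, so a script that prints nothing is fine as long as the side process keeps touching the file and the observer sees a fresh timestamp.

```python
import os
import threading
import time


def start_heartbeat(log_path, interval, stop_event):
    """Touch log_path every `interval` seconds until stop_event is set.

    Hypothetical stand-in for the side process described above.
    """
    def beat():
        # Event.wait returns False on timeout, True once the event is set.
        while not stop_event.wait(interval):
            os.utime(log_path, None)  # set atime/mtime to the current time

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    return thread


def is_stale(log_path, check_interval, now=None):
    """Return True if log_path was not modified within check_interval seconds.

    Hypothetical stand-in for the check made by the observer. Note that `now`
    comes from the observer's clock while the mtime comes from the filesystem,
    so clock skew or a laggy filesystem can make a healthy heartbeat look stale.
    """
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path) > check_interval
```

If the heartbeat thread dies (for example because the process was killed for exceeding a memory limit) or its touches fail, `is_stale` eventually returns True even though the main script may still be running, which matches the symptom reported in this issue.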

            jglick Jesse Glick added a comment -

            I should have mentioned that JENKINS-25503 would completely reimplement the code involved here, possibly solving this issue (possibly introducing others).

            nfalco Nikolas Falco added a comment - edited

            We have the same issue: our JS build job runs an "ng build" command, and after 32 minutes the job is killed because it appears to be unresponsive.

            Cannot contact Node 02: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@70fad4d7:JNLP4-connect connection from prd-cm-as-09.lan/10.1.3.72:56702": Remote call on JNLP4-connect connection from prd-cm-as-09.lan/10.1.3.72:56702 failed. The channel is closing down or has closed down
            wrapper script does not seem to be touching the log file in /var/lib/jenkins/workspace/xxx@tmp/durable-476d6be2
            (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
            olblak Olivier Vernin added a comment -

            For the record, I was affected by this issue a while ago. I was running Jenkins agents on k8s, and increasing the pod memory limit solved it, at least in my case.

            mmh19891113 bright.ma added a comment -

            I ran into this issue.

            [2021-05-25T13:42:16.469Z] wrapper script does not seem to be touching the log file in @tmp/durable-c284507c
            
            [2021-05-25T13:42:16.469Z] (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
            

            In my case the cause was "No space left on device".
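
A quick way to rule out this cause is to check free space and timestamp behaviour in the directory that holds the `durable-*` control files. The sketch below is only a diagnostic aid under stated assumptions: `check_touchable` and its `min_free_bytes` threshold are hypothetical names, and it should be run on the agent against the workspace's `@tmp` directory. Creating an empty file can still succeed on a full disk, so the free-space figure is the more reliable signal.

```python
import os
import shutil
import tempfile
import time


def check_touchable(directory, min_free_bytes=1024 * 1024):
    """Report free space in `directory` and whether a file there can be touched.

    Hypothetical helper; the 1 MiB default threshold is an arbitrary assumption.
    """
    usage = shutil.disk_usage(directory)
    result = {
        "free_bytes": usage.free,
        "enough_space": usage.free >= min_free_bytes,
    }
    try:
        fd, probe = tempfile.mkstemp(dir=directory, prefix="heartbeat-probe-")
        os.close(fd)
        try:
            os.utime(probe, (0, 0))  # force an obviously old mtime
            os.utime(probe, None)    # then touch it, as a heartbeat would
            # A large drift here suggests clock skew or a laggy filesystem.
            result["mtime_drift_seconds"] = abs(
                time.time() - os.path.getmtime(probe)
            )
            result["touch_ok"] = True
        finally:
            os.remove(probe)
    except OSError as exc:  # e.g. ENOSPC, "No space left on device"
        result["touch_ok"] = False
        result["error"] = str(exc)
    return result
```

A result with `enough_space` set to False, or with a large `mtime_drift_seconds`, points at the filesystem rather than at the script being run.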


              People

              Assignee:
              Unassigned
              Reporter:
              evanward1 Evan Ward
              Votes:
              14
              Watchers:
              32
