JENKINS-50379

Jenkins kills long running sh script with no output


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: durable-task-plugin
    • Labels: None
    • Environment: Jenkins ver. 2.107.1 on CentOS 7

    Description

      I have a Jenkins pipeline that runs a shell script that takes about 5 minutes and generates no output. The job fails and I'm seeing the following in the output:

      wrapper script does not seem to be touching the log file in /home/jenkins/workspace/job_Pipeline@2@tmp/durable-595950a5
       (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
       script returned exit code -1
      

      Based on JENKINS-48300 it seems that Jenkins is intentionally killing my script while it is still running. IMHO it is a bug for Jenkins to assume that a shell script will generate output every n seconds for any finite n. As a workaround I've set -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL to one hour. But what happens when I have a script that takes an hour and one minute!?
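
Read literally, the error message describes a liveness check based on the log file's modification time: something is expected to keep touching the log file, and the script is treated as dead when the timestamp goes stale. The following is a minimal sketch of that idea under stated assumptions. It is not the durable-task plugin's actual code, and the function names (start_heartbeat, is_stale) and file names are hypothetical.

```shell
#!/bin/sh
# Simplified heartbeat sketch; NOT the durable-task plugin's real wrapper.
# start_heartbeat and is_stale are hypothetical names used for illustration.

# Touch the file "$1" every "$2" seconds until the marker file "$3" exists.
start_heartbeat() {
    hb_log=$1
    hb_interval=$2
    hb_done=$3
    (
        while [ ! -f "$hb_done" ]; do
            touch "$hb_log"
            sleep "$hb_interval"
        done
    ) &
}

# Succeed (exit status 0) if "$1" is missing or was last modified
# more than "$2" seconds ago.
is_stale() {
    [ -e "$1" ] || return 0
    st_now=$(date +%s)
    st_mtime=$(stat -c %Y "$1")   # GNU stat; on BSD/macOS use: stat -f %m
    [ $((st_now - st_mtime)) -gt "$2" ]
}
```

With a heartbeat like this running, a command that prints nothing for five minutes would still keep the timestamp fresh, so a stale timestamp would point at the heartbeat loop being unable to run or write (or at a slow filesystem) rather than at the script being silent. In this sketch, the second argument to is_stale plays the role that HEARTBEAT_CHECK_INTERVAL appears to play in the message above.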

          Activity

            jglick Jesse Glick added a comment -

            I should have mentioned that JENKINS-25503 would completely reimplement the code involved here, possibly solving this issue (possibly introducing others).

            nfalco Nikolas Falco added a comment - - edited

            We have the same issue. During the JS build, the job executes an "ng build" command, and after 32 minutes the job is killed because it seems to have stopped responding.

            Cannot contact Node 02: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@70fad4d7:JNLP4-connect connection from prd-cm-as-09.lan/10.1.3.72:56702": Remote call on JNLP4-connect connection from prd-cm-as-09.lan/10.1.3.72:56702 failed. The channel is closing down or has closed down
            wrapper script does not seem to be touching the log file in /var/lib/jenkins/workspace/xxx@tmp/durable-476d6be2
            (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)

            olblak Olivier Vernin added a comment -

            For the record, I was affected by this issue a while ago. In my case, I was running Jenkins agents on k8s, and increasing the pod memory limit solved it.
            mmh19891113 bright.ma added a comment -

            I ran into this issue.

            [2021-05-25T13:42:16.469Z] wrapper script does not seem to be touching the log file in @tmp/durable-c284507c
            
            [2021-05-25T13:42:16.469Z] (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
            

            The cause was "No space left on device".
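
Since a full disk was the cause here, checking free space in the directory that holds the durable-task control files may be a quick first diagnostic. The sketch below assumes a POSIX df and a filesystem name without spaces; free_kb and low_on_space are hypothetical helper names, and the threshold is an arbitrary example.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; not part of Jenkins or any plugin.

# Print the available space, in kilobytes, on the filesystem containing "$1".
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeed (exit status 0) if the filesystem containing "$1"
# has fewer than "$2" kilobytes available.
low_on_space() {
    [ "$(free_kb "$1")" -lt "$2" ]
}
```

For example, `low_on_space /var/lib/jenkins/workspace 102400 && echo "less than 100 MB free"` would flag a nearly full workspace volume on the agent. This does not check inode exhaustion, which can also produce "No space left on device".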

            mdebord1 Matt Dee added a comment - - edited

            Running into this. One of my playbooks restarts the Jenkins service if it needs to reload init.groovy.d scripts. Once Jenkins comes back, the job fails with this error. This was working fine for months, then stopped working with this same error.

            • Plenty of memory and space on device. 
            • Durable task plugin is fully up to date.
            • Jenkins 2.360

            People

              Assignee: Unassigned
              Reporter: evanward1 Evan Ward
              Votes: 14
              Watchers: 33
