
Pipeline shell step aborts prematurely with ERROR: script returned exit code -1

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component/s: durable-task-plugin
    • Labels: None
    • Released As: durable-task 1.26

      A few of my Jenkins pipelines failed last night with this failure mode:

      01:19:19 Running on blackbox-slave2 in /var/tmp/jenkins_slaves/jenkins-regression/path/to/workspace.   [Note: this is an SSH slave]
      [Pipeline] {
      [Pipeline] ws
      01:19:19 Running in /net/nas.delphix.com/nas/regression-run-workspace/jenkins-regression/workspace@10. [Note: This is an NFS share on a NAS]
      [Pipeline] {
      [Pipeline] sh
      01:20:10 [qa-gate] Running shell script
      [... script output ...]
      01:27:19 Running test_create_domain at 2017-11-29 01:27:18.887531... 
      [Pipeline] // dir
      [Pipeline] }
      [Pipeline] // ws
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] }
      [Pipeline] // timeout
      ERROR: script returned exit code -1
      Finished: FAILURE
      

      As far as I can tell the script was running fine, but apparently Jenkins killed it prematurely because it didn't think the process was still alive.

      The interesting thing is that this normally works, but it failed last night at exactly the same time in multiple pipeline jobs, and I only started seeing this after upgrading durable-task-plugin from 1.14 to 1.17. I looked at the code change and saw that the main change was switching ProcessLiveness from a ps-based check to a timestamp-based check. What I suspect is that the NFS server on which this workspace is hosted wasn't processing I/O operations fast enough at the time this problem occurred, so the timestamp wasn't updated even though the script continued running. Note that I am not using Docker here; this is just a regular SSH slave.

      The ps-based approach may have been suboptimal, but it was more reliable for us than the new timestamp-based approach, at least when using NFS-based workspaces. Expecting a file's timestamp to advance every 15 seconds may be a tall order for some system and network administrators, especially over NFS – network issues can and do happen, and they shouldn't take down Jenkins jobs when they do. Our Jenkins jobs used to just hang when there was an NFS outage; now the script liveness check kills the job. I view this as a regression. As flawed as the old approach may have been, it was immune to this failure mode. Is there anything I can do here besides increasing various timeouts to avoid hitting this? The fact that no diagnostic information was printed to the Jenkins log or the SSH slave remoting log is also problematic.
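      To make the mechanism under discussion concrete: the durable-task wrapper periodically touches a log file, and the agent treats a stale timestamp as a sign that the script has died. Below is a rough, self-contained sketch of that idea only – it is not the plugin's actual wrapper script, and the paths and the 15-second window are invented for illustration. On NFS, the mtime seen by the checker can lag behind the touch, which is exactly the failure mode described above.

      #!/bin/sh
      # Illustrative sketch of a timestamp-based liveness check; NOT the durable-task
      # plugin's real wrapper script. Paths and the 15-second window are assumptions.
      log=/tmp/heartbeat-demo/jenkins-log.txt
      mkdir -p "$(dirname "$log")"

      # "User process": stands in for the long-running build step.
      sleep 60 &
      user_pid=$!

      # Heartbeat loop: touch the log every 3 seconds while the user process lives.
      while kill -0 "$user_pid" 2>/dev/null; do
          touch "$log"
          sleep 3
      done &

      # Checker (conceptually what the agent does): if the log's mtime is older than
      # the check window, conclude the script died and report the -1 pseudo exit code.
      # A laggy NFS server can delay the visible mtime update enough to trip this
      # even though the script is still running.
      sleep 20
      age=$(( $(date +%s) - $(stat -c %Y "$log") ))   # GNU stat; on BSD use: stat -f %m
      if [ "$age" -gt 15 ]; then
          echo "no heartbeat for ${age}s: script presumed dead (exit code -1)"
      else
          echo "heartbeat ${age}s old: script considered alive"
      fi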

          [JENKINS-48300] Pipeline shell step aborts prematurely with ERROR: script returned exit code -1

          Basil Crow added a comment -

          jglick, since you've been working on this subsystem here, any ideas on a way forward?


          Basil Crow added a comment -

          When I add -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300 to my JVM options, this problem goes away. I remain concerned about the general strategy in the case of this new NFS failure mode.


          Jesse Glick added a comment -

          The default heartbeat interval could be increased, though using NFS for workspaces is generally not a good plan to begin with. Also there is an outstanding to-do task to adjust the durable-task API to allow a TaskListener to be injected into more calls, such as Controller.exitStatus, which would allow the implementation to print a helpful diagnostic before returning -1.


          Jesse Glick added a comment -

          there is an outstanding to-do task

          Filed.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java
          src/main/java/org/jenkinsci/plugins/durabletask/Controller.java
          src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
          src/test/java/org/jenkinsci/plugins/durabletask/BourneShellScriptTest.java
          src/test/java/org/jenkinsci/plugins/durabletask/PowershellScriptTest.java
          src/test/java/org/jenkinsci/plugins/durabletask/WindowsBatchScriptTest.java
          http://jenkins-ci.org/commit/durable-task-plugin/bc0e2357e7ee49e0046f3a76ecf87802acd3934a
          Log:
          JENKINS-48300 Add an overload for exitStatus taking TaskListener.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Sam Van Oort
          Path:
          src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java
          src/main/java/org/jenkinsci/plugins/durabletask/Controller.java
          src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
          src/test/java/org/jenkinsci/plugins/durabletask/BourneShellScriptTest.java
          src/test/java/org/jenkinsci/plugins/durabletask/PowershellScriptTest.java
          src/test/java/org/jenkinsci/plugins/durabletask/WindowsBatchScriptTest.java
          http://jenkins-ci.org/commit/durable-task-plugin/7c12b3a72cb402d89f5d51b7a88811f2ac075891
          Log:
          Merge pull request #57 from jglick/exitStatus-JENKINS-48300

          JENKINS-48300 Add an overload for exitStatus taking TaskListener

          Compare: https://github.com/jenkinsci/durable-task-plugin/compare/7f57bb297ee3...7c12b3a72cb4


          Sam Van Oort added a comment -

          Released to the wild now


          Basil Crow added a comment -

          I see that this bug has been closed as fixed, but I'm not sure I'd consider it fixed. I guess that depends on what the scope of this bug is. There were two problems identified in the bug:

          1. JENKINS-47791 introduced a new failure mode that only manifests when using NFS-based workspaces.
          2. The new failure mode produces a poor error message.

          If the scope of this bug is both of these issues, then only the second has been fixed. The first issue remains, and this bug shouldn't be marked as fixed.

          If the scope of this bug is only the second issue, then a new bug should be filed covering the first issue.

          Which of the two is the case?


          Sam Van Oort added a comment -

          basil #2 has been addressed – there is separate work in the proposed solution to https://issues.jenkins-ci.org/browse/JENKINS-37575 (PRs from Jesse that I am reviewing) that should address your issue, so I'm trying to avoid double-tracking the same cluster of issues.   The issues are phrased differently (resending output vs. timing out) but the root cause is the same (timing in the communication).

          So, that means there's a more comprehensive long-term solution in the works.

          Oh, and if you're by any chance using NFS for your master too: the parts of JENKINS-47170 that I've released already will probably benefit you a lot (particularly the PERFORMANCE-OPTIMIZED pipeline mode) – docs are up at https://jenkins.io/doc/book/pipeline/scaling-pipeline/. It should greatly reduce the I/O needs of your master when running Pipelines.


          Basil Crow added a comment -

          svanoort Thanks! I'll start following JENKINS-37575 now. Glad to hear there's a long-term solution in the works.

          I did see the PERFORMANCE-OPTIMIZED pipeline mode and am looking forward to trying it out soon.


          Jean-Paul G added a comment - - edited

          Given that the jenkins/jobs folder containing the logs is on the master, should the JVM parameter -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300 be set on the master JVM (in the start script) or on each slave node's JVM (in the node configuration)?


          Damien Merlin added a comment -

          Hi Jean-Paul, -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300 must be set on the master JVM. Note that I use only 60 and that currently solves my issues.
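          For concreteness, one common way to add the property on the master is sketched below, assuming a Debian/Ubuntu package installation where the service reads /etc/default/jenkins at startup; WAR, Windows-service, and container installations pass JVM options differently, so adjust for your setup.

          # /etc/default/jenkins (Debian/Ubuntu package layout; adjust for your install)
          # Append the property to the arguments passed to the Jenkins JVM:
          JAVA_ARGS="-Djava.awt.headless=true -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300"

          # Then restart the master so the new JVM option takes effect:
          # sudo systemctl restart jenkins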


          Moritz Baumann added a comment -

          Sorry for my earlier comment (which I deleted); I misunderstood the logic in ShellController::exitStatus when I first glanced over it. I will try to better understand what's happening in my case (why we're getting failures even though we're using a local hard disk) and comment here and/or open a new issue once I've actually understood the problem.


          Craig Rodrigues added a comment - - edited

          If you see problems like this, I recommend that you go to Manage Jenkins -> System Log, and create a logger which logs all events for org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep

          This will print out more debug statements and help diagnose the problem.

          There is additional logging in https://github.com/jenkinsci/workflow-basic-steps-plugin/blob/stable/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java#L177

          That logging is enabled if you do this, and it can help identify the problem.


          I ran into the same problem as others did.

          I am using Durable Task Plugin 1.23.

          I have a Pipeline which takes a very long time, and looks like this:

          #!groovy
          pipeline {
              agent {
                  label 'CRAIG-1'
              }
              options {
                  disableConcurrentBuilds()
                  timeout(time: 10, unit: 'HOURS')
              }
              parameters {
                  booleanParam(name: 'UPDATE_PARAMETERS',
                               defaultValue: false,
                               description: 'Update the parameters from this pipeline script')
                  string(defaultValue: 'master',
                         description: 'branch',
                         name: 'BRANCH')
              }
              stages {
                  stage("Display build parameters") {
                      steps {
                          script {
                              /*
                               * Print out the build parameters
                               */
                              def all_params = ""
                              for ( k in params ) {
                                  all_params = all_params + "${k.key}=${k.value}\n"
                              }
                              print("These parameters were passed to this build:\n" + all_params)
                              writeFile(file: "env-vars.txt", text: "$all_params")
                          }
                      }
                  }
                  /*
                   * Jenkins needs to parse the entire pipeline before it can
                   * parse the parameters, if the parameters are specified in this file.
                   */
                  stage("Updating Parameters") {
                      when {
                          expression {
                              params.UPDATE_PARAMETERS == true
                          }
                      }
                      steps {
                          script {
                              currentBuild.result = 'ABORTED'
                              error('DRY RUN COMPLETED. JOB PARAMETERIZED.')
                          }
                      }
                  }
                  stage("First") {
                      steps {
                          dir("dir1") {
                              git(url: 'https://github.com/twisted/twisted')
                              sh("""
                                 echo do some stuff
                                 """)
                          }
                      }
                  }
                  stage("Second") {
                      steps {
                          dir("dir2") {
                              git(url: 'https://github.com/twisted/twisted')
                              sh("""
                                 echo do more stuff
                                 """)
                          }
                      }
                  }
                  stage("Third: Takes a long time, over 1.5 hours") {
                      steps {
                          sh("""
                             echo this operation takes a long time
                             """)
                      }
                      post {
                          always {
                              junit "report.xml"
                          }
                      }
                  }
              }
              post {
                  failure {
                      slackSend(channel: '#channel-alerts', color: '#FF0000', message: "FAILED: Job '${env.JOB_NAME} started by ${env.CAUSEDBY} [${env.BUILD_NUMBER}]' (${env.RUN_DISPLAY_URL})")
                  }
                  changed {
                      script {
                          /*
                           * Only send e-mails on failures, or when status changes from failure
                           * to success, or success to failure.
                           * This requires currentBuild.result to be set.
                           *
                           * See: https://baptiste-wicht.com/posts/2017/06/jenkins-tip-send-notifications-fixed-builds-declarative-pipeline.html
                           */
                          def prevBuild = currentBuild.getPreviousBuild()
                          /*
                           * If this pipeline has never run before, then prevBuild will be null.
                           */
                          if (prevBuild == null) {
                              return
                          }
                          def prevResult = prevBuild.getResult()
                          def result = currentBuild.getResult()
                          if ("${prevResult}" != "${result}" && "${result}" != "FAILURE") {
                              if ("${prevResult}" == "FAILURE") {
                                  slackSend(channel: '#smoketest-alerts', color: 'good', message: "SUCCEEDED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.RUN_DISPLAY_URL})")
                              }
                          }
                      }
                  }
              }
          }

          In Manage Jenkins -> System Log I enabled a logger to log ALL events for org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep

          After running my pipeline for 2 hours, Jenkins terminated the pipeline, and I saw this in the log:

          Post stage
          wrapper script does not seem to be touching the log file in /root/workspace/workspace/PX-TEST-STABLE@tmp/durable-502ca4bd
          (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)         


          Craig Rodrigues added a comment - - edited

          Is there a way to specify

           -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300

          without modifying the invocation of java which starts the Jenkins master?

          I am running Jenkins using the jenkins-lts docker image, and it is a pain to modify the startup command-line unless I build my own docker image running jenkins-lts.
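          For what it's worth, the official jenkins/jenkins images pass the JAVA_OPTS environment variable through to the Jenkins JVM, so the property can usually be set without building a custom image. A sketch follows; the port and volume flags are illustrative only, and the behavior is worth verifying against the documentation of the image tag you run.

          # Assumes the official jenkins/jenkins image honors JAVA_OPTS; verify for your tag.
          docker run -d --name jenkins \
            -p 8080:8080 -p 50000:50000 \
            -v jenkins_home:/var/jenkins_home \
            -e JAVA_OPTS="-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300" \
            jenkins/jenkins:lts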


          Craig Rodrigues added a comment -

          The workaround I tried was to go to Manage Jenkins -> System Console,
          then I entered:

          System.setProperty("org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL", 36000)

          I then ran my pipeline, and it wasn't terminated.

          Is there a way I can do this inside the pipeline?


          Craig Rodrigues added a comment - - edited

          I was able to do this in my pipeline:

          script {
             System.setProperty("org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL", "3800");
          } 

          I had to approve the method in the security settings at Manage Jenkins -> In-process Script Approval, but it worked.


          Sverre Moe added a comment -

          We are using durable-task-plugin 1.23, but are still seeing this problem. According to the changelog it was fixed in version 1.18.

          A few (4-5) weeks ago we didn't have this problem; then yesterday we upgraded Jenkins and all our plugins. Now it fails building on Windows.

          [Native master-windows7-x86_64] wrapper script does not seem to be touching the log file in C:\cygwin64\home\build\jenkins\workspace\applicationA_sverre_work-3U54DPE57F6TMOZM2O6QBWDQ2LNRU2QHAXT6INC3UPGWF2ERMXAQ@tmp\durable-0ead6a5b
          [Native master-windows7-x86_64] (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)

          Is the workaround mentioned above the same as the actual fix that went into version 1.18? We have been using it for months without seeing a problem.
          We are not having the problem now that we have started Jenkins with that system property.


          Bartek Kania added a comment - - edited

          I get the same problem as djviking above as of version 1.23 on Windows build slaves.

          Didn't have any problems before.

          System.setProperty("org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL", "3800");

          Seems to work around it for me.


          Craig Rodrigues added a comment - - edited

          @dwnusbaum can you take a look at this?  This seems to be affecting a few people, and the workaround seems to be to set org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL to some really high value.


          E H added a comment -

          I ran into this as well, the durable-task 1.25 appears to resolve the problem - thanks much for the quick fix.


          Michael Cornel added a comment -

          I seem to have the same problem with durable-task 1.25. Trying to build an (empty) Spring Boot app using the Artifactory Maven plugin on a Kubernetes slave, the build script was aborted with a link to this issue. After setting the property in the pipeline script (which fixed it), I noticed that there is a rather long time between two log statements:

          17:36:55.137 [DEBUG] [org.gradle.initialization.DefaultGradlePropertiesLoader] Found system project properties: []
          >> Without the increased timeout, the pipeline was aborted here before the next log entry
          17:36:57.323 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationExecutor] Build operation 'Apply script settings.gradle to settings 'ci-test'' started

          I am not using NFS, but of course the build slave is a virtual machine, which will be a little slower.


          Jesse Glick added a comment -

          Do not attempt to call System.setProperty from within a sandboxed script. Whitelisting this would constitute a possibly severe vulnerability.

          The default value for the check interval should likely be higher. Still, the presence of this issue suggests that there is something wrong with the agent’s filesystem, or that control processes are being killed.

          Not sure why this was marked fixed. PR 57 merely added better logging; it did not change the behavior otherwise.


          Jesse Glick added a comment -

          I do not believe this is related to JENKINS-37575.


          Craig Rodrigues added a comment -

          stielerit Can you add a new logger to your Jenkins system by navigating to Manage Jenkins -> System Log and creating a new logger org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep set to ALL?

          Then re-run your pipeline, looking for debugging messages that highlight where the problem might be?

          That logger is defined here: https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L71


          Jesse Glick added a comment -

          durable-task PR 81 at least increases the grace period, pending some determination of root cause by users seeing this issue.


          Jesse Glick added a comment -

          rodrigc that logger is unlikely to be helpful in this case. Really there is no Java logging that is very pertinent to this issue (a -1 return status which vanishes iff HEARTBEAT_CHECK_INTERVAL is increased)—all the meaningful messages are already sent to the build log as of PR 57.

          Root cause diagnosis would involve using an interactive shell to somehow figure out why jenkins-log.txt is not getting touched at least every three seconds. (More often when there is active log output from the user process.) Possibly it is getting touched, but the agent JVM in BourneShellScript.exitStatus is not seeing the right timestamp, or is somehow misinterpreting what it sees; or perhaps one of the two controller sh processes (the one usually inside sleep 3) has been killed by something (such as was claimed in JENKINS-50892).
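          One hedged way to gather the data described above, from a shell on the agent while the sh step is running: the @tmp/durable-XXXXXXXX control directory path below is a placeholder, to be copied from the build log, and the stat invocation assumes GNU coreutils.

          # Placeholder path -- copy the real @tmp/durable-XXXXXXXX directory from the build log.
          control_dir=/path/to/workspace@tmp/durable-XXXXXXXX

          while true; do
              # Is the heartbeat actually updating? (GNU stat; on BSD/macOS use: stat -f '%Sm %N')
              stat -c '%y %n' "$control_dir"/jenkins-log.txt
              # Are the controller shell processes (including the 'sleep 3' loop) still alive?
              ps -ef | grep "[s]leep 3"
              sleep 3
          done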


          Michael Cornel added a comment - - edited

          Ok, it took me a while to actually reproduce this error message.

          I have to:

          • Start Gradle manually, so in my case sh('./gradlew -d --no-daemon clean bootJar') – it does not occur if I start the Gradle build using the Artifactory Jenkins plugin
          • Configure the Kubernetes build slave with a memory limit of 512Mi which seems to be not enough and results in a (silent) out of memory problem

          So it appears that durable-task is not actually aborting the build but correctly detects that the process is not running any more. Maybe the error message could give a hint that the most probable reason for the log file not being touched is that the wrapper script has actually died?


          Jesse Glick added a comment -

          stielerit that is useful information indeed. If I understand correctly, some out of memory condition is resulting in something (Kubernetes? Docker? the Linux kernel?) deciding to just kill off processes such as the wrapper script. Is the agent JVM also being killed? Whatever the case, Jenkins is then behaving appropriately in marking the sh step as a failure (the -1 pseudo exit code represents the fact that the actual exit code of the process is unknown and something fundamental went wrong), but is not clearly explaining the real problem.


          Michael Cornel added a comment -

          Almost. As far as I understand, Kubernetes simply "translates" the resource limit configuration and applies it when starting the Docker containers. I am pretty sure that I saw an IOException Out of memory with a Gradle stacktrace during one of the builds. Thus, Docker is probably not killing the process but just preventing it from allocating more memory.

          I would expect the Gradle JVM to exit with a non-zero exit code and the agent to recognize this and immediately mark the build as failed. I don't know if it does, or what happens to the agent JVM and so on, though.


          Jesse Glick added a comment -

          Possibly the container is so hosed that just trying to fork sleep 3 from the controller process fails.


          Devin Nusbaum added a comment -

          The fix that increases the default heartbeat interval to 5 minutes was just released in version 1.26 of the Durable Task Plugin.


          Byte Enable added a comment -

          I am experiencing this issue randomly. The durable-task plugin is at version 1.26 as well. I am running the agent on a 10GbE port that is dedicated to Jenkins.

          Cannot contact XXXXXXX: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from 10.10.11.205/10.10.11.205:54092 failed. The channel is closing down or has closed down
          wrapper script does not seem to be touching the log file in /XXXXXXXX@tmp/durable-7e71b4e1
          (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)


          Craig Rodrigues added a comment -

          byteenable I recommend that you try increasing the logging as per the steps mentioned here:

          https://issues.jenkins-ci.org/browse/JENKINS-48300?focusedCommentId=346766&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-346766

          and see if you see some clues.


          Jesse Glick added a comment -

          Unlikely to be helpful. The problem here is likely a Remoting channel outage, which is not a Pipeline issue. Diagnosing those is tricky.


          Byte Enable added a comment -

          I just hit it again. I added the logging as requested earlier, and the log was empty. The plugin is suggesting that the heartbeat interval be set to 86400; if that is in seconds, then that is 24 hours. I am running Jenkins on Ubuntu inside a Hyper-V VM. I have 32 GB of memory assigned but it is only using 5 GB, with 12 CPUs assigned as well.

          However, I am using rsync to download some files in my scripts from the server Jenkins is running on.  I had three other Pipeline scripts running at the time in various stages when this occurred.  I was also running top on the Ubuntu VM.  I noticed that rsync was at 90% and the load jumped to around 1.01.

          Cannot contact XXXXXXX: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from 10.10.11.205/10.10.11.205:56958 failed. The channel is closing down or has closed down
          wrapper script does not seem to be touching the log file in XXXXXXXXXXX@tmp/durable-c5848708
          (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)


          J S added a comment -

          I don't think this problem is solved. I have a RedHat 7 Jenkins master with corresponding RedHat 7 slaves. The Durable Task plugin version is 1.26, so the latest version. I have the latest LTS version of Jenkins, 2.138.1, and recently I have been seeing the following problem:

          wrapper script does not seem to be touching the log file in /data/build/workspace/ro-TWADR5DH34OARMVNOXJQZ74HP4G7QAQ@2/build@tmp/durable-0a842734(JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)script returned exit code -1

          The step finished after 10 minutes with the above error. How can I configure the durable task in the Jenkinsfile to abort only after 60 minutes or 2 hours? Can I set this anywhere in Jenkins? Could someone please write a tutorial about this? I think the problem has been occurring a lot since the last update.


          Jesse Glick added a comment -

          openjenkins no this cannot be set per Jenkinsfile, only via system property, as it is merely an escape hatch for a system with a very laggy filesystem. Anyway if the log file received no touch after the new default of 5m (perhaps ×2), probably it was never going to. Something is broken in your system. I cannot diagnose the exact problem for you, though in your case it does not sound like a broken Remoting channel. Possibly the watcher process was killed off by something. I have heard of cases where the Linux kernel running under low memory conditions starts killing processes at random.


          Byte Enable added a comment -

          The kernel invokes the OOM (out of memory) killer when swap space is filled and memory allocations keep happening, such as with a memory leak. That was around RHEL 6. The issue is not fixed: I did not experience it until upgrading to the latest version recently. What is a "laggy filesystem"? I/O is blocked? The system is under heavy load?


          Jesse Glick added a comment -

          The “laggy filesystem” issue pertains to a failure of a watcher process to touch a process log file while the process is idle, or the failure of the Jenkins agent JVM to see/interpret that timestamp. There could be many causes of that, such as a very slow network filesystem. The fix referenced in this issue was just a fix for a particular root cause: it made the grace period very long, so any filesystem which is still functioning at all should not have that issue. Exit codes of -1 from a sh step can be traced ultimately to many, many causes, such as problems with file permissions when using containers, processes being abruptly killed off by the kernel, the system having been rebooted, etc. If there are other conditions in which a -1 exit code is returned improperly—i.e., the process actually did finish with some real exit code but Jenkins failed to either notice it or display diagnostics—then those would be other issues. I cannot attempt to guess at the root cause encountered by a particular user in a particular condition. In general these things need to be tracked down by logging in to the agent machine and inspecting what is actually going on in the durable task control directory vs. what is happening with the user process (usually, but not necessarily, sh) and the two control processes (always sh).


          Guohua Wu added a comment -

          I met the same issue recently after upgrading durable-task plugin. Here's the error message: 

          My durable-task plugin version is 1.26, the latest.

          I wonder if there is any workaround for this issue.


          The workaround I tried was to go to Manage Jenkins -> System Console

          Did you mean the Script Console (/script)?

          System.setProperty("org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL", 36000)

           When I did that I got:

          groovy.lang.MissingMethodException: No signature of method: static java.lang.System.setProperty() is applicable for argument types: (java.lang.String, java.lang.Integer) values: [org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL, ...]
          Possible solutions: setProperty(java.lang.String, java.lang.String), getProperty(java.lang.String), getProperty(java.lang.String, java.lang.String), hasProperty(java.lang.String), getProperties(), getProperties()
          	at groovy.lang.MetaClassImpl.invokeStaticMissingMethod(MetaClassImpl.java:1501)
          	at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1487)
          	at org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.call(StaticMetaClassSite.java:53)
          	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)
          	at Script1.run(Script1.groovy:1)
          	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
          	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
          	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
          	at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
          	at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
          	at hudson.remoting.LocalChannel.call(LocalChannel.java:45)
          	at hudson.util.RemotingDiagnostics.executeGroovy(RemotingDiagnostics.java:111)
          	at jenkins.model.Jenkins._doScript(Jenkins.java:4381)
          	at jenkins.model.Jenkins.doScript(Jenkins.java:4352)
          	at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
          	at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
          	at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
          	at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
          	at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
          	at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
          	at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
          	at org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
          	at org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
          	at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
          	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:860)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
          	at org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:225)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at hudson.plugins.scm_sync_configuration.extensions.ScmSyncConfigurationFilter$1.call(ScmSyncConfigurationFilter.java:49)
          	at hudson.plugins.scm_sync_configuration.extensions.ScmSyncConfigurationFilter$1.call(ScmSyncConfigurationFilter.java:44)
          	at hudson.plugins.scm_sync_configuration.ScmSyncConfigurationDataProvider.provideRequestDuring(ScmSyncConfigurationDataProvider.java:106)
          	at hudson.plugins.scm_sync_configuration.extensions.ScmSyncConfigurationFilter.doFilter(ScmSyncConfigurationFilter.java:44)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
          	at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
          	at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
          	at org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
          	at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
          	at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
          	at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
          	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
          	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
          	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
          	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
          	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
          	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
          	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
          	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
          	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
          	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          	at org.eclipse.jetty.server.Server.handle(Server.java:530)
          	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
          	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
          	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
          	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
          	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
          	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
          	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
          	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
          	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
          	at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
           


          jglick Your explanation above is great for people who understand the internals of Jenkins and Pipeline and how durability works, but it doesn't leave the "layman" (i.e. the Jenkins user) much to debug with.

          Where is this "laggy filesystem"? On the agent, I gather? How exactly is this lagginess being measured? What would I have to do, when logged on to the agent, to see what Jenkins is doing to determine that the filesystem is laggy?


          I've added -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 to my Java command line but am still getting this error in my jobs.

          Here is my entire java command line:

          java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -DsessionTimeout=8000 -Xms4g -Xmx8g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -Xloggc:/var/log/jenkins/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=30m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --webroot=/var/lib/jenkins/war --httpsPort=-1 --httpPort=8080 --ajp13Port=-1 -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600
          

           When I put the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 before the -jar flag, like so:

          java -Djava.awt.headless=true -DsessionTimeout=8000 -Xms4g -Xmx8g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -Xloggc:/var/log/jenkins/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=30m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --webroot=/var/lib/jenkins/war --httpsPort=-1 --httpPort=8080 --ajp13Port=-1
          

           Jenkins just doesn't start. The java process starts and runs but nothing is added to jenkins.log and nothing is listening on the web interface.

          Any ideas?
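          (One general note that may help here: java treats -D options as JVM system properties only when they appear before -jar; anything after -jar jenkins.war is passed to Jenkins/Winstone as an application argument, not to the JVM. A minimal sketch of the intended ordering, with illustrative paths and the GC/JMX options omitted:)

          # System properties, including the durable-task override, go before -jar;
          # everything after the war file is parsed by Jenkins itself.
          java -DJENKINS_HOME=/var/lib/jenkins \
               -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 \
               -jar /usr/lib/jenkins/jenkins.war --httpPort=8080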


          Jesse Glick added a comment -

          brianjmurrell yes my explanation was about how to start diagnosing issues in this class, given sufficient knowledge of Jenkins internals. The result of such a diagnosis would be understanding of one new kind of environmental problem that leads to this symptom, and thus a new issue report and an idea for a product patch to either recover automatically or display a user-friendly error. If you are encountering this error on current versions of durable-task, it is likely that your problem is not a laggy filesystem, but something unrelated and yet to be identified.


          In case it helps anyone else who stumbles across this thread, I just ran into this problem and was able to figure out why (it was not a durable-task or Jenkins problem, but something I was doing wrong).

          I basically had three different stages in my pipeline using static code analysis tools.  Each of these tools can be CPU-intensive and, by default, is happy to consume as many cores as are available on the host.  We also have multiple Jenkins executors on each of our nodes (e.g. 4 executors on a 4-core node).

          This problem presented itself when I put these three stages in a parallel block and all three mapped to executors on the same physical node.  When they started analyzing the code, I'm sure my system load was completely railed (i.e. two if not three processes each trying to peg every core at 100% CPU).

          It is no surprise that this error message would occur in this scenario.  Sure, Jenkins could have been more patient, but it also pointed to a pipeline architecture problem on my end.
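          (For illustration only, a hypothetical scripted-pipeline shape of the scenario described above; the analysis label and the tool commands are made up:)

          // Problem shape: three CPU-hungry analyzers in one parallel block. With
          // 4 executors on a 4-core node, all three branches can land on the same
          // machine, and each tool by default tries to use every core.
          parallel(
              lint:     { node('analysis') { sh './run-lint' } },
              findbugs: { node('analysis') { sh './run-findbugs' } },
              coverity: { node('analysis') { sh './run-coverity' } }
          )
          // Possible mitigations: cap each tool's parallelism if it supports that,
          // reduce the executor count on the node, or give each heavy branch its
          // own label so they cannot pile onto one machine.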


          Michael Schaufelberger added a comment - - edited

          Thank you for the workaround via the Script Console, rodrigc!

          Note: The second argument had to be a String in my case: "3600".
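          (For anyone copying the Script Console workaround: as the stack trace earlier in this thread shows, System.setProperty only accepts (String, String), so the value must be quoted. A minimal snippet; whether a change made after the plugin class has already loaded takes effect may depend on the durable-task version:)

          // Both arguments must be Strings; passing an Integer produces the
          // MissingMethodException shown above.
          System.setProperty(
              'org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL',
              '3600')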


          For people who are still experiencing this error message, please check the details of some other possible causes here.


            Assignee: Jesse Glick (jglick)
            Reporter: Basil Crow (basil)
            Votes: 6
            Watchers: 33