
Pipeline shell step aborts prematurely with ERROR: script returned exit code -1

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • durable-task-plugin
    • None
    • durable-task 1.26

      A few of my Jenkins pipelines failed last night with this failure mode:

      01:19:19 Running on blackbox-slave2 in /var/tmp/jenkins_slaves/jenkins-regression/path/to/workspace.   [Note: this is an SSH slave]
      [Pipeline] {
      [Pipeline] ws
      01:19:19 Running in /net/nas.delphix.com/nas/regression-run-workspace/jenkins-regression/workspace@10. [Note: This is an NFS share on a NAS]
      [Pipeline] {
      [Pipeline] sh
      01:20:10 [qa-gate] Running shell script
      [... script output ...]
      01:27:19 Running test_create_domain at 2017-11-29 01:27:18.887531... 
      [Pipeline] // dir
      [Pipeline] }
      [Pipeline] // ws
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] }
      [Pipeline] // timeout
      ERROR: script returned exit code -1
      Finished: FAILURE
      

      As far as I can tell the script was running fine, but apparently Jenkins killed it prematurely because Jenkins didn't think the process was still alive.

      Interestingly, this normally works, but last night it failed at exactly the same time in multiple pipeline jobs. I only started seeing this after upgrading durable-task-plugin from 1.14 to 1.17. Looking at the code changes, the main one is that ProcessLiveness switched from a ps-based check to a timestamp-based check. I suspect that the NFS server hosting this workspace wasn't processing I/O operations fast enough when the problem occurred, so the timestamp wasn't updated even though the script was still running. Note that I am not using Docker here; this is just a regular SSH slave.

      The ps-based approach may have been suboptimal, but it was more reliable for us than the new timestamp-based approach, at least with NFS-based workspaces. Expecting a file's timestamp to advance every 15 seconds may be a tall order for some system and network administrators, especially over NFS: network issues can and do happen, and they shouldn't take down Jenkins jobs when they do. Our Jenkins jobs used to just hang during an NFS outage; now the script liveness check kills the job. I view this as a regression. As flawed as the old approach may have been, it was immune to this failure mode. Is there anything I can do here besides increasing various timeouts to avoid hitting this? The fact that no diagnostic information was printed to the Jenkins log or the SSH slave remoting log is also problematic.
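To make the failure mode described above concrete, here is a minimal sketch of a timestamp-based liveness check. It is not the durable-task plugin's code: the class name `HeartbeatCheck`, the method `looksAlive`, and the 300-second grace period are assumptions chosen for illustration; only the 15-second figure comes from the report. It shows how a file that was last touched 60 seconds ago (for example because an NFS write stalled) fails a short grace period while the process may still be running.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; not the durable-task plugin's implementation.
final class HeartbeatCheck {

    /** Returns true when the file was last modified no longer than gracePeriod before now. */
    static boolean looksAlive(Path heartbeatFile, Instant now, Duration gracePeriod) throws IOException {
        Instant lastTouched = Files.getLastModifiedTime(heartbeatFile).toInstant();
        return Duration.between(lastTouched, now).compareTo(gracePeriod) <= 0;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("heartbeat-demo", ".txt");
        try {
            Instant now = Instant.now();
            // Simulate a stalled filesystem: the last successful touch was 60 seconds ago.
            Files.setLastModifiedTime(log, FileTime.from(now.minusSeconds(60)));
            System.out.println("15s grace:  " + looksAlive(log, now, Duration.ofSeconds(15)));
            System.out.println("300s grace: " + looksAlive(log, now, Duration.ofSeconds(300)));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

A check like this cannot tell a dead process from a live one whose heartbeat writes are delayed, which is why the length of the grace period matters so much on slow network filesystems.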

          [JENKINS-48300] Pipeline shell step aborts prematurely with ERROR: script returned exit code -1

          Byte Enable added a comment -

          The kernel invokes the OOM (out-of-memory) killer when swap space fills up and memory allocations keep happening, for example because of a memory leak. This was around RHEL 6. The issue is not fixed: I did not experience it until upgrading to the latest version recently. What is a laggy filesystem? One where I/O is blocked? A system under heavy load?


          Jesse Glick added a comment -

          The “laggy filesystem” issue pertains to a failure of a watcher process to touch a process log file while the process is idle, or the failure of the Jenkins agent JVM to see/interpret that timestamp. There could be many causes of that, such as a very slow network filesystem. The fix referenced in this issue was just a fix for a particular root cause: it made the grace period very long, so any filesystem which is still functioning at all should not have that issue. Exit codes of -1 from a sh step can be traced ultimately to many, many causes, such as problems with file permissions when using containers, processes being abruptly killed off by the kernel, the system having been rebooted, etc. If there are other conditions in which a -1 exit code is returned improperly—i.e., the process actually did finish with some real exit code but Jenkins failed to either notice it or display diagnostics—then those would be other issues. I cannot attempt to guess at the root cause encountered by a particular user in a particular condition. In general these things need to be tracked down by logging in to the agent machine and inspecting what is actually going on in the durable task control directory vs. what is happening with the user process (usually, but not necessarily, sh) and the two control processes (always sh).


          Guohua Wu added a comment -

          I ran into the same issue recently after upgrading the durable-task plugin. Here's the error message:

          My durable-task plugin version is 1.26, the latest.

          Is there any workaround for this issue?


          Brian J Murrell added a comment -

          The workaround I tried was to go to Manage Jenkins -> System Console

          Did you mean Script Console (/script)?

          System.setProperty("org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL", 36000)

          When I did that I got:

          groovy.lang.MissingMethodException: No signature of method: static java.lang.System.setProperty() is applicable for argument types: (java.lang.String, java.lang.Integer) values: [org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL, ...]
          Possible solutions: setProperty(java.lang.String, java.lang.String), getProperty(java.lang.String), getProperty(java.lang.String, java.lang.String), hasProperty(java.lang.String), getProperties(), getProperties()
          	at groovy.lang.MetaClassImpl.invokeStaticMissingMethod(MetaClassImpl.java:1501)
          	at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1487)
          	at org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.call(StaticMetaClassSite.java:53)
          	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)
          	at Script1.run(Script1.groovy:1)
          	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
          	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
          	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
          	at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
          	at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
          	at hudson.remoting.LocalChannel.call(LocalChannel.java:45)
          	at hudson.util.RemotingDiagnostics.executeGroovy(RemotingDiagnostics.java:111)
          	at jenkins.model.Jenkins._doScript(Jenkins.java:4381)
          	at jenkins.model.Jenkins.doScript(Jenkins.java:4352)
          	at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
          	at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
          	at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
          	at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
          	at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
          	at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
          	at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
          	at org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
          	at org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
          	at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
          	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
          	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:860)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
          	at org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:225)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at hudson.plugins.scm_sync_configuration.extensions.ScmSyncConfigurationFilter$1.call(ScmSyncConfigurationFilter.java:49)
          	at hudson.plugins.scm_sync_configuration.extensions.ScmSyncConfigurationFilter$1.call(ScmSyncConfigurationFilter.java:44)
          	at hudson.plugins.scm_sync_configuration.ScmSyncConfigurationDataProvider.provideRequestDuring(ScmSyncConfigurationDataProvider.java:106)
          	at hudson.plugins.scm_sync_configuration.extensions.ScmSyncConfigurationFilter.doFilter(ScmSyncConfigurationFilter.java:44)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
          	at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
          	at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
          	at org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
          	at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
          	at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
          	at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
          	at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
          	at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
          	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
          	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
          	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
          	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
          	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
          	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
          	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
          	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
          	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
          	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
          	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
          	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
          	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          	at org.eclipse.jetty.server.Server.handle(Server.java:530)
          	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
          	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
          	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
          	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
          	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
          	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
          	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
          	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
          	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
          	at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
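
The `MissingMethodException` above says that `System.setProperty` was called with a `String` and an `Integer`, while the only matching overload takes two `String` arguments. A minimal sketch of the type-correct call follows; the class and method names (`HeartbeatProperty`, `setIntervalSeconds`, `readIntervalSeconds`) are made up for illustration, and the property name is copied from the comment. In the Script Console the equivalent change would presumably be to quote the value (`"36000"`). Whether the plugin picks up a property that is set after startup is not established by this thread.

```java
// Sketch: java.lang.System.setProperty only accepts String values.
final class HeartbeatProperty {

    // Property name copied from the comment above.
    static final String KEY =
            "org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL";

    /** Stores the interval as a String and returns the previous value, or null if unset. */
    static String setIntervalSeconds(int seconds) {
        return System.setProperty(KEY, Integer.toString(seconds));
    }

    /** Reads the property back as an int, falling back to defaultSeconds when it is unset. */
    static int readIntervalSeconds(int defaultSeconds) {
        return Integer.getInteger(KEY, defaultSeconds);
    }

    public static void main(String[] args) {
        setIntervalSeconds(36000);
        System.out.println(KEY + " = " + readIntervalSeconds(-1));
    }
}
```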
           

          Brian J Murrell added a comment -

          jglick Your explanation above is great for people who understand the internals of Jenkins, Pipeline, and how durability works, but it doesn't leave the "layman" (i.e. a Jenkins user) a lot to debug with.

          Where is this "laggy filesystem"? On the agent, I gather? How exactly is this lagginess being measured? What would I have to do when logged on to the agent to see what Jenkins is doing to determine "laggy filesystem"?

          Brian J Murrell added a comment -

          I've added -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 to my Java command line but am still getting this error in my jobs.

          Here is my entire java command line:

          java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -DsessionTimeout=8000 -Xms4g -Xmx8g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -Xloggc:/var/log/jenkins/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=30m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --webroot=/var/lib/jenkins/war --httpsPort=-1 --httpPort=8080 --ajp13Port=-1 -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600
          

          When I put the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 before the -jar flag, like this:

          java -Djava.awt.headless=true -DsessionTimeout=8000 -Xms4g -Xmx8g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -Xloggc:/var/log/jenkins/gc-%t.log -XX:NumberOfGCLogFiles=5 -XX:+UseGCLogFileRotation -XX:GCLogFileSize=30m -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600 -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --webroot=/var/lib/jenkins/war --httpsPort=-1 --httpPort=8080 --ajp13Port=-1
          

           Jenkins just doesn't start. The java process starts and runs but nothing is added to jenkins.log and nothing is listening on the web interface.

          Any ideas?
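
One general point about the first command line: the `java` launcher only treats options that appear before `-jar <file>` as JVM options, and passes everything after the jar path to the application as ordinary arguments. A `-D...` token placed at the very end is therefore not set as a system property by the launcher. The sketch below flags such tokens; the class `JvmArgOrder` and its method are hypothetical helpers, and the command line in `main` is an abbreviated version of the one above. It does not explain why the second form failed to start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: finds -D tokens that the java launcher would hand to the application.
final class JvmArgOrder {

    /** Returns the -D tokens that appear after "-jar <file>" in the given command line. */
    static List<String> misplacedSystemProperties(List<String> commandLine) {
        List<String> misplaced = new ArrayList<>();
        int jar = commandLine.indexOf("-jar");
        if (jar < 0) {
            return misplaced; // no -jar: this sketch makes no claim about other launch modes
        }
        // Index jar + 1 is the jar path; everything after it is an application argument.
        for (int i = jar + 2; i < commandLine.size(); i++) {
            String token = commandLine.get(i);
            if (token.startsWith("-D")) {
                misplaced.add(token);
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        List<String> command = Arrays.asList(
                "java", "-Xms4g", "-Xmx8g", "-DJENKINS_HOME=/var/lib/jenkins",
                "-jar", "/usr/lib/jenkins/jenkins.war", "--httpPort=8080",
                "-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=3600");
        System.out.println("Passed to the application, not the JVM: "
                + misplacedSystemProperties(command));
    }
}
```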


          Jesse Glick added a comment -

          brianjmurrell yes my explanation was about how to start diagnosing issues in this class, given sufficient knowledge of Jenkins internals. The result of such a diagnosis would be understanding of one new kind of environmental problem that leads to this symptom, and thus a new issue report and an idea for a product patch to either recover automatically or display a user-friendly error. If you are encountering this error on current versions of durable-task, it is likely that your problem is not a laggy filesystem, but something unrelated and yet to be identified.

          Christopher Shannon added a comment -

          In case it helps anyone else who stumbles across this thread, I just ran into this problem and was able to figure out why (which was not a durable-step, or Jenkins thing, but something I was doing wrong).

          I basically had three different stages in my pipeline using static code analysis tools.  Each of these tools can be CPU intensive and by default are happy to consume as many cores as are available on the host.  We also have multiple Jenkins executors for each of our nodes (e.g. 4 executors on a 4 core node).

          This problem presented itself when I put these three stages in a parallel block, and they all mapped to three executors on the same physical node.  When they started analyzing the code, I'm sure that my system load was completely railed (i.e. 2 if not 3 processes each trying to peg every core to 100% CPU).

          It is no surprise that this error message would occur in this scenario.  Sure, Jenkins could have been more patient, but it also pointed to a pipeline architecture problem on my end.
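
One way to avoid the oversubscription described in the comment above is to give each parallel stage a share of the node's cores instead of letting every tool claim all of them. The following is a minimal Java sketch of that arithmetic only. The class and method names are hypothetical, and it assumes each analysis tool accepts some thread-count or job-count option, which the thread does not confirm.

```java
class CoreBudgetSketch {
    // Split the node's cores evenly across stages that may run at the same time.
    // Integer division rounds down; the result is never allowed to drop below one thread.
    static int threadsPerStage(int cores, int concurrentStages) {
        if (cores < 1 || concurrentStages < 1) {
            throw new IllegalArgumentException("cores and concurrentStages must be positive");
        }
        return Math.max(1, cores / concurrentStages);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int stages = 3; // three parallel analysis stages, as in the comment above
        System.out.println("threads per stage: " + threadsPerStage(cores, stages));
    }
}
```

On a 4-core node with three parallel stages this yields one thread per stage, which leaves a core free for the agent and the shell step's own bookkeeping. Reducing the number of executors on the node would be an alternative way to get the same effect.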

          Michael Schaufelberger added a comment - - edited

          Thank you for the Workaround via Script Console, rodrigc!

          Note: The second argument had to be a String in my case: "3600".
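
The String requirement matches the JDK signature `System.setProperty(String key, String value)`: both arguments are Strings, so a bare `3600` does not fit. The Script Console runs Groovy, which calls the same JDK method, so a plain Java sketch illustrates the point. The property key is the one quoted earlier in this thread; the class and method names are hypothetical. Whether the plugin picks up a value set after it has started is not confirmed here.

```java
class HeartbeatPropertySketch {
    // Property key quoted from the thread; everything else in this class is illustrative.
    static final String KEY =
        "org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL";

    // System.setProperty takes the value as a String, so the number is converted first.
    // It returns the previous value of the property, or null if it was unset.
    static String setInterval(int seconds) {
        return System.setProperty(KEY, Integer.toString(seconds));
    }

    // Integer.getInteger parses the system property as an int and returns the
    // fallback when the property is unset or not a valid number.
    static int readInterval(int fallback) {
        return Integer.getInteger(KEY, fallback);
    }

    public static void main(String[] args) {
        setInterval(3600);
        System.out.println(KEY + "=" + readInterval(-1));
    }
}
```

A property set this way lives only in the running JVM and is gone after a restart, unlike a `-D` option on the Java command line.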

          Benoit Bourdin added a comment -

          For people who are still experiencing this error message, please check the details of some other possible causes here.

            jglick Jesse Glick
            basil Basil Crow