
Two sequential stages in a parallel stage in a declarative pipeline making use of the same agent can cause a StackOverflowError

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Jenkins ver 2.121.2
      Windows Server 2016
      Pipeline Plugin: 2.5
      Pipeline Declarative Plugin: 1.3.1

      When I run the pipeline below with 2 agents available on the RH7 label, everything completes as expected. However, if only 1 agent is available on the RH7 label, because the other agents are either busy or offline, the job fails with the error message below. Every machine on the RH7 label has a single executor.

      Additionally, whilst the job is marked as failed, the agent that tried to run the job still shows it as running. I don't know if it would eventually time out, but after a few minutes it still shows the job as "running" in the executor window. Cancelling the job in the executor window returns the agent to a usable state.

      Pipeline:

      #!groovy
      
      pipeline {
          agent none
      
          stages {
              stage ("p") {
                  parallel {
                      stage ("p1") {
                          agent { label "RH7" }
      
                          stages {
                              stage ("p1s1") {
                                  steps {
                                      echo "Hello in p1s1"
                                  }
                              }
      
                              stage ("p1s2") {
                                  steps {
                                      echo "Hello in p1s2"
                                  }
                              }
                          }
                      }
      
                      stage ("p2") {
                          agent { label "RH7" }
      
                          stages {
                              stage ("p2s1") {
                                  steps {
                                      echo "Hello in p2s1"
                                  }
                              }
                          }
                      }
                  }
              }
          }
      }
      

      Error Message:

      Running in Durability level: MAX_SURVIVABILITY
      [Pipeline] stage
      [Pipeline] { (p)
      [Pipeline] parallel
      [Pipeline] [p1] { (Branch: p1)
      [Pipeline] [p2] { (Branch: p2)
      [Pipeline] [p1] stage
      [Pipeline] [p1] { (p1)
      [Pipeline] [p2] stage
      [Pipeline] [p2] { (p2)
      [Pipeline] [p1] node
      [p1] Running on Red Hat 7 - 2 in /jenkins/workspace/Problem@2
      [Pipeline] [p2] node
      [Pipeline] [p1] {
      [Pipeline] [p1] stage
      [Pipeline] [p1] { (p1s1)
      [Pipeline] [p1] echo
      [p1] Hello in p1s1
      [Pipeline] [p1] }
      [Pipeline] [p1] // stage
      [Pipeline] [p1] stage
      [Pipeline] [p1] { (p1s2)
      [Pipeline] End of Pipeline
      java.lang.StackOverflowError
      at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:111)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
      TRUNCATED SEE ATTACHED LOG

      Also within the system log there is the following additional error:

      Aug 09, 2018 9:11:10 PM WARNING org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem

      Unexpected exception in CPS VM thread: CpsFlowExecutionOwner[Problem/39:Problem #39
      java.lang.IllegalStateException: JENKINS-50407: no loaded shell in CpsFlowExecutionOwner[Problem/39:Problem #39
      at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:52)
      at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
      at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      If there is any additional information you require please let me know.

          [JENKINS-52966] Two sequential stages in a parallel stage in a declarative pipeline making use of the same agent can cause a StackOverflowError

          Andrew Bayer added a comment -

          Huh - does this happen consistently? I can't get it to reproduce so far. The underlying issue is, obviously, something in the serialization that gets self-referential. My rough guess is that the second error is a side effect of the stack overflow, but I can't be sure.


          Andrew Bayer added a comment -

          Oh, and does it always fail at the same point, i.e., right after [Pipeline] [p1] { (p1s2)?


          Luke Ross added a comment -

          Yeah it fails every time and fails at the same point every time.

          I just ran 10 more instances of the pipeline to confirm this.


          Louis Heche added a comment -

          I had the same problem, and we fixed it by allowing more stack for each JVM thread. You can do that by modifying the jenkins.xml file: add the -Xss parameter with the stack size you want to allow for each thread.


          Stuart Smith added a comment -

          Louis - I don't suppose you had just updated any plugins before running into this, did you? We did quite a major plugin upgrade over the weekend (12th Jan), including updating many pipeline plugins to the latest version, and this has been happening since then. We are trying to track down the plugin that caused this. The pipeline plugins updated to the latest version were:

          • pipeline model definition
          • pipeline stage tags metadata
          • pipeline model extensions
          • pipeline model api

          We deployed around 25 other updates, but these are the ones we initially suspect may have introduced something.


          Louis Heche added a comment -

          That's right, we haven't done any updates; the problem appeared when we added some steps to our pipeline.

          We do have the following plugins installed:

          • pipeline model api 1.3.4
          • pipeline stage tags metadata 1.3.4

          But we don't have the pipeline model definition and pipeline model extensions plugins installed.


          Erik Miller added a comment -

          Happening for me as well. I have the most up to date Jenkins and all plugins.


          Thomas Hutchins added a comment -

          I've hit the same issue with the latest version of Jenkins and all plugins. It looks to be a serialization issue when running parallel pipelines. Has this been investigated further?

          loongle tse added a comment -

          I've hit the same issue, the latest version of Jenkins + all plugins.


          loongle tse added a comment -

          For now, the only way I can get things working properly is to roll back the whole package.


          Thomas Hutchins added a comment -

          I've found a temporary workaround: set Manage Jenkins->Configure System->Pipeline Speed/Durability Level to "Performance-optimized". This "Avoids writing data with every step, avoids atomic writes of data. Pipelines can resume if Jenkins shuts down cleanly, but running pipelines lose step information and cannot resume."

          You can also set this option on an individual pipeline to maintain durability for any non-parallel pipelines you have running.

          This has gotten things working for me so far; hopefully this is fixed soon.
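
          As a rough illustration of the per-pipeline variant of this workaround, a declarative pipeline can request the performance-optimized durability level in its options block. This is only a sketch: the stage names and the RH7 label are borrowed from the original report, the stage bodies are placeholders, and whether it avoids the error on a given installation is not confirmed by anything in this thread.

```groovy
pipeline {
    agent none

    options {
        // Per-pipeline override of the global Speed/Durability setting.
        // Trade-off (per the setting's description quoted above): running
        // pipelines lose step information and cannot resume after an
        // unclean shutdown.
        durabilityHint('PERFORMANCE_OPTIMIZED')
    }

    stages {
        stage('p') {
            parallel {
                stage('p1') {
                    agent { label 'RH7' }
                    steps { echo 'Hello in p1' }
                }
                stage('p2') {
                    agent { label 'RH7' }
                    steps { echo 'Hello in p2' }
                }
            }
        }
    }
}
```

          Pipelines that do not declare the option keep whatever durability level is configured globally.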

          Devin Nusbaum added a comment - - edited

          Does anyone have a minimal and self-contained reproduction case? It is possible that this is a problem in the 2.x version of the JBoss marshalling library used by Pipeline (hence why more users are seeing it after upgrading to workflow-support 3.x), or perhaps a subtle change in some Pipeline-related plugin has caused it to create cyclic data structures that are not being handled correctly during serialization?


          Devin Nusbaum added a comment -

          See also https://github.com/jenkinsci/ansicolor-plugin/issues/148, where a user reported that removing `ansicolor` caused the problem to go away. I don't think ansicolor is really related to the issue, but if subtle changes to the Pipeline make a difference in whether a StackOverflowError occurs, then perhaps this is not infinite recursion, and JBoss Marshalling 2.x has just increased the number of recursive calls that are made under normal circumstances. Hard to say for sure without a good reproduction.

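
          The distinction drawn above, between infinite recursion on a cyclic structure and merely deep recursion on an acyclic one, can be illustrated with a small standalone Groovy script. It uses plain java.io serialization as a stand-in, not JBoss Marshalling, and the ChainLink class and helper names are made up for this sketch. It only shows that a recursive serializer can exhaust the thread stack on a long acyclic chain; it does not reproduce the Pipeline failure itself.

```groovy
// Stand-in demo: java.io serialization, NOT JBoss Marshalling.
// ChainLink, buildChain and overflowsWhenSerialized are hypothetical names.
class ChainLink implements Serializable {
    ChainLink next
}

// Build an acyclic singly linked chain iteratively (no recursion here).
ChainLink buildChain(int depth) {
    ChainLink head = new ChainLink()
    ChainLink tail = head
    for (int i = 1; i < depth; i++) {
        tail.next = new ChainLink()
        tail = tail.next
    }
    return head
}

// Default serialization follows 'next' recursively, so the call depth grows
// with the chain length even though there is no cycle.
boolean overflowsWhenSerialized(int depth) {
    ChainLink head = buildChain(depth)
    ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
        out.writeObject(head)
        return false
    } catch (StackOverflowError ignored) {
        return true
    }
}

println "chain of 10 links overflows: ${overflowsWhenSerialized(10)}"
println "chain of 200000 links overflows: ${overflowsWhenSerialized(200000)}"
```

          The chain length at which the overflow starts depends on the thread stack size, which is consistent with small pipeline changes or a larger -Xss value moving the failure around.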

          loongle tse added a comment - - edited

          dnusbaum Here is the code for my loop section; you can try it yourself:

          #!groovy
          def call(path) {
              def changeLogSets = currentBuild.changeSets
              def canNext = false
              if (changeLogSets.size() <= 0){
                  return true
              }
              for (int i = 0; i < changeLogSets.size(); i++) {
                  def entries = changeLogSets[i].items
                  for (int j = 0; j < entries.length; j++) {
                      def entry = entries[j]
                      echo "${entry.commitId} by ${entry.author} on ${new Date(entry.timestamp)}: ${entry.msg}"
                      def files = new ArrayList(entry.affectedFiles)
                      for (int k = 0; k < files.size(); k++) {
                          def file = files[k]
                          if (file.path.contains(path)) {
                              echo "path  ===》 ${path} "
                              canNext = true
                              break;
                          }
                      }
                      if(canNext){
                          break;
                      }
                  }
              }
          
              return canNext
          }
          


          loongle tse added a comment -

          I wouldn't expect the upgrade to have an effect, but the fact is that it causes the StackOverflowError as soon as I upgrade the plugins associated with Pipeline.


          Hannes Kayser added a comment -

          We face the same issue. It causes builds to fail, the executor is stuck afterwards, but the stage seems to be null (displays only "part" instead of the current stage name).

           

          Pipeline console output is
          java.lang.StackOverflowError
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:114)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1082)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1040)
          [...]
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1019)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:920)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1082)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1040)

           

          The Jenkins error log contains the following:

          May 03, 2019 7:05:54 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
          WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[Owner[<JOB_NAME>/<BUILD>]]
          java.lang.StackOverflowError

          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:114)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1082)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1040)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1019)
          [...]

          I suspected the HTTP Request Plugin to be responsible for the failure and removed it from the Pipeline code. The removal reduced the occurrence, but did not eliminate it completely.

          Any idea how to proceed with this issue?


          loongle tse added a comment -

          Maybe it has something to do with host memory. I'm using a Raspberry Pi with 1 GB of memory; switching to a PC with 4 GB doesn't cause this problem.


          Chris Frolik added a comment -

          I'm not sure why this issue is marked as Minor. It has a pretty dramatic impact on us, and a fix would be much appreciated.


          Bastian Brodbeck added a comment -

          I also have this problem with our declarative pipeline, but we do not use parallel steps.

          We run Jenkins 2.176.1, but I had it with 2.150.1 as well.

          I think it started from one day to the next a few days back. I cannot remember having made any change to Jenkins or the pipeline that caused this to happen.

          I thought updating Jenkins and all installed plugins might fix the issue, but it did not.

          The things I noticed:

          • It happens in 9 out of 10 builds.
          • When it happens, it is consistently at the same step.
          • When I change the pipeline to mitigate the issue, it happens at another step (and stays there). I initially thought it might be related to credentials (withCredentials, withAWS).
          • When I reduce the size of the pipeline, it does not happen. But then of course we cannot use the result.
          • So far it has only happened on my Windows machine, never during a Mac build.
          • System Log:
            • Unexpected exception in CPS VM thread: CpsFlowExecution
            • java.lang.IllegalStateException: JENKINS-50407: no loaded shell in CpsFlowExecution

          Andrew Ching added a comment - - edited

          I encountered this issue too. I am using Jenkins on Windows which was installed using the installer. After some digging, I realized that this distribution of Jenkins comes packaged with a 32-bit version of the JRE, and it is used by the Windows Service (which uses the jenkins.xml file). This severely limits the amount of heap memory the JVM can allocate. If you're facing this issue in the same situation, modify jenkins.xml to use a different, 64-bit version of JRE and also increase the max heap allocation (e.g. -Xmx1024m).


          Arnaud Richard added a comment -

          +1 for fayf86's status and solution.

          I initially used the Windows installer to avoid Java version and configuration issues, but they came back anyway!

          Lance Swoboda added a comment -

          I was able to finally overcome this problem by taking these steps:

          1. Update the jenkins.xml file to use a 64-bit JRE.
          2. Increase the max heap size to 1024m for the JRE (in the jenkins.xml file).
          3. Increase the stack size to 4m in the jenkins.xml file as well (-Xss4m).

          I hope this is helpful for someone else.

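
          After changing jenkins.xml as in the steps above, one way to check what the running JVM actually picked up is a short Groovy script, for example pasted into the Jenkins script console. This is a sketch that relies only on standard JDK management APIs; the sun.arch.data.model property is specific to HotSpot-style JVMs and may be absent elsewhere, so treat that line as an assumption.

```groovy
import java.lang.management.ManagementFactory

// Arguments the running JVM was actually started with.
List<String> jvmArgs = ManagementFactory.runtimeMXBean.inputArguments

// Pick out the stack-size and heap flags discussed above.
List<String> sizingFlags = jvmArgs.findAll { it.startsWith('-Xss') || it.startsWith('-Xmx') }

// 'sun.arch.data.model' is HotSpot-specific (assumption); 'os.arch' is standard.
String dataModel = System.getProperty('sun.arch.data.model') ?: 'unknown'

println "JVM sizing flags: ${sizingFlags ?: 'none set (JVM defaults apply)'}"
println "os.arch=${System.getProperty('os.arch')}, data model=${dataModel}"
println "max heap (MiB): ${Runtime.runtime.maxMemory().intdiv(1024 * 1024)}"
```

          If no -Xss flag is listed, the service is still running with the JVM's default thread stack size and the jenkins.xml change has not taken effect.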

          Valentin Delaye added a comment -

          I just got the same error on a pipeline using parallel steps (with different Kubernetes agents) after a Jenkins restart. Everything is up to date with the latest LTS (2.414.1).

          There is enough heap memory, so we probably need to increase the stack size. But I haven't found any recommendation in the Jenkins docs, and I'm not sure what the implications of this are. The controller is running on K8s using the Helm chart.

            Assignee: Unassigned
            Reporter: Luke Ross (lukeross)
            Votes: 27
            Watchers: 39