JENKINS-52966

Two sequential stages in a parallel stage in a declarative pipeline making use of the same agent can cause a StackOverflowError

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Jenkins ver 2.121.2
      Windows Server 2016
      Pipeline Plugin: 2.5
      Pipeline Declarative Plugin: 1.3.1

      When running the pipeline below, everything completes as expected if two agents with the RH7 label are available. However, if only one agent with the RH7 label is available, because the other agents are busy or offline, the job fails with the error message below. Every machine with the RH7 label has a single executor.

      Additionally, although the job is marked as failed, the agent that tried to run it still shows it as running. I don't know whether it would eventually time out, but after a few minutes the job still shows as "running" in the executor window. Cancelling the job in the executor window returns the agent to a usable state.

      Pipeline:

      #!groovy
      
      pipeline {
          agent none
      
          stages {
              stage ("p") {
                  parallel {
                      stage ("p1") {
                          agent { label "RH7" }
      
                          stages {
                              stage ("p1s1") {
                                  steps {
                                      echo "Hello in p1s1"
                                  }
                              }
      
                              stage ("p1s2") {
                                  steps {
                                      echo "Hello in p1s2"
                                  }
                              }
                          }
                      }
      
                      stage ("p2") {
                          agent { label "RH7" }
      
                          stages {
                              stage ("p2s1") {
                                  steps {
                                      echo "Hello in p2s1"
                                  }
                              }
                          }
                      }
                  }
              }
          }
      }
      

      Error Message:

      Running in Durability level: MAX_SURVIVABILITY
      [Pipeline] stage
      [Pipeline] { (p)
      [Pipeline] parallel
      [Pipeline] [p1] { (Branch: p1)
      [Pipeline] [p2] { (Branch: p2)
      [Pipeline] [p1] stage
      [Pipeline] [p1] { (p1)
      [Pipeline] [p2] stage
      [Pipeline] [p2] { (p2)
      [Pipeline] [p1] node
      [p1] Running on Red Hat 7 - 2 in /jenkins/workspace/Problem@2
      [Pipeline] [p2] node
      [Pipeline] [p1] {
      [Pipeline] [p1] stage
      [Pipeline] [p1] { (p1s1)
      [Pipeline] [p1] echo
      [p1] Hello in p1s1
      [Pipeline] [p1] }
      [Pipeline] [p1] // stage
      [Pipeline] [p1] stage
      [Pipeline] [p1] { (p1s2)
      [Pipeline] End of Pipeline
      java.lang.StackOverflowError
      at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:111)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:854)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1032)
      at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:988)
      TRUNCATED SEE ATTACHED LOG
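
The repeating doWriteObject, doWriteFields, doWriteSerializableObject frames above are the pattern a recursive serializer produces while walking a deeply nested object graph. The sketch below reproduces that general failure mode with plain java.io serialization. It does not use JBoss Marshalling or any Jenkins class, the DeepGraphDemo and Node names are hypothetical, and it is only meant to illustrate why nesting depth, rather than heap usage, can exhaust a thread's stack.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Jenkins or JBoss Marshalling.
class DeepGraphDemo {

    /** A singly linked node: each node nests one level deeper during serialization. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        Node next;
    }

    /** Builds a chain of the given length iteratively, so construction itself never recurses. */
    static Node chain(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) {
            Node node = new Node();
            node.next = head;
            head = node;
        }
        return head;
    }

    /** Returns true if the chain serializes, false if the serializer overflows the stack. */
    static boolean serializes(Node head) {
        try {
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            // Default Java serialization follows the `next` field recursively.
            out.writeObject(head);
            return true;
        } catch (StackOverflowError overflow) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("short chain serializes: " + serializes(chain(10)));
        System.out.println("long chain serializes: " + serializes(chain(1_000_000)));
    }
}
```

On a default-sized thread stack the short chain is expected to serialize and the long one to overflow. The exact depth at which the overflow happens depends on the JVM and its stack size setting.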

      Also within the system log there is the following additional error:

      Aug 09, 2018 9:11:10 PM WARNING org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem

      Unexpected exception in CPS VM thread: CpsFlowExecutionOwner[Problem/39:Problem #39
      java.lang.IllegalStateException: JENKINS-50407: no loaded shell in CpsFlowExecutionOwner[Problem/39:Problem #39
      at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:52)
      at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
      at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
      at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.lang.Thread.run(Unknown Source)

      If there is any additional information you require please let me know.


          loongle tse added a comment - - edited

          dnusbaum, here is the code for my loop section; you can try it yourself:

          #!groovy
          // Returns true when the build has no change sets, or when any
          // changed file's path contains `path`; otherwise returns false.
          def call(path) {
              def changeLogSets = currentBuild.changeSets
              def canNext = false
              if (changeLogSets.size() <= 0) {
                  return true
              }
              for (int i = 0; i < changeLogSets.size(); i++) {
                  def entries = changeLogSets[i].items
                  for (int j = 0; j < entries.length; j++) {
                      def entry = entries[j]
                      echo "${entry.commitId} by ${entry.author} on ${new Date(entry.timestamp)}: ${entry.msg}"
                      // Copy the affected files into a list so they can be indexed.
                      def files = new ArrayList(entry.affectedFiles)
                      for (int k = 0; k < files.size(); k++) {
                          def file = files[k]
                          if (file.path.contains(path)) {
                              echo "path  ===》 ${path} "
                              canNext = true
                              break
                          }
                      }
                      // Stop scanning this change set once a match is found.
                      if (canNext) {
                          break
                      }
                  }
              }

              return canNext
          }
          


          loongle tse added a comment -

          I wouldn't expect the upgrade to have an effect, but the fact is that it causes the StackOverflowError as soon as I upgrade the plugins associated with Pipeline.


          Hannes Kayser added a comment -

          We face the same issue. It causes builds to fail, the executor is stuck afterwards, but the stage seems to be null (displays only "part" instead of the current stage name).

           

          Pipeline console output is
          java.lang.StackOverflowError
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:114)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1082)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1040)
          [...]
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1019)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:920)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1082)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1040)

           

          The Jenkins error log contains the following:

          May 03, 2019 7:05:54 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
          WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[Owner[<JOB_NAME>/<BUILD>]]
          java.lang.StackOverflowError

          at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:114)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1082)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1040)
          at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1019)
          [...]

          I suspected the HTTP Request Plugin of causing the failure and removed it from the Pipeline code. The removal reduced how often the error occurs but did not eliminate it.

          Any idea how to proceed with this issue?


          loongle tse added a comment -

          Maybe it has something to do with host memory: I'm using a Raspberry Pi with 1 GB of RAM. After switching to a PC with 4 GB, the problem does not occur.


          Chris Frolik added a comment -

          I'm not sure why this issue is marked as Minor. It has a pretty dramatic impact on us, and a fix would be much appreciated.


          Bastian Brodbeck added a comment -

          I also have this problem with our declarative pipeline, but we do not use parallel steps.

          We run Jenkins 2.176.1, but I had it with 2.150.1 as well.

          I think it started from one day to the next, a few days ago. I cannot remember making any change to Jenkins or the pipeline that could have caused it.

          I thought updating Jenkins and all installed plugins might fix the issue, but it did not.

          The things I noticed:

          • It happens in 9 out of 10 builds.
          • When it happens, it is consistently at the same step.
          • When I change the pipeline to mitigate the issue, it happens at another step (and stays there). I initially thought it might be related to credentials (withCredentials, withAWS).
          • When I reduce the size of the pipeline, it does not happen. But then, of course, we cannot use the result.
          • So far it has only happened on my Windows machine, never during a Mac build.
          • System Log:
            • Unexpected exception in CPS VM thread: CpsFlowExecution
            • java.lang.IllegalStateException: JENKINS-50407: no loaded shell in CpsFlowExecution

          Andrew Ching added a comment - - edited

          I encountered this issue too. I am using Jenkins on Windows which was installed using the installer. After some digging, I realized that this distribution of Jenkins comes packaged with a 32-bit version of the JRE, and it is used by the Windows Service (which uses the jenkins.xml file). This severely limits the amount of heap memory the JVM can allocate. If you're facing this issue in the same situation, modify jenkins.xml to use a different, 64-bit version of JRE and also increase the max heap allocation (e.g. -Xmx1024m).
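
One way to confirm which kind of JVM is in use is to print its data model and maximum heap, for example by running a small Java program with the same java executable that jenkins.xml points at. This is a sketch, not part of Jenkins: the JvmInfo class is hypothetical, and sun.arch.data.model is a HotSpot-specific system property that other JVMs may not set, in which case the sketch reports "unknown".

```java
// Hypothetical helper; run it with the java executable that the Jenkins service uses.
class JvmInfo {

    /** "32" or "64" on HotSpot JVMs; "unknown" where the property is not set. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Maximum heap the JVM will attempt to use, in MiB (governed by -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    static String describe() {
        return "data model (bits): " + dataModel()
                + ", os.arch: " + System.getProperty("os.arch")
                + ", max heap: " + maxHeapMiB() + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

A reported data model of 32 would match the situation described in this comment, where the bundled JRE limits how much heap can be allocated.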


          Arnaud Richard added a comment -

          +1 for fayf86's status and solution.

          I initially used the Windows installer to avoid Java version and configuration issues, but they came back anyway!

          Lance Swoboda added a comment -

          I was able to finally overcome this problem by taking these steps:

          1. Update the jenkins.xml file to use a 64-bit JRE.
          2. Increase the max heap size to 1024m for the JRE (in the jenkins.xml file).
          3. Increase the stack size to 4m in jenkins.xml as well (-Xss4m).

          Hope this is helpful for someone else.
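
The -Xss option sets the default stack size for JVM threads, which bounds how deeply code such as a recursive serializer can nest before a StackOverflowError is thrown. The sketch below illustrates that relationship with the four-argument Thread constructor, which requests a stack size for a single thread. It does not touch Jenkins or jenkins.xml. StackSizeDemo and Probe are hypothetical names, and because the Java documentation allows a JVM to treat the requested size as a hint, the measured depths vary by platform.

```java
// Hypothetical demo; measures how deep a trivial recursion gets for a requested stack size.
class StackSizeDemo {

    /** Recurses until the stack overflows and records the depth reached. */
    static final class Probe implements Runnable {
        int depth;

        private void recurse() {
            depth++;
            recurse();
        }

        @Override
        public void run() {
            try {
                recurse();
            } catch (StackOverflowError expected) {
                // `depth` now holds the deepest level reached on this thread.
            }
        }
    }

    /** Runs the probe on a thread that requests `stackBytes` of stack and returns the depth. */
    static int maxDepth(long stackBytes) {
        Probe probe = new Probe();
        Thread thread = new Thread(null, probe, "stack-probe", stackBytes);
        thread.start();
        try {
            thread.join(); // join() makes the probe's final depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("depth with 512 KiB requested: " + maxDepth(512L * 1024));
        System.out.println("depth with 16 MiB requested:  " + maxDepth(16L * 1024 * 1024));
    }
}
```

Where the JVM honours the request, the larger stack reaches a much greater depth. That is the effect relied on when raising -Xss, at the cost of more memory reserved per thread.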


          Valentin Delaye added a comment -

          I just got the same error on a pipeline using parallel steps (with different Kubernetes agents) after a Jenkins restart. Everything is up to date with the latest LTS (2.414.1).

          There is enough heap memory, so we probably need to increase the stack size. But I haven't found any recommendation in the Jenkins docs, and I'm not sure what the implications are. The controller runs on Kubernetes, deployed with the Helm chart.

            Assignee: Unassigned
            Reporter: Luke Ross (lukeross)
            Votes: 27
            Watchers: 39