[JENKINS-38383] StackOverflow when loading a large pipeline job

      I've been using an infinite Pipeline for a demo. The job ran for a couple of hours, and then the Jenkins master failed to load the build because of a StackOverflowError.

      node {
          int i = 0
          // Deliberately endless: every iteration records two more sh steps in the build
          while (true) {
              sh "echo 'Hello, world ${i}!'"
              sh "sleep 5"
              i = i + 1
          }
      }
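For scale, here is a rough estimate of how many sh steps the demo above records. It assumes "a couple of hours" means about two hours and that each iteration takes at least the 5-second sleep, so the result is an upper bound:

$$
N_{\text{iter}} \le \frac{T}{\Delta t} = \frac{7200\ \text{s}}{5\ \text{s}} = 1440,
\qquad
N_{\text{sh}} = 2\,N_{\text{iter}} \le 2880
$$

Each sh step adds at least one node to the build's flow graph, so a run like this accumulates thousands of nodes.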
      

      Log: See attachment

        1. InfinitePipeline.zip
          1.45 MB
        2. log.txt
          7.50 MB
        3. workflow-job.hpi
          109 kB


          Oleg Nenashev created issue -
          Oleg Nenashev made changes -
          Attachment New: InfinitePipeline.zip [ 34026 ]
          Attachment New: log.txt [ 34027 ]

          Sam Van Oort added a comment -

          Problem is here: we're doing a recursive call to get the logs - https://github.com/jenkinsci/workflow-job-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java#L478

          It needs to be rewritten as a while(...) loop that keeps the current search candidate inside the loop. However, I understand that jglick is doing some significant rewrites of log storage at the moment, so I'll hold off on touching the code to avoid collisions.
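A minimal Java sketch of the recursive-to-iterative change described above. The Node type, its parent link, and the hasLog flag are hypothetical stand-ins, not the actual WorkflowRun code or the Pipeline FlowNode API. The sketch only shows that the recursive search uses one stack frame per node it skips, while the while loop keeps the current candidate in a local variable and runs in constant stack space.

```java
/**
 * Hypothetical model of "search back through nodes until one has a log".
 * Names are illustrative assumptions, not the workflow-job-plugin API.
 */
public class LogSearch {
    public static final class Node {
        public final Node parent;     // null for the start node
        public final boolean hasLog;  // whether this node has a log attached

        public Node(Node parent, boolean hasLog) {
            this.parent = parent;
            this.hasLog = hasLog;
        }
    }

    // Recursive form: one stack frame per node visited, so a long run of
    // log-less nodes can exhaust the thread stack (StackOverflowError).
    public static Node findRecursive(Node n) {
        if (n == null || n.hasLog) {
            return n;
        }
        return findRecursive(n.parent);
    }

    // Iterative form: the current search candidate lives in a local variable
    // inside a while loop, so stack depth stays constant however far we walk.
    public static Node findIterative(Node n) {
        Node candidate = n;
        while (candidate != null && !candidate.hasLog) {
            candidate = candidate.parent;
        }
        return candidate;
    }

    // Builds a root node that has a log, followed by `depth` log-less nodes;
    // returns the deepest node.
    public static Node chain(int depth) {
        Node n = new Node(null, true);
        for (int i = 0; i < depth; i++) {
            n = new Node(n, false);
        }
        return n;
    }

    public static void main(String[] args) {
        Node tip = chain(1_000_000);
        System.out.println("iterative found a node with a log: " + findIterative(tip).hasLog);
        try {
            findRecursive(tip);
            System.out.println("recursive: finished");
        } catch (StackOverflowError e) {
            System.out.println("recursive: StackOverflowError");
        }
    }
}
```

With a typical default thread stack size, the recursive search over a million log-less nodes should overflow while the iterative one succeeds. The exact depth at which recursion fails depends on the JVM and its -Xss setting, which matches the point that this only bites when many nodes must be searched.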

          Side note, this is explicitly a "don't do that" in my recent Jenkins World talk :-P

          Sam Van Oort made changes -
          Priority Original: Critical [ 2 ] New: Minor [ 4 ]

          Sam Van Oort added a comment -

          Downgrading priority because it should only happen if you have to search many nodes to find one with a log.

          That said, there's some weirdness with the logic here.


          Thomas Johansen added a comment (edited)

          I get a similar symptom on startup with Jenkins 2.102. The stack trace is now:

          [...] com.google.common.util.concurrent.ExecutionError: java.lang.StackOverflowError
          2018-02-22T11:17:37.589287583Z  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
          2018-02-22T11:17:37.589290948Z  at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
          2018-02-22T11:17:37.589293750Z  at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
          2018-02-22T11:17:37.589296421Z  at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
          2018-02-22T11:17:37.589299106Z  at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
          2018-02-22T11:17:37.589301839Z  at org.jenkinsci.plugins.workflow.job.WorkflowRun.getLogPrefix(WorkflowRun.java:556)
          2018-02-22T11:17:37.589304561Z  at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$900(WorkflowRun.java:134)
          2018-02-22T11:17:37.589308088Z  at org.jenkinsci.plugins.workflow.job.WorkflowRun$6.load(WorkflowRun.java:547)
          2018-02-22T11:17:37.589310809Z  at org.jenkinsci.plugins.workflow.job.WorkflowRun$6.load(WorkflowRun.java:537)
          2018-02-22T11:17:37.589313487Z  at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
          2018-02-22T11:17:37.589316198Z  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
          2018-02-22T11:17:37.589318881Z  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
          2018-02-22T11:17:37.589323997Z  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
          2018-02-22T11:17:37.589326739Z  ... 222 more
          2018-02-22T11:17:37.589333918Z Caused by: com.google.common.util.concurrent.ExecutionError [...]

          Jenkins then becomes unresponsive for some minutes, and jobs that were inside the loop when Jenkins restarted sometimes seem to get stuck after resuming.

          Is this because I have Pipeline jobs that use while loops?

          Andrew Bayer made changes -
          Component/s New: workflow-job-plugin [ 21716 ]
          Component/s Original: pipeline [ 21692 ]
          Sam Van Oort made changes -
          Assignee New: Sam Van Oort [ svanoort ]

          Sam Van Oort added a comment -

          Would be resolved by JENKINS-38381 (log handling rewrite). This is the third bug I've seen tied to the logPrefix cache and similar logic.

          This officially means it's time for at least a fast-patch solution to tide us over until the comprehensive solution lands with the rewrite (my next Big Work Item).

          thxmasj: the issue is that your pipeline runs a very large number of steps (generally several thousand), and under the wrong circumstances this particular operation turns into a rather nasty recursive call.
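The stack trace above suggests how the recursion arises here: getLogPrefix calls the cache's getUnchecked, and the cache loader (WorkflowRun$6.load) calls getLogPrefix again. On a cold cache, such as right after a restart, each uncached ancestor adds another round of frames. The sketch below reproduces that pattern with a plain HashMap instead of Guava's LoadingCache, and shows an iterative alternative that walks up to the first cached or labelled ancestor and then backfills the cache. The Node type, the label field, and the assumption that a prefix is inherited from the nearest labelled ancestor are illustrative, not the plugin's actual logic.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class PrefixCache {
    /** Hypothetical flow node: a parent link plus an optional prefix label. */
    public static final class Node {
        final int id;
        final Node parent;   // null for the start node
        final String label;  // assumed non-null only where a new prefix begins

        public Node(int id, Node parent, String label) {
            this.id = id;
            this.parent = parent;
            this.label = label;
        }
    }

    private final Map<Integer, String> cache = new HashMap<>();

    /** Recursive loader: a cold cache costs one stack frame per uncached ancestor. */
    public String prefixRecursive(Node n) {
        if (n == null) return "";
        String cached = cache.get(n.id);
        if (cached != null) return cached;
        String value = (n.label != null) ? n.label : prefixRecursive(n.parent);
        cache.put(n.id, value);
        return value;
    }

    /** Iterative loader: walk up to the first cached or labelled ancestor, then backfill. */
    public String prefixIterative(Node n) {
        ArrayDeque<Node> visited = new ArrayDeque<>();
        String value = "";
        for (Node cur = n; cur != null; cur = cur.parent) {
            String cached = cache.get(cur.id);
            if (cached != null) { value = cached; break; }
            if (cur.label != null) { value = cur.label; cache.put(cur.id, value); break; }
            visited.push(cur);
        }
        for (Node v : visited) cache.put(v.id, value);  // memoize every node on the path
        return value;
    }

    /** Builds a labelled start node followed by `depth` unlabelled descendants; returns the tip. */
    public static Node chain(int depth) {
        Node n = new Node(0, null, "[demo] ");
        for (int i = 1; i <= depth; i++) n = new Node(i, n, null);
        return n;
    }

    public static void main(String[] args) {
        Node tip = chain(200_000);
        System.out.println("iterative: '" + new PrefixCache().prefixIterative(tip) + "'");
        try {
            new PrefixCache().prefixRecursive(tip);
            System.out.println("recursive: finished");
        } catch (StackOverflowError e) {
            System.out.println("recursive: StackOverflowError on a cold cache");
        }
    }
}
```

Because every node on the walked path is memoized, later lookups for nearby nodes hit the cache after a single step. This is a sketch of a fast-patch style fix under the stated assumptions, not necessarily how the plugin resolves it.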

          I'm going to take this on for fix in the next week or so. 


          Thomas Johansen added a comment -

          svanoort: If I understand you correctly, JENKINS-38381 will resolve my issue, and you aim to fix it in the near future? I am still considering rewriting my pipelines to avoid while loops (which result in thousands of steps), as it seems Jenkins is not designed for this. By the way, I'm using the loops to poll Jira and Git, so my rewrite would move each loop inside a shell step.

            Assignee: Sam Van Oort (svanoort)
            Reporter: Oleg Nenashev (oleg_nenashev)
            Votes: 0
            Watchers: 4
