Builds stuck on "is already in progress" forever

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • workflow-job-plugin
    • Centos 7.2.1511 (master and slaves)
      Jenkins 2.176.2
      Jenkins master - open-jdk 1.8.0_212
      Jenkins slaves - Oracle jdk 1.8.0_60

      Jenkins running as docker container (standalone)
      Attached plugin versions as file

    • workflow-job 2.35, workflow-cps 2.74

      After upgrading our Jenkins master and plugins, some of our build jobs (it seemed to be mostly Pipeline jobs) were stuck waiting for the previous build to complete, even though it had already completed.

       

      This seems to have been caused by JENKINS-46076.

      After downgrading the plugin to version 2.32, all builds worked correctly.

       

      From some testing, it seems that calling

      Jenkins.instance.getItemByFullName(...).getBuildByNumber(..).isLogUpdated()
      

      on the previous build (in the sample case, build #185) returns true, although the build has completed and its log appears to be complete.

      This issue occurred immediately after restarting the Jenkins instance, with any of our Pipeline jobs.
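The blocking behaviour described above can be illustrated with a small, self-contained model. This is only a sketch: `BuildModel`, `JobModel`, and `causeOfBlockage` are hypothetical names written for this example, not the workflow-job plugin's code. It assumes that a job with concurrent builds disabled holds back the next build while the previous build is either still running or still reports its log as being updated.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a build record; not the Jenkins Run class. */
final class BuildModel {
    final int number;
    final boolean building;    // the build is still executing
    final boolean logUpdated;  // the build log is still considered open

    BuildModel(int number, boolean building, boolean logUpdated) {
        this.number = number;
        this.building = building;
        this.logUpdated = logUpdated;
    }
}

/** Simplified stand-in for a job that may or may not allow concurrent builds. */
final class JobModel {
    private final List<BuildModel> builds = new ArrayList<>();
    private final boolean concurrentBuilds;

    JobModel(boolean concurrentBuilds) {
        this.concurrentBuilds = concurrentBuilds;
    }

    void add(BuildModel build) {
        builds.add(build);
    }

    /** Returns the reason the next build must wait, or null if it may start. */
    String causeOfBlockage() {
        if (concurrentBuilds || builds.isEmpty()) {
            return null;
        }
        BuildModel last = builds.get(builds.size() - 1);
        // Assumption of this sketch: a stale logUpdated flag blocks just like a running build.
        if (last.building || last.logUpdated) {
            return "Build #" + last.number + " is already in progress";
        }
        return null;
    }
}

final class BlockageDemo {
    public static void main(String[] args) {
        JobModel job = new JobModel(false);            // concurrent builds disabled
        job.add(new BuildModel(185, false, true));     // finished, but the flag is stuck
        System.out.println(job.causeOfBlockage());
    }
}
```

In this model a finished build whose `logUpdated` flag never clears blocks every later build indefinitely, which matches the symptom reported here. A job that permits concurrent builds is unaffected.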

        1. stuck-build-2.tar.gz (16 kB)
        2. failed.PNG (70 kB)
        3. Capture2.PNG (66 kB)
        4. tdump (75 kB)
        5. Capture.PNG (70 kB)
        6. image.png (19 kB)
        7. plugins.txt (5 kB)

          [JENKINS-59083] Builds stuck on "is already in progress" forever

          Jesse Glick added a comment -

          The image is not loadable anonymously. Use Attach files to copy it to this report.

          If logUpdated is true on a build which appears to be complete, then something went wrong with that build. I really cannot speculate what that might be without steps to reproduce from scratch.

          Devin Nusbaum added a comment -

          Perhaps related to issues like JENKINS-45571, JENKINS-53223, and JENKINS-50199, where Pipeline builds appear to have completed but are still running in some sense (symptoms include flyweight executors sticking around, and that the "completed" builds resume when Jenkins restarts). I am suspicious of a bug in Pipeline shutdown/cleanup code, but I'm not really sure.

          hellspam Can you reproduce the problem consistently or is it intermittent? Do you see any messages in your Jenkins system logs that seem relevant? Do the old builds that look like they completed resume if you restart Jenkins? What durability setting are you using? If you could upload the full build folder ($JENKINS_HOME/jobs/$JOB_NAME/builds/$BUILD_NUMBER) of one of the jobs that is hung, it might help diagnose the problem.

          Jesse Glick added a comment -

          Yes, I suspect this is not so much an introduced bug as a change which made an existing bug more visible by treating termination conditions more consistently.

          Jesse Glick added a comment -

          Judging by the cause of blockage, I presume your project is set to disable concurrent builds. (The default for Pipeline is to permit them.)

          Roy Arnon added a comment -

          Hi,

          • I can reproduce the error. I am not near a computer right now, so I will have that for you by tomorrow.
          • Yes, in this specific project - and probably all stuck builds - we have disabled concurrent builds.

          Roy Arnon added a comment - - edited

          Hi,

          I've uploaded a build that is stuck and an image of the state in Jenkins. The build I uploaded is build #1913. I aborted it (the process it was testing was just taking too much time, probably due to some misconfiguration), but Jenkins still does not allow the next builds to start: stuck-build.tar.gz

          I've also attached a thread dump: tdump

          After a restart, these "stuck" builds immediately start. The jenkins logs do not seem to contain anything relevant.

          Regarding durability settings - this job is using the default (PERFORMANCE) but I've seen this issue on jobs that are set to DURABILITY as well.

          Let me know if you need anything else.

           

          Jesse Glick added a comment -

          Unfortunately I see nothing problematic in the screenshot or either of the attachments, so unless you have a known way to reproduce the problem from scratch I doubt this is going anywhere.

          Roy Arnon added a comment -

          I do not get what you mean nothing problematic in the screenshot.

          This jenkins instance is still stuck forever on this build. I have tried restarting it and it reproduces again after completing one build. Is that not enough of a reproduction?

          Oleg Kalugin added a comment -

          Hey, I can confirm experiencing the same problem of builds stuck in "already in progress" condition. I had to delete the previous build for it to move on every time.

          Unfortunately I didn't grab any logs or screenshots before downgrading because I had a bunch of work dependent on this, so my main concern was trying to get it fixed.

          The common traits with the builds that I noticed were:

          • All of them were multibranch pipeline jobs
          • Concurrent builds were disabled
          • And this one seemed the most outstanding (pure speculation on my part): each build was kicking off downstream jobs that are defined through properties.pipelineTriggers.upstream.upstreamProjects in those downstream projects

          Roy Arnon added a comment -

          I went ahead and reproduced it again.

          This time I simulated a failure in the pipeline job - you can see build #1916 waiting for the failed build #1915.

          I have attached the builds folder again - stuck-build-2.tar.gz

           

          Jesse Glick added a comment -

          > I do not get what you mean nothing problematic in the screenshot.

          Sorry, I meant nothing problematic other than the new build waiting, which is probably just due to the incorrect return value from isLogUpdated mentioned to begin with. Everything else looks normal enough.

          > each build was kicking off downstream jobs

          A plausible lead, though apparently not the explanation for hellspam.

          Jesse Glick added a comment - - edited

          Anyone affected should please try upgrading to this build. Purely speculative, but in the absence of any known steps to reproduce that is the best we can do.

          Roy Arnon added a comment -

          Hi,

          I can confirm that the build you've uploaded fixes the issue for us. Thanks!

          Jesse Glick added a comment -

          hellspam good; now do you see any stack traces in your system log starting with

          java.lang.IllegalStateException: trying to open a build log on … after it has completed
          

          ? If so, we need to see the stack trace to try to understand the root cause.

          Roy Arnon added a comment -

          Yes, I do see a stack trace like that in the log:

          Aug 27, 2019 7:37:09 PM org.jenkinsci.plugins.workflow.job.WorkflowRun getListener
          WARNING: null
          java.lang.IllegalStateException: trying to open a build log on Lint.Hooks #1923 after it has completed
                  at org.jenkinsci.plugins.workflow.job.WorkflowRun.getListener(WorkflowRun.java:219)
                  at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$300(WorkflowRun.java:133)
                  at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.getListener(WorkflowRun.java:955)
                  at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.getListener(EnvActionImpl.java:77)
                  at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.getEnvironment(EnvActionImpl.java:69)
                  at org.jenkinsci.plugin.taboola.ElasticSearchManager.getConsoleEnvs(ElasticSearchManager.java:277)
                  at org.jenkinsci.plugin.taboola.ElasticSearchManager.addCommonDataAndSend(ElasticSearchManager.java:110)
                  at org.jenkinsci.plugin.taboola.ElasticSearchManager.sendStageData(ElasticSearchManager.java:179)
                  at org.jenkinsci.plugin.taboola.events.TaboolaStagesListener.onNewHead(TaboolaStagesListener.java:145)
                  at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1463)
                  at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:458)
                  at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:37)
                  at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
                  at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
                  at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  at java.lang.Thread.run(Thread.java:748)
          
          

          Roy Arnon added a comment -

          Actually, taking a look at it, it seems it has to do with one of our own plugins - I will look into that.
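The observation above, that the trace passes through a site-specific plugin, can be made mechanically by looking for the first stack frame outside the platform packages. The sketch below is illustrative only: `StackTraceTriage`, `firstForeignFrame`, and the prefix list are assumptions made for this example, and the prefix list would need adjusting for other installations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class StackTraceTriage {
    // Package prefixes treated as platform code; an assumption of this sketch.
    private static final List<String> PLATFORM_PREFIXES = Arrays.asList(
            "org.jenkinsci.plugins.workflow.", "hudson.", "jenkins.", "java.");

    /** Returns the first "at ..." frame whose class lies outside the platform prefixes. */
    static Optional<String> firstForeignFrame(List<String> traceLines) {
        for (String line : traceLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("at ")) {
                continue; // skip the exception message and log header lines
            }
            String frame = trimmed.substring(3);
            boolean platform = PLATFORM_PREFIXES.stream().anyMatch(frame::startsWith);
            if (!platform) {
                return Optional.of(frame);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A shortened copy of the trace quoted in this thread.
        List<String> trace = Arrays.asList(
                "java.lang.IllegalStateException: trying to open a build log on Lint.Hooks #1923 after it has completed",
                "        at org.jenkinsci.plugins.workflow.job.WorkflowRun.getListener(WorkflowRun.java:219)",
                "        at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.getEnvironment(EnvActionImpl.java:69)",
                "        at org.jenkinsci.plugin.taboola.ElasticSearchManager.getConsoleEnvs(ElasticSearchManager.java:277)",
                "        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1463)");
        System.out.println(firstForeignFrame(trace).orElse("no foreign frame"));
    }
}
```

Applied to the trace in this thread, the first frame outside the listed prefixes is the `ElasticSearchManager.getConsoleEnvs` call, reached from a listener's `onNewHead` callback.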

          Jesse Glick added a comment -

          workflow-job #140 should suppress the stack trace, which at least in this case seems to be innocuous.

          Devin Nusbaum added a comment - - edited

          Ok, at least one way that this could be happening should be fixed as of Pipeline: Job Plugin 2.35. If you end up finding this ticket when running Pipeline: Job Plugin 2.35 or higher because you are seeing the following error in your logs, please file a new ticket, and include the full stack trace:

          java.lang.IllegalStateException: trying to open a build log on … after it has completed
          

          EDIT: Note that if you are not running Pipeline: Groovy Plugin 2.74 or later, you will see this error message for one case that has already been fixed, so I would recommend upgrading to Pipeline: Groovy Plugin 2.74 or later.
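The version guidance in this comment can be written as a simple check. This is a minimal sketch: `PluginVersions`, `compare`, and `shouldFileNewTicket` are hypothetical helpers written for this example, not Jenkins APIs, and the comparison assumes plain dotted numeric versions such as 2.35 (suffixes like `-rc1` are not handled).

```java
final class PluginVersions {
    /** Compares dotted numeric versions such as "2.35" and "2.74"; negative if a is older than b. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /**
     * True when both plugins are at or above the versions named in the comment,
     * so that a remaining IllegalStateException is worth a new ticket.
     */
    static boolean shouldFileNewTicket(String jobPluginVersion, String groovyPluginVersion) {
        return compare(jobPluginVersion, "2.35") >= 0
                && compare(groovyPluginVersion, "2.74") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldFileNewTicket("2.35", "2.74"));
        System.out.println(shouldFileNewTicket("2.35", "2.73"));
    }
}
```

Each component is compared numerically, so 2.9 sorts before 2.35, which a plain string comparison would get wrong.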

          Jesse Glick added a comment -

          > upgrading to Pipeline: Job Plugin 2.35 or newer which will automatically update Pipeline: Groovy Plugin as well

          No it will not—this is just a test dependency.

          Plugin-Dependencies: workflow-api:2.36,workflow-step-api:2.20,workflow-support:3.3
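The manifest line quoted above can be checked mechanically for a given plugin. The following is a sketch only: `PluginDependencies` and `parse` are hypothetical names written for this example, and the input is the header value exactly as quoted in this comment. It also tolerates a `;resolution:=optional` style suffix on an entry, which does not appear in the quoted line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PluginDependencies {
    /** Parses the value of a Plugin-Dependencies manifest header into name -> version. */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> deps = new LinkedHashMap<>();
        for (String entry : headerValue.split(",")) {
            String spec = entry.trim();
            if (spec.isEmpty()) {
                continue;
            }
            // Drop any ";resolution:=optional" style suffix before splitting name and version.
            int semi = spec.indexOf(';');
            if (semi >= 0) {
                spec = spec.substring(0, semi);
            }
            int colon = spec.indexOf(':');
            if (colon < 0) {
                continue; // malformed entry, ignored in this sketch
            }
            deps.put(spec.substring(0, colon), spec.substring(colon + 1));
        }
        return deps;
    }

    public static void main(String[] args) {
        Map<String, String> deps =
                parse("workflow-api:2.36,workflow-step-api:2.20,workflow-support:3.3");
        // workflow-cps is the short name of Pipeline: Groovy Plugin.
        System.out.println("declares workflow-cps: " + deps.containsKey("workflow-cps"));
    }
}
```

Because workflow-cps is absent from the parsed map, upgrading Pipeline: Job Plugin alone does not pull in a newer Pipeline: Groovy Plugin, so the latter has to be upgraded separately.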
          

            jglick Jesse Glick
            hellspam Roy Arnon