Jenkins / JENKINS-59083

Builds stuck on "is already in progress" forever


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: workflow-job-plugin
    • Labels:
    • Environment:
      CentOS 7.2.1511 (master and slaves)
      Jenkins 2.176.2
      Jenkins master: OpenJDK 1.8.0_212
      Jenkins slaves: Oracle JDK 1.8.0_60

      Jenkins running as a Docker container (standalone)
      Plugin versions attached as a file (plugins.txt)

    • Released As:
      workflow-job 2.35, workflow-cps 2.74

      Description

      After upgrading our Jenkins master and plugins, some of our build jobs (it seemed like mostly Pipeline jobs) got stuck waiting for the previous build to complete, even though that build had already completed.


      This seems to have been caused by JENKINS-46076.

      After downgrading the plugin to version 2.32, all builds worked correctly.


      From some testing, it seems that calling

      Jenkins.instance.getItemByFullName(...).getBuildByNumber(..).isLogUpdated()

      on the previous build (build #185 in the sample case) returns true, even though the build has completed and its log appears to be complete.

      The issue appeared immediately after restarting the Jenkins instance, with any of our Pipeline jobs.

        Attachments

        1. Capture.PNG (70 kB)
        2. Capture2.PNG (66 kB)
        3. failed.PNG (70 kB)
        4. image.png (19 kB)
        5. plugins.txt (5 kB)
        6. stuck-build-2.tar.gz (16 kB)
        7. tdump (75 kB)


            Activity

            jglick Jesse Glick added a comment -

            The image is not loadable anonymously. Use Attach files to copy it to this report.

            If logUpdated is true on a build which appears to be complete, then something went wrong with that build. I really cannot speculate what that might be without steps to reproduce from scratch.

            dnusbaum Devin Nusbaum added a comment -

            Perhaps related to issues like JENKINS-45571, JENKINS-53223, and JENKINS-50199, where Pipeline builds appear to have completed but are still running in some sense (symptoms include flyweight executors sticking around, and that the "completed" builds resume when Jenkins restarts). I am suspicious of a bug in Pipeline shutdown/cleanup code, but I'm not really sure.

            Roy Arnon, can you reproduce the problem consistently, or is it intermittent? Do you see any messages in your Jenkins system logs that seem relevant? Do the old builds that look like they completed resume if you restart Jenkins? What durability setting are you using? If you could upload the full build folder ($JENKINS_HOME/jobs/$JOB_NAME/builds/$BUILD_NUMBER) of one of the jobs that is hung, it might help diagnose the problem.

            jglick Jesse Glick added a comment -

            Yes, I suspect this is not so much an introduced bug as a change which made an existing bug more visible by treating termination conditions more consistently.

            jglick Jesse Glick added a comment -

            Judging by the cause of blockage, I presume your project is set to disable concurrent builds. (The default for Pipeline is to permit them.)
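To make the mechanism discussed so far concrete, here is a simplified Java model of the blocking check. It assumes, based on this thread, that when concurrent builds are disabled a new build waits while the previous build still reports that its log is being updated (as `Run.isLogUpdated()` does), regardless of whether that build has completed. `BlockageModel`, `BuildState`, and `causeOfBlockage` are hypothetical names; this is not the workflow-job plugin's code, and the message text is simplified.

```java
import java.util.List;

/** Hypothetical, simplified model of the queue check discussed in this issue (not plugin code). */
class BlockageModel {

    /** Stand-in for a build record. */
    static final class BuildState {
        final int number;
        final boolean completed;   // has a result; deliberately NOT consulted by the check below
        final boolean logUpdated;  // models Run.isLogUpdated()

        BuildState(int number, boolean completed, boolean logUpdated) {
            this.number = number;
            this.completed = completed;
            this.logUpdated = logUpdated;
        }
    }

    /**
     * Returns a blockage message, or null if a new build may start.
     * Assumption: with concurrent builds disabled, only the last build's
     * "log still being updated" flag decides whether the next build waits.
     */
    static String causeOfBlockage(List<BuildState> builds, boolean concurrentBuild) {
        if (concurrentBuild || builds.isEmpty()) {
            return null; // concurrent builds (the Pipeline default) never wait on each other
        }
        BuildState last = builds.get(builds.size() - 1);
        if (last.logUpdated) {
            // A completed build whose flag was never cleared blocks the next build indefinitely.
            return "Build #" + last.number + " is already in progress";
        }
        return null;
    }
}
```

In this model, a last build created as `new BuildState(185, true, true)` blocks the next build with "Build #185 is already in progress" even though it has completed, while the same history with concurrent builds enabled does not block at all.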

            hellspam Roy Arnon added a comment -

            Hi,

            • I can reproduce the error. I am not near a computer right now, so I will have the details for you by tomorrow.
            • Yes, in this specific project - and probably all stuck builds - we have disabled concurrent builds.
            hellspam Roy Arnon added a comment - - edited

            Hi,

            I've uploaded a stuck build and a screenshot of its state in Jenkins. The uploaded build is #1913: I aborted it (the process it was testing was taking too long, probably due to some misconfiguration), but Jenkins still does not allow the next builds to start. Attachment: stuck-build.tar.gz

            I've also attached a thread dump (tdump).

            After a restart, these "stuck" builds start immediately. The Jenkins logs do not seem to contain anything relevant.

            Regarding durability settings - this job is using the default (PERFORMANCE) but I've seen this issue on jobs that are set to DURABILITY as well.

            Let me know if you need anything else.

            jglick Jesse Glick added a comment -

            Unfortunately I see nothing problematic in the screenshot or either of the attachments, so unless you have a known way to reproduce the problem from scratch I doubt this is going anywhere.

            hellspam Roy Arnon added a comment -

            I do not get what you mean nothing problematic in the screenshot.

            This Jenkins instance is still stuck on this build. I restarted it, and the problem reproduced again after one build completed. Is that not enough of a reproduction?

            kahluagenie Oleg Kalugin added a comment -

            Hey, I can confirm I'm experiencing the same problem of builds stuck in the "already in progress" state. Every time, I had to delete the previous build for the queue to move on.

            Unfortunately I didn't grab any logs or screenshots before downgrading because I had a bunch of work dependent on this, so my main concern was trying to get it fixed.

            The common traits with the builds that I noticed were:

            • All of them were multibranch pipeline jobs
            • Concurrent builds were disabled
            • And the one that seemed most notable (pure speculation on my part): each build was kicking off downstream jobs that are defined through properties.pipelineTriggers.upstream.upstreamProjects in those downstream projects
            hellspam Roy Arnon added a comment -

            I went ahead and reproduced it again.

            This time I simulated a failure in the pipeline job - you can see build #1916 waiting for the failed build #1915.

            I have attached the builds folder again - stuck-build-2.tar.gz

            jglick Jesse Glick added a comment -

            I do not get what you mean nothing problematic in the screenshot.

            Sorry, I meant nothing problematic other than the new build waiting, which is probably just due to the incorrect return value from isLogUpdated mentioned to begin with. Everything else looks normal enough.

            each build was kicking off downstream jobs

            A plausible lead, though apparently not the explanation for Roy Arnon.

            jglick Jesse Glick added a comment - - edited

            Anyone affected, please try upgrading to this build. It is purely speculative, but in the absence of any known steps to reproduce, that is the best we can do.

            hellspam Roy Arnon added a comment -

            Hi,

            I can confirm that the build you've uploaded fixes the issue for us. Thanks!

            jglick Jesse Glick added a comment -

            Roy Arnon, good. Now, do you see any stack traces in your system log starting with the following?

            java.lang.IllegalStateException: trying to open a build log on … after it has completed

            If so, we need to see the stack trace to try to understand the root cause.
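In a large system log, a small scan can pull out every occurrence of this warning together with the stack frames that follow it, which is the information being requested here. The sketch below is a hypothetical helper, not part of Jenkins. It assumes a plain-text log with one frame per line starting with `at ` (as in the trace posted in reply), and it does not follow `Caused by:` sections or `... N more` lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects "build log ... after it has completed" traces from a log. */
class CompletedLogTraceScanner {

    // The exception text quoted in this issue; group 1 captures the build, e.g. "Job #123".
    private static final Pattern START = Pattern.compile(
            "java\\.lang\\.IllegalStateException: trying to open a build log on (.+) after it has completed");

    /** Returns one string per matching trace: the exception line followed by its "at ..." frames. */
    static List<String> extractTraces(List<String> logLines) {
        List<String> traces = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : logLines) {
            String line = raw.trim();
            if (START.matcher(line).find()) {
                if (current != null) {
                    traces.add(current.toString()); // previous trace ended without a separator
                }
                current = new StringBuilder(line);
            } else if (current != null && line.startsWith("at ")) {
                current.append('\n').append(line); // still inside the stack trace
            } else if (current != null) {
                traces.add(current.toString()); // first non-frame line ends the trace
                current = null;
            }
        }
        if (current != null) {
            traces.add(current.toString());
        }
        return traces;
    }

    /** Extracts the build named in a trace's first line, or null if it does not match. */
    static String affectedBuild(String trace) {
        Matcher m = START.matcher(trace);
        return m.find() ? m.group(1) : null;
    }
}
```

Because `.` does not match line breaks by default, `affectedBuild` only reads the exception line, so a trace like the one posted below yields `Lint.Hooks #1923`.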

            hellspam Roy Arnon added a comment -

            Yes, I do see a stack trace like that in the log:

            Aug 27, 2019 7:37:09 PM org.jenkinsci.plugins.workflow.job.WorkflowRun getListener
            WARNING: null
            java.lang.IllegalStateException: trying to open a build log on Lint.Hooks #1923 after it has completed
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun.getListener(WorkflowRun.java:219)
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$300(WorkflowRun.java:133)
                    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.getListener(WorkflowRun.java:955)
                    at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.getListener(EnvActionImpl.java:77)
                    at org.jenkinsci.plugins.workflow.cps.EnvActionImpl.getEnvironment(EnvActionImpl.java:69)
                    at org.jenkinsci.plugin.taboola.ElasticSearchManager.getConsoleEnvs(ElasticSearchManager.java:277)
                    at org.jenkinsci.plugin.taboola.ElasticSearchManager.addCommonDataAndSend(ElasticSearchManager.java:110)
                    at org.jenkinsci.plugin.taboola.ElasticSearchManager.sendStageData(ElasticSearchManager.java:179)
                    at org.jenkinsci.plugin.taboola.events.TaboolaStagesListener.onNewHead(TaboolaStagesListener.java:145)
                    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1463)
                    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:458)
                    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:37)
                    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
                    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
                    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    at java.lang.Thread.run(Thread.java:748)
            
            
            hellspam Roy Arnon added a comment -

            Actually, looking at it more closely, it seems to be caused by one of our own plugins. I will look into that.

            jglick Jesse Glick added a comment -

            workflow-job #140 should suppress the stack trace, which at least in this case seems to be innocuous.

            dnusbaum Devin Nusbaum added a comment - - edited

            OK, at least one way that this could happen should be fixed as of Pipeline: Job Plugin 2.35. If you find this ticket while running Pipeline: Job Plugin 2.35 or higher because you are seeing the following error in your logs, please file a new ticket and include the full stack trace:

            java.lang.IllegalStateException: trying to open a build log on … after it has completed
            

            EDIT: Note that if you are not running Pipeline: Groovy Plugin 2.74 or later, you will see this error message for one case that has already been fixed, so I would recommend upgrading to Pipeline: Groovy Plugin 2.74 or later.
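To check whether an instance already has both fixes, the installed versions can be compared against workflow-job 2.35 and workflow-cps 2.74 (the "Released As" versions above). The sketch below is a simplified, hypothetical comparison: it only understands dot-separated numeric components and ignores any non-numeric suffix such as `-beta`. Jenkins has its own, more complete version-number handling, so treat this as an illustration only.

```java
/** Hypothetical helper: simplified check for the fixed plugin versions named in this issue. */
class FixedVersionCheck {

    /** Compares dot-separated versions numerically; missing components count as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? leadingInt(pa[i]) : 0;
            int y = i < pb.length ? leadingInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Parses the leading digits of a component ("74-beta" -> 74); no digits -> 0. */
    private static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && s.charAt(end) >= '0' && s.charAt(end) <= '9') {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
    }

    /** True if both installed versions are at or above workflow-job 2.35 and workflow-cps 2.74. */
    static boolean hasBothFixes(String workflowJobVersion, String workflowCpsVersion) {
        return compareVersions(workflowJobVersion, "2.35") >= 0
                && compareVersions(workflowCpsVersion, "2.74") >= 0;
    }
}
```

Comparing components numerically matters here: a plain string comparison would rank "2.100" below "2.35".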

            jglick Jesse Glick added a comment -

            upgrading to Pipeline: Job Plugin 2.35 or newer which will automatically update Pipeline: Groovy Plugin as well

            No, it will not; this is just a test dependency.

            Plugin-Dependencies: workflow-api:2.36,workflow-step-api:2.20,workflow-support:3.3
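The Plugin-Dependencies header above lists the plugins workflow-job depends on at runtime; a test dependency such as workflow-cps does not appear there, which is why upgrading workflow-job does not pull in a newer Pipeline: Groovy. A minimal parsing sketch, assuming comma-separated `name:version` entries with an optional `;resolution:=optional` suffix on optional dependencies; the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses a Plugin-Dependencies manifest header into name -> version. */
class PluginDependenciesHeader {

    static Map<String, String> parse(String header) {
        Map<String, String> deps = new LinkedHashMap<>(); // keeps the header's order
        for (String entry : header.split(",")) {
            String e = entry.trim();
            int semi = e.indexOf(';');
            String spec = semi >= 0 ? e.substring(0, semi) : e; // drop ";resolution:=optional"
            int colon = spec.indexOf(':');
            if (colon <= 0) {
                continue; // skip empty or malformed entries
            }
            deps.put(spec.substring(0, colon).trim(), spec.substring(colon + 1).trim());
        }
        return deps;
    }
}
```

Run on the header above, this yields entries for workflow-api, workflow-step-api, and workflow-support, with no entry for workflow-cps.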
            

              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              hellspam Roy Arnon
              Votes:
              0
              Watchers:
              5
