  Jenkins / JENKINS-39552

After restart, interrupted pipeline deadlocks waiting for executor


Details

    Description

      I had a pipeline build running and then restarted Jenkins. After it came back up, the log for one of the parallel steps in the build showed:

      Resuming build at Mon Nov 07 13:11:05 CET 2016 after Jenkins restart
      Waiting to resume part of Atlassian Bitbucket » honey » master #4: ???
      Waiting to resume part of Atlassian Bitbucket » honey » master #4: Waiting for next available executor on bcubuntu32

      The last message then repeated every few minutes. The slave bcubuntu32 has only one executor, and it seems this executor was "used up" by the very task that was waiting for an available executor...

      After I went into the configuration and changed number of executors to 2, the build continued as normal.

      A possibly related issue: before the restart, I put Jenkins in quiet mode, but the same build agent hung at the end of the pipeline part that was running and never finished the build. In the end I restarted without waiting for that part to finish.

      How to reproduce

      • In a fresh Jenkins instance, set the master's number of executors to 1
      • Create job-1 and job-2 as follows:

        job-1:
        node {
            parallel "parallel-1": {
                sh "true"
            }, "parallel-2": {
                sh "true"
            }
        }
        build 'job-2'

        job-2:
        node {
            sh "sleep 300"
        }

      Start a build of job-1, wait for job-2's node block to start, then restart Jenkins.

      When Jenkins comes back online, you'll see a deadlock: job-1 never resumes.

      It seems job-1 is trying to reacquire the node it used before the restart, even though its current state (waiting on the build step, outside any node block) doesn't require a node.

      Attachments

        Issue Links

          Activity

            mkobit Mike Kobit added a comment -

            From thread dump

            "AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#24]" Id=647 Group=main WAITING on com.google.common.util.concurrent.AbstractFuture$Sync@47eda1e1
            	at sun.misc.Unsafe.park(Native Method)
            	-  waiting on com.google.common.util.concurrent.AbstractFuture$Sync@47eda1e1
            	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
            	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
            	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:294)
            	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61)
            	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getNode(ExecutorStepExecution.java:259)
            	at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.categoriesForPipeline(ThrottleQueueTaskDispatcher.java:411)
            	at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRun(ThrottleQueueTaskDispatcher.java:168)
            	at hudson.model.Queue.isBuildBlocked(Queue.java:1184)
            	at hudson.model.Queue.maintain(Queue.java:1505)
            	at hudson.model.Queue$1.call(Queue.java:320)
            	at hudson.model.Queue$1.call(Queue.java:317)
            	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:108)
            	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:98)
            	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
            	at java.lang.Thread.run(Thread.java:745)
            
            	Number of locked synchronizers = 1
            	- java.util.concurrent.locks.ReentrantLock$NonfairSync@5613fb44
            
            abayer Andrew Bayer added a comment -

            mkobit - that sounds like JENKINS-44747, fyi. This issue here predates the change in Throttle Concurrent Builds, so is probably caused by something else.

            mkobit Mike Kobit added a comment -

            Thanks abayer - I'll follow that issue.

            I'm starting to think that my issue may be different. We saw a lot of weirdness with Jenkins restarts and lots of LinkageErrors from a few user pipelines; they added a bunch of load statements (some nested), reloading the same resources, and that may have caused our issue. Still unsure, but I haven't seen it happen again since we fixed that a day ago.

            macdrega Joerg Schwaerzler added a comment - - edited

            We're facing the same issue here: Jenkins 2.190.2, workflow-durable-task-step plugin 2.35.
            Are there any updates or workarounds available in the meantime?

            Seems like we can work around this issue by temporarily adding a second executor...
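
            For reference, the temporary-executor workaround can also be applied from the Jenkins script console (Manage Jenkins » Script Console). A minimal sketch, assuming the repro setup above where the controller itself has a single executor; for an agent such as bcubuntu32, the equivalent change is the "# of executors" field on that node's configuration page:

```groovy
// Temporarily raise the controller's executor count from 1 to 2 so the
// resumed Pipeline task can obtain a free executor; revert it afterwards.
import jenkins.model.Jenkins

Jenkins.get().setNumExecutors(2)
```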

            dnusbaum Devin Nusbaum added a comment -

            There are various issues described here, but I think the main issue in the description is a duplicate of JENKINS-53709 (fixed in Pipeline: Groovy version 2.56) or JENKINS-41791 (fixed in Pipeline: Groovy 2.66). There is also a possibility that the fix for JENKINS-63164 (released in Pipeline: Groovy version 2.82) would fix this issue. Given that, I am going to go ahead and close this issue as a duplicate.

            macdrega Assuming you are running the latest version of Pipeline: Groovy plugin, I would open a new issue and describe the behavior you are seeing, including the Pipeline having the problem, a build log from when the problem happened (ideally the entire build folder zipped), and any exceptions in the Jenkins system logs when the problem occurred.
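
            The diagnostics requested above can be gathered from a shell on the Jenkins controller. A minimal sketch, assuming a default $JENKINS_HOME layout; the job name and build number are placeholders to replace with the affected build:

```shell
#!/bin/sh
# Archive one build's folder (log, build.xml, Pipeline state under workflow/)
# so it can be attached to a new issue. All paths below are placeholders.
JENKINS_HOME="${JENKINS_HOME:-/var/lib/jenkins}"
JOB="job-1"    # job that exhibited the hang
BUILD="4"      # affected build number

BUILD_DIR="${JENKINS_HOME}/jobs/${JOB}/builds/${BUILD}"
if [ -d "$BUILD_DIR" ]; then
    tar -czf "${JOB}-build-${BUILD}.tar.gz" \
        -C "${JENKINS_HOME}/jobs/${JOB}/builds" "${BUILD}"
else
    echo "No such build directory: $BUILD_DIR" >&2
fi
```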


            People

              Assignee: Unassigned
              Reporter: Emil Styrke (estyrke)
              Votes: 11
              Watchers: 22
