
Deadlock/Lockup when using "trigger/call builds on other projects" and "Block until the triggered projects finish their builds" is used with only 1 executor

      On a Jenkins node that has 1 executor and is trying to execute an upstream job via the "trigger/call builds on other projects" build step, the job hangs indefinitely or until the user-supplied timeout is reached.

      In the console for the main job, it shows:

      Waiting for the completion of <name-of-upstream-project>

      Meanwhile, the Jenkins dashboard shows that the upstream project is waiting to build, with the text:
      (pending - Waiting for next available executor )

      I would expect the plugin to be able to indicate to the Jenkins node that only 1 executor is required.
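
      A minimal sketch of the lockup described above, not Jenkins code: the executor pool is modeled as a counting semaphore, and the names run_parent and child_timeout are hypothetical. The parent keeps its executor while it waits for a child that needs an executor of its own.

```python
import threading


def run_parent(executors: threading.Semaphore, child_timeout: float) -> str:
    """Model a parent build that blocks on a child build (illustrative only)."""
    with executors:  # the parent occupies one executor for its whole run
        # The child can only start once it obtains an executor of its own.
        got_executor = executors.acquire(timeout=child_timeout)
        if not got_executor:
            # Mirrors "pending - Waiting for next available executor".
            return "timed out waiting for child"
        executors.release()  # the child finished and freed its executor
        return "child finished"


# One executor: the parent holds it, so the child can never start.
result_one = run_parent(threading.Semaphore(1), child_timeout=0.1)
# Two executors: the child gets the second one and the parent completes.
result_two = run_parent(threading.Semaphore(2), child_timeout=0.1)
```

      With a single executor the wait can only end through the timeout, which matches the behaviour reported in the console.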

          [JENKINS-12290] Deadlock/Lockup when using "trigger/call builds on other projects" and "Block until the triggered projects finish their builds" is used with only 1 executor

          Tiger Cheng added a comment -

          Also following, and I agree with Vasily: if the child job asks for a node that is run by the parent node, it should be allowed access to that node. Without this working as expected, our complex pipeline cannot archive artifacts in the clean manner it was designed for. The alternative is to look for a way to archive from nodes

          Petar Tahchiev added a comment - - edited

          I've just hit this issue. My scenario is the following. I have several projects:

           - bom
           - platform
           - archetype
           - console
           - release-all

          The release-all is a pipeline build which calls release on each of them in the following order: bom->platform->archetype->console. However, because I have just one executor, running release-all blocks the executor, and bom never starts its release because it is waiting for the next available executor.

          Tamas Hegedus added a comment -

          I have a single Jenkins server with two executors. I cannot run two pipelines in parallel because they immediately deadlock, as neither can start its child jobs. I wonder why this issue doesn't have critical priority. It renders pipeline jobs useless in most cases. JENKINS-26959 was set to critical and was closed as a duplicate of this issue.

          Tobias Gierke added a comment -

          I just hit this bug. IMHO this is a core feature (being able to trigger another project and wait for its completion without blocking an executor).

          Alexander Borsuk added a comment -

          This is a critical issue that makes parallel/matrix builds unusable. In my pipeline I need to build for Mac and Linux. Obviously, it can be done in parallel. There are two build nodes available (Linux and Mac). The bug reproduces when one build node waits until another build node finishes. And another one waits until the first one finishes. This is a deadlock.

          NhatKhai Nguyen added a comment -

          What if the queue scheduler reserved x executors per agent, so that parent jobs cannot take every executor away from their child jobs (as a temporary safeguard)?
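
          The reservation idea could be sketched roughly as follows. This is a toy model, not the Jenkins queue API; the Agent class and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy agent whose scheduler keeps some executors back for child jobs."""
    total_executors: int
    reserved_for_children: int
    busy_parents: int = 0
    busy_children: int = 0

    def try_start(self, is_child: bool) -> bool:
        """Start a build if the reservation policy allows it."""
        free = self.total_executors - self.busy_parents - self.busy_children
        if free <= 0:
            return False
        if is_child:
            self.busy_children += 1
            return True
        # Parent jobs may never use the executors reserved for children.
        parent_cap = self.total_executors - self.reserved_for_children
        if self.busy_parents >= parent_cap:
            return False
        self.busy_parents += 1
        return True


guarded = Agent(total_executors=2, reserved_for_children=1)
first_parent = guarded.try_start(is_child=False)    # takes the unreserved slot
second_parent = guarded.try_start(is_child=False)   # refused: slot is reserved
child = guarded.try_start(is_child=True)            # uses the reserved slot

unguarded = Agent(total_executors=2, reserved_for_children=0)
unguarded.try_start(is_child=False)
unguarded.try_start(is_child=False)
starved_child = unguarded.try_start(is_child=True)  # no executor left
```

          Note that under this policy an agent with a single executor and one reserved slot could never start a parent job at all, so the safeguard only helps when there are at least two executors.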

          NhatKhai Nguyen added a comment - - edited

          And/or, while the parent job waits for the child jobs, it should release the executor. Then, when control comes back to the parent job, it just has to acquire a new executor on the same agent machine.

          (Or hand over the executor to the child jobs, and take it back when they are done.)
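
          A rough sketch of the release-and-reacquire idea, assuming the executor pool behaves like a counting semaphore; run_parent_yielding is a hypothetical name and nothing here reflects how Jenkins actually schedules builds.

```python
import threading
from typing import List


def run_parent_yielding(executors: threading.Semaphore,
                        child_timeout: float) -> List[str]:
    """Parent gives up its executor while it only waits for the child."""
    log = []
    executors.acquire()
    log.append("parent started")

    # The parent is about to do nothing but wait, so it yields the executor.
    executors.release()
    log.append("parent yielded executor")

    # The child can now take the freed executor.
    if not executors.acquire(timeout=child_timeout):
        log.append("child starved")
        return log
    log.append("child ran")
    executors.release()

    # The parent re-acquires an executor to finish its remaining steps.
    executors.acquire()
    log.append("parent resumed")
    executors.release()
    return log


events = run_parent_yielding(threading.Semaphore(1), child_timeout=0.1)
```

          Even with one executor the sequence completes, because the parent never holds the executor while the child needs it.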

          NhatKhai Nguyen added a comment - - edited

          I think a similar release-and-acquire could be applied to all the do-nothing operations, like sleeping for x seconds, waiting for another job to finish, etc.

          Jose added a comment - - edited

          I see some issues with this ticket:

          • I'm not very familiar with the project/component terms used in Jenkins; however, I believe the issue is not about the "parameterized-trigger-plugin" component, but rather about the "build step" used to launch other jobs from within another job.
          • The issue is not related to having only 1 executor, but rather to having no available executors when too many master/orchestrator jobs that run sub-jobs are launched, causing a deadlock.

          I propose updating the Jira ticket to reflect the issue as: "Deadlock/Lockup when using "launching builds on other jobs" and "Block until the triggered jobs finish their builds" when no available executors"

          From what I understand, this issue occurs in the following scenario:

          • Node/slave with only 2 (or 1) executors available
          • 1 upstream job finishes successfully
          • 2 downstream (orchestrator) jobs are automatically triggered immediately by the upstream job when it finishes successfully
          • Each downstream job launches other sub-jobs (dependent jobs) that each require their own executor

          Since the last bullet point requires extra executors, and there are none available, the master/orchestrator jobs will enter a deadlock.
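
          The scenario above can be played through with a small simulation. This is a simplified sketch, assuming a FIFO queue, orchestrators queued before their sub-jobs, and at least one sub-job per orchestrator; simulate and its parameters are hypothetical names, not Jenkins internals.

```python
from collections import deque


def simulate(executors: int, orchestrators: int, children_per_job: int = 1) -> str:
    """Return "deadlock" or "completed" for a FIFO queue of blocking parents.

    Assumes children_per_job >= 1 and that every build needs one executor.
    """
    queue = deque(("parent", i) for i in range(orchestrators))
    remaining = {}          # parent id -> sub-jobs that have not finished yet
    running_children = []   # parent ids of sub-jobs currently on an executor
    free = executors

    while queue or remaining:
        # Hand free executors to queued builds in FIFO order.
        while free > 0 and queue:
            kind, parent = queue.popleft()
            free -= 1
            if kind == "parent":
                remaining[parent] = children_per_job
                queue.extend(("child", parent) for _ in range(children_per_job))
            else:
                running_children.append(parent)

        if not running_children:
            # Every busy executor is held by a parent that is only waiting.
            return "deadlock"

        # Running sub-jobs finish and free their executors.
        for parent in running_children:
            free += 1
            remaining[parent] -= 1
            if remaining[parent] == 0:
                del remaining[parent]
                free += 1  # the parent finishes too and frees its executor
        running_children = []
    return "completed"


two_on_two = simulate(executors=2, orchestrators=2)
one_on_one = simulate(executors=1, orchestrators=1)
two_on_three = simulate(executors=3, orchestrators=2)
```

          In this model the lockup appears whenever the blocking orchestrators alone fill every executor, regardless of whether the node has 1, 2, or more executors.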

          There are several ways to resolve this issue:

          1. (my preference): The launched dependent jobs do not occupy extra executors if run on the same node/slave.
          2. Distribute the execution of the downstream jobs so that they do not collide and there are available executors for their sub-jobs.

          A high-level implementation of the above could be:

          1. For the first option, this could be automated by the Jenkins orchestrator. Alternatively, a new flag in the build step could also help.
          2. For the second way, this could be achieved by adding a flag to the trigger-upstream configuration in the downstream job with a parameter similar to the cron H flag. In other words, instead of immediately executing the downstream jobs, they could be distributed over a set period of time (e.g. within an hour).
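
          For the second option, one possible way to spread start times is to derive a stable offset from the job name, similar in spirit to the cron H flag. The sketch below is an assumption for illustration only: spread_delay_seconds, the SHA-256 choice, and the job names are hypothetical, and this is not how Jenkins computes H.

```python
import hashlib


def spread_delay_seconds(job_name: str, window_seconds: int = 3600) -> int:
    """Map a job name to a stable start delay inside the given window."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into the window.
    return int.from_bytes(digest[:8], "big") % window_seconds


# Hypothetical downstream job names; each one always gets the same delay.
delay_a = spread_delay_seconds("orchestrator-a")
delay_b = spread_delay_seconds("orchestrator-b")
```

          Spreading start times lowers the chance that two orchestrators grab all executors at once, but it does not rule it out, since two jobs can still overlap if the first is still running when the second starts.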

          Magnus Reftel added a comment -

          Another way of seeing the issue is that Jenkins needlessly holds an executor occupied while the job it is running is just waiting for another job to finish. If the job somehow would yield its executor while waiting, there would be no deadlocks.

            huybrechts huybrechts
            garen Garen Parham
            Votes: 38
            Watchers: 41