
[JENKINS-12290] Deadlock/Lockup when "trigger/call builds on other projects" with "Block until the triggered projects finish their builds" is used with only 1 executor

      On a Jenkins node that has 1 executor, a job that triggers a downstream job via the "trigger/call builds on other projects" build step hangs indefinitely, or until the user-supplied timeout is reached.

      The console output of the main job shows:

      Waiting for the completion of <name-of-downstream-project>

      Meanwhile, the Jenkins dashboard shows that the downstream project is trying to build, with the text:
      (pending - Waiting for next available executor )

      I would expect the plugin to somehow be able to tell the Jenkins node that only 1 executor is required.


          Garen Parham created issue -

          cjo9900 added a comment -

          The case that you mention here is one of many possible cases. I can think of the following:
          Case A
          Parent (trigger+block) -> child(label=qwerty)
          A1: No nodes online that have label==qwerty
          A2: No nodes have the label defined.

          Case B
          Parent(label=qwerty)(trigger+block) -> child(label=qwerty)
          B1: Label qwerty has only one executor which is in use by parent

          Case C
          Parent(trigger+block) -> child
          C1: Master only has 1 executor

          This covers the simple cases; however, if the parent is a Matrix project, we end up with an even more difficult problem to solve.

          Case D
          Parent(x*y configurations) -> x*y matrix builds -> x*y child builds

          D1: At most x*y executors: child builds cannot run.
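          As a rough worked example of D1, assume that each of the x*y configuration builds keeps its executor while it blocks, that all of them are scheduled before their children, and that nothing else runs on those executors (a simplification that ignores labels and Clouds). Under these assumptions a child build can only start if there are more executors than configurations. `avoidsLockup` is a hypothetical helper, not plugin code.

```java
public class MatrixExecutorBudget {

    /**
     * Hypothetical helper for Case D. Assumes every one of the x*y configuration
     * builds keeps its executor while blocking, that they are scheduled before
     * their children, and that nothing else uses these executors.
     */
    static boolean avoidsLockup(int executors, int configurations) {
        // D1: with executors <= configurations, the configuration builds can
        // occupy every executor, so no child build ever starts.
        return executors > configurations;
    }

    public static void main(String[] args) {
        int x = 2, y = 3; // 6 configurations
        for (int executors : new int[] {4, 6, 7}) {
            String verdict = avoidsLockup(executors, x * y) ? "child builds can run" : "lockup";
            System.out.println(executors + " executors: " + verdict);
        }
    }
}
```

          With x = 2 and y = 3, 4 or 6 executors lock up, while 7 leave one executor free, through which the child builds can run one at a time.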

          Resolving this within the plugin, at either configuration time or runtime, is therefore very difficult:
          we cannot just check whether master has a single executor, because other factors come into play, such as Cloud
          services and the job properties (label, resource, etc.) that the child projects require.

          Problems:
          Configuration time:
          Can only check the current situation of child projects + Nodes.
          The project list might be a parameter, so we cannot always determine the project list.
          Passing a label parameter to a child project might affect checking.
          Cannot account for any cases where a Cloud can allocate nodes.

          Runtime:
          Cannot always guarantee that a Node could be created for a Cloud instance.
          Cannot control busy Executors that are used by other projects but needed by the started build.

          Implementation Ideas

          Get the Node that we are being built on (own node)
            (or the list of nodes containing all matrix siblings, see below)
          Get the projects to start
          Get all Nodes
          Get all Clouds

          foreach project + parameter set
            #check 1
            can it be started on master with these parameters?
            Yes:
              if master == ownNode
                if master.numExecutors > 1: can run on master.
              else
                if master.numExecutors > 0: can run on master.
            No:
              // cannot start on master, try other Nodes
              forEach Node
                can it be started on node with these parameters?
                Yes:
                  if node == ownNode
                    if node.numExecutors > 1: can run on node.
                  else
                    if node.numExecutors > 0: can run on node.
              If no Node can run it:
                // cannot start on master or existing Nodes, try Cloud services
                forEach Cloud
                  can the Cloud start a required node?
                  Yes: can create a node to run on.
                If no Cloud can: there is no possible way to continue, as we will block.
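          The check above can be sketched in Java under simplifying assumptions. `NodeModel`, `CloudModel`, `canStart` and `allCanStart` are hypothetical names, not Jenkins APIs; "can it be started on node with these parameters?" is reduced to a plain label match, where a `null` label means the child can run anywhere; master is simply the first entry in the node list. Like the pseudocode, it assumes builds already running elsewhere will eventually finish.

```java
import java.util.List;
import java.util.Set;

public class BlockingTriggerCheck {

    /** Hypothetical stand-in for a Jenkins node (master or agent); not a Jenkins API. */
    static final class NodeModel {
        final String name;
        final int numExecutors;
        final Set<String> labels;

        NodeModel(String name, int numExecutors, Set<String> labels) {
            this.name = name;
            this.numExecutors = numExecutors;
            this.labels = labels;
        }
    }

    /** Hypothetical stand-in for a Cloud that may provision a node for a label. */
    interface CloudModel {
        boolean canProvision(String label);
    }

    /** Simplification of "can it be started on node with these parameters?": label match only. */
    static boolean matches(NodeModel node, String label) {
        return label == null || node.labels.contains(label);
    }

    /** Executors left for the child: the blocking parent keeps one on its own node. */
    static int usableExecutors(NodeModel node, NodeModel ownNode) {
        return node == ownNode ? node.numExecutors - 1 : node.numExecutors;
    }

    /** #check 1 for one project + parameter set; nodes are tried in order, master first. */
    static boolean canStart(String label, List<NodeModel> nodesMasterFirst,
                            NodeModel ownNode, List<CloudModel> clouds) {
        for (NodeModel node : nodesMasterFirst) {
            if (matches(node, label) && usableExecutors(node, ownNode) > 0) {
                return true; // can run on this node
            }
        }
        for (CloudModel cloud : clouds) {
            if (cloud.canProvision(label)) {
                return true; // can create a node to run on
            }
        }
        return false; // blocking would deadlock
    }

    /** foreach project: every triggered project (one required label each) must pass. */
    static boolean allCanStart(List<String> childLabels, List<NodeModel> nodesMasterFirst,
                               NodeModel ownNode, List<CloudModel> clouds) {
        for (String label : childLabels) {
            if (!canStart(label, nodesMasterFirst, ownNode, clouds)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NodeModel master = new NodeModel("master", 1, Set.of());
        // Case C1: the parent runs on master, which has only 1 executor; no clouds.
        System.out.println(canStart(null, List.of(master), master, List.of())); // prints false
        // Same child, but a (hypothetical) cloud that can provision any label.
        CloudModel anyCloud = label -> true;
        System.out.println(canStart(null, List.of(master), master, List.of(anyCloud))); // prints true
    }
}
```

          Note that each project is checked independently, as in the pseudocode, so two children competing for the same last free executor are not detected.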

          This should handle most cases, if it can be implemented;
          however, it assumes that any other builds that are ongoing on any executor
          will be able to finish and allow a proposed job to start.
          This may fail if the parent build is a Matrix build, as this check does not
          take into account the job's siblings, which have the same blocking behaviour.

          This could be resolved as follows:
          get the current build and the root build
          if these are the same, there is no issue
          if they are different and the root build is a matrix build, we need to find out
          where all of its child jobs are running or are going to run.
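          A sketch of the root-build step, using a hypothetical `BuildModel` (not a Jenkins API): when the root is a matrix build, every configuration build is counted as holding one executor on its node, since each of them blocks on its own children, while the flyweight matrix root itself holds none. Subtracting these counts from each node's executors, instead of subtracting 1 only on the own node, gives the executors actually left for the triggered builds. The comment does not say how a non-matrix root that differs from the current build should be handled; the sketch treats it like the plain case, which is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatrixSiblingCheck {

    /** Hypothetical build model, not a Jenkins API; parent == null for a root build. */
    static final class BuildModel {
        final String nodeName;
        final boolean matrix;
        final BuildModel parent;

        BuildModel(String nodeName, boolean matrix, BuildModel parent) {
            this.nodeName = nodeName;
            this.matrix = matrix;
            this.parent = parent;
        }
    }

    /** Follows parent links up to the root build. */
    static BuildModel root(BuildModel build) {
        BuildModel b = build;
        while (b.parent != null) {
            b = b.parent;
        }
        return b;
    }

    /**
     * Executors held per node by builds that block on their children.
     * `siblings` are all configuration builds of the root (including `current`).
     */
    static Map<String, Integer> heldExecutors(BuildModel current, List<BuildModel> siblings) {
        Map<String, Integer> held = new HashMap<>();
        BuildModel root = root(current);
        if (root != current && root.matrix) {
            // The matrix root is flyweight; each configuration build holds one executor.
            for (BuildModel sibling : siblings) {
                held.merge(sibling.nodeName, 1, Integer::sum);
            }
        } else {
            // Same build (or a non-matrix root, an assumption): only current holds one.
            held.merge(current.nodeName, 1, Integer::sum);
        }
        return held;
    }

    public static void main(String[] args) {
        BuildModel matrixRoot = new BuildModel("master", true, null);
        List<BuildModel> configs = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            configs.add(new BuildModel("agent1", false, matrixRoot));
        }
        Map<String, Integer> held = heldExecutors(configs.get(0), configs);
        int agent1Executors = 4;
        // prints "free on agent1: 0": no executor is left for any child build
        System.out.println("free on agent1: " + (agent1Executors - held.getOrDefault("agent1", 0)));
    }
}
```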

          Overall, this could be done, but is there a need for it?

          I feel that it would be better to just add a warning when enabling the
          "Block until the triggered projects finish their builds" item that informs
          the user that this might occur.


          Roland Schulz added a comment - - edited

          I think it is important. I have a multi-configuration project for both the build and the test step. Each has more configurations than I have executors/cores, so the job will always hang because no executors are available when it tries to trigger the test. Ideally, the triggering job (in this example, the "build" job) would not use any executor while it is waiting, similar to how the multi-configuration master doesn't use an executor (fixed in 936).

          This would let me use the block feature as a work-around for 11409.

          ikedam changed Issue Type from Bug to Improvement.
          Oleg Nenashev changed Priority from Blocker to Major.
          Daniel Beck added link: This issue is duplicated by JENKINS-26959.

          Marcin Hawraniak added a comment -

          Really looking forward to seeing this resolved. Because of this, we receive job results later than expected (executors are blocked by idle upstream projects), and we can't run other jobs at the same time because of the executor limit.

          I also wonder whether there is any workaround for this at the moment.

          Dominic Cleal added a comment -

          I think what Roland is describing is the "flyweight" job used for matrix jobs: a job that doesn't use an executor slot.

          It'd be interesting if, when blocking on a triggered job, the triggering build could be changed to a flyweight, with the triggered job inheriting the slot and the two swapping back on completion. Alternatively, the builder could get a temporary additional slot, as the blocked process won't be using much in the way of resources.
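          The "give up the slot while blocked" idea can be illustrated with a toy model in which a node's executors are the permits of a `java.util.concurrent.Semaphore`. This is not how Jenkins schedules builds, and `bothFinish` is a hypothetical name. With one executor, a parent that keeps its permit while waiting starves the child until the timeout; a parent that releases it lets the child run and then takes a slot back to finish.

```java
import java.util.concurrent.Semaphore;

public class ReleaseWhileBlocked {

    /**
     * Toy model, not Jenkins code: a node's executors are the permits of a Semaphore.
     * Returns true if the triggered (child) build finished within the timeout.
     */
    static boolean bothFinish(int executors, boolean releaseWhileBlocking, long timeoutMillis) {
        Semaphore node = new Semaphore(executors);
        node.acquireUninterruptibly(); // parent build occupies an executor

        Thread child = new Thread(() -> {
            node.acquireUninterruptibly(); // child waits for a free executor
            node.release();                // child's steps would run here; then it frees the slot
        });
        child.setDaemon(true); // don't keep the JVM alive if the child never starts
        child.start();

        if (releaseWhileBlocking) {
            node.release(); // parent gives up its executor while it only waits
        }
        try {
            child.join(timeoutMillis); // "Block until the triggered projects finish"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        boolean childFinished = !child.isAlive();

        if (releaseWhileBlocking) {
            node.acquireUninterruptibly(); // parent needs an executor again for its remaining steps
        }
        node.release(); // parent finishes and frees its executor
        return childFinished;
    }

    public static void main(String[] args) {
        System.out.println("hold executor:    child finished = " + bothFinish(1, false, 500));
        System.out.println("release executor: child finished = " + bothFinish(1, true, 5000));
    }
}
```

          One cost of this design is visible in the model: after the child finishes, the parent has to wait for a free executor again, and on a busy node that wait could be delayed by other queued builds.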

          Joachim Herb added link: This issue is duplicated by JENKINS-30302.

          Heiko Böttger added a comment -

          Hi, is anyone working on this?
          For me, this problem makes the Parameterized Trigger plugin unusable. I cannot risk Jenkins suddenly getting stuck and blocking all builds.

          My scenario is as follows:

          I have a build script which needs to be executed (on a specific slave) to create a property file for each dependency I need to build. I use the Parameterized Trigger plugin to run a job for each dependency and wait until it is finished. The problem starts as soon as I need to block until the end: the executor used by the main job is never released. This issue can easily be reproduced using a slave with exactly one executor. I thought that using a Matrix job as the upstream job type would solve the issue, since it is executed as a flyweight job; however, I need to execute the script that creates the dependencies on a specific slave, which forces me to define a slave axis. The slave axis, in turn, is executed as a subtask which is not flyweight, causing the same issue. One solution that would work is releasing the executor before blocking and acquiring it again when the subtask is done.

          My current idea for a workaround is to avoid the need for an axis in the matrix job and to use a separate downstream job to create the property files, which I would transfer to the matrix job using the Copy Artifact plugin. That's rather complicated, and I am still not sure whether it will work.

          Any better ideas?

            Assignee: huybrechts
            Reporter: Garen Parham
            Votes: 38
            Watchers: 41