Type: Bug
Resolution: Unresolved
Priority: Critical
Labels: None
On a Jenkins node that has 1 executor, a job that tries to execute an upstream job via the "trigger/call builds on other projects" build step hangs indefinitely, or until the user-supplied timeout is reached.
In the console of the main job, it shows:
Waiting for the completion of <name-of-upstream-project>
while the Jenkins dashboard shows the upstream project trying to build, with the text:
(pending - Waiting for next available executor )
I would expect the plugin to somehow be able to indicate to the Jenkins node that only 1 executor is required.
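To make the failure mode concrete: it behaves like a single-permit resource whose holder blocks waiting on work that needs that same permit. A minimal Java sketch of that analogy (plain java.util.concurrent, not Jenkins API; the one-thread pool stands in for a node with one executor):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy only: the one-thread pool plays the role of a node with one executor.
// The "parent" build submits a "child" build to the same pool and blocks on its
// result, so the child can never start -- the same shape as the reported hang.
public class SingleExecutorDeadlock {
    public static void main(String[] args) throws Exception {
        ExecutorService node = Executors.newFixedThreadPool(1); // one executor slot

        Future<?> parent = node.submit(() -> {
            Future<?> child = node.submit(() -> System.out.println("child build"));
            try {
                child.get(); // "Block until the triggered projects finish their builds"
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        parent.get(); // never returns: the child stays "pending - Waiting for next available executor"
    }
}
{code}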
is duplicated by:
- JENKINS-26959 Waiting Job blocks executor causing deadlock (Resolved)
- JENKINS-30302 Matrix parent takes executor slot and blocks children (Resolved)
[JENKINS-12290] Deadlock/Lockup when using "trigger/call builds on other projects" and "Block until the triggered projects finish their builds" is used with only 1 executor
I think it is important. I have a multi-configuration project for both the build and the test step. Each has more configurations than I have executors/cores, so the job will always hang because no executors are available when it tries to trigger the test. Ideally the triggering job (in the example, the "build" job) would not use any executor while it is waiting, similar to how the multi-configuration master doesn't use an executor (fixed in 936).
This would let me use the block feature as a workaround for 11409.
Really looking forward to seeing this resolved. Due to this we receive job results later than expected (executors are blocked by idle upstream projects) and we can't run other jobs at the same time because of the executor limit.
I wonder also if there is any workaround for this at the moment.
I think what Roland is describing is the "flyweight" job which is used for matrix jobs - a job that doesn't use an executor slot.
It'd be interesting if, when blocking on a triggered job, the triggering process could be changed to a flyweight so that the triggered job inherits the slot and they swap back on completion, or if the builder could get a temporary additional slot, as the blocked process won't be using much in the way of resources.
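For reference, Jenkins core marks work that should not consume a regular executor slot with the Queue.FlyweightTask marker interface, which is how matrix parent builds are scheduled. A minimal sketch of what marking the blocked trigger that way could look like (the class name is hypothetical; this illustrates the idea, not the plugin's actual design):

{code:java}
import hudson.model.Queue;

// Queue.FlyweightTask is a marker interface in Jenkins core: a task that
// implements it is run on a one-off "flyweight" executor and does not occupy
// one of the node's configured executor slots. The idea above is that the
// parent, while blocked, could be represented by such a task.
public abstract class BlockedTriggerTask implements Queue.Task, Queue.FlyweightTask {
    // No extra methods: FlyweightTask only changes how the queue schedules the
    // task; the rest of the Queue.Task contract stays the same.
}
{code}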
Hi, is anyone working on this?
For me this problem makes the parameterized jobs plugin unusable. I cannot risk that Jenkins suddenly gets stuck and blocks all builds.
My scenario is as follows:
I have a build script which needs to be executed (on a specific slave) to create a property file for each dependency I need to build. I use the parameterized job plugin to run a job for each dependency and wait until it is finished. Here the problem starts: as soon as I need to block until the end, the executor used by the main job is never released. This issue can easily be reproduced using a slave with exactly one executor. I thought using a matrix job as the upstream job type would solve the issue, since it is executed as a flyweight job, but I need to execute the script that creates the dependencies on a specific slave, which forces me to define a slave axis. The slave axis in turn is executed as a subtask which is not flyweight, causing the same issue. One solution which would work is releasing the executor before blocking and acquiring it again when the subtask is done.
My current idea to work around this is to avoid the need for an axis in the matrix job and use a separate downstream job to create the property files, which I would transfer to the matrix job using the Copy Artifacts plugin. That's rather complicated and I am still not sure whether it will work.
Any better idea?
I think the waiting parent job should share its executor with its children.
Alternatively, it could just release the executor while blocked and pick it up again when unblocked.
Also following, and in agreement with Vasily: it is expected that if the child job asks for the node held by the parent job, it should be allowed access to that node. Without this working, our complex pipeline is unable to properly archive artifacts in the clean manner it was designed for. The alternative is to look for a solution to archive from the nodes themselves.
I've just hit this issue. My scenario is the following. I have several projects:
- bom
- platform
- archetype
- console
- release-all
The release-all project is a pipeline build which calls release on each of them in the following order: bom -> platform -> archetype -> console. However, because I have just one executor, running release-all blocks the executor and the bom release is never started, because it is waiting for the next available executor.
I have a single Jenkins server with two executors. I cannot run two pipelines in parallel because they immediately get deadlocked, as they cannot start their child jobs. I wonder why this issue doesn't have critical priority; it renders pipeline jobs useless in most cases. JENKINS-26959 was set to critical and was closed as a duplicate of this issue.
I just hit this bug. IMHO this is a core feature (being able to trigger another project and wait for its completion without blocking an executor).
This is a critical issue that makes parallel/matrix builds unusable. In my pipeline I need to build for Mac and Linux; obviously, this can be done in parallel, and there are two build nodes available (Linux and Mac). The bug reproduces when one build node waits until the other finishes while the other waits for the first one: a deadlock.
What if we have the queue scheduler reserve x executors per agent, so that parent jobs don't block all executors from the child jobs (a temporary safeguard)?
And/or, while a parent job waits for its child jobs, it should release the executor. Then, when control comes back to the parent job, it just has to acquire an executor again on the same agent machine.
(Or hand over the executor to the child jobs and take it back when they are done.)
I think a similar release-and-acquire could be applied to all the do-nothing operations like: sleep x seconds, wait for another job to finish, etc.
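Continuing the earlier single-executor analogy, the release-and-reacquire idea would look roughly like this; the permit again stands in for an executor slot, and none of these names are Jenkins API:

{code:java}
import java.util.concurrent.Semaphore;

// Analogy only: the parent gives up its executor slot (permit) before blocking
// on the child, so the child can take the slot, and the parent reacquires one
// once the child is done. With this ordering, the single-executor case no
// longer deadlocks.
public class ReleaseWhileBlocked {
    private final Semaphore executors = new Semaphore(1); // node with one executor

    void runParent() throws InterruptedException {
        executors.acquire();      // parent starts building
        // ... parent work before the blocking trigger ...
        executors.release();      // yield the slot before waiting on the child
        runChildAndWait();        // the child can now grab the free slot
        executors.acquire();      // take a slot back to finish the parent build
        // ... remaining parent work ...
        executors.release();      // parent build finished
    }

    void runChildAndWait() throws InterruptedException {
        executors.acquire();      // child occupies the slot
        try {
            System.out.println("child build");
        } finally {
            executors.release();  // child build finished
        }
    }
}
{code}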
I see some issues with this ticket:
- I'm not very familiar with the project/component terms used in Jenkins; however, I believe the issue is not about the "parameterized-trigger-plugin" component, but rather about the "build step" used to launch other jobs from within another job.
- The issue is not related to having only 1 executor, but rather to having no available executors when launching too many master/orchestrator jobs that run sub-jobs, causing a deadlock.
I propose updating the Jira ticket to reflect the issue as: "Deadlock/Lockup when using "launching builds on other jobs" and "Block until the triggered jobs finish their builds" when no available executors"
From what I understand, this issue occurs in the following scenario:
- Node/slave with only 2 (or 1) executors available
- 1 upstream job finishes successfully
- 2 downstream (orchestrator) jobs are automatically triggered immediately by the upstream job when it finishes successfully
- Each downstream job launches other sub-jobs (dependent jobs) that each require their own executor
Since the last bullet point requires extra executors, and there are none available, the master/orchestrator jobs will enter a deadlock.
There are several ways to resolve this issue:
- (my preference): The launched dependent jobs do not occupy extra executors if run on the same node/slave.
- Distribute the execution of the downstream jobs so that they do not collide and there are available executors for their sub-jobs.
A high-level implementation of the above could be:
- For the first option, this could be automated by the Jenkins orchestrator. Alternatively, a new flag in the build step could also help.
- For the second option, this could be achieved by adding a flag to the upstream-trigger configuration in the downstream job, with a parameter similar to the cron H flag. In other words, instead of immediately executing the downstream jobs, they could be distributed over a set period of time (e.g. within an hour).
Another way of seeing the issue is that Jenkins needlessly keeps an executor occupied while the job it is running is just waiting for another job to finish. If the job could somehow yield its executor while waiting, there would be no deadlocks.
The case that you mention here is one of many possible cases. I can think of the following:
Case A
Parent (trigger+block) -> child(label=qwerty)
A1: No nodes online that have label==qwerty
A2: No nodes that have label defined.
Case B
Parent(label=qwerty)(trigger+block) -> child(label=qwerty)
B1: Label qwerty has only one executor which is in use by parent
Case C
Parent(trigger+block) -> child
C1: Master only has 1 executor
This covers the simple cases; however, if the parent is a Matrix project, we end up with an even more difficult problem to solve.
Case D
Parent(x*y configurations) -> x*y matrix builds -> x*y child builds
D1: x*y or fewer executors - child builds cannot run
So, resolving this within the plugin at either configuration time or runtime is very difficult, as we cannot just check whether the master has a single executor; other factors come into play regarding Cloud services and job properties (label, resources, etc.) that the child projects require.
Problems:
Configuration time:
- Can only check the current situation of the child projects + nodes.
- The project list might be a parameter, so the project list cannot be determined.
- Passing a label parameter to a child project might affect the check.
- Cannot account for any cases where a Cloud can allocate nodes.
Runtime:
- Cannot always guarantee that a node could be created by a Cloud instance.
- Cannot control busy executors that are used by other projects but needed by the started build.
Implementation Ideas
Get the node that we are being built on (own node)
  (or the list of nodes containing all matrix siblings, see below)
Get the projects to start
Get all nodes
Get all clouds
For each project + parameter set:
  # check 1
  Can it be started on master with these parameters?
    Yes:
      if master == ownNode:
        if master.numExecutors > 1 -> can run on master
      else:
        if master.numExecutors > 0 -> can run on master
    No:
      # cannot start on master, try other nodes
      For each node:
        Can it be started on this node with these parameters?
          Yes:
            if node == ownNode:
              if node.numExecutors > 1 -> can run on node
            else:
              if node.numExecutors > 0 -> can run on node
          No:
            # cannot start on master or existing nodes, try Cloud services
            For each cloud:
              Can the cloud start a required node?
                Yes: -> can create a node to run on
                No: -> no possible way that we can continue, as we will block
This should handle most cases, if it can be implemented; however, it assumes that any other builds ongoing on any executor will be able to finish and allow a proposed job to start.
This may fail if the parent build is a Matrix build, as this behaviour would not take into account the job's siblings, which also have similar blocking behaviour.
This could be resolved by:
- getting the current build and the root build
- if these are the same, there is no issue
- if they are different and the root build is a matrix, we need to find out where all of its child jobs are running or going to run
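A rough Java sketch of the non-matrix part of the check outlined above, using Jenkins core APIs (Jenkins.get, Node.getNumExecutors, Label.contains, Cloud.canProvision). It assumes label matching is an adequate stand-in for "can it be started on this node with parameters" and deliberately ignores the matrix-sibling case, so treat it as an outline rather than plugin-ready code:

{code:java}
import hudson.model.AbstractProject;
import hudson.model.Label;
import hudson.model.Node;
import hudson.slaves.Cloud;
import jenkins.model.Jenkins;

import java.util.ArrayList;
import java.util.List;

public class TriggerCapacityCheck {

    /** Returns true if the child project could plausibly get an executor somewhere. */
    static boolean childCanRun(AbstractProject<?, ?> child, Node ownNode) {
        Jenkins jenkins = Jenkins.get();
        Label label = child.getAssignedLabel(); // null means "can run anywhere"

        // Master plus all configured agents; Jenkins itself is a Node.
        List<Node> candidates = new ArrayList<>();
        candidates.add(jenkins);
        candidates.addAll(jenkins.getNodes());

        for (Node node : candidates) {
            if (label != null && !label.contains(node)) {
                continue; // the child is not allowed to build here
            }
            // The slot we occupy is unavailable while we block, so the node we
            // are on needs a spare executor; any other node just needs one.
            int required = (node == ownNode) ? 2 : 1;
            if (node.getNumExecutors() >= required) {
                return true;
            }
        }

        // No static node can take it; see whether a cloud could provision one.
        for (Cloud cloud : jenkins.clouds) {
            if (cloud.canProvision(label)) {
                return true; // a node could be created to run the child on
            }
        }

        return false; // triggering with "block until finished" would deadlock
    }
}
{code}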
Overall this could be done, but is there a need for it?
I feel that it would be better to just add a warning when enabling the "Block until the triggered projects finish their builds" option, informing the user that this might occur.