Bug
Resolution: Cannot Reproduce
Blocker
None
Platform: All, OS: All
Currently Hudson does not allow multiple builds in parallel for the same
project. However, when a project is parameterized, this IS possible.
This is very dangerous, since two builds can then be started on the same
workspace. Imagine your workspace being updated by one build while the other
one is still building. You might not even notice that something went wrong,
but your build results would be completely bogus.
The current workaround is to make sure you run only one build at a time per node.
The immediate solution is to disable parallel builds in the core, but this
requires implementing equals() and hashCode() differently on ALL Queue.Task
implementers.
The final solution is to implement decent locking, and fix all other places in
Hudson that depend on only having one build running.
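To illustrate the equals()/hashCode() idea above, here is a minimal sketch. It assumes a queue that rejects a task equal to one already pending. SketchTask and EqualityQueueSketch are hypothetical names made up for this example, not Hudson classes, and the real Queue is considerably more involved.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a Queue.Task implementer; not Hudson's actual class.
final class SketchTask {
    final String projectName;
    final Map<String, String> parameters;

    SketchTask(String projectName, Map<String, String> parameters) {
        this.projectName = projectName;
        this.parameters = parameters;
    }

    // Identity deliberately ignores the parameters: any two builds of the
    // same project count as the same task.
    @Override
    public boolean equals(Object other) {
        return other instanceof SketchTask
                && ((SketchTask) other).projectName.equals(projectName);
    }

    @Override
    public int hashCode() {
        return projectName.hashCode();
    }
}

// Hypothetical queue that keeps at most one pending entry per distinct task.
final class EqualityQueueSketch {
    private final Set<SketchTask> pending = new LinkedHashSet<>();

    // Returns true if the task was queued, false if an equal task is already pending.
    boolean add(SketchTask task) {
        return pending.add(task);
    }

    int size() {
        return pending.size();
    }
}
```

With this definition of equality, a second build of the same project is rejected even when its parameter values differ, which is the trade-off raised in the comments below.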
- is blocking: JENKINS-3004 Unable to cancel pending jobs (since issue #291) (Closed)
- is duplicated by: JENKINS-3154 dangerous bug with buildWithParameters option. (Closed)
[JENKINS-2997] multiple parallel parameterized builds seem to happen at once
proposed patch passes Actions when queuing a build
TODO:
- verify queue.xml serialization (backward compatible ?)
TODO (optional):
- implement an optional interface BuildWrappingAction that can influence the build
- implement parameters on top of this
- implement UI when scheduling a build
- implement parameter asking on top of this
One comment on the patch: I see one spot in the AbstractProject diffs where
quietPeriod was replaced with zero.
One question: before this change you could schedule several builds with
different parameters. The dangerous part was these builds running in parallel,
but I guess it was good that they could all be in the queue. With this patch,
what happens if you submit a parameterized build when one for the same project
is already in the queue (even with different parameters)?
Created an attachment (id=554)
This version allows multiple scheduled builds for parameterized projects, but no parallel building. Still WIP.
The patch is getting long. I didn't make it through all the Queue.java changes
yet, but did notice that the diff in MatrixConfiguration doesn't use the Cause
parameter anymore.
Looks like ParametersAction always allows the same job to be added to the queue.
The previous (current) code used equality of the Task objects to avoid
duplicates, so a job submitted with the same set of parameters would not get
scheduled again.
This aspect seems somewhat important, as jobs with parameters may still be
invoked via SCM or Timer with the default parameters. If a build is blocked
(waiting on an executor, etc.) and polling keeps triggering it, we could end up
with tons of scheduled jobs with the same default parameters. What do you think?
r15291 | huybrechts | 2009-02-13 09:59:04 -0700 (Fri, 13 Feb 2009) | 1 line
Changed paths:
M /trunk/hudson/main/core/src/main/java/hudson/matrix/MatrixConfiguration.java
M /trunk/hudson/main/core/src/main/java/hudson/model/AbstractProject.java
M /trunk/hudson/main/core/src/main/java/hudson/model/Executor.java
M /trunk/hudson/main/core/src/main/java/hudson/model/FileParameterValue.java
M /trunk/hudson/main/core/src/main/java/hudson/model/ParameterValue.java
D /trunk/hudson/main/core/src/main/java/hudson/model/ParameterizedProjectTask.java
M /trunk/hudson/main/core/src/main/java/hudson/model/ParametersAction.java
M /trunk/hudson/main/core/src/main/java/hudson/model/ParametersDefinitionProperty.java
M /trunk/hudson/main/core/src/main/java/hudson/model/Queue.java
M /trunk/hudson/main/core/src/main/java/hudson/model/StringParameterValue.java
D /trunk/hudson/main/core/src/main/java/hudson/util/QueueTaskFilter.java
M /trunk/hudson/main/core/src/main/java/hudson/widgets/BuildHistoryWidget.java
M /trunk/hudson/main/core/src/main/resources/hudson/widgets/BuildHistoryWidget/entries.jelly
[FIXED JENKINS-2997] introducing QueueAction to pass actions from scheduling to
the build it self
On current svn, I just tried this:
1) create job with 1 string parameter, job just does "sleep 90"
2) start job (starts immediately)
3) start job again (now one in queue)
4) start job again with different parameter value (now two in queue)
When the first build completed, both jobs in the queue started simultaneously.
Also, the "others.isEmpty" case in ParametersAction.shouldSchedule looks wrong:
it returns parameters.isEmpty(). Should it be !parameters.isEmpty()?
- the isEmpty() is corrected
- the parallel building you see is an illusion... There is a gap (you might call
it a bug) in the Queue implementation. It checks for resource locks before
handing over an item in the queue to an executor, but doesn't actually reserve
the resources. Only later, just before the executor starts building, are the
resources actually reserved. Therefore it is possible that more than one build
gets past the first check. They will then all be assigned an executor, but still
only one can build at a time. When multiple builds are scheduled, this is very
likely to happen. The bad effect here is that an executor is unnecessarily
occupied, but there is still no parallel building.
But... when two builds of the same project start in the same second, they will
have the same build id (not to be confused with the build number), and thus
share the same build directory! Really bad... Because of this, the two builds
that look like they are building in parallel in the previous example will even
share the same log file. And if you look at the console output of both these
builds, it looks like both are building.
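The two effects described above can be sketched roughly as follows. ReservationSketch contrasts a check that reserves nothing with a combined check-and-reserve step, and BuildIdSketch shows how an id with one-second resolution collides. Both classes are hypothetical illustrations, not Hudson code, and the timestamp pattern is an assumption chosen only because it has one-second granularity.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import java.util.TimeZone;

// Hypothetical stand-in for the resource bookkeeping; not Hudson's ResourceController.
final class ReservationSketch {
    private final Set<String> held = new HashSet<>();

    // Check only: several callers can all be told "free" before anyone reserves.
    synchronized boolean isFree(String resource) {
        return !held.contains(resource);
    }

    // Check and reserve in one atomic step: only the first caller succeeds.
    synchronized boolean tryReserve(String resource) {
        return held.add(resource);
    }

    synchronized void release(String resource) {
        held.remove(resource);
    }
}

final class BuildIdSketch {
    // Assumed pattern with one-second resolution; the exact format is not given in this thread.
    static String buildId(long startMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(startMillis));
    }
}
```

In this sketch two queue items can both pass isFree() and be handed to executors, while only one of them would win tryReserve(). Likewise, two start times within the same second format to the same id string, so a directory named after that id would be shared.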
r15338 | huybrechts | 2009-02-14 04:36:57 -0800 (Sat, 14 Feb 2009) | 3 lines
Changed paths:
M /trunk/hudson/main/core/src/main/java/hudson/model/AbstractProject.java
M /trunk/hudson/main/core/src/main/java/hudson/model/ParametersAction.java
[FIXED JENKINS-2997]
- don't start two builds of the same project in the same second
- don't schedule two parameterized builds without parameters at the same time
------------------------------------------------------------------------
r15339 | huybrechts | 2009-02-14 04:45:57 -0800 (Sat, 14 Feb 2009) | 1 line
Changed paths:
M /trunk/hudson/main/core/src/main/java/hudson/model/AbstractProject.java
JENKINS-2997 no need to serialize this...
Code changed in hudson
User: : huybrechts
Path:
trunk/hudson/main/core/src/main/java/hudson/model/AbstractProject.java
http://fisheye4.cenqua.com/changelog/hudson/?cs=15339
Log:
JENKINS-2997 no need to serialize this...
one more edge case:
1) start parameterized project with "sleep 120" as build script (any param value)
2) schedule a build with param=B (now waiting in queue)
3) schedule a build with param=A (added to queue since different param value)
4) schedule another build with param=A
Job gets added again, even though a duplicate of job in step 3, because
Queue.add() does getItem(p) which just gets the first Item matching that Task.
QueueAction.shouldSchedule isn't called for Actions on Item from step 3.
pretty minor case, but thought I'd mention it..
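A rough sketch of this edge case, under the assumption that the duplicate check compares the new build only against the first queued item for the same task. DuplicateCheckSketch and its methods are hypothetical and only mimic the behaviour described; they are not the actual Queue.add() or QueueAction code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DuplicateCheckSketch {
    // Hypothetical queue entry: a task name plus its parameter values.
    static final class QueuedItem {
        final String task;
        final Map<String, String> params;

        QueuedItem(String task, Map<String, String> params) {
            this.task = task;
            this.params = params;
        }
    }

    private final List<QueuedItem> items = new ArrayList<>();

    // Behaviour described above: only the first item for the task is consulted.
    boolean addCheckingFirstOnly(String task, Map<String, String> params) {
        for (QueuedItem item : items) {
            if (item.task.equals(task)) {
                if (item.params.equals(params)) {
                    return false; // duplicate of the first matching item
                }
                break; // later items for the same task are never compared
            }
        }
        items.add(new QueuedItem(task, params));
        return true;
    }

    // Alternative: compare against every queued item for the task.
    boolean addCheckingAll(String task, Map<String, String> params) {
        for (QueuedItem item : items) {
            if (item.task.equals(task) && item.params.equals(params)) {
                return false; // duplicate found anywhere in the queue
            }
        }
        items.add(new QueuedItem(task, params));
        return true;
    }

    int size() {
        return items.size();
    }
}
```

Queuing param=B, then param=A, then param=A again leaves three items with the first-only check and two with the check against all items.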
It looks like this bug has resurfaced. I have a parameterized job that gets
executed multiple times in parallel when I queue several instances with
different parameter values.
I can reproduce the bug with a minimal job, using the instructions that
'mindless' left in his previous comment. The issue occurs with both 1.303 and 1.304.
I'm reopening this issue. I'm available for further information if needed.
Are they actually executing in parallel? Sometimes Hudson will allocate an
executor to a job, but it won't start because it is waiting for a resource to
free up. A script like:
date
sleep 100
date
Ought to demonstrate whether they have overlapping timestamps when executing.
Here's a more detailed test report with the script you mentioned. I have 3
executors. I started 3 instances of the same 'Test' job within a few seconds,
with dummy A/B/A param values:
- While Test #1 was being executed (and showing in the Executor view), Test #2
and Test #3 remained in the Queue. (OK)
- Then as soon as Test #1 completed, both Test #2 and Test #3 got moved out of
the Queue and appeared in the Executor view with a progress bar showing some
progress for both of them. (NOK)
- I have pasted the output of Test #1, #2 and #3, which shows that timestamps do
not actually overlap.
=> So it's not as bad as it first seemed (jobs are not actually executed in
parallel), but I'd expect jobs to stay queued until they are ready to be
executed. This would:
- prevent the user from thinking that multiple instances of the same job are
executing simultaneously (the progress bar is particularly misleading).
- free up a valuable Executor slot, letting other jobs (that are ready to start)
execute.
Furthermore, I'm pretty sure that I've seen several instances of the more
complex parameterized job that I have overlap, I'll post again if I have more
information that can help track this down.
Thanks for your efforts!
----------------
Test #1
Started by user anonymous
[workspace] $ /bin/sh -xe /tmp/hudson1497163093424666761.sh
+ date
Fri May 15 08:43:56 CEST 2009
+ sleep 100
+ date
Fri May 15 08:45:36 CEST 2009
Finished: SUCCESS
Test #2
Started by user anonymous
[workspace] $ /bin/sh -xe /tmp/hudson1486533333714628975.sh
+ date
Fri May 15 08:45:36 CEST 2009
+ sleep 100
+ date
Fri May 15 08:47:16 CEST 2009
Finished: SUCCESS
Test #3
Started by user anonymous
[workspace] $ /bin/sh -xe /tmp/hudson487910029299787088.sh
+ date
Fri May 15 08:47:16 CEST 2009
+ sleep 100
+ date
Fri May 15 08:48:56 CEST 2009
Finished: SUCCESS
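The start and end times in the three logs above can also be compared mechanically. The sketch below is only an illustration: OverlapCheckSketch is a hypothetical helper, and it assumes half-open intervals, so a build that starts in the same second another one ends is not counted as overlapping.

```java
import java.time.LocalTime;

final class OverlapCheckSketch {
    // True when [startA, endA) and [startB, endB) share at least one instant.
    static boolean overlaps(LocalTime startA, LocalTime endA,
                            LocalTime startB, LocalTime endB) {
        return startA.isBefore(endB) && startB.isBefore(endA);
    }

    public static void main(String[] args) {
        // Times copied from the console output of Test #1, #2 and #3.
        LocalTime s1 = LocalTime.parse("08:43:56"), e1 = LocalTime.parse("08:45:36");
        LocalTime s2 = LocalTime.parse("08:45:36"), e2 = LocalTime.parse("08:47:16");
        LocalTime s3 = LocalTime.parse("08:47:16"), e3 = LocalTime.parse("08:48:56");

        System.out.println("#1 vs #2 overlap: " + overlaps(s1, e1, s2, e2));
        System.out.println("#2 vs #3 overlap: " + overlaps(s2, e2, s3, e3));
        System.out.println("#1 vs #3 overlap: " + overlaps(s1, e1, s3, e3));
    }
}
```

Under that assumption all three comparisons report no overlap, which matches the conclusion that the builds ran one after another.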
Got another similar problem today with version 1.308: scheduled 7 builds of the
same job (#117 through #123) on a machine with 2 executors.
The first 2 builds went through OK, but then mayhem started: #120 started
before #119 and failed with the following exception:
FATAL: null
java.lang.NullPointerException
at hudson.model.AbstractBuild.getCulprits(AbstractBuild.java:192)
at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:349)
at hudson.model.Run.run(Run.java:947)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
I'm also attaching a screenshot of the 'build history' panel, which shows
duplicated items, perhaps an indication of what's going on.
Code changed in hudson
User: : kohsuke
Path:
trunk/hudson/main/core/src/main/java/hudson/model/AbstractBuild.java
trunk/hudson/main/core/src/main/java/hudson/model/Run.java
http://fisheye4.cenqua.com/changelog/hudson/?cs=19570
Log:
[FIXED JENKINS-2997] NPE fix. This will be in 1.316.
After updating to 1.318, I can confirm the NPE no longer appears in the log.
However, multiple instances of the same job still appear as 'ongoing' (with a
progress bar) in the 'Build Executor Status' and 'Build History' views, as
described in the comment I left on May 15 07:30:02.
I'm reopening this ticket. I'm available for more information if needed.
can you provide steps to see this problem from a clean install of the latest
hudson release?
Sorry for not replying earlier, the first notification got lost in a sea of
emails...
The problem I described in my last comment on Aug 7 is still accurate and
reproducible in 1.334. Here is a simple way to reproduce the problem:
- Make sure you have 2 executors.
- Create a new freestyle job 'Test'.
- Tick 'This build is parameterized' and add a 'Dummy' String parameter.
- Add an 'Execute shell' build step with the following script: 'date && sleep
100 && date'.
- Save the job.
- Schedule 5 builds with different parameter values (a/b/c/d/e), quickly enough
so they all get queued.
- Wait for the first build to finish.
- Watch the 'Build history' frame: it will show two builds with a progress bar,
just like on the attached screenshot (following).
Hope that helps,
Maxence
Hi. Is there any progress on this issue? If not, is there any workaround?
We have a linux master, and a windows slave (with label "win").
Our .NET apps have matrix build jobs marked to build on label="win" ('cause they need MSBuild). We use gerrit for code review and the gerrit-trigger-plugin on our hudson server. Whenever someone pushes several commits to gerrit, it triggers several hudson builds, most of which get "ABORTED". Very frustrating...
I am willing to do some coding/testing myself, though I have not looked at the hudson sources :-/
... Am I asking in the wrong place? Does nobody but me have this problem?
Just wondering
@toravid, please take a look at issue #5653. Seems like updating to a recent version should fix your problem.
As far as I'm concerned, parameterized builds now work properly (problem fixed).
@maxence, thanks for replying. But alas, my problem still persists after upgrading to 1.386.
So it seems there might be another cause for my problem... Maybe it has something to do with Matrix builds... (We use that to limit build slaves to those running Windows when we build .NET apps).
No new reports in four years, it's safe to assume this is obsolete. Most of the original report (e.g. parallel builds using the same workspace) simply no longer apply anyway.
working on it