[JENKINS-936] Matrix parent build shouldn't consume an executor.

Kohsuke Kawaguchi added a comment - 2007-10-22 18:08

Right. The parent build needs to be run outside of the normal executors so that
it can let child builds use that executor.

In the mean time, a work around is to tie the parent build to somewhere where
more executors are available.

Kohsuke Kawaguchi added a comment - 2007-10-22 18:08 Right. The parent build needs to be run outside of the normal executors so that it can let child builds use that executor. In the mean time, a work around is to tie the parent build to somewhere where more executors are available.

Régis Desgroppes added a comment - 2007-10-23 01:40

I don't see how to put in place the workaround you propose, in that the
multi-project configuration page doesn't allow to tie the parent build to a
dedicated node.

Régis Desgroppes added a comment - 2007-10-23 01:40 I don't see how to put in place the workaround you propose, in that the multi-project configuration page doesn't allow to tie the parent build to a dedicated node.

Kohsuke Kawaguchi added a comment - 2007-10-29 17:35

- - Issue 961 has been marked as a duplicate of this issue. ***

Kohsuke Kawaguchi added a comment - 2007-10-29 17:35 Issue 961 has been marked as a duplicate of this issue. ***

Kohsuke Kawaguchi added a comment - 2008-03-19 22:54

- - Issue 1432 has been marked as a duplicate of this issue. ***

Kohsuke Kawaguchi added a comment - 2008-03-19 22:54 Issue 1432 has been marked as a duplicate of this issue. ***

musilt2 added a comment - 2008-04-18 04:00

- - Issue 1561 has been marked as a duplicate of this issue. ***

musilt2 added a comment - 2008-04-18 04:00 Issue 1561 has been marked as a duplicate of this issue. ***

musilt2 added a comment - 2008-04-18 04:04

what's the current status of this issue? any time-frame when the fix will be
available?

musilt2 added a comment - 2008-04-18 04:04 what's the current status of this issue? any time-frame when the fix will be available?

mirilovic added a comment - 2008-04-18 08:35

It would be great to have a possibility to tie the parent job to particular
machine - and I personally think the best one would be Master.

mirilovic added a comment - 2008-04-18 08:35 It would be great to have a possibility to tie the parent job to particular machine - and I personally think the best one would be Master.

Lloyd Chang added a comment - 2008-07-06 23:56

Kohsuke, unless I'm mistaken, a code change in the Hudson Core is needed to even
try the work-around.

Like others, I'm not sure how to configure Hudson to tie the parent build to a
specific node. In node selection, I selected 1 label for multiple axes to use,
but I don't see any options for the parent build.

Lloyd Chang added a comment - 2008-07-06 23:56 Kohsuke, unless I'm mistaken, a code change in the Hudson Core is needed to even try the work-around. Like others, I'm not sure how to configure Hudson to tie the parent build to a specific node. In node selection, I selected 1 label for multiple axes to use, but I don't see any options for the parent build.

SCM/JIRA link daemon added a comment - 2009-07-18 17:39

Code changed in hudson
User: : kohsuke
Path:
http://fisheye4.cenqua.com/changelog/hudson/?cs=19911
Log:
~~JENKINS-936~~ Created a branch to experiment with the solution.

SCM/JIRA link daemon added a comment - 2009-07-18 17:39 Code changed in hudson User: : kohsuke Path: http://fisheye4.cenqua.com/changelog/hudson/?cs=19911 Log: JENKINS-936 Created a branch to experiment with the solution.

Kohsuke Kawaguchi added a comment - 2009-07-18 18:45

Note to myself. There's two ways to do this.

One is to just add one more executor temporarily to compensate the effect. This
is easy, but the downside is that the added executor may end up doing something
else, and it might take a bit of time before they get released. Plus this will
cause a UI discrepancy between the user setting vs what they see.

Another is to add subtype of Executor and let it run just so that it can execute
the parent build. The trick is to find a situation where an executor shouldn't
run (like Hudson is shutting down) so that we won't lose the item.

Kohsuke Kawaguchi added a comment - 2009-07-18 18:45 Note to myself. There's two ways to do this. One is to just add one more executor temporarily to compensate the effect. This is easy, but the downside is that the added executor may end up doing something else, and it might take a bit of time before they get released. Plus this will cause a UI discrepancy between the user setting vs what they see. Another is to add subtype of Executor and let it run just so that it can execute the parent build. The trick is to find a situation where an executor shouldn't run (like Hudson is shutting down) so that we won't lose the item.

Ringo De Smet added a comment - 2009-07-19 01:37

Kohsuke, from your last comment, I think your second suggestion is a better solution. Just add a "virtual"
executor to the master that can run the composite builds, and also making the composite build of type
"virtual".

Ringo De Smet added a comment - 2009-07-19 01:37 Kohsuke, from your last comment, I think your second suggestion is a better solution. Just add a "virtual" executor to the master that can run the composite builds, and also making the composite build of type "virtual".

SCM/JIRA link daemon added a comment - 2009-07-20 18:58

Code changed in hudson
User: : kohsuke
Path:
branches/matrix-parent/core/src/main/java/hudson/matrix/MatrixProject.java
branches/matrix-parent/core/src/main/java/hudson/model/Computer.java
branches/matrix-parent/core/src/main/java/hudson/model/Executor.java
branches/matrix-parent/core/src/main/java/hudson/model/OneOffExecutor.java
branches/matrix-parent/core/src/main/java/hudson/model/Queue.java
branches/matrix-parent/core/src/main/resources/lib/hudson/executors.jelly
http://fisheye4.cenqua.com/changelog/hudson/?cs=19998
Log:
~~JENKINS-936~~ I believe this should do.

SCM/JIRA link daemon added a comment - 2009-07-20 18:58 Code changed in hudson User: : kohsuke Path: branches/matrix-parent/core/src/main/java/hudson/matrix/MatrixProject.java branches/matrix-parent/core/src/main/java/hudson/model/Computer.java branches/matrix-parent/core/src/main/java/hudson/model/Executor.java branches/matrix-parent/core/src/main/java/hudson/model/OneOffExecutor.java branches/matrix-parent/core/src/main/java/hudson/model/Queue.java branches/matrix-parent/core/src/main/resources/lib/hudson/executors.jelly http://fisheye4.cenqua.com/changelog/hudson/?cs=19998 Log: JENKINS-936 I believe this should do.

Kohsuke Kawaguchi added a comment - 2009-07-21 13:57

Hudson 1.317 will include the fix for this, but because of a potential impact to
users, the fix is disabled by default for now. I'd like interested parties to
enable this (by setting the system property
hudson.model.Hudson.flyweightSupport=true on Hudson JVM), and report back if
this is working OK for you.

If the fix appears to work without any side effect, I'll enable the fix by default.

Kohsuke Kawaguchi added a comment - 2009-07-21 13:57 Hudson 1.317 will include the fix for this, but because of a potential impact to users, the fix is disabled by default for now. I'd like interested parties to enable this (by setting the system property hudson.model.Hudson.flyweightSupport=true on Hudson JVM), and report back if this is working OK for you. If the fix appears to work without any side effect, I'll enable the fix by default.

hydraswitch added a comment - 2009-08-27 12:15

I set the property as described and hit Build Now for my multi configuration job.
The one Node that I did not select the job for is now showing a Dead thread?
or something. Not sure what happened. It goes away if I stop and restart hudson.
It comes back again each time the job runs. It seems to run correctly on the nodes
that I checked it on for.

hydraswitch added a comment - 2009-08-27 12:15 I set the property as described and hit Build Now for my multi configuration job. The one Node that I did not select the job for is now showing a Dead thread? or something. Not sure what happened. It goes away if I stop and restart hudson. It comes back again each time the job runs. It seems to run correctly on the nodes that I checked it on for.

emmulator added a comment - 2009-09-23 10:50

I was hoping this might help with the problems I've encountered in Issue 1022.
It does seem to address the problem of the deadlock, but without the ability to
have the parent job run on a different slave node from any of the children, the
interaction with perforce still causes one of the children to not sync its
workspace.

I found this issue when looking for how I would expose the
assignedNode/hasSlaveAffinity property of a MatrixProject, in the hopes that if
I could tie the parent to a particular node, it would no longer interfere with
the children. I've noticed that a few other people in this ticket were thinking
along the same lines, but you have chosen this virtual executor approach
instead. Is there a reason it would not be desirable to be able to tie the
parent to a particular node? And even if you don't like that approach for the
general release, could you please point me to where in the source I would go to
expose that property? I realize that the 'real' solution to Issue 1022 probably
involves modifying the perforce plugin, but that seems more complicated, and I
was hoping this would work as a workaround in the meantime.

Thanks!

emmulator added a comment - 2009-09-23 10:50 I was hoping this might help with the problems I've encountered in Issue 1022. It does seem to address the problem of the deadlock, but without the ability to have the parent job run on a different slave node from any of the children, the interaction with perforce still causes one of the children to not sync its workspace. I found this issue when looking for how I would expose the assignedNode/hasSlaveAffinity property of a MatrixProject, in the hopes that if I could tie the parent to a particular node, it would no longer interfere with the children. I've noticed that a few other people in this ticket were thinking along the same lines, but you have chosen this virtual executor approach instead. Is there a reason it would not be desirable to be able to tie the parent to a particular node? And even if you don't like that approach for the general release, could you please point me to where in the source I would go to expose that property? I realize that the 'real' solution to Issue 1022 probably involves modifying the perforce plugin, but that seems more complicated, and I was hoping this would work as a workaround in the meantime. Thanks!

huybrechts added a comment - 2009-09-27 09:24

When using the experimental flyweight support, I noticed that one of my (drools)
builds was scheduled on an offline jnlp slave. It was very hard to detect, since
the Drools plugin does not actually use the slave. Because the slave was
offline, Computer.defaultCharset was null, which results in an NPE in Run when
the logfile is created.

huybrechts added a comment - 2009-09-27 09:24 When using the experimental flyweight support, I noticed that one of my (drools) builds was scheduled on an offline jnlp slave. It was very hard to detect, since the Drools plugin does not actually use the slave. Because the slave was offline, Computer.defaultCharset was null, which results in an NPE in Run when the logfile is created.

mdonohue added a comment - 2009-10-01 23:04

- - Issue 4552 has been marked as a duplicate of this issue. ***

mdonohue added a comment - 2009-10-01 23:04 Issue 4552 has been marked as a duplicate of this issue. ***

nairb774 added a comment - 2009-11-06 20:46

Digging through the code today I found this feature and turned it on. Sadly, I
ran into the same problem hydraswitch did. Looking into why the executor/thread
died shows nothing on the dead page, but in the main hudson log I saw the following:

Nov 6, 2009 10:40:36 PM hudson.ExpressionFactory2$JexlExpression evaluate
WARNING: Caught exception evaluating: h.printThrowable(it.causeOfDeath). Reason:
java.lang.NullPointerException
java.lang.NullPointerException
at hudson.Functions.printThrowable(Functions.java:916)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
<<SNIP - Full stack can be provided on request, but it is mostly non-descriptive
jelly frames>>

I am running 1.330 currently and would love for this to work. I took a look
over the code and the code changes and I can't seem to see where this might be
falling flat. Anything I can do to help diagnose what hydraswitch and I are seeing?

Brian

nairb774 added a comment - 2009-11-06 20:46 Digging through the code today I found this feature and turned it on. Sadly, I ran into the same problem hydraswitch did. Looking into why the executor/thread died shows nothing on the dead page, but in the main hudson log I saw the following: Nov 6, 2009 10:40:36 PM hudson.ExpressionFactory2$JexlExpression evaluate WARNING: Caught exception evaluating: h.printThrowable(it.causeOfDeath). Reason: java.lang.NullPointerException java.lang.NullPointerException at hudson.Functions.printThrowable(Functions.java:916) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) <<SNIP - Full stack can be provided on request, but it is mostly non-descriptive jelly frames>> I am running 1.330 currently and would love for this to work. I took a look over the code and the code changes and I can't seem to see where this might be falling flat. Anything I can do to help diagnose what hydraswitch and I are seeing? Brian

nairb774 added a comment - 2009-11-08 10:25

I think I nailed down where the dead executor is coming from. In OneOffExecutor
you have in the constructor:

super(owner, -1);
this.item = item;

In the super constructor (Executor), the last line is a call to Thread.start.
If the thread is able to start and complete shouldRun before the "this.item =
item" line is run, the executor finishes out and shows up dead. On a
multi-processor computer this race condition is quite common because of the lack
of locking/fencing.

Two solutions to this, one move the start call out of the executor constructor,
and the other is to put in necessary locking around the setting of that field.
I would argue that the best solution be the moving of the start method, but that
might be better saved for an enhancement at a later time.

I will attach a patch in a few minutes that should fix this dead executor problem.

nairb774 added a comment - 2009-11-08 10:25 I think I nailed down where the dead executor is coming from. In OneOffExecutor you have in the constructor: super(owner, -1); this.item = item; In the super constructor (Executor), the last line is a call to Thread.start. If the thread is able to start and complete shouldRun before the "this.item = item" line is run, the executor finishes out and shows up dead. On a multi-processor computer this race condition is quite common because of the lack of locking/fencing. Two solutions to this, one move the start call out of the executor constructor, and the other is to put in necessary locking around the setting of that field. I would argue that the best solution be the moving of the start method, but that might be better saved for an enhancement at a later time. I will attach a patch in a few minutes that should fix this dead executor problem.

nairb774 added a comment - 2009-11-08 10:35

Created an attachment (id=995)
Patch to provide proper locking for OneOffExecutor

nairb774 added a comment - 2009-11-08 10:35 Created an attachment (id=995) Patch to provide proper locking for OneOffExecutor

hydraswitch added a comment - 2009-11-09 09:17

Here's to hoping this will appear in a build soon!
I'd really like to be able to try it out, but considering all I have
going on at the moment, I'll not be trying to build Hudson from svn.

Thanks for the work on this Brian!

hydraswitch added a comment - 2009-11-09 09:17 Here's to hoping this will appear in a build soon! I'd really like to be able to try it out, but considering all I have going on at the moment, I'll not be trying to build Hudson from svn. Thanks for the work on this Brian!

mdillon added a comment - 2009-11-10 17:21

Perhaps it would be better to just move the start() call out of the Executor
constructor. The only place that "new Executor(...)" is called is in
Computer.setNumExecutors(int) and the only class that extends Executor is
OneOffExecutor. The text "new Executor" or "extends Executor" doesn't show up in
any of the plugins either.

It seems like moving the call to Executor.start() from the Executor constructor
to the while loop in setNumExecutors would be workable and simpler solution.

mdillon added a comment - 2009-11-10 17:21 Perhaps it would be better to just move the start() call out of the Executor constructor. The only place that "new Executor(...)" is called is in Computer.setNumExecutors(int) and the only class that extends Executor is OneOffExecutor. The text "new Executor" or "extends Executor" doesn't show up in any of the plugins either. It seems like moving the call to Executor.start() from the Executor constructor to the while loop in setNumExecutors would be workable and simpler solution.

mdillon added a comment - 2009-11-10 17:26

In addition, Computer.startFlyWeightTask would also have to know to call start().

Another possibility would be to add an alternate Executor constructor that takes
something like this:

class Initializer<E extends Executor> {
void initialize(E e);
}

This thing could be called from the Executor constructor before calling start().
Subclasses would be expected to put anything they wanted to initialize before
start() is called in a new Initializer.

mdillon added a comment - 2009-11-10 17:26 In addition, Computer.startFlyWeightTask would also have to know to call start(). Another possibility would be to add an alternate Executor constructor that takes something like this: class Initializer<E extends Executor> { void initialize(E e); } This thing could be called from the Executor constructor before calling start(). Subclasses would be expected to put anything they wanted to initialize before start() is called in a new Initializer.

nairb774 added a comment - 2009-11-10 18:04

I would suggest the start method is moved. The one downside to the initalizer
approach is that from within the initalizer you will not be able to set final
fields.

Also, a good rule of thumb is to not expose objects to other threads until the
constructor returns. Looking in the JLS 17.5 there are the following lines:

"The detailed semantics of final fields are somewhat different from those of
normal fields. In particular, compilers have a great deal of freedom to move
reads of final fields across synchronization barriers and calls to arbitrary or
unknown methods. Correspondingly, compilers are allowed to keep the value of a
final field cached in a register and not reload it from memory in situations
where a non-final field would have to be reloaded."

and the comment:

"An object is considered to be completely initialized when its constructor
finishes. A thread that can only see a reference to an object after that object
has been completely initialized is guaranteed to see the correctly initialized
values for that object's final fields."

The implications (as I read it - and I am by no means an expert) is that even
with locking and other thread safe actions the only way to be sure final fields
will be properly visible is after the completion of the constructor. Any time
earlier then that you are left exposed to the whims compiler and caching
performed by the cpu.

We could provide a smarter start method and have it return the executor itself.
I.e: public Executor start()

{ super.start(); return this; }

By doing this you
can chain the constructor and the start in a nice compact manner:
executors.add(new Executor(...).start()); which is what this feels like it was
trying to do by placing start at the end of the constructor.

nairb774 added a comment - 2009-11-10 18:04 I would suggest the start method is moved. The one downside to the initalizer approach is that from within the initalizer you will not be able to set final fields. Also, a good rule of thumb is to not expose objects to other threads until the constructor returns. Looking in the JLS 17.5 there are the following lines: "The detailed semantics of final fields are somewhat different from those of normal fields. In particular, compilers have a great deal of freedom to move reads of final fields across synchronization barriers and calls to arbitrary or unknown methods. Correspondingly, compilers are allowed to keep the value of a final field cached in a register and not reload it from memory in situations where a non-final field would have to be reloaded." and the comment: "An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields." The implications (as I read it - and I am by no means an expert) is that even with locking and other thread safe actions the only way to be sure final fields will be properly visible is after the completion of the constructor. Any time earlier then that you are left exposed to the whims compiler and caching performed by the cpu. We could provide a smarter start method and have it return the executor itself. I.e: public Executor start() { super.start(); return this; } By doing this you can chain the constructor and the start in a nice compact manner: executors.add(new Executor(...).start()); which is what this feels like it was trying to do by placing start at the end of the constructor.

nairb774 added a comment - 2009-11-11 11:44

Another note from testing, the isIdle() method and related methods will need to
be updated. I have a plugin which interfaces with our internal cloud system and
uses the idle state of the computers to manage undeployment/retention. Fun
surprise when the computer hosting the parent project of a matrix project goes
away because it is "idle". I'll post a patch this evening of my fix if I get a
chance, but I am pretty sure it is missing a few changes to make it a proper
solution.

nairb774 added a comment - 2009-11-11 11:44 Another note from testing, the isIdle() method and related methods will need to be updated. I have a plugin which interfaces with our internal cloud system and uses the idle state of the computers to manage undeployment/retention. Fun surprise when the computer hosting the parent project of a matrix project goes away because it is "idle". I'll post a patch this evening of my fix if I get a chance, but I am pretty sure it is missing a few changes to make it a proper solution.

mdillon added a comment - 2009-11-12 10:01

Created an attachment (id=1006)
Patch to add an ExecutorList class

mdillon added a comment - 2009-11-12 10:01 Created an attachment (id=1006) Patch to add an ExecutorList class

mdillon added a comment - 2009-11-12 10:02

I've attached a patch with an alternate approach for the start() issue.

The patch adds an ExecutorList subclass of CopyOnWriteArrayList that will ensure
that the Executor has been started before it is added to the list and made
available to the rest of Hudson. This way, the start() is still contained to a
single class, but Executor can be safely subclassed.

mdillon added a comment - 2009-11-12 10:02 I've attached a patch with an alternate approach for the start() issue. The patch adds an ExecutorList subclass of CopyOnWriteArrayList that will ensure that the Executor has been started before it is added to the list and made available to the rest of Hudson. This way, the start() is still contained to a single class, but Executor can be safely subclassed.

mdillon added a comment - 2009-11-12 10:06

P.S. I only implemented add(E). If the add(int,E) and set(int,E) methods need to
be overridden too, I can update the patch. Looking at the docs for AbstractList,
it looks like overriding add(int,E) would have been preferable to overriding
add(E)... I guess I'll update the patch.

mdillon added a comment - 2009-11-12 10:06 P.S. I only implemented add(E). If the add(int,E) and set(int,E) methods need to be overridden too, I can update the patch. Looking at the docs for AbstractList, it looks like overriding add(int,E) would have been preferable to overriding add(E)... I guess I'll update the patch.

mdillon added a comment - 2009-11-12 10:09

Man, I hate when I comment before I finish looking at something thoroughly...

CopyOnWriteArrayList doesn't extend AbstractList, so it isn't clear what needs
to be overridden for a comprehensive fix without looking at the source. I guess
I'll leave this patch as-is for now.

mdillon added a comment - 2009-11-12 10:09 Man, I hate when I comment before I finish looking at something thoroughly... CopyOnWriteArrayList doesn't extend AbstractList, so it isn't clear what needs to be overridden for a comprehensive fix without looking at the source. I guess I'll leave this patch as-is for now.

emmulator added a comment - 2009-11-12 12:24

I have been using the system property
hudson.model.Hudson.flyweightSupport=true successfully for a couple of weeks now
to prevent the deadlock. I haven't put the Hudson cluster into 'production'
yet, but I have been running several builds a day as I determine how best to
utilize and configure Hudson.

I still think it might help with issue 1022 to be able to assign the parent Job
to a particular Node. But I have definitely seen child Jobs of the same Matrix
build running on the same Node as the parent without any deadlock.

emmulator added a comment - 2009-11-12 12:24 I have been using the system property hudson.model.Hudson.flyweightSupport=true successfully for a couple of weeks now to prevent the deadlock. I haven't put the Hudson cluster into 'production' yet, but I have been running several builds a day as I determine how best to utilize and configure Hudson. I still think it might help with issue 1022 to be able to assign the parent Job to a particular Node. But I have definitely seen child Jobs of the same Matrix build running on the same Node as the parent without any deadlock.

hydraswitch added a comment - 2009-11-12 12:41

Regarding the comment from emmulator -

At least for my situation, the problem is that I do not want the
jobs to run on the master. I want to be able to setup a multi-configuration
job that I specify the slave(s) it should be run on. Currently, for
multi-configuration jobs - Hudson insists on running the job on the master
as well. The deadlock occurs as a result of using 1 executor.
I was seeing dead threads when I turned on flyweightSupport.

Ideally, what I want is:

1 executor per slave (dependency issues in my build tree)
1 or more executors per master (only 1 master)
no actual build jobs to run on the master, only on the slaves

And I'll also add that I'm hopeful that with all the activity on this bug
we'll eventually get to something that works for me. In a week or two,
things will calm down enough here that I'll be able to build hudson from
source and apply that patches to try them out. But I'm still hoping that the
fix will appear in a "released" build before then.

hydraswitch added a comment - 2009-11-12 12:41 Regarding the comment from emmulator - At least for my situation, the problem is that I do not want the jobs to run on the master. I want to be able to setup a multi-configuration job that I specify the slave(s) it should be run on. Currently, for multi-configuration jobs - Hudson insists on running the job on the master as well. The deadlock occurs as a result of using 1 executor. I was seeing dead threads when I turned on flyweightSupport. Ideally, what I want is: 1 executor per slave (dependency issues in my build tree) 1 or more executors per master (only 1 master) no actual build jobs to run on the master, only on the slaves And I'll also add that I'm hopeful that with all the activity on this bug we'll eventually get to something that works for me. In a week or two, things will calm down enough here that I'll be able to build hudson from source and apply that patches to try them out. But I'm still hoping that the fix will appear in a "released" build before then.

mdillon added a comment - 2009-11-13 14:26

I took a minute to look at the code for CopyOnWriteArrayList. Unfortunately,
there is pretty much no code shared between any of the mutator methods, so the
only comprehensive way to do this would be to go and put Executor.start() calls
into an overridden method for each of them.

Since this class is intended for limited use inside of Hudson, it seems
reasonable to me to just override the ones other than add(E) to throw
UnsupportedOperationException instead. They can always be implemented later if
they are needed.

mdillon added a comment - 2009-11-13 14:26 I took a minute to look at the code for CopyOnWriteArrayList. Unfortunately, there is pretty much no code shared between any of the mutator methods, so the only comprehensive way to do this would be to go and put Executor.start() calls into an overridden method for each of them. Since this class is intended for limited use inside of Hudson, it seems reasonable to me to just override the ones other than add(E) to throw UnsupportedOperationException instead. They can always be implemented later if they are needed.

pcc added a comment - 2009-11-24 01:28

While testing the flyweight support I found that Hudson can sometimes schedule
the parent job on an offline slave leading to an NPE. This seems related to the
issue huybrechts reported. I developed a patch to prevent this which I am
attaching to this issue.

java.lang.NullPointerException
at hudson.model.Slave.createLauncher(Slave.java:309)
at hudson.model.AbstractBuild$AbstractRunner.createLauncher(AbstractBuild.java:417)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:369)
at hudson.model.Run.run(Run.java:1198)
at hudson.matrix.MatrixBuild.run(MatrixBuild.java:149)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:123)
at hudson.model.OneOffExecutor.run(OneOffExecutor.java:60)

pcc added a comment - 2009-11-24 01:28 While testing the flyweight support I found that Hudson can sometimes schedule the parent job on an offline slave leading to an NPE. This seems related to the issue huybrechts reported. I developed a patch to prevent this which I am attaching to this issue. java.lang.NullPointerException at hudson.model.Slave.createLauncher(Slave.java:309) at hudson.model.AbstractBuild$AbstractRunner.createLauncher(AbstractBuild.java:417) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:369) at hudson.model.Run.run(Run.java:1198) at hudson.matrix.MatrixBuild.run(MatrixBuild.java:149) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:123) at hudson.model.OneOffExecutor.run(OneOffExecutor.java:60)

pcc added a comment - 2009-11-24 01:31

Created an attachment (id=1036)
[PATCH] Do not attempt to start flyweight tasks on offline nodes

pcc added a comment - 2009-11-24 01:31 Created an attachment (id=1036) [PATCH] Do not attempt to start flyweight tasks on offline nodes

SCM/JIRA link daemon added a comment - 2009-11-27 16:57

Code changed in hudson
User: : kohsuke
Path:
trunk/hudson/main/core/src/main/java/hudson/model/Queue.java
http://fisheye4.cenqua.com/changelog/hudson/?cs=24149
Log:
~~JENKINS-936~~ Applied the patch from pcc.

SCM/JIRA link daemon added a comment - 2009-11-27 16:57 Code changed in hudson User: : kohsuke Path: trunk/hudson/main/core/src/main/java/hudson/model/Queue.java http://fisheye4.cenqua.com/changelog/hudson/?cs=24149 Log: JENKINS-936 Applied the patch from pcc.

Kohsuke Kawaguchi added a comment - 2009-11-27 18:48

Incorrectly marked as resolved due to a bug in scm_issue_link.

Kohsuke Kawaguchi added a comment - 2009-11-27 18:48 Incorrectly marked as resolved due to a bug in scm_issue_link.

SCM/JIRA link daemon added a comment - 2009-11-28 07:20

Code changed in hudson
User: : kohsuke
Path:
trunk/hudson/main/core/src/main/java/hudson/model/Computer.java
trunk/hudson/main/core/src/main/java/hudson/model/Executor.java
trunk/hudson/main/core/src/main/java/hudson/model/Hudson.java
trunk/www/changelog.html
http://fisheye4.cenqua.com/changelog/hudson/?cs=24159
Log:
[FIXED JENKINS-936] in 1.337 by moving out the start method from Executor.
Since not many code will own Executors, for now I didn't introduce any wrapper to hide the start method invocation.

SCM/JIRA link daemon added a comment - 2009-11-28 07:20 Code changed in hudson User: : kohsuke Path: trunk/hudson/main/core/src/main/java/hudson/model/Computer.java trunk/hudson/main/core/src/main/java/hudson/model/Executor.java trunk/hudson/main/core/src/main/java/hudson/model/Hudson.java trunk/www/changelog.html http://fisheye4.cenqua.com/changelog/hudson/?cs=24159 Log: [FIXED JENKINS-936] in 1.337 by moving out the start method from Executor. Since not many code will own Executors, for now I didn't introduce any wrapper to hide the start method invocation.

akostadinov added a comment - 2009-12-10 07:13

hallo,

I tested the feature with 1.335

It seem to be working fine except for two issues:
1. When all slaves are marked offline including master, then the matrix parent job is put in the queue waiting. This is totally correct. But once a slave is put online, the job does one of the following:
1.1. (if slave is "utilize as much as possible") executes by an existing executor
1.2. (if slave is "for tied jobs only") waits forever in the queue
2. Currently parent job executes on a random slave. IMHO it should try to run first on a slave that is "utilize as much as possible" or one should be able to specify where the job should execute. Otherwise one cannot guarantee job is not executed on a machine running a performance test and render the results invalid.

Thanks!

akostadinov added a comment - 2009-12-10 07:13 hallo, I tested the feature with 1.335 It seem to be working fine except for two issues: 1. When all slaves are marked offline including master, then the matrix parent job is put in the queue waiting. This is totally correct. But once a slave is put online, the job does one of the following: 1.1. (if slave is "utilize as much as possible") executes by an existing executor 1.2. (if slave is "for tied jobs only") waits forever in the queue 2. Currently parent job executes on a random slave. IMHO it should try to run first on a slave that is "utilize as much as possible" or one should be able to specify where the job should execute. Otherwise one cannot guarantee job is not executed on a machine running a performance test and render the results invalid. Thanks!

akostadinov added a comment - 2009-12-11 02:04

Excuse me for reopening. I looked at "fixed" field and didn't se a version there so thought just installed 1.335 here is recent enough. Now I notice last comment is talking about 1.337. So marking resolved again.

It would be nice to use the "Fixed" field in the future as that is what probably most users are used to.

Regards

akostadinov added a comment - 2009-12-11 02:04 Excuse me for reopening. I looked at "fixed" field and didn't se a version there so thought just installed 1.335 here is recent enough. Now I notice last comment is talking about 1.337. So marking resolved again. It would be nice to use the "Fixed" field in the future as that is what probably most users are used to. Regards

mdonohue added a comment - 2009-12-14 06:54

JENKINS-5076 is a request to avoid 'tied' slaves for the master checkout. Is that fixed in 1.337?

mdonohue added a comment - 2009-12-14 06:54 JENKINS-5076 is a request to avoid 'tied' slaves for the master checkout. Is that fixed in 1.337?

TimoTM added a comment - 2009-12-15 00:40 - edited

We have environment where some slaves (executing tests) do not have svn client installed. After upgrading to 1.337 (I don't know if it was just coincidence), multiconfiguration parent jobs started to get executed on such slaves. At least jobs that have been configured to do SCM polling fail in that point.

Is there no way to tell Hudson not to build the parent on such a slave? ("leave this machine for tied jobs only" has been configured)

TimoTM added a comment - 2009-12-15 00:40 - edited We have environment where some slaves (executing tests) do not have svn client installed. After upgrading to 1.337 (I don't know if it was just coincidence), multiconfiguration parent jobs started to get executed on such slaves. At least jobs that have been configured to do SCM polling fail in that point. Is there no way to tell Hudson not to build the parent on such a slave? ("leave this machine for tied jobs only" has been configured)

Axel Heider added a comment - 2010-08-05 08:55

Problem appears again with Hudson V1.369 wit a windows master and a linux slave node. I have two executors on the slave node and two matrix jobs tied to it (via tie-matrix-partent plugin and all matrix job). When they were started via SVN checkin the matrix job 1 parent took executor 1 and matrix job 2 parent took executor 2. Then both started their matrix-config jobs, but none can run as executors are blocked. I had to kill one parent job to resolve this. However, it keeps happening again.

Axel Heider added a comment - 2010-08-05 08:55 Problem appears again with Hudson V1.369 wit a windows master and a linux slave node. I have two executors on the slave node and two matrix jobs tied to it (via tie-matrix-partent plugin and all matrix job). When they were started via SVN checkin the matrix job 1 parent took executor 1 and matrix job 2 parent took executor 2. Then both started their matrix-config jobs, but none can run as executors are blocked. I had to kill one parent job to resolve this. However, it keeps happening again.

Sagar Khushalani added a comment - 2010-08-05 12:11

I have a project running in multi-configuration mode and it seems to work fine, except that the parent job seems to run on a random slave, which becomes problematic. Is it possible add an option to tie the parent to a particular slave?

Sagar Khushalani added a comment - 2010-08-05 12:11 I have a project running in multi-configuration mode and it seems to work fine, except that the parent job seems to run on a random slave, which becomes problematic. Is it possible add an option to tie the parent to a particular slave?

Axel Heider added a comment - 2010-08-05 12:16

there is a tie-matrix-parent pluging thatr became available some time ago, see http://wiki.jenkins-ci.org/display/JENKINS/Matrix+Tie+Parent+Plugin It will do what you want.

Axel Heider added a comment - 2010-08-05 12:16 there is a tie-matrix-parent pluging thatr became available some time ago, see http://wiki.jenkins-ci.org/display/JENKINS/Matrix+Tie+Parent+Plugin It will do what you want.

Tzuchien added a comment - 2010-10-29 04:40

What is the current status of this long-lived issue, please? According to the release note, this issued has been resolved in 1.337. I didn't have this issue with 1.368, but recently after I've upgraded to 1.382, it has just appeared again.

Tzuchien added a comment - 2010-10-29 04:40 What is the current status of this long-lived issue, please? According to the release note, this issued has been resolved in 1.337. I didn't have this issue with 1.368, but recently after I've upgraded to 1.382, it has just appeared again.

Kohsuke Kawaguchi added a comment - 2011-02-04 17:04

The original issue, which is that the matrix parent build consumes an executor, is fixed indeed in 1.337. In the executor table, the parent build will show up without an executor number — think of it as getting executed in a temporary executor.

People then started talking about other related but different issues, such as the fact that the matrix parent execution cannot be tied to a specific node.

Since this issue is getting overloaded with multiple things, I'm closing this bug once and for all.

If you think you hit a bug, please open a new one.

Thanks!

Kohsuke Kawaguchi added a comment - 2011-02-04 17:04 The original issue, which is that the matrix parent build consumes an executor, is fixed indeed in 1.337. In the executor table, the parent build will show up without an executor number — think of it as getting executed in a temporary executor. People then started talking about other related but different issues, such as the fact that the matrix parent execution cannot be tied to a specific node. Since this issue is getting overloaded with multiple things, I'm closing this bug once and for all. If you think you hit a bug, please open a new one . Thanks!

Régis Desgroppes added a comment - 2011-04-06 08:51

I agree with Kohsuke: closed.

Régis Desgroppes added a comment - 2011-04-06 08:51 I agree with Kohsuke: closed.

Ireneusz Makowski added a comment - 2015-02-22 17:35

I still have this issue on latest LTS. It blocks our executors when Restrict label for matrix job is used!

Ireneusz Makowski added a comment - 2015-02-22 17:35 I still have this issue on latest LTS. It blocks our executors when Restrict label for matrix job is used!

Ireneusz Makowski added a comment - 2015-02-22 17:35

See my latest comment

Ireneusz Makowski added a comment - 2015-02-22 17:35 See my latest comment

Daniel Beck added a comment - 2015-02-22 17:42

It blocks our executors when Restrict label for matrix job is used!

Please file a new issue, and include specific information on how to reproduce this.

https://wiki.jenkins-ci.org/display/JENKINS/How+to+report+an+issue

Daniel Beck added a comment - 2015-02-22 17:42 It blocks our executors when Restrict label for matrix job is used! Please file a new issue, and include specific information on how to reproduce this. https://wiki.jenkins-ci.org/display/JENKINS/How+to+report+an+issue

Jenkins

Details

Description

Attachments

Attachments

Issue Links

Activity

Collapse comment: Kohsuke Kawaguchi added a comment - 2007-10-22 18:08

Expand comment: Kohsuke Kawaguchi added a comment - 2007-10-22 18:08

Collapse comment: Régis Desgroppes added a comment - 2007-10-23 01:40

Expand comment: Régis Desgroppes added a comment - 2007-10-23 01:40

Collapse comment: Kohsuke Kawaguchi added a comment - 2007-10-29 17:35

Expand comment: Kohsuke Kawaguchi added a comment - 2007-10-29 17:35

Collapse comment: Kohsuke Kawaguchi added a comment - 2008-03-19 22:54

Expand comment: Kohsuke Kawaguchi added a comment - 2008-03-19 22:54

Collapse comment: musilt2 added a comment - 2008-04-18 04:00

Expand comment: musilt2 added a comment - 2008-04-18 04:00

Collapse comment: musilt2 added a comment - 2008-04-18 04:04

Expand comment: musilt2 added a comment - 2008-04-18 04:04

Collapse comment: mirilovic added a comment - 2008-04-18 08:35

Expand comment: mirilovic added a comment - 2008-04-18 08:35

Collapse comment: Lloyd Chang added a comment - 2008-07-06 23:56

Expand comment: Lloyd Chang added a comment - 2008-07-06 23:56

Collapse comment: SCM/JIRA link daemon added a comment - 2009-07-18 17:39

Expand comment: SCM/JIRA link daemon added a comment - 2009-07-18 17:39

Collapse comment: Kohsuke Kawaguchi added a comment - 2009-07-18 18:45

Expand comment: Kohsuke Kawaguchi added a comment - 2009-07-18 18:45

Collapse comment: Ringo De Smet added a comment - 2009-07-19 01:37

Expand comment: Ringo De Smet added a comment - 2009-07-19 01:37

Collapse comment: SCM/JIRA link daemon added a comment - 2009-07-20 18:58

Expand comment: SCM/JIRA link daemon added a comment - 2009-07-20 18:58

Collapse comment: Kohsuke Kawaguchi added a comment - 2009-07-21 13:57

Expand comment: Kohsuke Kawaguchi added a comment - 2009-07-21 13:57

Collapse comment: hydraswitch added a comment - 2009-08-27 12:15

Expand comment: hydraswitch added a comment - 2009-08-27 12:15

Collapse comment: emmulator added a comment - 2009-09-23 10:50

Expand comment: emmulator added a comment - 2009-09-23 10:50

Collapse comment: huybrechts added a comment - 2009-09-27 09:24

Expand comment: huybrechts added a comment - 2009-09-27 09:24

Collapse comment: mdonohue added a comment - 2009-10-01 23:04

Expand comment: mdonohue added a comment - 2009-10-01 23:04

Collapse comment: nairb774 added a comment - 2009-11-06 20:46

Expand comment: nairb774 added a comment - 2009-11-06 20:46

Collapse comment: nairb774 added a comment - 2009-11-08 10:25

Expand comment: nairb774 added a comment - 2009-11-08 10:25

Collapse comment: nairb774 added a comment - 2009-11-08 10:35

Expand comment: nairb774 added a comment - 2009-11-08 10:35

Collapse comment: hydraswitch added a comment - 2009-11-09 09:17

Expand comment: hydraswitch added a comment - 2009-11-09 09:17

Collapse comment: mdillon added a comment - 2009-11-10 17:21

Expand comment: mdillon added a comment - 2009-11-10 17:21

Collapse comment: mdillon added a comment - 2009-11-10 17:26

Expand comment: mdillon added a comment - 2009-11-10 17:26

Collapse comment: nairb774 added a comment - 2009-11-10 18:04

Expand comment: nairb774 added a comment - 2009-11-10 18:04

Collapse comment: nairb774 added a comment - 2009-11-11 11:44

Expand comment: nairb774 added a comment - 2009-11-11 11:44

Collapse comment: mdillon added a comment - 2009-11-12 10:01

Expand comment: mdillon added a comment - 2009-11-12 10:01

Collapse comment: mdillon added a comment - 2009-11-12 10:02

Expand comment: mdillon added a comment - 2009-11-12 10:02

Collapse comment: mdillon added a comment - 2009-11-12 10:06

Expand comment: mdillon added a comment - 2009-11-12 10:06

Collapse comment: mdillon added a comment - 2009-11-12 10:09

Expand comment: mdillon added a comment - 2009-11-12 10:09

Collapse comment: emmulator added a comment - 2009-11-12 12:24

Expand comment: emmulator added a comment - 2009-11-12 12:24

Collapse comment: hydraswitch added a comment - 2009-11-12 12:41

Expand comment: hydraswitch added a comment - 2009-11-12 12:41

Collapse comment: mdillon added a comment - 2009-11-13 14:26

Expand comment: mdillon added a comment - 2009-11-13 14:26

Collapse comment: pcc added a comment - 2009-11-24 01:28

Expand comment: pcc added a comment - 2009-11-24 01:28

Collapse comment: pcc added a comment - 2009-11-24 01:31

Expand comment: pcc added a comment - 2009-11-24 01:31

Collapse comment: SCM/JIRA link daemon added a comment - 2009-11-27 16:57

Expand comment: SCM/JIRA link daemon added a comment - 2009-11-27 16:57

Collapse comment: Kohsuke Kawaguchi added a comment - 2009-11-27 18:48

Expand comment: Kohsuke Kawaguchi added a comment - 2009-11-27 18:48