Some additional observations for this issue:
The Axis2 builds (there are three builds in total, one for each of three branches) use locks. What we see is that sometimes two (or more) builds for different branches were triggered at the same time by the completion of a common upstream build (e.g. Axiom trunk). Since the Axis2 builds use a common lock, one would expect only one of them to start execution while the others remain in the build queue. However, what sometimes happens is that two builds start execution in parallel, with one of them waiting for the lock (i.e. instead of waiting in the build queue, it is assigned to an executor and waits there).
Here is a screenshot that shows the problem (The three Axis2 builds all use the same lock):
http://people.apache.org/~veithen/axis2-builds.png
While the build is waiting on the executor to acquire the lock, it is reported as running on the master. E.g. the axis2-1.6 #225 build showed:
"Started 42 min ago
Build is being executed for 42 min on master"
In that particular case, the axis2-1.6 build eventually completed successfully. However, I think (I'm 90% sure) that I saw another Axis2 build that was triggered after axis2-1.6 #225 but actually started execution before it. That would mean that once blocked builds are assigned to executors, they are no longer executed in FIFO order.
In conclusion, I think that in order to make progress on this issue, one should first concentrate on the problems that occur when using locks:
- The fact that builds waiting for a lock (on an executor) may be reported as running on master makes it hard to debug things if something gets stuck.
- The fact that execution is not FIFO means that a build may appear to be stuck simply because there is a constant flow of other builds in the queue that use the same lock. (Unfortunately, it is not possible to check whether that is a valid explanation for the original issue reported in this JIRA.)
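To make the expected behavior concrete, here is a minimal sketch of what I would expect from the queue: a build whose lock is held stays in the queue and does not occupy an executor, and builds that share a lock are dispatched oldest first. This is only a toy model and not Jenkins or plugin code; the class LockAwareQueue, the method pollRunnable, the lock name "axis2" and the branch name axis2-1.5 are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Toy model only: none of these names exist in Jenkins.
public class LockAwareQueue {

    /** A queued build: its name plus the (hypothetical) lock it needs. */
    static final class Build {
        final String name;
        final String lock;

        Build(String name, String lock) {
            this.name = name;
            this.lock = lock;
        }
    }

    /**
     * Returns the oldest queued build whose lock is free, marks that lock
     * as held and removes the build from the queue. Returns null if every
     * queued build is blocked; blocked builds stay in the queue and never
     * take up an executor.
     */
    static Build pollRunnable(Deque<Build> queue, Set<String> heldLocks) {
        Iterator<Build> it = queue.iterator(); // iterates oldest first
        while (it.hasNext()) {
            Build candidate = it.next();
            if (!heldLocks.contains(candidate.lock)) {
                it.remove();
                heldLocks.add(candidate.lock);
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Deque<Build> queue = new ArrayDeque<>();
        queue.add(new Build("axis2-1.6", "axis2")); // triggered first
        queue.add(new Build("axis2-1.5", "axis2")); // triggered second
        Set<String> held = new HashSet<>();

        Build first = pollRunnable(queue, held);
        System.out.println("executor 1: " + first.name);

        // A second executor is free, but the shared lock is held,
        // so nothing is dispatched and the build stays queued.
        Build second = pollRunnable(queue, held);
        System.out.println("executor 2: " + (second == null ? "idle" : second.name));

        // Once the first build finishes and releases the lock,
        // the oldest waiting build is dispatched.
        held.remove("axis2");
        Build next = pollRunnable(queue, held);
        System.out.println("executor 1: " + next.name);
    }
}
```

With this model, the second executor stays idle and is available for unrelated jobs, and a build is only reported as running once it actually holds the lock.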
Note: these observations were made with the following build: Jenkins ver. 1.447-SNAPSHOT (private-01/01/2012 20:43 GMT-olamy)