
Projects building at the same time despite "Block build when upstream is building" option

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: core
    • Labels: None
    • Environment: Windows and hudson 1.337

      If you have multiple executors (or multiple nodes), then the "Block build when upstream project is building" advanced option does not block the downstream project while the upstream project is building. Instead, the downstream project starts immediately when the upstream project starts to build!

      This is the opposite of the behaviour the help text describes: "When this option is checked, Hudson will prevent the project from building when a dependency of this project is in the queue, or building."

      How to reproduce this issue:
      1. Set up 2 or more executors (the same happens with multiple slave nodes) and set quiet period to 0 (to speed up the test)
      2. Create job1 with these settings:
      -build periodically (or SCM poll), eg. */5 * * * *
      -add a lengthy build step (eg. ping 127.0.0.1 -w 1000 -n 600)
      3. Create job2 with these settings:
      -the same build period as job1 (or at least overlap the build step of job1)
      -set a lengthy build step (eg. ping 127.0.0.1 -w 1000 -n 600)
      -under "Advanced Project Options" check the "Block build when upstream project is building" option!
      -set the "Build after other projects are built" build trigger to job1
      4. Wait until job1 starts
      5. Check job2's build history! It will start building immediately!

      The very same happens when you have multiple slave nodes with one executor each.
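
      For anyone who wants to automate this reproduction, here is a rough regression-test sketch using the Jenkins test harness (JenkinsRule). The class and method names come from the public test-harness and core APIs, but this is an untested illustration of the steps above, not a verified test:

      import hudson.model.FreeStyleProject;
      import hudson.model.Result;
      import hudson.tasks.BuildTrigger;
      import hudson.tasks.Shell;
      import org.junit.Rule;
      import org.junit.Test;
      import org.jvnet.hudson.test.JenkinsRule;

      public class BlockWhenUpstreamBuildingTest {
          @Rule public JenkinsRule j = new JenkinsRule();

          @Test
          public void downstreamShouldWaitForUpstream() throws Exception {
              j.jenkins.setNumExecutors(2);                      // step 1: multiple executors

              FreeStyleProject job1 = j.createFreeStyleProject("job1");
              job1.getBuildersList().add(new Shell("sleep 60")); // lengthy build step

              FreeStyleProject job2 = j.createFreeStyleProject("job2");
              job2.setBlockBuildWhenUpstreamBuilding(true);      // the option under test
              job1.getPublishersList().add(new BuildTrigger("job2", Result.SUCCESS));
              j.jenkins.rebuildDependencyGraph();

              job1.scheduleBuild2(0);
              // While job1 is running, job2 must stay in the queue; assert here
              // that job2 has no build until job1 completes.
          }
      }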

          [JENKINS-5125] Projects building at the same time despite "Block build when upstream is building" option

          Alan Harder added a comment -

          So in your steps, are both job1 and job2 polling SCM and starting at the same time? That would explain your step 5, "start building immediately". Probably job2 is triggered by SCM polling before job1 has started up and begun blocking job2.
          Try removing the SCM polling for job2; when you see job1 start up, click "Build Now" for job2. Do you see the right behavior now?


          mdonohue added a comment -

          If mutual exclusion is needed, then the 'locks and latches' plugin would be more appropriate.

          This feature to block downstream builds is more about efficiency, to avoid triggering downstream jobs more often than necessary. Because it's about efficiency, rather than correctness, I don't think this is a blocker.


          balazsdan added a comment -

          Ok, you are right. I tried to simplify the steps, but the simplification lost the point. My config is much more complex, and I think that config is not working as it should.

          Here is a better reproduction of my config:
          1. Set up 2 or more executors (the same happens with multiple slave nodes) and set quiet period to 120 secs!
          2. Create job1 with these settings:
          -SCM config
          -SCM poll, eg. */5 * * * *
          -add a lengthy build step (eg. ping 127.0.0.1 -w 1000 -n 600)
          3. Create job2 with these settings:
          -under "Advanced Project Options" check the "Block build when upstream project is building" option!
          -set the "Build after other projects are built" build trigger to job1
          4. Wait until job1 starts; it will show: "pending - In the quiet period."
          5. While job1 is in its quiet period, start job2 manually. It will display "pending - Upstream project job1 is already building."
          6. When the quiet period has elapsed, both jobs will start building!

          So it is not required to start both jobs at the same time!
          I think the transition from the "pending - quiet period" state to the "building" state should not start (or allow to start) downstream jobs! Am I right, or am I misunderstanding something?
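
          (Side note: the queue state and the exact "pending" reasons quoted above can be inspected from the script console. A quick sketch using core API calls of this era; getWhy() is assumed to return the human-readable pending reason:)

          // Prints each queued item with the reason it is pending, e.g.
          // "In the quiet period." or "Upstream project job1 is already building."
          for (hudson.model.Queue.Item i : hudson.model.Hudson.getInstance().getQueue().getItems()) {
              System.out.println(i.task.getDisplayName() + " : " + i.getWhy());
          }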


          balazsdan added a comment -

          Guys, I've checked the latest version (1.351). I see this issue has had no progress yet, but I was hoping for a side effect of other fixes.

          So, if a downstream project is blocked by an upstream project during the quiet period, then at the end of the quiet period both jobs start to build. You can also start the downstream project manually during the upstream quiet period; the effect is the same. I think it's a bug, am I right?


          drazi added a comment -

          I think I may have a solution to this issue. This is a showstopper for me so I've been spending some time trying to work out what's going on.

          Queue.contains(Task) checks if the task is in blockedProjects, buildables or waitingList. But tasks are moved from buildables to popped before they start to execute, so there is a short window where Queue.contains(Task) returns false and Job.getLastBuild().isBuilding() also returns false.

          Adding the following code to Queue.contains(Task) fixes this for me. All that's needed is to consider anything in popped still to be in the queue. Tasks don't get removed from popped until after they've created their new Build, so this closes the window.

          for (Item item : popped.keySet()) {
              if (item.task == t)
                  return true;
          }
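
          For context, the patched Queue.contains(Task) would then read roughly as follows. This is a sketch: the pre-existing checks are paraphrased from the description above, not copied verbatim from core.

          public synchronized boolean contains(Task t) {
              // existing checks: blocked, buildable, or still in the quiet period
              if (blockedProjects.containsKey(t) || buildables.containsKey(t))
                  return true;
              for (Item item : waitingList) {
                  if (item.task == t)
                      return true;
              }
              // proposed addition: items handed to an executor but not yet building
              for (Item item : popped.keySet()) {
                  if (item.task == t)
                      return true;
              }
              return false;
          }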
          


          drazi added a comment -

          After more testing, I've found that there is a second bug at play here. In addition to the fix that I described in my previous comment, the following fix is also required:

          Job.isBuilding() needs to be changed to:

          public boolean isBuilding() {
              RunT b = getLastBuild();
              // isLogUpdated() stays true through post-production (e.g. while
              // downstream triggers run), unlike Run.isBuilding(), which goes
              // false as soon as the main build phase completes
              return b != null && b.isLogUpdated();
          }
          

          Say we have three projects, A, B and C.
          A is configured to trigger B and C.
          B is configured to trigger C.

          A completes, so B and C are added to the queue. B starts running immediately, and C is blocked because it depends on B.

          When B completes, it changes its state to POST_PRODUCTION. Run.isBuilding() then returns false for B which allows C to start executing. But this all happens before B triggers C. When the trigger occurs, C is already building so it gets put back into the queue and builds for a second time.

          The fix above causes C not to start building until after B has run its triggers. So when B's triggers run, C is still in the queue and does not get rescheduled.

          With the combination of this change and the previous change to Queue.contains(Task), I can now trigger a build of a complex tree of interdependent modules, without any of them executing more than once.
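
          To see where both fixes bite: the upstream-blocking decision in core amounts to something like the hypothetical helper below (a simplified sketch, not the actual core method; getTransitiveUpstreamProjects() and Queue.contains() are real API of this era).

          // A project with "Block build when upstream project is building" checked
          // is held in the queue while any transitive upstream project is busy.
          // Both halves of this check had a false-negative window: contains()
          // missed "popped" items, and isBuilding() went false before the
          // upstream build's downstream triggers had fired.
          public boolean isBlockedByUpstream(AbstractProject<?, ?> p) {
              Queue queue = Hudson.getInstance().getQueue();
              for (AbstractProject<?, ?> up : p.getTransitiveUpstreamProjects()) {
                  if (up.isBuilding() || queue.contains(up))
                      return true;
              }
              return false;
          }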


          Alan Harder added a comment -

          For the first item you mentioned, see this recent discussion:
          http://hudson.361315.n4.nabble.com/Patch-to-fix-concurrent-build-problem-td2229136.html
          So that "popped" structure added in 1.360 may be refactored soon.


          Alan Harder added a comment -

          Can someone test a dev build of 1.363 and let us know if this issue is resolved?
          http://ci.hudson-labs.org/job/hudson_main_trunk/lastSuccessfulBuild/artifact/main/war/target/hudson.war


          shaberman added a comment -

          I just tried trunk [32088] and am still seeing upstream/downstream builds run at the same time. Timeline:

          1. I pushed out a change that affected both /baseproj/foo.txt and /subproj/bar.txt. They both got queued, but did not run because another project with their lock (locks and latches) was running.
          2. I canceled the job holding the lock
          3. Both baseproj and subproj started building due to their technically-separate-but-same-changeset SCM changes (should not have happened--subproj is downstream of baseproj and has Block Build when Upstream Building checked)
          4. subproj got done first
          5. baseproj got done and triggered downstream builds for subproj, subprojB, and subprojC
          6. All of subproj, subprojB, and subprojC started building (should not have happened; they all try to get the same lock from the locks-and-latches plugin, which basically never works)

          ...even stranger, subproj has two builds running at the same time now. Both say started by the same upstream baseproj build number. Like it fired them twice.

          Also strange, both of them are stuck, as in no console output and turning red because they're supposed to be done by now. Same thing with subprojC, even though subprojB seems to have won the lottery and is running fine.

          subprojB just got done running and subproj (build N) and subproj (build N+1) and subprojC are still stuck. Going to kill them now, I guess.


          drazi added a comment -

          I can confirm that it's not working with trunk [32135].

          I triggered a build of a project that has two downstream projects, one of which depends on the other. When the build completed, the two downstream projects were built concurrently when they should have been built in sequence.


          shaberman added a comment -

          I just had multiple builds of the same job happen at the same time and both get locked up. This also happened on a 16 Jun 2010 trunk build, but I don't see any commits between then and now that would change the behavior.

          It is odd that I never saw concurrent/stuck builds before this--from my standpoint, the "Patch to fix concurrent build problem" actually introduced the bug it was supposed to be fixing.


          shaberman added a comment -

          Running [32268], this seems to be working for me. FWIW, I have synchronous polling turned on.

          drazi: this bug is not about downstream projects building in sequence vs. concurrently. AFAIK, you need something like the locks & latches plugin for that.


          balazsdan added a comment -

          Unfortunately, the fix is not working for me! Tried 1.364.
          As I wrote in my previous comment, after job1's "pending - quiet period" has elapsed, job2 starts immediately. So job1 is still building, but job2 starts.


          Daniel Beck added a comment -

          Is anyone still experiencing this problem on recent Jenkins versions?


          Jason Davis added a comment -

          Yes - 1.570 (Windows). Also left a comment in JENKINS-5150


          pjdarton added a comment -

          Yes.
          I have a lot of chained jobs and I've noticed that if A triggers B which triggers C, then whilst B and C won't build while A is building, they may well build whilst A is pending, and C might well start to build when A stops and just before B starts.
          i.e. it seems to be a race condition where the blocking build is stopping and another would-be-blocking build is started "not quite immediately enough".

          My setup is a Windows master and dozens of slaves (mostly Windows, some Linux). Can't remember (offhand) the version number, but it's within a month or two of the latest.


          Daniel Beck added a comment -

          pjdarton: Seems to behave as designed. There is no guarantee AFAIK that when A finishes, B takes precedence over C. That also seems to be the problem jedavis is experiencing. Note that the original issue report is very different: In it, pjdarton's "A" and "B" would build at the same time!


          pjdarton added a comment -

          I can't comment on whether or not this is "working as designed", as I've never seen any design specifications for Jenkins, but if this is the case then it simply means that this is a design fault instead of a coding fault. It's still a fault; it's just that the "finger of blame" points at the design/algorithm instead of the coding itself.

          e.g. I've got a build "A" that creates a library, and a build "B" which consumes that library and provides a second library, and a build "C" which consumes both of those.
          How do I tell Jenkins not to build C until after B has built, and not to build B until after A has finished building, if not through this "block build when upstream is building" option (combined with the "build this one after that one" option)? That is, after all, what that "block build" functionality is intended to give me, is it not?
          If Jenkins goes running builds on C after A but before B then, when C builds, C will get a mismatched A & B and may get a build failure (followed by a working build once the second build has gone through).

          I /had/ rather expected that, if I told Jenkins that B depended on A, and C depended on B, and not to build either B or C "whilst an upstream build was building", then that should have stopped C from building until the builds of A and B had all settled down nicely, and this is "mostly" what happens, except for some race conditions. I don't like race conditions causing my builds to go red - I want my builds to only ever go red when there's a problem with my code, not just because Jenkins built something when it shouldn't have done.
          (I have enough trouble persuading my fellow developers that red builds mean there's a problem with their code, without Jenkins causing spurious redness)

          I maintain that this is a bug - I very much doubt that this is deliberate behaviour - it makes no sense as sometimes it blocks builds and sometimes it doesn't, and it's all down to timing. A build engine like Jenkins shouldn't leave behaviour down to "timing", it should be very very predictable and leave nothing to chance.
          One should be able to check-in changes to the SCMs for A, B and C all at once to make a set of self-consistent changes, and then walk away and let Jenkins build A, then B, then C in that order.


          Daniel Beck added a comment -

          pjdarton: Could you please confirm the following: At no time are builds of the projects 'A', 'B', and 'C' executing at the same time. If so, what you're experiencing (execution order different from what you're expecting) is a completely different issue from the one reported here, so please file it separately. Link it to this one so others can find it more easily.

          Is anyone experiencing the issue reported here, i.e. projects with the 'Block when X is building' option set still build at the same time, in recent Jenkins versions?


          Daniel Beck added a comment -

          Changed issue title to make it clearer what exactly isn't working.


          Marko Macek added a comment -

          I just encountered it now in Jenkins 1.574... not sure if that counts as recent (it was current as of the above date). I will upgrade to the latest now.


          Justin Rodante added a comment -

          Three issues have been sitting in this JIRA for the longest time, all related, all major or critical, and it's still a problem to this day: JENKINS-22800, JENKINS-5125, JENKINS-5150.

          This is a pretty commonly used feature in heavy dependency chains. Without hacking together my own hard-to-manage "Build X after passing" for every one of my jobs, can we fix this long-standing problem?

          We are on Jenkins 2.68.

            Assignee: Kohsuke Kawaguchi
            Reporter: balazsdan
            Votes: 15
            Watchers: 19