[JENKINS-5150] Race condition for feature "Block build when upstream project is building"

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core
    • Labels: None

      We have the use case of executing multiple build jobs in sequential and/or parallel order, and we want to block downstream jobs while their upstream jobs are still running. We activated the option "Block build when upstream project is building" for all downstream build jobs (mostly freestyle projects), but sometimes downstream jobs are already executing while some (transitive) upstream project is still in the process of being queued. (A downstream job can be triggered by more than one upstream job running in parallel, and those upstream jobs do not finish at the same time.) So there may be a race condition in the queue implementation.
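To make the expected behaviour concrete, here is a toy Python model of the rule as the reporter expects it to work (an illustration, not Jenkins' actual queue code): a queued project is blocked if any of its direct or transitive upstream projects is queued or building. The graph, the project names, and the helpers `transitive_upstreams` and `is_blocked` are hypothetical.

```python
def transitive_upstreams(job, upstreams):
    """All projects reachable from `job` by following upstream edges."""
    seen = set()
    stack = list(upstreams.get(job, ()))
    while stack:
        up = stack.pop()
        if up not in seen:  # the `seen` set also guards against cycles
            seen.add(up)
            stack.extend(upstreams.get(up, ()))
    return seen


def is_blocked(job, upstreams, active):
    """True if any direct or transitive upstream of `job` is active.

    `active` is the (hypothetical) set of projects currently queued or building.
    """
    return bool(transitive_upstreams(job, upstreams) & active)


# Hypothetical chain: A and B run in parallel and both trigger C; C triggers D.
UPSTREAMS = {"C": {"A", "B"}, "D": {"C"}}

print(is_blocked("D", UPSTREAMS, active={"B"}))  # True: B is a transitive upstream of D
print(is_blocked("D", UPSTREAMS, active=set()))  # False
```

If the real check ever works from an incomplete view of which upstreams are queued or building, a downstream can slip through, which matches the symptom described above.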


          Kohsuke Kawaguchi added a comment -

          To fix this properly without a giant lock, we need to be able to take a snapshot of the queue/executor status and base the computation on that snapshot.

          There is also an additional problem: there is a small time window after a build is considered finished and before the build triggers kick in to schedule downstream jobs (this ordering is needed so that downstream builds always see their upstream as completed).
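Both points in this comment can be illustrated with a small Python sketch (hypothetical class and method names, not Jenkins' API). All blocking decisions in one scheduling pass are computed from a single immutable snapshot of three sets, taken under one short lock: queued, building, and "finished but downstream triggers not yet fired". Counting that third state as still active closes the window between a build finishing and its triggers scheduling the downstream. For brevity, only direct upstream edges are checked.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the toy scheduler's state, taken atomically."""
    queued: frozenset
    building: frozenset
    pending_triggers: frozenset  # finished, but downstream triggers not yet fired

    def active(self):
        # An upstream in any of these states still blocks its downstreams.
        return self.queued | self.building | self.pending_triggers


class ToyScheduler:
    """Hypothetical model of a build queue; not Jenkins' Queue API."""

    def __init__(self, upstreams):
        self._upstreams = upstreams  # project -> set of direct upstream projects
        self._lock = threading.Lock()
        self._queued, self._building, self._pending = set(), set(), set()

    def enqueue(self, job):
        with self._lock:
            self._queued.add(job)

    def start(self, job):
        with self._lock:
            self._queued.discard(job)
            self._building.add(job)

    def finish(self, job):
        # The build is done, but its downstream triggers have not run yet.
        with self._lock:
            self._building.discard(job)
            self._pending.add(job)

    def triggers_done(self, job):
        with self._lock:
            self._pending.discard(job)

    def snapshot(self):
        with self._lock:
            return Snapshot(frozenset(self._queued),
                            frozenset(self._building),
                            frozenset(self._pending))

    def runnable(self):
        """Queued projects whose upstreams are all idle, judged against one snapshot."""
        snap = self.snapshot()
        active = snap.active()
        return sorted(job for job in snap.queued
                      if not (self._upstreams.get(job, set()) & active))


s = ToyScheduler({"DOWNSTREAM": {"UP"}})
s.enqueue("UP")
s.enqueue("DOWNSTREAM")
print(s.runnable())  # ['UP']
s.start("UP")
s.finish("UP")
print(s.runnable())  # [] because UP finished but its triggers have not fired yet
s.triggers_done("UP")
print(s.runnable())  # ['DOWNSTREAM']
```

A snapshot only makes the decisions of one pass consistent with each other; a real scheduler would still have to start the chosen builds before the state moves on, or re-validate them, so this is a sketch of the idea rather than a complete fix.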

          Oleg Nenashev added a comment -

          The issue seems to be fixed in newer versions. However, this still needs to be confirmed.

          Oleg Nenashev added a comment - The issue seems to be fixed in newer versions. BTW, a confirmation is required

          Daniel Beck added a comment -

          Can anyone reproduce this issue on Jenkins versions released this year, or is this report obsolete?

          Daniel Beck added a comment - Can anyone reproduce this issue on Jenkins versions released this year, or is this report obsolete?

          Jason Davis added a comment -

          I just noticed this issue in 1.570. I have an upstream project that was choked out (left waiting) for an extended period of time, while downstream builds that are configured to be blocked by that upstream built anyway. The downstream builds entered the queue after the parent, but they built first. Additional new downstream builds compounded the problem until check-ins to those projects quieted down. In some cases, three builds of child projects that, according to the "Block build when upstream project is building" rule, should have run after the parent ran before it.

          Jason Davis added a comment - I just experienced/noticed this issue in 1.570. I have an upstream project that was choked out for an extended period of time while downstream builds that are configured to be blocked by the upstream built anyway. The downstream builds entered the queue after the parent, but they built first. Additional, new downstream builds compounded the problem until checkins in those builds quieted down. In some cases, three builds of children that should have built after the parent built before they should have according to "Block build when upstream project is building" rules.

          Joerg Schwaerzler added a comment -

          We had the same issue while using parallel builds and transitive upstreams. We reduced our build chain so that the downstream that should wait has exactly 3 upstream projects it should wait for.

          • All are triggered at the same time
          • One of those projects is a matrix build job (this is the longest-running of the three)
          • We're currently using Jenkins version 1.609.2

          What happens now is that the downstream waits until the matrix job is almost finished. The downstream is started right before it is triggered again by the matrix parent.
          See the following example. This snippet is from the matrix job that triggers the downstream:

          ...
          2016-01-24 22:19:35 Started calculate disk usage of build
          2016-01-24 22:19:35 Finished Calculation of disk usage of build in 0 seconds
          2016-01-24 22:20:05 Finished Calculation of disk usage of workspace in  29 second
          2016-01-24 22:20:05 Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
          2016-01-24 22:20:05 Triggering a new build of DOWNSTREAM
          2016-01-24 22:20:05 Finished: SUCCESS

          At that point, the job called DOWNSTREAM had already been started once at 2016-01-24 22:19:40, and it was then triggered again at 2016-01-24 22:20:05.
          So even without considering transitive dependencies, there is some issue with "Block build when upstream project is building".
          I hope it can be solved soon. Or can we expect this to be fixed already in the latest LTS?
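The timing in that log can be checked mechanically. The sketch below assumes each console line starts with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp followed by a space, as in the excerpt above; `finish_time` is a hypothetical helper, and the DOWNSTREAM start time is copied from the comment rather than parsed from a log.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp prefix, 19 characters long


def finish_time(log_lines):
    """Return the timestamp of the first 'Finished: <RESULT>' line, or None."""
    for line in log_lines:
        stamp, message = line[:19], line[20:]
        # 'Finished Calculation ...' lines do not match; only the build result does.
        if message.startswith("Finished: "):
            return datetime.strptime(stamp, FMT)
    return None


# Lines copied from the matrix job's console output quoted above.
UPSTREAM_LOG = [
    "2016-01-24 22:19:35 Started calculate disk usage of build",
    "2016-01-24 22:20:05 Triggering a new build of DOWNSTREAM",
    "2016-01-24 22:20:05 Finished: SUCCESS",
]
# Start time of DOWNSTREAM as reported in the comment.
downstream_started = datetime.strptime("2016-01-24 22:19:40", FMT)

upstream_finished = finish_time(UPSTREAM_LOG)
overlap = upstream_finished - downstream_started
print(downstream_started < upstream_finished, overlap.total_seconds())  # True 25.0
```

In other words, DOWNSTREAM started 25 seconds before the upstream matrix build reported its result.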

          Dirk Thomas added a comment -

          I think we have the same problem with the current LTS 1.642.1.

          Please see the detailed description of our case: https://github.com/ros-infrastructure/ros_buildfarm/pull/194

          Dirk Thomas added a comment - I think we have the same problem with the current LTS 1.642.1. Please see the detailed description of our case: https://github.com/ros-infrastructure/ros_buildfarm/pull/194

          Alexey Sergin added a comment (edited) -

          I've hit the same bug on Jenkins version 1.642.4.

          IMHO, the ability to execute jobs concurrently while respecting the dependency graph is a very basic feature. For example, the "make -jN" utility implemented it decades ago. It is very frustrating that this feature is broken in Jenkins.

          Alexey Sergin added a comment - - edited I've hit the same bug on Jenkins version 1.642.4 IMHO, the ability to perform concurrent jobs execution with respect to dependency graph is a very basic feature. For example, it was implemented in "make -jN" utility decades ago. It is very frustrating that this feature is broken in jenkins.
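For comparison with the "make -jN" behaviour mentioned above, Python's standard `graphlib.TopologicalSorter` (3.9+) supports this pattern directly: a node is handed out by `get_ready()` only after all of its predecessors have been marked `done()`. The sketch below pairs it with a thread pool of `jobs` workers. The project graph, `run_graph`, and the `build` callback are hypothetical stand-ins used only to illustrate the analogy, not a proposed Jenkins change.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(upstreams, build, jobs=2):
    """Run build(name) for every project, never before all its upstreams finished.

    `upstreams` maps each project to the set of projects it depends on.
    Returns the projects in the order their builds completed.
    """
    sorter = TopologicalSorter(upstreams)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        running = {}  # future -> project name
        while sorter.is_active():
            # Submit every project whose upstreams are all done.
            for name in sorter.get_ready():
                running[pool.submit(build, name)] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise a failed build
                finished.append(name)
                sorter.done(name)  # may unlock downstream projects
    return finished


# Hypothetical chain: A and B can run in parallel, both feed C; C feeds D.
print(run_graph({"C": {"A", "B"}, "D": {"C"}}, build=lambda name: None))
```

Because a project is submitted only once all of its upstreams are done, C in the example can never start before both A and B have finished, whichever order those two complete in.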

          Justin Rodante added a comment -

          Three issues have been open in this Jira for a very long time, all related, all major or critical, and this is still a problem to this day: JENKINS-22800, JENKINS-5125, JENKINS-5150.

          This is a pretty commonly used feature in heavy dependency chains. Without hacking together my own hard-to-manage "Build X after passing" setup for every one of my jobs, can we fix this long-standing problem?

          We are on Jenkins 2.68.

            Assignee: Unassigned
            Reporter: uroell
            Votes: 14
            Watchers: 15