Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-39454

Race condition in WorkUnitContext copying actions vs. Executor picking off a pending item

XMLWordPrintable

      I found that under some timing conditions it could happen that a BuildableItem in the pendings list would be considered a duplication target by scheduleInternal. In other words,

      • makePending is called—an executor is ready
      • scheduleInternal is called on the same Task, and seeing this entry in pendings, considers it a duplicate (ScheduleResult.Existing)
      • onStartExecuting is called

      Normally this would probably not matter. However the WorkUnitContext constructor is called right before makePending, and at this time it makes a (shallow) copy of the actions in the item. If scheduleInternal comes along later and finds some FoldableAction, foldIntoExisting will be called on an item whose action list has already been copied. So when the Executor copies the actions to the Executable, it will be copying the list as it existed prior to the foldIntoExisting call.

      If the foldIntoExisting implementation mutates an Action in place, as for example CauseAction does, this is generally OK because Executor will see the mutated object (let us hope the Action is thread-safe). But if the implementation adds a new action, the update will be silently ignored.

      The symptom that I saw was that sometimes if two threads of an upstream Pipeline build running the build step (or I suppose different builds or even jobs) happened to schedule the same downstream job around the same time, occasionally the resulting downstream Run would get just one BuildTriggerAction when it should have had two, as a result of which the stranded build step would show a status of unsure what happened to downstream build and would just hang even after the downstream build completed.

      The fix seems to simple: do not consider pendings from scheduleInternal. Just create a fresh item if this one is already "out the door". allowNewBuildableTask should anyway ensure that we are not trying to run too many builds in parallel. Note that the time interval between moving into the pending list and actually starting to execute should generally be very short—all Jenkins is doing is starting a new thread via Executor.start—so we are not reducing the system's inherent concurrency much.

            jglick Jesse Glick
            jglick Jesse Glick
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: