[JENKINS-28604] Parallel step with node blocks for the same agent will create a 2nd executor on single-executor slaves

      Create slaves with 1 executor, labels, and "Only build jobs with label restrictions matching this node." Create a build with a dozen long-running parallel steps, each containing a node step that matches the slave labels. In about 10% of the builds executed, Workflow will spawn a 2nd executor on one of these slaves, as evidenced by a 2nd workspace folder, despite the slave having only 1 executor available.
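      A minimal Workflow script sketching this setup (the 'slave' label, branch count, and sleep duration are illustrative assumptions, not taken from the original report):

      def branches = [:]
      for (int i = 0; i < 12; i++) {
          def name = "branch${i}"
          branches[name] = {
              // long-running branch whose node step matches the
              // label of the single-executor slaves
              node('slave') {
                  sleep 600   // placeholder for long-running work (seconds)
              }
          }
      }
      parallel branches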

          A C added a comment -

          Workflow 1.7 has made this problem worse. Now whenever a slave fails a long-running process, Workflow tries to run all the queued parallel branches at once when the node is freed, and generates a new workspace for each one, so I end up with 6 workspaces on a slave machine that has only 1 executor.

          This is sometimes, but not always, correlated with Workflow hanging on a batch file step that failed.

          From the outside, this partly looks like a race condition in node assignment for parallel steps.
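          For anyone checking for this symptom: Jenkins suffixes a workspace that is already in use with @2, @3, and so on, so the extra executors show up as sibling folders such as myjob and myjob@2 under the slave's remote FS root (the job name here is illustrative).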

          Jesse Glick added a comment -

          Sounds like a core bug. Any steps to reproduce?

          A C added a comment - edited

          I don't have the time right now to write a full test case, but here's a cut-down version of the idiom I am trying to use; hopefully it's helpful:

          • 1 restricted master executor with no labels, 2 restricted slaves with labels and 1 executor each (restricted = only run jobs with matching labels)

          try {
              parallel(
                  // parallel takes named branches, each requesting the same label
                  branchA: {
                      node('slave') {
                          <batch step that fails>
                      }
                  },
                  <repeat node('slave') branches several times>
              )
          } catch (Exception e) {
              node('slave') {
                  <another batch step that fails>
              }
              throw e
          }
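
          For reference, a minimal runnable version of that idiom (the branch names, branch count, and the deliberately failing bat command are illustrative assumptions):

          def branches = [:]
          for (int i = 0; i < 4; i++) {
              def name = "branch${i}"
              branches[name] = {
                  node('slave') {
                      bat 'exit /b 1'   // stand-in for the failing batch step
                  }
              }
          }
          try {
              parallel branches
          } catch (Exception e) {
              node('slave') {
                  bat 'exit /b 1'   // stand-in for the post-failure batch step
              }
              throw e
          }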

          A C added a comment -

          After browsing the bug DB some more, one other potentially relevant note: these slaves have a high-latency connection to the master, on the order of many seconds to (rarely) a few minutes.

          Jesse Glick added a comment -

          Might share some underlying cause with JENKINS-28759.

          Jesse Glick added a comment -

          Not known to be reproducible, and no hypothesized cause.

            Assignee: Unassigned
            Reporter: A C (sumdumgai)
            Votes: 0
            Watchers: 7