[JENKINS-42538] Build Flow jobs stuck waiting on next available executor when using label parameter to restrict job to vSphere slave

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

      Testing Jenkins upgrade from 2.7.2 to 2.32.2 with installed plugins upgraded as well.

      We use a lot of flow jobs, and after the upgrade they no longer work when combined with a Label parameter whose value isn't 'master'; i.e. we can't get a flow job to run on a slave using the Label parameter.

      1. As per the screenshots, create a simple flow job with a single Label parameter.
      2. Build the job with the label value of master and it works.
      3. Rebuild the job using a label for a connected slave that has free executors. The job sits permanently in the Build Queue with 'Waiting for next available executor'.
      4. Alternatively, change the job configuration to no longer have a parameter, but select 'Restrict where this project can be run' and set the slave name. Rebuild the job and the same issue occurs: it sits in the Build Queue.
      5. With the flow job still sitting in the Build Queue, I can submit a separate 'free style project', also configured with a Label parameter for the same slave name, and it completes OK.

      During the upgrade steps neither of the two obvious plugins was upgraded:

      • Node and Label parameter plugin still at 1.7.2
      • Build Flow plugin still at 0.20

      The Node and Label parameter plugin had dependencies that were upgraded, but reverting both of them, and using the Jenkins auto restart option from the plugin page, does not solve the issue.

      • parameterized-trigger Upgraded from 2.32 to 2.33
      • token-macro Upgraded from 1.12.1 to 2.0

      I'm about to restart the entire upgrade process again to try to narrow down which change causes this, but any advice/thoughts would be appreciated.

          Elliott Jones added a comment -

          I thought it could be related to this issue https://issues.jenkins-ci.org/browse/JENKINS-34969 due to similar plugins but nothing on that ticket helps.

          Elliott Jones added a comment -

          Tried this with a clean Windows install of 2.32.2 using just the Build Flow and Node and Label parameter plugins:

          ant 1.4 true false
          antisamy-markup-formatter 1.5 true false
          bouncycastle-api 2.16.0 true false
          build-flow-plugin 0.20 true false
          display-url-api 1.1.1 true false
          external-monitor-job 1.7 true false
          icon-shim 2.0.3 true false
          javadoc 1.4 true false
          jquery 1.11.2-0 true false
          junit 1.20 true false
          ldap 1.14 true false
          mailer 1.19 true false
          matrix-auth 1.4 true false
          matrix-project 1.8 true false
          nodelabelparameter 1.7.2 true false
          pam-auth 1.3 true false
          script-security 1.27 true false
          structs 1.6 true false
          token-macro 2.0 true false
          windows-slaves 1.2 true false

          The issue cannot be reproduced: the flow job completes OK when using a slave label that isn't master. The key difference is that the slave used in this instance was a 'permanent agent' rather than a vSphere slave, as in the environment where the problem was seen. Will try adding that to the mix.
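
One way to narrow down which change matters is to diff two plugin inventories like the one listed above. The following is a minimal sketch, assuming each line has the form `name version ...`; the meaning of the trailing columns is not stated in this ticket, so they are ignored. The helper names `parse_plugins` and `diff_plugins` are hypothetical, and the sample versions are the ones quoted earlier in this report.

```python
def parse_plugins(text):
    """Parse lines of the form 'name version ...' into {name: version}.

    Any columns after the version are ignored (their meaning is an
    unknown here, so this sketch does not interpret them).
    """
    plugins = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            plugins[fields[0]] = fields[1]
    return plugins


def diff_plugins(before, after):
    """Return {name: (old_version, new_version)} for every plugin that differs.

    A plugin missing on one side is reported with None for that side.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Sample data: versions quoted in this ticket, in the same layout as the list above.
before = parse_plugins("""
build-flow-plugin 0.20 true false
nodelabelparameter 1.7.2 true false
parameterized-trigger 2.32 true false
token-macro 1.12.1 true false
""")
after = parse_plugins("""
build-flow-plugin 0.20 true false
nodelabelparameter 1.7.2 true false
parameterized-trigger 2.33 true false
token-macro 2.0 true false
""")

for name, (old, new) in diff_plugins(before, after).items():
    print(f"{name}: {old} -> {new}")
```

Only plugins whose version differs between the two inventories are printed, which gives a short list of candidates to revert one at a time.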

          Elliott Jones added a comment - - edited

          That seems to be the key factor: during my upgrade I upgraded the vSphere Plugin from 2.4 to 2.15. After downgrading back to 2.4, the flow job works correctly with the label parameter. Updating ticket details appropriately.

          Marking as minor because the workaround is simple for me: a plugin downgrade.

          Elliott Jones added a comment -

          I also had this issue https://issues.jenkins-ci.org/browse/JENKINS-41384 and thus had to apply jenkins.slaves.DefaultJnlpSlaveReceiver.disableStrictVerification=true

          pjdarton added a comment -

          elliottjones, can you check whether the issue is absent in version 2.7 but present in version 2.8?

          FYI a change was made that disallowed "flyweight tasks" from running on vSphere nodes. This was done because vSphere nodes can be turned off, restarted etc when they've finished their main work and, if a "flyweight task" happens to be running on them when that happens, it'll break the "flyweight task" process.

          I'm unfamiliar with flow jobs but I do wonder if these make use of "flyweight tasks" to run and that'd be why they're unable to run on vSphere nodes on newer versions of the plugin.

          If that's the case, it'd be relatively straightforward to make the "don't allow flyweight tasks to run here" behavior a configurable tick-box option...
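
The hypothesis above can be illustrated with a toy model. This is a sketch only: it is not Jenkins or vSphere plugin code, and the `Node`, `Task`, `accepts_flyweight`, `can_take` and `schedule` names are all hypothetical. It assumes, as suggested above but not confirmed, that a flow job is a "flyweight task", and that in this model a flyweight task does not need a free regular executor.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    labels: frozenset
    free_executors: int
    # Hypothetical flag modelling "don't allow flyweight tasks to run here".
    accepts_flyweight: bool = True


@dataclass
class Task:
    name: str
    label: str
    flyweight: bool = False


def can_take(node, task):
    """Decide whether this node may run the task in the toy model."""
    if task.label not in node.labels:
        return False
    if task.flyweight:
        # Modelled as independent of free executors; only the flag matters.
        return node.accepts_flyweight
    return node.free_executors > 0


def schedule(task, nodes):
    """Return the name of the first node that can take the task, else None.

    None models the task staying in the build queue.
    """
    for node in nodes:
        if can_take(node, task):
            return node.name
    return None


nodes = [
    Node("master", frozenset({"master"}), free_executors=2),
    Node("vsphere-slave", frozenset({"vsphere-slave"}), free_executors=2,
         accepts_flyweight=False),
]

flow_on_slave = Task("flow-job", "vsphere-slave", flyweight=True)
flow_on_master = Task("flow-job", "master", flyweight=True)
freestyle_on_slave = Task("freestyle-job", "vsphere-slave")

print(schedule(flow_on_slave, nodes))       # None: stays queued despite free executors
print(schedule(flow_on_master, nodes))      # master
print(schedule(freestyle_on_slave, nodes))  # vsphere-slave
```

Under these assumptions the model reproduces the reported symptoms: the flow job runs on master, the free style job runs on the vSphere slave, and the flow job targeted at the vSphere slave waits indefinitely even though executors are free. Making `accepts_flyweight` configurable per node corresponds to the tick-box option proposed above.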

            Assignee: Unassigned
            Reporter: Elliott Jones (elliottjones)