[JENKINS-61779] Regression: Job stuck in queue waiting forever after upgrade

      I ran "apt dist-upgrade" on my Jenkins build server, which updated Jenkins from 2.204.4 to 2.222.1, and now I am running into issues.

      The major issue is that my throttled builds do not work. I have a build (see attached screenshot)
      that is configured with throttling:

      <hudson.plugins.throttleconcurrents.ThrottleJobProperty plugin="throttle-concurrents@2.0.2">
      <maxConcurrentPerNode>1</maxConcurrentPerNode>
      <maxConcurrentTotal>1</maxConcurrentTotal>
      <categories class="java.util.concurrent.CopyOnWriteArrayList">
      <string>vm-installer</string>
      <string>kickstart-repo</string>
      </categories>
      <throttleEnabled>true</throttleEnabled>
      <throttleOption>category</throttleOption>
      <limitOneJobWithMatchingParams>true</limitOneJobWithMatchingParams>
      <matrixOptions>
      <throttleMatrixBuilds>false</throttleMatrixBuilds>
      <throttleMatrixConfigurations>true</throttleMatrixConfigurations>
      </matrixOptions>
      <paramsToUseForLimit></paramsToUseForLimit>
      <configVersion>1</configVersion>
      </hudson.plugins.throttleconcurrents.ThrottleJobProperty>

      It is set up to build on a single node:

      <hudson.matrix.LabelAxis>
      <name>label</name>
      <values>
      <string>kickstartbuild</string>
      </values>
      </hudson.matrix.LabelAxis>

      The pop-up says that it is waiting for the next available executor on "master", which is the only node satisfying this label. The node is idle, but Jenkins keeps waiting.

      The configuration worked fine before upgrading. Restarting Jenkins does not help.
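
The reason shown in the pop-up is also exposed by the queue's JSON API at `/queue/api/json`, which can be easier to attach to a report than a screenshot. The following is a minimal Python sketch, assuming the server allows anonymous read access; the function names `fetch_queue` and `waiting_reasons` and the URL in the usage note are illustrative and not part of Jenkins.

```python
import json
from urllib.request import urlopen


def fetch_queue(base_url):
    """Download the build queue as JSON (assumes anonymous read access)."""
    with urlopen(base_url.rstrip("/") + "/queue/api/json") as resp:
        return json.load(resp)


def waiting_reasons(queue_json):
    """Return (task name, reason) for every buildable item in the queue.

    Buildable items are the ones that could run but have not been
    handed to an executor yet, which is the state described above.
    """
    reasons = []
    for item in queue_json.get("items", []):
        if item.get("buildable"):
            name = item.get("task", {}).get("name", "?")
            reasons.append((name, item.get("why")))
    return reasons
```

For example, `waiting_reasons(fetch_queue("http://localhost:8080"))` would list each waiting job next to its "Waiting for next available executor on ..." message, if the server is reachable at that hypothetical address.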


          Alex Gray added a comment -

          I have the same issue. I'm on 2.222.1. It ONLY happened when I upgraded to 2.222.1 from 2.204.1.
          I have a label_expression on a matrix project set to "master_node".
          When I run it, it says "(pending—Waiting for next available executor on ‘master’)"
          Note that it says "master", not "master_node". I have no idea why it's looking for something called "master", not "master_node".

          I'm attaching a gif of what I see.
          Here are the logs that Daniel Beck requested too:

          Apr 08, 2020 12:16:14 AM FINEST hudson.model.Queue
          Queue.Snapshot{waitingList=[];blockedProjects=[hudson.model.Queue$BlockedItem:hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152871];buildables=[hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@598a8bc[util-check-for-container-spread]:152734, hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@4fe1c89a[util-generate-datadog-monitors]:152745];pendings=[]} → Queue.Snapshot{waitingList=[];blockedProjects=[hudson.model.Queue$BlockedItem:hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152871];buildables=[hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@598a8bc[util-check-for-container-spread]:152734, hudson.model.Queue$BuildableItem:hudson.matrix.MatrixProject@4fe1c89a[util-generate-datadog-monitors]:152745];pendings=[]}; leftItems={152845=hudson.model.Queue$LeftItem:hudson.matrix.MatrixProject@33385024[util-check-for-low-ips]:152845, 152866=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@2e88aabc[util-check-hung-ecs-tasks-prod-ca]:152866, 152865=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@1c17e4db[util-check-hung-ecs-tasks-prod]:152865, 152819=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@502099a9[util-slave-manager]:152819, 152868=hudson.model.Queue$LeftItem:ExecutorStepExecution.PlaceholderTask{runId=util-alert-jenkins-vs-jumpcloud-users#4201,label=,context=CpsStepContext[4:node]:Owner[util-alert-jenkins-vs-jumpcloud-users/4201:util-alert-jenkins-vs-jumpcloud-users #4201],cookie=f8461e2f-66d2-48fb-9acc-79bb1b98a0ad,auth=null}:152868, 152867=hudson.model.Queue$LeftItem:org.jenkinsci.plugins.workflow.job.WorkflowJob@13b79664[util-alert-jenkins-vs-jumpcloud-users]:152867, 152870=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@188db571[util-check-hung-ecs-tasks-stage]:152870, 152872=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@1ae51106[databases-without-termination-protection]:152872, 
152869=hudson.model.Queue$LeftItem:hudson.model.FreeStyleProject@60957b04[util-check-asg-thrashing]:152869}
          


          Alex Gray added a comment -

          One more piece of information:

          If you configure your job, and select "Restrict where this project can be run" and enter "master_node" as the node label, you'll get "Waiting for next available executor on 'master'".  NOT "master_node".  It's like it's tripping up on the underscore.  If I change that to "masternode", then it correctly says "Waiting for next available executor on 'masternode'" (but it still doesn't work, even if I have a node with that label.)

          Hope this bit of information helps!


          Alain Campeau added a comment -

          Ran into the same problem. Everything is fine using version 2.204.6 but the problem shows up with version 2.205. So as previously suggested, https://github.com/jenkinsci/jenkins/pull/3983 seems the likely cause.

          After some investigation, the cause of the problem seems to be the use of a custom label to refer to the master node in jobs (for node restriction purposes). For instance, we have the "dispatch" label configured for the "master" node and use it in multi-configuration jobs, which are the ones that get stuck.

          If I modify a stuck job to use "master" instead of the "DISPATCH" label, the job gets triggered as before. Same thing if I just configure the job to no longer have node restrictions.
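
The expected behavior described in this comment can be written down as a small model: a node should be eligible for a job whenever the job's label is either the node's own name or one of the labels assigned to it. The following is a simplified Python sketch of that expectation for single-word labels only. It is not Jenkins's scheduling code; `node_labels`, `eligible_nodes`, and the node name "agent-1" are hypothetical names introduced for illustration.

```python
def node_labels(name, assigned_labels):
    """Labels a node answers to in this model: its own name plus the
    whitespace-separated labels assigned in its configuration."""
    return {name} | set(assigned_labels.split())


def eligible_nodes(label, nodes):
    """Return the sorted names of all nodes that carry the given label.

    `nodes` maps a node name to its assigned-labels string. The model
    compares labels exactly, including case.
    """
    return sorted(
        name
        for name, assigned in nodes.items()
        if label in node_labels(name, assigned)
    )


# Hypothetical setup mirroring the comment: master carries DISPATCH.
nodes = {"master": "DISPATCH", "agent-1": "WINDOWS"}
```

Under this model both `eligible_nodes("master", nodes)` and `eligible_nodes("DISPATCH", nodes)` select the master node, so a job restricted to either label would be expected to start. The regression reported here is that only the first form still does.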


          Daniel Beck added a comment -

          I tried to reproduce this issue, but failed to do so.

          If any of you can work out how to reproduce this problem from scratch, please provide detailed and complete instructions.


          Alain Campeau added a comment -

          I've managed to reproduce from scratch on a newly set up Jenkins server (Windows) configured with a single node/agent (Windows). This way I've stripped away a lot of our production setup so as to eliminate as many possible causes as I could.

          Here are the minimum steps I needed to repro:

          • Install latest Jenkins LTS release (2.222.3) on a Windows machine with the default set of plugins
          • Configure the master node with a label such as DISPATCH
          • Add a new node and configure it as a new agent. Due to limited resources I configured the new node/agent to run on the same Windows machine. Configure it with its own label such as WINDOWS and make sure "Usage" configuration is set to "Only build jobs with label expressions matching the node"
          • Create a new Multi-configuration project:
            • Configure this job's "Restrict where this project can run" setting so its "Label Expression" value is the one specified for master, so DISPATCH
            • Configure this job's "Configuration Matrix" by adding:
              • a "User-defined matrix" axis with a "Name" of "TARGET" and "Values" of "XboxOne PS4 Switch" (any strings to mimic building for various platforms)
              • a "Slaves" axis with a "Name" of "TARGET_POOL" and make sure to check the "WINDOWS" checkbox
            • Configure this job's "Build" section by adding a dummy "Execute Windows batch command" whose content is simply an "@echo Hello world!"

          If I launch this job using Jenkins 2.222.3, 2.205 or anything in between, the job is stuck waiting for an executor on master even though 2 are available and the agent using the WINDOWS label is free with at least a single executor.

          If I launch this job using Jenkins 2.204.6 or earlier, the job successfully launches and sequentially runs all three XboxOne, PS4 and Switch configurations on the sole agent using the WINDOWS label while the job itself "runs" on the master node (even though all it does is dispatch really).

          On our production server we use the "Dynamic Axis" plugin to dynamically build an axis of all platforms to build, and we have multiple Windows, Linux and Mac build machines using OS-specific labels. But for the sake of keeping these repro steps simple, I've dropped all but one OS and removed the "Dynamic Axis" plugin usage. It doesn't logically make much sense but shows the behavior difference starting with the Jenkins 2.205 release.
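
To make the repro steps above concrete, the configuration matrix they describe is the cross product of the two axes, which gives the three configurations mentioned for 2.204.6. The following is a short Python sketch of that expansion; the `expand` helper is illustrative and only mimics the idea, not the matrix-project plugin's code.

```python
from itertools import product

# Axes taken from the repro steps: one user-defined axis, one label axis.
axes = {
    "TARGET": ["XboxOne", "PS4", "Switch"],
    "TARGET_POOL": ["WINDOWS"],
}


def expand(axes):
    """Return one dict per matrix configuration (cross product of all axes)."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


configurations = expand(axes)
for config in configurations:
    print(config)
```

Each resulting configuration targets the WINDOWS agent, while the parent job itself is the part restricted to the DISPATCH label on master, which is where the build gets stuck.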


          Daniel Beck added a comment -

          acampeau Thanks for these steps, I'll try to reproduce them when I have some time.

          About

          • Configure this job's "Restrict where this project can run" setting so its "Label Expression" value is the one specified for master, so DISPATCH

          What happens when you don't check that box, or specify "master" here? Would that be a viable workaround for this problem, and if not, why not?


          Chris McAfee added a comment - - edited

          We are seeing this problem on LTS 2.222.4

          oleg_nenashev, can you take a look at this one?


          Peter Krefting added a comment -

          Any news on this?

          I have pinned Jenkins to 2.204.6 on our server to be able to use it, but it is increasingly complaining about unfixed vulnerabilities.

          Peter Krefting added a comment -

          The problem persists in the current version (2.289.3). When I try to start a tied job manually after upgrading to this version, it blocks in "Waiting for next available executor on 'master'" despite no other jobs being active across the system.

          Valentin Maechler added a comment -

          What's the status of this issue?

          We tried to use the Job-Restriction plugin to allow only a certain job on master. But that also blocks the execution of other jobs even though they are actually not executed on the master but (always) on slaves.
          For these "slave jobs" it says in the build queue: "(pending—Waiting for next available executor on ‘master’)".
          So it looks to be caused by this issue.

            Assignee: Unassigned
            Reporter: Peter Krefting (nafmo)
            Votes: 3
            Watchers: 7