JENKINS-50306

"Still waiting to schedule task" indicates a flaw in the Jenkins pipelining design in my opinion

      In my Jenkins server, I sometimes have a little bit of queueing, and you'll see messages like this in a recently submitted pipeline:

      Still waiting to schedule task Waiting for next available executor on ip-172-31-141-11.us-west-2.compute.internal

      I use node labels to select which agents may be used. However, the above message suggests that the Jenkins pipeline runner selects which agent will receive the job at the moment it encounters the node() command in my Jenkinsfile.

      The reason I believe this logic is flawed is that the particular node in question (ip-172-31-141-11.us-west-2.compute.internal) might get killed off while the current job is running, which means my queued job will be stuck waiting forever, because there is almost no chance that AWS will relaunch the same node with the same hostname.

      A better strategy would be one in which I request a node via something like node("mac") and Jenkins tells me that it's waiting to schedule an executor on the next node labeled "mac", as opposed to selecting an individual machine which might go away.
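
      For illustration, the kind of label-based request I have in mind looks roughly like this (the "mac" label, stage name, and shell command are placeholders, not my actual Jenkinsfile):

      // Ask for *any* agent carrying the "mac" label; ideally the queue, not the
      // Jenkinsfile, decides which concrete machine ends up satisfying the request.
      node('mac') {
          stage('Build') {
              sh 'make build'  // runs on whichever labeled agent gets assigned
          }
      }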


          Jon B created issue -
          Jon B made changes -
          Description edited (minor wording revisions)
          Jon B made changes -
          Summary Original: "Still waiting to schedule task" indicates a flaw in the design in my opinion New: "Still waiting to schedule task" indicates a flaw in the Jenkins pipelining design in my opinion
          Jon B made changes -
          Description edited (minor wording revisions)
          Jon B made changes -
          Description edited (minor wording revisions)
          Oleg Nenashev made changes -
          Component/s Original: core [ 15593 ] New: pipeline [ 21692 ]
          Andrew Bayer made changes -
          Component/s Original: pipeline [ 21692 ] New: workflow-durable-task-step-plugin [ 21715 ]

          Andrew Bayer added a comment -

          Yes, when the node step is executed, it goes onto the agent. If you're seeing the build start, get to the node step, and immediately get that message, something's wrong somewhere. There is, sadly, a common case for that message: on master restart, dynamically provisioned agents often no longer exist, but Jenkins still tries to resume running Pipelines onto the agents they were on at the time the master stopped, so things get gummed up. If that's not the situation where you're seeing this, is there any chance you could include your Jenkinsfile, or even better a minimal reproduction case? Oh, and are you using the throttle step by any chance?


          Jon B added a comment - edited

          Indeed I am using the "Throttle Concurrent Builds Plug-in" because my monolith's main pipeline is very busy, and if I don't throttle, the agent capacity would get slathered across too many concurrent job runs, making them all take forever.

          My pipeline logic is quite complex, but I'm pretty sure the cause must be a flaw in the concurrent builds plugin. If you think it is essential to triage, I will create a watered-down pipeline that repros this behavior, but I'll hold off on that until you confirm it's worth my time to produce such a Jenkinsfile.
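
          For context, the throttling in my pipeline is roughly along these lines (the category and label names are illustrative, not my real setup; this uses the plugin's scripted-pipeline throttle step):

          // Throttle step from the Throttle Concurrent Builds plugin. The category
          // ('monolith-main' here) must be defined in the plugin's global
          // configuration; it caps how many builds run concurrently within it.
          throttle(['monolith-main']) {
              node('mac') {
                  sh 'make test'  // competes for executors within the throttle category
              }
          }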

          Andrew Bayer made changes -
          Component/s Original: workflow-durable-task-step-plugin [ 21715 ] New: throttle-concurrent-builds-plugin [ 15745 ]

            Assignee: Unassigned
            Reporter: Jon B (piratejohnny)
            Votes: 3
            Watchers: 6

              Created:
              Updated: