JENKINS-41569

Pipeline hangs waiting for resume on an agent which never was


      We have a Pipeline run which has been blocked for a number of days with:

      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
      

      We're using the Azure VM Agents plugin for dynamic provisioning. While I haven't had the time to fully reproduce this, here's what I believe is happening:

      • Pipeline requests node labelled "windows".
      • Azure VM Agents plugin begins dynamically provisioning a VM matching that "windows" label.
      • The Azure cloud allocates and bootstraps a VM, meaning that there is an instance which exists, has an IP address, etc.
      • The Azure VM Agents plugin creates a Node in Jenkins with a generated name ("win2012-b19510"), which is in a suspended state.
      • Pipeline says "great, I have win2012-b19510, that's where I am going to execute"
      • Azure VM Agents plugin runs its defined "Init Script" to actually bootstrap the Jenkins agent software on the VM
      • The "Init Script" fails to complete successfully.
      • Time elapses and the Azure VM Agents thread sees the "win2012-b19510" instance as stale, and reaps the VM accordingly.
      • Poor little Pipeline sits forever awaiting a VM which will never come back

      I won't have time to reproduce this today, but will try to at my next available free moment (ha!).

      I'm not sure if it's possible, but if my hypothesis is correct, the fix would be to only pin a "node() { }" block in Pipeline to an agent which has actually come online and is able to perform work.
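To make the hypothesis concrete, here is a toy Python model of the sequence above. It is only a sketch and not Jenkins or plugin code: `Agent`, `pick_agent`, and `can_resume` are hypothetical names made up for illustration, and the `online` flag stands in for "the Init Script finished and the agent connected".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """Toy stand-in for a Jenkins node (hypothetical, not a real Jenkins class)."""
    name: str
    label: str
    online: bool = False  # stays False until the init script has succeeded


def pick_agent(agents: List[Agent], label: str, require_online: bool) -> Optional[Agent]:
    """Return the first agent carrying `label`.

    require_online=False models the suspected current behavior (a suspended
    node can be picked); require_online=True models the proposed behavior.
    """
    for agent in agents:
        if agent.label == label and (agent.online or not require_online):
            return agent
    return None


def can_resume(pinned: Optional[Agent], agents: List[Agent]) -> bool:
    """A pinned run can only resume if its agent still exists in the pool."""
    return pinned is not None and any(a.name == pinned.name for a in agents)


# The node is created while the VM is still suspended.
pool = [Agent(name="win2012-b19510", label="windows", online=False)]

pinned_now = pick_agent(pool, "windows", require_online=False)  # pins to the suspended node
pinned_proposed = pick_agent(pool, "windows", require_online=True)  # pins to nothing yet

# The init script fails and the stale VM is reaped.
pool.clear()

stuck = pinned_now is not None and not can_resume(pinned_now, pool)
```

In this model the run that pinned itself early ends up `stuck`, while the run that waited for an online agent was never pinned and is free to take the next node that actually comes up.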

            zackliu Chenyang Liu
            rtyler R. Tyler Croy