- Bug
- Resolution: Fixed
- Minor
We have a Pipeline run which has been blocked for a number of days with:
Waiting to resume part of Core » jenkins » PR-2726 #2: There are no nodes with the label ‘win2012-b19510’
We're using the Azure VM Agents plugin for dynamic provisioning. While I haven't had the time to fully reproduce this, here's what I believe is happening:
- Pipeline requests node labelled "windows".
- Azure VM Agents plugin begins dynamically provisioning a VM matching that "windows" label.
- The Azure cloud allocates and bootstraps a VM, meaning that there is an instance which exists, has an IP address, etc.
- The Azure VM Agents plugin creates a Node in Jenkins with a generated name ("win2012-b19510"), which is in a suspended state.
- Pipeline says "great, I have win2012-b19510, that's where I am going to execute"
- Azure VM Agents plugin runs its defined "Init Script" to actually bootstrap the Jenkins agent software on the VM
- The "Init script" fails to complete successfully
- Time elapses and the Azure VM Agents thread sees the "win2012-b19510" instance as stale, and reaps the VM accordingly.
- Poor little Pipeline sits forever awaiting a VM which will never come back
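To make the hypothesized sequence above concrete, here is a toy Python model of it. This is not Jenkins or Azure VM Agents code: `Node`, `Cloud`, `provision`, `run_init_script`, `reap_stale`, and `pipeline_status` are all made-up names for illustration, and the behavior simply encodes my unverified guess about what happens.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    label: str
    online: bool = False  # stays False until the init script succeeds


@dataclass
class Cloud:
    """Toy stand-in for a dynamic-provisioning cloud (hypothetical, not the plugin's API)."""
    nodes: Dict[str, Node] = field(default_factory=dict)

    def provision(self, label: str, name: str) -> Node:
        # The node is registered before the agent software is bootstrapped.
        node = Node(name=name, label=label)
        self.nodes[name] = node
        return node

    def run_init_script(self, node: Node, succeeds: bool) -> None:
        # The node only comes online if the init script completes.
        node.online = succeeds

    def reap_stale(self) -> None:
        # Nodes that never came online are removed, along with their VMs.
        for name in [n for n, node in self.nodes.items() if not node.online]:
            del self.nodes[name]


def pipeline_status(cloud: Cloud, pinned_name: str) -> str:
    # The Pipeline waits on the exact node name it was pinned to, not on the label.
    if pinned_name in cloud.nodes and cloud.nodes[pinned_name].online:
        return "running"
    return f"There are no nodes with the label ‘{pinned_name}’"


cloud = Cloud()
node = cloud.provision(label="windows", name="win2012-b19510")
pinned = node.name                           # Pipeline pins itself to the generated name
cloud.run_init_script(node, succeeds=False)  # init script fails
cloud.reap_stale()                           # VM and node are removed as stale
status = pipeline_status(cloud, pinned)      # Pipeline still waits on the old name
print(status)
```

In this model the Pipeline can never make progress, because the only name it is waiting on no longer exists and nothing will ever recreate it.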
I won't have time to reproduce this today, but will try to at my next available free moment (ha!).
I'm not sure if it's possible, but if my hypothesis is correct, it would be better to only pin a "node() { }" block in Pipeline to an agent which has actually come online and was able to perform work.
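As a rough illustration of that suggestion, the toy Python sketch below only hands out a node name once the node is online, and otherwise keeps the request on the label. It is not how Pipeline is implemented: `pick_node`, the dictionary layout, and the replacement node name "win2012-c00001" are hypothetical.

```python
from typing import Dict, Optional


def pick_node(nodes: Dict[str, dict], label: str) -> Optional[str]:
    """Return the name of an online node carrying `label`, or None to keep waiting on the label."""
    for name, info in nodes.items():
        if info["label"] == label and info["online"]:
            return name
    return None


# A node exists but never came online, so the request is not pinned to it.
nodes = {"win2012-b19510": {"label": "windows", "online": False}}
first_try = pick_node(nodes, "windows")

# The stale node is reaped and a (hypothetical) replacement comes online.
del nodes["win2012-b19510"]
nodes["win2012-c00001"] = {"label": "windows", "online": True}
second_try = pick_node(nodes, "windows")

print(first_try, second_try)
```

Because the request stays bound to the "windows" label until an agent is actually usable, reaping the failed VM does not strand the build.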
- is related to
  - JENKINS-35905 Add option to Fail the build if node label does not exist or if it cannot be provisioned within a timeout (Reopened)
  - JENKINS-36013 Automatically abort ExecutorPickle rehydration from an ephemeral node (Closed)
- relates to
  - JENKINS-36013 Automatically abort ExecutorPickle rehydration from an ephemeral node (Closed)