-
Improvement
-
Resolution: Unresolved
-
Minor
-
Windows Server 2008 R2
Java 1.8.0_66
Jenkins 1.644
vsphere-cloud 2.15
When a vSphere host recently entered maintenance mode for several hours, I noticed that Tasks & Events (in the vSphere Client) started to periodically log failed connection attempts, apparently caused by Jenkins, which seemed unaware of the maintenance operation.
Although not severe, the observed negative effects were:
- Eventually, the "flooding" rendered useless all tasks logged in vSphere before the maintenance operation
- The task failures cause some overhead in Jenkins (logging exceptions) and might even cause overhead in vSphere, too (if there are several hosts and each has a few nodes and Jenkins is attempting to start them...)
The goal would be to improve the behavior in such scenarios. A couple of ideas follow:
- Detect when the host for the desired node is in maintenance mode (possibly by using vSphere APIs) and avoid performing the current task (Power On, etc.); still, log a message in the Jenkins node so that the situation is flagged
- Introduce a new configuration setting for the polling interval to use while maintenance is underway; it could also take the form of a factor (1x to keep the current behavior, 2x to take twice the time between attempts, etc.)
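The two ideas above could be combined roughly as in the following sketch. It is not the plugin's actual code: the class `MaintenanceAwareLauncher`, its methods, and the `maintenanceFactor` setting are hypothetical names introduced for illustration. The maintenance check is passed in as a plain `BooleanSupplier`; in a real implementation it would presumably be backed by the `inMaintenanceMode` property that the vSphere API exposes on a host's runtime info.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: skip Power On while the host is in maintenance mode
 * and stretch the retry interval by a configurable factor.
 */
public class MaintenanceAwareLauncher {

    /** Outcome of a single launch attempt. */
    public enum Outcome { POWER_ON_REQUESTED, SKIPPED_HOST_IN_MAINTENANCE }

    private final long baseIntervalMillis;   // current interval between attempts
    private final double maintenanceFactor;  // assumed new setting: 1.0 = current behavior

    public MaintenanceAwareLauncher(long baseIntervalMillis, double maintenanceFactor) {
        if (baseIntervalMillis <= 0) {
            throw new IllegalArgumentException("baseIntervalMillis must be positive");
        }
        if (maintenanceFactor < 1.0) {
            throw new IllegalArgumentException("maintenanceFactor must be >= 1.0");
        }
        this.baseIntervalMillis = baseIntervalMillis;
        this.maintenanceFactor = maintenanceFactor;
    }

    /**
     * Runs powerOn only if the host is not in maintenance mode.
     * When the attempt is skipped, a message is appended to the node log
     * so that the situation is still flagged.
     */
    public Outcome attemptLaunch(BooleanSupplier hostInMaintenance,
                                 Runnable powerOn,
                                 StringBuilder nodeLog) {
        if (hostInMaintenance.getAsBoolean()) {
            nodeLog.append("Host is in maintenance mode; skipping Power On\n");
            return Outcome.SKIPPED_HOST_IN_MAINTENANCE;
        }
        powerOn.run();
        return Outcome.POWER_ON_REQUESTED;
    }

    /** Delay before the next attempt, scaled while maintenance is underway. */
    public long nextDelayMillis(Outcome lastOutcome) {
        if (lastOutcome == Outcome.SKIPPED_HOST_IN_MAINTENANCE) {
            return Math.round(baseIntervalMillis * maintenanceFactor);
        }
        return baseIntervalMillis;
    }
}
```

With a factor of 1.0 the delay is unchanged, so existing installations would keep the current behavior unless the setting is raised.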
In terms of potentially related settings, the node has the following set:
- Force VM Launch: Launches the virtual machine when necessary.
- Wait for VMTools: [x]
- Delay between launch and boot complete: 120
- Availability: Take this slave online when on-demand and off-line when idle
Excerpt of the node log while the host was in maintenance:
[MyVirtualMachineName] Starting Virtual Machine...
[MyVirtualMachineName] Powering on VM
[MyVirtualMachineName] EXCEPTION while starting VM
org.jenkinsci.plugins.vsphere.tools.VSphereException: vSphere Error: VM cannot be started
org.jenkinsci.plugins.vsphere.tools.VSphereException: vSphere Error: VM cannot be started
	at org.jenkinsci.plugins.vsphere.tools.VSphere.startVm(VSphere.java:383)
	at org.jenkinsci.plugins.vSphereCloudLauncher.launch(vSphereCloudLauncher.java:202)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:253)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins
java.lang.RuntimeException: org.jenkinsci.plugins.vsphere.tools.VSphereException: vSphere Error: VM cannot be started
	at org.jenkinsci.plugins.vSphereCloudLauncher.launch(vSphereCloudLauncher.java:253)
	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:253)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: org.jenkinsci.plugins.vsphere.tools.VSphereException: vSphere Error: VM cannot be started
	at org.jenkinsci.plugins.vsphere.tools.VSphere.startVm(VSphere.java:383)
	at org.jenkinsci.plugins.vSphereCloudLauncher.launch(vSphereCloudLauncher.java:202)
	... 6 more