• Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • vsphere-cloud-plugin
    • Jenkins 2.32.1
      vSphere cloud plugin 2.13
      Jenkins "cloud" configuration set to launch a limited number of vSphere VMs to satisfy build demand.

      The Jenkins vSphere plugin and the vSphere hypervisor are getting out of step: I'm seeing VMs in vSphere (started by Jenkins) that Jenkins doesn't know about, either as Jenkins slaves or in the vSphere plugin's internals.

      The Jenkins plugin successfully creates slave VMs in vSphere, the Jenkins<->slave connection is established, and builds run. Everything looks good ... until it doesn't.

      Jenkins then starts complaining (in the log) that it can't create a VM called "myslave-1" because "myslave-1" already exists. That is true: there is a VM in vSphere with that name. However, Jenkins has no node entry for myslave-1, and the plugin has no record of it either.

      i.e. we end up in a situation where the vSphere hypervisor has Jenkins slave VMs running which Jenkins is not aware of.

      In my case, I've told the plugin to limit the number of slaves of each type to a fixed number (rather than setting a total for the cloud as a whole), so the plugin chooses slave names like myslave-1, myslave-2 ... myslave-N. When the plugin "forgets" about a slave, it later tries to create a VM with that slave's name again (e.g. myslave-1), and vSphere refuses because a VM of that name already exists.

      I suspect that if I had not limited the number of slaves of each kind, so that the plugin used pseudo-random numbering, I would not see "VM already exists" errors but would instead end up running far more VMs than I'd bargained for.
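
      The name collision described above can be illustrated with a minimal sketch. It assumes the plugin picks the lowest-numbered unused name from its own records; the class SequentialNameChooser and the method nextName are hypothetical names used for illustration only and are not the plugin's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of per-template sequential naming; not the plugin's code. */
class SequentialNameChooser {
    /**
     * Returns the lowest-numbered name "prefix-N" (1 <= N <= max) that is not in
     * the set of names the caller believes are in use, or empty if all are taken.
     */
    static Optional<String> nextName(String prefix, int max, Set<String> knownNames) {
        for (int i = 1; i <= max; i++) {
            String candidate = prefix + "-" + i;
            if (!knownNames.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The plugin has "forgotten" myslave-1, but the VM still exists in vSphere.
        Set<String> knownToPlugin = new HashSet<>(Arrays.asList("myslave-2"));
        Set<String> existsInVSphere = new HashSet<>(Arrays.asList("myslave-1", "myslave-2"));

        String chosen = nextName("myslave", 3, knownToPlugin).get();
        // The chosen name is free according to the plugin but taken in vSphere.
        System.out.println("chosen=" + chosen + ", collides=" + existsInVSphere.contains(chosen));
    }
}
```

      Because the choice is made only from the plugin's own records, a forgotten VM guarantees a clash on the next provisioning attempt for that name.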

      Unfortunately I have yet to determine exactly what triggers the "leak" (thus far, I've only gone looking in the logs when we're failing to create VMs, which is a long time after we lost track of the VMs).

       

      What we need to do is EITHER to not "leak" these VMs in the first place (i.e. not forget about a VM until it's really gone) OR to have some form of self-healing mechanism whereby the plugin will find out about slaves which exist that it didn't already know about and then either tell Jenkins about them or kill them off.

      (or, ideally, a combination, whereby it doesn't get out of sync unless users go in and delete slaves, but it'll still cope if users manually start messing around with slave creation/deletion in Jenkins/vSphere)
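
      The "self-healing" idea could be sketched roughly as follows: compare the VM names reported by vSphere with the node names Jenkins knows, and treat the difference as orphans to be either re-adopted or deleted. This is only a sketch under stated assumptions: OrphanVmFinder and findOrphans are hypothetical names, and matching VMs by name prefix is an assumption made for illustration (a real implementation would need a more reliable way to recognize VMs that this Jenkins created).

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reconciliation helper; not part of the vSphere cloud plugin. */
class OrphanVmFinder {
    /**
     * Returns the names of VMs that exist in vSphere, look like ours (they start
     * with managedPrefix), and have no corresponding node in Jenkins.
     */
    static Set<String> findOrphans(Set<String> vmNamesInVSphere,
                                   Set<String> nodeNamesInJenkins,
                                   String managedPrefix) {
        Set<String> orphans = new TreeSet<>(); // sorted for stable output
        for (String vmName : vmNamesInVSphere) {
            boolean looksLikeOurs = vmName.startsWith(managedPrefix);
            boolean knownToJenkins = nodeNamesInJenkins.contains(vmName);
            if (looksLikeOurs && !knownToJenkins) {
                orphans.add(vmName);
            }
        }
        return orphans;
    }
}
```

      Running such a check periodically would catch VMs that were forgotten for any reason, including manual changes made directly in Jenkins or vSphere.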

          [JENKINS-44796] vSphere "leaks" slave VMs

          pjdarton added a comment -

          Example exception from Jenkins failing to provision a slave:-

          provision(slave,1): 0 existing slaves (=0 executors), templates available are [Template[prefix=slave_012, provisioned=[slave_012_1, slave_012_2], planned=[], max=8, fullness=25.000%], Template[prefix=slave_013, provisioned=[], planned=[], max=8, fullness=0.000%]]
          Jun 08, 2017 5:44:48 PM INFO org.jenkinsci.plugins.vSphereCloud provision
          provision(slave,1): Provisioning 1 new =[slave_013_1]
          Jun 08, 2017 5:44:48 PM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
          Started provisioning slave_013_1 from vSphereCloud with 1 executors. Remaining excess workload: -0.001
          Jun 08, 2017 5:44:49 PM WARNING org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1 call
          Failed to provision new slave slave_013_1
          org.jenkinsci.plugins.vsphere.tools.VSphereException: vSphere Error: VM "slave_013_1" already exists
              at org.jenkinsci.plugins.vsphere.tools.VSphere.cloneOrDeployVm(VSphere.java:201)
              at org.jenkinsci.plugins.vSphereCloudSlaveTemplate.provision(vSphereCloudSlaveTemplate.java:374)
              at org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode.provisionNewNode(vSphereCloud.java:442)
              at org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode.access$000(vSphereCloud.java:405)
              at org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1.call(vSphereCloud.java:418)
              at org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1.call(vSphereCloud.java:415)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at java.util.concurrent.FutureTask.run(FutureTask.java:274)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
              at java.lang.Thread.run(Thread.java:863)

          pjdarton added a comment -

          Example exception from Jenkins failing to get as far as terminating a slave:

          Jun 16, 2017 4:28:33 PM org.jenkinsci.plugins.vSphereCloud InternalLog
          INFO: [slave_012_1] Got an exception
          org.jenkinsci.plugins.vsphere.tools.VSphereException: java.rmi.RemoteException: Exception caught trying to invoke method RetrieveServiceContent; nested exception is:
              java.net.ConnectException: Connection timed out: connect
              at org.jenkinsci.plugins.vsphere.tools.VSphere.<init>(VSphere.java:78)
              at org.jenkinsci.plugins.vsphere.tools.VSphere.connect(VSphere.java:95)
              at org.jenkinsci.plugins.vSphereCloud.vSphereInstance(vSphereCloud.java:287)
              at org.jenkinsci.plugins.vSphereCloudLauncher.afterDisconnect(vSphereCloudLauncher.java:303)
              at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:626)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:483)
              at java.util.concurrent.FutureTask.run(FutureTask.java:274)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
              at java.lang.Thread.run(Thread.java:863)
          Caused by: java.rmi.RemoteException: Exception caught trying to invoke method RetrieveServiceContent; nested exception is:
              java.net.ConnectException: Connection timed out: connect
              at com.vmware.vim25.ws.WSClient.invoke(WSClient.java:106)
              at com.vmware.vim25.ws.VimStub.retrieveServiceContent(VimStub.java:1675)
              at com.vmware.vim25.mo.ServiceInstance.retrieveServiceContent(ServiceInstance.java:246)
              at com.vmware.vim25.mo.ServiceInstance.constructServiceInstance(ServiceInstance.java:126)
              at com.vmware.vim25.mo.ServiceInstance.<init>(ServiceInstance.java:79)
              at com.vmware.vim25.mo.ServiceInstance.<init>(ServiceInstance.java:69)
              at org.jenkinsci.plugins.vsphere.tools.VSphere.<init>(VSphere.java:76)
              ... 10 more
          Caused by: java.net.ConnectException: Connection timed out: connect
              at java.net.DualStackPlainSocketImpl.connect0(Native Method)
              at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:91)
              at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:370)
              at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:231)
              at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:213)
              at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:192)
              at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:404)
              at java.net.Socket.connect(Socket.java:643)
              at com.ibm.jsse2.aq.connect(aq.java:680)
              at com.ibm.jsse2.ap.connect(ap.java:14)
              at sun.net.NetworkClient.doConnect(NetworkClient.java:193)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:462)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:557)
              at com.ibm.net.ssl.www2.protocol.https.c.<init>(c.java:26)
              at com.ibm.net.ssl.www2.protocol.https.c.a(c.java:105)
              at com.ibm.net.ssl.www2.protocol.https.d.getNewHttpClient(d.java:49)
              at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:946)
              at com.ibm.net.ssl.www2.protocol.https.d.connect(d.java:32)
              at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1104)
              at com.ibm.net.ssl.www2.protocol.https.b.getOutputStream(b.java:11)
              at com.vmware.vim25.ws.WSClient.post(WSClient.java:168)
              at com.vmware.vim25.ws.WSClient.invoke(WSClient.java:94)
              ... 16 more

          pjdarton added a comment -

          I've come up with a set of changes to address this from multiple angles...

          https://github.com/jenkinsci/vsphere-cloud-plugin/pull/73 ensures that we recognize VMs that we created and then forgot about, which will prevent the "Error: VM ... already exists" issue from being a fatal error.

          https://github.com/jenkinsci/vsphere-cloud-plugin/pull/77 gives much more reliable deletion of VMs, so we don't tend to leave them running anymore.

          https://github.com/jenkinsci/vsphere-cloud-plugin/pull/75 prevents the exception we saw during slave disposal.

          PRs 72, 74, 76 and 78 are improvements that aided the development of the above.
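
          To illustrate the general idea of retrying a deletion that fails transiently (such as the connection timeout shown earlier), here is a minimal sketch. It is not the code from PR 77: RetryingVmDeleter, VmDeleter and deleteWithRetry are hypothetical names, and the fixed delay between attempts is an assumption.

```java
/** Hypothetical bounded-retry helper for VM deletion; not the plugin's implementation. */
class RetryingVmDeleter {
    /** Stand-in for whatever call actually deletes the VM (assumed for illustration). */
    interface VmDeleter {
        void delete(String vmName) throws Exception;
    }

    /**
     * Tries to delete the VM up to maxAttempts times, pausing delayMillis between
     * failed attempts. Returns true if an attempt succeeded, false otherwise.
     */
    static boolean deleteWithRetry(VmDeleter deleter, String vmName,
                                   int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                deleter.delete(vmName);
                return true;
            } catch (Exception e) {
                // Treat any failure as possibly transient; give up after the last attempt.
                if (attempt == maxAttempts) {
                    return false;
                }
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false; // only reached when maxAttempts < 1
    }
}
```

          When this returns false, the caller should keep a record of the VM rather than forget it, so that a later pass can try again.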

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloud.java
          src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/VSphere.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/VSphereDuplicateException.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/VSphereNotFoundException.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/f87d1dea22dc23d3c51517f44118bf2fe20e8dd6
          Log:
          Merge pull request #73 from pjdarton/reconnect_to_existing_vm

          JENKINS-44796 Recognize VMs that our Jenkins created and permit reconnection.

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/0cb327751111...f87d1dea22dc

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloudLauncher.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/421fbe60eea35660c79cd2df32aad0f6e28d2c1a
          Log:
          Merge pull request #75 from pjdarton/launcher_improvement

          JENKINS-44796 vSphereCloudLauncher enhancement

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/f87d1dea22dc...421fbe60eea3

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloud.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningAlgorithm.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningRecord.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningState.java
          src/test/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningAlgorithmTest.java
          src/test/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningStateTest.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/f3c7aacec91be824cb7d3fbcea683c9611b584c9
          Log:
          Merge pull request #77 from pjdarton/retry_deleting_unwanted_vm

          JENKINS-44796 Retry deletion of unwanted VMs

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/10270a9c45b2...f3c7aacec91b

          pjdarton added a comment -

          Code changes made.  Should be fixed in the next release.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/VSphere.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/894d449ca68ef297c08311a5227d331c490bbfa2
          Log:
          Merge pull request #85 from pjdarton/robustness

          JENKINS-44796 Robustness improvements

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/17016511981d...894d449ca68e

          pjdarton added a comment -

          Code changes merged.  Fixed in version 2.16.

            Assignee: Unassigned
            Reporter: pjdarton