Jenkins / JENKINS-44796

vSphere "leaks" slave VMs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: vsphere-cloud-plugin
    • Environment:
      Jenkins 2.32.1
      vSphere cloud plugin 2.13
      Jenkins "cloud" configuration set to launch a limited number of vSphere VMs to satisfy build demand.

    Description

      The Jenkins-vSphere plugin and the vSphere hypervisor are getting out of step: I'm seeing VMs in vSphere (started by Jenkins) that Jenkins doesn't know about (either as Jenkins slaves or in the vSphere plugin's internals).

      The Jenkins plugin successfully creates slave VMs in vSphere, the Jenkins<->Slave connection establishes, build(s) are run - everything looks good ... until it doesn't.

      Jenkins starts complaining (in the log) that it can't create a VM called "myslave-1" because "myslave-1" already exists. That is true: there is a VM in vSphere with that name. However, Jenkins doesn't know about any slave called myslave-1: there's no node entry, and the plugin has no record of it either.

      i.e. we end up in a situation where the vSphere hypervisor has Jenkins slave VMs running which Jenkins is not aware of.

      In my case, I've told the plugin to limit the number of slaves of each type to a fixed number (rather than setting a total number for the cloud as a whole), so the plugin chooses slave names like myslave-1, myslave-2 ... myslave-N. When the plugin "forgets" about a slave, it then tries to create myslave-1 again, and vSphere refuses because myslave-1 already exists in vSphere.

      I suspect that if I'd not limited the number of slaves of each kind, and the plugin were therefore using pseudo-random numbering, I'd not see "VM already exists" errors but would instead end up running far more VMs than I'd bargained for.
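
      To make the failure mode above concrete, here is a minimal sketch of sequential name selection under a per-template cap. It is an illustration only, not the plugin's actual code: the class `SlaveNamePicker`, its methods, and the idea that the plugin consults only its own records when picking a name are all assumptions made for this example.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of capped, sequential slave naming; not taken from the plugin. */
class SlaveNamePicker {

    /**
     * Returns the first name of the form "prefix-i" (i = 1..limit) that is absent
     * from the plugin's own records, or empty if all names up to the cap are in use.
     */
    static Optional<String> pickName(String prefix, int limit, Set<String> knownToPlugin) {
        for (int i = 1; i <= limit; i++) {
            String candidate = prefix + "-" + i;
            if (!knownToPlugin.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // cap reached according to the plugin's records
    }

    /** True when the picked name is already taken in vSphere, so creation would be refused. */
    static boolean wouldCollide(String picked, Set<String> existsInVSphere) {
        return existsInVSphere.contains(picked);
    }
}
```

      If the plugin's records contain only myslave-2 while vSphere still holds both myslave-1 and myslave-2, `pickName("myslave", 2, ...)` yields myslave-1 and `wouldCollide` reports a clash, which matches the "already exists" error described above.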

      Unfortunately I have yet to determine exactly what triggers the "leak" (thus far, I've only gone looking in the logs when we're failing to create VMs, which is a long time after we lost track of the VMs).

      What we need to do is EITHER to not "leak" these VMs in the first place (i.e. not forget about a VM until it's really gone) OR to have some form of self-healing mechanism whereby the plugin will find out about slaves which exist that it didn't already know about and then either tell Jenkins about them or kill them off.

      (or, ideally, a combination, whereby it doesn't get out of sync unless users go in and delete slaves, but it'll still cope if users manually start messing around with slave creation/deletion in Jenkins/vSphere)
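
      The self-healing option could be sketched as a periodic reconciliation pass that compares what vSphere reports against what the plugin has on record. This is only a sketch under stated assumptions: `VmReconciler` and its methods are hypothetical names, the two input sets are assumed to be obtained elsewhere, and matching VMs by name prefix is an assumption rather than how the plugin identifies its VMs.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reconciliation helper; not the plugin's implementation. */
class VmReconciler {

    /**
     * VMs that exist in vSphere under the given name prefix but are unknown to the
     * plugin. These are the "leaked" VMs that should be adopted or deleted.
     */
    static Set<String> findOrphans(String prefix, Set<String> vmsInVSphere, Set<String> knownToPlugin) {
        Set<String> orphans = new TreeSet<>();
        for (String vm : vmsInVSphere) {
            if (vm.startsWith(prefix) && !knownToPlugin.contains(vm)) {
                orphans.add(vm);
            }
        }
        return orphans;
    }

    /**
     * Records the plugin still holds for VMs that no longer exist in vSphere,
     * for example because a user deleted the VM by hand.
     */
    static Set<String> findStaleRecords(Set<String> vmsInVSphere, Set<String> knownToPlugin) {
        Set<String> stale = new TreeSet<>(knownToPlugin);
        stale.removeAll(vmsInVSphere);
        return stale;
    }
}
```

      Whether an orphan is then registered with Jenkins or killed off is the policy choice raised above; the sketch only identifies the two kinds of mismatch.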

        Activity

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloudLauncher.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/421fbe60eea35660c79cd2df32aad0f6e28d2c1a
          Log:
          Merge pull request #75 from pjdarton/launcher_improvement

          JENKINS-44796 vSphereCloudLauncher enhancement

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/f87d1dea22dc...421fbe60eea3

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloud.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningAlgorithm.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningRecord.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningState.java
          src/test/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningAlgorithmTest.java
          src/test/java/org/jenkinsci/plugins/vsphere/tools/CloudProvisioningStateTest.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/f3c7aacec91be824cb7d3fbcea683c9611b584c9
          Log:
          Merge pull request #77 from pjdarton/retry_deleting_unwanted_vm

          JENKINS-44796 Retry deletion of unwanted VMs

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/10270a9c45b2...f3c7aacec91b

          pjdarton added a comment:

          Code changes made.  Should be fixed in the next release.

          Code changed in jenkins
          User: Peter Darton
          Path:
          src/main/java/org/jenkinsci/plugins/vSphereCloudSlaveTemplate.java
          src/main/java/org/jenkinsci/plugins/vsphere/tools/VSphere.java
          http://jenkins-ci.org/commit/vsphere-cloud-plugin/894d449ca68ef297c08311a5227d331c490bbfa2
          Log:
          Merge pull request #85 from pjdarton/robustness

          JENKINS-44796 Robustness improvements

          Compare: https://github.com/jenkinsci/vsphere-cloud-plugin/compare/17016511981d...894d449ca68e

          pjdarton added a comment:

          Code changes merged.  Fixed in version 2.16.

          People

            Assignee: Unassigned
            Reporter: pjdarton
            Votes: 0
            Watchers: 2