Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-42811

Vsphere Cloud plugin stops working after a number of builds

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Critical Critical
    • vsphere-cloud-plugin
    • None

      I am using the Vsphere Cloud plugin to spin Jenkins agents (slaves) on demand in new virtual machines using linked clones and keep-until-idle retention strategy. In the Jenkins instance there are a mix of "static virtual machines" Jenkins agents and Jenkins agents on demand

      The plugin works correctly until a certain time or maybe limited number of agents created. Approximately after +-70 builds or 12 hours the virtual machines are created correctly, connected to Jenkins but they are shutdown before the build is allocated in the agent or virtual machine.

      The following is a piece of log of the main Jenkins.log file when this situation occurs, this is happening over and over.
       

      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Starting Virtual Machine...
      Feb 21, 2017 3:49:19 PM org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1 call
      INFO: Provisioned new slave ess-lin_baulim19krk8qcnwn1k5yloml
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] VM already powered on
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for VMTools
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] VM Tools are running
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Finished wait for VMTools
      Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@8c91844 to launch slave.
      Feb 21, 2017 3:49:29 PM hudson.slaves.NodeProvisioner$2 run
      INFO: ess-lin_baulim19krk8qcnwn1k5yloml provisioning successfully completed. We have now 5 computer(s)
      [02/21/17 15:49:58] SSH Launch of unknown on X.X.X.X failed in 63,139 ms
      Feb 21, 2017 3:49:58 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_6c1eramupwzkjyu9n3t4cdl5] Slave online
      Feb 21, 2017 3:50:05 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer
      Feb 21, 2017 3:50:15 PM hudson.slaves.CloudRetentionStrategy check
      INFO: Disconnecting ess-lin_baulim19krk8qcnwn1k5yloml
      Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer
      Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress.
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Asking SSHLauncher to launch slave.
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress.
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted
      INFO: There are 0 VMs in this cloud. The instance cap for the cloud is 300, so we have room for more
      Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud provision
      INFO: provision(xxx,1): 0 existing slaves (=0 executors), templates available are [Template[prefix=ess-lin, provisioned=[], planned=[], max=2147483647, fullness=0.000%]]

      If we take a look to the log of the agent in the Jenkins UI the following can be seen

      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Starting Virtual Machine...
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] VM already powered on
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for VMTools
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] VM Tools are running
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Finished wait for VMTools
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@c8db95d to launch slave.
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Ignoring disconnect attempt because a connect attempt is in progress.
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Asking SSHLauncher to launch slave.
      [02/21/17 17:55:45] [SSH] Opening SSH connection to X.X.X.X:22.
      [02/21/17 17:55:46] [SSH] Authentication successful.
      [02/21/17 17:55:46] [SSH] The remote users environment is:
      BASH=/bin/bash
      BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
      BASH_VERSION='4.3.11(1)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/home/jenkins
      HOSTNAME=xxx
      HOSTTYPE=x86_64
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LD_LIBRARY_PATH=/usr/local/lib
      LOGNAME=jenkins
      MACHTYPE=x86_64-pc-linux-gnu
      MAIL=/var/mail/jenkins
      MAKEFLAGS='-j 8'
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
      PIPESTATUS=([0]="0")
      PPID=1916
      PS4='+ '
      PWD=/home/jenkins
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='X.X.X.X 36503 22'
      SSH_CONNECTION='X.X.X.X 36503 X.X.X.X 22'
      TERM=dumb
      UID=1000
      USER=jenkins
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=1
      _=MAKEFLAGS
      [02/21/17 17:55:46] [SSH] Checking java version of java
      [02/21/17 17:55:46] [SSH] java -version returned 1.7.0_79.
      [02/21/17 17:55:46] [SSH] Starting sftp client.
      [02/21/17 17:55:47] [SSH] Copying latest slave.jar...
      [02/21/17 17:55:48] [SSH] Copied 506,667 bytes.
      Expanded the channel window size to 4MB
      [02/21/17 17:55:48] [SSH] Starting slave process: cd "/home/jenkins" && java  -jar slave.jar
      <===[JENKINS REMOTING CAPACITY]===>channel started
      Slave.jar version: 2.53.3
      This is a Unix slave
      Evacuated stdout
      Slave successfully connected and online
      [ess-lin_8hbmdsbdw82ekvakhf1667bk] Slave online
      
      
      
      
      
      HTTP ERROR 404
      
      
      Problem accessing /computer/ess-lin_8hbmdsbdw82ekvakhf1667bk/logText/progressiveHtml. Reason:
      
          Not Found
      Powered by Jetty://
      

      This situation is fixed when I set back to 0 the instance cap limit (for some reason instead of 0 the Jenkins UI shows a number of 2147483647 instead of 0 and the plugin starts immediately working fine again.

      When I set it to 0, the following error appears in the main Jenkins.log file:
       

      Mar 15, 2017 5:14:06 PM hudson.model.Run execute
      INFO: ess/ess_monitorSlaves #2662 main build action completed: SUCCESS
      Mar 15, 2017 5:14:09 PM hudson.ivy.IvyBuildTrigger$DescriptorImpl configure
      INFO: IvyConfigurations: 0
      Mar 15, 2017 5:14:09 PM hudson.plugins.jabber.im.transport.JabberPublisherDescriptor configure
      INFO: No hostname specified.
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate
      WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"}
      Mar 15, 2017 5:14:09 PM org.jenkinsci.plugins.vSphereCloud InternalLog
      INFO: STARTING VSPHERE CLOUD
      

      Is there something I can do to fix this behaviour ?

      Thank you in advance

            Unassigned Unassigned
            jorgepena jpena
            Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated: