-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
Jenkins version : 2.19.4
Plugins and dependencies
+ vsphere-cloud : version => '2.15'
+ workflow-api : version => '2.11'
+ workflow-basic-steps' : version => '2.4'
+ workflow_step_api version => '2.9'
+ scm_api version => '2.0.7'
+ ssh_credentials version => '1.12'
+ credentials version => '2.1.5'
+ ssh_slaves version => '1.13'
+ mailer version => '1.16'
+ cloudbees_folder version => '5.13'
+ node_iterator_api version => '1.5'Jenkins version : 2.19.4 Plugins and dependencies + vsphere-cloud : version => '2.15' + workflow-api : version => '2.11' + workflow-basic-steps' : version => '2.4' + workflow_step_api version => '2.9' + scm_api version => '2.0.7' + ssh_credentials version => '1.12' + credentials version => '2.1.5' + ssh_slaves version => '1.13' + mailer version => '1.16' + cloudbees_folder version => '5.13' + node_iterator_api version => '1.5'
I am using the Vsphere Cloud plugin to spin Jenkins agents (slaves) on demand in new virtual machines using linked clones and keep-until-idle retention strategy. In the Jenkins instance there are a mix of "static virtual machines" Jenkins agents and Jenkins agents on demand
The plugin works correctly until a certain time or maybe limited number of agents created. Approximately after +-70 builds or 12 hours the virtual machines are created correctly, connected to Jenkins but they are shutdown before the build is allocated in the agent or virtual machine.
The following is a piece of log of the main Jenkins.log file when this situation occurs, this is happening over and over.
INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Starting Virtual Machine... Feb 21, 2017 3:49:19 PM org.jenkinsci.plugins.vSphereCloud$VSpherePlannedNode$1 call INFO: Provisioned new slave ess-lin_baulim19krk8qcnwn1k5yloml Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] VM already powered on Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for VMTools Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] VM Tools are running Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Finished wait for VMTools Feb 21, 2017 3:49:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@8c91844 to launch slave. Feb 21, 2017 3:49:29 PM hudson.slaves.NodeProvisioner$2 run INFO: ess-lin_baulim19krk8qcnwn1k5yloml provisioning successfully completed. We have now 5 computer(s) [02/21/17 15:49:58] SSH Launch of unknown on X.X.X.X failed in 63,139 ms Feb 21, 2017 3:49:58 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_6c1eramupwzkjyu9n3t4cdl5] Slave online Feb 21, 2017 3:50:05 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer Feb 21, 2017 3:50:15 PM hudson.slaves.CloudRetentionStrategy check INFO: Disconnecting ess-lin_baulim19krk8qcnwn1k5yloml Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Disconnected computer Feb 21, 2017 3:50:15 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress. Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Asking SSHLauncher to launch slave. Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: [ess-lin_baulim19krk8qcnwn1k5yloml] Ignoring disconnect attempt because a connect attempt is in progress. Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud calculateMaxAdditionalSlavesPermitted INFO: There are 0 VMs in this cloud. The instance cap for the cloud is 300, so we have room for more Feb 21, 2017 3:50:21 PM org.jenkinsci.plugins.vSphereCloud provision INFO: provision(xxx,1): 0 existing slaves (=0 executors), templates available are [Template[prefix=ess-lin, provisioned=[], planned=[], max=2147483647, fullness=0.000%]]
If we take a look to the log of the agent in the Jenkins UI the following can be seen
[ess-lin_8hbmdsbdw82ekvakhf1667bk] Starting Virtual Machine... [ess-lin_8hbmdsbdw82ekvakhf1667bk] VM already powered on [ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for VMTools [ess-lin_8hbmdsbdw82ekvakhf1667bk] VM Tools are running [ess-lin_8hbmdsbdw82ekvakhf1667bk] Finished wait for VMTools [ess-lin_8hbmdsbdw82ekvakhf1667bk] Waiting for 60 seconds before asking hudson.plugins.sshslaves.SSHLauncher@c8db95d to launch slave. [ess-lin_8hbmdsbdw82ekvakhf1667bk] Ignoring disconnect attempt because a connect attempt is in progress. [ess-lin_8hbmdsbdw82ekvakhf1667bk] Asking SSHLauncher to launch slave. [02/21/17 17:55:45] [SSH] Opening SSH connection to X.X.X.X:22. [02/21/17 17:55:46] [SSH] Authentication successful. [02/21/17 17:55:46] [SSH] The remote users environment is: BASH=/bin/bash BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath BASH_ALIASES=() BASH_ARGC=() BASH_ARGV=() BASH_CMDS=() BASH_EXECUTION_STRING=set BASH_LINENO=() BASH_SOURCE=() BASH_VERSINFO=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu") BASH_VERSION='4.3.11(1)-release' DIRSTACK=() EUID=1000 GROUPS=() HOME=/home/jenkins HOSTNAME=xxx HOSTTYPE=x86_64 IFS=$' \t\n' LANG=en_US.UTF-8 LD_LIBRARY_PATH=/usr/local/lib LOGNAME=jenkins MACHTYPE=x86_64-pc-linux-gnu MAIL=/var/mail/jenkins MAKEFLAGS='-j 8' OPTERR=1 OPTIND=1 OSTYPE=linux-gnu PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games PIPESTATUS=([0]="0") PPID=1916 PS4='+ ' PWD=/home/jenkins SHELL=/bin/bash SHELLOPTS=braceexpand:hashall:interactive-comments SHLVL=1 SSH_CLIENT='X.X.X.X 36503 22' SSH_CONNECTION='X.X.X.X 36503 X.X.X.X 22' TERM=dumb UID=1000 USER=jenkins XDG_RUNTIME_DIR=/run/user/1000 XDG_SESSION_ID=1 _=MAKEFLAGS [02/21/17 17:55:46] [SSH] Checking java version of java [02/21/17 17:55:46] [SSH] java -version returned 1.7.0_79. [02/21/17 17:55:46] [SSH] Starting sftp client. [02/21/17 17:55:47] [SSH] Copying latest slave.jar... [02/21/17 17:55:48] [SSH] Copied 506,667 bytes. Expanded the channel window size to 4MB [02/21/17 17:55:48] [SSH] Starting slave process: cd "/home/jenkins" && java -jar slave.jar <===[JENKINS REMOTING CAPACITY]===>channel started Slave.jar version: 2.53.3 This is a Unix slave Evacuated stdout Slave successfully connected and online [ess-lin_8hbmdsbdw82ekvakhf1667bk] Slave online HTTP ERROR 404 Problem accessing /computer/ess-lin_8hbmdsbdw82ekvakhf1667bk/logText/progressiveHtml. Reason: Not Found Powered by Jetty://
This situation is fixed when I set back to 0 the instance cap limit (for some reason instead of 0 the Jenkins UI shows a number of 2147483647 instead of 0 and the plugin starts immediately working fine again.
When I set it to 0, the following error appears in the main Jenkins.log file:
Mar 15, 2017 5:14:06 PM hudson.model.Run execute INFO: ess/ess_monitorSlaves #2662 main build action completed: SUCCESS Mar 15, 2017 5:14:09 PM hudson.ivy.IvyBuildTrigger$DescriptorImpl configure INFO: IvyConfigurations: 0 Mar 15, 2017 5:14:09 PM hudson.plugins.jabber.im.transport.JabberPublisherDescriptor configure INFO: No hostname specified. Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"} Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"} Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"} Mar 15, 2017 5:14:09 PM hudson.model.Descriptor$NewInstanceBindInterceptor instantiate WARNING: Descriptor not found. Falling back to default instantiation org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy {"stapler-class":"org.jenkinsci.plugins.vsphere.VSphereCloudRetentionStrategy","idleMinutes":"120"} Mar 15, 2017 5:14:09 PM org.jenkinsci.plugins.vSphereCloud InternalLog INFO: STARTING VSPHERE CLOUD
Is there something I can do to fix this behaviour ?
Thank you in advance