vSphere Plugin not powering up virtual machines

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Critical
    • Virtualization: VMware vSphere ESXi 5.0.0
      Jenkins Server: Ubuntu 11.04, Jenkins-CI 1.472, vSphere Cloud Plugin 0.10
      2X Ubuntu slave: 11.04
      2X Windows slaves: Win 7

      Steps:
      1. Upgrade vSphere Cloud Plugin to 0.9
      2. Try to start one of the Virtual slaves using the plugin
      3. Watch the vSphere client console
      4. Watch the Slave log

      Expected:
      3. The VM should power up
      4. The log should show some output.

      Actual:

      3. The VM does not start.
      4. Nothing appears in the log except the spinning circle.

          [JENKINS-13837] vSphere Plugin not powering up virtual machines

          Moshe Belostotsky added a comment -

          I have upgraded Jenkins to 1.471 and still have the same problem.

          Jason Swager added a comment -

          Did you also upgrade the vSphere plugin to v0.10? There was code in v0.9 that attempted to work around the Jenkins problems but ultimately just made a worse mess of things.

          Moshe Belostotsky added a comment -

          Yes, I'm on v0.10.

          Jason Swager added a comment -

          That takes care of the easy troubleshooting - now for the hard stuff.

          Are you trying to start the slave via the slave page (Launch Slave Agent), or are you trying to start the slave by having a job that requires the slave?

          Is the slave set up properly to talk to a VM and an optional snapshot? In other words, does the "Test VM Connection" pass? Also, if using a snapshot, do you have any other slaves using the same VM with different snapshots?

          Do you have "Force VM Launch" enabled? And is "Delay between launch and boot complete" set to a reasonable value?

          How is the slave agent set to connect - what is the value of the "Secondary Launch Method"? What about the "Availability" settings?

          Do you see any actions from Jenkins in the vCenter logs?

          Moshe Belostotsky added a comment -

          1. I have tried both methods.
          2. "Test VM Connection" works OK; I am not using snapshots.
          3. "Force VM Launch" is enabled.
          Delay set to 60.
          4. Availability set to "Take this slave on-line when in demand..."
          The secondary launch method is Java Web Start.
          5. Yes, I saw some connects and disconnects of the user.

          Strangely, it sometimes does start the machines, but I still cannot see anything related to the vSphere plugin in the slave logs.

          Jason Swager added a comment -

          Try increasing the delay. When using Java Web Start, the plugin expects the VM to initiate its own connection to Jenkins during the delay. The delay time starts at either 1) the presence of VMTools, if that option is selected, or 2) when the machine is powered on.

          As for the logs, I'll have to investigate that further. There SHOULD be at least a few vSphere log lines that appear during the startup. For all start methods except Java Web Start, they get wiped out - each of the other start mechanisms starts out by wiping the log lines. But there should have been a few lines present for a bit.

          Srinath C added a comment - edited

          Logs collected while launching the vSphere slaves (slave-logs.zip)

          Srinath C added a comment - edited

          I'm currently facing the same issue.

          My setup:
          Virtualization: VMware vSphere ESXi 5.0.0
          Jenkins Server: CentOS 6, Jenkins-CI 1.470, vSphere Cloud Plugin 0.10
          VMs: CentOS 6.2, snapshot set to "jenkins_setup"

          I have attached the logs for the slave (slave-logs.zip).
          Note that I can see the VM being reverted to the configured snapshot, but beyond that the slave does not proceed.

          Srinath C added a comment -

          I got the issue resolved. VMs are now launching successfully on the vSphere server.
          It was not even related to Jenkins or the vSphere Cloud plugin.
          It was the Leap Second bug on the Jenkins server that was causing the problem.

          I took a jstack of the Jenkins process while the slave was stuck at launching and found that the thread was hung at:
          "pool-6-thread-8" daemon prio=10 tid=0x00007f75d8252800 nid=0x7292 sleeping[0x00007f768651c000]
          java.lang.Thread.State: TIMED_WAITING (sleeping)
          at java.lang.Thread.sleep(Native Method)
          at com.vmware.vim25.mo.Task.waitForTask(Task.java:229)
          at com.vmware.vim25.mo.Task.waitForTask(Task.java:152)
          at org.jenkinsci.plugins.vSphereCloudLauncher.launch(vSphereCloudLauncher.java:169)
          at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:200)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          at java.util.concurrent.FutureTask.run(FutureTask.java:166)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          at java.lang.Thread.run(Thread.java:679)
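
          For illustration only, a similar check can be made from code running inside a JVM with the standard Thread.getAllStackTraces() call. The sketch below lists threads that are in TIMED_WAITING with a java.lang.Thread sleep frame on their stack, which is the pattern visible in the dump above. The class and method names (SleepingThreads, findSleepingThreads) are hypothetical, and the code only sees the JVM it runs in, so jstack is still needed to inspect a separate Jenkins process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jenkins or the vSphere plugin.
final class SleepingThreads {

    /** Returns the names of threads in this JVM that are currently inside Thread.sleep. */
    static List<String> findSleepingThreads() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.getState() != Thread.State.TIMED_WAITING) {
                continue; // sleeping threads are reported as TIMED_WAITING
            }
            for (StackTraceElement frame : entry.getValue()) {
                // Match sleep and its internal variants, whose names differ between JDK versions.
                if (frame.getClassName().equals("java.lang.Thread")
                        && frame.getMethodName().startsWith("sleep")) {
                    names.add(thread.getName());
                    break;
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The main thread is running, so it will not appear in its own report.
        System.out.println("Threads inside Thread.sleep: " + findSleepingThreads());
    }
}
```

          A thread that shows up in this list on every run, long after its sleep interval should have ended, would match the symptom described here.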

          After spending a lot of time peeking into the source code, I realized that even a simple program that sleeps for 1 second using Thread.currentThread().sleep(1000) was getting stuck indefinitely.

          Fortunately, I found the reason at http://stackoverflow.com/questions/11294573/thread-sleep-never-returns and a simple reboot of the server solved the problem.
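
          A minimal sketch of such a sleep test is shown here for anyone who wants to check their own server. It is not the original program from this comment; the class and method names (SleepCheck, timedSleep) are made up for illustration.

```java
// Hypothetical stand-alone check, not the program used in the original investigation.
final class SleepCheck {

    /** Sleeps for the requested time and returns the measured elapsed milliseconds. */
    static long timedSleep(long millis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(millis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Sleeping for 1000 ms...");
        long elapsed = timedSleep(1000);
        // On a healthy host this line prints after about one second.
        // If Thread.sleep never returns, as described in this comment, it is never reached.
        System.out.println("Woke up after " + elapsed + " ms");
    }
}
```

          If the second message never appears, the problem lies with the host or the JVM rather than with Jenkins or the plugin.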

          Jason Swager added a comment -

          Problem was in Java, not in this plugin.

            Assignee: Unassigned
            Reporter: Moshe Belostotsky (mbelosto_12)