
No cleanup SSH connections when slave action occurs

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Debian 6.0.6 x86_64, Jenkins ver. 1.502, Jenkins Libvirt Slaves plugin 1.7

      A few days ago I noticed that connections to libvirt opened through the Java binding with an SSH URL leave an SSH shell open even after the connection is closed.

      The libvirt-java version could be the cause. I think this behaviour was fixed after version 0.4.2 (source: http://libvirt.org/git/?p=libvirt-java.git;a=shortlog).

      Thank you in advance.

      Regards,

          [JENKINS-16889] No cleanup SSH connections when slave action occurs

          Dan Doyle added a comment -

          +1 on this. I was very excited to see 1.7 pop up after so long because it fixed a number of things, but we had to revert to 1.6 because connections are no longer closed. With lots of start/stop on temporary slaves this winds up happening:

          /var/log/libvirt/libvirtd.log
          2013-02-24 03:59:54.919+0000: 32024: error : virNetServerDispatchNewClient:246 : Too many active clients (40), dropping connection from 127.0.0.1;0

          Restarting Jenkins or libvirt solves the issue temporarily. Reverting back to 1.6 also solves the issue, so most likely a regression in 1.7. So far as we're concerned this is a show stopper bug as it renders Jenkins unusable.

          This is on a RHEL6 x86_64, Jenkins version 1.502 (though also tested with 1.478).

          Thanks for picking up this plugin!


          Dennis Ditte added a comment -

          Same here,

          CentOS 2.6.32-279.19.1.el6.x86_64, Jenkins Version 1.499

          Hope that this would be fixed soon...


          Philipp Bartsch added a comment -

          Thanks for the report. You can expect release 1.8 by the end of March.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: tastybug
          Path:
          src/main/java/hudson/plugins/libvirt/Hypervisor.java
          src/main/java/hudson/plugins/libvirt/VirtualMachineLauncher.java
          http://jenkins-ci.org/commit/libvirt-slave-plugin/eb73184a9bcbd8a398db153afad0ce49c0a3c661
          Log:
          Fixing JENKINS-16889.
          Discarding the libvirt connection after each getDomains use was too
          naive, from now on there will be a single shared connection for all VMs.
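
The idea in the commit log, one shared connection instead of a new one per getDomains call, can be sketched roughly as follows. This is not the plugin's code: `HypervisorLink`, `SharedLink` and `SharedLinkDemo` are hypothetical names, and the interface is a stand-in rather than the libvirt-java API, so the example runs without libvirt installed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a libvirt connection; not the libvirt-java API. */
interface HypervisorLink {
    String[] listDomainNames();
}

/** Hands out one lazily opened link instead of opening a new one per call. */
final class SharedLink {
    private final Supplier<HypervisorLink> opener;
    private HypervisorLink link; // guarded by "this"

    SharedLink(Supplier<HypervisorLink> opener) {
        this.opener = opener;
    }

    synchronized HypervisorLink get() {
        if (link == null) {
            // Opened only on first use; every later caller reuses it.
            link = opener.get();
        }
        return link;
    }
}

final class SharedLinkDemo {
    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        SharedLink shared = new SharedLink(() -> {
            opened.incrementAndGet(); // count how often a link is really opened
            return () -> new String[] {"vm-a", "vm-b"};
        });
        for (int i = 0; i < 100; i++) {
            shared.get().listDomainNames();
        }
        System.out.println("connections opened: " + opened.get());
    }
}
```

With this shape, a hundred lookups open one connection, which is what keeps the client count below a limit such as the one reported in the libvirtd log above.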

          Philipp Bartsch added a comment -

          Resolved in 1.8

          G. Kr. added a comment -

          I had a look at the code and have a few questions:

          • Shouldn't the connection be closed at some point, e.g. after it has not been used for a few minutes or hours, or at the latest when Jenkins shuts down? The Connect object has a finalize method that closes the connection, but the Domain objects returned by the public getDomains() method hold a reference to that object, so it is not guaranteed that the garbage collector actually calls finalize(), is it?
          • With long-running connections, wouldn't it be necessary to check whether the Connect still isConnected()?

          thanks

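
On the finalize question: Java does not guarantee that finalizers run promptly, or at all, so an explicit close is the more predictable option. A minimal sketch of closing a shared connection at JVM shutdown is shown below. `ClosableLink` and `ShutdownCloseDemo` are hypothetical names, the `Runnable` stands in for whatever actually closes the real connection, and a plain JVM shutdown hook is used here as an assumption; how the plugin itself would hook into Jenkins shutdown is not covered by this thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper; closeAction stands in for closing the real connection. */
final class ClosableLink implements AutoCloseable {
    private final Runnable closeAction;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    ClosableLink(Runnable closeAction) {
        this.closeAction = closeAction;
    }

    boolean isClosed() {
        return closed.get();
    }

    @Override
    public void close() {
        // compareAndSet makes close() idempotent: the action runs at most once.
        if (closed.compareAndSet(false, true)) {
            closeAction.run();
        }
    }
}

final class ShutdownCloseDemo {
    public static void main(String[] args) {
        ClosableLink link = new ClosableLink(() -> System.out.println("connection closed"));
        // Close explicitly when the JVM exits instead of waiting for the garbage collector.
        Runtime.getRuntime().addShutdownHook(new Thread(link::close));
        System.out.println("working...");
    }
}
```

Making `close()` idempotent means it is safe to call both from the hook and from any other cleanup path.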

          Philipp Bartsch added a comment -

          1) The domain instances themselves are not referenced by any other objects. When the gc collects those domain objects, the associated connection instance is going to be finalized as well and thus closed.
          Nonetheless I don't want to deal with manual disconnects right now, as calling disconnect doesn't close the underlying SSH connection immediately. If I open and close too many connections within a short period of time, I once again might end up with 20 dead, not-yet-closed SSH connections which would prevent the creation of further sessions.
          For the moment I'll stick to one shared connection for the duration of the Jenkins lifecycle as that is simply the safest approach.

          2) Good point regarding the "stale" connections. Considering that libvirtd might crash or get restarted while Jenkins is running, I have to make sure that the connection is in fact working.

          I'm releasing a 1.8.1 to address this issue for the sake of reliability. Thanks for your input, much appreciated.
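
The stale-connection check from point 2 could look roughly like the sketch below: before handing out the shared connection, test whether it still works and open a fresh one if it does not. This is an illustration under stated assumptions, not the 1.8.1 change. `CheckedLink`, `FlagLink`, `ReconnectingLink` and `ReconnectDemo` are hypothetical names, and `isAlive()` stands in for a liveness check such as the `isConnected()` mentioned earlier in the thread.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a connection that can report whether it still works. */
interface CheckedLink {
    boolean isAlive();
}

/** Test double whose liveness can be switched off to simulate a libvirtd restart. */
final class FlagLink implements CheckedLink {
    volatile boolean alive = true;

    @Override
    public boolean isAlive() {
        return alive;
    }
}

/** Shared link that is replaced when the current one no longer responds. */
final class ReconnectingLink {
    private final Supplier<CheckedLink> opener;
    private CheckedLink link; // guarded by "this"

    ReconnectingLink(Supplier<CheckedLink> opener) {
        this.opener = opener;
    }

    synchronized CheckedLink get() {
        if (link == null || !link.isAlive()) {
            // The stale link is simply dropped here; this sketch does not close it.
            link = opener.get();
        }
        return link;
    }
}

final class ReconnectDemo {
    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        ReconnectingLink shared = new ReconnectingLink(() -> {
            opened.incrementAndGet();
            return new FlagLink();
        });
        FlagLink first = (FlagLink) shared.get();
        shared.get();          // still alive, so the same link is reused
        first.alive = false;   // simulate the daemon going away
        CheckedLink second = shared.get();
        System.out.println("opened=" + opened.get() + " replaced=" + (second != first));
    }
}
```

The check runs on every `get()`, so a restart of the daemon costs one reconnect rather than a burst of new connections.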

          G. Kr. added a comment -

          Thanks for clarification.

          I think having one shared connection is the best solution. Only the assumed "never closing" got me confused. That the SSH connection is not shut down before close() returns sounds imho like an upstream bug.


          G. Kr. added a comment -

          closing as fixed


            tastybug Philipp Bartsch
            sox Florent Poinsaut