JENKINS-65994: ssh connections to provisioned nodes remain forever after node was destroyed

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: jclouds-plugin
    • Labels: None
    • Environment: podman container w/ centos 8 running on fedora linux host

      I use the jClouds plugin to create workers on DigitalOcean via the digitalocean2 provider; these are spun up, jobs run on them, and they are later torn down.  Other than the issue I reported in JENKINS-63731, this works fine.

       

      However, looking at netstat I have learned that none of the SSH connections to these nodes are ever terminated, even though the actual host on the other end was destroyed long ago (e.g., weeks ago).  This suggests that when the jclouds plugin tears down a node, it does nothing to close out the SSH agent connection on the Jenkins side.  There is nothing unusual in the Jenkins logs, just normal teardown activity such as the excerpt below; a sketch of where an explicit disconnect might belong follows it:

       

      2021-06-27 13:46:53.280+0000 [id=37] INFO j.p.jclouds.compute.JCloudsSlave#_terminate: Terminating node: basic-f0d
      2021-06-27 13:46:53.281+0000 [id=37] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> destroying node(252394927)
      2021-06-27 13:47:06.851+0000 [id=37] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: << destroyed node(252394927) success(true)
      2021-06-27 13:50:52.967+0000 [id=31] INFO j.p.j.c.JCloudsRetentionStrategy#check: Retention time of 30 min for basic-c3c has expired.
      2021-06-27 13:50:52.967+0000 [id=31] INFO j.p.j.c.JCloudsRetentionStrategy#fastTerminate: Setting basic-c3c to be deleted.
      2021-06-27 13:50:52.969+0000 [id=31] INFO j.p.j.compute.JCloudsComputer#deleteSlave: Deleting agent: basic-c3c
      2021-06-27 13:50:53.444+0000 [id=31] INFO j.p.jclouds.compute.JCloudsSlave#_terminate: Terminating node: basic-c3c
      2021-06-27 13:50:53.444+0000 [id=31] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> destroying node(252394935)
      2021-06-27 13:51:07.100+0000 [id=31] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: << destroyed node(252394935) success(true)
      2021-06-27 13:52:52.966+0000 [id=25] INFO j.p.j.c.JCloudsRetentionStrategy#check: Retention time of 30 min for basic-6e5 has expired.
      2021-06-27 13:52:52.967+0000 [id=25] INFO j.p.j.c.JCloudsRetentionStrategy#fastTerminate: Setting basic-6e5 to be deleted.
      2021-06-27 13:52:52.968+0000 [id=25] INFO j.p.j.compute.JCloudsComputer#deleteSlave: Deleting agent: basic-6e5
      2021-06-27 13:52:53.587+0000 [id=25] INFO j.p.jclouds.compute.JCloudsSlave#_terminate: Terminating node: basic-6e5
      2021-06-27 13:52:53.587+0000 [id=25] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> destroying node(252394929)
      2021-06-27 13:53:07.239+0000 [id=25] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: << destroyed node(252394929) success(true)
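
      Since the logs show only the cloud-side destroy, my guess is that nothing on the Jenkins side ever disconnects the computer or closes its SSH transport. For what it's worth, here is a minimal sketch of where an explicit disconnect could go; I don't know the plugin's internals, so the class name and constructor are illustrative only, and it assumes a recent Jenkins core (the simplified Slave constructor and OfflineCause.ByCLI may differ across versions):

      import java.io.IOException;
      import hudson.model.Descriptor.FormException;
      import hudson.model.TaskListener;
      import hudson.slaves.AbstractCloudSlave;
      import hudson.slaves.ComputerLauncher;
      import hudson.slaves.OfflineCause;
      import hudson.slaves.SlaveComputer;

      // Hypothetical illustration only -- not the jclouds-plugin's actual class.
      public abstract class ExampleCloudSlave extends AbstractCloudSlave {

          protected ExampleCloudSlave(String name, String remoteFS, ComputerLauncher launcher)
                  throws FormException, IOException {
              super(name, remoteFS, launcher);
          }

          @Override
          protected void _terminate(TaskListener listener) {
              // Drop the Jenkins-side channel first, so the JVM's SSH socket is
              // closed instead of lingering in ESTABLISHED after the droplet dies.
              SlaveComputer computer = getComputer();
              if (computer != null) {
                  computer.disconnect(new OfflineCause.ByCLI("node is being destroyed"));
              }
              // ... then destroy the cloud node via the provider API, as the
              // "destroying node(...)" log lines above show happening ...
          }
      }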

       

      But looking at netstat on the host (I am running the containers with net=host), there are lots of ESTABLISHED connections to the IP addresses of machines that no longer exist (note that I have manually redacted my source IP address for the purposes of this issue):

       

        # netstat -ntp | grep 22 | grep EST
        tcp 0 0 xx.xx.xxx.xx (redacted):60428 206.81.15.71:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):39796 157.230.218.91:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):41648 147.182.133.168:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):37864 206.189.206.49:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):36924 167.172.143.110:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):37826 157.230.214.15:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):53174 167.99.151.209:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):51830 206.189.196.106:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):34982 147.182.141.25:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):49008 167.99.13.112:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):40750 167.99.236.147:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):38416 147.182.141.12:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):39472 167.99.224.161:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):40052 157.230.214.45:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):36068 157.230.218.75:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):36556 206.189.206.128:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):34280 157.230.210.233:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):44838 147.182.141.58:22 ESTABLISHED 49608/java
        tcp 0 0 xx.xx.xxx.xx (redacted):40504 167.99.232.244:22 ESTABLISHED 49608/java

       

      Above, all of those IP addresses belong to DigitalOcean hosts that were provisioned and then torn down (fully destroyed).  The PID is that of the Jenkins server.  If I look in my DigitalOcean dashboard, I have no droplets up at all, and the Jenkins UI likewise shows that no cloud nodes exist.
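
      The same thing can be confirmed from inside the JVM: enumerating Jenkins's own computer list shows nothing that could legitimately own those sockets. A small illustrative Java helper (the class and method names are made up, not plugin code):

      import hudson.model.Computer;
      import jenkins.model.Jenkins;

      // Illustrative helper: print every agent Jenkins still tracks, to show
      // that none of the leaked ESTABLISHED peers corresponds to a live node.
      public class ListComputers {
          public static void dump() {
              for (Computer c : Jenkins.get().getComputers()) {
                  System.out.println(c.getName() + " online=" + c.isOnline());
              }
          }
      }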

      Those are tcp4 connections; the issue also happens when they are tcp6.  I was hoping that forcing IPv4 via "-Djava.net.preferIPv4Stack=true" might help things, but no luck there; in hindsight that makes sense, since the flag only affects how new sockets are created and cannot close connections that have already leaked.

       

      The immediate impact of this issue is that over a period of weeks, many hundreds of these ESTABLISHED lines accumulate, and my machine begins to lose connectivity as it runs out of available ephemeral ports (each leaked outbound connection pins one local port until it is closed).

       

      The ESTABLISHED connections go away once the Jenkins server is restarted.
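
      That is consistent with plain TCP behavior: because the droplet is destroyed without a clean shutdown, the Jenkins side never receives a FIN or RST, so the kernel keeps reporting the idle connection as ESTABLISHED until the owning process closes the socket or exits. A standalone illustration of that mechanism (the address is a placeholder from the TEST-NET documentation range, not a real droplet):

      import java.io.IOException;
      import java.net.InetSocketAddress;
      import java.net.Socket;

      // Demonstrates that a TCP connection stays ESTABLISHED until the owning
      // process closes it or exits -- the same mechanism behind the leak.
      public class LeakDemo {
          public static void main(String[] args) throws IOException, InterruptedException {
              Socket s = new Socket();
              s.connect(new InetSocketAddress("203.0.113.10", 22), 5000); // placeholder host

              // While we sleep here without calling s.close(), `netstat -ntp`
              // reports this connection as ESTABLISHED, owned by this JVM's pid.
              Thread.sleep(60_000);

              // Only an explicit close() (or process exit) releases the local port.
              s.close();
          }
      }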


      Assignee: Fritz Elfert (felfert)
      Reporter: mike bayer (zzzeek)