Jenkins / JENKINS-48799

JCloudsCleanupThread indiscriminately deletes *all* idle floating IPs, even those it didn't create


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • openstack-cloud-plugin
    • None
    • OpenStack Plugin 2.29
    • 2.53

      Hello,

      I recently ran into a hard-to-troubleshoot problem whose root cause, I believe, is an overly aggressive JCloudsCleanupThread in the OpenStack Plugin. I'm using 2.29, but this behavior has been on master since #84 (Add functionality to clean stranded OpenStack floating IPs) was fixed in February 2017 by "Fixed #84: Destroy leaked floating IPs". Our plugin isn't configured to use FIPs, and I don't have direct access to the OpenStack logs, so it took a while for me and someone from our OpenStack support team to figure out which user was triggering the FIP deletion. Much to my surprise, that finally pointed us back to the plugin as the smoking gun.

      At issue is the approach JCloudsCleanupThread takes to remove idle floating IPs in JCloudsCleanupThread::cleanOrphanedFips().

      As best I can tell, this code indiscriminately releases all idle FIPs it finds, even those the plugin didn't create. The relevant code is:

      // JCloudsCleanupThread::cleanOrphanedFips
      // foreach cloud do the following
      List<String> leaked = new ArrayList<>(cloud.getOpenstack().getFreeFipIds());
      List<String> freed = new ArrayList<>(leaked);
      leaked.retainAll(cloudStillFips); // Free on 2 checks
      freed.removeAll(leaked); // Just freed

      synchronized (stillFips) {
          cloudStillFips.clear();
          cloudStillFips.addAll(freed);
      }
      
      for (String fip : leaked) {
          try {
              cloud.getOpenstack().destroyFip(fip);
          } catch (ClientResponseException ex) {
              // The tenant is probably reusing pre-allocated FIPs without permission to (de)allocate new.
              // https://github.com/jenkinsci/openstack-cloud-plugin/issues/66#issuecomment-207296059
              if (ex.getStatusCode() == StatusCode.FORBIDDEN) {
                  continue;
              }
              LOGGER.log(Level.WARNING, "Unable to release leaked floating IP", ex);
          } catch (Exception ex) {
              LOGGER.log(Level.WARNING, "Unable to release leaked floating IP", ex);
          }
      }
      

      And the helper it relies on:

      // Openstack::getFreeFipIds
      public List<String> getFreeFipIds() {
          ArrayList<String> free = new ArrayList<>();
          for (NetFloatingIP ip : clientProvider.get().networking().floatingip().list()) {
              if (ip.getFixedIpAddress() == null) {
                  free.add(ip.getId());
              }
          }
          return free;
      }
      

      I'm not quite sure how the stillFips or freed lists are being used.
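      Reading the quoted code, stillFips appears to remember which FIPs were free on the previous run, so a FIP is destroyed only once it has been free on two consecutive runs (the "Free on 2 checks" comment). Either way, the only selection criterion is "no fixed IP address", so that delay does not protect a FIP someone else owns. The standalone simulation below copies the quoted list operations to show this; FloatingIp, freeFipIds() and cleanup() are hypothetical stand-ins, not the plugin's actual classes, and the record requires Java 16+.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Standalone simulation of the "free on 2 checks" bookkeeping quoted above.
 * FloatingIp, freeFipIds() and cleanup() are hypothetical stand-ins, not plugin API.
 */
public class TwoCheckCleanupDemo {

    /** Hypothetical minimal model of a floating IP: an ID plus its fixed IP (null if unassociated). */
    public record FloatingIp(String id, String fixedIpAddress) {}

    /** Mirrors Openstack::getFreeFipIds: "free" means no fixed IP, nothing else. */
    public static List<String> freeFipIds(List<FloatingIp> all) {
        List<String> free = new ArrayList<>();
        for (FloatingIp ip : all) {
            if (ip.fixedIpAddress() == null) {
                free.add(ip.id());
            }
        }
        return free;
    }

    /**
     * Mirrors one run of cleanOrphanedFips for a single cloud.
     * stillFips carries the "free on the previous run" IDs between runs.
     * Returns the IDs that would be passed to destroyFip.
     */
    public static List<String> cleanup(List<FloatingIp> all, Set<String> stillFips) {
        List<String> leaked = new ArrayList<>(freeFipIds(all));
        List<String> freed = new ArrayList<>(leaked);
        leaked.retainAll(stillFips); // free on this run AND the previous one
        freed.removeAll(leaked);     // free only as of this run
        stillFips.clear();
        stillFips.addAll(freed);     // remembered for the next run
        return leaked;
    }

    public static void main(String[] args) {
        List<FloatingIp> fips = List.of(
                new FloatingIp("fip-agent", "10.0.0.5"), // associated with a node
                new FloatingIp("fip-heat", null));      // persistent but unassociated
        Set<String> stillFips = new HashSet<>();

        System.out.println("run 1 destroys " + cleanup(fips, stillFips));
        System.out.println("run 2 destroys " + cleanup(fips, stillFips));
    }
}
```

      Running main prints "run 1 destroys []" followed by "run 2 destroys [fip-heat]": the unassociated FIP survives one pass and is released on the next, regardless of who allocated it.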

      However, it seems pretty clear that we're iterating over our list of clouds, asking each cloud for its list of "free" FIPs (any FIP with no fixed IP address), then treating every "free" FIP as "leaked" and destroying all the "leaked" FIPs we found.

      I think this behavior is heavy-handed. I would expect the plugin to delete only the FIPs it created and leave other FIPs alone.

      For some context, our company maintains an internal OpenStack cloud on which my team hosts its Jenkins environment. The Jenkins server is assigned a floating IP (so that it is routable from our desks), and the OpenStack plugin is configured to spawn nodes only within our private (non-routable) network segment. In our setup, the OpenStack plugin never uses floating IPs for the instances it spawns. This internal OpenStack cloud (and our build server) is not otherwise accessible from the internet, so to grant external developers access to our Jenkins, my team has to configure firewall rules specific to our Jenkins instance. This means we often allocate a persistent floating IP (routable internally) and configure the appropriate firewall rules to let external users reach our internal Jenkins via the VPN.

      OpenStack does not automatically release allocated (but unassociated) FIPs, so we never lose our Jenkins master FIP and the firewall rules stay in sync. Furthermore, we have a persistent, long-running Heat stack that is responsible for creating various long-lived objects (volumes, FIPs, etc.). The intent is that as long as this stack is running, those resources should not be reclaimed.

      However, the OpenStack plugin breaks this key assumption behind our persistent Heat stack because it deletes any and all unassociated FIPs it finds.

      In practice, I can find ways to mitigate this: we only have the one persistent FIP that I care about, and it's pretty easy to make sure we don't have some Jenkins test instance running (e.g. for testing upgrades) that could delete the unassociated FIP behind my back. But in general, deleting a resource created by some other stack seems a little broken.

      I think the proper fix for #84 (Add functionality to clean stranded OpenStack floating IPs) would be to keep track of which FIPs the OpenStack plugin created and delete only those.
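      A rough sketch of that idea, assuming the plugin records the ID of every floating IP it allocates and consults that record before releasing anything. FipClient, recordAllocated() and cleanup() are hypothetical names, not existing plugin or openstack4j API, and the owned set lives in memory here only for brevity; a real fix would need to persist it across Jenkins restarts. The two-check delay from the current code is kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of ownership-aware floating IP cleanup.
 * FipClient, recordAllocated() and cleanup() are hypothetical, not plugin API.
 */
public class OwnedFipCleanup {

    /** Hypothetical minimal view of the OpenStack calls the cleanup needs. */
    public interface FipClient {
        List<String> freeFipIds();     // unassociated FIPs, like Openstack::getFreeFipIds
        void destroyFip(String fipId); // release a FIP back to the pool
    }

    // IDs of FIPs this plugin allocated itself (assumption: would be persisted in practice).
    private final Set<String> ownedFipIds = new HashSet<>();
    // Owned FIPs seen free on the previous pass, as in the current two-check logic.
    private final Set<String> stillFree = new HashSet<>();

    /** Call whenever the plugin itself allocates a floating IP. */
    public synchronized void recordAllocated(String fipId) {
        ownedFipIds.add(fipId);
    }

    /** One cleanup pass; returns the IDs that were released. */
    public synchronized List<String> cleanup(FipClient client) {
        List<String> destroyed = new ArrayList<>();
        Set<String> freeNow = new HashSet<>();
        for (String id : client.freeFipIds()) {
            if (!ownedFipIds.contains(id)) {
                continue; // not ours (e.g. allocated by a Heat stack): never touch it
            }
            if (!stillFree.contains(id)) {
                freeNow.add(id); // first time seen free: wait for the next pass
                continue;
            }
            try {
                client.destroyFip(id); // free on two consecutive passes
                ownedFipIds.remove(id);
                destroyed.add(id);
            } catch (RuntimeException ex) {
                freeNow.add(id); // keep tracking it and retry on a later pass
            }
        }
        stillFree.clear();
        stillFree.addAll(freeNow);
        return destroyed;
    }
}
```

      Any FIP the plugin never recorded, such as a persistent one allocated by a Heat stack, is never a candidate for release, no matter how long it stays unassociated.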

      I assume this would be straightforward (famous last words), but I haven't tried hacking together a PR yet. Time permitting, I'll try to put something together.

      Thanks for all your hard work maintaining this plugin (and contributing to many others)! I see your name everywhere on the plugins we use, and your time and effort are greatly appreciated.

      Kind Regards,
      Ryan

            Assignee: Oliver Gondža (olivergondza)
            Reporter: Ryan Thornton (thorntonryan)
            Votes: 0
            Watchers: 3
