Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-31098

New delete-node CLI command breaks node cleanup in cloud plugins

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Critical Critical
    • cli, core
    • Jenkins 1.625.1 LTS

      The new implementation of the delete-node command badly breaks cleanup of cloud-managed VMs (at least in jclouds-plugin and openstack-plugin)

      The cloud plugins extend AbstractCloudComputer and then override doDoDelete() in order to perform any cleanup (and delay the actual deleteion until this is done). The new delete-node command does not invoke this and thus when using this command, lots of unused VM's are kept around which get never cleaned up. In fact, there is no "hook" whatsoever anymore which allows a cloud plugin to intercept node deletion.

      If there's some other (new) functionality to intercept node-deletion please let me know.
      Unfortunately this made it into 1.625.1 LTS and I only noticed that change now. It is not even mentioned in the changelog.

          [JENKINS-31098] New delete-node CLI command breaks node cleanup in cloud plugins

          Fritz Elfert created issue -

          Fritz Elfert added a comment -

          Even if doDoDelete() is a Stapler call designed for UIs, it is currently the only possible place to intercept node-deletion (that I know of).
          I have not looked into other cloud plugins right now but at least JClouds as well as OpenStack need something like this in order to
          shutdown and/or delete unused VMs.

          Fritz Elfert added a comment - Even if doDoDelete() is a Stapler call designed for UIs, it is currently the only possible place to intercept node-deletion (that I know of). I have not looked into other cloud plugins right now but at least JClouds as well as OpenStack need something like this in order to shutdown and/or delete unused VMs.

          Oliver Gondža added a comment - - edited

          As far as I can tell, only CLI is affected as it does no loner delegates to the UI method. Deleting from UI or the JClouds cleanup thread should still work.

          While I agree with Oleg that this is rather hackish, I see no other way to politely ask computer-slave pair to terminate when ready. I propose to document this purpose of Computer#doDoDelete() and fix the delete-node implementation to call that method again.

          Oliver Gondža added a comment - - edited As far as I can tell, only CLI is affected as it does no loner delegates to the UI method. Deleting from UI or the JClouds cleanup thread should still work. While I agree with Oleg that this is rather hackish, I see no other way to politely ask computer-slave pair to terminate when ready. I propose to document this purpose of Computer#doDoDelete() and fix the delete-node implementation to call that method again.

          felfert if I am not mistaken, the delayed machine termination is in place not to block the thread while sending remote request to cloud service. In theory, we can try not to rely on slave being offline but remove it immediately and keep the reference to the cloud machine elsewhere for the collector thread. I lean towards fixing this in core as there might be different plugins affected.

          Oliver Gondža added a comment - felfert if I am not mistaken, the delayed machine termination is in place not to block the thread while sending remote request to cloud service. In theory, we can try not to rely on slave being offline but remove it immediately and keep the reference to the cloud machine elsewhere for the collector thread. I lean towards fixing this in core as there might be different plugins affected.

          Fritz Elfert added a comment -

          Yes, clicking the Delete button still works of course. However we almost never do that manually What we do here (In jclouds, I implemented a CLI cmd for launching new nodes) from inside a job:

          jclouds-provision SomeTemplateName (blocks until ready and returns IP-Address and Nodename)
          invoke parameterized integration test job
          delete-node TheNodeNameFromAbove <<< At this point we now end up with an unused VM since 1.625.1

          Fritz Elfert added a comment - Yes, clicking the Delete button still works of course. However we almost never do that manually What we do here (In jclouds, I implemented a CLI cmd for launching new nodes) from inside a job: jclouds-provision SomeTemplateName (blocks until ready and returns IP-Address and Nodename) invoke parameterized integration test job delete-node TheNodeNameFromAbove <<< At this point we now end up with an unused VM since 1.625.1
          Pavel Janoušek made changes -
          Assignee New: Pavel Janoušek [ pajasoft ]

          Hello, as the author of that change I can fix the behavior and delegate deletion of computer back to Computer.doDoDelete(). It should solve your issue.

          Working on it...

          Pavel Janoušek added a comment - Hello, as the author of that change I can fix the behavior and delegate deletion of computer back to Computer.doDoDelete(). It should solve your issue. Working on it...

          Fritz Elfert added a comment -

          olivergondza Nope, jclouds itself does not block during deletion - Well not for the duration of the deletion, only for the time it takes to send the POST.
          I always assumed that the delay ist due to the Channel object needing time to shutdown. About having a separate reference: I thought about that before, but there is one big disadvantage. Since the node is not persisted anymore at that point, we might still end up with stray VMs if jenkins is restarted during that time.

          Fritz Elfert added a comment - olivergondza Nope, jclouds itself does not block during deletion - Well not for the duration of the deletion, only for the time it takes to send the POST. I always assumed that the delay ist due to the Channel object needing time to shutdown. About having a separate reference: I thought about that before, but there is one big disadvantage. Since the node is not persisted anymore at that point, we might still end up with stray VMs if jenkins is restarted during that time.

          PR sent.

          Pavel Janoušek added a comment - PR sent.

          Fritz Elfert added a comment -

          Thanks pajasoft, I will have a look this evening.

          Fritz Elfert added a comment - Thanks pajasoft , I will have a look this evening.

            pajasoft Pavel Janoušek
            felfert Fritz Elfert
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: