Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-31098

New delete-node CLI command breaks node cleanup in cloud plugins

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Critical Critical
    • cli, core
    • Jenkins 1.625.1 LTS

      The new implementation of the delete-node command badly breaks cleanup of cloud-managed VMs (at least in jclouds-plugin and openstack-plugin)

      The cloud plugins extend AbstractCloudComputer and then override doDoDelete() in order to perform any cleanup (and delay the actual deleteion until this is done). The new delete-node command does not invoke this and thus when using this command, lots of unused VM's are kept around which get never cleaned up. In fact, there is no "hook" whatsoever anymore which allows a cloud plugin to intercept node deletion.

      If there's some other (new) functionality to intercept node-deletion please let me know.
      Unfortunately this made it into 1.625.1 LTS and I only noticed that change now. It is not even mentioned in the changelog.

          [JENKINS-31098] New delete-node CLI command breaks node cleanup in cloud plugins

          Fritz Elfert added a comment -

          Even if doDoDelete() is a Stapler call designed for UIs, it is currently the only possible place to intercept node-deletion (that I know of).
          I have not looked into other cloud plugins right now but at least JClouds as well as OpenStack need something like this in order to
          shutdown and/or delete unused VMs.

          Fritz Elfert added a comment - Even if doDoDelete() is a Stapler call designed for UIs, it is currently the only possible place to intercept node-deletion (that I know of). I have not looked into other cloud plugins right now but at least JClouds as well as OpenStack need something like this in order to shutdown and/or delete unused VMs.

          Oliver Gondža added a comment - - edited

          As far as I can tell, only CLI is affected as it does no loner delegates to the UI method. Deleting from UI or the JClouds cleanup thread should still work.

          While I agree with Oleg that this is rather hackish, I see no other way to politely ask computer-slave pair to terminate when ready. I propose to document this purpose of Computer#doDoDelete() and fix the delete-node implementation to call that method again.

          Oliver Gondža added a comment - - edited As far as I can tell, only CLI is affected as it does no loner delegates to the UI method. Deleting from UI or the JClouds cleanup thread should still work. While I agree with Oleg that this is rather hackish, I see no other way to politely ask computer-slave pair to terminate when ready. I propose to document this purpose of Computer#doDoDelete() and fix the delete-node implementation to call that method again.

          felfert if I am not mistaken, the delayed machine termination is in place not to block the thread while sending remote request to cloud service. In theory, we can try not to rely on slave being offline but remove it immediately and keep the reference to the cloud machine elsewhere for the collector thread. I lean towards fixing this in core as there might be different plugins affected.

          Oliver Gondža added a comment - felfert if I am not mistaken, the delayed machine termination is in place not to block the thread while sending remote request to cloud service. In theory, we can try not to rely on slave being offline but remove it immediately and keep the reference to the cloud machine elsewhere for the collector thread. I lean towards fixing this in core as there might be different plugins affected.

          Fritz Elfert added a comment -

          Yes, clicking the Delete button still works of course. However we almost never do that manually What we do here (In jclouds, I implemented a CLI cmd for launching new nodes) from inside a job:

          jclouds-provision SomeTemplateName (blocks until ready and returns IP-Address and Nodename)
          invoke parameterized integration test job
          delete-node TheNodeNameFromAbove <<< At this point we now end up with an unused VM since 1.625.1

          Fritz Elfert added a comment - Yes, clicking the Delete button still works of course. However we almost never do that manually What we do here (In jclouds, I implemented a CLI cmd for launching new nodes) from inside a job: jclouds-provision SomeTemplateName (blocks until ready and returns IP-Address and Nodename) invoke parameterized integration test job delete-node TheNodeNameFromAbove <<< At this point we now end up with an unused VM since 1.625.1

          Hello, as the author of that change I can fix the behavior and delegate deletion of computer back to Computer.doDoDelete(). It should solve your issue.

          Working on it...

          Pavel Janoušek added a comment - Hello, as the author of that change I can fix the behavior and delegate deletion of computer back to Computer.doDoDelete(). It should solve your issue. Working on it...

          Fritz Elfert added a comment -

          olivergondza Nope, jclouds itself does not block during deletion - Well not for the duration of the deletion, only for the time it takes to send the POST.
          I always assumed that the delay ist due to the Channel object needing time to shutdown. About having a separate reference: I thought about that before, but there is one big disadvantage. Since the node is not persisted anymore at that point, we might still end up with stray VMs if jenkins is restarted during that time.

          Fritz Elfert added a comment - olivergondza Nope, jclouds itself does not block during deletion - Well not for the duration of the deletion, only for the time it takes to send the POST. I always assumed that the delay ist due to the Channel object needing time to shutdown. About having a separate reference: I thought about that before, but there is one big disadvantage. Since the node is not persisted anymore at that point, we might still end up with stray VMs if jenkins is restarted during that time.

          PR sent.

          Pavel Janoušek added a comment - PR sent.

          Fritz Elfert added a comment -

          Thanks pajasoft, I will have a look this evening.

          Fritz Elfert added a comment - Thanks pajasoft , I will have a look this evening.

          Fritz Elfert added a comment -

          lgtm - works perfectly

          Fritz Elfert added a comment - lgtm - works perfectly

          Code changed in jenkins
          User: Ing. Pavel Janousek
          Path:
          core/src/main/java/hudson/cli/DeleteNodeCommand.java
          test/src/test/java/hudson/cli/DeleteNodeCommandTest.java
          http://jenkins-ci.org/commit/jenkins/04a874ac9572245314208fdeea485383d621fcf6
          Log:
          JENKINS-31098 Deletage remove a node back to Computer.doDoDelete()

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Ing. Pavel Janousek Path: core/src/main/java/hudson/cli/DeleteNodeCommand.java test/src/test/java/hudson/cli/DeleteNodeCommandTest.java http://jenkins-ci.org/commit/jenkins/04a874ac9572245314208fdeea485383d621fcf6 Log: JENKINS-31098 Deletage remove a node back to Computer.doDoDelete()

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/src/main/java/hudson/cli/DeleteNodeCommand.java
          test/src/test/java/hudson/cli/DeleteNodeCommandTest.java
          http://jenkins-ci.org/commit/jenkins/0a56c1d5078bc6f36a305ee25538eeff1a0e2442
          Log:
          Merge pull request #1882 from pjanouse/JENKINS-31098

          JENKINS-31098 Deletage remove a node back to Computer.doDoDelete()

          Compare: https://github.com/jenkinsci/jenkins/compare/be8c57460ce2...0a56c1d5078b

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/cli/DeleteNodeCommand.java test/src/test/java/hudson/cli/DeleteNodeCommandTest.java http://jenkins-ci.org/commit/jenkins/0a56c1d5078bc6f36a305ee25538eeff1a0e2442 Log: Merge pull request #1882 from pjanouse/ JENKINS-31098 JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() Compare: https://github.com/jenkinsci/jenkins/compare/be8c57460ce2...0a56c1d5078b

          dogfood added a comment -

          Integrated in jenkins_main_trunk #4340
          JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (Revision 04a874ac9572245314208fdeea485383d621fcf6)

          Result = SUCCESS
          pjanouse : 04a874ac9572245314208fdeea485383d621fcf6
          Files :

          • core/src/main/java/hudson/cli/DeleteNodeCommand.java
          • test/src/test/java/hudson/cli/DeleteNodeCommandTest.java

          dogfood added a comment - Integrated in jenkins_main_trunk #4340 JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (Revision 04a874ac9572245314208fdeea485383d621fcf6) Result = SUCCESS pjanouse : 04a874ac9572245314208fdeea485383d621fcf6 Files : core/src/main/java/hudson/cli/DeleteNodeCommand.java test/src/test/java/hudson/cli/DeleteNodeCommandTest.java

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: changelog.html http://jenkins-ci.org/commit/jenkins/e3b3638f76107c020fa43d545890329d92e2bcdf Log: Noting fix of JENKINS-31098 in #1882 Compare: https://github.com/jenkinsci/jenkins/compare/fef832fb9ff5...e3b3638f7610

          Code changed in jenkins
          User: Ing. Pavel Janousek
          Path:
          core/src/main/java/hudson/cli/DeleteNodeCommand.java
          test/src/test/java/hudson/cli/DeleteNodeCommandTest.java
          http://jenkins-ci.org/commit/jenkins/24785d545971e10ece89b29fa2dfb50c29c85fd9
          Log:
          JENKINS-31098 Deletage remove a node back to Computer.doDoDelete()

          (cherry picked from commit 04a874ac9572245314208fdeea485383d621fcf6)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Ing. Pavel Janousek Path: core/src/main/java/hudson/cli/DeleteNodeCommand.java test/src/test/java/hudson/cli/DeleteNodeCommandTest.java http://jenkins-ci.org/commit/jenkins/24785d545971e10ece89b29fa2dfb50c29c85fd9 Log: JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (cherry picked from commit 04a874ac9572245314208fdeea485383d621fcf6)

          dogfood added a comment -

          Integrated in jenkins_main_trunk #4373
          JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (Revision 24785d545971e10ece89b29fa2dfb50c29c85fd9)

          Result = SUCCESS
          ogondza : 24785d545971e10ece89b29fa2dfb50c29c85fd9
          Files :

          • test/src/test/java/hudson/cli/DeleteNodeCommandTest.java
          • core/src/main/java/hudson/cli/DeleteNodeCommand.java

          dogfood added a comment - Integrated in jenkins_main_trunk #4373 JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (Revision 24785d545971e10ece89b29fa2dfb50c29c85fd9) Result = SUCCESS ogondza : 24785d545971e10ece89b29fa2dfb50c29c85fd9 Files : test/src/test/java/hudson/cli/DeleteNodeCommandTest.java core/src/main/java/hudson/cli/DeleteNodeCommand.java

          dogfood added a comment -

          Integrated in jenkins_2.0 #5
          JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (Revision 24785d545971e10ece89b29fa2dfb50c29c85fd9)

          Result = SUCCESS
          ogondza : 24785d545971e10ece89b29fa2dfb50c29c85fd9
          Files :

          • core/src/main/java/hudson/cli/DeleteNodeCommand.java
          • test/src/test/java/hudson/cli/DeleteNodeCommandTest.java

          dogfood added a comment - Integrated in jenkins_2.0 #5 JENKINS-31098 Deletage remove a node back to Computer.doDoDelete() (Revision 24785d545971e10ece89b29fa2dfb50c29c85fd9) Result = SUCCESS ogondza : 24785d545971e10ece89b29fa2dfb50c29c85fd9 Files : core/src/main/java/hudson/cli/DeleteNodeCommand.java test/src/test/java/hudson/cli/DeleteNodeCommandTest.java

            pajasoft Pavel Janoušek
            felfert Fritz Elfert
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: