Jenkins / JENKINS-27471

jClouds deletes nodes that are running flyweight builds

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: jclouds-plugin
    • Environment:
      Jenkins 1.599, jClouds-plugin 2.9-SNAPSHOT (private-3bd37c37-jenkins)

      Description

      When a matrix build begins, Jenkins creates a "flyweight" build that checks out the code and decides which (if any) of the project axes need to be rebuilt. The "flyweight" build starts the subordinate jobs and waits until they are complete before it finishes. However, Jenkins doesn't allocate an executor to the "flyweight" builds or consider them when reporting whether a node is "idle". Because JCloudsRetentionStrategy.check() only uses Computer.isIdle() to determine if the node is busy, it kills off nodes running "flyweight" builds.

      Our cloud is set to create instances with one executor each. When a matrix build starts two subordinate builds, the "flyweight" build runs alongside one of them on the first node while the other goes to a second node. If the build on the second node is still running after the first node's build has finished and the retention time has elapsed, the first node is considered "idle" and jClouds-plugin terminates it. That stops the "flyweight" build with an error, which causes the entire matrix build to fail.
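
      The scenario can be modelled in a few lines of plain Groovy. This is only a sketch of the decision logic, not the plugin's code: `NodeState`, `terminateExecutorOnly`, `terminateFlyweightAware` and the 30/45-minute figures are hypothetical names and values chosen for illustration, and the first function merely imitates a check that relies on Computer.isIdle() alone.

```groovy
// Hypothetical stand-in for a cloud node; not a Jenkins or jClouds-plugin class.
class NodeState {
    String name
    int busyExecutors      // regular executors currently running a build
    int flyweightBuilds    // flyweight (matrix parent) builds hosted on the node
    int idleMinutes        // minutes since a regular executor was last busy
}

// Executor-only test, imitating a retention check based solely on Computer.isIdle().
boolean terminateExecutorOnly(NodeState n, int retentionMinutes) {
    n.busyExecutors == 0 && n.idleMinutes >= retentionMinutes
}

// Same test, except that a hosted flyweight build also counts as activity.
boolean terminateFlyweightAware(NodeState n, int retentionMinutes) {
    n.flyweightBuilds == 0 && terminateExecutorOnly(n, retentionMinutes)
}

// node1 has finished its subordinate build but still hosts the flyweight build;
// node2 is still running the second subordinate build.
def node1 = new NodeState(name: 'node1', busyExecutors: 0, flyweightBuilds: 1, idleMinutes: 45)
def node2 = new NodeState(name: 'node2', busyExecutors: 1, flyweightBuilds: 0, idleMinutes: 0)
int retention = 30

[node1, node2].each { n ->
    println("${n.name}: executor-only=${terminateExecutorOnly(n, retention)}, " +
            "flyweight-aware=${terminateFlyweightAware(n, retention)}")
}
```

      Under these assumed values the executor-only test selects node1 for termination while the flyweight-aware test keeps it, which is the behavior this report asks for.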

      "Flyweight" builds don't occupy executors, so the only way I've found to detect them is to enumerate every job in the system, test whether it's building, then check the name of the node it's building on. (If there's a better way, I would very much like to know about it!) Some Groovy code:
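      import hudson.model.AbstractProject
      import jenkins.model.Jenkins

      // For each node, count the jobs whose latest build is still running there.
      for (node in Jenkins.getInstance().getComputers()) {
          def numJobs = 0
          // Restrict to AbstractProject so folders and other non-job items are skipped.
          for (job in Jenkins.getInstance().getAllItems(AbstractProject.class)) {
              // getBuiltOnStr() returns the name of the node the build ran on.
              if (job.isBuilding() && job.getLastBuild().getBuiltOnStr() == node.getName()) {
                  numJobs++
              }
          }
          if (numJobs > 0) {
              println("Node " + node.getName() + " is running " + numJobs + " job(s) - don't kill it!")
          }
      }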
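
      A possibly cheaper alternative to scanning every job is to look at a node's one-off executors. The sketch below assumes that the Jenkins version in use exposes Computer.getOneOffExecutors() and Executor.isBusy(), and that "flyweight" builds are run on such one-off executors; neither assumption has been verified against Jenkins 1.599 here. The helper name `busyOneOffExecutors` is hypothetical. It is duck-typed, so it also accepts plain maps, which keeps the snippet runnable outside Jenkins.

```groovy
// Count busy one-off executors on a computer-like object.
// Property access is duck-typed: on a hudson.model.Computer, `oneOffExecutors`
// resolves to getOneOffExecutors() and `busy` to Executor.isBusy() (assumed to
// exist in the running Jenkins version); on a plain map it reads the map entry.
def busyOneOffExecutors(computer) {
    computer.oneOffExecutors.count { it.busy }
}

// Stand-in data so the snippet runs without Jenkins on the classpath.
def fakeNode = [name: 'node1', oneOffExecutors: [[busy: true], [busy: false]]]
println("${fakeNode.name} has ${busyOneOffExecutors(fakeNode)} busy one-off executor(s)")

// In the Jenkins script console the same helper could be applied to real nodes:
// jenkins.model.Jenkins.getInstance().getComputers().each { c ->
//     println("${c.getName()}: ${busyOneOffExecutors(c)}")
// }
```

      If the assumptions hold, a node with a non-zero count would be treated as busy even when all of its regular executors are idle.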

      (I know this is really a bug in upstream Jenkins, but the developers there seem determined to treat "flyweight" builds as undetectable non-entities, so I think changing jClouds-plugin's behavior will be a much easier fix.)

      Thanks for all your great work!

          Activity

          samsomething Sam Clippinger created issue -
          felfert Fritz Elfert made changes -
          Field Original Value New Value
          Assignee abayer [ abayer ] Fritz Elfert [ felfert ]
          rtyler R. Tyler Croy made changes -
          Workflow JNJira [ 161679 ] JNJira + In-Review [ 180791 ]
          felfert Fritz Elfert made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]

            People

            Assignee:
            felfert Fritz Elfert
            Reporter:
            samsomething Sam Clippinger
            Votes:
            1
            Watchers:
            3
