JENKINS-50405

runATH leads to deadlock of resource consumption for core PR builds


    • Evergreen - Milestone 1

      This weekend we experienced a denial-of-service on ci.jenkins.io due to resource contention caused by the runATH step in the core Jenkinsfile.

      Basically, an executor on the "linux" label was occupied while blocking and waiting for an executor on "docker&&highmem". When Jenkins couldn't provision "highmem" due to capacity issues, the runATH step blocked the "linux" executor indefinitely.

      At the bottom of the Jenkinsfile for core is some code along these lines:

      node('linux') {
        /* some setup */
        runATH()
      }
      

      In runATH(), the first ensureInNode statement ensures that the Pipeline only uses one node, since the execution is already on a node with the "linux" NODE_LABEL.

      When the second ensureInNode executes, it attempts to ensure that the execution is in docker&&highmem, which of course it is not. This causes Pipeline to block waiting for such a node, while still occupying the executor held by the outer "linux" node block.
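
      The effective nesting described above can be sketched roughly as follows. This is only an illustration of the resulting executor usage, assuming the second ensureInNode ends up behaving like a nested node step; it is not the actual runATH or ensureInNode code from the shared library.

      ```groovy
      // Illustrative sketch only: approximates what the core Jenkinsfile
      // effectively does once runATH() is called inside node('linux').
      node('linux') {                  // holds one "linux" executor for the whole block
          /* some setup */

          // Assumed equivalent of the second ensureInNode inside runATH():
          node('docker&&highmem') {    // queues for a second executor and waits
              /* ATH run */
          }
      }                                // the "linux" executor is released only here
      ```

      If no "docker&&highmem" agent can be provisioned, the inner node step never starts, so the outer block never finishes and its "linux" executor is never returned to the pool.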

      This is kind of a big problem and will cause additional resource contention whenever more than one or two core PRs are merged in quick succession.

            rarabaolaza Raul Arabaolaza
            rtyler R. Tyler Croy