  1. Jenkins
  2. JENKINS-53223

Finished pipeline jobs appear to occupy executor slots long after completion


    • Type: Bug
    • Resolution: Incomplete
    • Priority: Minor
    • Components: core, pipeline

      We have been observing an issue where completed jobs continue to occupy executor slots on our Jenkins slaves (AWS EC2 instances). This seems to back up our build queue, which is normally managed by the EC2 cloud plugin spinning nodes up and down as needed. When the problem manifests, it usually coincides with the EC2 cloud plugin failing to autoscale new nodes, followed by a massive buildup in the build queue until we have to restart the master and kill all jobs to recover.

      These "zombie executor slots" do seem to clear themselves after anywhere from 5 to 60+ minutes. Often they belong to downstream jobs of still-running parent jobs, but not always (sometimes the parent jobs have also completed and the executor still remains occupied). CPU and memory don't seem too strained when this problem manifests.
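      One way to spot such slots without opening each node page might be to scan a JSON snapshot of the executors for slots that are marked busy while the build they hold is no longer running. The following is only a minimal sketch: the JSON shape (`computer`, `executors`, `idle`, `currentExecutable`, `building`) is an assumption about what a Jenkins `/computer/api/json` response with sufficient depth looks like, and the node name, build URL, and function name are hypothetical.

```python
from typing import Any


def find_zombie_executors(snapshot: dict[str, Any]) -> list[tuple[str, int, str]]:
    """Return (node name, executor number, build URL) for every executor
    that is busy while its current build reports that it is not building.

    The field names used here are ASSUMED, not confirmed by the report.
    """
    zombies = []
    for computer in snapshot.get("computer", []):
        for executor in computer.get("executors", []):
            build = executor.get("currentExecutable")
            # Skip idle executors and executors with no build attached.
            if executor.get("idle", True) or not build:
                continue
            # A busy slot whose build has finished is a candidate "zombie".
            if build.get("building") is False:
                zombies.append(
                    (
                        computer.get("displayName", "?"),
                        executor.get("number", -1),
                        build.get("url", ""),
                    )
                )
    return zombies


# Hypothetical snapshot: one finished build still holding a slot,
# one build that is genuinely running, and one idle slot.
sample = {
    "computer": [
        {
            "displayName": "ec2-node-1",
            "executors": [
                {"number": 0, "idle": False,
                 "currentExecutable": {"url": "job/unit-tests/12/", "building": False}},
                {"number": 1, "idle": False,
                 "currentExecutable": {"url": "job/unit-tests/13/", "building": True}},
                {"number": 2, "idle": True, "currentExecutable": None},
            ],
        }
    ]
}

zombie_slots = find_zombie_executors(sample)
print(zombie_slots)
```

      Running a check like this periodically and recording how long each slot stays in the list could help narrow down the 5 to 60+ minute window described above.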
       
      The general job hierarchy where this manifests looks like {1 root job} -> {produces 1-6 child "target building" jobs in parallel} -> {each produces 5-80 "unit testing" jobs in parallel}. We usually see the issue on this group of jobs (the only ones really running on this cluster) when the cluster is under medium-to-high load, running 100+ jobs simultaneously across tens of nodes.
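      To give a sense of scale, the fan-out above can be turned into rough arithmetic. This is a sketch under stated assumptions: every job in the tree is assumed to hold exactly one executor at the same time, which may overstate demand if parent jobs do not hold an executor while they wait on their children, and the figure of 4 executors per node is taken from the one affected node described in this report. The function and constant names are illustrative only.

```python
import math

# ASSUMPTION: every node has 4 executors, as on the affected node in this report.
EXECUTORS_PER_NODE = 4


def total_jobs(children: int, tests_per_child: int) -> int:
    """Jobs spawned by one root job: the root itself, its child
    "target building" jobs, and each child's "unit testing" jobs."""
    return 1 + children + children * tests_per_child


# Smallest and largest trees described in the report.
smallest = total_jobs(children=1, tests_per_child=5)
largest = total_jobs(children=6, tests_per_child=80)

# Nodes needed if every job in the largest tree ran at once,
# ASSUMING one executor per job.
nodes_for_largest = math.ceil(largest / EXECUTORS_PER_NODE)

print(smallest, largest, nodes_for_largest)
```

      Under these assumptions a single root job accounts for between 7 and 487 jobs, and the largest tree alone would need 122 four-executor nodes to run fully in parallel, so even a few executors held by finished jobs can add noticeable queue pressure.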
       
      I'm attaching a thread dump I downloaded from a slave exhibiting this behavior, with all of its executors (4/4) occupied by jobs that have finished running. I'm actually attaching two dumps, the second taken a few minutes after the first on the same slave, because there seems to be some activity with new threads spinning up, although I'm not sure what exactly their purpose is. I will try to generate and submit the zip from the Support Core plugin the next time I see the problem manifesting.

        1. after.tar.gz
          3 kB
          Basil Crow
        2. before.tar.gz
          3 kB
          Basil Crow
        3. build.xml
          21 kB
          Basil Crow
        4. zombie-executor-slots-threadDump.rtf
          39 kB
          Elliot Babchick
        5. zombie-executor-slots-threadDump-2-min-later.rtf
          17 kB
          Elliot Babchick

            dnusbaum Devin Nusbaum
            elliotb Elliot Babchick
