JENKINS-72815

Jenkins jobs stuck on agents/workers


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • Component: core

      I've observed that a few of our Jenkins jobs, especially those triggered on a cron schedule, occasionally get stuck. The cron-triggered jobs are affected more often simply because they run more often; overall, we run into this issue at least once a week. Here is what the console output looks like when the issue occurs:

      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Declarative: Checkout SCM)
      [Pipeline] checkout
      The recommended git tool is: git

      For some jobs, the last line is instead:

      The recommended git tool is: NONE

      Note: These jobs run perfectly fine almost all of the time.
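
      One way to spot this state from outside Jenkins is to look at the tail of a running build's console text. The sketch below is only an illustration under assumptions: the function name `looks_stuck_at_checkout` and the 15-minute threshold are hypothetical, and the console text is assumed to have been fetched already (for example from the build's `consoleText` URL). It only covers the git case shown above, not the shell-only jobs.

```python
import re

# Matches the last line seen in the hung builds reported above.
GIT_TOOL_LINE = re.compile(r"^The recommended git tool is: \S+$")


def looks_stuck_at_checkout(console_text, minutes_since_last_output, threshold_minutes=15):
    """Return True if the log ends at the git tool line and has been silent too long.

    threshold_minutes is an arbitrary, hypothetical default; tune it per job.
    """
    lines = [line.strip() for line in console_text.splitlines() if line.strip()]
    if not lines:
        return False
    ends_at_git_tool = bool(GIT_TOOL_LINE.match(lines[-1]))
    return ends_at_git_tool and minutes_since_last_output >= threshold_minutes


sample = """[Pipeline] {
[Pipeline] stage
[Pipeline] { (Declarative: Checkout SCM)
[Pipeline] checkout
The recommended git tool is: NONE
"""
print(looks_stuck_at_checkout(sample, minutes_since_last_output=40))
```

      A healthy build passes through the same line briefly, so the silence threshold is what separates a hang from normal progress.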

       

      In our setup, dynamic workers are terminated by policy after a certain number of runs (approximately 30-50). We're using:

      • Jenkins: 2.263.4
      • OpenJDK 11 on both the master and worker nodes, all running on Ubuntu 20.04.

       

      Here's what I've noticed specifically with workers:

      1. When a new worker spins up and the first job executed on it hangs as described, the next job on that worker (the second one) also gets stuck.
      2. Even after I abort these stuck jobs, any further jobs assigned to that agent hang in the same way. It's important to note that this problem doesn't seem exclusive to the git plugin; it also occurs with jobs that only run simple shell commands.
      3. A temporary fix seems to be manually disconnecting the agent and then reconnecting it.
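
      The pattern in points 1 and 2 (once one build hangs on a fresh agent, later builds on it hang too) suggests flagging the agent rather than the individual job, and point 3 suggests what to do with it. The following is a minimal sketch under assumptions, not a tested fix: the build records are hypothetical dictionaries, `agents_to_recycle` and `reconnect_urls` are made-up helper names, the controller URL and agent names are placeholders, and the `doDisconnect` and `launchSlaveAgent` paths are assumed to be the endpoints behind the agent page's Disconnect and Launch buttons. They should be verified against the Jenkins version in use.

```python
from collections import Counter
from urllib.parse import quote


def agents_to_recycle(builds, min_stuck=2):
    """Return names of agents with at least min_stuck hung builds.

    builds: iterable of dicts with hypothetical keys "agent" and "stuck".
    """
    stuck_counts = Counter(b["agent"] for b in builds if b["stuck"])
    return sorted(name for name, count in stuck_counts.items() if count >= min_stuck)


def reconnect_urls(base_url, agent_name):
    """Build the two URLs to POST to, in order: disconnect, then relaunch.

    The endpoint paths are assumptions; confirm them on your controller.
    """
    # Agent names may contain spaces or parentheses, so percent-encode them.
    node = f"{base_url.rstrip('/')}/computer/{quote(agent_name, safe='')}"
    return [f"{node}/doDisconnect", f"{node}/launchSlaveAgent"]


# Hypothetical sample data: worker-a hung twice, worker-b only once.
builds = [
    {"agent": "worker-a", "stuck": True},
    {"agent": "worker-a", "stuck": True},
    {"agent": "worker-b", "stuck": False},
    {"agent": "worker-b", "stuck": True},
]
for agent in agents_to_recycle(builds):
    for url in reconnect_urls("https://jenkins.example.com/", agent):
        print("POST", url)
```

      Sending the requests is left out on purpose; they would need to be authenticated POST requests, and the sketch only shows which agents would be targeted and in what order.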

      This leads me to suspect that something might be going wrong during the creation of the agent.

      I've looked for related bugs but haven't found an open one that matches our issue. There was one similar report, but it was closed: https://issues.jenkins.io/browse/JENKINS-71759

      Some additional info:

       

      Jenkins masters and workers are running on EC2. No containers.
      Jenkins is installed and configured using Ansible.
      Agents are created dynamically using the EC2 cloud plugin.
      The Jenkins masters create and terminate multiple workers/agents per day.

      Plugins used:

       

      amazon-ecr:1.6
      ansible:1.1
      ansicolor:1.0.0
      artifactory:3.10.6
      aws-device-farm:1.30
      aws-secrets-manager-credentials-provider:0.5.3
      blueocean:1.24.6
      build-blocker-plugin:1.7.7
      build-name-setter:2.1.0
      build-timeout:1.20
      build-user-vars-plugin:1.7
      command-launcher:1.6
      configuration-as-code:1.51
      configurationslicing:1.52
      convert-to-pipeline:1.0
      copyartifact:1.46
      cron_column:1.4
      datadog:2.11.0
      delivery-pipeline-plugin:1.4.2
      discard-old-build:1.05
      docker-build-publish:1.3.3
      ec2:1.58
      email-ext:2.82
      enhanced-old-build-discarder:1.4
      extended-choice-parameter:0.82
      extended-read-permission:3.2
      external-monitor-job:1.7
      ez-templates:1.3.4
      ghprb:1.42.2
      git-parameter:0.9.13
      google-login:1.6
      groovy:2.3
      hockeyapp:1.5.1
      http_request:1.9.0
      icon-shim:2.0.3
      jenkins-multijob-plugin:1.36
      job-dsl:1.77
      job-restrictions:0.8
      jobConfigHistory:2.27
      jobgenerator:1.22
      ldap:1.26
      multiple-scms:0.6
      notification:1.14
      pagerduty:0.7.0
      pam-auth:1.6
      parameterized-scheduler:1.0
      prometheus:2.0.10
      promoted-builds:3.9.1
      rake:1.8.0
      rebuild:1.32
      role-strategy:3.1.1
      saml:1.1.7
      shelve-project-plugin:3.1
      simple-theme-plugin:0.6
      slack:2.48
      slave-setup:1.10
      ssh-agent:1.22
      ssh-slaves:1.31.5
      ssh:2.6.1
      terraform:1.0.10
      trilead-api:1.0.13
      uno-choice:2.5.6
      windows-slaves:1.8
      working-hours:1.1
      ws-cleanup:0.39
      xml-job-to-job-dsl:0.1.13
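
      The list above is in `plugin-id:version` form. When comparing it against another controller or a newer baseline, it can help to load it into a mapping first. A small sketch, with `parse_plugins` and `version_diff` as hypothetical helper names, only a few entries from the list used as sample input, and a placeholder string standing in for the other side's version:

```python
def parse_plugins(text):
    """Parse lines of the form plugin-id:version into a dict."""
    plugins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the last colon so the version is always the final field.
        name, _, version = line.rpartition(":")
        plugins[name] = version
    return plugins


def version_diff(ours, theirs):
    """Return {plugin: (our_version, their_version)} for every mismatch."""
    names = sorted(set(ours) | set(theirs))
    return {n: (ours.get(n), theirs.get(n)) for n in names if ours.get(n) != theirs.get(n)}


ours = parse_plugins("""
ec2:1.58
ssh-slaves:1.31.5
ws-cleanup:0.39
""")

# Placeholder value, not a real release number.
other = dict(ours, ec2="EXAMPLE-OTHER-VERSION")
print(version_diff(ours, other))
```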

       

            Assignee: Unassigned
            Reporter: gopivalleru (Gopi)
            Votes: 0
            Watchers: 3
